
#1 2016-03-18 12:10:54

Alad
Wiki Admin/IRC Op
From: Bagelstan
Registered: 2014-05-04
Posts: 2,412

[SOLVED] Caching packages.gz with curl

I have a script that downloads packages.gz from aur.archlinux.org, runs grep -P on it, and queries the results via AurJson. I'd like to cache packages.gz, since downloading a fresh copy for every search seems excessive and, AFAIK, it is only updated every few hours anyway.

I've tried curl -z with the cached file as the reference, but it always downloads the complete file:

curl https://aur.archlinux.org/packages.gz -o /home/archie/.cache/aursearch/packages.gz -z /home/archie/.cache/aursearch/packages.gz --create-dirs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  191k    0  191k    0     0   868k      0 --:--:-- --:--:-- --:--:--  869k

The response headers look like this:

HTTP/1.1 200 OK
Server: nginx/1.8.1
Date: Fri, 18 Mar 2016 12:05:57 GMT
Content-Type: text/plain;charset=UTF-8
Connection: keep-alive
Cache-Control: no-cache, must-revalidate
Expires: Tue, 11 Oct 1988 22:00:00 GMT
Pragma: no-cache
Content-Encoding: gzip
Strict-Transport-Security: max-age=16070400

So there's no "Last-Modified" header. How should I best proceed in this case?

Last edited by Alad (2016-03-18 12:50:22)


Mods are just community members who have the occasionally necessary option to move threads around and edit posts. -- Trilby


#2 2016-03-18 12:32:12

x33a
Forum Fellow
Registered: 2009-08-15
Posts: 4,587

Re: [SOLVED] Caching packages.gz with curl

From what I can tell, since the server explicitly forbids caching, there is nothing you can do on the HTTP side. Here's a related topic: https://stackoverflow.com/questions/173 … p-response

What you can do is write a wrapper script that checks whether the file is present locally and downloads a new one only when the file's age, judged by its mtime, exceeds a threshold you specify.
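Such a wrapper might look roughly like the sketch below. The names needs_refresh, refresh_cache, cache and max_age are made up for illustration, the three-hour threshold is only a guess based on "every few hours", and GNU stat is assumed.

```shell
# Sketch only: needs_refresh, refresh_cache, cache and max_age are
# hypothetical names; the three-hour threshold is an assumption.
cache=${XDG_CACHE_HOME:-$HOME/.cache}/aursearch/packages.gz
max_age=$((3 * 60 * 60))

# needs_refresh FILE SECONDS
# Succeeds when FILE is missing or its mtime is more than SECONDS old.
needs_refresh() {
    file_mtime=$(stat --format '%Y' "$1" 2>/dev/null) || return 0
    [ $(( $(date '+%s') - file_mtime )) -gt "$2" ]
}

# Download packages.gz only when the cached copy is stale.
refresh_cache() {
    if needs_refresh "$cache" "$max_age"; then
        curl -fs --create-dirs -o "$cache" https://aur.archlinux.org/packages.gz
    fi
}
```

Calling refresh_cache before each search would then hit the network at most once per max_age seconds. The trade-off is that the cache can lag behind the server by up to that long.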


#3 2016-03-18 12:49:44

Alad
Wiki Admin/IRC Op
From: Bagelstan
Registered: 2014-05-04
Posts: 2,412

Re: [SOLVED] Caching packages.gz with curl

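Thanks. I've done something similar, but with a pipe hack: the date in the first line of the list is compared against the local mtime, and head exits after that one line, closing the pipe so curl can stop early: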

if stamp_l=$(stat --format '%Y' packages.gz 2>/dev/null); then
    # aurweb sends no "Last-Modified" header, so take the date from the
    # first line of the list; head exits after one line, which closes
    # the pipe.
    stamp_r=$(curl -s "$aurweb"/packages.gz | zcat | head -n 1 | awk -F, '{print $3}')
    stamp_r=$(date -d "$stamp_r" '+%s')

    if ((stamp_r > stamp_l)); then
        fetch
    else
        msg2 "packages.gz does not need updating"
    fi
else
    fetch
fi
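To make the field splitting concrete, here is a small offline sketch of the parsing step. The sample header line is an assumption inferred from the awk call (third comma-separated field holding a date), not something captured from the server; header_epoch is a hypothetical helper name, and GNU date is assumed.

```shell
# header_epoch: read a header line on stdin and print the date found in
# its third comma-separated field as seconds since the epoch.
header_epoch() {
    date -d "$(head -n 1 | awk -F, '{print $3}')" '+%s'
}

# Assumed sample header, not captured from the server.
sample='# AUR package list, generated on Fri, 18 Mar 2016 12:05:02 GMT'
printf '%s\n' "$sample" | header_epoch
```

With -F, the sample splits into "# AUR package list", " generated on Fri" and " 18 Mar 2016 12:05:02 GMT"; only the last piece reaches date -d, which is why the weekday can be dropped.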

The Unix magic ;)

Last edited by Alad (2016-03-18 12:53:57)

