#1 2007-07-05 23:42:15

FurryNeko
Member
Registered: 2006-10-31
Posts: 6

Pacman transparent HTTP cache

Hey all,

I'm a fairly new Arch user, and to be honest, a fairly new Linux user in general. Over the years I tried so many times to make the transition from Windows to Linux and after finding Arch, I felt it was finally time, but I digress...

I now run 2 Arch machines here and one Arch virtual machine, but I found myself with a little problem. Having 3 machines meant that it required either 3 times the bandwidth to keep them updated, or a lot of hassle with copying packages around the network. I even experimented with using the Pacman cache directory on an NFS share, but none of these were acceptable to me.

So this afternoon I sat down and coded a solution I was happy with.

I run a small Linux server here, Debian based, which serves as a small file server to the local network and also hosts a few services for myself and a friend. This server also runs Lighttpd which I use for developing with PHP and Perl.

The idea was that this machine would be a mirror for Arch to update from, but I didn't want to mirror everything, just the packages I actually use. After much searching through Google I discovered someone who had done something similar for Debian-based distributions, apt-cache. It's essentially a small web server which, when queried for a package, first checks its local cache; if it doesn't find the file there, it downloads it from an official mirror, both storing it locally and sending it to the client.

I've never coded in Java personally and I didn't want to have 2 web-servers running when one would suffice, so I set about coding something similar in PHP.

The end result is 130 lines of code, and a url.rewrite rule, which achieves exactly what I was after. It works like this:

1) Pacman requests a file from the local server
2) Local server checks to see if it has the file

3a) Local server cannot find the file so it requests it from an Arch mirror
4a) The file is simultaneously downloaded, written to disk and sent to Pacman.

3b) Local server has the file
4b) The file is sent to Pacman
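
The steps above can be sketched roughly as follows. This is a minimal Python illustration of the flow, not the PHP script described in this post. The function name `serve_package`, its parameters, and the `.part` temporary-file convention are all assumptions made for the example, and it leaves out request parsing and path validation entirely.

```python
import os
import shutil
import tempfile
import urllib.request

CHUNK = 64 * 1024  # bytes read from the mirror per iteration


def serve_package(rel_path, cache_dir, mirror_url, client,
                  opener=urllib.request.urlopen):
    """Write the file at rel_path to `client`; return "hit" or "miss".

    `client` is any writable binary stream standing in for the HTTP
    response to pacman. `opener` is injectable so the sketch can be
    exercised without network access. All names here are hypothetical.
    """
    cached = os.path.join(cache_dir, rel_path)

    # Steps 3b/4b: the file is already cached, so send it straight back.
    if os.path.isfile(cached):
        with open(cached, "rb") as src:
            shutil.copyfileobj(src, client, CHUNK)
        return "hit"

    # Steps 3a/4a: fetch from the mirror, writing each chunk to disk and
    # to the client as it arrives.
    target_dir = os.path.dirname(cached)
    os.makedirs(target_dir, exist_ok=True)
    fd, partial = tempfile.mkstemp(dir=target_dir, suffix=".part")
    url = mirror_url.rstrip("/") + "/" + rel_path
    try:
        with os.fdopen(fd, "wb") as disk, opener(url) as upstream:
            while True:
                chunk = upstream.read(CHUNK)
                if not chunk:
                    break
                disk.write(chunk)
                client.write(chunk)
        # Only a complete download is moved into the cache proper.
        os.replace(partial, cached)
    except BaseException:
        os.unlink(partial)  # never leave a half-written file behind
        raise
    return "miss"
```

Downloading into a temporary file and renaming it at the end is one way to keep a broken download from ever being served as a cache hit.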

The end result is a transparent cache which will only have to download the file once. An example of the speed increase is as follows:

# pacman -S --downloadonly kernel26
kernel26                21.7M        806.4K/s    00:00:28

# rm /var/cache/pacman/pkg/kernel26-2.6.21.5-1.pkg.tar.gz

# pacman -S --downloadonly kernel26
kernel26                21.7M        8.5M/s      00:00:03

As you can see, after the package was cached on the server, it didn't need re-downloading and as such it transferred to the local machine at LAN speeds.

My mirror entries for the repositories look like this:

Server = http://192.168.0.1/pacman-cache/pkg/current/os/i686/
Server = http://192.168.0.1/pacman-cache/pkg/extra/os/i686/
etc....

So, my question is this: would anyone out there be interested in the code? Right now it still needs a lot of work before it could be made public, as there's very little error checking. I need to handle unexpected conditions like a broken download, and I also have to add handling for the db.tar.gz files being updated. But as and when I feel it's ready, would anyone use it?

I'd appreciate any input anyone felt like sharing, even feature requests =^.^=

PS: I hope this is in the right sub-forum... I didn't think it belonged in the actual Pacman forum, but if it did, apologies!

#2 2007-07-06 00:05:35

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622

Re: Pacman transparent HTTP cache

No worries about the location.
Moving it to "Community Contributions", which seems like a good place for it.

Neat idea too. :)


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

#3 2007-07-06 23:12:44

FurryNeko
Member
Registered: 2006-10-31
Posts: 6

Re: Pacman transparent HTTP cache

cactus wrote:

No worries about the location.
Moving it to "Community Contributions", which seems like a good place for it.

Neat idea too. :)

Thanks for moving the topic, cactus. If I hadn't been so tired from coding I might have spotted the Community Contributions forum all the way down there!

As for the little project: this morning it had 130 lines of code, and as of 10 minutes ago it has 366, lol. It's becoming far more stable and I think I've worked out all the bugs in the current code bar one: pacman sometimes times out when requesting a file. I need to find out whether it's the fault of my script, my server config, or pacman being impatient. It's most likely one of the first two, as it only seems to happen when requesting non-cached files, so it could be caused by an unexpected delay connecting to the mirror.

I've also written code to extract the md5sums from the various repository database files. It turns these into a master list, which is updated when the database is, and uses that information to verify the cached package before sending it to pacman. It's more a precaution than anything else, but it also catches incomplete downloads as their md5s don't match those in the list.
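
As a rough illustration of the md5 check described above, here is a minimal Python sketch (again, not the actual PHP code). It assumes each package entry in the repository database tarball carries a `desc` file made up of `%KEY%` headers followed by their values, including `%FILENAME%` and `%MD5SUM%`. That layout is an assumption and may differ between pacman versions. The function names are made up for the example.

```python
import hashlib
import tarfile


def parse_desc(text):
    """Parse '%KEY%' header lines, keeping the first value line of each."""
    fields, key = {}, None
    for line in text.splitlines():
        line = line.strip()
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line[1:-1]
        elif line and key is not None and key not in fields:
            fields[key] = line
    return fields


def md5_index(db_path):
    """Build the master list: package file name -> expected md5 hex digest.

    Assumes every package has a '<name>/desc' member in the tarball with
    %FILENAME% and %MD5SUM% fields (an assumption about the db layout).
    """
    index = {}
    with tarfile.open(db_path, "r:*") as db:
        for member in db:
            if not member.isfile() or not member.name.endswith("/desc"):
                continue
            raw = db.extractfile(member).read()
            fields = parse_desc(raw.decode("utf-8", "replace"))
            if "FILENAME" in fields and "MD5SUM" in fields:
                index[fields["FILENAME"]] = fields["MD5SUM"].lower()
    return index


def file_md5(path):
    """Hash a file in chunks so large packages are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, name, index):
    """True only if `name` is listed and the file on disk matches its md5."""
    expected = index.get(name)
    return expected is not None and file_md5(path) == expected
```

A truncated download fails `is_intact` because its digest no longer matches the listed one, which is how the check doubles as a guard against incomplete files.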

I still need to write some sort of logging system, catch any errors I'm not already catching, and perhaps investigate the possibility of resuming, though I'm not sure that'll be possible. The idea has already far exceeded the original implementation though!

I'll post more as the project develops.

#4 2007-07-06 23:30:16

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622

Re: Pacman transparent HTTP cache

You might try uncommenting the XferCommand in pacman.conf, and adding a --timeout (or maybe just --read-timeout) option to wget, and seeing if that helps bypass the timeout you are experiencing.

XferCommand = /usr/bin/wget --passive-ftp --read-timeout=10 -c -O %o %u

The wget manpage has more information about the various timeouts wget uses, which I believe are in seconds.

#5 2007-07-17 10:11:09

FurryNeko
Member
Registered: 2006-10-31
Posts: 6

Re: Pacman transparent HTTP cache

Just a little update :)

I solved the timeout problem. It wasn't a misconfiguration but rather a bug in the code that was causing it to stall randomly when retrieving a remote package. It'll now happily retrieve multiple packages without a problem.

The following new features have been added since I last posted:

Logging support
It's a little primitive, but it works. The location of the log file is customizable so it should be possible for logrotate to work with it, hence I've not added any rotation system of my own.

Setup support
The cache script now functions correctly when initially installing Arch via the /arch/setup script. I've run it through once or twice, but it could do with further testing.

Testing & Unstable repository support
Self-explanatory really :P

Currently the script only supports the i686 architecture, due to some hard-coded paths. I'd need to do some recoding to support x86_64 as well; it's something I'm considering, assuming I can get VMware to run a 64-bit Arch install. Resuming is still on the "maybe" pile as I'm still trying to come up with a way of coding it cleanly. It's a case of balancing effort vs reward on this one.

I'm also currently working on a quick-and-simple administration interface for the cache. It should let you see what files are cached, remove selected ones or an entire repository's worth of cache, and perhaps even verify the local files against the md5 summary files I build. It's in the early stages right now, but it'll hopefully be complete in the near future.

Overall it works very well and depending on how many bugs I run into when giving it a real test, I should be able to release it here in the not-too-distant future, then anyone who's interested can play with it and perhaps improve it beyond my original design.

#6 2007-07-17 12:29:47

[vEX]
Member
From: Sweden
Registered: 2006-11-23
Posts: 450

Re: Pacman transparent HTTP cache

I only run a single machine, but I'd still be interested in poking around with the PHP code. It sounds like something I could have a use for whenever I set up another Arch machine.


PC: Antec P182B | Asus P8Z77-V PRO | Intel i5 3570k | 16GB DDR3 | GeForce 450GTS | 4TB HDD | Pioneer BDR-207D | Asus Xonar DX | Altec Lansing CS21 | Eizo EV2736W-BK | Arch Linux x86_64
HTPC: Antec NSK2480 | ASUS M3A78-EM (AMD 780G) | AMD Athlon X3 425 | 8GB DDR2 | GeForce G210 | 2TB HDD | Arch Linux x86_64
Server: Raspberry Pi (model B) | 512MB RAM | 750GB HDD | Arch Linux ARM

#7 2007-07-19 12:03:54

FurryNeko
Member
Registered: 2006-10-31
Posts: 6

Re: Pacman transparent HTTP cache

Ok! Here we go!

This version is as feature-complete as I feel it needs to be. There are still a few things I'd like to add at some point in the future, but as it stands now, the code is complete and as bug-free as I can manage.

I've added support for x86_64, as well as installs on x86_64. The admin interface is literally an overview of the files currently cached; I'd originally planned to add support for emptying the cache or removing individual packages, but left it out due to security concerns. Resuming was scrapped: it would take a huge rewrite of the existing code to add support for it, and the reward just wasn't worth the effort. The script functions perfectly well without resuming anyway.

Finally, I've thrown together a quick and dirty README. It should give a bit of an overview of the code and some basic help getting it all up and running.

Please excuse both my messy code and bad spelling / grammar :P

Have fun!
Neko

Download: http://furryneko.myzen.co.uk/pacman-cache-1.0.tar.gz
