I'm starting a project to allow ArchLinux to be used in a cluster environment (auto-installation of nodes and such). I'm going to implement this where I'm working right now (a ~25-node cluster). Currently they're using RocksClusters.
The problem is that the internet connection from work is generally really bad during the day, and there's an HTTP proxy in the middle. The other day I tried installing archlinux using the FTP image, and it took me more than 5 hours just to do an upgrade plus install subversion and some other packages, right after an FTP installation (which wasn't fast either).
The idea is that the frontend (the main node of the cluster) would hold a local mirror of packages, so that the nodes use that mirror when they install (the frontend would use it too, because of the bad speed).
Since I think it's better to update the mirror and perform an upgrade only rarely (if something breaks I would leave users stranded until I fix it), I thought I'd download a snapshot of extra/ and current/ just once. But the best speed I get from rsync (even at night, when an HTTP transfer from kernel.org runs at 200 KB/s) is ~13 KB/s. At that rate it would take days, and once it finished I'd have to resync anyway to pick up any packages released in the meantime.
I could download extra/ and current/ at home (I have 250 KB/s downstream, though I only get ~100 KB/s from rsync) and burn several CDs (6 of them: ~(3 GB + 700 MB)/700 MB), but that's not very nice. I think this would only be necessary the first time; afterwards an rsync should take a lot less, though I don't know how much less.
Obviously I could speed things up a little if I downloaded the full ISO and used it as a base for rsyncing current/. But for extra/ there are no ISOs.
I think downloading everything is a bit impractical anyway, since I wouldn't need all of extra/. But it's hard to know all the packages I need, plus their dependencies, in order to download only those.
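If pacman itself can resolve the dependency closure, the download list could be limited to exactly what's needed. A rough sketch of what I mean (the package names are only placeholders, and I'm assuming -Sp prints the URIs of the targets plus their not-yet-installed dependencies):

```shell
# Print download URIs for the named packages plus any missing
# dependencies; nothing is actually fetched (-p = print only).
# The package names below are just examples.
pacman -Sp subversion gcc > needed-urls.txt

# Drop duplicate URIs before taking the list home to download
sort -u needed-urls.txt > fetch-list.txt
```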
So... I'd like to know if anyone has ideas on how to make this practical. I wouldn't want my whole project to crumble because of this detail.
It's annoying, because pacman at home always works at max speed.
BTW, I've read the HOWTO that explains how to mount pacman's cache on the nodes to get a shared cache, but I'm not sure that's a good option. In any case, it would mean downloading everything at work, which would take forever.
Offline
4GB USB pendrives are getting affordable these days, so why not use one to move the repo between home and work?
Offline
pacman -Syup
This will print the URIs of all packages that have to be updated. Then you can download them at home.
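To flesh that out, the printed list can be filtered and fed straight to wget at home. This is only a sketch; depending on your pacman version, -Syup may mix database status lines into the output, hence the grep:

```shell
# At work: capture the URI list (nothing is downloaded yet)
pacman -Syup > raw-list.txt

# Keep only the package URLs, in case status lines got mixed in
grep -E '^(ftp|http)://' raw-list.txt > urls.txt

# At home: fetch the whole list; -nc skips files already
# downloaded on a previous run
wget -nc -i urls.txt

# Then carry the files back, drop them into /var/cache/pacman/pkg/
# and run pacman -Su to upgrade straight from the cache.
```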
BTW, IMHO you should really set up a shared pacman cache as explained in my HOWTO on the wiki. That gives you the option of quick upgrades from work when there's a security fix, or just when you want to install a package right now, and it's really better than rsyncing the whole repo.
I have 48 kbps to my ISP on my second company's server, and it doesn't take much time to update the system.
to live is to die
Offline
4GB USB pendrives are getting affordable these days, so why not use one to move the repo between home and work?
I think I'm going to burn a DVD (I think it all fits on one) and see how it goes...
Offline
pacman -Syup
This will print the URIs of all packages that have to be updated. Then you can download them at home.
BTW, IMHO you should really set up a shared pacman cache as explained in my HOWTO on the wiki. That gives you the option of quick upgrades from work when there's a security fix, or just when you want to install a package right now, and it's really better than rsyncing the whole repo.
I have 48 kbps to my ISP on my second company's server, and it doesn't take much time to update the system.
The thing is that with that method I'd need a first download of all the packages to fill the cache, and that's what takes the longest time...
I think I'm going to try this:
* rsync at home (I already got current/ last night)
* burn a DVD
* go to work, then update the packages from the DVD using rsync again (this should be fast, as long as I don't wait too long after burning it)
And to optimize further rsyncs:
* Do a first install on all nodes and try it out for a few days (so I end up installing all the packages needed)
* Build a list of the packages used by all the nodes and the frontend
* Remove everything else from my mirror
* Do further rsync updates that only touch the files I already have
This would be a manual version of the shared-cache idea, I think.
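The last two steps map nicely onto rsync's --existing flag, which updates files already present on the receiving side but never creates new ones. A sketch of what I have in mind (the mirror URL and paths are made up; a real mirror address would go there):

```shell
# One-time full sync at home (hypothetical mirror URL)
rsync -avz --delete \
    rsync://mirror.example.org/archlinux/current/os/i686/ mirror/current/

# On each node: record the installed package names, then merge
# the lists on the frontend
pacman -Q | cut -d' ' -f1 >> all-pkgs.txt
sort -u all-pkgs.txt > pkgs-in-use.txt

# Later refreshes from work: after trimming the mirror down to
# pkgs-in-use.txt, --existing keeps the transfer small because
# it only updates packages the mirror already holds
rsync -avz --existing \
    rsync://mirror.example.org/archlinux/extra/os/i686/ mirror/extra/
```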
Thanks for the help
Offline
As a sidenote, this sounds like a very cool project you're working on. If you get far enough, make sure to ask about getting included on the homepage in arch related projects.
The suggestion box only accepts patches.
Offline
As a sidenote, this sounds like a very cool project you're working on. If you get far enough, make sure to ask about getting included on the homepage in arch related projects.
Will do
BTW, I started a little Wiki page at: http://wiki.archlinux.org/index.php/Kickstart.
It's just an explanation of how it works; there's no code yet. I'm developing a wiki locally to start documenting things a little, using dokuwiki on my machine. I'm waiting for berlios.de to answer my project registration request, but berlios.de seems to be a zombie (no response to other project requests, the server hosting subversion has a full disk, no response to support requests...).
Too bad sourceforge doesn't work for it (they mount the web disk read-only, and dokuwiki writes its data to disk).
BTW, if you know of any good place for open-source projects where I could put a dokuwiki site, I'd appreciate it
Offline
The thing is that with that method I'd need a first download of all the packages to fill the cache, and that's what takes the longest time...
Why do you need to do that? Don't you have all your installed packages in /var/cache/pacman/pkg/ already? If you haven't run pacman -Scc, all your installed packages should already be cached.
to live is to die
Offline
V01D wrote:The thing is that with that method I'd need a first download of all the packages to fill the cache, and that's what takes the longest time...
Why do you need to do that? Don't you have all your installed packages in /var/cache/pacman/pkg/ already? If you haven't run pacman -Scc, all your installed packages should already be cached.
After installation, the only packages in the cache are the ones from current/. The stuff from extra/ won't be there until I install something from it.
Anyway, if I install from a full CD I get old packages, and I have to pacman -Syu after installation (which takes a long time).
Offline
After installation, the only packages in the cache are the ones from current/. The stuff from extra/ won't be there until I install something from it.
Anyway, if I install from a full CD I get old packages, and I have to pacman -Syu after installation (which takes a long time).
Oh, so that's how it is.
I think I'm going to try this:
* rsync at home (I already got current/ last night)
* burn a DVD
* go to work, then update the packages from the DVD using rsync again (this should be fast, as long as I don't wait too long after burning it)
And to optimize further rsyncs:
* Do a first install on all nodes and try it out for a few days (so I end up installing all the packages needed)
* Build a list of the packages used by all the nodes and the frontend
* Remove everything else from my mirror
* Do further rsync updates that only touch the files I already have
This would be a manual version of the shared-cache idea, I think.
Hmm... but why do you want to use rsync? You'd need to download the whole repo, which is quite large (current + extra + testing + community > 5.1 GB; extra is the largest). I suggest downloading only the packages, and the dependencies, that you actually use.
I'm in a similar situation. At work I have unlimited traffic (48 kbps during the day and 128 kbps at night); at home I have a fast connection (up to 256 kbps) but I pay for every megabyte (a little, but after 100-500 megabytes it becomes very noticeable). So I do
yes | pacman -Syuw
or
yes | pacman -Syw pkg1 pkg2 ... pkgN
at work (especially when packages are big), then put the newly downloaded files on my flash drive, copy them into /var/cache/pacman/pkg/ at home, and then I only need to do pacman -Sy before installing, which takes less than a minute.
I have a 1 GB flash drive, so I can always keep the whole cache on it. Synchronizing work cache <-> flash drive <-> home cache is very easy.
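For the curious, the round trip looks roughly like this (the mount point is just an example; cp -u is the GNU "copy only if newer or missing" flag, so packages already carried over get skipped):

```shell
# At work: download everything that needs upgrading into the
# cache without installing it (-w = download only)
yes | pacman -Syuw

# Copy only new/changed packages onto the flash drive
cp -u /var/cache/pacman/pkg/*.pkg.tar.gz /mnt/flash/pkg/

# At home: merge the drive into the local cache, then refresh
# the db and upgrade; everything comes from the cache
cp -u /mnt/flash/pkg/*.pkg.tar.gz /var/cache/pacman/pkg/
pacman -Sy
pacman -Su
```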
P.S.: Recently I decided to make a complete mirror of all i686 packages from archlinux.org with rsync, not for myself but for some friends who wanted to install Linux. I don't pay for traffic at work, but it still took almost a week to download the 5.1 GB of packages.
IMHO, for most local-mirror setups rsync is overkill. How many users actually use more than 30% of the packages in the repos? So why make a full mirror with rsync when you can cache only the packages you install?
to live is to die
Offline