#1 2006-07-29 07:13:15

V01D
Member
Registered: 2006-07-18
Posts: 128

Any ideas on how to do a local mirror for this situation?

I'm starting a project to allow Arch Linux to be used in a cluster environment (auto-installation of nodes and such). I'm going to implement this where I work right now (a ~25-node cluster). Currently they're using RocksClusters.

The problem is that the internet connection from work is generally really bad during the day, and there's an HTTP proxy in the middle. The other day I tried installing Arch Linux using the FTP image, and it took me more than 5 hours just to do an upgrade plus installing Subversion and a few other packages, right after an FTP installation (which wasn't fast either).

The idea is that the frontend (the main node of the cluster) would hold a local mirror of packages, so the nodes would use that mirror when they install (the frontend would use it too, because of the bad speed).

Since I think it's better to update the mirror and perform upgrades only infrequently (if something breaks, I would leave users stranded until I fix it), I thought I'd download a snapshot of extra/ and current/ just once. But the best speed I get from rsync (even at night, when an HTTP transfer from kernel.org runs at 200 KB/s) is ~13 KB/s; that would take days, and when it finished I would have to resync anyway because of any newer packages released in the meantime.
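The kind of command I mean is something like this (the mirror host and local paths here are just placeholders, not a real mirror):

rsync -avz --delete rsync://mirror.example.com/archlinux/current/ /home/mirror/current/
rsync -avz --delete rsync://mirror.example.com/archlinux/extra/ /home/mirror/extra/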

I could download extra/ and current/ at home (I have 250 KB/s downstream, though I only get ~100 KB/s from rsync) and burn several CDs (six of them: ~(3 GB + 700 MB) / 700 MB), but that's not very nice. Maybe that would only be needed the first time; afterwards an rsync should take a lot less, though I don't know how much less.

Obviously I could speed things up a little if I downloaded the full ISO and rsynced current/ using it as a base. But for extra/ there are no ISOs.
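For current/ that would look roughly like this (the ISO name, its internal layout, and the mirror host are all just examples):

mount -o loop archlinux-base.iso /mnt/iso   # loop-mount the install ISO
cp -a /mnt/iso/arch /home/mirror/current    # seed the local mirror with the ISO's packages
rsync -avz --delete rsync://mirror.example.com/archlinux/current/ /home/mirror/current/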

I also think downloading everything is a bit impractical, as I wouldn't need the whole of extra/ anyway. But it's hard to know all the packages I need, plus their dependencies, in order to download only those.

So... I would like to know if anyone has ideas on how to make this practical. I wouldn't want my whole project to crumble because of this detail.
It's annoying, because pacman at home always works at max speed.

BTW, I've read the HOWTO that explains how to mount pacman's cache on the nodes to get a shared cache, but I'm not sure that's a good option. Anyway, it would mean downloading everything at work, which would take ages.

Offline

#2 2006-07-29 10:51:16

stavrosg
Member
From: Rhodes, Greece
Registered: 2005-05-01
Posts: 330
Website

Re: Any ideas on how to do a local mirror for this situation?

4GB USB pendrives are getting affordable these days, so why not use one to move the repo between home and work?

Offline

#3 2006-07-29 11:38:03

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: Any ideas on how to do a local mirror for this situation?

pacman -Syup

This will print the URIs of all the packages that need to be updated. Then you can download them at home.
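For example (a sketch; the file name is arbitrary, and you may need to strip pacman's status messages from the top of the output):

pacman -Syup > uris.txt                      # at work: list the URIs of pending upgrades
wget -i uris.txt -P /var/cache/pacman/pkg/   # at home: download them straight into the cache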

BTW, IMHO you should really set up a shared pacman cache as explained in my HOWTO on the Wiki. It gives you the possibility of quick upgrades from work when there's a security fix, or just when you want to install a package right now, and it's really better than rsyncing the whole repo.
I have 48 kbps to my ISP on my second company's server and it does not take much time to update the system.
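The gist of it is to export the frontend's cache over NFS and mount it on every node. Very roughly (this is just a sketch, the subnet and export options are made up; see the HOWTO for the real instructions):

# on the frontend, in /etc/exports:
/var/cache/pacman/pkg  192.168.0.0/24(rw,no_root_squash)

# on each node:
mount -t nfs frontend:/var/cache/pacman/pkg /var/cache/pacman/pkg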


to live is to die

Offline

#4 2006-07-29 17:19:21

V01D
Member
Registered: 2006-07-18
Posts: 128

Re: Any ideas on how to do a local mirror for this situation?

stavrosg wrote:

4GB USB pendrives are getting affordable these days, so why not use one to move the repo between home and work?

I think I'm going to burn a DVD (it should all fit on one) and see how it goes...

Offline

#5 2006-07-29 17:27:17

V01D
Member
Registered: 2006-07-18
Posts: 128

Re: Any ideas on how to do a local mirror for this situation?

Romashka wrote:
pacman -Syup

This will print the URIs of all the packages that need to be updated. Then you can download them at home.

BTW, IMHO you should really set up a shared pacman cache as explained in my HOWTO on the Wiki. It gives you the possibility of quick upgrades from work when there's a security fix, or just when you want to install a package right now, and it's really better than rsyncing the whole repo.
I have 48 kbps to my ISP on my second company's server and it does not take much time to update the system.

The thing is that with that method I still need a first download of all the packages to fill the cache, and that's what takes the longest...

I think I'm going to try this:
* rsync at home (I already got current/ last night)
* burn a DVD
* at work, update the packages from the DVD copy using rsync again (this should be fast if I don't wait too long after burning it)

And to optimize further rsyncs:
* do a first install on all nodes and try it out for a few days (so that all the needed packages get installed)
* build a list of the packages used by all the nodes and the frontend
* remove every other package from my mirror
* do further rsync updates that only refresh the files I already have

This would be the manual approach to the shared-cache idea, I think. A sketch of the package-list step is below.
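For the package list, something like this should work (host names and the mirror path are made up, and the name extraction assumes the usual name-version-release.pkg.tar.gz naming):

# on every node and on the frontend: dump the installed package names
pacman -Q | cut -d ' ' -f 1 > /tmp/pkgs-$(hostname).txt

# on the frontend: merge the lists, then drop unused packages from the mirror
sort -u /tmp/pkgs-*.txt > used-packages.txt
cd /home/mirror/extra/os/i686
for f in *.pkg.tar.gz; do
    grep -qx "${f%-*-*}" used-packages.txt || rm "$f"
done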

Thanks for the help  big_smile

Offline

#6 2006-07-29 18:48:56

neotuli
Lazy Developer
From: London, UK
Registered: 2004-07-06
Posts: 1,204
Website

Re: Any ideas on how to do a local mirror for this situation?

As a side note, this sounds like a very cool project you're working on. If you get far enough, make sure to ask about getting it listed on the homepage among the Arch-related projects. smile


The suggestion box only accepts patches.

Offline

#7 2006-07-29 19:01:11

V01D
Member
Registered: 2006-07-18
Posts: 128

Re: Any ideas on how to do a local mirror for this situation?

neotuli wrote:

As a side note, this sounds like a very cool project you're working on. If you get far enough, make sure to ask about getting it listed on the homepage among the Arch-related projects. smile

Will do  big_smile
BTW, I started a little Wiki page at http://wiki.archlinux.org/index.php/Kickstart.
It's just an explanation of how it works; there's no code yet. I'm also setting up a wiki locally (DokuWiki on my machine) to start documenting things. I'm waiting for berlios.de to answer my project registration request, but berlios.de seems to be a zombie (no response to other project requests, the server hosting Subversion has a full disk, no response to support requests...).
Too bad SourceForge doesn't work for this (they mount the web disk read-only, and DokuWiki needs to write its data to disk).

BTW, if you know of a good open-source project host where I can put a DokuWiki site, I'd appreciate it big_smile

Offline

#8 2006-07-29 19:06:20

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: Any ideas on how to do a local mirror for this situation?

V01D wrote:

The thing is that with that method I still need a first download of all the packages to fill the cache, and that's what takes the longest...

Why do you need to do that? Don't you have all the installed packages in /var/cache/pacman/pkg/ already? Unless you ran pacman -Scc, all your installed packages should already be cached.


to live is to die

Offline

#9 2006-07-29 19:19:28

V01D
Member
Registered: 2006-07-18
Posts: 128

Re: Any ideas on how to do a local mirror for this situation?

Romashka wrote:
V01D wrote:

The thing is that with that method I still need a first download of all the packages to fill the cache, and that's what takes the longest...

Why do you need to do that? Don't you have all the installed packages in /var/cache/pacman/pkg/ already? Unless you ran pacman -Scc, all your installed packages should already be cached.

After installation, the packages in the cache are the ones from current/. The stuff from extra/ won't be there until I install something from it.
Anyway, if I install from a full CD I get old packages, which I then have to pacman -Syu after installation (and that takes a long time).

Offline

#10 2006-07-29 19:39:26

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: Any ideas on how to do a local mirror for this situation?

V01D wrote:

After installation, the packages in the cache are the ones from current/. The stuff from extra/ won't be there until I install something from it.
Anyway, if I install from a full CD I get old packages, which I then have to pacman -Syu after installation (and that takes a long time).

Oh, so that's how it is.

V01D wrote:

I think I'm going to try this:
* rsync at home (I already got current/ last night)
* burn a DVD
* at work, update the packages from the DVD copy using rsync again (this should be fast if I don't wait too long after burning it)

And to optimize further rsyncs:
* do a first install on all nodes and try it out for a few days (so that all the needed packages get installed)
* build a list of the packages used by all the nodes and the frontend
* remove every other package from my mirror
* do further rsync updates that only refresh the files I already have

This would be the manual approach to the shared-cache idea, I think.

Hmm... but why do you want to use rsync? You'd need to download the whole repo, which is quite large (current + extra + testing + community > 5.1 GB; extra is the largest). I suggest you download only the packages, and their dependencies, that you actually use.

I have a similar situation. At work I have unlimited traffic (48 kbps by day and 128 kbps at night); at home I have a fast connection (up to 256 kbps) but I pay for every megabyte (not much, but after 100-500 megabytes it becomes very noticeable). So I run

yes | pacman -Syuw

or

yes | pacman -Syw pkg1 pkg2 ... pkgN

at work (especially when the packages are big), then put the newly downloaded files on my flash drive, copy them into /var/cache/pacman/pkg/ at home, and then I only need to do pacman -Sy before installing, which takes less than a minute.

I have a 1 GB flash drive, so I can always keep the whole cache on it. Synchronizing work cache <-> flash drive <-> home cache is very easy; see the sketch below.
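The synchronization is just an rsync run in each direction (the flash drive mount point is whatever you use):

rsync -av /var/cache/pacman/pkg/ /mnt/flash/pkg/   # cache -> flash drive
rsync -av /mnt/flash/pkg/ /var/cache/pacman/pkg/   # flash drive -> the other machine's cache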


P.S.: Recently I decided to make a complete mirror of all the i686 packages from archlinux.org with rsync, not for myself but for friends who wanted to install Linux. I don't pay per megabyte at work, but it still took almost a week to download the 5.1 GB of packages.

IMHO, for most local-mirror needs rsync is overkill. How many users actually use more than 30% of the packages in the repos? So why make a full mirror with rsync when you can cache only the installed packages?


to live is to die

Offline
