
#1 2005-03-15 15:35:00

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

mirroring arch [SOLVED]

has anybody ever run any numbers to find the threshold number of users and average package usage to justify a local mirror?

we have about 7 desktop boxes using x and 6 servers. 

for myself, i have 238 packages installed, coming in at a svelte 428M for /var/cache/pacman/pkg after running pacman -Sc.  that's probably about the average for the desktops.  our servers average about 85 packages, with the cache running about 100M.

the problem is that that's simply a snapshot; the actual volatility of those packages and the average download size per time period are lost.

we're currently mirroring testing, current and extra.

at first we thought this was a great idea, until one of my co-workers mentioned that when they ran a debian mirror over at the local university, they found the benefit questionable.  of course, debian is a _very_ large creature, and 3 architectures on unstable may have accounted for that.

for our part, the only wasteful things i can think of that we're mirroring are kde and openoffice.

what might actually help in decision making would be a package/size change list over certain time periods, like hourly or daily.  anybody seen anything like this?

probably going to try ntop on this box, but thought i would solicit others' experiences first.

any thoughts?

thanks,
jp


#2 2005-03-15 20:31:19

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622

Re: mirroring arch [SOLVED]

Why not mirror? Is there some downside for you as far as storage goes?

If for no other reason, it reduces the strain on the arch server, and other mirrors, so it is a good thing.


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍


#3 2005-03-15 20:49:47

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

the storage doesn't bother us at all, but there does appear to be a threshold where mirroring actually uses up more bandwidth than not.  this is the situation that my debian 'friend' had :-).

for instance, say arch had about 50MB per day of updates across all packages, but only 3MB of that was generally of interest per system.  with 15 systems, that's 45MB per day pulled directly versus 50MB per day for the mirror.  at that point, the net value of mirroring is negative.

this is just an example of where that threshold might be, i don't know yet and don't have any numbers, so i was curious if anyone else had looked at this problem.

-edit, btw, it certainly reduces our in-house bandwidth, the question is when it reduces arch's bandwidth
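
The arithmetic in that example can be sketched as a small shell helper. This is only an illustration: the function names are made up here, and it assumes a full mirror pulls every update once per day while unmirrored systems each pull only what they use (rsync overhead and overlap between systems are ignored).

```shell
# Hypothetical helper: compare upstream traffic with and without a full mirror.
#   $1 = MB/day of updates across the whole repo (what a full mirror pulls)
#   $2 = MB/day of updates each system actually wants
#   $3 = number of systems
mirror_verdict() (
    repo_mb=$1; per_system_mb=$2; systems=$3
    direct_mb=$(( per_system_mb * systems ))
    if [ "$direct_mb" -gt "$repo_mb" ]; then
        echo "mirror saves $(( direct_mb - repo_mb ))MB/day upstream"
    else
        echo "mirror costs $(( repo_mb - direct_mb ))MB/day extra upstream"
    fi
)

# Smallest number of systems at which direct downloads exceed the mirror's traffic.
breakeven_systems() (
    repo_mb=$1; per_system_mb=$2
    echo $(( repo_mb / per_system_mb + 1 ))
)

mirror_verdict 50 3 15      # prints: mirror costs 5MB/day extra upstream
breakeven_systems 50 3      # prints: 17
```

With the numbers from the post, a full mirror only starts to reduce upstream traffic at 17 systems; real figures would have to come from measuring actual update sizes over time.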


#4 2005-03-15 22:01:19

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622

Re: mirroring arch [SOLVED]

hmm..i see.

well, it wouldn't be too hard to have rsync exclude the files you don't need (kde*, gnome*, etc.).
then you could have abs on a server, and have a script go through and remove the things that you do not have in your mirrored repo, then run gensync locally.

It would take a little while to write a script to automate it, but it shouldn't be too hard.
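
A minimal sketch of the exclude idea, under assumptions: `make_excludes` is a hypothetical helper, the prefixes are just the ones mentioned in this thread, and the rsync module and local path in the commented command are taken from the sync script posted later in the thread. The package database would still list the excluded packages unless it is regenerated.

```shell
# Hypothetical helper: turn package-name prefixes into rsync exclude
# patterns, one pattern per line.
make_excludes() {
    for prefix in "$@"; do
        printf '%s\n' "${prefix}*.pkg.tar.gz"
    done
}

make_excludes kde gnome openoffice

# Example use (not run here): rsync reads the patterns from stdin.
# make_excludes kde gnome openoffice |
#     rsync -avz --delete --delete-excluded --exclude-from=- \
#         archlinux.org::extra /var/archlinux.org/extra/
```

Prefix patterns are blunt: `kde*` also drops kde libraries that something like k3b may depend on, so the list needs some care.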




#5 2005-03-15 22:18:28

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

yeah, we'd talked about that, but with 8 or so different users, trying to restrict the packages is _very_ tricky.  i know one user in particular uses gnome, but has from time to time used k3b, which has some kde deps that we probably wouldn't have brought down otherwise.

it would be significantly easier to simply rsync the whole mirror, but we won't do it if in the end it is wasteful.

we could probably even do some rsync tricks to ignore certain files, if pacman knew enough to fail over to the next mirror when a listed package isn't available locally.

in this case, we could have rsync filter out uncommon things like openoffice or the kde stuff, and if someone did go for k3b, which would still be listed in the package db, pacman could see we didn't have it and try the next mirror.

actually, that might be a nice feature, i've had that situation once or twice when i've gone to a mirror and a file that was advertised was not there.


#6 2005-03-15 22:29:50

MNKyDeth
Member
From: MI
Registered: 2003-09-13
Posts: 89

Re: mirroring arch [SOLVED]

I understand your situation seems more office or possibly work related, but I have found it invaluable to rsync the Arch repos. I could never justify the use of the bandwidth for myself, but I host lan parties and do work for people (repair, IT, and other such stuff). I have converted many over to Arch, at least on a trial run setup.

The trick is, I use dvd-rw's: this way I can put the repos on dvd and hand them out to the people that I have converted or have set up for trial runs. This way, if a problem with a package crops up, I know about it ahead of time. I wait till a fix has been implemented, and then once a month I fetch the dvd-rw's, re-burn the repos on them, then redistribute the dvd's. Imo, this saves much bandwidth, since people are not constantly hitting the arch repo servers.

I also hold lan parties that are 100% Linux compat game wise. During these events I bring my repos machine and help people out setting up Arch if they so desire. This greatly speeds up installs and keeps the people away from the arch repo servers, and then I offer them my dvd distribution if they want it.

Like I said, I could never justify rsync'ing the repos for my own benefit, but for what I do, it greatly helps me and everyone else out I think.

For 7 desktops and 6 servers I could see it justified in an office business setting, because it can greatly help reduce your cost and speed up delivery overall. If you feel you are wasting bandwidth, there is the donate to Arch thingy here to help pay for their server costs. :)

I should prolly take my own advice on that last line. :)

I should also mention that I only do an rsync once every 2-4 weeks, as most things don't need to be updated asap unless it is a major security update or something major changed, like moving to udev or xorg. Haven't had any more of those lately, though.


#7 2005-03-15 22:55:27

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

unfortunately at least 2 of us  :oops: are obsessive compulsive and update at least daily, sometimes more.

i'm going to have to look into some form of lazy mirroring, where requested/downloaded packages are the only ones kept in the mirror.


#8 2005-03-15 23:07:24

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: mirroring arch [SOLVED]

I didn't read the whole thread in detail, but this is how I'd do it:

Have one main box hosting the packages.

Let all other comps mount the main box's package dir with NFS on their /var/cache/pacman/pkg/ directory.

That way if someone does a sync it will be fetched from the cache, or if it isn't there it will be added to the cache and all Arch boxes in the network can use it.

EDIT:
You'd probably want the pacman databases to be on the main server too, and let everyone use that.
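
As a rough sketch of the shared-cache setup, the helper below prints the /etc/fstab line a client would need. The host name `pkghost` and the function name are assumptions for illustration; the export on the main box would also have to be writable by root on the clients (for example with `no_root_squash`), since pacman writes to the cache as root.

```shell
PKG_CACHE=/var/cache/pacman/pkg

# Hypothetical helper: print an fstab entry that mounts the main box's
# package cache over the local one.
#   $1 = NFS server host, $2 = mount options (defaults to "defaults")
cache_fstab_line() {
    printf '%s:%s %s nfs %s 0 0\n' "$1" "$PKG_CACHE" "$PKG_CACHE" "${2:-defaults}"
}

cache_fstab_line pkghost
# One-off equivalent, run as root on a client (not run here):
#   mount -t nfs pkghost:/var/cache/pacman/pkg /var/cache/pacman/pkg
```

Passing `noauto` as the second argument would give an entry that is only mounted on demand, which may suit machines that are not always on the same network.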


#9 2005-03-15 23:07:31

Snowman
Developer/Forum Fellow
From: Montreal, Canada
Registered: 2004-08-20
Posts: 5,212

Re: mirroring arch [SOLVED]

I am doing something similar to what you want because I only have internet access to the univ. websites/servers. The referenced files are at http://www.astro.umontreal.ca/~belanger/repo/ if you want to check them.
Here are the steps:
1. On university server, I run db.sh to get the new database.
2. On local computer, I run pacman -Syup to get a list of URLs of the new packages.
3. I paste that list to a pkg file (on university server).
4. On university server, I run ul.sh to download the packages from official arch mirrors.
5. On local computer, I run pacman -Su to download packages from university server to local computer.

That way only the packages I need are downloaded from official arch mirrors.  I don't know how well that can be automated for several computers.  I suppose that each computer could submit its list of requested packages to the local mirror server, where they can be concatenated and duplicates removed.
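
The "concatenate and remove duplicates" step could look something like this. It is a sketch, not the db.sh/ul.sh scripts linked above: `merge_lists` is a hypothetical name, and it assumes each submitted list holds one package URL per line, possibly mixed with other pacman output.

```shell
# Hypothetical helper: merge several per-machine URL lists into one.
# Lines without a URL scheme (e.g. pacman status messages) are dropped,
# and each URL is printed once.
merge_lists() {
    cat "$@" | grep '://' | sort -u
}

# Example use (paths are placeholders):
#   merge_lists /srv/requests/*.list > /srv/requests/pkg
```

The merged file can then be fed to whatever download step the mirror server uses.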


#10 2005-03-15 23:31:25

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

Snowman:
that's a pretty neat way to do it.  but like you said, managing that for a couple of computers might be a bit much.  i'll have to look at that a bit closer.

i3839:
that would be ideal, but apart from the servers we all use laptops.  the servers can just nfs mount their caches from a single location and that would be perfect.  for the laptops we'd have to find a way to make that mount accessible outside our firewall, and then that would work.


#11 2005-03-16 00:26:56

neotuli
Lazy Developer
From: London, UK
Registered: 2004-07-06
Posts: 1,204

Re: mirroring arch [SOLVED]

Personally, I like i3839's idea. As for making it accessible outside the firewall, I don't know how much of a security hazard that is. I googled a bit, so I'll just toss some links here:
http://www.coda.cs.cmu.edu/ -same idea as NFS, advertises encryption too
http://www.math.ualberta.ca/imaging/snfs/ -I don't know how easy this is to get going, but it sounds like a very secure idea
if you put up a firewall on the machine, ideas out of here can work: http://nfs.sourceforge.net/nfs-howto/security.html (section 6.4)

I think it's quite feasible, and good luck :)

EDIT: actually, check out section 6.5 of the last link, it's got pretty good instructions on tunneling it through ssh, which I still think is the easiest and most secure solution.


The suggestion box only accepts patches.


#12 2005-03-16 12:13:11

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

okay, so the feature i was looking for in pacman already exists, which i swear it didn't.  but after testing a bit more, i see that i was wrong.

i thought you got this error if you set up your mirror files and a package was missing.  say i rename bind to xbind on our mirror and try to sync it with my pacman.conf directly listing just our mirror:

[root@aragorn ~]# pacman -S bind

Targets: bind-9.3.1-1

Total Package Size:   1.9 MB

Proceed with upgrade? [Y/n] y

:: Retrieving packages from current...

failed downloading /archlinux.org/current/os/i686/bind-9.3.1-1.pkg.tar.gz from beethoven.3yd.com: HTTP/1.1 404 Not Found

error: failed to retrieve some files from current

but when i go back to simply including our mirror file, and try again with bind still renamed, i get this:

[root@aragorn ~]# pacman -Syu
:: Synchronizing package databases...
 testing                  [##############################################] 100%      30K   759.6K/s  00:00:00
 current                  [##############################################] 100%      44K    1236K/s  00:00:00
 extra                    [##############################################] 100%     195K    1146K/s  00:00:00
[root@aragorn ~]# pacman -S bind

Targets: bind-9.3.1-1

Total Package Size:   1.9 MB

Proceed with upgrade? [Y/n] y

:: Retrieving packages from current...

failed downloading /archlinux.org/current/os/i686/bind-9.3.1-1.pkg.tar.gz from beethoven.3yd.com: HTTP/1.1 404 Not Found

 bind-9.3.1-1             [#####                                         ]  11%     214K    76.6K/s  00:00:24

that 404 just meant that that single mirror didn't have the file.  after testing just now, pacman does appear to try the next mirror in the list.

so it appears that my solution is simpler than i could have hoped.

i can run rsync filtering out packages (include filter generated from the multi-package lists from above, thanks snowman), we set up our mirror files properly, and if someone wants something that's not in our mirror, pacman goes to the next mirror to get it.
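
A sketch of what "setting up the mirror files properly" might mean: the local mirror listed first, with a fallback after it. The helper name is made up, the local URL is pieced together from the host and path in the 404 message above, and ftp.archlinux.org is only an example fallback; the exact mirror file layout depends on the pacman version.

```shell
# Hypothetical helper: print Server lines for one repo, in the order the
# base URLs are given (pacman tries them top to bottom).
#   $1 = repo name, remaining args = mirror base URLs
mirror_list() (
    repo=$1; shift
    for base in "$@"; do
        printf 'Server = %s/%s/os/i686\n' "$base" "$repo"
    done
)

mirror_list current http://beethoven.3yd.com/archlinux.org ftp://ftp.archlinux.org
# prints:
#   Server = http://beethoven.3yd.com/archlinux.org/current/os/i686
#   Server = ftp://ftp.archlinux.org/current/os/i686
```

With the filtered local mirror first, a package missing locally produces the 404 seen above and pacman then moves on to the next Server line.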

thanks again everyone for the comments and good ideas.  i'm still going to look into a few of the other ideas.  the encrypted nfs mount is of interest for other reasons at this point.  i may still set up a noauto nfs mount of the cache (i.e. the repo) for our laptops, which would let us quickly mount an existing cache when here, and just use our own when not.

jp


#13 2005-03-16 13:38:00

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: mirroring arch [SOLVED]

If I were you I'd want a package that isn't on the server yet to be downloaded onto it on first request, so others can use it too, thus downloading only the packages needed without having to keep a list of anything.

I'd use the XferCommand option in pacman.conf to call a script that does two things:
First it executes a script on the server with ssh.
Then it uses scp to download the package from the server.

The script on the server checks if the package is in the cache, and if that isn't the case it downloads the package.

Depending on the server's connection speed, the downloading can happen in parallel after some tweaking of the scripts.
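
A minimal sketch of such a client-side script, with several assumptions labeled: the host name, the script path in the comment, and the function name are hypothetical; the server-side "script" is inlined as a remote command that uses wget; and it assumes pacman's XferCommand substitutes %u with the URL and %o with the output file, which should be checked against the pacman.conf man page for the installed version.

```shell
# Hypothetical pacman.conf entry (assumes %u/%o substitution):
#   XferCommand = /usr/local/bin/fetch-via-server %u %o
SERVER=${SERVER:-pkghost}                  # assumed name of the main box
CACHE=${CACHE:-/var/cache/pacman/pkg}      # cache directory on the main box
SSH=${SSH:-ssh}
SCP=${SCP:-scp}

#   $1 = package URL, $2 = local output file
fetch_via_server() (
    url=$1; out=$2
    file=${url##*/}                        # file name is the last URL component
    # Step 1: on the server, download the package unless it is already cached.
    # A .part file avoids leaving a half-downloaded package in the cache.
    # (Assumes file names and URLs contain no single quotes.)
    "$SSH" "$SERVER" "cd '$CACHE' && { [ -f '$file' ] || { wget -q -O '$file.part' '$url' && mv '$file.part' '$file'; }; }" || exit 1
    # Step 2: copy the package from the server's cache to the local path.
    "$SCP" "$SERVER:$CACHE/$file" "$out"
)

# A real script would end with:  fetch_via_server "$1" "$2"
```

This needs key-based ssh access from root on each client to the server, and two clients asking for the same package at once could still race on the .part file.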


#14 2005-03-16 13:46:44

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

that's a good idea. one of our guys was going to do something like that but from the server side instead, with apache.  that would alleviate the need for anything more than using the mirror to get it updated.  for the moment, i've simply generated rsync include lists from the package lists of a couple of example users on a few systems.

but the hope is to have an apache plugin to accept incoming requests, and fire off a python script to retrieve missing files and download to the system and update the include script.

we'll keep posting any progress here.


#15 2005-03-16 15:58:37

jp_fielding
Member
Registered: 2004-08-28
Posts: 85

Re: mirroring arch [SOLVED]

basically our first step landed us this....

our mirror dir is /var/archlinux.org/

we have a cron script that wraps our sync script, our sync looks like this:

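#!/bin/bash

MIRROR=/var/archlinux.org

# sync one repo: $1 = rsync module/path on archlinux.org, $2 = local subdir
go() {
    # turn each package name in the include lists into a "name-*" pattern;
    # every other package file is excluded, the db files still sync
    grep -h . "$MIRROR"/includes/* | awk '{print $1 "-*"}' |
        rsync -avz --delete --delete-excluded --include-from - \
            --exclude='os/i686/*.pkg.tar.gz' "archlinux.org::$1" "$MIRROR/$2"
}

go "current" "current/"
go "extra" "extra/"
go "ftp/testing/" "testing/"
#go "ftp/unstable/" "unstable/"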

the files that exist in the subdir includes are basically generated by each interested user running:

pacman -Q | awk '{print $1}'

this creates our includes list of things to bring down and keep rsync'd.

this has brought down our server space from 2.8G to 850M.  not that drive space was an issue, but that's 2G of wasted download on the server.  and the problem of managing which files to bring down becomes a user problem.

we're hoping by the end of the week to have the update that lets our mirror proxy to the arch server to grab a missing file and add it to our repo and sync list.  then it's zero maintenance and minimal bandwidth.  if we simply relay to the arch served package db files, we wouldn't even need to rsync, simply let the first user that wants a package do the waiting.


#16 2005-03-16 20:55:07

cjdj
Member
From: Perth, Western Australia
Registered: 2004-05-07
Posts: 121

Re: mirroring arch [SOLVED]

I have 4 servers and a number of workstations.  I use squid and frox proxies, and tell pacman to use wget.  wget is set up to go through the proxies.  The proxies are configured to cache and keep large files (by default they don't).

That way, the first time the file is downloaded it is cached locally, and all other systems will get it locally.   Eventually it will get replaced by newer files.

I toyed with the idea of mirroring the repos, but since I needed a webproxy anyway, I figured it would work better this way.


#17 2005-03-16 22:32:37

MNKyDeth
Member
From: MI
Registered: 2003-09-13
Posts: 89

Re: mirroring arch [SOLVED]

Wow, great progress, I am gonna try and get something similar setup for myself now. Lots of great tips in here to really streamline the DL'ing. For what I do I'll still need more packages than what I'll use for the other people but this could really help me speed up some things.

Thanks to the poster as well for all the info and for letting us know what is involved. :)

