Creating a local mirror

Benedict_White · 2008-04-24 10:17:49

I have several machines running Arch Linux and install new ones on a regular basis.

What is the best way of keeping a local mirror and keeping it up to date?

TheBodziO · 2008-04-24 18:42:21

I'm not convinced that a local mirror is a good way of keeping your "farm" up to date, because you'll be mirroring all the packages in the Arch repositories. I may be wrong (tell me if I am), but I think you have some lab(s) to manage, with hosts that have similar, if not identical, sets of software on them. In that case I think a local cache + cron jobs would be much nicer. There's a nice wiki page that describes such a solution: http://wiki.archlinux.org/index.php/Net … cman_Cache. I'm currently using a shared pacman cache at home and it works perfectly.

I can think of other ways of keeping a set of computers updated, but I don't know how they would work in the real world. If you're interested, I'm willing to share those ideas.

Please don't take my pointing you to the wiki as a sign of RTFMism.

If you have any questions about how I've applied caching, don't hesitate to ask. I'll be glad to answer!

Pudge · 2008-04-25 02:31:13

I have seven computers running Arch. This is how I avoid downloading the same updates seven times, saving both my bandwidth and the mirror's bandwidth. I have one computer running Arch that I use as my server, with a host name of folding1. On that server's hard drive, I have a large partition mounted at /server.

drwxrwxr-x 15 root users 4096 Mar 26 19:38 server

Note the permissions and owners. This allows both root and users to access the various subdirectories under /server.

I have a directory on the server that I use as my new cache directory, /server/cache/pacman/x86_64_pkg. I use ssh, FUSE, and sshfs to use this server directory as the package cache no matter which remote computer I am updating. I accomplish this with the following scripts, located in /root/bin on all six remote computers.

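The first one is named update:

#!/bin/bash
# Mount the server's package cache on the (empty) local cache directory.
# Stop here if the mount fails; otherwise pacman would silently fall back
# to the local cache.
sshfs root@folding1:/server/cache/pacman/x86_64_pkg /var/cache/pacman/pkg || exit 1
pacman -Syu
# Unmount the server directory again.
fusermount -u /var/cache/pacman/pkg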

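The second one is named add:

#!/bin/bash
# Same mount/unmount wrapper, installing the packages given as arguments.
sshfs root@folding1:/server/cache/pacman/x86_64_pkg /var/cache/pacman/pkg || exit 1
# "$@" (quoted) passes each package name through as its own argument.
pacman -Sy "$@"
fusermount -u /var/cache/pacman/pkg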

You use update as root in place of pacman -Syu. You use add package1 package2 package3, etc., in place of pacman -Sy package1 package2 package3, etc.

This works as follows. First, the directory /var/cache/pacman/pkg on all six remote computers MUST be empty. When you run the update script, the line

  sshfs root@folding1:/server/cache/pacman/x86_64_pkg    /var/cache/pacman/pkg

connects to the server via ssh, FUSE, and sshfs and mounts the server directory /server/cache/pacman/x86_64_pkg on the remote computer's mount point, /var/cache/pacman/pkg. Because /var/cache/pacman/pkg is being used as a mount point, that directory MUST be empty. From then on, any time the remote computer accesses its /var/cache/pacman/pkg directory, it is actually accessing the server directory.
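Because the mount point must be empty, a script could check that itself before calling sshfs instead of relying on sshfs's error message. The following is a minimal sketch; the function name cache_is_empty is hypothetical and not part of Pudge's scripts, and it assumes GNU find (for -quit), which is what Arch ships.

```bash
#!/bin/bash
# Hypothetical helper (not part of Pudge's scripts): succeeds only if the
# directory exists and has no entries at all, dotfiles included.
cache_is_empty() {
    local dir=$1
    [ -d "$dir" ] || return 1
    # find prints the first entry below $dir and stops (-quit);
    # empty output therefore means the directory is empty.
    [ -z "$(find "$dir" -mindepth 1 -print -quit)" ]
}
```

In update or add, a line such as cache_is_empty /var/cache/pacman/pkg || exit 1 placed before the sshfs call would stop the script before anything is mounted.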

Next, the pacman -Syu command runs and pacman acts as it always does, except that when it downloads updated packages, it is actually downloading them to the /server/cache/pacman/x86_64_pkg directory on the server. It also installs the packages from that server directory.

Lastly, the line

fusermount -u /var/cache/pacman/pkg

unmounts the server directory from /var/cache/pacman/pkg and the association between the two is ended.

Now when you update the second remote computer using the update script, the same thing happens, except that before pacman downloads a package, it checks whether that package-version already exists in /var/cache/pacman/pkg, and if it does, it doesn't download it again. When you updated the first remote computer, the process downloaded all the new packages to the new cache directory on the server. So when you update the second, third, and subsequent remote computers, those packages already exist and don't need to be downloaded again. If the second computer needs an application or package updated that is not on the first computer, it will download those packages. But any packages the two have in common will already be in the server's cache and do not need to be downloaded.
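The "already cached, so skip the download" idea can be illustrated with a small shell function. This is only a sketch of the concept: pacman does this check internally, and its real logic (including integrity checks) is not reproduced here. missing_from_cache is a hypothetical name, and any package file names passed to it are made-up examples.

```bash
#!/bin/bash
# Illustration only, not pacman's code: print each package file name
# that is NOT already present in the cache directory, i.e. the files
# that would still have to be downloaded.
missing_from_cache() {
    local cache=$1
    shift
    local pkg
    for pkg in "$@"; do
        [ -e "$cache/$pkg" ] || printf '%s\n' "$pkg"
    done
}
```

On the second machine, anything the first machine already fetched into the shared directory drops out of this list; only packages unique to the second machine remain.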

To make the server itself use this new package cache, change the following line in the server's /etc/pacman.conf file:

CacheDir    = /server/cache/pacman/x86_64_pkg
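For context, CacheDir is set in the [options] section of pacman.conf. Here is a sketch of the relevant part of the server's file, with every other option omitted:

```ini
[options]
# Use the shared directory as this machine's package cache
CacheDir    = /server/cache/pacman/x86_64_pkg
```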

The update and add scripts are not used on the server; pacman is used there as it normally is.

When you run "pacman -Syu" or "pacman -Sy packagename", pacman downloads the package database from the mirror and compares it to the packages installed on your computer. You specify which mirror to sync from and download from in the /etc/pacman.d/mirrorlist file. This is still handled locally on each computer, so you could have one computer download updates from the locke.ssu.edu mirror and another from the ftp-linux.cc.gatech.edu mirror if you wanted to. I don't know why you would want to do this, but I'm just explaining how it works. Regardless of where a package-version was downloaded from, if it already exists in the server's package cache from updating a previous computer, it doesn't need to be downloaded again, saving both time and bandwidth.

Here is the only caveat. Pacman can still be used as normal on the client computers, but it will start to populate the local /var/cache/pacman/pkg directory with package files. These packages must be deleted before using the server's package cache again. If, when using the update or add scripts, you see an sshfs error warning that the mount point is not empty, hit CTRL-C and empty the local package cache. Or you could add a line to the scripts that automatically removes all files in the local package cache before the sshfs command is called.
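Clearing the local cache automatically before mounting could look like the sketch below. clear_local_cache is a hypothetical helper; it refuses to run if the directory is already a mount point, because deleting files there would delete them from the server's cache. It assumes GNU find (for -delete) and uses mountpoint from util-linux when that command is available.

```bash
#!/bin/bash
# Hypothetical helper: remove everything inside a directory (but not the
# directory itself) so it can serve as an empty sshfs mount point.
clear_local_cache() {
    local dir=$1
    # Refuse an empty or non-directory argument.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # If the server's cache is still mounted here, deleting would remove
    # the server's packages, so refuse instead.
    if command -v mountpoint >/dev/null && mountpoint -q "$dir"; then
        echo "$dir is a mount point; not clearing it" >&2
        return 1
    fi
    # -delete works depth-first, so subdirectories are emptied before removal.
    find "$dir" -mindepth 1 -delete
}
```

It would be called as clear_local_cache /var/cache/pacman/pkg || exit 1 just before the sshfs line. Note that this throws away any packages that exist only in the local cache.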

When using sshfs, you can append root's /root/.ssh/id_rsa.pub key file to the server's /root/.ssh/authorized_keys file, and no password will be needed. Otherwise the script will prompt you for the server's root password. For a little added security, I have it prompt for the password.

I have been using this system for quite some time, and it works great. I think this way is a little simpler than the wiki's method, but not by much. I also think it is a little safer, as you do not have to edit your fstab file and the sshfs connection is only active while the script runs. From my experience, I also recommend not using a shared package database.

If you need help getting ssh and sshfs running on your computer, see THIS TUTORIAL.

I hope everyone can understand this; I'm not a very good writer.

Pudge

Last edited by Pudge (2008-04-25 02:58:25)

TheBodziO · 2008-04-25 08:19:39

Your method is very similar to the one that's wikified.

AFAICU (correct me if I'm wrong), you're mounting your package cache on the machine being updated each time you update or add a package, and unmounting it immediately afterwards. It also appears that you're downloading the whole repository tree every time you use "add" or "update". That's understandably needed in your strategy, but it also multiplies the traffic to your chosen Arch mirror (not to mention the local traffic, though that would be modest) unless you're using some kind of proxy. I think a more sensible way is to update the repo tree only on your server and mount it on "/var/lib/pacman/sync" on the remaining machines (you could mount it read-only, "ro"). That way you'll be able to speed up both "add" and "update" considerably (they won't need pacman's "y" switch). You could also mount your caches at boot, removing the need for separate scripts to add packages and update the system. You can also put the update procedure in cron, which will (mostly) free you from performing updates manually.
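The mount configuration for that suggestion isn't spelled out in the post. One way to do it is with fstab entries like the following. This is an untested sketch: it reuses the host name and paths from Pudge's post, assumes key-based (passwordless) root login for sshfs, and uses the fuse.sshfs fstab type with the _netdev option. The server would also need to refresh its own sync databases (pacman -Sy, by hand or from cron) before the clients run pacman -Su.

```
# /etc/fstab on each client (sketch; host name and paths assumed from this thread)
# Server's sync databases, read-only, so clients can run "pacman -Su" without -y:
root@folding1:/var/lib/pacman/sync             /var/lib/pacman/sync   fuse.sshfs  ro,_netdev,IdentityFile=/root/.ssh/id_rsa  0  0
# Shared package cache, writable so clients can add newly downloaded packages:
root@folding1:/server/cache/pacman/x86_64_pkg  /var/cache/pacman/pkg  fuse.sshfs  _netdev,IdentityFile=/root/.ssh/id_rsa     0  0
```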

I think this is a better way of dealing with updates on multiple machines. That said, I don't want to sound authoritative; I know you must have a reason for doing things this way, and I'll be happy to hear it.

Pudge · 2008-04-25 15:24:41

TheBodziO wrote:

Your method is very similar to the one that's wikified.

It is just a tweaked version of the method that's wikified.

TheBodziO wrote:

AFAICU (correct me if I'm wrong), you're mounting your package cache on the machine being updated each time you update or add a package, and unmounting it immediately afterwards.

Correct. To understand my mindset when programming, perhaps a little history is in order. I am retired from AT&T after 34 years of service. My first computer experience was in the mid-70s, using UNIX of course, since AT&T invented UNIX. Back in those days computer resources were limited: RAM, storage space, processing time, etc. So I was taught that when you are done using something (a reserved chunk of RAM, a process, temporary files, a port, etc.), you turn it off and/or clean it up. I typically update my systems once a week and seldom add packages. So, according to how I was taught, why have the server's package cache mounted on (in my case) six different client computers 24/7 when it is only used a few minutes a week? Leaving multiple mounts of the package cache up 24/7 ties up computer resources for no reason. In my opinion, it is also safer not to have the package cache mounted 24/7. Having these six connections up is a possible security risk, just six more ways a hacker could compromise my network. I was taught to close and lock doors after I am done with them; why invite someone in?

TheBodziO wrote:

It also appears that you're downloading the whole repository tree every time you use "add" or "update". That's understandably needed in your strategy, but it also multiplies the traffic to your chosen Arch mirror (not to mention the local traffic, though that would be modest) unless you're using some kind of proxy. I think a more sensible way is to update the repo tree only on your server and mount it on "/var/lib/pacman/sync" on the remaining machines (you could mount it read-only, "ro"). That way you'll be able to speed up both "add" and "update" considerably (they won't need pacman's "y" switch).

Originally, that was also my way of thinking, and how I did it. But I ran into some problems; having each computer download its own package database eliminated them, and I never went further with it. Downloading the package database isn't a terribly large load on the mirrors, so I didn't worry about it. If I remember correctly, PART of the problem was that with that setup, you need to update the server first to sync the package database before any other computer is updated or added to. So in cases like the recent udev update that was possibly risky, I wanted to update my "testing" computer before I updated my server. This would require me to ssh into the server and do a "pacman -Sy" to sync the package database without updating anything, then update my client computer. I would not always remember to do this. I also update my computer, my wife's computer, and my testing computer much more often than I update the server. So again I would have to remember to sync the server's package database first, and at my age having to remember to do something first is a bad thing.

TheBodziO wrote:

You could also mount your caches at boot, removing the need for separate scripts to add packages and update the system. You can also put the update procedure in cron, which will (mostly) free you from performing updates manually.

The above is simply a suggested way of doing the task that happens to work in my situation. Anyone is free to add or delete from it to make it fit their situation, or ignore it completely. As for doing updates from cron: with Arch's rolling release system and bleeding-edge packages, I prefer to keep an eye on the main page announcements and on the forums, then decide when to update. Again, anyone is free to decide for themselves when to update.

Pudge

TheBodziO · 2008-04-25 17:54:51

Thanks for the elaborate explanation! I knew there was some reasoning behind it.

Pudge wrote:

TheBodziO wrote:
AFAICU (correct me if I'm wrong), you're mounting your package cache on the machine being updated each time you update or add a package, and unmounting it immediately afterwards.
Correct. To understand my mindset when programming, perhaps a little history is in order. I am retired from AT&T after 34 years of service. My first computer experience was in the mid-70s, using UNIX of course, since AT&T invented UNIX. Back in those days computer resources were limited: RAM, storage space, processing time, etc. So I was taught that when you are done using something (a reserved chunk of RAM, a process, temporary files, a port, etc.), you turn it off and/or clean it up. I typically update my systems once a week and seldom add packages. So, according to how I was taught, why have the server's package cache mounted on (in my case) six different client computers 24/7 when it is only used a few minutes a week? Leaving multiple mounts of the package cache up 24/7 ties up computer resources for no reason. In my opinion, it is also safer not to have the package cache mounted 24/7. Having these six connections up is a possible security risk, just six more ways a hacker could compromise my network. I was taught to close and lock doors after I am done with them; why invite someone in?

That's an interesting view. I've never thought of it like this; I guess I've always had "too many" resources. I'm glad you've reminded me that memory is a finite quantity. The security aspect also seems reasonable to me. I must take it into account the next time I consider mounting remote filesystems.

Pudge wrote:

TheBodziO wrote:
It also appears that you're downloading the whole repository tree every time you use "add" or "update". That's understandably needed in your strategy, but it also multiplies the traffic to your chosen Arch mirror (not to mention the local traffic, though that would be modest) unless you're using some kind of proxy. I think a more sensible way is to update the repo tree only on your server and mount it on "/var/lib/pacman/sync" on the remaining machines (you could mount it read-only, "ro"). That way you'll be able to speed up both "add" and "update" considerably (they won't need pacman's "y" switch).
Originally, that was also my way of thinking, and how I did it. But I ran into some problems; having each computer download its own package database eliminated them, and I never went further with it. Downloading the package database isn't a terribly large load on the mirrors, so I didn't worry about it. If I remember correctly, PART of the problem was that with that setup, you need to update the server first to sync the package database before any other computer is updated or added to.

Indeed. The server would have to be the first to have a repository tree updated.

Pudge wrote:

So in cases like the recent udev update that was possibly risky, I wanted to update my "testing" computer before I updated my server. This would require me to ssh into the server and do a "pacman -Sy" to sync the package database without updating anything, then update my client computer. I would not always remember to do this. I also update my computer, my wife's computer, and my testing computer much more often than I update the server. So again I would have to remember to sync the server's package database first, and at my age having to remember to do something first is a bad thing.

I believe that leaving something "to be remembered" isn't good practice in system administration, regardless of age. I must admit that the amount of traffic generated by updates is not a big deal in your setup.

I think an FTP proxy would help cut traffic considerably and also speed things up, granted, at the cost of some server resources. Effectively, it would download the most recent files to your server, and that download could be triggered by any of your machines. At the same time, each system could still be upgraded separately.

Pudge wrote:

TheBodziO wrote:
You could also mount your caches at boot, removing the need for separate scripts to add packages and update the system. You can also put the update procedure in cron, which will (mostly) free you from performing updates manually.
The above is simply a suggested way of doing the task that happens to work in my situation. Anyone is free to add or delete from it to make it fit their situation, or ignore it completely.

A mere suggestion, that's all.

Pudge wrote:

As for doing updates from cron: with Arch's rolling release system and bleeding-edge packages, I prefer to keep an eye on the main page announcements and on the forums, then decide when to update. Again, anyone is free to decide for themselves when to update.
Pudge

It makes me think even more about using an FTP proxy as part of the solution.

By reading your answer I realised that the problem of keeping multiple arch hosts up to date needs more careful consideration and attention. Perhaps there is a place for a project that would develop some policies and tools to perform this task both efficiently and securely?

It's been a pleasure to read your post! Thanks again for sharing your vast knowledge and experience with me and the others!

tomk · 2008-04-25 18:04:51

I export my server's /var/cache/pacman/pkg via NFS, and mount it on /var/cache/pacman/pkg on the other machines. I don't know if that's what the wiki says because I haven't read that page.
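A setup like that might look roughly as follows. The subnet and export options are assumptions: rw lets clients add the packages they download, and no_root_squash is there because pacman on the clients writes to the cache as root, which also means root on any client can modify the server's cache (a trade-off against the security concern Pudge raised). After editing /etc/exports, exportfs -ra re-exports the shares.

```
# Server: /etc/exports (the subnet is an assumption; adjust to your LAN)
/var/cache/pacman/pkg  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)

# Each client: /etc/fstab ("folding1" is borrowed from Pudge's post)
folding1:/var/cache/pacman/pkg  /var/cache/pacman/pkg  nfs  defaults,_netdev  0  0
```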

TheBodziO · 2008-04-25 19:04:15

tomk wrote:

I export my server's /var/cache/pacman/pkg via NFS, and mount it on /var/cache/pacman/pkg on the other machines. I don't know if that's what the wiki says because I haven't read that page.

Yup! That's it!

Arch Linux
