
#1 2020-05-09 22:28:30

zan
Member
Registered: 2020-05-08
Posts: 14

Minimal pacman mirrorlist maintenance

I apologize in advance if such simple automations are not meant to be posted here and I will delete the post if requested.

I've been using this simple approach to update /etc/pacman.d/mirrorlist with only https mirrors from specific countries. You can easily switch to rsync or http, or skip the protocol check altogether, and add or remove country codes as you see fit.

sudo pacman -S curl jq pacman-contrib

curl fetches the mirror status JSON, jq parses it and selects the mirrors we're interested in, and pacman-contrib provides the rankmirrors utility.

#!/bin/sh
# /usr/local/bin/pacman-mirrorlist
#
# The pacman hook runs this script as root. Keep it in the root-owned
# /usr/local/bin directory so that no unprivileged process can alter it.

# Exit immediately if a command exits with a non-zero exit status.
set -e

# Fail-check: make sure you have root permissions.
if [ ! -w /etc/pacman.d/mirrorlist ]; then
   printf '%s\n' ':: Error: missing required root permissions.'
   exit 1
fi

# Mirrorlist status from the last 24 hours.
URL='https://archlinux.org/mirrors/status/json/'

# Return only secure mirrors from selected countries.
FILTER='.urls | [.[] | select(.protocol == "https")]
              | [.[] | select(.completion_pct == 1.0)]
              | [.[] | select(.country_code == "CH", .country_code == "AT",
                              .country_code == "DK", .country_code == "FI",
                              .country_code == "IS", .country_code == "LU",
                              .country_code == "NL", .country_code == "NO",
                              .country_code == "SI")]
              | .[] | "## \(.country)\nServer = \(.url)$repo/os/$arch"'

# Fetch and filter the mirrors, then rank them by connection and opening speed.
curl -qs "$URL" | jq -r "$FILTER" | rankmirrors -v - > /tmp/mirrorlist

# Fail-check: make sure the new mirrorlist is not empty.
if [ -s /tmp/mirrorlist ]; then
   # Backup previous mirrorlist and move over the new one.
   mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.previous
   mv /tmp/mirrorlist /etc/pacman.d/mirrorlist

   # Remove mirrorlist.pacnew created during pacman-mirrorlist upgrade.
   [ -f /etc/pacman.d/mirrorlist.pacnew ] && rm /etc/pacman.d/mirrorlist.pacnew
else
   printf '%s\n' "      - built an empty mirrorlist, check the script's FILTER"
fi

# Exit with successful status to satisfy the pacman hook.
exit 0

Store the script in /usr/local/bin/pacman-mirrorlist.
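
A possible variation, not part of the script above: the protocol and the country codes can be passed to jq as arguments instead of being written into FILTER, which makes switching protocols or countries a one-line change. This is only a minimal sketch; the function name select_mirrors and the space-separated code list are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: select_mirrors is a hypothetical helper, not part of the
# original script. It reads the mirror status JSON on stdin and prints
# mirrorlist entries.
#   $1 = protocol (https, http or rsync)
#   $2 = space-separated country codes, e.g. "CH AT"
select_mirrors() {
    jq -r --arg proto "$1" --arg codes "$2" '
        ($codes | split(" ")) as $wanted
        | .urls[]
        | select(.protocol == $proto)
        | select(.completion_pct == 1.0)
        | select(.country_code as $cc | $wanted | index($cc))
        | "## \(.country)\nServer = \(.url)$repo/os/$arch"'
}

# Possible use inside the script:
#   curl -qs "$URL" | select_mirrors https "CH AT DK" | rankmirrors -v - > /tmp/mirrorlist
```

jq only interpolates \( ... ) inside strings, so $repo and $arch reach the mirrorlist literally, as pacman expects.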

To automate the updating I went with a simple pacman hook that runs the script on pacman-mirrorlist package upgrades.

[Trigger]
Operation = Upgrade
Type = Package
Target = pacman-mirrorlist

[Action]
Description = Updating pacman mirrorlist and removing mirrorlist.pacnew...
When = PostTransaction
Exec = /bin/sh -c "/usr/local/bin/pacman-mirrorlist"

Store the hook in /etc/pacman.d/hooks/mirrorlist.hook and that's it. The next time the pacman-mirrorlist package is upgraded, your mirrorlist will be refreshed.

Last edited by zan (2021-01-10 22:21:59)

Offline

#2 2020-05-10 04:18:48

Awebb
Member
Registered: 2010-05-06
Posts: 6,275

Re: Minimal pacman mirrorlist maintenance

1. https://www.archlinux.org/packages/comm … reflector/

2. Exec = /bin/sh -c "/home/$(logname)/.local/bin/pacman-mirrorlist"

Put that in /usr/local/bin instead. Imagine some wild process appeared and changed the content of that file to, say, dd'ing your disk with zeroes.

Online

#3 2020-05-10 04:47:54

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Minimal pacman mirrorlist maintenance

Using logname there can also do some hilarious things under various circumstances where it isn't sure who you are and simply returns "logname: no login name" on stderr. But yeah, the larger issue is permitting totally untrusted code to be run under situations where the pacman binary itself is trusted.


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)

Offline

#4 2020-05-10 11:49:46

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

Thank you for the feedback!

I tried reflector out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear, easy-to-read instructions. It helps me understand faster what it does half a year later. Do you see any glaring holes in this kind of thinking?

Awebb wrote:

2. Exec = /bin/sh -c "/home/$(logname)/.local/bin/pacman-mirrorlist"

Put that in /usr/local/bin instead. Imagine some wild process appeared and changed the content of that file to, say, dd'ing your disk with zeroes.

eschwartz wrote:

Using logname there can also do some hilarious things under various circumstances where it isn't sure who you are and simply returns "logname: no login name" on stderr. But yeah, the larger issue is permitting totally untrusted code to be run under situations where the pacman binary itself is trusted.

If I understood you correctly, it's not a problem to have scripts that don't require root permissions in ~/.local/bin and added to the PATH. However, if they require root permissions, it's a huge security hole to have them easily changed by your user prior to being run by root?

Offline

#5 2020-05-10 12:11:02

Awebb
Member
Registered: 2010-05-06
Posts: 6,275

Re: Minimal pacman mirrorlist maintenance

The systemd unit runs as root and changes a file in /etc, by executing code from the user home. That's the problem.
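
To make the point concrete, here is a rough sketch of a check that a script is owned by the expected user and cannot be modified by group or others. The helper name trusted_by is made up for illustration, and the check does not look at the parent directories, which matter just as much.

```shell
#!/bin/sh
# Sketch: succeed only if FILE is owned by OWNER and is not writable by
# group or others. trusted_by is a hypothetical helper name.
#   $1 = owner (user name or numeric uid)
#   $2 = file to inspect
trusted_by() {
    owner=$1
    file=$2
    # find prints the path only when every test matches; -prune keeps it
    # from descending if a directory is passed by mistake.
    [ -n "$(find "$file" -prune -user "$owner" ! -perm -020 ! -perm -002 2>/dev/null)" ]
}

# Example: refuse to go on if the script could be edited by a non-root user.
#   trusted_by root /usr/local/bin/pacman-mirrorlist || exit 1
```

A file under /home/$(logname)/.local/bin would fail this check, because it is owned by the regular user.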

Online

#6 2020-05-10 13:48:09

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Minimal pacman mirrorlist maintenance

zan wrote:

I tried [reflector] out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear and easy to read instructions. Helps me understand faster what it does half a year later. Do you see any glaring holes with this kind of thinking?

I agree - notwithstanding that "bloat" has more of a negative connotation than I'd personally put on reflector, it's a great tool I used for quite some time but eventually decided I wanted a much simpler approach which I posted here.

But you are conflating simplifying the code with some design choices that are being questioned here.  It is a false dichotomy to suggest that you either need a thousand lines of code providing all sorts of features or a few lines of code that are error-prone.  In fact, it's often quite the opposite.  No one here critiqued the fact that you wrote concise and targeted code - they just noted the places where it could go wrong.  Write concise simple code that doesn't go wrong.

Edit: for clarity, my short code has one definite failing: there is no specified timeout, so if connecting to a mirror stalls, the whole script stalls.  But this is a loss of function failure, not a security-impacting failure.  It's always good to consider what might go wrong if your code fails and assess how much of a concern that should be.  That's what some of the above posts are highlighting: when everything goes right with your code, it should work great; but if/when things go wrong, the results could be very very bad.
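
The stall described above can be bounded in any script of this kind. The following is a sketch only, assuming curl and the GNU coreutils timeout command are available; the helper names and the limits (10, 30 and 120 seconds) are arbitrary examples, not values taken from any tool in this thread.

```shell
#!/bin/sh
# Sketch: bound the network steps so a stalled mirror cannot hang the script.
# fetch_status and with_deadline are hypothetical helper names.

URL='https://archlinux.org/mirrors/status/json/'

# Fetch the status JSON, giving up if connecting takes longer than 10 s
# or the whole transfer takes longer than 30 s.
fetch_status() {
    curl -qs --connect-timeout 10 --max-time 30 "$1"
}

# Run any command under a hard wall-clock limit.
# GNU timeout exits with status 124 when the limit is hit.
with_deadline() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Possible use:
#   fetch_status "$URL" | jq -r "$FILTER" | with_deadline 120 rankmirrors -v - > /tmp/mirrorlist
```

A timeout turns a hang into an ordinary failure that the caller can detect and handle.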

Last edited by Trilby (2020-05-10 13:51:21)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#7 2020-05-10 14:17:21

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

Awebb wrote:

The systemd unit runs as root and changes a file in /etc, by executing code from the user home. That's the problem.

Thank you for explaining further. I understand exactly what you and eschwartz meant now & have updated the script.

Edit: I've removed part of the previous response.

Last edited by zan (2020-06-13 12:25:07)

Offline

#8 2020-05-10 14:37:40

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Minimal pacman mirrorlist maintenance

Trilby, I interpreted Awebb's mentioning of reflector very differently. I blindly assumed it was mentioned for the same reason I'd mention it: not because one should prefer to use it, but because one should be aware that it exists before writing your own.

Sure, not everyone wants or needs to use reflector, and there are a few different reasons that might be. Another might be "it's the only thing on my server which would depend on a python interpreter, so I would rather use something that isn't written in python".
Alternatively, one might want the fun experience of writing their own version.

And those are all perfectly fine. But on the off chance that reflector *would* be a satisfactory solution, I would still want to mention "hey, there's already a tool for that, so unless you have a particular reason to prefer writing your own, you might want to take the easy route of reusing the existing one."



Offline

#9 2020-05-10 15:22:40

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Minimal pacman mirrorlist maintenance

This thread did inspire me to try rewriting my tool in a shell script using jq and it works well, but the python version linked above is consistently faster.

EDIT: That said, I was able to "multithread" the speed checks in the shell version easily resulting in a huge savings in running time.  But then I checked and found that this resulted in horribly inaccurate speed results due to the network bottleneck at my end (likely resulting in the first urls tested getting the lion's share of the bandwidth and getting much "faster" scores).  So with single threaded versions, python is notably quicker.

Last edited by Trilby (2020-05-10 15:32:10)



Offline

#10 2020-05-10 17:57:20

Awebb
Member
Registered: 2010-05-06
Posts: 6,275

Re: Minimal pacman mirrorlist maintenance

I suggested reflector, because it's a well known tool and basically the default recommendation, in case OP didn't know.

Online

#11 2020-06-13 11:39:27

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

eschwartz wrote:

...

Hitting the nail on the head. This is just one less python dependency on my system.

Trilby wrote:

...

I'm happy to see you gave it a chance. :-)

Awebb wrote:

...

That's how I took it as well. I appreciate your terse responses because you conveyed a lot of information with those two posts.

Last edited by zan (2020-06-13 13:15:22)

Offline

#12 2020-06-13 12:15:35

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Minimal pacman mirrorlist maintenance

EDIT: removed a no longer relevant response.  Thanks for your newest edits.  The previous ones had me baffled.  I am not known for being the gentlest soul around these parts - but I was agreeing with you, applauding your goal, and being inspired by it.  I was pretty confused about why that would bother you.

Last edited by Trilby (2020-06-13 12:34:29)



Offline

#13 2020-06-13 12:22:52

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

Yeah, you're right. Having a shit of a moment and I completely misread the whole thing in a wrong light. Sorry, I hope you can understand and it has nothing to do with you. I've removed the stupidity.

Last edited by zan (2020-06-24 16:40:46)

Offline

#14 2020-06-13 17:04:52

adventurer
Member
Registered: 2014-05-04
Posts: 119

Re: Minimal pacman mirrorlist maintenance

Nice approach! I wasn't aware of jq.

Wouldn't it make sense to add more filters like:

| [.[] | select(.completion_pct == 1.0)]
| [.[] | select(.score < 1.0)]

This would, e.g., make sure that only fully synced mirrors are used. Besides, it speeds up the ranking process.

Offline

#15 2020-06-13 19:06:17

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

https://www.archlinux.org/mirrors/status/

| [.[] | select(.completion_pct == 1.0)]

Completion %: The number of mirror checks that have successfully connected and disconnected from the given URL. If this is below 100%, the mirror may be unreliable.

Yup, this one is a no-brainer and I've updated the post with it. Maybe someone with more mirror insight can comment if the mirrors at ~98% can be considered reliable with an asterisk.

| [.[] | select(.score < 1.0)]

Mirror Score: A very rough calculation for ranking mirrors. It is currently calculated as (hours delay + average duration + standard deviation) / completion percentage. Lower is better.

μ Delay: The calculated average mirroring delay; e.g. the mean value of last check − last sync for each check of this mirror URL. Due to the timing of mirror checks, any value under one hour should be viewed as ideal.

μ Duration: The average (mean) time it took to connect and retrieve the lastsync file from the given URL. Note that this connection time is from the location of the Arch server; your geography may produce different results.

σ Duration: The standard deviation of the connect and retrieval time. A high standard deviation can indicate an unstable or overloaded mirror.

I've highlighted what prompted my next point. Only a small selection of countries is represented among the ~40 mirrors scoring < 1.0; the majority fall into the < 5.0 range. I think filtering on that threshold is a premature optimization that will hurt users whose average connection time differs because they are geographically closer to the mirror. For example, the mirrors for the countries I select fall into the 2.0-3.0 range.

If the threshold is based on the location of the Arch server and not the user's location, then the script can return 0 mirrors even when n usable ones are available, simply because they were deemed "slow" under the above circumstances.

These are my thoughts after thinking / reading about it for ~30 minutes. If I'm reaching wrong conclusions please correct me.

Last edited by zan (2020-06-13 20:07:40)

Offline

#16 2020-06-14 12:42:04

adventurer
Member
Registered: 2014-05-04
Posts: 119

Re: Minimal pacman mirrorlist maintenance

I understand your objections. Nevertheless, I think that taking the score into account still makes sense, as it is an indication that mirrors are well maintained and reachable, IMO. I agree that selecting for a score range is problematic and that sorting by it (regardless of its absolute value) would certainly be better.

I searched a bit for a possible solution and found this. I tried implementing this in various forms but to no avail. But as mentioned I'm not familiar with json parsing and jq. Perhaps you know how to handle this.
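
For reference, one way the sorting idea could look in jq. This is a sketch under assumptions: best_by_score is a hypothetical helper, the JSON layout is the one used by the filter in the first post, and mirrors without a score are assumed to carry null and are skipped.

```shell
#!/bin/sh
# Sketch: order mirrors by their status-page score (lower is better) and
# keep the first N, instead of cutting at an absolute score value.
# best_by_score is a hypothetical helper; it reads the status JSON on stdin.
#   $1 = number of mirrors to keep
best_by_score() {
    jq -r --argjson n "$1" '
        [.urls[] | select(.protocol == "https" and .score != null)]
        | sort_by(.score)
        | .[:$n][]
        | "Server = \(.url)$repo/os/$arch"'
}

# Possible use:
#   curl -qs "$URL" | best_by_score 10 | rankmirrors -v -
```

Because the cut is relative, it never yields an empty list as long as at least one mirror has a score, whatever the absolute values are.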

Offline

#17 2020-06-15 12:57:55

adventurer
Member
Registered: 2014-05-04
Posts: 119

Re: Minimal pacman mirrorlist maintenance

@zan: It just came to my mind that a modification would be useful for the case that something goes wrong when creating a new mirrorlist:

# Backup previous mirrorlist and move over the new one if /tmp/mirrorlist is not empty
if [ -s /tmp/mirrorlist ]
then 
     mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.previous
     mv /tmp/mirrorlist /etc/pacman.d/mirrorlist
else
     echo "/tmp/mirrorlist is empty - keep previous mirrorlist"
fi

Offline

#18 2020-06-16 11:12:26

zan
Member
Registered: 2020-05-08
Posts: 14

Re: Minimal pacman mirrorlist maintenance

If there's no additional filtering based on the score (removal of mirrors), then there's no need to sort the list, since it gets piped directly to rankmirrors, which sorts it by your connection speed instead. I think this sort takes the same amount of time regardless of whether the provided list is sorted by x, since it has to contact every mirror to calculate its speed.

The script sets -e which makes it exit before any changes are made to the file system if any command doesn't exit successfully. That should leave only the empty list issue you've addressed. I've added the -s check and a notice to check the script's FILTER if it ever runs into this case.
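
One caveat worth spelling out: in a plain sh pipeline only the last command's exit status counts, so with set -e a failing curl or jq in "curl | jq | rankmirrors" does not by itself stop the script. A rough sketch of running the stages one at a time instead; write_if_ok is a made-up helper name, and curl's -f flag (fail on HTTP errors) is an addition that the original command does not use.

```shell
#!/bin/sh
# Sketch: run a command, capture its stdout in a temporary file and move
# that file into place only if the command succeeded.
# write_if_ok is a hypothetical helper name.
#   $1 = output file, remaining arguments = command to run
write_if_ok() {
    out=$1
    shift
    tmp=$(mktemp) || return 1
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Possible use with set -e, one stage per line so every failure is seen:
#   work=$(mktemp -d)
#   write_if_ok "$work/status.json" curl -qsf "$URL"
#   write_if_ok "$work/unranked" jq -r "$FILTER" "$work/status.json"
#   write_if_ok "$work/mirrorlist" rankmirrors -v "$work/unranked"
```

Using mktemp also avoids writing to a fixed, predictable path in /tmp as root.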

Thanks for digging in and providing solutions! :-)

* I've also removed the colors from the non-root error message since they probably don't match other users' color schemes.

Last edited by zan (2020-06-16 12:29:53)

Offline
