Arch Linux Forums / Minimal pacman mirrorlist maintenance

Re: Minimal pacman mirrorlist maintenance

2020-06-16T11:12:26Z

If there's no additional filtering based on the score (removal of mirrors), then there's no need to sort the list, since it gets piped directly to rankmirrors, which sorts it by your connection speed instead. I think this sort takes the same amount of time regardless of how the provided list is ordered, since it has to contact every mirror to calculate its speed.

The script sets -e which makes it exit before any changes are made to the file system if any command doesn't exit successfully. That should leave only the empty list issue you've addressed. I've added the -s check and a notice to check the script's FILTER if it ever runs into this case.
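One caveat about -e, offered as a sketch and not as a statement about the actual script: in a pipeline the shell only looks at the exit status of the last command unless pipefail is set, so a failed first stage in front of rankmirrors would not stop the script by itself. That is one way to end up with the empty list the -s check catches. The snippet assumes bash is available; `false` stands in for a failing first stage and `cat` for the later stages, and both function names are made up for illustration.

```shell
# Sketch only: how `set -e` treats a failing command inside a pipeline.
# `false` stands in for a failed download, `cat` for the later stages.

# Without pipefail the pipeline's status is that of the last command (cat),
# so `set -e` does not stop the script and "reached" is printed.
without_pipefail() {
    bash -c 'set -e; false | cat; echo reached'
}

# With pipefail the pipeline fails if any stage fails, so the script
# exits before the echo and nothing is printed.
with_pipefail() {
    bash -c 'set -e; set -o pipefail; false | cat; echo reached'
}
```

If the script is run with bash, adding `set -o pipefail` next to -e would close that gap; the -s check is still a sensible last line of defence either way.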

Thanks for digging in and providing solutions! :-)

* I've also removed the colors for the non-root error message since they probably don't match other users' color schemes.

Re: Minimal pacman mirrorlist maintenance

2020-06-15T12:57:55Z

@zan: It just occurred to me that a modification would be useful in case something goes wrong when creating a new mirrorlist:

# Backup previous mirrorlist and move over the new one if /tmp/mirrorlist is not empty
if [ -s /tmp/mirrorlist ]
then 
     mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.previous
     mv /tmp/mirrorlist /etc/pacman.d/mirrorlist
else
     echo "/tmp/mirrorlist is empty - keep previous mirrorlist"
fi
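The same check can be wrapped in a function that takes the paths as arguments, which makes it possible to try it on scratch files before pointing it at /etc/pacman.d. This is only a sketch: `install_mirrorlist` is a made-up name, and returning a non-zero status for the empty case is my assumption about what the caller would want.

```shell
# Hypothetical helper: install_mirrorlist NEW CURRENT
# Replaces CURRENT with NEW only when NEW exists and is non-empty,
# keeping the old file as CURRENT.previous.
install_mirrorlist() {
    new=$1
    current=$2
    if [ -s "$new" ]; then
        # Only back up when there is something to back up.
        if [ -e "$current" ]; then
            mv "$current" "$current.previous"
        fi
        mv "$new" "$current"
    else
        echo "$new is empty or missing - keeping $current" >&2
        return 1
    fi
}
```

In the script this would be called as `install_mirrorlist /tmp/mirrorlist /etc/pacman.d/mirrorlist`.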

Re: Minimal pacman mirrorlist maintenance

2020-06-14T12:42:04Z

I understand your objections. Nevertheless, I think that taking the score into account still makes sense, as it is an indication that mirrors are well maintained and reachable, IMO. I agree that selecting for a score range is problematic and that sorting by it (regardless of its absolute value) would certainly be better.

I searched a bit for a possible solution and found this. I tried implementing it in various forms, but to no avail. As mentioned, I'm not familiar with JSON parsing and jq. Perhaps you know how to handle this.
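For what it's worth, here is a minimal sketch of sorting by score in jq instead of cutting at a fixed value. It runs on inline sample data, not the real status JSON: the field names follow the filters quoted in this thread, while the URLs, the values and the two function names are made up. It also assumes that mirrors without a score show up as null, which matters because jq orders null below every number, so a bare `.score < 1.0` would let them through.

```shell
# Sample data standing in for the mirror status JSON (values are invented).
sample='[
  {"url": "https://mirror-a.example/", "completion_pct": 1.0,  "score": 2.7},
  {"url": "https://mirror-b.example/", "completion_pct": 1.0,  "score": 2.1},
  {"url": "https://mirror-c.example/", "completion_pct": 0.98, "score": 1.4},
  {"url": "https://mirror-d.example/", "completion_pct": 1.0,  "score": null}
]'

# Fixed threshold: prints nothing for this sample, because every fully
# completed mirror with a score sits above 1.0.
by_threshold() {
    printf '%s' "$sample" | jq -r '
        [.[] | select(.completion_pct == 1.0)]
        | [.[] | select(.score != null and .score < 1.0)]
        | .[].url'
}

# Relative ranking: sort by score (lower is better) and keep the best N.
# N defaults to 2 when no argument is given.
by_rank() {
    printf '%s' "$sample" | jq -r --argjson n "${1:-2}" '
        [.[] | select(.completion_pct == 1.0 and .score != null)]
        | sort_by(.score)
        | .[:$n]
        | .[].url'
}
```

With this sample, `by_threshold` returns an empty list while `by_rank` still hands mirror-b and mirror-a to the next stage, which is the difference between selecting a score range and sorting by score.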

Re: Minimal pacman mirrorlist maintenance

2020-06-13T19:06:17Z

https://www.archlinux.org/mirrors/status/

| [.[] | select(.completion_pct == 1.0)]

Completion %: The number of mirror checks that have successfully connected and disconnected from the given URL. If this is below 100%, the mirror may be unreliable.

Yup, this one is a no-brainer and I've updated the post with it. Maybe someone with more mirror insight can comment if the mirrors at ~98% can be considered reliable with an asterisk.

| [.[] | select(.score < 1.0)]

Mirror Score: A very rough calculation for ranking mirrors. It is currently calculated as (hours delay + average duration + standard deviation) / completion percentage. Lower is better.
μ Delay: The calculated average mirroring delay; e.g. the mean value of last check − last sync for each check of this mirror URL. Due to the timing of mirror checks, any value under one hour should be viewed as ideal.
μ Duration: The average (mean) time it took to connect and retrieve the lastsync file from the given URL. Note that this connection time is from the location of the Arch server; your geography may produce different results.
σ Duration: The standard deviation of the connect and retrieval time. A high standard deviation can indicate an unstable or overloaded mirror.

I've highlighted what prompted my next point. Only a small selection of countries is represented among the ~40 mirrors in the < 1.0 range; the majority fall into the < 5.0 range. I think this leads to a premature optimization that will affect those in areas where the average duration to connect is different due to being closer to the mirror geographically. For example, mirrors for the countries I select fall into the 2.0-3.0 range.

If the threshold is based on the location of the Arch server and not the user's location, then the script will return 0 mirrors even when there are n ready, because they were deemed "slow" due to the above circumstances.

These are my thoughts after thinking / reading about it for ~30 minutes. If I'm reaching wrong conclusions please correct me.

Re: Minimal pacman mirrorlist maintenance

2020-06-13T17:04:52Z

Nice approach! I wasn't aware of jq.

Wouldn't it make sense to add more filters like:

| [.[] | select(.completion_pct == 1.0)]
| [.[] | select(.score < 1.0)]

This would, e.g., make sure that only fully synced mirrors are used. Besides, it speeds up the ranking process.

Re: Minimal pacman mirrorlist maintenance

2020-06-13T12:22:52Z

Yeah, you're right. Having a shit of a moment and I completely misread the whole thing in a wrong light. Sorry, I hope you can understand and it has nothing to do with you. I've removed the stupidity.

Re: Minimal pacman mirrorlist maintenance

2020-06-13T12:15:35Z

EDIT: removed a no longer relevant response. Thanks for your newest edits. The previous ones had me baffled. I am not known for being the gentlest soul around these parts - but I was agreeing with you, applauding your goal, and being inspired by it. I was pretty confused about why that would bother you.

Re: Minimal pacman mirrorlist maintenance

2020-06-13T11:39:27Z

eschwartz wrote:

...

Hitting the nail on its head. This is just one less python dependency on my system.

Trilby wrote:

...

I'm happy to see you gave it a chance. :-)

Awebb wrote:

...

That's how I took it as well. I appreciate your terse responses because you conveyed a lot of information with those two posts.

Re: Minimal pacman mirrorlist maintenance

2020-05-10T17:57:20Z

I suggested reflector, because it's a well known tool and basically the default recommendation, in case OP didn't know.

Re: Minimal pacman mirrorlist maintenance

2020-05-10T15:22:40Z

This thread did inspire me to try rewriting my tool in a shell script using jq and it works well, but the python version linked above is consistently faster.

EDIT: That said, I was able to "multithread" the speed checks in the shell version easily resulting in a huge savings in running time. But then I checked and found that this resulted in horribly inaccurate speed results due to the network bottleneck at my end (likely resulting in the first urls tested getting the lion's share of the bandwidth and getting much "faster" scores). So with single threaded versions, python is notably quicker.

Re: Minimal pacman mirrorlist maintenance

2020-05-10T14:37:40Z

Trilby, I interpreted Awebb's mention of reflector very differently. I blindly assumed it was mentioned for the same reason I'd mention it: not because one should prefer to use it, but because one should be aware that it exists before writing one's own.

Sure, not everyone wants or needs to use reflector, and there are a few different reasons why that might be. Another might be "it's the only thing on my server which would depend on a python interpreter, so I would rather use something that isn't written in python".
Alternatively, one might want the fun experience of writing their own version.

And those are all perfectly fine. But on the off chance that reflector *would* be a satisfactory solution, I would still want to mention "hey, there's already a tool for that, so unless you have a particular reason to prefer writing your own, you might want to take the easy route of reusing the existing one."

Re: Minimal pacman mirrorlist maintenance

2020-05-10T14:17:21Z

Awebb wrote:

The systemd unit runs as root and changes a file in /etc, by executing code from the user home. That's the problem.

Thank you for explaining further. I understand exactly what you and eschwartz meant now & have updated the script.

Edit: I've removed part of the previous response.

Re: Minimal pacman mirrorlist maintenance

2020-05-10T13:48:09Z

zan wrote:

I tried [reflector] out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear and easy to read instructions. Helps me understand faster what it does half a year later. Do you see any glaring holes with this kind of thinking?

I agree - notwithstanding that "bloat" has more of a negative connotation than I'd personally put on reflector, it's a great tool I used for quite some time but eventually decided I wanted a much simpler approach which I posted here.

But you are conflating simplifying the code with some design choices that are being questioned here. It is a false dichotomy to suggest that you either need a thousand lines of code providing all sorts of features or a few lines of code that are error-prone. In fact, it's often quite the opposite. No one here critiqued the fact that you wrote concise and targeted code - they just noted the places where it could go wrong. Write concise simple code that doesn't go wrong.

Edit: for clarity, my short code has one definite failing: there is no specified timeout, so if connecting to a mirror stalls, the whole script stalls. But this is a loss of function failure, not a security-impacting failure. It's always good to consider what might go wrong if your code fails and assess how much of a concern that should be. That's what some of the above posts are highlighting: when everything goes right with your code, it should work great; but if/when things go wrong, the results could be very very bad.
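A sketch of one way to bound a stalled connection, not taken from the code referred to above: wrap each probe in `timeout` from GNU coreutils, which kills the command after the given number of seconds and exits with status 124 when the limit is hit. `with_limit` is a made-up name.

```shell
# Hypothetical wrapper: run a command, but give up after LIMIT seconds.
# `timeout` (GNU coreutils) exits with status 124 when the limit is reached;
# otherwise it passes through the command's own exit status.
with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
}
```

A probe could then look like `with_limit 10 curl -s -o /dev/null "$url"`. If curl is doing the fetching, its own `--connect-timeout` and `--max-time` options achieve much the same without the wrapper.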

Re: Minimal pacman mirrorlist maintenance

2020-05-10T12:11:02Z

The systemd unit runs as root and changes a file in /etc, by executing code from the user home. That's the problem.

Re: Minimal pacman mirrorlist maintenance

2020-05-10T11:49:46Z

Thank you for the feedback!

Awebb wrote:

1. https://www.archlinux.org/packages/comm … reflector/

I tried it out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear and easy to read instructions. Helps me understand faster what it does half a year later. Do you see any glaring holes with this kind of thinking?

Awebb wrote:

2. Exec = /bin/sh -c "/home/$(logname)/.local/bin/pacman-mirrorlist"
Put that in /usr/local/bin instead. Imagine some wild process appeared and changed the content of that file to, say, dd'ing your disk with zeroes.

eschwartz wrote:

Using logname there can also do some hilarious things under various circumstances where it isn't sure who you are and simply returns "logname: no login name" on stderr. But yeah, the larger issue is permitting totally untrusted code to be run under situations where the pacman binary itself is trusted.

If I understood you correctly, it's not a problem to have scripts that don't require root permissions in ~/.local/bin and added to the PATH. However, if they require root permissions, it's a huge security hole to have them easily changed by your user prior to being run by root?
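To make the concern concrete, here is a rough sketch of a check one could run on a script before letting anything execute it as root. It only looks at the file itself; `root_only_writable` is a made-up name, and it relies on GNU stat's -c option.

```shell
# Hypothetical check: succeed only if FILE is owned by root and is not
# writable by group or others. Uses GNU stat (-c).
root_only_writable() {
    owner=$(stat -c '%U' "$1") || return 1
    mode=$(stat -c '%a' "$1") || return 1
    [ "$owner" = "root" ] || return 1
    # The leading 0 makes the shell read the mode as octal;
    # 022 masks the group-write and other-write bits.
    [ $(( 0$mode & 022 )) -eq 0 ]
}
```

A script under ~/.local/bin would normally fail this check because it is owned by the user. Even a root-owned file there could be swapped out, since the directory itself is writable by the user, which is why /usr/local/bin was suggested. Something like `install -o root -g root -m 755 pacman-mirrorlist /usr/local/bin/` (run as root) puts a copy in place with suitable ownership and mode.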