The script sets -e, which makes it exit before any changes are made to the file system if any command exits unsuccessfully. That should leave only the empty-list issue you've addressed. I've added the -s check and a notice to check the script's FILTER if it ever runs into this case.
Thanks for digging in and providing solutions! :-)
* I've also removed the colors from the non-root error message, since they probably don't match other users' color schemes.
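A minimal sketch of an uncolored root check, assuming a POSIX shell; the function name require_root and the message text are illustrative, not taken from the script. The message goes to stderr as plain text, so it reads the same in any terminal theme.

```sh
#!/bin/sh
# Hypothetical helper: fail unless the current user is root (uid 0).
# Plain text on stderr, no ANSI color codes.
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "error: this script must be run as root" >&2
        return 1
    fi
}
```

Calling it as require_root || exit 1 near the top of the script keeps the decision to exit in the main script rather than inside the helper.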
# Back up the previous mirrorlist and move the new one into place if /tmp/mirrorlist is not empty
if [ -s /tmp/mirrorlist ]
then
    mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.previous
    mv /tmp/mirrorlist /etc/pacman.d/mirrorlist
else
    echo "/tmp/mirrorlist is empty - keeping previous mirrorlist"
fi
I searched a bit for a possible solution and found this. I tried implementing it in various forms, but to no avail. As mentioned, I'm not familiar with JSON parsing and jq. Perhaps you know how to handle this.
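The same -s guard can be wrapped in a function that takes the target path and the list-generating command as arguments. This is a sketch; the name replace_mirrorlist is hypothetical. Three choices differ from the snippet. The temporary file is created with mktemp next to the target instead of at a fixed /tmp path, so the final mv is a rename on one file system. The old list is copied rather than moved, so /etc/pacman.d/mirrorlist never disappears in between. Errors are checked explicitly, because set -e is ignored inside functions called from an if or || context.

```sh
#!/bin/sh
# Hypothetical helper: replace a mirrorlist only when the newly generated
# list is non-empty, keeping the old one as <target>.previous.
#   $1       target file, e.g. /etc/pacman.d/mirrorlist
#   $2 ...   command that prints the new mirrorlist on stdout
replace_mirrorlist() {
    target=$1; shift
    # Unpredictable temp name in the target's directory (same file system)
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if ! "$@" > "$tmp"; then
        echo "mirrorlist generator failed; keeping $target" >&2
        rm -f "$tmp"
        return 1
    fi
    if [ ! -s "$tmp" ]; then
        echo "generated mirrorlist is empty; keeping $target (check FILTER)" >&2
        rm -f "$tmp"
        return 1
    fi
    chmod 644 "$tmp"            # mktemp creates files with mode 0600
    if [ -e "$target" ]; then
        cp -p "$target" "$target.previous" || { rm -f "$tmp"; return 1; }
    fi
    mv "$tmp" "$target"         # rename into place
}
```

Usage would look like replace_mirrorlist /etc/pacman.d/mirrorlist generate_list, run as root, where generate_list stands in for whatever pipeline builds the list.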
| [.[] | select(.completion_pct == 1.0)]
Completion %: The number of mirror checks that have successfully connected and disconnected from the given URL. If this is below 100%, the mirror may be unreliable.
Yup, this one is a no-brainer, and I've updated the post with it. Maybe someone with more mirror insight can comment on whether the mirrors at ~98% can be considered reliable, with an asterisk.
| [.[] | select(.score < 1.0)]
Mirror Score: A very rough calculation for ranking mirrors. It is currently calculated as (hours delay + average duration + standard deviation) / completion percentage. Lower is better.
μ Delay: The calculated average mirroring delay, i.e. the mean value of last check − last sync for each check of this mirror URL. Due to the timing of mirror checks, any value under one hour should be viewed as ideal.
μ Duration: The average (mean) time it took to connect and retrieve the lastsync file from the given URL. Note that this connection time is from the location of the Arch server; your geography may produce different results.
σ Duration: The standard deviation of the connect and retrieval time. A high standard deviation can indicate an unstable or overloaded mirror.
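To make the quoted Mirror Score formula concrete, here is a worked example with made-up numbers (not a real mirror): 0.5 h average delay, 0.3 s mean duration, 0.2 s standard deviation. As written, the formula adds the delay in hours to the durations in seconds and divides by the completion fraction (1.0 = 100%).

```latex
\text{score} = \frac{\mu_{\text{delay}}\,[\mathrm{h}] + \mu_{\text{duration}} + \sigma_{\text{duration}}}{\text{completion}}
\qquad
\frac{0.5 + 0.3 + 0.2}{1.00} = 1.00
\qquad
\frac{0.5 + 0.3 + 0.2}{0.98} \approx 1.02
```

Under a strict score < 1.0 filter this mirror is excluded in both cases, since 1.00 is not below 1.0, and the durations in the numerator are measured from the Arch server rather than from the user.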
I've highlighted what led me to my next point. Only a small selection of countries is represented among the ~40 mirrors in the < 1.0 range; the majority fall into the < 5.0 range. I think this filter is a premature optimization that will hurt users whose nearby mirrors look slow in the Arch server's measurements, even though those users, being geographically closer, would connect faster. For example, the mirrors for the countries I select fall into the 2.0-3.0 range.
If the threshold is based on measurements from the Arch server's location rather than the user's, the script can return 0 mirrors even when there are n usable mirrors that were deemed "slow" for the reasons above.
These are my thoughts after thinking / reading about it for ~30 minutes. If I'm reaching wrong conclusions please correct me.
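One way to avoid the zero-mirror case is to make the score limit configurable and fall back to all fully completed mirrors when nothing passes it. This is a sketch of that idea, not the script's behavior: MAX_SCORE, its default of 5.0, and the fallback rule are illustrative assumptions. The input is assumed to be a JSON array of mirror objects, the shape the thread's [.[] | select(...)] filters operate on.

```sh
#!/bin/sh
# Hypothetical helper: keep mirrors under a configurable score limit, but
# never return an empty list just because the limit was too strict.
# stdin: JSON array of mirror objects; stdout: URLs, lowest score first.
select_mirrors() {
    jq -r --argjson max "${MAX_SCORE:-5.0}" '
        [.[] | select(.completion_pct == 1.0 and .score != null)] as $ok
        | [$ok[] | select(.score < $max)]
        | if length > 0 then . else $ok end
        | sort_by(.score)
        | .[].url
    '
}
```

For example: curl -s https://archlinux.org/mirrors/status/json/ | jq '.urls' | MAX_SCORE=3.0 select_mirrors, assuming the per-mirror entries live under .urls in that endpoint's output.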
Wouldn't it make sense to add more filters like:
| [.[] | select(.completion_pct == 1.0)]
| [.[] | select(.score < 1.0)]
This would, e.g., make sure that only fully synced mirrors are used. Besides, it speeds up the ranking process.
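A sketch of the two proposed filters applied to the mirror status JSON, assuming the endpoint https://archlinux.org/mirrors/status/json/ with its per-mirror entries in the urls array (worth checking against the live output). One jq detail matters: jq orders null below every number, so a mirror with no score yet would pass select(.score < 1.0) on its own; the explicit null check keeps it out.

```sh
#!/bin/sh
# Sketch: apply both filters to the status JSON on stdin and print the
# remaining URLs, lowest score first.
# In jq, `null < 1.0` is true, hence the `.score != null` guard.
filter_mirrors() {
    jq -r '
        .urls
        | [.[] | select(.completion_pct == 1.0)]
        | [.[] | select(.score != null and .score < 1.0)]
        | sort_by(.score)
        | .[].url
    '
}
```

Usage: curl -s https://archlinux.org/mirrors/status/json/ | filter_mirrors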
...
Hitting the nail on the head. This is just one less Python dependency on my system.
...
I'm happy to see you gave it a chance. :-)
...
That's how I took it as well. I appreciate your terse responses because you conveyed a lot of information with those two posts.
EDIT: That said, I was able to "multithread" the speed checks in the shell version easily, resulting in huge savings in running time. But then I found that this produced horribly inaccurate speed results due to the network bottleneck at my end (most likely the first URLs tested got the lion's share of the bandwidth and received much "faster" scores). So when both versions are single-threaded, Python is notably quicker.
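A sketch of sequential speed checks in shell, so transfers never compete for the local bandwidth. Assumptions: each mirror is timed by downloading its core.db for x86_64, and the timeout values are arbitrary. The curl timeouts mean a stalled mirror is skipped instead of hanging the whole run.

```sh
#!/bin/sh
# Sketch: time mirrors strictly one after another.
# stdin: base URLs ending in "/"; stdout: "seconds url", fastest first.
# ASSUMPTION: the probe file is <url>core/os/x86_64/core.db.
time_mirrors() {
    while IFS= read -r url; do
        # A failed or timed-out mirror is skipped rather than aborting the loop
        t=$(curl -fs -o /dev/null --connect-timeout 5 --max-time 20 \
                -w '%{time_total}' "${url}core/os/x86_64/core.db") || continue
        printf '%s %s\n' "$t" "$url"
    done | sort -n
}
```

Feeding it the URL list from the jq step and keeping the first few lines gives a ranking in which every mirror was measured alone on the link.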
Sure, not everyone wants or needs to use reflector, and there are a few different reasons why that might be. Another might be "it's the only thing on my server that would depend on a Python interpreter, so I would rather use something that isn't written in Python".
Alternatively, one might want the fun experience of writing their own version.
And those are all perfectly fine. But on the off chance that reflector *would* be a satisfactory solution, I would still want to mention "hey, there's already a tool for that, so unless you have a particular reason to prefer writing your own, you might want to take the easy route of reusing the existing one."
The systemd unit runs as root and changes a file in /etc by executing code from the user's home directory. That's the problem.
Thank you for explaining further. I understand exactly what you and eschwartz meant now & have updated the script.
Edit: I've removed part of the previous response.
I tried [reflector] out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear and easy to read instructions. Helps me understand faster what it does half a year later. Do you see any glaring holes with this kind of thinking?
I agree - notwithstanding that "bloat" has more of a negative connotation than I'd personally put on reflector, it's a great tool that I used for quite some time, but I eventually decided I wanted a much simpler approach, which I posted here.
But you are conflating simplifying the code with some design choices that are being questioned here. It is a false dichotomy to suggest that you need either a thousand lines of code providing all sorts of features or a few lines of code that are error prone. In fact, it's often quite the opposite. No one here critiqued the fact that you wrote concise and targeted code - they just noted the places where it could go wrong. Write concise, simple code that doesn't go wrong.
Edit: for clarity, my short code has one definite failing: there is no specified timeout, so if connecting to a mirror stalls, the whole script stalls. But this is a loss of function failure, not a security-impacting failure. It's always good to consider what might go wrong if your code fails and assess how much of a concern that should be. That's what some of the above posts are highlighting: when everything goes right with your code, it should work great; but if/when things go wrong, the results could be very very bad.
I tried it out before I went with this approach, but I didn't need 90% of the "bloat" in there and prefer to glance over ~30 lines (instead of 900+) of clear and easy to read instructions. Helps me understand faster what it does half a year later. Do you see any glaring holes with this kind of thinking?
2. Exec = /bin/sh -c "/home/$(logname)/.local/bin/pacman-mirrorlist"
Put that in /usr/local/bin instead. Imagine some wild process appeared and changed the content of that file to, say, dd'ing your disk with zeroes.
Using logname there can also do some hilarious things under various circumstances where it isn't sure who you are and simply returns "logname: no login name" on stderr. But yeah, the larger issue is permitting totally untrusted code to be run under situations where the pacman binary itself is trusted.
If I understood you correctly, it's not a problem to have scripts that don't require root permissions in ~/.local/bin and added to the PATH. However, if they require root permissions, it's a huge security hole to have them easily changed by your user before being run by root?