While reading a complaint here I thought about how the upgrade experience might be improved in a pragmatic way. I considered the existing tools we have, such as reflector and the mirror status page, but realized that nothing seems to keep track of which mirror is typically the most up to date, only whether it is up to date at a given moment. Perhaps this could be changed to improve the experience, since a single snapshot obviously isn't always optimal.
One idea might be to either create a new utility (or enhance an existing one) which would do something such as:
- Check mirrors X times a day. (Once or twice a day might be a sane default.)
- Keep track of reliability, freshness, and speed.
- Rank the mirrors using a set of customizable ranking factors, so users can fit them to their exact needs or preferences. The implementation deserves some thought: for instance, by default we might want to weigh data from the last day more heavily than data a week old, and data a month old should carry even less relevance. That decay could also be customizable.
- Using those ranking factors, write out a new, intelligently updated mirror list.
This would make maintaining the mirror list more of a "set it up and forget it" experience for the average user (assuming good defaults). Ideally good mirrors in the user's locality would eventually "rise to the top" and bad mirrors would "fall to the bottom". If done intelligently, the user could easily customize things to their preference: maybe they want speed to be 70% of the score, freshness 20%, and reliability 10%; or perhaps 70% freshness, 15% speed, and 15% reliability. Either could be done.
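The customizable weighting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing tool: the metric names, the 0-100 metric values, and the sample numbers are all hypothetical.

```python
# Sketch: combine per-mirror metrics (each on a 0-100 scale) into one
# composite score using user-configurable weights. Weights need not sum
# to 100; they are normalized by their total.

def composite_score(metrics, weights):
    """Weighted average of the metric values under the given weights."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight

# "70% speed, 20% freshness, 10% reliability"
prefer_speed = {"speed": 70, "freshness": 20, "reliability": 10}
# "70% freshness, 15% speed, 15% reliability"
prefer_fresh = {"speed": 15, "freshness": 70, "reliability": 15}

mirror = {"speed": 80, "freshness": 95, "reliability": 99}  # made-up data
print(composite_score(mirror, prefer_speed))  # speed dominates the result
print(composite_score(mirror, prefer_fresh))  # freshness dominates
```

The same mirror data then ranks differently depending on which preference set the user picked, which is exactly the point.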
Maybe it would be enough not to run this as a daemon or via cron, but merely to have existing tools such as reflector keep track of this historical data and use it intelligently. I do realize that reflector could be run as a regular cron job and do much of this; however, it makes decisions based on the moment only and does not use historical data (e.g. a mirror may usually work very well but have been down for the day, or even the hour, that the check was run).
Just thought I'd share the idea I came up with during lunch for consideration.
Last edited by davidm (2010-02-09 18:19:39)
Perhaps something like this?
http://people.cs.uu.nl/henkp/mirmon/
Example:
http://www.archserver.org/mirmon
Are you familiar with our Forum Rules, and How To Ask Questions The Smart Way?
BlueHackers // fscanary // resticctl
Pretty much, yes. Then use the data to modify the local mirror list, in addition to allowing a lot of customization of the weight given to each attribute/variable.
Simplified Example [the math might be a bit iffy; this is just a quick example]:
A value from 1 to 10 controls the weight for each factor (a multiplier):
Freshness: 5 (50%)
Speed: 3 (30%)
Reliability: 2 (20%)
Freshness Value [1 - 100]
Updated within .....
< 1 hour: 100 points
2 hours 99
3 hours 98
4 hours 97
5 hours 96
6 hours 95
...
(the scale doesn't need to be even; that wouldn't be optimal anyway)
24 hours: 50
36 hours: 25
48 hours: 10
72 hours: 5
> 72 hours: 0
------------------
Speed [1 - 100] (Something like....)
KiB/s / 10
So 1000 KiB/s (roughly 1 MiB/s) = 100
500 KiB/s = 50
100 KiB/s = 10
50 KiB/s = 5
------------------------------
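The two value functions above can be sketched in Python. The bucket boundaries are my reading of the table (each "N hours" row taken as a strict upper bound, so exactly 3.0 hours falls in the 4-hour bucket), and the cap at 100 is an assumption so very fast mirrors don't exceed the 1-100 scale:

```python
# Freshness: map "hours since last update" to a 0-100 value using the
# bucket table from the post. Buckets are (upper bound in hours, value).
FRESHNESS_BUCKETS = [
    (1, 100), (2, 99), (3, 98), (4, 97), (5, 96), (6, 95),
    (24, 50), (36, 25), (48, 10), (72, 5),
]

def freshness_value(hours_since_update):
    for upper, value in FRESHNESS_BUCKETS:
        if hours_since_update < upper:
            return value
    return 0  # more than 72 hours stale

def speed_value(kib_per_s):
    # "KiB/s / 10", capped at 100 so >= 1000 KiB/s scores full marks
    return min(kib_per_s / 10, 100)

print(freshness_value(1.5))  # 99
print(freshness_value(3.0))  # 97 (exactly 3 h lands in the 4-hour bucket)
print(speed_value(500))      # 50.0
```

The gap between the 6-hour and 24-hour rows is elided ("...") in the table above, so this sketch simply lets everything from 6 to 24 hours score 50.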
Server A:
Checked: Midnight
Updated 1.5 hours ago. [Value = 99]
Speed: 500 KiB/s [Value = 50]
Checked: Noon
Updated: 3.0 hours ago [Value = 97]
Speed: 400 KiB/s [Value = 40]
Server B:
Checked: Midnight
Updated 6.0 hours ago. [Value = 94]
Speed: 700 KiB/s [Value = 70]
Checked: Noon
Updated: 1.0 hours ago [Value = 99]
Speed: 700 KiB/s [Value = 70]
Server A:
Freshness Score: 98
Speed Score: 45
Server B:
Freshness Score: 97
Speed Score: 70
Weight Factors:
Freshness: 5 (50%)
Speed: 3 (30%)
Reliability: 2 (20%) [not implemented here]
Weighted Scores per server:
Server A:
Freshness: 98 * 5 = 490
Speed: 45 * 3 = 135
Server A Total Score: 625
Server B:
Freshness: 97 * 5 = 485
Speed: 70 * 3 = 210
Server B Total Score: 695
New Ranked Mirror Order:
Server B (695) [Server B wins because of the speed difference here]
Server A (625)
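The averaging and weighting above can be reproduced in a few lines of Python. The half-up rounding is an assumption on my part, chosen so Server B's 96.5 freshness average becomes 97 as in the example:

```python
# Reproduce the worked example: average each metric's samples, round to
# the nearest whole point (half rounds up), multiply by the weight
# factors, and sum. Reliability is omitted, as in the example.

def half_up(x):
    """Round half up (Python's built-in round() rounds half to even)."""
    return int(x + 0.5)

WEIGHTS = {"freshness": 5, "speed": 3}

samples = {  # the midnight and noon values from the example above
    "Server A": {"freshness": [99, 97], "speed": [50, 40]},
    "Server B": {"freshness": [94, 99], "speed": [70, 70]},
}

def total_score(metrics):
    return sum(half_up(sum(vals) / len(vals)) * WEIGHTS[name]
               for name, vals in metrics.items())

for server, metrics in sorted(samples.items(),
                              key=lambda kv: total_score(kv[1]),
                              reverse=True):
    print(server, total_score(metrics))
# Server B 695
# Server A 625
```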
Note that the example above just uses simple averages for simplicity. In practice we would presumably weight each sample depending on how recent it is.
Example Weights:
Recent Run (N) : 100
N -1 Run: 50
N -2 Run: 49
(or whatever; this is just to show the concept)
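That recency weighting amounts to a weighted average where newer runs count more. A minimal sketch, using the illustrative run weights above (the freshness samples are hypothetical):

```python
# Weighted average with recency decay: values[0] is the most recent run
# and is paired with the largest weight.

def weighted_average(values, weights):
    pairs = list(zip(values, weights))
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

run_weights = [100, 50, 49]       # runs N, N-1, N-2
freshness_samples = [99, 97, 50]  # hypothetical, most recent first

print(weighted_average(freshness_samples, run_weights))
```

With these weights, the stale 50 from two runs ago drags the score down far less than it would under a plain average, which is the desired behavior: one bad day shouldn't sink a usually good mirror.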
Also, the example above doesn't take reliability into account because this is too long already, but you get the point. Obviously there would have to be minimums and other safeguards for it to make sense, and you'd tweak the numbers and algorithm accordingly. The idea is to use historical data to determine which mirror has the best chance of being up to date, fast, and reliable. Basically like using reflector, only without being so dependent on a single run.
You can always get a fresh, fast, and reliable mirror using reflector now, but you have to run it right before updating to be sure. If you don't, there is a good chance your number-one mirror is a "one hit wonder" that won't update again for another 30 days or something crazy. The goal here would be to handle cases like that on the client side, with an intelligent algorithm that takes into account historical data as well as the user's preferences.
Last edited by davidm (2010-02-09 22:51:21)