#1 2006-03-01 11:23:22

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

script to find fastest repository

It's in python, run it using:
repository_test.py /etc/pacman.d/unstable
(unstable is just an example)

It examines all the servers in the specified file and prints the 5 fastest, sorted by access time. The measured time covers logging in and fetching a file listing.

Any suggestions/improvements are appreciated.
Enjoy:

#! /usr/bin/python

from ftplib import FTP
import sys
import urllib
import time

def timeCmd(cmd):
    before = time.time();
    try:
        cmd();
    except KeyboardInterrupt, ki:
        raise ki
    except Exception, e:
        print 'ERROR: ', e
        return 99999999
    return time.time() - before;

def talkToServer(server, dir):
    ftp = FTP(server)
    ftp.login()
    ftp.cwd(dir)
    ftp.nlst()

def getFuncToTime(server, dir):
    return lambda : talkToServer(server, dir)

def splitUrl(url):
    server = urllib.splittype(url.strip())[1]
    return urllib.splithost(server)

def cmpPairBySecond(p1, p2):
    if p1[1] == p2[1]: return 0
    if p1[1] < p2[1]: return -1
    return 1

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print 'Usage: ', sys.argv[0], ' <pacman-servers-list-file>'
        sys.exit(0)
    fl = open(sys.argv[1], 'r')

    serverToTime = {}
    for ln in fl.readlines():
        splitted = ln.split('=')
        if splitted[0].strip() != 'Server': continue

        serverUrl = splitted[1]
        if serverUrl[-1] == '\n': serverUrl = serverUrl[0:-1]
        splittedUrl = splitUrl(serverUrl)
        print 'Querying: ', splittedUrl[0], '...'
        serverToTime[serverUrl] = timeCmd(getFuncToTime(splittedUrl[0], splittedUrl[1]))
        #print '\t',serverToTime[serverUrl]

    items = serverToTime.items()
    items.sort(cmpPairBySecond)
    print '======================================'
    print 'Servers sorted by time'
    print '======================================'
    for i in items[0:5]:
        print i[0], ': ', i[1]
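
For anyone adapting the parsing step, here is a minimal sketch in Python 3 syntax (the script above is Python 2). It assumes mirror files contain lines of the form `Server = <url>`, which is what the loop above looks for. `parse_server_lines` is a hypothetical helper name, not part of the script, and the example.org / example.net URLs are made up. Unlike `split('=')`, `partition` only splits on the first `=`, so a URL that itself contains `=` stays intact.

```python
def parse_server_lines(lines):
    """Return the URLs from lines of the form 'Server = <url>'."""
    urls = []
    for line in lines:
        # partition() splits on the first '=' only.
        key, sep, value = line.partition('=')
        if sep and key.strip() == 'Server' and value.strip():
            urls.append(value.strip())
    return urls


# Made-up sample input standing in for a file such as /etc/pacman.d/unstable.
sample = [
    "# a comment\n",
    "Server = ftp://ftp.example.org/pub/archlinux/unstable/os/i686\n",
    "Server = http://mirror.example.net/arch/get?repo=unstable\n",
    "\n",
]
print(parse_server_lines(sample))
```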

#2 2006-03-01 11:33:36

murkus
Member
From: Europe/Helsinki
Registered: 2004-03-19
Posts: 254

Re: script to find fastest repository

Is there any advantage using your script over sortmirrors.pl?

#3 2006-03-01 11:41:09

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

I was not aware of it...
Oops...  :oops:  :oops:

#4 2006-03-01 13:09:42

murkus
Member
From: Europe/Helsinki
Registered: 2004-03-19
Posts: 254

Re: script to find fastest repository

drakosha wrote:

I was not aware of it...
Oops...  :oops:  :oops:


Well, it's not very clearly advertised..

I was just wondering if there were any improvements in your script.

#5 2006-03-01 13:16:55

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

I can think of only one improvement - it requires nothing besides Python.

#6 2006-03-01 16:30:09

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: script to find fastest repository

also, sortmirrors.pl has been buggy for a while because of an upstream bug in <I've forgotten the name of the package>.

Dusty

#7 2006-03-01 18:54:48

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,911

Re: script to find fastest repository

Netselect is the package that is used by sortmirrors (and has some flaws).


#8 2006-03-01 19:01:56

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: script to find fastest repository

yeah yeah, that one. Wonder why my brain misfiled it.

Dusty

#9 2006-03-01 19:10:57

MaceM
Member
From: Austria
Registered: 2003-11-26
Posts: 47

Re: script to find fastest repository

drakosha's script connects to the FTP server, while sortmirrors "only" pings them (afaik).
Maybe the Python script could be modified to fetch a file over FTP and thereby sort the servers by their actual download speed, which I'd prefer to sorting by ping response times.

#10 2006-03-02 09:26:28

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

It's more than just a ping: it also does an "ls" on the server, which should return a long listing.
The problem with downloading a file is knowing which file to download:
1. it must be there
2. it must not be too big
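
One way around point 2 is to cap how much gets read, so the size of the file no longer matters. The sketch below is only that, a sketch in Python 3 syntax: `timed_read` and its default limits are hypothetical, and an in-memory `io.BytesIO` stands in for a network response so it runs offline. It does nothing about point 1; the file still has to exist on every mirror.

```python
import io
import time


def timed_read(stream, max_bytes=256 * 1024, chunk_size=16 * 1024):
    """Read at most max_bytes from stream; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    while total < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - total))
        if not chunk:  # end of file reached before the cap
            break
        total += len(chunk)
    return total, time.perf_counter() - start


# Stand-in for a network response so the sketch runs without a server.
fake_file = io.BytesIO(b"x" * 1000000)
nbytes, seconds = timed_read(fake_file)
print(nbytes, "bytes in", seconds, "s")
```

With a real mirror, the stream could be the file-like object returned by `urllib.request.urlopen(url)`, and bytes divided by seconds would give a rough download rate to sort by.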

#11 2006-03-02 10:16:54

tomk
Forum Fellow
From: Ireland
Registered: 2004-07-21
Posts: 9,839

Re: script to find fastest repository

Dusty wrote:

Wonder why my brain misfiled it.

Because you knew it would be easy to find next time it was needed - I do it all the time, I think it's a subconscious thing.

#12 2006-03-02 10:28:12

tomk
Forum Fellow
From: Ireland
Registered: 2004-07-21
Posts: 9,839

Re: script to find fastest repository

Sorry - back on topic - I like it. Apart from netselect's bugginess, I never liked the fact that sortmirrors.pl actually sorted the mirrors - I wanted it to do what you're doing, i.e. tell me the best mirrors, so I can edit the pacman.d files myself. Your script correctly selected heanet as my fastest mirror for current/extra/community, with varying results after that, but generally within the same group of 6-7.

#13 2006-03-02 10:39:47

murkus
Member
From: Europe/Helsinki
Registered: 2004-03-19
Posts: 254

Re: script to find fastest repository

Cool,
It seems that drakosha's script is a good improvement over the official script. I'm gonna try it at home, let's see if it brings better download speeds.

#14 2006-03-03 17:27:41

oliv
Member
Registered: 2005-04-17
Posts: 58

Re: script to find fastest repository

I also find that it's a better method than netselect. When I run sortmirrors, the first server it picks is always a strange one:

ERROR:  550 /pub/linux/distributions/archlinux/community/os/i686: No such file or directory.
ERROR:  530 Login incorrect.
Or Error 404...


P.S. I'm really starting to like Python. :D

#15 2006-03-03 21:43:01

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: script to find fastest repository

I added a comment to the bug report, maybe this will go official.

http://bugs.archlinux.org/task/2952

Nice introduction to the forums, drakosha, a nice meaningful contribution. :-)

Dusty

#16 2006-03-04 02:03:46

FoPref
Member
From: Erlangen / Germany
Registered: 2004-03-24
Posts: 96
Website

Re: script to find fastest repository

Hi,


it would have another major advantage if it could connect through a proxy, as people behind a proxy can't use netselect.
I would really appreciate proxy support.


It already has an advantage on some routed systems with NAT, which also don't allow what netselect does.


Regards,
Ford Prefect

#17 2006-03-04 08:54:27

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

First of all, thanks for all the positive feedback - it feels really good to read it :)
Second - feel free to add suggestions here, I'll try to implement them.

About the proxy suggestion: is there a standard way to define a proxy? Some env. variables? Some other way?

#18 2006-03-04 10:13:19

IceRAM
Member
From: Bucharest, Romania
Registered: 2004-03-04
Posts: 772
Website

Re: script to find fastest repository

I was looking over the mirror lists in /etc/pacman.d/* today and couldn't help noticing that most of them (probably all except 1 or 2) are FTP sites. I think connection establishment over FTP is somewhat slower than over HTTP. Anyhow, I started a discussion on this matter on the mailing list.

At the same time, I've discovered that some mirrors might not be up to date. You could replace or supplement the current speed measurement with something like downloading reponame.db.tar.gz. I don't know if that file holds the last update time, but you could use the file's timestamp to show the age of the repo. A mirror might be the fastest but outdated at the same time. Sometimes the db file might be too small to measure speed, so maybe an ls is more appropriate in that case.
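
To make the "fastest but outdated" point concrete, here is a rough sketch in Python 3 syntax of ranking by speed only among mirrors that are fresh enough. Everything in it is assumed for illustration: the `rank_mirrors` helper, the 3-day default cutoff, the example.org URLs and the numbers. How the db timestamp is obtained is left out.

```python
from datetime import datetime, timedelta


def rank_mirrors(measurements, now, max_age=timedelta(days=3)):
    """Drop mirrors whose db file is older than max_age, then sort by time.

    measurements: iterable of (url, seconds, db_mtime) tuples.
    """
    fresh = [m for m in measurements if now - m[2] <= max_age]
    return sorted(fresh, key=lambda m: m[1])


now = datetime(2006, 3, 4, 12, 0, 0)
measurements = [  # made-up numbers for illustration
    ("ftp://fast-but-stale.example.org/", 0.4, datetime(2006, 2, 20)),
    ("ftp://slow-but-fresh.example.org/", 1.9, datetime(2006, 3, 4)),
    ("ftp://quick-and-fresh.example.org/", 0.7, datetime(2006, 3, 3)),
]
for url, seconds, mtime in rank_mirrors(measurements, now):
    print(url, seconds)
```

The stale mirror is dropped even though it answered fastest; the remaining two are listed by access time.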

#19 2006-03-04 16:18:06

FoPref
Member
From: Erlangen / Germany
Registered: 2004-03-24
Posts: 96
Website

Re: script to find fastest repository

Hi,

the standard way to define proxies on the command line is indeed environment variables. These are ftp_proxy and http_proxy, and they are widely recognized (e.g. by wget, mplayer, ...).

These variables are to be set in URL form. Here are examples from my setup:
http_proxy=http://proxy:8080/
ftp_proxy=http://proxy:8080/

As you can see, it is possible to define an HTTP proxy for FTP access. I guess the best option would be to use an FTP library that comes with built-in proxy support.
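
As a minimal sketch of how a script could pick these variables up, assuming Python 3 (where the urllib2 functionality lives in `urllib.request`): `getproxies()` collects the `<scheme>_proxy` environment variables into a dict, and a `ProxyHandler` built from that dict can be passed to `build_opener`. The values are the ones from my setup above, and no request is actually sent here.

```python
import os
import urllib.request

# Example values from the setup above; "proxy" is just that setup's host name.
os.environ["http_proxy"] = "http://proxy:8080/"
os.environ["ftp_proxy"] = "http://proxy:8080/"

# getproxies() maps scheme names to proxy URLs found in the environment.
proxies = urllib.request.getproxies()
print(proxies.get("http"), proxies.get("ftp"))

# An opener built with an explicit ProxyHandler uses those proxies.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
# opener.open(url) would now go via the proxy; nothing is fetched in this sketch.
```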


cu
Ford Prefect

#20 2006-03-05 17:42:49

stonecrest
Member
From: Boulder
Registered: 2005-01-22
Posts: 1,190

Re: script to find fastest repository

I'd just like to point out the obvious contradiction that occurs from a script like this. As more people use it and switch to the faster mirrors, these mirrors will become slower. Just playing devil's advocate :) Very nice script.


#21 2006-03-05 18:36:31

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: script to find fastest repository

stonecrest wrote:

I'd just like to point out the obvious contradiction that occurs from a script like this. As more people use it and switch to the faster mirrors, these mirrors will become slower. Just playing devil's advocate :) Very nice script.

Not strictly true -- the download speed from the mirror depends on your location and other factors.

Dusty

#22 2006-03-06 07:19:50

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

After digging through the Python docs, it looks like proxy support is easily doable with urllib2. New version soon :)

#23 2006-03-06 13:22:20

oliv
Member
Registered: 2005-04-17
Posts: 58

Re: script to find fastest repository

I also sometimes have problems with outdated mirrors. Sometimes I get a kernel update but the modules aren't updated yet, or a dependency that won't resolve because a package doesn't exist yet.

Is there a way to find the last update date? Maybe with the package db?

EDIT: I just mean checking that the repository isn't too old (< 3 days?). Maybe you can just check the last modification date with "ls -l current.db.tar.gz". But can we assume that packages are updated every day?
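
One possible way to get that date over FTP without parsing "ls -l" output is the MDTM command (RFC 3659), which not every server supports. A successful reply looks like `213 YYYYMMDDHHMMSS` in UTC. The sketch below, in Python 3 syntax, only parses such a reply and computes the age; `parse_mdtm` and `age_in_days` are hypothetical helper names, and the reply string is a made-up sample.

```python
from datetime import datetime, timezone


def parse_mdtm(reply):
    """Parse an FTP MDTM reply such as '213 20060304101319' (UTC)."""
    code, _, stamp = reply.strip().partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: %r" % reply)
    # Ignore optional fractional seconds after the 14-digit timestamp.
    parsed = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def age_in_days(mtime, now):
    """Return the time between mtime and now as a number of days."""
    return (now - mtime).total_seconds() / 86400.0


mtime = parse_mdtm("213 20060304101319")  # made-up sample reply
now = datetime(2006, 3, 6, 10, 13, 19, tzinfo=timezone.utc)
print(age_in_days(mtime, now))
```

With ftplib, the reply string would come from something like `ftp.sendcmd('MDTM current.db.tar.gz')` after changing into the repository directory.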

#24 2006-03-06 14:46:05

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

I don't know how to do it. Pacman experts to the rescue?

#25 2006-03-06 17:22:05

drakosha
Member
Registered: 2006-01-03
Posts: 253
Website

Re: script to find fastest repository

Changelog:
* added command line arguments
* should honor http/ftp proxies defined via env variables

Make sure it's 1.2 that you use - I updated this post at least twice!

Enjoy :D

#! /usr/bin/python
# ver 1.2
import urllib2
import sys
import time
from optparse import OptionParser

def createOptParser():
    parser = OptionParser()
    parser.add_option("-s", "--server-number", default=5,
                dest="server_number",
                help="amount of servers to print, 0 for all")
    parser.add_option("-v", "--verbose",
                  action="store_true", dest="verbose", default=False,
                  help="be verbose")
    return parser

def timeCmd(cmd):
    before = time.time();
    try:
        cmd();
    except KeyboardInterrupt, ki:
        raise ki
    except Exception, e:
        print '\tERROR: ', e
        return 99999999
    return time.time() - before;

def talkToServer(serverUrl):
    opener = urllib2.build_opener()
    tmp = opener.open(serverUrl).read()

def getFuncToTime(serverUrl):
    return lambda : talkToServer(serverUrl)

def cmpPairBySecond(p1, p2):
    if p1[1] == p2[1]: return 0
    if p1[1] < p2[1]: return -1
    return 1

if __name__ == "__main__":
    parser = createOptParser()
    (options, args) = parser.parse_args()

    if len(args) != 1:
        parser.print_help()
        sys.exit(0)
    
    fl = open(args[0], 'r')
    serverToTime = {}
    print 'Querying servers, it might take some time '
    for ln in fl.readlines():
        splitted = ln.split('=')
        if splitted[0].strip() != 'Server': continue

        serverUrl = splitted[1]
        if serverUrl[-1] == '\n': serverUrl = serverUrl[0:-1]
        if not options.verbose: print '*',
        else:            print serverUrl, '...',
        #sys.stdout.flush()
        serverToTime[serverUrl] = timeCmd(getFuncToTime(serverUrl))
        if options.verbose: print '\t',serverToTime[serverUrl]

    items = serverToTime.items()
    items.sort(cmpPairBySecond)
    numberOfItemsToShow = int(options.server_number)
    if numberOfItemsToShow == 0: numberOfItemsToShow = len(items)
    if len(items) > 0:
        if not options.verbose: print
        print '======================================'
        print 'Servers sorted by time'
        print '======================================'
        for i in items[0:numberOfItemsToShow]:
            print i[0], ': ', i[1]
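
The `print` statements and `except Exception, e` clauses above are Python 2 syntax. For anyone on Python 3, the timing and ranking core could look roughly like the sketch below. It is not a port of the whole script: `time_call` and `fastest` are hypothetical names, the example.org URLs are made up, and `time.sleep` stands in for the actual server query.

```python
import time


def time_call(func):
    """Return how long func() takes, or infinity if it raises."""
    start = time.perf_counter()
    try:
        func()
    except Exception as exc:  # KeyboardInterrupt is not an Exception subclass
        print("\tERROR:", exc)
        return float("inf")
    return time.perf_counter() - start


def fastest(timings, count=5):
    """Sort (url, seconds) pairs by seconds; count=0 means all."""
    ranked = sorted(timings.items(), key=lambda item: item[1])
    return ranked if count == 0 else ranked[:count]


def broken():
    """Simulate a server that rejects the login."""
    raise IOError("530 Login incorrect.")


timings = {
    "ftp://a.example.org/": time_call(lambda: time.sleep(0.02)),
    "ftp://b.example.org/": time_call(lambda: None),
    "ftp://c.example.org/": time_call(broken),
}
for url, seconds in fastest(timings, count=2):
    print(url, ":", seconds)
```

Sorting with `key=` replaces the `cmpPairBySecond` comparison function, which `list.sort` no longer accepts in Python 3.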
