
#1 2008-03-31 14:29:42

Rasi
Member
From: Germany
Registered: 2007-08-14
Posts: 1,914
Website

Getting tracklist from freedb?

Hey, I am looking for an easy way to get a text file with the contents of a freedb entry.

Right now I have a working sed script that I can feed a given freedb URL. It strips the page down until only the track titles remain and saves the result to a file.

So far so good.
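
In Python, that stripping step might look roughly like the sketch below. It assumes the URL hands back a raw CDDB-style entry, where the track names sit on TTITLEn= lines; if the page is HTML instead, the pattern would need adjusting.

#!/usr/bin/env python
# sketch only: assumes the fetched page is a raw CDDB-style entry,
# with track names on lines like "TTITLE0=Some Track"
import sys, re, urllib

page = urllib.urlopen(sys.argv[1]).read()
for num, title in re.findall(r'^TTITLE(\d+)=(.*)$', page, re.MULTILINE):
    print '%d. %s' % (int(num) + 1, title.strip())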
But now I want to extend it to handle search terms. I already checked, and the search URL looks like this:

http://www.freedb.org/freedb_search.php?words=<searchwords>&allfields=NO&fields=artist&fields=title&allcats=YES&grouping=none

This means I can easily replace <searchwords> with $1 $2 $3 and so on...
The only thing left is to get the right URL out of the results page, because it of course returns many. The URLs I am talking about all appear right after the text "Disc ID". Any ideas?
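
One note on building that URL: joining $1 $2 $3 with + only works for plain words. If a search term ever contains spaces or characters like &, running the arguments through a URL-encoder is safer. A quick sketch in Python (the parameter list is just copied from the search URL above):

import sys, urllib

# quote_plus escapes special characters and turns spaces into '+'
query = urllib.quote_plus(' '.join(sys.argv[1:]))
url = ('http://www.freedb.org/freedb_search.php?words=%s'
       '&allfields=NO&fields=artist&fields=title&allcats=YES&grouping=none' % query)
print url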


Also, if someone happens to have a working perl/python/whatever script that does just this... you are more than welcome :)

Last edited by Rasi (2008-03-31 14:45:42)


He hoped and prayed that there wasn't an afterlife. Then he realized there was a contradiction involved here and merely hoped that there wasn't an afterlife.

Douglas Adams

Offline

#2 2008-04-01 16:23:08

kumico
Member
Registered: 2007-09-28
Posts: 224
Website

Re: Getting tracklist from freedb?

#!/usr/bin/env python
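# search freedb for the words given on the command line and print the
# Disc-ID link of every hit, across all result pages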

import sys, re, urllib

# each hit's link follows the text "Disc-ID" in the search-result HTML
rx_ids = re.compile('Disc-ID.*?<a href="(.*?)"', re.IGNORECASE | re.MULTILINE)
# search words come straight from the command line, joined with '+' as in the URL
query = '+'.join(sys.argv[1:])

def get_url(url):
    # fetch a page and return its contents; bail out on network errors
    try:
        sck = urllib.urlopen(url)
        contents = sck.read()
        sck.close()
        return contents
    except IOError:
        print >> sys.stderr, 'Error Retrieving Page: %s' % url
        sys.exit(1)

def get_ids(page_num, page=None):
    # same search parameters as the initial request, plus the page number
    page_url = ('http://www.freedb.org/freedb_search.php?words=%s&allfields=NO'
                '&fields=artist&fields=title&allcats=YES&grouping=none&page=%d'
                % (query, page_num))
    if page is None:
        page = get_url(page_url)
    if page:
        # print every Disc-ID link found on this result page
        for url in rx_ids.findall(page):
            print url




# fetch the first results page to find out how many pages there are in total
page = get_url('http://www.freedb.org/freedb_search.php?words=%s&allfields=NO'
               '&fields=artist&fields=title&allcats=YES&grouping=none' % query)

# the results page reports something like "N result(s) found displayed on M page(s)"
count = re.search(r'(\d+) result\(s\) found displayed on (\d+) page\(s\)', page)

if count:
    num_results = int(count.group(1))
    num_pages = int(count.group(2))
else:
    print >> sys.stderr, 'No results found.'
    sys.exit(0)

print '%d Result(s) found on %d page(s)' % (num_results, num_pages)

# page 1 has already been fetched above; walk the remaining result pages
get_ids(1, page)
i = 2
while i <= num_pages:
    sys.stdout.write('Retrieving Page %d ... %d%%\r' % (i, 100.0 * i / num_pages))
    sys.stdout.flush()
    try:
        get_ids(i)
    except KeyboardInterrupt:
        # blank out the progress line before exiting
        print ' ' * 36
        sys.exit(0)
    i += 1

Example run: cdids.py korn issues

You might need to improve on it; I just whipped it together just now.

It prints the matching URLs to stdout and shows some progress along the way.

Offline
