Hey, I'm looking for an easy way to get a txt file with the contents of a freedb entry.
Right now I have a working sed script that I can feed a given freedb URL. It strips the page down until only the track titles remain and saves the result to a file.
So far so good.
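The stripping step the sed script does could equally be sketched in Python, assuming the standard CDDB entry format where each track title sits on its own TTITLEn= line (the sample entry below is made up):

```python
import re

def track_titles(entry_text):
    # CDDB/freedb entries list one track per TTITLEn= line
    return re.findall(r'^TTITLE\d+=(.*)$', entry_text, re.MULTILINE)

# Made-up sample in the shape of a freedb database entry
sample = 'DTITLE=Korn / Issues\nTTITLE0=Dead\nTTITLE1=Falling Away from Me\n'
print('\n'.join(track_titles(sample)))
```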
But now I want to extend this so it handles search terms. I already checked, and the URL is this one:
http://www.freedb.org/freedb_search.php?words=<searchwords>&allfields=NO&fields=artist&fields=title&allcats=YES&grouping=none
This means I can easily replace <searchwords> with $1 $2 $3 and so on...
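For instance, joining the script's arguments with '+' gives the words parameter directly; a minimal sketch (the helper name is my own invention):

```python
def build_search_url(words):
    # freedb separates search words with '+' in the query string
    query = '+'.join(words)
    return ('http://www.freedb.org/freedb_search.php?words=%s'
            '&allfields=NO&fields=artist&fields=title'
            '&allcats=YES&grouping=none' % query)

print(build_search_url(['korn', 'issues']))
```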
The only thing left now is to get the right URL out of the resulting page... because it returns, of course, many.
The URLs I am talking about all appear behind the text "Disc ID". Any ideas?
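Presumably a non-greedy regex over the page source would do it; a sketch against a made-up fragment of the results page (the real markup may differ):

```python
import re

# Made-up fragment shaped like a freedb search-result row
html = ('Disc-ID: <a href="http://www.freedb.org/freedb/rock/7c0a1e09">'
        'rock / 7c0a1e09</a>')

# DOTALL so .*? can cross line breaks in the real page source
urls = re.findall(r'Disc-?ID.*?<a href="(.*?)"', html,
                  re.IGNORECASE | re.DOTALL)
print(urls)
```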
Also, if someone happens to have a working perl/python/whatever script that does just this... you are more than welcome.
Last edited by Rasi (2008-03-31 14:45:42)
He hoped and prayed that there wasn't an afterlife. Then he realized there was a contradiction involved here and merely hoped that there wasn't an afterlife.
Douglas Adams
#!/usr/bin/env python
import sys, re, urllib

# DOTALL (rather than MULTILINE) lets the pattern span line breaks
# between the "Disc-ID" label and the link in the page source.
rx_ids = re.compile('Disc-ID.*?<a href="(.*?)"', re.IGNORECASE | re.DOTALL)
query = '+'.join(sys.argv[1:])

def get_url(url):
    try:
        sck = urllib.urlopen(url)
        contents = sck.read()
        sck.close()
        return contents
    except IOError:
        print >> sys.stderr, 'Error retrieving page: %s' % url
        sys.exit(1)

def get_ids(page_num, page=None):
    page_url = ('http://www.freedb.org/freedb_search.php?words=%s&allfields=NO&'
                'allcats=YES&grouping=none&page=%d' % (query, page_num))
    if page is None:
        page = get_url(page_url)
    if page:
        ids = rx_ids.findall(page)
        if ids:
            for url in ids:
                print url

page = get_url('http://www.freedb.org/freedb_search.php?words=%s&allfields=NO'
               '&fields=artist&fields=title&allcats=YES&grouping=none' % query)
count = re.search(r'(\d+) result\(s\) found displayed on (\d+) page\(s\)', page)
if count:
    num_results = int(count.group(1))
    num_pages = int(count.group(2))
else:
    print >> sys.stderr, 'No results found.'
    sys.exit(0)

print '%d result(s) found on %d page(s)' % (num_results, num_pages)
get_ids(1, page)
i = 2
while i <= num_pages:
    sys.stdout.write('Retrieving page %d ... %d%%\r' % (i, 100.0 * i / num_pages))
    sys.stdout.flush()
    try:
        get_ids(i)
    except KeyboardInterrupt:
        print ' ' * 36
        sys.exit(0)
    i += 1
cdids.py korn issues
You might need to improve on it; I just whipped it together.
It prints to stdout and displays some progress.