Hello,
I would like to create a small script to parse http://jt.france2.fr/20h/ in order to extract the latest News program available:
i.e.:
mms://a988.v101995.c10199.e.vm.akamaistream.net/7/988/10199/3f97c7e6/ftvigrp.download.akamai.com/10199/cappuccino/production/publication/France_2/Autre/2009/S23/40692_HD_20h_20090602.wmv
etc.
I am not sure how to do that easily (perl, python, bash/awk, or ...?)
Thanks in advance for your help!
Ludo
Offline
i use `lynx -dump $url | grep $searchstring` in many a bash script for this sort of thing (curl or wget could also be used). you could also replace grep with awk if you want to isolate the download_url, send it to `wget -O $filename $download_url`, then call vlc $filename or something... this should get you started. i see some manpages in your future.
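something along these lines, maybe (totally untested, and the grep pattern is just a guess at what the page dump looks like):

#!/bin/bash
# rough sketch: dump the page, grab the first mms link, hand it straight to vlc
# (vlc can open the mms:// stream directly)
url="http://jt.france2.fr/20h/"
download_url=$(lynx -dump "$url" | grep -o 'mms://[^ ]*\.wmv' | head -n 1)
vlc "$download_url"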
good luck
Last edited by brisbin33 (2009-06-02 20:30:05)
//github/
Offline
What about feeding the page to
grep -o 'mms://[^"]\+'
?
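e.g. something like this (untested, and assuming the link actually appears in that page's html):

wget -q -O - "http://jt.france2.fr/20h/" | grep -o 'mms://[^"]\+'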
Offline
It's actually straightforward, but I had to struggle a bit, because the url you gave is wrong. The actual player page can be found at http://jt.france2.fr/player/20h/index-fr.php - this is embedded in an iframe on the front page.
Now, to get the url from this player page with bash and wget, do
$ wget "http://jt.france2.fr/player/20h/index-fr.php" -O - 2>/dev/null | grep -o -E 'mms://.*\.wmv'
mms://a988.v101995.c10199.e.vm.akamaistream.net/7/988/10199/3f97c7e6/ftvigrp.download.akamai.com/10199/cappuccino/production/publication/France_2/Autre/2009/S23/40692_HD_20h_20090602.wmv
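From there you could stuff the url into a variable and hand it to a player, something like (just a sketch, use whatever player you prefer):

url=$(wget "http://jt.france2.fr/player/20h/index-fr.php" -O - 2>/dev/null | grep -o -E 'mms://.*\.wmv')
mplayer "$url"    # or: vlc "$url"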
Offline
Thank you for all the answers and thanks a lot gnud!
Offline
Wow. Slick bash scripting is cool. Hard to read (for me!) though.
Another option is to grab Ruby (or Python, or any other higher-level language you want to learn...) and use a library for parsing HTML. For example, in Ruby you could check out Hpricot or Nokogiri.
Not quite as slick an option, but you'd get to poke around in another language, and if you ever wanted to extend it, using a framework will make your life a lot easier.
You can't really go wrong. Solving problems in an automated fashion with good tools is certainly a joy.
Offline
In this case, Bash is actually (IMHO) simpler.
In two distinct commands, you (1) fetch the html and (2) extract the url.
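Spelled out (using the same url as gnud's example), that would be something like:

wget -q "http://jt.france2.fr/player/20h/index-fr.php" -O page.html
grep -o -E 'mms://.*\.wmv' page.html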
Offline
Hey, this could prove useful to me too...
I want to expand my address book, so I requested a list of my colleagues - that request was denied for privacy reasons. Yet while they deny my request, all the information can be found on a website without any security measures.
So I'm thinking of writing a spider to crawl every http://url/path?id=[1-5000] and save the results in a text file I can parse with PHP (since that's the only language I have experience with) - I might try bash if it is as simple as this...
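Something like this, maybe (just a rough sketch - the url is the placeholder from above, and the sleep is only there so I don't hammer the server):

#!/bin/bash
# fetch every id from 1 to 5000 and save each page to its own file
for id in $(seq 1 5000); do
    wget -q "http://url/path?id=${id}" -O "page_${id}.html"
    sleep 1
done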
Offline