#1 2010-10-17 23:52:40

korkadapa
Member
Registered: 2008-08-27
Posts: 32

Help with sed

Hi!

I'm trying to write a script that downloads the latest newscast for me. I download the HTML page with wget, but I need help extracting the video stream URL from it.

First I download the page with wget:

wget svtplay.se/t/102534/aktuellt

The HTML file I just downloaded contains a link to an rtmp:// stream. It looks like this:

simon ~
> cat aktuellt | grep rtmp
            <param name="flashvars" value="dynamicStreams=url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-c-v1.mp4,bitrate:850|url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-b-v1,bitrate:320&amp;background=http://media.svt.se/download/mcc/flash/20101017/1136333-1017AKTUELLT2100-PLAY/1136333-1017AKTUELLT2100-PLAY_start_0.jpg&amp;urlinmail=http://svtplay.se/v/2196528/aktuellt/17_10_21_00&amp;liveStart=&amp;length=841&amp;noemail=true&amp;noembed=true&amp;autostart=true&amp;buffertime=2.0&amp;a=2196528&amp;expression=full&amp;startpos=0&amp;expired=false&amp;statisticsUrl=http://ld.svt.se/svt/svt/s?svt-play.Nyheter.Hela-program.17-10-21%3A00.2196528&amp;client=svt-play&amp;folderStructure=Aktuellt.Hela+program.Hela+program&amp;category=Nyheter&amp;title=17%2F10+21%3A00&amp;broadcastDate=20101017" />
                <param name="flashvars" value="dynamicStreams=url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-c-v1.mp4,bitrate:850|url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-b-v1,bitrate:320&amp;background=http://media.svt.se/download/mcc/flash/20101017/1136333-1017AKTUELLT2100-PLAY/1136333-1017AKTUELLT2100-PLAY_start_0.jpg&amp;urlinmail=http://svtplay.se/v/2196528/aktuellt/17_10_21_00&amp;liveStart=&amp;length=841&amp;noemail=true&amp;noembed=true&amp;autostart=true&amp;buffertime=2.0&amp;a=2196528&amp;expression=full&amp;startpos=0&amp;expired=false&amp;statisticsUrl=http://ld.svt.se/svt/svt/s?svt-play.Nyheter.Hela-program.17-10-21%3A00.2196528&amp;client=svt-play&amp;folderStructure=Aktuellt.Hela+program.Hela+program&amp;category=Nyheter&amp;title=17%2F10+21%3A00&amp;broadcastDate=20101017" />
                        <div class="external-player">L&auml;nk f&ouml;r extern spelare:&nbsp;<a class="external-player" href="rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-c-v1.mp4">Flash (rtmp)</a></div>

I'd like to extract the first rtmp:// link with sed (or awk, or whatever else works), which is rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-c-v1.mp4 in this case. The URL changes every day, so I need something that prints everything from the first occurrence of rtmp:// up to .mp4. I'm guessing it's possible, but I have no idea how to accomplish it.

I'd appreciate some help with this.

Simon.

#2 2010-10-18 00:34:44

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111

Re: Help with sed

sed is the wrong tool to start with. You'll want something that understands HTML to pull out the attribute first, and only then sed to trim the result. I had success using xmllint:

xmllint --html --xpath '//param[@name="flashvars"]/@value' <(curl -s svtplay.se/t/102534/aktuellt) 2>/dev/null | sed -n 's/.*url:\(rtmp:.*\.mp4\).*/\1/p'
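
To see what the sed step does on its own, here is a minimal sketch that runs the same expression against a shortened copy of the flashvars value from the first post. The variable names sample and url are just for illustration. Note that .* is greedy, so the capture runs up to the last .mp4 on the line; that works here only because the second stream URL does not end in .mp4.

```shell
# Shortened copy of the flashvars value quoted in the first post
sample='dynamicStreams=url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-c-v1.mp4,bitrate:850|url:rtmp://fl11.c91005.cdn.qbrick.com/91005/_definst_/kluster/20101017/1136333-1017AKTUELLT2100-PLAY-mp4-b-v1,bitrate:320&amp;length=841'

# -n suppresses sed's default output; the trailing p prints only lines
# where the substitution matched. \( \) captures from "rtmp:" up to the
# last ".mp4" on the line, and \1 replaces the whole line with the capture.
url=$(printf '%s\n' "$sample" | sed -n 's/.*url:\(rtmp:.*\.mp4\).*/\1/p')
printf '%s\n' "$url"
```

If the page layout ever changes so that more than one URL on the line ends in .mp4, the expression would need a tighter pattern.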

#3 2010-10-18 05:47:05

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Help with sed

Using falconindy's line, the whole thing could look like this, for example:

rtmpdump -e -r "$(xmllint --html --xpath '//param[@name="flashvars"]/@value' <(curl -s svtplay.se/t/102534/aktuellt) 2>/dev/null | sed -n 's/.*url:\(rtmp:.*\.mp4\).*/\1/p')" -o "$(date "+%F").mp4"
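
For a script that runs every day, it may be worth splitting the one-liner up so a failed extraction doesn't hand rtmpdump an empty URL. A rough sketch, not a tested downloader: the function names (extract_rtmp_url, fetch_flashvars, download_newscast) and the PAGE_URL variable are made up here, the head -n 1 is an added guard in case more than one line matches, and the page is piped to xmllint on stdin (the "-" argument) instead of using process substitution so it also runs under plain sh.

```shell
#!/bin/sh
# Page address taken from the thread; all names below are illustrative.
PAGE_URL='svtplay.se/t/102534/aktuellt'

# Read text on stdin and print the first rtmp URL ending in .mp4, if any.
extract_rtmp_url() {
    sed -n 's/.*url:\(rtmp:.*\.mp4\).*/\1/p' | head -n 1
}

# Fetch the page and print the value attributes of the flashvars params.
fetch_flashvars() {
    curl -s "$PAGE_URL" |
        xmllint --html --xpath '//param[@name="flashvars"]/@value' - 2>/dev/null
}

# Download today's newscast to YYYY-MM-DD.mp4, or fail with a message.
download_newscast() {
    url=$(fetch_flashvars | extract_rtmp_url)
    if [ -z "$url" ]; then
        echo "no rtmp URL found on $PAGE_URL" >&2
        return 1
    fi
    # -e resumes a partial download, as in the one-liner above
    rtmpdump -e -r "$url" -o "$(date +%F).mp4"
}
```

Nothing runs until download_newscast is called, e.g. as the last line of the script or from a cron job that sources it.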
