#1 2009-09-03 11:12:58

lolilolicon
Member
Registered: 2009-03-05
Posts: 1,722

How to play with javascripts from command line?

This is a problem I've never known how to deal with.

To begin with, take a look at this page: http://jsharer.com/file/1345737.htm

If you click on "[TUTHE@ManYv][D`z][FLCL][05][H264_Vorbis][E4273181].mkv", a small window pops up after a moment, and if you right-click on "点击此处开始下载" ("Click here to start downloading"), you can copy the direct download link.

I previously used a simple bash script that called curl to extract direct download links from this site, since I like doing this from the command line. It was easy because the link used to appear directly in the page source (I had to call curl twice to get it, but that was fine).

But now I can't: it's all JavaScript, and I have no idea how to deal with this kind of thing. I looked at the page source, and all I can tell is that the <div id="download_block"> part and the <div id="lord_content"> line seem to be important.

I want to use my good old command line instead of click, click, click!

Can somebody help me with this, or at least give me some hints? It'd be much appreciated. Thanks!


This silver ladybug at line 28...


#2 2009-09-03 11:21:25

Profjim
Member
From: NYC
Registered: 2008-03-24
Posts: 658

Re: How to play with javascripts from command line?

If you know python, the package community/python-mechanize lets you automate a lot of this kind of thing. You can tell it to go to url so-and-so, simulate a click on the third link matching regexp such-and-such, preserving cookies the original page gave you, and so on.

It has made most of my webpage scraping a hell of a lot easier.

EDIT: but I don't know whether it processes embedded JavaScript; perhaps not.
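
For anyone who wants to see the shape of that workflow, here is a rough sketch of "go to a URL, follow the n-th link matching a regexp, keep the cookies". It uses only the Python standard library rather than mechanize itself, so it is a stand-in and not mechanize's API. The helper names (LinkCollector, nth_matching_link, make_opener, follow) are made up for illustration, and like any plain HTTP client it will not run JavaScript.

```python
import re
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def nth_matching_link(html, pattern, n, base_url=""):
    """Return the n-th (0-based) link whose href matches the regexp.

    Relative links are resolved against base_url when one is given.
    """
    collector = LinkCollector()
    collector.feed(html)
    matches = [h for h in collector.hrefs if re.search(pattern, h)]
    return urljoin(base_url, matches[n])


def make_opener():
    """Build an opener that remembers cookies between requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def follow(opener, url, pattern, n=0):
    """Fetch url, then fetch the n-th link on it that matches pattern.

    Both requests go through the same opener, so cookies set by the
    first page are sent along with the second request.
    """
    html = opener.open(url).read().decode("utf-8", errors="replace")
    target = nth_matching_link(html, pattern, n, base_url=url)
    return opener.open(target).read()
```

Usage would be something like follow(make_opener(), some_url, r"\.mkv$", 2) for "the third link ending in .mkv". This only helps when the link is actually present in the HTML the server sends back.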

Last edited by Profjim (2009-09-03 11:22:37)


#3 2009-09-03 11:31:05

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: How to play with javascripts from command line?

Looks like the page just has an ID. It seems JavaScript looks the ID up in a database on the fly to retrieve the download location, instead of the location being generated into the page by PHP or whatever. Seems really hard to bypass...
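
If that guess is right, one thing to try is making the same lookup request the JavaScript makes, straight from a script. A minimal sketch of the idea follows. The lookup URL and the "id" parameter name are placeholders I made up, not the site's real endpoint; the real ones would have to be found by reading the page's scripts or watching its network traffic. Only the ID extraction is based on the URL posted above.

```python
import re
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint: a placeholder, not the site's real lookup URL.
LOOKUP_URL = "http://example.invalid/lookup"


def extract_file_id(page_url):
    """Pull the numeric ID out of a URL such as .../file/1345737.htm."""
    match = re.search(r"/file/(\d+)\.htm", page_url)
    if match is None:
        raise ValueError("no file ID found in %r" % page_url)
    return match.group(1)


def build_lookup_request(page_url, lookup_url=LOOKUP_URL):
    """Build (but do not send) the request the page's script might make.

    The "id" query parameter is an assumption. The Referer header is set
    to the original page in case the server checks where the request
    came from.
    """
    query = urllib.parse.urlencode({"id": extract_file_id(page_url)})
    return urllib.request.Request(
        lookup_url + "?" + query,
        headers={"Referer": page_url},
    )
```

Once the real endpoint is known, passing the request to urllib.request.urlopen would return whatever the script normally receives, and the download link could be parsed out of that response.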

