Topic closed
I posted this on Stack Overflow but got no help, so I figured I'd try here as well.
I bought an ebook for a class I am in; however, I really hate how the site is laid out, and it makes the book a pain to read. So I'd like to put the pages into some sort of text document or PDF.
Basically, I don't know the best way to go about pulling pages from the site. You have to be logged in to access the book, and I really have no idea how they authenticate (and I don't know if they'd appreciate me trying to mess around with that). So I need to write either a macro for my browser or a script that can control my browser after I am logged in to the site. I'd also like to point out that what I am trying to do is perfectly within the terms of service: I am allowed to print any pages I want, and I am just trying to automate that.
So does anyone have any idea how I can go about this? I am essentially trying to replicate something like wget -m (or something that simply pulls all text/images from a frame whose name I supply), but it needs to be done through my browser so that I stay authenticated throughout the process. Then I can go in and parse the HTML in Python or C or whatever I choose.
Moderator note:
I believe this thread runs afoul of the legality (copyright) provisions detailed in the forum etiquette.
I am closing this thread. If I am not correct, please contact me with information related to your rights to use their material in this manner and I will reopen this thread.
ewaller
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way