I'm having an issue extracting HTML using Selenium's page_source attribute. When I have it as this:

sur = open('selog.txt', 'w')
muhPage = driver.page_source
print muhPage
#encode('ascii', 'ignore')
sur.write(muhPage)

it produces this error:
Traceback (most recent call last):
  File "./seltest.py", line 33, in <module>
    sur.write(muhPage)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 852-854: ordinal not in range(128)

So, I've tried modifying it to this: sur.write(muhPage.encode('ascii', 'ignore')). However, when I do this, it doesn't always extract all the information from the page; sometimes it falls short by a large chunk at the beginning. Not sure if this is an error on my end or the server's.
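As an aside: rather than silently dropping the non-ASCII characters with encode('ascii', 'ignore'), the file itself can be opened with a UTF-8 encoding so the whole page survives. A minimal sketch using the stdlib io module (the page string here is just a stand-in for driver.page_source):

```python
# -*- coding: utf-8 -*-
import io

# Stand-in for driver.page_source; the accented characters are the
# kind that trip the default ASCII codec in sur.write() on Python 2.
page = u"<html><body>caf\u00e9 r\u00e9sum\u00e9</body></html>"

# io.open() accepts encoding= on both Python 2 and 3, so the unicode
# string is encoded to UTF-8 on write instead of raising
# UnicodeEncodeError (or losing characters to 'ignore').
with io.open('selog.txt', 'w', encoding='utf-8') as sur:
    sur.write(page)
```

Reading the file back with the same encoding returns the page text unchanged, non-ASCII characters included.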
Last edited by apolyonn (2014-03-18 18:20:38)
Specifically, I'm only getting the <head>...</head> information. After one (sometimes two) successful run(s), in which all page information displays, it will do this. Does this have something to do with cache/cookies?
Never mind, I figured it out. Apparently, the JavaScript hadn't finished running before my script fetched page_source. So, I added a time.sleep() call:
import time
[...]
sur = open('selog.txt', 'w')
[...]
inputSearch.click()
time.sleep(4)
muhPage=driver.page_source
print muhPage
sur.write(muhPage.encode('ascii', 'ignore'))

I'm going to curl up in the fetal position now and pretend that JavaScript doesn't exist.
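A fixed time.sleep(4) either wastes time or still races the page; Selenium's own fix for this is an explicit wait (WebDriverWait with an expected_conditions check), which polls a condition instead of sleeping blindly. The underlying idea is just a poll loop, sketched here in plain Python; wait_until and page_ready are illustrative stand-ins, not Selenium API:

```python
import time

def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll predicate() until it returns truthy or timeout expires.

    This mirrors what Selenium's WebDriverWait does internally:
    re-check the condition repeatedly rather than sleeping once
    for a fixed, guessed duration.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    raise RuntimeError("condition not met within %.1fs" % timeout)

# Example: a condition that only becomes true after a few polls,
# standing in for "the JS-rendered results are in the DOM".
state = {'calls': 0}
def page_ready():
    state['calls'] += 1
    return state['calls'] >= 3

wait_until(page_ready, timeout=5.0, poll=0.01)
```

With real Selenium the equivalent would be something like WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located(...)), which returns as soon as the element appears instead of always paying the full four seconds.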
Solved.
Last edited by apolyonn (2014-03-18 18:15:20)