I'm having an issue extracting HTML using Selenium's page_source attribute. When I have it as this:

sur = open('selog.txt', 'w')
muhPage = driver.page_source
print muhPage
#encode('ascii', 'ignore')
sur.write(muhPage)

it produces this error:
Traceback (most recent call last):
  File "./seltest.py", line 33, in <module>
    sur.write(muhPage)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 852-854: ordinal not in range(128)

So, I've tried modifying it to this: sur.write(muhPage.encode('ascii', 'ignore')). However, when I do this, it doesn't always extract all the information from the page; sometimes it falls short by a large chunk at the beginning. Not sure if this is an error on my end or the server's.
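As an aside: rather than silently dropping the non-ASCII characters with encode('ascii', 'ignore'), the file itself can be opened with a UTF-8 encoding so the whole page survives. A minimal sketch using the stdlib io module (the page string here is just a stand-in for driver.page_source):

```python
# -*- coding: utf-8 -*-
import io

# Stand-in for driver.page_source; the accented characters are the
# kind that trip the default ASCII codec in sur.write() on Python 2.
page = u"<html><body>caf\u00e9 r\u00e9sum\u00e9</body></html>"

# io.open() accepts encoding= on both Python 2 and 3, so the unicode
# string is encoded to UTF-8 on write instead of raising
# UnicodeEncodeError (or losing characters to 'ignore').
with io.open('selog.txt', 'w', encoding='utf-8') as sur:
    sur.write(page)
```

Reading the file back with the same encoding returns the page text unchanged, non-ASCII characters included.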
Last edited by apolyonn (2014-03-18 18:20:38)
Specifically, I'm only getting the <head>...</head> information. After one (sometimes two) successful run(s), in which all page information displays, it will do this. Does this have something to do with cache/cookies?
Never mind, I figured it out. Apparently, the JavaScript hadn't finished running before my script fetched page_source. So, I added a time.sleep() call:
import time
[...]
sur = open('selog.txt', 'w')
[...]
inputSearch.click()
time.sleep(4)
muhPage=driver.page_source
print muhPage
sur.write(muhPage.encode('ascii', 'ignore'))

I'm going to curl up in the fetal position now and pretend that JavaScript doesn't exist.
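A fixed time.sleep(4) either wastes time or still races the page; Selenium's own fix for this is an explicit wait (WebDriverWait with an expected_conditions check), which polls a condition instead of sleeping blindly. The underlying idea is just a poll loop, sketched here in plain Python; wait_until and page_ready are illustrative stand-ins, not Selenium API:

```python
import time

def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll predicate() until it returns truthy or timeout expires.

    This mirrors what Selenium's WebDriverWait does internally:
    re-check the condition repeatedly rather than sleeping once
    for a fixed, guessed duration.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    raise RuntimeError("condition not met within %.1fs" % timeout)

# Example: a condition that only becomes true after a few polls,
# standing in for "the JS-rendered results are in the DOM".
state = {'calls': 0}
def page_ready():
    state['calls'] += 1
    return state['calls'] >= 3

wait_until(page_ready, timeout=5.0, poll=0.01)
```

With real Selenium the equivalent would be something like WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located(...)), which returns as soon as the element appears instead of always paying the full four seconds.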
Solved.
Last edited by apolyonn (2014-03-18 18:15:20)