
#1 2014-03-18 06:39:02

apolyonn
Member
Registered: 2013-05-21
Posts: 46

[SOLVED] Python .write() and encoding error

I'm having an issue extracting HTML with Selenium's page_source property. When my code looks like this:

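sur = open('selog.txt', 'w')

# driver is a Selenium WebDriver that has already loaded the page
muhPage = driver.page_source  # a unicode string
print muhPage
# tried: muhPage.encode('ascii', 'ignore')
sur.write(muhPage)  # Python 2 implicitly encodes unicode with ASCII here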

It produces this error:

Traceback (most recent call last):
  File "./seltest.py", line 33, in <module>
    sur.write(muhPage)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 852-854: ordinal not in range(128)
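In Python 2, passing a unicode object to the write() method of a file opened with the built-in open() makes Python encode it with the default ASCII codec, which is what raises the UnicodeEncodeError above as soon as the page contains a non-ASCII character. A minimal sketch of one common fix, assuming the goal is to keep every character: open the file with io.open and an explicit UTF-8 encoding. The helper name save_page_source is hypothetical, not part of Selenium, and it expects a text (unicode) string.

```python
import io


def save_page_source(html, path):
    """Write an HTML text string to `path` as UTF-8, keeping non-ASCII characters."""
    # io.open accepts an `encoding` argument in both Python 2.6+ and Python 3,
    # so the text is encoded as UTF-8 instead of with the implicit ASCII codec.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)
```

In the script above this would be called as save_page_source(driver.page_source, 'selog.txt') in place of the open()/write() pair.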

So I tried changing it to "sur.write(muhPage.encode('ascii', 'ignore'))". However, when I do this, it doesn't always extract all the information from the page; sometimes a large chunk near the beginning is missing. I'm not sure whether this is an error on my end or the server's.
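Independently of the missing chunk, the 'ignore' error handler avoids the exception by silently discarding every character outside ASCII, so accented letters, typographic dashes and similar characters disappear from the saved HTML. A small comparison, using a made-up sample string in place of real page_source output:

```python
# Made-up sample text standing in for driver.page_source:
# "cafe" with e-acute, an en dash, and "naive" with i-diaeresis.
page = u"<p>caf\u00e9 \u2013 na\u00efve</p>"

# ASCII with 'ignore': no exception, but the three non-ASCII characters are dropped.
lossy = page.encode("ascii", "ignore")

# UTF-8: every character is kept, encoded as one or more bytes.
lossless = page.encode("utf-8")

print(lossy.decode("ascii"))             # <p>caf  nave</p>
print(lossless.decode("utf-8") == page)  # True
```

Encoding with 'utf-8' instead of 'ascii', 'ignore' also avoids the UnicodeEncodeError, without losing any text.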

Last edited by apolyonn (2014-03-18 18:20:38)


#2 2014-03-18 16:38:22

apolyonn

Re: [SOLVED] Python .write() and encoding error

Specifically, I'm only getting the <head>...</head> content. After one (sometimes two) successful runs in which the whole page is captured, it starts doing this. Does this have something to do with the cache or cookies?


#3 2014-03-18 18:14:51

apolyonn

Re: [SOLVED] Python .write() and encoding error

Never mind, I figured it out. Apparently the JavaScript hadn't finished running before my script fetched page_source, so I added a time.sleep() call:

import time
[...]

sur = open('selog.txt', 'w')
[...]
inputSearch.click()

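# Give the page's JavaScript time to finish before reading the DOM
time.sleep(4)

muhPage = driver.page_source
print muhPage
# 'ignore' drops any non-ASCII characters instead of raising an error
sur.write(muhPage.encode('ascii', 'ignore'))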

I'm going to curl up in the fetal position now and pretend that JavaScript doesn't exist.
Solved.
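A fixed time.sleep(4) works as long as the JavaScript reliably finishes within four seconds, but it wastes time on fast loads and still breaks on slow ones. An alternative is to poll a condition until it holds or a timeout expires. Selenium ships this idea as WebDriverWait(driver, timeout).until(condition) in selenium.webdriver.support.ui, with ready-made conditions in selenium.webdriver.support.expected_conditions. The sketch below is a library-free version of the same polling loop so it runs without a browser; wait_until is a hypothetical helper, not a Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds
    have passed without the condition being met.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

In the script above, the time.sleep(4) line could become something like wait_until(lambda: 'id="results"' in driver.page_source), where the marker string is a hypothetical placeholder for whatever element the page's JavaScript inserts.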

Last edited by apolyonn (2014-03-18 18:15:20)



Powered by FluxBB