Hello everyone
I'm trying to write a script in Python that is supposed to download the content of web pages (HTML).
I'm using this for downloading one page:
import urllib2
data = urllib2.urlopen(url).read()
and it works fine for a single page.
But what should I do if I want to receive about 200 pages?
This approach is too slow for that, because every call opens a new connection from scratch.
Is there a faster way in Python to receive lots of pages?
Thanks, and sorry for my English
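In other words, my loop looks roughly like this (the URL list is made up, just for illustration):

import urllib2

# made-up list of URLs, just for illustration
urls = ['http://example.com/page%d.html' % i for i in xrange(200)]

pages = []
for url in urls:
    # every urlopen() call opens (and tears down) its own connection
    pages.append(urllib2.urlopen(url).read())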
Take a look at sockets.
Thanks for the reply.
How can I use them to download pages?
The server does not send pages. It just hosts them...
#!/usr/bin/env python
import socket

host = 'example.com'
port = 80

# get a socket and connect it to the host
sck = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sck.connect((host, port))

# send the request; sendall() keeps sending until the whole string is out
request = '/index.html'
sck.sendall("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (request, host))

# recv() returns at most 2048 bytes at a time, so loop until the server
# closes the connection (recv returns an empty string)
chunks = []
while True:
    chunk = sck.recv(2048)
    if not chunk:
        break
    chunks.append(chunk)
sck.close()
html = ''.join(chunks)

# the HTTP headers end with \r\n\r\n; skip them to keep just the body
header_end = html.find('\r\n\r\n') + 4
html = html[header_end:]
print html
You might want to read up on sockets and handle the errors, but I think that shows the basics:
1. get a socket
2. connect it to a host
3. send a request
4. recv a reply
5. goto 4 until you're done
6. close the socket
If urllib worked, then this will work; it's doing the same thing underneath.
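If you want to skip reconnecting for every page, one option is to reuse a single connection with httplib (a rough sketch; the host and paths are placeholders, and it assumes the server supports HTTP/1.1 keep-alive):

import httplib

host = 'example.com'                               # placeholder host
paths = ['/page%d.html' % i for i in xrange(200)]  # placeholder paths

conn = httplib.HTTPConnection(host)   # one TCP connection reused for every request
pages = []
for path in paths:
    conn.request('GET', path)         # HTTP/1.1 requests keep the connection alive
    resp = conn.getresponse()
    pages.append(resp.read())         # read the whole body before sending the next request
conn.close()

Another common trick is to run several urllib2.urlopen() calls in parallel threads, which helps when the pages live on different hosts.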
Cool, thanks.
Solved.