#1 2009-05-23 13:38:36

DarkLikeHell
Member
Registered: 2009-02-07
Posts: 71

[Solved] python and urllib2

Hello everyone
I'm trying to write a script in Python that is supposed to download the content (HTML) of web pages.
I'm using this to download one page:

data=urllib2.urlopen(url)

and it works well for one page.
But what should I do if I want to receive about 200 pages?
This code will be too slow because every call opens a new connection all over again.
Is there a faster way in Python to receive lots of pages?
Thanks, and sorry for my English :)

Last edited by DarkLikeHell (2009-05-23 22:53:41)

#2 2009-05-23 14:05:59

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153

Re: [Solved] python and urllib2

take a look into sockets

#3 2009-05-23 15:49:27

DarkLikeHell
Member
Registered: 2009-02-07
Posts: 71

Re: [Solved] python and urllib2

Thanks for the reply.
How can I use it for downloading pages?
The server does not send pages. It just hosts them...

Last edited by DarkLikeHell (2009-05-23 15:50:33)

#4 2009-05-23 18:30:57

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153

Re: [Solved] python and urllib2

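#!/usr/bin/env python

import socket

host = 'example.com'
port = 80

sck = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sck.connect((host, port))

request = '/index.html'
# sendall keeps sending until the whole request has gone out
sck.sendall("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (request, host))

# one recv may return only part of the reply, so keep reading
# until the server closes the connection (recv returns '')
chunks = []
while True:
    chunk = sck.recv(2048)
    if not chunk:
        break
    chunks.append(chunk)
sck.close()
html = ''.join(chunks)

# the http headers end with \r\n\r\n, you can skip them if you want
header_end = html.find('\r\n\r\n') + 4
html = html[header_end:]

print html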

you might wanna read up on sockets, and handle the errors, but i think that shows the basics
1. get a socket
2. connect it to a host
3. send a request
4. recv a reply
5. goto 4 until you're done
6. close the socket

if urllib worked, then this will work too; it does the same thing underneath
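
as for the 200 pages: an HTTP/1.0 request like the one above still opens one connection per page. if the pages live on the same host, one option is to let the standard library keep a single connection open and send every request over it. here is a minimal sketch, assuming Python 3 (where the old httplib module is named http.client); the function name fetch_pages and its parameters are made up for this example, and whether the connection really stays open depends on the server

```python
import http.client


def fetch_pages(host, paths, port=80, timeout=10):
    """Fetch several paths from one host, reusing a single connection.

    fetch_pages is a hypothetical helper written for this example.
    Returns a dict mapping each path to a (status, body_bytes) tuple.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    pages = {}
    try:
        for path in paths:
            # http.client speaks HTTP/1.1, so the connection stays open
            # between requests when the server allows it
            conn.request("GET", path)
            response = conn.getresponse()
            # the body has to be read completely before the next
            # request can be sent on the same connection
            pages[path] = (response.status, response.read())
    finally:
        conn.close()
    return pages
```

you would call it with something like fetch_pages('example.com', ['/index.html', '/about.html']), where the second path is only a placeholder. if the server closes the connection after a reply, http.client opens a new one on the next request, so the loop should still work, just without the speedup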

#5 2009-05-23 22:53:21

DarkLikeHell
Member
Registered: 2009-02-07
Posts: 71

Re: [Solved] python and urllib2

Cool, thanks :)
Solved.
