[Python] Cut a string using RE ?

Boris Bolgradov · 2008-10-09 15:43:38

I'm writing a Python script to get the current temperature for my city. Here's the current state of the script:

import urllib2

addr = "http://weather.weatherbug.com/Bulgaria/Yambol-weather.html"
page = urllib2.urlopen(addr)

for line in page.readlines():
    if line.find('<div id="divTemp" class="wXconditions-temp">') != -1:
        temp = line

print temp

The problem is that I'm getting the string

<div id="divTemp" class="wXconditions-temp">17.0°C</div>

from weatherbug.com, where 17.0°C is the temperature. I'm trying to make the script print just 17.0°C, but I can't figure out how to use a regular expression (RE) to cut it out. Can you give me a hand with this? Thanks!

PS: I've made this in bash before, but I need to do it in Python. Here's how it looks in bash:

$ lynx -dump -hiddenlinks=ignore -nolist http://weather.weatherbug.com/Bulgaria/Yambol-weather.html | grep "C" | head | tail -n1
   17.0°C

BetterLeftUnsaid · 2008-10-09 16:44:44

You could try something like this:

#!/usr/bin/env python
import urllib2
import re

addr = "http://weather.weatherbug.com/Bulgaria/Yambol-weather.html"
page = urllib2.urlopen(addr)

for line in page.readlines():
    match = re.search(r'<div id="divTemp" class="wXconditions-temp">(\d+\.\d?)\°', line)
    if match:
        temp = match.group(1)

print temp

This basically just captures part of the string using parentheses, and then prints out the captured group, which is the temperature. However, it only prints a number (like 17.0); it doesn't show the little degree sign or the C, but that wouldn't be too hard to add in, I suppose. And if there isn't a match, you'll get an error because 'temp' won't be defined.
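
For what it's worth, here is a minimal sketch of those two follow-ups (keeping the unit, and not blowing up when nothing matches), written for Python 3 and run against a hard-coded line instead of the live page. The sample line, the name extract_temp, and the guess that the raw HTML may spell the degree sign as the entity &deg; are assumptions for illustration, not something taken from the real page.

```python
import re

# Hypothetical sample line; the real page may differ.
SAMPLE = '<div id="divTemp" class="wXconditions-temp">17.0&deg;C</div>'

# Group 1: the number. Group 2: the unit letter that follows the degree
# sign, which is accepted either as the entity &deg; or as a literal U+00B0.
PATTERN = re.compile(
    r'<div id="divTemp" class="wXconditions-temp">'
    r'(-?\d+(?:\.\d+)?)(?:&deg;|\xb0)([CF])'
)


def extract_temp(lines):
    """Return (number, unit) from the first matching line, or None."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            return match.group(1), match.group(2)
    return None  # no line matched, so the caller can decide what to do


result = extract_temp([SAMPLE])
if result is None:
    print("temperature not found")
else:
    number, unit = result
    print(number + chr(176) + unit)
```

Returning None instead of leaving a variable unset means the "no match" case is handled explicitly rather than surfacing as a NameError at the print.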

Last edited by BetterLeftUnsaid (2008-10-09 16:46:22)

smoon · 2008-10-09 16:45:27

This should work:

import urllib2, re

addr = "http://weather.weatherbug.com/Bulgaria/Yambol-weather.html"
page = urllib2.urlopen(addr)
regexp = re.compile(r'.*>(\d{2}\.\d).+C<.*')

for line in page.readlines():
    if line.find('<div id="divTemp" class="wXconditions-temp">') != -1:
        m = regexp.match(line)
        print m.groups()[0]
        break

Now try to change the regular expression so you can remove your if line.find...
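
One way that exercise could turn out, as a sketch rather than the definitive answer: put the div's id into the pattern itself and search the whole page text, so no separate line.find() is needed. It is written for Python 3 and runs on a made-up page fragment; SAMPLE_PAGE and find_temp are hypothetical names, and the real markup may differ.

```python
import re

# Hypothetical stand-in for the downloaded page.
SAMPLE_PAGE = """<html>
<body>
<div id="other">12.5&deg;C</div>
<div id="divTemp" class="wXconditions-temp">17.0&deg;C</div>
</body>
</html>"""

# Anchor on the id so other numbers on the page are ignored, then
# capture an optionally negative number with an optional fraction.
regexp = re.compile(r'<div id="divTemp"[^>]*>(-?\d+(?:\.\d+)?)')


def find_temp(html):
    """Return the temperature as a string, or None if it is not found."""
    m = regexp.search(html)
    return m.group(1) if m else None


print(find_temp(SAMPLE_PAGE))
```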
Another way would be to use BeautifulSoup for the parsing.
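
BeautifulSoup is a third-party package, so the sketch below swaps in the standard library's html.parser (Python 3) to show the same idea of walking the markup instead of matching text. It is not BeautifulSoup's API; TempParser and SAMPLE_PAGE are hypothetical, and it assumes no other div is nested inside divTemp.

```python
from html.parser import HTMLParser


class TempParser(HTMLParser):
    """Collect the text inside the div whose id is "divTemp"."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &deg; into characters.
        super().__init__(convert_charrefs=True)
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "divTemp":
            self.inside = True

    def handle_endtag(self, tag):
        # Assumes divTemp contains no nested div (see the note above).
        if tag == "div" and self.inside:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


# Hypothetical stand-in for the downloaded page.
SAMPLE_PAGE = (
    '<html><body>'
    '<div id="divTemp" class="wXconditions-temp">17.0&deg;C</div>'
    '</body></html>'
)

parser = TempParser()
parser.feed(SAMPLE_PAGE)
parser.close()
temperature = "".join(parser.chunks).strip()
print(temperature)
```

The parser does not care which line the div is on or how its attributes are ordered, which is the main thing a real HTML parser buys you over a regular expression.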

Boris Bolgradov · 2008-10-09 16:48:36

Yaaay, I made it! Woohooo. I lost my day making this stupid script and *finally* it's done. Here it is:

#!/usr/bin/env python
import urllib2, re

addr = "http://weather.weatherbug.com/Bulgaria/Yambol-weather.html"
page = urllib2.urlopen(addr)
degree_symbol = unichr(176).encode("latin-1")

for line in page.readlines():
    if line.find('<div id="divTemp" class="wXconditions-temp">') != -1:
        temp = line
        break

temp = re.sub('.*">', '', temp)
temp = re.sub('&deg.*\n', '', temp)
print temp + degree_symbol + "C"

And when I run it:

$ python weather.py 
16.0°C
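
To see what each re.sub call contributes, here is a small Python 3 sketch that applies the same two substitutions to a hard-coded line. The sample line (including the &deg; entity and the trailing newline that readlines() keeps) is an assumption about what the page returns; chr(176) is the Python 3 spelling of unichr(176).

```python
import re

# Hypothetical raw line, with the trailing newline readlines() would keep.
line = '<div id="divTemp" class="wXconditions-temp">16.0&deg;C</div>\n'

# Greedy match: removes everything up to and including the last '">'.
step1 = re.sub('.*">', '', line)

# Removes the entity and the rest of the line, newline included.
step2 = re.sub('&deg.*\n', '', step1)

degree_symbol = chr(176)

print(repr(step1))
print(step2 + degree_symbol + "C")
```

Note that the first substitution relies on '">' not appearing again later on the same line; if it did, the greedy .* would eat more than intended.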

EDIT: Wow! I'm a bit slow with the typing. Thanks for the help guys! I'll check them out.

Last edited by Boris Bolgradov (2008-10-09 16:50:57)

Arch Linux
