Pages: 1
I am looking at collecting data from Yahoo. I want to get data from a page that changes from time to time. For example, on this page I would like data like the following:
15.00 CAC.X 4.00 Down 1.32 3.95 4.10 196 31,084
15.00 CAC.X 4.00 Down 1.32 4.00 4.10 196 31,084
15.00 CAC.X 4.00 Down 1.32 4.00 4.10 196 31,084
15.00 CAC.X 4.00 Down 1.32 4.00 4.15 196 31,084
How can I do this? I know Java and Perl and some PHP. Could anyone give me a jump start and tell me which approach I should take, i.e. what program or scripting language? I am not too sure how to tackle this problem.
Offline
The links browser has a -dump option which may be of use to you, but are you sure they don't offer a feed for this information, so that scripts don't waste their bandwidth?
archlinux - please read this and this — twice — then ask questions.
--
http://rsontech.net | http://github.com/rson
Offline
1. Get the html source
2. Write something that can interpret the source
3. Wrap it around a while and add a timer.
Step 2 is the problem, but it's not as hard as you think. If you look at the source, the data you want sits outside of the tags, except for Up/Down, which is inside a tag.
So... even regex can do it. Here is a simple script. It isn't optimized, which makes it easier to read.
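while true; do
    # Fetch the page, put each table row on its own line (\n in the replacement needs GNU sed), keep only the CAC.X rows.
    # Then turn the down_r/up_g arrow markup into DOWN/UP, strip all remaining tags, and squeeze runs of spaces.
    wget -q -O - 'http://finance.yahoo.com/q/op?s=C&m=2009-01' | sed -n '/<tr>/s/<tr>/\n/gp' | grep -F 'CAC.X' | sed 's/down_r[^>]*>/>DOWN/' | sed 's/up_g[^>]*>/>UP/' | sed 's/<[^>]*>/ /g' | sed 's/^ *//' | sed 's/  */ /g'
    sleep 60
done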
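To make the stages of that pipeline easier to follow, here is a rough Python sketch of the same substitutions, run on a made-up table row instead of the live page. The sample markup and the names clean_row and extract_rows are assumptions for illustration only; the real Yahoo page will look different.

```python
import re

# Hypothetical sample markup, NOT copied from the real page.
SAMPLE_PAGE = (
    '<table><tr><th>Strike</th><th>Symbol</th></tr>'
    '<tr><td>15.00</td><td><a href="/q?s=CAC.X">CAC.X</a></td>'
    '<td><b>4.00</b></td>'
    '<td><img src="down_r.gif" alt="Down"> <b>1.32</b></td>'
    '<td>3.95</td><td>4.10</td><td>196</td><td>31,084</td></tr></table>'
)


def clean_row(row_html):
    """Apply roughly the same substitutions as the sed stages to one row."""
    # Cut the tag short at down_r/up_g and leave a marker word behind it.
    text = re.sub(r'down_r[^>]*>', '>DOWN', row_html, count=1)
    text = re.sub(r'up_g[^>]*>', '>UP', text, count=1)
    # Replace every tag (including the shortened one) with a space.
    text = re.sub(r'<[^>]*>', ' ', text)
    # Squeeze runs of spaces and trim both ends.
    return re.sub(r' +', ' ', text).strip()


def extract_rows(page_html, symbol):
    """Split the page at each <tr> and clean the rows that mention symbol."""
    return [clean_row(row) for row in page_html.split('<tr>') if symbol in row]


if __name__ == "__main__":
    for line in extract_rows(SAMPLE_PAGE, "CAC.X"):
        print(line)  # 15.00 CAC.X 4.00 DOWN 1.32 3.95 4.10 196 31,084
```

Like the sed version, this relies on the markup staying exactly as it is, so it will break as soon as the page layout changes.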
Offline
Personally, I would write a little Python script that uses urllib2 and BeautifulSoup to parse it, find the info, and then transform it to your needs.
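A minimal sketch of that idea, with two swaps: it is written for Python 3, so urllib.request stands in for urllib2, and the standard library's html.parser stands in for BeautifulSoup to avoid an extra dependency. The class and function names are hypothetical, and it assumes the Up/Down arrow is an image with an alt text, which may not match the real page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RowCollector(HTMLParser):
    """Collect every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the current row, None outside a row
        self._cell = None  # text pieces of the current cell, None outside a cell

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Join the pieces and normalise whitespace inside the cell.
            self._row.append(" ".join(" ".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []
        elif tag == "img" and self._cell is not None:
            # Assumption: the arrow image carries "Up"/"Down" as alt text.
            alt = dict(attrs).get("alt")
            if alt:
                self._cell.append(alt)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_for(html, symbol):
    """Return the rows that have a cell exactly equal to symbol."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if symbol in row]


def fetch(url):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like rows_for(fetch(url), "CAC.X") would then give a list of cell lists that can be printed or stored however you need. Nested tables are not handled here; BeautifulSoup copes with messy markup much better.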
Offline
Thanks Procyon, your script works perfectly.
Offline
Personally, I would write a little python script that uses urllib2 and BeautifulSoup to parse it, find the info, and then transform it to your needs
I prefer to use Python for text manipulation, too. Even though it may not be the fastest way if you're trying to process a very large amount of data, the code is much cleaner.
-- jwc
http://jwcxz.com/ | blog
dotman - manage your dotfiles across multiple environments
icsy - an alarm for powernappers
Offline
I use w3m for everything like this, so I'll mention it even though probably nobody cares.
w3m 'http://finance.yahoo.com/q/op?s=C&m=2009-01' > stuff.txt
then clean the stuff you don't want out of stuff.txt
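The cleaning step could be sketched in Python like this, assuming each option ends up on one line of stuff.txt with its fields separated by whitespace. How w3m actually lays out the table and the arrow image is not checked here, and pick_lines and pick_from_file are made-up names.

```python
def pick_lines(lines, symbol):
    """Return the whitespace-split fields of every line that contains symbol."""
    picked = []
    for line in lines:
        fields = line.split()
        if symbol in fields:
            picked.append(fields)
    return picked


def pick_from_file(path, symbol):
    """Apply pick_lines to a text dump saved on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return pick_lines(handle, symbol)


# Usage, once stuff.txt exists:
#     for fields in pick_from_file("stuff.txt", "CAC.X"):
#         print(" ".join(fields))
```

Matching on whole fields rather than substrings keeps a symbol like CAC.X from also matching longer symbols that merely contain it.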
Offline
I use w3m for everything like this, so I'll mention it even though probably nobody cares.
w3m 'http://finance.yahoo.com/q/op?s=C&m=2009-01' > stuff.txt
then clean the stuff you don't want out of stuff.txt
I like it!
Offline
This is a linux forum. I strongly doubt that your ads, for Windows software, will be appreciated here.
Offline
This is a linux forum. I strongly doubt that your ads, for Windows software, will be appreciated here.
Good call. If you see garbage like this in the future, please report it. Banning IP addresses like that one is a pleasure.
Offline