Hi, I'm writing a little bash script under Ubuntu, but I'll use it under Arch ^^
I have a problem, and I don't know how to fix it.
$ wget -O boulet.html http://twitter.com/justinbieber
--2010-12-22 12:05:53-- http://twitter.com/justinbieber
Resolving twitter.com... 128.242.240.212, 168.143.162.116, 128.242.245.180
Connecting to twitter.com|128.242.240.212|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52909 (52K) [text/html]
Saving to: `boulet.html'
100%[==============================================================================================================================>] 52,909 57.1K/s in 0.9s
2010-12-22 12:05:54 (57.1 KB/s) - `boulet.html' saved [52909/52909]
Then I open boulet.html (local disk) and the online Twitter page:
Why aren't the two pages the same? boulet.html was downloaded one minute ago, and it lacks a lot of tweets...
And less important, why is the downloaded page (boulet.html) in French when my OS is in English?
Offline
wget's user agent isn't the same as the one Firefox or another browser sends. You can change it; see the wget man page.
And of course, wget only downloads a local copy, so it misses any later updates.
Offline
I tried to put my Firefox user agent in the wget command, but it changed nothing. Of course I know that I will miss updates with the downloaded page, but I downloaded it one minute before checking the online version.
Offline
Why aren't the two pages the same? boulet.html was downloaded one minute ago, and it lacks a lot of tweets...
And less important, why is the downloaded page (boulet.html) in French when my OS is in English?
Content negotiation. With each HTTP request, your browser sends some information the server can use to tailor its response.
For example, you probably configured your browser to prefer the English language, so it adds this header to each request:
Accept-Language: en
The Twitter server reads that information and sends you the English page.
I assume Twitter uses some sort of geolocation which resolves your IP address to a French-speaking country (Quebec?). So when you omit Accept-Language (as happens with a raw tool like wget), you are sent the localized page.
And lastly, it seems to me that localized pages lag behind the original ones.
So, you should try
wget --header="Accept-Language: en" -O boulet.html http://twitter.com/justinbieber
and for the love of god, Justin Bieber???
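If the script ends up fetching several pages, the header can be wrapped in a small function. This is only a sketch: `fetch_lang` and the `WGET` variable are names made up for illustration, and the only wget options used are `--header` and `-O` from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): download a URL while asking
# the server for a given language through the Accept-Language header.
# WGET can be overridden, for example to dry-run the command.
WGET=${WGET:-wget}

fetch_lang() {
    lang=$1   # language tag to request, e.g. "en" or "fr"
    url=$2    # page to download
    out=$3    # file to save the page to
    "$WGET" --header="Accept-Language: $lang" -O "$out" "$url"
}
```

Calling `fetch_lang en http://twitter.com/justinbieber boulet.html` should then behave like the one-liner above.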
Offline
Okay, thanks for your explanations about the headers.
I live in France, but my whole system, including Firefox and wget, is in English (it's a choice). Even with wget --user-agent set to en-US, the downloaded page is in French.
Another fact: Twitter now sends me the right page with all the updates. It did that when I first tried my script, but then, I don't know why, it sent me old pages :S
And I gave the justinbieber page here, but my script doesn't use that one.
I don't listen to that kind of "music" at all, but I needed a Twitter profile with a lot of posts (like a teenager's, ahah)
Offline
carlocci isn't talking about the user agent, but about another HTTP header. Have you tried his line?
I'm in France, with a system in English, and his wget line works great.
If I remove
--header="Accept-Language: en"
I get a French page like you.
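To check which language a saved page came back in without opening it in a browser, something like the following may help. It is a rough sketch that assumes the page declares a `lang="..."` attribute on its `<html>` tag (I have not verified that Twitter's markup does); `page_lang` is a hypothetical name, and `grep -o` is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print the lang="..." value declared on the <html>
# tag of a saved page. Prints nothing if the page declares no such attribute.
page_lang() {
    grep -o '<html[^>]*lang="[^"]*"' "$1" \
        | head -n 1 \
        | sed 's/.*lang="\([^"]*\)"/\1/'
}
```

Running `page_lang boulet.html` after each download (with and without the Accept-Language header) shows whether the header changed the language the server picked.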
Sorry in advance for my poor English...
Offline