The "How old are you" thread: https://bbs.archlinux.org/viewtopic.php?id=24166
Who can write a script that will parse all 36 pages of this thread and collect only the ages people entered, in a single column?
example:
Ages
21
20
13
43
26
24
Or maybe a bit more sophisticated, two columns, one with the age and another with the post's date.
example:
Age,date
21,2006-07-23
20,2009-07-24
13,2009-07-24
43,2009-07-23
26,2009-09-10
24,2009-12-02
I messed around w/ a for loop using lynx -dump and sed but quickly gave up as this is beyond my puny skills.
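A minimal sketch of the loop over the pages, in Python rather than lynx and sed. The thread URL and the page count of 36 come from the post above; the `&p=N` page parameter is only an assumption about how the forum paginates and should be checked against a real page link before use. `page_urls` is a made-up helper name.

```python
# The base URL and the page count are taken from the post.
# ASSUMPTION: pages are selected with "&p=N"; verify this against the forum.
BASE_URL = "https://bbs.archlinux.org/viewtopic.php?id=24166"
PAGE_COUNT = 36


def page_urls(base_url=BASE_URL, pages=PAGE_COUNT):
    """Return one URL per thread page, numbered from 1."""
    return [f"{base_url}&p={n}" for n in range(1, pages + 1)]


if __name__ == "__main__":
    # Print the list so it can be fed to a downloader of your choice.
    for url in page_urls():
        print(url)
```

Each URL could then be fetched and saved to a file, so the parsing can be retried without hitting the forum again.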
Last edited by graysky (2010-12-11 18:56:02)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Won't work, I think; many people are using funny age notations there.
ᶘ ᵒᴥᵒᶅ
scarney wrote: 42 8)
hmm, the ultimate answer...
i'm on my way to 42, so far i'm about 2/3...
If you can get past this post without the risk of missing an Archer who is less than 10 years old, you can have all the kudos in the world.
Aaaaand:
00010111
This one might hurt your feelings in the long run.
Last edited by Awebb (2010-12-11 19:46:23)
litemotiv wrote: Won't work, I think; many people are using funny age notations there.
Yeah, I noticed. How could I use sed to find a string, then write the contents of the next line to a new file? It seems like you can just look at the raw HTML output and cut out all the junk by keying on the phrase <div class="postmsg"> and printing the next line to a new file. I can then refine that further with sed, and perhaps bring it into a spreadsheet to sort out the rest of the junk by hand.
I have no idea how to use sed to do this...
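For what it's worth, the "find a string, keep the next line" idea can be sketched in Python as below. This assumes, as the post does, that the post body really sits on the line right after the `<div class="postmsg">` line; the function name and the sample lines are made up for illustration.

```python
MARKER = '<div class="postmsg">'  # the phrase named in the post


def lines_after_marker(lines, marker=MARKER):
    """Yield the line that immediately follows each line containing marker."""
    take_next = False
    for line in lines:
        if take_next:
            yield line.strip()
        # Remember whether the *current* line holds the marker,
        # so the next iteration knows to emit its line.
        take_next = marker in line


if __name__ == "__main__":
    # Hypothetical sample standing in for a saved forum page.
    sample = [
        '<div class="postmsg">',
        "<p>21</p>",
        "</div>",
        '<div class="postmsg">',
        "<p>twenty-something</p>",
    ]
    for text in lines_after_marker(sample):
        print(text)
```

In sed itself the same idea is usually written as `sed -n '/<div class="postmsg">/{n;p;}' page.html > out.txt`, where `page.html` and `out.txt` are placeholder file names.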
Last edited by graysky (2010-12-11 19:48:44)
I was considering doing this for a project for statistics class, but with everybody trying to be clever it would be nearly impossible to parse out good data.
However, the thread is only about 36 pages long; it would only take a few hours!
Hofstadter's Law:
It always takes longer than you expect, even when you take into account Hofstadter's Law.
Well, I think it should be easier than one would expect. Take those "binary ages" for example:
Assuming no one is aged 01, 10, 11, 100, ... (none of these should be an Arch user, right?) you can simply check if the number consists of ones and zeros and if so, convert binary to decimal. Voila!
All it takes is someone bored enough to write the code
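A rough sketch of that check, under the same assumption as above: any token made only of ones and zeros is treated as binary, so nobody is allowed to be 10 or 11 in decimal. `parse_age` is a made-up name, and anything that is not a plain digit string is simply rejected.

```python
def parse_age(token):
    """Return the age as an int, or None if token is not a plain digit string.

    Tokens consisting only of 0s and 1s are read as binary (the assumption
    from the post); every other digit string is read as decimal.
    """
    if not (token.isascii() and token.isdigit()):
        return None
    if set(token) <= {"0", "1"}:
        return int(token, 2)   # binary to decimal
    return int(token)          # ordinary decimal age


if __name__ == "__main__":
    for token in ["21", "00010111", "10", "forty-two"]:
        print(token, "->", parse_age(token))
```

With this rule "00010111" comes out as 23, while "10" comes out as 2, which is exactly the price of the assumption.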
It is better to keep your mouth shut and be thought a fool than to open it and remove all doubt. (Mark Twain)
Well, I think it should be easier than one would expect. Take those "binary ages" for example:
Assuming no one is aged 01, 10, 11, 100, ... (none of these should be an Arch user, right?) you can simply check if the number consists of ones and zeros and if so, convert binary to decimal. Voila!
All it takes is someone bored enough to write the code
Sounds like an exceptional waste of time.
It would be faster to hand-parse
Allan-Volunteer on the (topic being discussed) mailing lists. You never get the attention of the people who matter on the forums.
jasonwryan-Installing Arch is a measure of your literacy. Maintaining Arch is a measure of your diligence. Contributing to Arch is a measure of your competence.
Griemak-Bleeding edge, not bleeding flat. Edge denotes falls will occur from time to time. Bring your own parachute.
Working on it, but I'm currently messing around with some parse errors.
Well, I think it should be easier than one would expect. Take those "binary ages" for example:
Assuming no one is aged 01, 10, 11, 100, ... (none of these should be an Arch user, right?) you can simply check if the number consists of ones and zeros and if so, convert binary to decimal. Voila!
All it takes is someone bored enough to write the code
Or, you could check for eight digits of 1 and 0.
Did anyone use hex or octal?
And as an additional challenge, it should grab the poster's username as well and put that in a third column.
litemotiv wrote: Won't work, I think; many people are using funny age notations there.
Yeah, I noticed. How could I use sed to find a string, then write the contents of the next line to a new file? It seems like you can just look at the raw HTML output and cut out all the junk by keying on the phrase <div class="postmsg"> and printing the next line to a new file. I can then refine that further with sed, and perhaps bring it into a spreadsheet to sort out the rest of the junk by hand.
I have no idea how to use sed to do this...
That's a rough approach. You're better off finding a language with a tag soup library to pull out that div's contents (Python, Haskell and PHP all have one).
Even so, the ways in which people describe their ages in that thread are way too varied to have much success (IMO).
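As one possible illustration of pulling out that div's contents, here is a sketch using `html.parser` from the Python standard library instead of a dedicated tag soup library. The class name `PostMsgParser` and the sample markup are made up; only the `postmsg` class comes from the thread, and the real pages may be structured differently.

```python
from html.parser import HTMLParser


class PostMsgParser(HTMLParser):
    """Collect the text inside each <div class="postmsg"> element."""

    def __init__(self):
        super().__init__()
        self.posts = []     # finished post texts
        self._depth = 0     # <div> nesting depth inside the current post
        self._chunks = []   # text pieces of the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1    # a nested div inside a post body
        elif "postmsg" in (dict(attrs).get("class") or "").split():
            self._depth = 1     # entering a post body
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.posts.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


if __name__ == "__main__":
    # Hypothetical markup, not copied from the forum.
    html = (
        '<div class="postmsg"><p>I am 21</p><div><p>nested</p></div></div>'
        '<div class="other">ignored</div>'
    )
    parser = PostMsgParser()
    parser.feed(html)
    parser.close()
    print(parser.posts)
```

Tracking the div depth is what keeps nested divs (quotes, for example) from ending a post early. The extracted text would still need the age-guessing step on top.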
//github/
It's tough enough even with a real brain. Is "0016" supposed to be octal? I can't think of any other good reason for the leading zeros, but it wasn't explicitly stated.
Figuring out binary can't be too hard -- 10 and 11 only make sense in decimal and anything greater only makes sense in binary (omitting centenarians and toddlers). Other devices pose problems.
But yeah, what with everybody on there posting in English, you're better off doing it with a computer that understands English.
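The binary rule from this post can be sketched roughly as follows: "10" and "11" stay decimal, any other run of ones and zeros is read as binary, and something like "0016" is reported as ambiguous rather than guessed at. `classify_age` is a made-up name, and returning None for leading zeros is just one way to flag the octal question for a human.

```python
def classify_age(token):
    """Return (age, base) for a digit string, or None when it is unclear.

    Rules, following the post: "10" and "11" only make sense in decimal;
    any other string of 0s and 1s is read as binary; other digit strings
    with a leading zero (e.g. "0016") are left undecided.
    """
    if not (token.isascii() and token.isdigit()):
        return None                  # not a plain number at all
    if token in ("10", "11"):
        return int(token), 10        # only plausible as decimal
    if set(token) <= {"0", "1"}:
        return int(token, 2), 2      # any other run of 0s and 1s: binary
    if token.startswith("0"):
        return None                  # octal or padded decimal? left to a human
    return int(token), 10


if __name__ == "__main__":
    for token in ["11", "10111", "0016", "24"]:
        print(token, "->", classify_age(token))
```

Here "10111" is read as binary 23, "11" stays 11, and "0016" comes back as None so it can be sorted out by hand.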