#1 2012-12-20 23:24:52

graysky
Member
From: /run/user/1000
Registered: 2008-12-01
Posts: 8,336

Need to extract only decimal numbers for a glob of text [SOLVED]

If you have a look at /dev/zero's thread here, you'll see that users have been posting the output of his script: numbers with 2 to 5 decimal places.  If I dump the entire thread to a text file, how can I:

1) Delete everything except numbers in the following formats (where 'x' is a digit and '.' is a decimal point)?
2) Format the output with one number per line?

x.xx
x.xxx
x.xxxx
x.xxxxx

xx.xx
xx.xxx
xx.xxxx
xx.xxxxx

I have experimented with some sed expressions but am not getting any traction.  Perhaps you perl or awk ninjas have a good solution?

Here is the source file, generated by copying and pasting that thread into an empty text file: http://pastebin.com/ZkRFhFAr

Last edited by graysky (2012-12-21 11:19:05)


CPU-optimized Linux-ck packages @ Repo-ck  •  AUR packages  •  Zsh and other configs

#2 2012-12-21 00:45:01

graysky
Member
From: /run/user/1000
Registered: 2008-12-01
Posts: 8,336

I'm getting closer but still no cigar:

% grep '[0-9]\.[0-9][0-9]' unixness_thread.txt | sed 's/[^0-9.]*//g'
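
For anyone following along, here is a small sketch of why this pipeline falls short. The function name first_attempt and the sample line are made up for illustration. grep selects whole lines, and sed then deletes every character that is not a digit or a dot, so several numbers on one line run together.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline from the post above; reads stdin.
first_attempt() {
    # grep keeps whole lines containing something like x.xx;
    # sed then strips every character that is not a digit or a dot.
    grep '[0-9]\.[0-9][0-9]' | sed 's/[^0-9.]*//g'
}

# Made-up sample line with two numbers on it.
printf '%s\n' 'laptop 8.73 desktop 12.345' | first_attempt
# prints: 8.7312.345
```

The two values are glued into one token, which is why the output still needs manual cleanup.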

#3 2012-12-21 00:59:47

jasonwryan
Forum & Wiki Admin
From: .nz
Registered: 2009-05-09
Posts: 18,067

This is not really tested, and quite crude...

awk '/^[0-9][0-9]?\.[0-9]+/ && !/[A-Za-z]/ {if (NF>=2) print $1,$2; else if (NF==1) print $1}' numbers.txt

Arch + dwm   •   Mercurial repos  •   Github

Registered Linux User #482438

#4 2012-12-21 01:35:29

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 986

Ooh, a challenge...

* 1 hour later *

#!/usr/bin/perl -an
@ary = grep /^\d{1,2}\.\d{2,5}$/, @F;
print join("\n", @ary) . "\n" if @ary;

Run as ./script.pl unixness_thread.txt

Not perfect: it grabs 10.04, which is an Ubuntu version in context, and 4.10, which is an Xfce version... but not too far off. Stick a "sort -n" on the end, knock off the obvious outliers, and you'll be halfway there.

Edit -- just noticed it skips at least one number in parentheses: (8.73086). Was thinking the default splitting behavior would not be a problem but I was wrong. Not sure how to fix this but I think it can be done with -F.

Edit again -- changing the shebang as follows seems to work:

#!/usr/bin/perl -an -F/[^\d\.]/
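
The same splitting idea can be sketched in plain shell, in case perl is not at hand. This is only a rough equivalent, not Trent's script: the function name split_and_match and the sample lines are assumptions. tr turns every character that is not a digit or a dot into a newline, so "(8.73086)" becomes its own token, and grep -x keeps only tokens that match the whole pattern.

```shell
#!/bin/sh
# Hypothetical helper; reads text on stdin, prints one number per line.
split_and_match() {
    # Replace every character that is not a digit or a dot with a newline...
    tr -c '0-9.' '\n' |
    # ...then keep only tokens that are entirely x.xx through xx.xxxxx.
    grep -E -x '[0-9]{1,2}\.[0-9]{2,5}'
}

# Made-up sample text; LC_ALL=C makes sort treat '.' as the decimal point.
printf '%s\n' 'my run: (8.73086)' 'another one 12.345 and Xfce 4.10' \
    | split_and_match | LC_ALL=C sort -n
# prints:
# 4.10
# 8.73086
# 12.345
```

Because dots are kept inside tokens, a number followed by a sentence-ending period (for example "8.73.") becomes the token "8.73." and is dropped, and version strings such as 4.10 still slip through.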

Last edited by Trent (2012-12-21 01:55:08)

#5 2012-12-21 02:34:43

graysky
Member
From: /run/user/1000
Registered: 2008-12-01
Posts: 8,336

Thanks guys.  I ended up using my code and manually cleaning up the output.  I will try both of your proposals tomorrow when I have some time.


#6 2012-12-21 05:54:25

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 216

grep -E -o '\<[0-9]{1,2}\.[0-9]{2,5}\>' file
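
A quick way to see what this pattern accepts and rejects, as a sketch on made-up input (the function name extract_decimals is hypothetical). -o prints each match on its own line, and \< and \> are word-boundary anchors, which are a GNU-style extension, so this assumes a grep that supports them.

```shell
#!/bin/sh
# Hypothetical wrapper around the grep command above; reads stdin.
extract_decimals() {
    # -E: extended regex; -o: print only the matched text, one match per line.
    # \< and \> require a word boundary on each side, so 123.45 and
    # 1.234567 are rejected rather than partially matched.
    grep -E -o '\<[0-9]{1,2}\.[0-9]{2,5}\>'
}

# Made-up sample text, not taken from the real thread.
printf '%s\n' \
    'Result: 8.73 on my laptop (8.73086 on the desktop)' \
    'Running Ubuntu 10.04, score 12.3456, ignore 123.45 and 1.2' \
    | extract_decimals
# prints:
# 8.73
# 8.73086
# 10.04
# 12.3456
```

Version numbers such as 10.04 still match because they have the same shape as a result, so those have to be weeded out by hand or by context.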

#7 2012-12-21 11:18:47

graysky
Member
From: /run/user/1000
Registered: 2008-12-01
Posts: 8,336

@rockin turtle - Wow, that fits the bill. Tks!

