sed help / extracting useful bits from a line with sed

graysky · 2011-04-10 17:32:11

I need to parse some log files extracting only the numerical parts of interest. Example line:

x264 [info]: SSIM Mean Y:0.9689320 (15.077db)

Two goals here:
1) Capture the number after the "Y:" --> 0.9689320
2) Capture the number in () without the db --> 15.077

I thought I'd attack it using sed to find anything up to Y: and delete it, then anything after a space and delete it, but find myself unable to do it. Suggestions are welcomed. I can do it with awk/sed combo but want to learn a better way.

awk '{print $5}' | sed 's/Y://'

Last edited by graysky (2011-04-10 17:35:21)

Awebb · 2011-04-10 17:53:44

I'm also in the middle of learning the whole regex thing.

sed 's/^.*Y://'

^ means the beginning of the line.
. means any character.
* means zero or more of the character before *.

So everything up to and including "Y:" is replaced with nothing.

This is the first goal. Analogous to ^ for the beginning of the line, you have $ for the end of the line. And make sure you don't forget about the brackets and what you have to do with them, because ( ... ) has its own meaning in sed. (Speaking in riddles to leave some traces of learning effect ;-)
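A minimal sketch of the two-step idea from the first post, assuming the value after "Y:" never contains a space: one expression deletes everything up to "Y:", a second deletes everything from the first remaining space to the end of the line. The function name extract_y is made up for illustration.

```shell
# Hypothetical helper: keep only the number that follows "Y:".
extract_y() {
    # 1st expression: delete from the start of the line through "Y:".
    # 2nd expression: delete from the first remaining space to the end of the line.
    sed -e 's/^.*Y://' -e 's/ .*$//'
}

echo 'x264 [info]: SSIM Mean Y:0.9689320 (15.077db)' | extract_y
# prints: 0.9689320
```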

EDIT

More stuff:
- 's/a/b/' changes the first occurrence of a to b, while 's/a/b/g' changes every 'a' into 'b'.
- 's/[abc]/d/' changes the first a, b or c into d (add g to change all of them).
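The difference between these forms is easiest to see on a short test word. A quick sketch, using 'banana' as an arbitrary input:

```shell
echo 'banana' | sed 's/a/o/'       # first match only      -> bonana
echo 'banana' | sed 's/a/o/g'      # every match           -> bonono
echo 'banana' | sed 's/[bn]/X/'    # first b or n only     -> Xanana
echo 'banana' | sed 's/[bn]/X/g'   # every b or n          -> XaXaXa
```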

Last edited by Awebb (2011-04-10 18:06:40)

disraptor · 2011-04-10 18:19:46

The first number you can get with

sed 's/.*Y:\([0-9\.]*\).*/\1/'

which basically does what you've described. The entire line is replaced by \1, which is automatically substituted by the content of the first \( and \) pair, i.e. the number.

If you want to get both numbers with a single sed call you could use

sed 's/.*Y:\([0-9\.]*\) (\([0-9\.]*\).*/\1 \2/'
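Since the goal is to parse whole log files, it may help to print only the lines where the substitution succeeded. A sketch, assuming the log also contains lines without an SSIM value (the first sample line below is made up, and ssim_pairs is just an illustrative name): -n suppresses the default output and the p flag prints a line only when the s command matched.

```shell
# Hypothetical helper: print "Y-value db-value" for SSIM lines, nothing for other lines.
ssim_pairs() {
    sed -n 's/.*Y:\([0-9.]*\) (\([0-9.]*\)db).*/\1 \2/p'
}

printf '%s\n' \
    'some other log line' \
    'x264 [info]: SSIM Mean Y:0.9689320 (15.077db)' | ssim_pairs
# prints: 0.9689320 15.077
```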

And with awk you could do something like

awk '{ print substr($5, 3) }'

which is much easier to read.

fsckd · 2011-04-10 18:36:44

what you want to do is express a match for all your lines using a regex, like what you'd use in grep, then use \( \) to capture the sections you want

like s,a \(b\) \(c\),\1 \2,
where in grep you'd do 'a b c'
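To make that concrete, here is a small sketch on the literal test line 'a b c' (an arbitrary input chosen for illustration): the grep pattern matches the whole line, and the same pattern with \( \) added lets sed keep only the captured parts.

```shell
line='a b c'

# The plain pattern, as you would give it to grep: prints the whole line.
echo "$line" | grep 'a b c'

# The same pattern with two capture groups: prints only "b c".
echo "$line" | sed 's,a \(b\) \(c\),\1 \2,'
```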

or just use awk

Last edited by fsckd (2011-04-10 18:39:30)

harryNID · 2011-04-10 21:44:26

@graysky - As others have said, use sed backreferences. Here is another example.

sed 's/^.*:\([[:alnum:]]*\.[[:alnum:]]*\) [^\(]*(\(.*\)db)[^\)]*$/\1 \2/'

Note: disraptor's regex is much easier to read. I'd use it! I just whipped this up before I saw his.

In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | sed 's/^.*:\([[:alnum:]]*\.[[:alnum:]]*\) [^\(]*(\(.*\)db)[^\)]*$/\1 \2/'

Out:
0.9689320 15.077

In awk: (One way of doing it!)

mawk '{gsub(/[Y:()db]/,""); print $5,$6}'

In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | mawk '{gsub(/[Y:()db]/,""); print $5,$6}'

Out:
0.9689320 15.077

This was quick but I hope it gets you on the right path

Edit:
How about just grep?

echo $(grep -o '[0-9][0-9]*\.[0-9]*')

In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | echo $(grep -o '[0-9][0-9]*\.[0-9]*')

Out:
0.9689320 15.077

Cheap, but I'd thought I would add it anyway!

P.S. Here is the full way to extract both numbers using awk's substr function as disraptor mentioned before.

mawk '{print substr($5,3),substr($6,2,6)}'
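The substr version relies on the second number being exactly six characters wide. A sketch of a width-independent variant, assuming the values are always in fields 5 and 6 as in the example line (ssim_awk is a made-up name):

```shell
# Hypothetical helper: strip the decorations from fields 5 and 6 instead of counting characters.
ssim_awk() {
    awk '{
        y = $5; db = $6
        sub(/^Y:/, "", y)          # drop the "Y:" prefix
        gsub(/[()]|db/, "", db)    # drop the parentheses and the "db" unit
        print y, db
    }'
}

echo 'x264 [info]: SSIM Mean Y:0.9689320 (15.077db)' | ssim_awk
# prints: 0.9689320 15.077
```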

Last edited by harryNID (2011-04-11 20:08:31)

Procyon · 2011-04-10 22:50:02

while read -r u u u u y db rest; do
    printf '%s\t%s\n' "${y//[^0-9.]/}" "${db//[^0-9.]/}"
done < FILE.txt

graysky · 2011-04-10 23:11:36

@all - thanks for all the suggestions!

Arch Linux
