I need to parse some log files extracting only the numerical parts of interest. Example line:
x264 [info]: SSIM Mean Y:0.9689320 (15.077db)
Two goals here:
1) Capture the number after the "Y:" --> 0.9689320
2) Capture the number in () without the db --> 15.077
I thought I'd attack it with sed: delete everything up to and including Y:, then delete everything after the next space. I can't get it to work, though, so suggestions are welcome. I can do it with an awk/sed combo, but I want to learn a better way:
awk '{print $5}' | sed 's/Y://'
Last edited by graysky (2011-04-10 17:35:21)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
I'm also in the middle of learning the whole regex thing.
sed 's/^.*Y://'
^ means the beginning of the line.
. means any character.
* means any number of the character before *.
So everything up to and including Y: is deleted.
This is the first step toward the first goal. Analogous to ^ for the beginning of the line, you have $ for the end of the line. And make sure you don't forget about the parentheses and what you have to do with them, because ( ... ) has its own meaning in sed. (*speaking in riddles to leave some traces of learning effect ;-)*)
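For anyone who wants to check their answer to the riddle, here is one possible sketch, not necessarily what was intended above. It assumes every line looks exactly like the example, and the variable names (line, ssim, db) are made up for illustration. In sed's default (basic) regex syntax a plain ( is a literal parenthesis, so it needs no escaping here.

```shell
line='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)'

# Goal 1: delete everything up to and including "Y:",
# then delete everything from the first remaining space to the end.
ssim=$(printf '%s\n' "$line" | sed -e 's/^.*Y://' -e 's/ .*$//')

# Goal 2: delete everything up to and including the literal "(",
# then delete the trailing "db)".
db=$(printf '%s\n' "$line" | sed -e 's/^.*(//' -e 's/db)$//')

printf '%s %s\n' "$ssim" "$db"
```

On the example line this should print 0.9689320 15.077.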
EDIT
More stuff:
- 's/a/b/' changes the first occurrence of a to b, while 's/a/b/g' changes every 'a' into 'b'.
- 's/[abc]/d/' changes the first a, b or c on the line into d; add the g flag to change all of them.
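A quick way to see the difference the g flag and a bracket expression make. The test words (banana, cabbage) and variable names are arbitrary choices for this sketch.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(printf 'banana\n' | sed 's/a/o/')
# With g, every match is replaced.
every=$(printf 'banana\n' | sed 's/a/o/g')

# [abc] matches any single one of a, b or c.
class_first=$(printf 'cabbage\n' | sed 's/[abc]/d/')
class_all=$(printf 'cabbage\n' | sed 's/[abc]/d/g')

printf '%s\n' "$first_only" "$every" "$class_first" "$class_all"
```

This should print bonana, bonono, dabbage and dddddge, one per line.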
Last edited by Awebb (2011-04-10 18:06:40)
Offline
The first number you can get with
sed 's/.*Y:\([0-9.]*\).*/\1/'
which basically does what you've described. The entire line is replaced by \1 which is automatically substituted by the content of the first \( and \) pair, i.e. the number.
If you want to get both numbers with a single sed call you could use
sed 's/.*Y:\([0-9.]*\) (\([0-9.]*\).*/\1 \2/'
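Since a real log will contain lines that don't match, it may help to combine the substitution with sed's -n option and the p flag, so that only lines where the substitution succeeded are printed. This is a sketch: the first and third log lines below are invented filler, and only the SSIM line comes from this thread.

```shell
# Hypothetical three-line log; only the middle line is the real example.
log='x264 [info]: frame I:3 Avg QP:20.00
x264 [info]: SSIM Mean Y:0.9689320 (15.077db)
x264 [info]: kb/s:1500.00'

# -n suppresses the default output; the trailing p prints a line
# only when the s command actually replaced something on it.
result=$(printf '%s\n' "$log" |
    sed -n 's/.*Y:\([0-9.]*\) (\([0-9.]*\)db).*/\1 \2/p')

printf '%s\n' "$result"
```

Without -n and p, the two non-matching lines would pass through unchanged and end up mixed in with the numbers.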
And with awk you could do something like
awk '{ print substr($5, 3) }'
which is much easier to read.
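To see why it is $5 and why substr starts at 3, it can help to print how awk splits the example line on whitespace. A small sketch (the variable names are made up):

```shell
line='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)'

# Print each whitespace-separated field with its number.
fields=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# Field 5 is "Y:0.9689320"; substr($5, 3) skips the two characters "Y:".
first=$(printf '%s\n' "$line" | awk '{ print substr($5, 3) }')
printf '%s\n' "$first"
```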
Offline
What you want to do is write a regex that matches the whole line, like you'd use in grep, then wrap the sections you want in \( \) to capture them,
like s,a \(b\) \(c\),\1 \2,
where in grep you'd do 'a b c'
or just use awk
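Spelled out on a toy input, that idea looks roughly like this. The string 'a b c' and the variable names are placeholders; the commas are simply an alternative delimiter for the s command, which is handy when the pattern itself contains slashes.

```shell
line='a b c'

# grep only tells you the line matches.
matched=$(printf '%s\n' "$line" | grep 'a b c')

# The same pattern in sed, with \( \) around the parts to keep;
# \1 and \2 refer back to the first and second captured group.
captured=$(printf '%s\n' "$line" | sed 's,a \(b\) \(c\),\1 \2,')

printf '%s\n' "$matched" "$captured"
```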
Last edited by fsckd (2011-04-10 18:39:30)
aur S & M :: forum rules :: Community Ethos
Resources for Women, POC, LGBT*, and allies
Offline
@graysky - As others have said, use sed backreferences. Here is another example.
sed 's/^.*:\([[:alnum:]]*\.[[:alnum:]]*\) [^\(]*(\(.*\)db)[^\)]*$/\1 \2/'
Note: disraptor's regex is much easier to read. I'd use it! I just whipped this up before I saw his.
In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | sed 's/^.*:\([[:alnum:]]*\.[[:alnum:]]*\) [^\(]*(\(.*\)db)[^\)]*$/\1 \2/'
Out:
0.9689320 15.077
In awk: (One way of doing it!)
mawk '{gsub(/[Y\:\(\)db]/,""); print $5,$6}'
In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | mawk '{gsub(/[Y\:\(\)db]/,""); print $5,$6}'
Out:
0.9689320 15.077
This was quick but I hope it gets you on the right path
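One caveat with running gsub over the whole line: it also strips every Y, d and b from the other fields, which is harmless here but could shift the field numbers if a field consisted only of those characters. A possible variation, sketched with plain awk and made-up variable names, is to clean just the two fields of interest and keep only digits and dots:

```shell
line='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)'

# Copy fields 5 and 6, then delete everything that is not a digit or a dot.
result=$(printf '%s\n' "$line" | awk '{
    y = $5; db = $6
    gsub(/[^0-9.]/, "", y)
    gsub(/[^0-9.]/, "", db)
    print y, db
}')

printf '%s\n' "$result"
```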
Edit:
How about just grep?
echo $(grep -o '[0-9][0-9]*\.[0-9]*')
In:
echo "x264 [info]: SSIM Mean Y:0.9689320 (15.077db)" | echo $(grep -o '[0-9][0-9]*\.[0-9]*')
Out:
0.9689320 15.077
Cheap, but I thought I'd add it anyway!
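One thing to watch with the grep approach: a pattern that allows only a single digit before the dot, such as '[0-9]\.[0-9]*', returns 5.077 for 15.077, because the match can only start one digit before the dot. Requiring one or more digits on each side avoids that. The sketch below uses grep -E for the + quantifier and paste to join the matches; the variable names are made up.

```shell
line='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)'

# Exactly one digit before the dot: the leading 1 of 15.077 is lost.
short=$(printf '%s\n' "$line" | grep -o '[0-9]\.[0-9]*' | paste -sd ' ' -)

# One or more digits on both sides of the dot.
full=$(printf '%s\n' "$line" | grep -oE '[0-9]+\.[0-9]+' | paste -sd ' ' -)

printf '%s\n' "$short" "$full"
```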
P.S. Here is the full way to extract both numbers using awk's substr function as disraptor mentioned before.
mawk '{print substr($5,3),substr($6,2,6)}'
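The fixed length of 6 in the second substr only fits values that are exactly six characters wide. If the width can vary, one option is to compute the length from the field itself: the field minus the leading "(" and the trailing "db)" is length($6) - 4 characters long. This is a sketch using plain awk; the second input line with 9.5db is invented to show a different width.

```shell
# The second line is a made-up value with a shorter number.
input='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)
x264 [info]: SSIM Mean Y:0.9712345 (9.5db)'

# Start at character 2 and take everything except "(" and "db)".
result=$(printf '%s\n' "$input" |
    awk '{ print substr($5, 3), substr($6, 2, length($6) - 4) }')

printf '%s\n' "$result"
```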
Last edited by harryNID (2011-04-11 20:08:31)
In solving a problem of this sort, the grand thing is to be able to reason backward. That is a very useful accomplishment, and a very easy one, but people do not practice it much. In the everyday affairs of life it is more useful to reason forward, and so the other comes to be neglected. There are fifty who can reason synthetically for one who can reason analytically. --Sherlock Holmes
Offline
# read splits each line on whitespace; fields 5 and 6 land in y and db
while read -r u u u u y db rest; do
    printf "%s\t%s\n" "${y//[^0-9.]/}" "${db//[^0-9.]/}"
done < FILE.txt
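The ${var//pattern/} substitution used above is a bash feature. If the script has to run under a plain POSIX sh, a similar result can be had with the # (strip prefix) and % (strip suffix) expansions instead. This is a different technique from the one posted, sketched under the assumption that the fields always look like Y:... and (...db); the function name extract is made up.

```shell
# Reads log lines on stdin, prints the two numbers separated by a tab.
extract() {
    while read -r u u u u y db rest; do
        y=${y#"Y:"}      # strip the leading "Y:"
        db=${db#"("}     # strip the leading "("
        db=${db%"db)"}   # strip the trailing "db)"
        printf '%s\t%s\n' "$y" "$db"
    done
}

line='x264 [info]: SSIM Mean Y:0.9689320 (15.077db)'
result=$(printf '%s\n' "$line" | extract)
printf '%s\n' "$result"
```

To run it on a file, redirect the file into the function, for example extract < FILE.txt.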
Offline
@all - thanks for all the suggestions!
Offline