
#1 2011-09-17 16:04:00

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

[SOLVED] Faster script?

I need to extract the Id and Destination substrings from each line of a log file using awk. A sample line:

20110911 12:30:33 [seq=123444][src=sample1][Id=12345][Destination=CME][SourceSystem<5177>=RAINIER]

temp1.log contains a million lines in the same format as above, and the Id and Destination need to be extracted from each of them.

1st attempt:

[srikanth@hana ~]$ time awk '{match($0,/Id=([0-9]*)/,a); print a[1]; match($0,/Destination=([A-Za-z_]*)/,a); print a[1];}' temp1.log > ids.log

real    0m14.748s
user    0m14.656s
sys    0m0.077s

2nd attempt:

[srikanth@hana ~]$ time awk '
BEGIN {
    destlen = length("Destination=") + 1
    idlen = length("Id=") + 1
}
{
    split($0, a, "[")
    j = 0; len = length(a)
    for (i = 4; i <= len; i++) {
        alen = length(a[i])
        if (a[i] ~ "Id") {
            print substr(a[i], idlen, alen - idlen)
        }
        if (a[i] ~ "Dest") {
            print substr(a[i], destlen, alen - destlen); i = len
        }
    }
}' temp1.log > ids.log

real    0m11.576s
user    0m11.463s
sys    0m0.103s


Also, I would like to use awk.

Can I do any better than this?

Last edited by srikanthradix (2011-09-22 21:55:08)



#2 2011-09-17 18:26:41

juster
Forum Fellow
Registered: 2008-10-07
Posts: 195

Re: [SOLVED] Faster script?

Here is a simple example based on your latest approach. For the sake of speed, it avoids both splitting and regexps:

{
    i = index($0, "[Id=")                                                      
    id = substr($0, i + 4)
    id = substr(id, 1, index(id, "]") - 1)

    i = index($0, "[Destination=")
    dest = substr($0, i + 13)
    dest = substr(dest, 1, index(dest, "]") - 1)

    print id, dest
}

If there are no spaces in your data between the brackets, then it is slightly faster to use $3 instead of $0.

The next example changes the field separator (FS) to split each line where one or more brackets occur, instead of on whitespace. This also assumes that id and destination are always the third and fourth key/value pair enclosed in brackets. When splitting on brackets, $1 is the date and time string, so the id is $4 and the destination is $5.

BEGIN {
    FS="[[\\]]+" # \\ are converted to single \ in string                      
                 # \] in regexp escapes the ] inside [...]                     
}
{ print substr($4, index($4, "=") + 1), substr($5, index($5, "=") + 1) }
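To make the field numbering concrete, here is a minimal sketch that prints every field awk produces for the sample line from the first post when FS is the bracket regexp. The wrapper name show_fields is made up for illustration, and the sketch assumes an awk that treats \] inside [...] as an escaped bracket, as the comments above describe.

```shell
# Sample line copied from the first post.
line='20110911 12:30:33 [seq=123444][src=sample1][Id=12345][Destination=CME][SourceSystem<5177>=RAINIER]'

# show_fields is a hypothetical helper name, not part of the thread.
# It splits each input line on runs of brackets and prints one field per line.
show_fields() {
    awk 'BEGIN { FS = "[[\\]]+" }
         { for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

In the output, $1 holds the date and time, $4 holds Id=12345 and $5 holds Destination=CME, which is why the one-liner above only has to strip everything up to the "=" in those two fields.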


#3 2011-09-17 18:46:10

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Faster script?

@Juster:

I went for getting rid of the regex too. I got it 20% faster by putting the reduced strings back in $0 and putting the first call to index inside the substr.

time awk '{
                                                          
    $0 = substr($0, index($0,"[Id=") + 4)
    id = substr($0, 1, index($0, "]") - 1)

    
    $0 = substr($0, index($0, "[Destination=") + 13)
    dest = substr($0, 1, index($0, "]") - 1)

    print id, dest
}' temp1.log > /dev/null
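One thing to keep in mind with this variant: after the first assignment, $0 only contains the text that follows the Id, so the Destination lookup relies on Destination coming after Id on the line. The sketch below wraps the same awk body in a shell function (by_truncation is a made-up name) and feeds it two shortened, made-up lines, one in the original order and one with the order swapped.

```shell
# by_truncation is a hypothetical wrapper around the awk body shown above.
by_truncation() {
    awk '{
        # Drop everything up to and including "[Id=", then cut at the next "]".
        $0 = substr($0, index($0, "[Id=") + 4)
        id = substr($0, 1, index($0, "]") - 1)

        # The search for Destination now only sees what followed the Id.
        $0 = substr($0, index($0, "[Destination=") + 13)
        dest = substr($0, 1, index($0, "]") - 1)

        print id, dest
    }'
}

# Id before Destination, as in the sample data.
printf '%s\n' '20110911 12:30:33 [seq=1][Id=12345][Destination=CME]' | by_truncation

# Made-up line with the order swapped: Destination is no longer found.
printf '%s\n' '20110911 12:30:33 [seq=1][Destination=CME][Id=12345]' | by_truncation
```

The first call prints the id and the destination. The second prints the id followed by an empty destination, because "[Destination=" was already cut away together with everything else before the Id.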

Last edited by Procyon (2011-09-17 19:00:50)


#4 2011-09-17 19:32:40

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Re: [SOLVED] Faster script?

time awk '{
    i = index($3, "[Id=")    
    id = substr($3, i + 4)
    id = substr(id, 1, index(id, "]") - 1)

    i = index($3, "[Destination=")
    dest = substr($3, i + 13)
    dest = substr(dest, 1, index(dest, "]") - 1)

    print id, dest
}' temp1.log > ids.log

real    0m4.219s
user    0m4.146s
sys    0m0.067s


The Id and Destination can be in any position after $3. I should have mentioned that (mea culpa). Hence, I can't assign the result of substr back to $0.

But this works for me. It is definitely a much faster script than mine.
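Since the two keys can appear in any order, the repeated index/substr pair could also be folded into a small function. This is only a sketch under my own assumptions: getval and extract_any_order are made-up names, the function searches $0 rather than $3, and it returns an empty string when a key is missing instead of printing leftovers from the line. I have not timed it against the version above.

```shell
# extract_any_order is a hypothetical wrapper name used only for this sketch.
extract_any_order() {
    awk '
    # getval is a hypothetical helper: it returns the text between "[key="
    # and the next "]", or an empty string when the key is absent.
    # tag, i and rest are local variables (extra parameters, awk style).
    function getval(s, key,    tag, i, rest) {
        tag = "[" key "="
        i = index(s, tag)
        if (i == 0)
            return ""
        rest = substr(s, i + length(tag))
        return substr(rest, 1, index(rest, "]") - 1)
    }
    { print getval($0, "Id"), getval($0, "Destination") }'
}

# Made-up line with Destination before Id.
printf '%s\n' '20110911 12:30:33 [seq=1][src=s1][Destination=CME][Id=12345]' | extract_any_order
```

Each lookup starts again from the full line, so the order of the keys does not matter.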

Thanks to you both.

Last edited by srikanthradix (2011-09-18 01:15:57)

