#1 2011-05-16 21:23:22

oliver
Member
Registered: 2007-12-12
Posts: 448

[solved]how to extract recent log entries from a file (based on time)?

I have a daily log file with hundreds of thousands of entries in the following format. 

field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192001.100
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192002.200
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192003.300
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192004.400
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192005.500

It's always in the same format, and the 10th field is always the timestamp (YYYYMMDDHHMMSS.MS).
Since the file rotates daily, the 10th field will always be 20110516xxxxxx.xxx for today and 20110517xxxxxx.xxx tomorrow.

What I want to do is only look at entries that have been written in the last 30 minutes.

At a high level, here's my plan
1) Get the date/time from 30 minutes ago... write it to a variable
2) Iterate through the file line by line comparing the 10th field to the variable, if it's larger write the line to a tmp file
3) Use tmp file for my analysis

This seems incredibly inefficient to me... what would be a more graceful way to do it? I have the regular Solaris tools at my disposal (plus Python).

Thanks

Last edited by oliver (2011-05-17 12:41:43)


#2 2011-05-16 23:13:01

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,231

Re: [solved]how to extract recent log entries from a file (based on time)?

I'm guessing something with awk will do the job, but I'm not familiar enough with awk to be able to write something for you.

Here's an easy way to work out 30 minutes ago though:

TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
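One caveat: -d is a GNU date extension, so the line above may not work with the stock Solaris date. Since Python is available, here is a minimal sketch of the same calculation. The function name cutoff_stamp and its parameters are made up for illustration; it assumes the log timestamps are in the machine's local time.

```python
from datetime import datetime, timedelta


def cutoff_stamp(minutes=30, now=None):
    """Return the local time `minutes` ago as a YYYYMMDDHHMMSS string."""
    if now is None:
        now = datetime.now()
    # timedelta handles hour, day and month rollover for us
    return (now - timedelta(minutes=minutes)).strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(cutoff_stamp())
```

The output has the same layout as the first 14 characters of the 10th log field, so it can be handed to whatever does the comparison.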


#3 2011-05-17 06:57:47

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 227

Re: [solved]how to extract recent log entries from a file (based on time)?

The algorithm you describe really is a viable approach. Since this is a log file, each line should have a time stamp later than all lines that precede it in the file. A more efficient algorithm could do a binary search through the file for the time stamp you are interested in. This would be easy enough to do in C or Python, but your algorithm could be fast enough. If this is the case, you could try the following quick & dirty bash script.

#!/bin/bash

seconds() {
    secs=$(($1 % 100))
    mins=$(($1 / 100 % 100))
    hrs=$(($1 / 10000 % 100))
    days=$(($1 / 1000000 % 100))
    month=$(($1 / 100000000 % 100))
    year=$(($1 / 10000000000))
    # Quote the date string so that it reaches date as a single argument
    LC_TIME=C date +%s -d "$(printf "%d-%02d-%02d %02d:%02d:%02d" "$year" "$month" "$days" "$hrs" "$mins" "$secs")"
}

found=0
now=$(date +%s)
while IFS= read -r line
do
    if [ "$found" -eq "0" ]
    then
        ts=${line##*,}
        ts=$(seconds ${ts%.*})
        diff=$(( ($now - $ts)/60 ))
        [[ $diff -lt "30" ]] && found=1
    fi
    [[ $found -ne 0 ]] && echo "$line"
done < "$1"

It will write (to stdout) all lines following the first line that has been time stamped within the last 30 minutes (ignoring milliseconds). You could redirect the output of this script to a file of your choice for analysis as follows:

$ ./script logfile > tmp
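For the binary search mentioned above, a rough Python sketch could look like the following. It is only an illustration under a few assumptions: the file is sorted by time stamp, the time stamp is the last comma-separated field, and the helper names (stamp_of, first_recent_offset, recent_tail) are invented here. The cutoff is passed as bytes, e.g. b"20110516185001", and is compared as a fixed-width string.

```python
import os


def stamp_of(line):
    """Return the first 14 bytes (YYYYMMDDHHMMSS) of the last field."""
    return line.rstrip(b"\r\n").rsplit(b",", 1)[-1][:14]


def first_recent_offset(f, cutoff):
    """Byte offset of the first line whose stamp is >= cutoff.

    f must be opened in binary mode and be sorted by time stamp.
    Returns the file size if no line qualifies.
    """
    f.seek(0)
    first = f.readline()
    if not first or stamp_of(first) >= cutoff:
        return 0  # empty file, or the very first line is already recent

    lo, hi = 0, os.fstat(f.fileno()).st_size
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid)
        f.readline()         # skip the rest of the line that contains mid
        line = f.readline()  # the next complete line, b"" at end of file
        if line and stamp_of(line) < cutoff:
            lo = mid + 1     # the line after mid is still too old
        else:
            hi = mid         # the line after mid is recent, or there is none
    f.seek(lo)
    f.readline()             # step over the last line that is too old
    return f.tell()


def recent_tail(path, cutoff):
    """Return the bytes of every line from the first recent one onward."""
    with open(path, "rb") as f:
        f.seek(first_recent_offset(f, cutoff))
        return f.read()
```

Each probe reads at most two lines, so the number of reads grows with the logarithm of the file size rather than with the number of entries. For a very large tail you would probably want to copy the file object to the output in chunks instead of calling read() once.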

Last edited by rockin turtle (2011-05-17 06:58:41)


#4 2011-05-17 07:18:43

quigybo
Member
Registered: 2009-01-15
Posts: 223

Re: [solved]how to extract recent log entries from a file (based on time)?

Since it is a daily log file, you don't need to worry about the days and a simple numerical comparison will suffice:

TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
awk -F, '{if($10 >= '$TZ_START') print}' < old > new
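If awk or GNU date turn out to be awkward on the Solaris box, the same one-pass filter can be sketched in Python. This is only an illustration: recent_lines is a made-up name, and the sample data is copied from the first post. It compares the first 14 characters of the last field as a string, which works because the time stamps are fixed-width digits.

```python
import sys


def recent_lines(lines, cutoff):
    """Yield the lines whose last comma-separated field is at or after cutoff.

    cutoff is a YYYYMMDDHHMMSS string. Fixed-width digit strings sort the
    same way as text as they do as times, so a string comparison is enough.
    """
    for line in lines:
        stamp = line.rstrip("\n").rsplit(",", 1)[-1]
        if stamp[:14] >= cutoff:
            yield line


# Sample data in the format from the first post
prefix = ",".join("field%d" % n for n in range(1, 10))
sample = ["%s,2011051619200%d.%d00\n" % (prefix, n, n) for n in range(1, 6)]

# Writes the three entries stamped 19:20:03 or later
for line in recent_lines(sample, "20110516192003"):
    sys.stdout.write(line)
```

For the real log you would pass an open file object instead of the sample list, with the cutoff computed as the time 30 minutes ago.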


#5 2011-05-17 09:37:14

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,231

Re: [solved]how to extract recent log entries from a file (based on time)?

That's what I was trying to get at, quigybo. Nice :D


#6 2011-05-17 09:57:04

skanky
Member
From: WAIS
Registered: 2009-10-23
Posts: 1,847

Re: [solved]how to extract recent log entries from a file (based on time)?

quigybo wrote:

Since it is a daily log file, you don't need to worry about the days and a simple numerical comparison will suffice:

TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
awk -F, '{if($10 >= '$TZ_START') print}' < old > new

Just for the record, though there's no real advantage in this case, if you're using gawk, the timestamp can be generated in the gawk script:

gawk -F, 'BEGIN{time = strftime("%Y%m%d%H%M%S", systime() - 1800)}
               {if($10 >= time) print}' < old > new

"...one cannot be angry when one looks at a penguin."  - John Ruskin
"Life in general is a bit shit, and so too is the internet. And that's all there is." - scepticisle


#7 2011-05-17 12:34:35

oliver
Member
Registered: 2007-12-12
Posts: 448

Re: [solved]how to extract recent log entries from a file (based on time)?

Nice, thank you all for the advice and support. I'm marking this as solved :-)
