I have a daily log file with hundreds of thousands of entries in the following format.
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192001.100
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192002.200
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192003.300
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192004.400
field1,field2,field3,field4,field5,field6,field7,field8,field9,20110516192005.500
It's always in the same format, and the 10th field is always the timestamp (YYYYMMDDHHMMSS.mmm, with milliseconds after the decimal point).
Since the file rotates daily, the 10th field will always be 20110516xxxxxx.xxx today and 20110517xxxxxx.xxx tomorrow.
What I want to do is only look at entries that have been written in the last 30 minutes.
At a high level, here's my plan:
1) Get the date/time from 30 minutes ago... write it to a variable
2) Iterate through the file line by line comparing the 10th field to the variable, if it's larger write the line to a tmp file
3) Use tmp file for my analysis
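Spelled out, the three steps above would look something like this in Python (a rough sketch; the function name and file paths are placeholders, not from any post in this thread):

```python
from datetime import datetime, timedelta

def recent_lines(src_path, dst_path, minutes=30):
    """Step 1: compute the cutoff; step 2: scan the log once, copying
    matching lines to a tmp file; step 3 (analysis) uses dst_path."""
    cutoff = (datetime.now() - timedelta(minutes=minutes)).strftime("%Y%m%d%H%M%S")
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # The 10th field is the last one; drop the .mmm part.  A plain
            # string comparison works because the format is fixed-width.
            ts = line.rstrip("\n").rsplit(",", 1)[-1].split(".")[0]
            if ts >= cutoff:
                dst.write(line)
```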
This seems incredibly inefficient to me... what would be a more graceful way to do it? I have the regular Solaris tools at my disposal (plus Python).
Thanks
Last edited by oliver (2011-05-17 12:41:43)
I'm guessing something with awk will do the job, but I'm not familiar enough with awk to be able to write something for you.
Here's an easy way to work out 30 minutes ago though:
TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
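One caveat: `-d '30 minutes ago'` is a GNU date extension, and as far as I know the stock Solaris `/usr/bin/date` doesn't support it, so on Solaris the same value could come from the Python the OP already has (a minimal sketch):

```python
from datetime import datetime, timedelta

# Same YYYYMMDDHHMMSS string the GNU date command above would produce.
tz_start = (datetime.now() - timedelta(minutes=30)).strftime("%Y%m%d%H%M%S")
print(tz_start)
```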
Are you familiar with our Forum Rules, and How To Ask Questions The Smart Way?
BlueHackers // fscanary // resticctl
The algorithm you describe really is a viable approach. Since this is a log file, each line should have a timestamp later than all lines that precede it in the file. A more efficient algorithm could do a binary search through the file for the timestamp you are interested in. That would be easy enough to do in C or Python, but your algorithm could be fast enough. If so, you could try the following quick & dirty bash script.
#!/bin/bash

# Convert a YYYYMMDDHHMMSS timestamp to seconds since the epoch.
seconds() {
    secs=$(($1 % 100))
    mins=$(($1 / 100 % 100))
    hrs=$(($1 / 10000 % 100))
    days=$(($1 / 1000000 % 100))
    month=$(($1 / 100000000 % 100))
    year=$(($1 / 10000000000))
    # The command substitution must be quoted: the date string contains a space.
    LC_TIME=C date +%s -d "$(printf '%d-%02d-%02d %02d:%02d:%02d' \
        "$year" "$month" "$days" "$hrs" "$mins" "$secs")"
}

found=0
now=$(date +%s)
while IFS= read -r line
do
    if [ "$found" -eq 0 ]
    then
        ts=${line##*,}              # 10th field: everything after the last comma
        ts=$(seconds "${ts%.*}")    # drop the milliseconds, convert to epoch seconds
        diff=$(( (now - ts) / 60 ))
        [[ $diff -lt 30 ]] && found=1
    fi
    [[ $found -ne 0 ]] && echo "$line"
done < "$1"
It will write (to stdout) the first line timestamped within the last 30 minutes (ignoring milliseconds) and every line after it. You can redirect the output of the script to a file of your choice for analysis as follows:
$ ./script logfile > tmp
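For completeness, the binary search mentioned above can be sketched in Python. This is a hypothetical helper, not code from the thread: it assumes timestamps never decrease down the file, seeks by byte offset, aligns to the next full line, and homes in on the first line whose last field is at or past the cutoff, so only O(log n) lines are ever read:

```python
import os

def first_recent_offset(path, cutoff):
    """Return the byte offset of the first line whose last comma-separated
    field (YYYYMMDDHHMMSS.mmm, as bytes) is >= cutoff.  Assumes the
    timestamps never decrease down the file."""
    def probe(f, pos):
        # Align to the first complete line at or after byte offset pos.
        f.seek(pos)
        if pos > 0:
            f.readline()          # discard a possibly partial line
        start = f.tell()
        return start, f.readline()

    with open(path, "rb") as f:
        lo, hi = 0, os.fstat(f.fileno()).st_size
        while lo < hi:
            mid = (lo + hi) // 2
            _, line = probe(f, mid)
            ts = line.rstrip().rsplit(b",", 1)[-1]
            if line and ts < cutoff:
                lo = mid + 1      # probed line is too old: look later
            else:
                hi = mid          # recent enough (or past EOF): look earlier
        start, _ = probe(f, lo)
        return start
```

You would then seek() to the returned offset and read everything from there, instead of scanning the whole file line by line.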
Last edited by rockin turtle (2011-05-17 06:58:41)
Since it is a daily log file, you don't need to worry about the days and a simple numerical comparison will suffice:
TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
awk -F, -v ts="$TZ_START" '$10 >= ts' < old > new
That's what I was trying to get at, quigybo. Nice!
Since it is a daily log file, you don't need to worry about the days and a simple numerical comparison will suffice:
TZ_START=$(date +%Y%m%d%H%M%S -d '30 minutes ago')
awk -F, '{if($10 >= '$TZ_START') print}' < old > new
Just for the record, though there's no real advantage in this case, if you're using gawk, the timestamp can be generated in the gawk script:
gawk -F, 'BEGIN{time = strftime("%Y%m%d%H%M%S", systime() - 1800)}
{if($10 >= time) print}' < old > new
"...one cannot be angry when one looks at a penguin." - John Ruskin
"Life in general is a bit shit, and so too is the internet. And that's all there is." - scepticisle
nice - thank you all for the advice and support. I'm marking this as solved :-)