
#1 2011-09-22 21:10:30

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Obtain substring using python.

20110920 19:59:59.752441 [ORDER PENDINGACCEPT        ] [seq=178122233][Major=O][Minor=a][src=osf][Id=11579995][ref=53839822][SourceSystemTimeStamp=2011-09-20 23:59:59.750][OrderID<25100>=11579995][Shares<38>=400][OrderType<40>=2][Side<54>=5][Destination<25104>=router][Tif<59>=5][Capacity<528>=A][RefId<25105>=1150414889][ParentID<25101>=-1]

I need to extract the Destination value ("router") and the Id value ("11579995") from this line using Python. How can I do that? Many tags have been omitted for brevity, and the tags are not at fixed positions, but they all come after the seq tag.

From my previous post: https://bbs.archlinux.org/viewtopic.php?id=126660 awk seems pretty fast.

Also, how do I close it as SOLVED once the answer/solution is obtained?

Last edited by srikanthradix (2011-09-22 21:17:16)


This profile is scheduled for deletion.

Offline

#2 2011-09-22 21:48:11

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Obtain substring using python.

srikanthradix wrote:

Also, how do I close it as SOLVED once the answer/solution is obtained?

Edit your first post and add '[solved]' to the topic line.
https://wiki.archlinux.org/index.php/Fo … ow_to_Post

Last edited by karol (2011-09-22 21:49:56)

Offline

#3 2011-09-23 02:22:28

Mr.Elendig
#archlinux@freenode channel op
From: The intertubes
Registered: 2004-11-07
Posts: 4,094

Re: Obtain substring using python.

The re module, perhaps?


Evil #archlinux@libera.chat channel op and general support dude.
. files on github, Screenshots, Random pics and the rest

Offline

#4 2011-09-23 14:31:27

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Re: Obtain substring using python.

for line in open("temp.log"):
    found = line.find("Id=")
    if found > -1:
        # "end" rather than "next", which would shadow the builtin
        end = line.find("]", found)
        subs = line[found + 3:end]
        print(subs)

Is there a better way to do it, or is this it? I mean performance-wise.
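One way to tidy the find/slice approach above is to wrap it in a small helper, so the key length is no longer hard-coded as +3 or +8. This is only a sketch: the name extract and the shortened sample line are made up for illustration, and it is written for Python 3. Searching for "[Id=" rather than "Id=" is an assumption that the tag always starts right after an opening bracket.

```python
def extract(line, key):
    """Return the text between key and the next ']', or None if not found."""
    start = line.find(key)
    if start == -1:
        return None
    start += len(key)              # skip past the key itself
    end = line.find("]", start)
    if end == -1:
        return None
    return line[start:end]

# Shortened version of the log line from the first post (illustrative only).
sample = "[seq=178122233][Id=11579995][Destination<25104>=router][ParentID<25101>=-1]"

print(extract(sample, "[Id="))       # 11579995
print(extract(sample, "<25104>="))   # router
```

This does the same work per line as the loop above, so it should not be expected to run faster; it mainly removes the magic offsets.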



Offline

#5 2011-09-23 15:01:40

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Re: Obtain substring using python.

bash-3.2$ time python temp.py > ids1.log

real    0m0.300s
user    0m0.286s
sys     0m0.012s

bash-3.2$ cat temp.py
for line in open("temp.log"):
        start_idx = line.find("Id=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+3:end_idx]
                print subs
        start_idx = line.find("<25104>=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+8:end_idx]
                print subs

whereas when I do it with old awk:

bash-3.2$ time awk '{
    i = index($0, "Id=")
    if (i > 0) {
        id = substr($0, i + 3)
        id = substr(id, 1, index(id, "]") - 1)
        print id
    }
    i = index($0, "<25104>=")
    if (i > 0) {
        dest = substr($0, i + 8)
        dest = substr(dest, 1, index(dest, "]") - 1)
        print dest
    }
}' temp.log > ids.log

real    0m0.189s
user    0m0.177s
sys     0m0.012s

Last edited by srikanthradix (2011-09-23 15:02:56)



Offline

#6 2011-09-23 15:50:12

Mr.Elendig
#archlinux@freenode channel op
From: The intertubes
Registered: 2004-11-07
Posts: 4,094

Re: Obtain substring using python.

Time it using the re module too.



Offline

#7 2011-09-23 16:02:53

marxav
Member
From: Gatineau, PQ, Canada
Registered: 2006-09-24
Posts: 386

Re: Obtain substring using python.

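import re

# [^\]]+ stops at the first "]"; a greedy (.+) would run on to the last "]" on the line.
pattern = re.compile(r'\[Destination<\d+>=([^\]]+)\]')
with open("temp.log") as f:
    for line in f:
        out = pattern.search(line)
        if out:
            print(out.group(1))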

Offline

#8 2011-09-23 16:07:56

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Re: Obtain substring using python.

<<Execution>>

bash-3.2$ python temp.py > ids1.log

<<Output>>

bash-3.2$ tail -1 ids1.log

0.277968883514

<<temp.py>>

bash-3.2$ cat temp.py
from time import time as clock

start = clock()

for line in open("temp.log"):
        start_idx = line.find("Id=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+3:end_idx]
                print subs
        start_idx = line.find("<25104>=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+8:end_idx]
                print subs

diff = (clock() - start)
print diff


Offline

#9 2011-09-23 18:29:22

srikanthradix
Member
Registered: 2010-10-19
Posts: 35

Re: Obtain substring using python.

The regular expression is wreaking havoc on the timing:

<<Execute>>

bash-3.2$ python temp2.py > ids1.log

<<Output>>

bash-3.2$ tail -1 ids1.log

0.770469903946

<<Code>>

bash-3.2$ cat temp2.py
import re
file = open("temp.log")
from time import time as clock
start = clock()
while 1:
        lines = file.readlines(10000)
        if not lines:
                break
        for line in lines:
                out=re.search(r"(<25104>)\=(?P<dest>\w+)", line)
                if out is None:
                        pass
                else:
                        print(out.group('dest'))

                out=re.search(r"(Id)\=(?P<id>\w+)", line)
                if out is None:
                        pass
                else:
                        print(out.group('id'))

diff = (clock() - start)
print diff
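A variation worth timing, sketched here for Python 3: compile the patterns once outside the loop and use a negated character class so each match stops at the closing bracket. The names ID_RE, DEST_RE and extract_fields, and the shortened sample line, are made up for illustration. Python's re module already caches recently used patterns, so the gain from precompiling may be small; this is something to measure rather than assume.

```python
import re

# Compiled once, outside the per-line loop.
ID_RE = re.compile(r"\[Id=([^\]]+)\]")
DEST_RE = re.compile(r"<25104>=([^\]]+)\]")

def extract_fields(line):
    """Return (id, destination); either element is None if its tag is missing."""
    id_match = ID_RE.search(line)
    dest_match = DEST_RE.search(line)
    return (id_match.group(1) if id_match else None,
            dest_match.group(1) if dest_match else None)

# Shortened version of the log line from the first post (illustrative only).
sample = "[seq=178122233][Id=11579995][Destination<25104>=router][ParentID<25101>=-1]"

print(extract_fields(sample))   # ('11579995', 'router')
```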


Offline

#10 2011-09-24 00:43:00

kachelaqa
Member
Registered: 2010-09-26
Posts: 216

Re: Obtain substring using python.

srikanthradix wrote:

<<Execution>>

bash-3.2$ python temp.py > ids1.log

<<Output>>

bash-3.2$ tail -1 ids1.log

0.277968883514

<<temp.py>>

bash-3.2$ cat temp.py
from time import time as clock

start = clock()

for line in open("temp.log"):
        start_idx = line.find("Id=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+3:end_idx]
                print subs
        start_idx = line.find("<25104>=")
        if start_idx > -1:
                end_idx=line.find("]",start_idx)
                subs=line[start_idx+8:end_idx]
                print subs

diff = (clock() - start)
print diff

your timing method is probably giving you bogus results.

firstly: avoid time.time() for benchmarking code. a single wall-clock measurement is easily skewed by timer resolution and by whatever else the machine is doing (see here for why). use the timeit module instead.

secondly: don't include print statements in the code you're testing because the i/o will mask the real performance of your algorithm.

try running your code like this:

from timeit import timeit

def func():
    output = []
    # the with block closes the file after each pass
    with open("temp.log") as log:
        for line in log:
            start_idx = line.find("Id=")
            if start_idx > -1:
                end_idx = line.find("]", start_idx)
                output.append(line[start_idx + 3:end_idx])
            start_idx = line.find("<25104>=")
            if start_idx > -1:
                end_idx = line.find("]", start_idx)
                output.append(line[start_idx + 8:end_idx])
    return '\n'.join(output)

# "elapsed" rather than "time", to avoid confusion with the time module
elapsed = timeit('func()', 'from __main__ import func', number=3)
print('func: %.8f sec/pass' % (elapsed / 3))
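To take file reading out of the measurement as well, the lines can be held in memory and the function passed to timeit as a callable. The sketch below is for Python 3 and makes several assumptions: LINES is a synthetic stand-in for temp.log built from a shortened sample line, and find_ids is a made-up name. Taking the minimum over several repeats is a common way to reduce noise from other processes.

```python
import timeit

# Synthetic stand-in for temp.log: one shortened sample line repeated.
LINE = "[seq=178122233][Id=11579995][Destination<25104>=router]\n"
LINES = [LINE] * 10000

def find_ids(lines):
    """Collect the Id value from every line using str.find and slicing."""
    out = []
    for line in lines:
        start = line.find("Id=")
        if start > -1:
            out.append(line[start + 3:line.find("]", start)])
    return out

# Each repeat runs the function 5 times; keep the fastest repeat.
best = min(timeit.repeat(lambda: find_ids(LINES), number=5, repeat=3)) / 5
print("find_ids: %.6f sec/pass" % best)
```

The absolute numbers will differ from machine to machine, so they are only useful for comparing one approach against another on the same data.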

Offline

#11 2011-09-24 19:38:37

Nisstyre56
Member
From: Canada
Registered: 2010-03-25
Posts: 85

Re: Obtain substring using python.

# line holds one log line; the original used "str", which shadows the builtin type
print([item for item in line.split("][") if "Destination" in item or "ID" in item])

output:

['OrderID<25100>=11579995', 'Destination<25104>=router', 'ParentID<25101>=-1]']

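performance is O(n), but with two passes over the line: str.split() runs to completion before the list comprehension filters the result, since python doesn't come with a lazy version of split for strings. note that the last item keeps its trailing "]".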
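The split idea can be taken one step further by turning the whole tag section into a dict, so any tag can be looked up by name. This is a sketch for Python 3; parse_tags and the shortened sample line are made up for illustration, and it relies on the statement in the first post that all tags come after the seq tag.

```python
def parse_tags(line):
    """Split the '[name=value][name=value]...' section of a line into a dict."""
    start = line.find("[seq=")
    if start == -1:
        return {}
    # Drop surrounding whitespace, then the outermost brackets.
    body = line[start:].strip().strip("[]")
    tags = {}
    for item in body.split("]["):
        name, sep, value = item.partition("=")
        if sep:                      # skip anything without an '='
            tags[name] = value
    return tags

# Shortened version of the log line from the first post (illustrative only).
sample = ("20110920 19:59:59.752441 [ORDER PENDINGACCEPT        ] "
          "[seq=178122233][Id=11579995][OrderID<25100>=11579995]"
          "[Destination<25104>=router][ParentID<25101>=-1]\n")

tags = parse_tags(sample)
print(tags["Id"])                    # 11579995
print(tags["Destination<25104>"])    # router
```

This parses every tag on the line, so it does more work than searching for two keys directly; it is mostly convenient when several fields are needed at once.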

Last edited by Nisstyre56 (2011-09-24 19:44:49)


In Zen they say: If something is boring after two minutes, try it for four. If still boring, try it for eight, sixteen, thirty-two, and so on. Eventually one discovers that it's not boring at all but very interesting.
~ John Cage

Offline
