Pages: 1
20110920 19:59:59.752441 [ORDER PENDINGACCEPT ] [seq=178122233][Major=O][Minor=a][src=osf][Id=11579995][ref=53839822][SourceSystemTimeStamp=2011-09-20 23:59:59.750][OrderID<25100>=11579995][Shares<38>=400][OrderType<40>=2][Side<54>=5][Destination<25104>=router][Tif<59>=5][Capacity<528>=A][RefId<25105>=1150414889][ParentID<25101>=-1]
I need to extract the Destination ("router") and Id ("11579995") substrings from the line using Python. How can I do that? There are a lot of other tags, omitted here for brevity, and the tags are not in fixed positions, but they all come after the seq tag.
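Since the tags are not in fixed positions, one option is to parse every bracketed field of the form [Name=value] or [Name<tag>=value] into a dict and then look up the fields you need. A minimal sketch, assuming every field has that shape and that values never contain "]"; parse_tags and TAG_RE are hypothetical names, and the sample line is shortened from the one above:

```python
import re

# Shortened sample line from the question (real lines carry more tags).
LINE = ("20110920 19:59:59.752441 [ORDER PENDINGACCEPT ] [seq=178122233][Major=O]"
        "[Minor=a][src=osf][Id=11579995][ref=53839822]"
        "[SourceSystemTimeStamp=2011-09-20 23:59:59.750]"
        "[Destination<25104>=router][Tif<59>=5][ParentID<25101>=-1]")

# Matches "[Name=value]" or "[Name<digits>=value]"; the <digits> part is not captured.
TAG_RE = re.compile(r"\[(\w+)(?:<\d+>)?=([^\]]*)\]")


def parse_tags(line):
    """Return a dict mapping tag names to values, whatever order the tags are in."""
    return dict(TAG_RE.findall(line))


tags = parse_tags(LINE)
print(tags["Destination"], tags["Id"])  # router 11579995
```

If a tag name occurred twice on one line, the later value would win in the dict.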
From my previous post (https://bbs.archlinux.org/viewtopic.php?id=126660), awk seems pretty fast.
Also, how do I close it as SOLVED once the answer/solution is obtained?
Last edited by srikanthradix (2011-09-22 21:17:16)
This profile is scheduled for deletion.
Offline
Also, how do I close it as SOLVED once the answer/solution is obtained?
Edit your first post and add '[solved]' to the topic line.
https://wiki.archlinux.org/index.php/Fo … ow_to_Post
Last edited by karol (2011-09-22 21:49:56)
re module perhaps?
Evil #archlinux@libera.chat channel op and general support dude.
. files on github, Screenshots, Random pics and the rest
for line in open("temp.log"):
    found = line.find("Id=")
    if found > -1:
        end = line.find("]", found)
        subs = line[found+3:end]
        print(subs)
Is there a better way to do it, or is this it? Performance-wise, I mean.
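The find()-and-slice idea can be wrapped in one reusable helper so the same logic serves every tag. A minimal sketch; extract_value is a hypothetical name. Passing "[Id=" rather than "Id=" as the key anchors the match at the start of a field, so a tag that merely ends in "Id=" cannot match first, and the helper returns the rest of the line when there is no closing "]" instead of slicing with find()'s -1 result (which would silently drop the last character):

```python
def extract_value(line, key):
    """Return the text between key and the next ']' in line, or None if key is absent."""
    start = line.find(key)
    if start == -1:
        return None
    start += len(key)                 # skip past the key itself
    end = line.find("]", start)
    if end == -1:                     # no closing bracket: take the rest of the line
        return line[start:].rstrip("\n")
    return line[start:end]


sample = "[seq=178122233][src=osf][Id=11579995][Destination<25104>=router][Tif<59>=5]"
print(extract_value(sample, "[Id="))      # 11579995
print(extract_value(sample, "<25104>="))  # router
print(extract_value(sample, "[Foo="))     # None
```

Over the whole log it would be called once per key inside the usual "for line in open(...)" loop.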
bash-3.2$ time python temp.py > ids1.log
real 0m0.300s
user 0m0.286s
sys 0m0.012s
bash-3.2$ cat temp.py
for line in open("temp.log"):
    start_idx = line.find("Id=")
    if start_idx > -1:
        end_idx = line.find("]", start_idx)
        subs = line[start_idx+3:end_idx]
        print(subs)
    start_idx = line.find("<25104>=")
    if start_idx > -1:
        end_idx = line.find("]", start_idx)
        subs = line[start_idx+8:end_idx]
        print(subs)
whereas when I do it with the old awk:
bash-3.2$ time awk '{
    i = index($0, "Id=")
    if (i > 0) {
        id = substr($0, i + 3)
        id = substr(id, 1, index(id, "]") - 1)
        print id
    }
    i = index($0, "<25104>=")
    if (i > 0) {
        dest = substr($0, i + 8)
        dest = substr(dest, 1, index(dest, "]") - 1)
        print dest
    }
}' temp.log > ids.log
real 0m0.189s
user 0m0.177s
sys 0m0.012s
Last edited by srikanthradix (2011-09-23 15:02:56)
Time it using the re module too.
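import re

# [^\]]+ stops at the tag's closing bracket; a greedy .+ would run to the last "]" on the line
pattern = re.compile(r'\[Destination<\d+>=([^\]]+)\]')
with open("temp.log") as f:
    for line in f:
        out = pattern.search(line)
        if out:
            print(out.group(1))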
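The same approach extended to both fields: two precompiled patterns applied line by line in a generator. A sketch under the assumption that each field appears at most once per line; ID_RE, DEST_RE and id_and_destination are hypothetical names:

```python
import re

# "[^\]]+" stops at the first "]", so each capture ends at the tag's own
# closing bracket instead of the last "]" on the line.
ID_RE = re.compile(r"\[Id=([^\]]+)\]")
DEST_RE = re.compile(r"\[Destination<\d+>=([^\]]+)\]")


def id_and_destination(lines):
    """Yield an (id, destination) pair per line; a missing tag gives None."""
    for line in lines:
        id_match = ID_RE.search(line)
        dest_match = DEST_RE.search(line)
        yield (id_match.group(1) if id_match else None,
               dest_match.group(1) if dest_match else None)


sample = ["[seq=1][Id=11579995][Destination<25104>=router][Tif<59>=5]\n",
          "[seq=2][ref=53839822]\n"]
for pair in id_and_destination(sample):
    print(pair)
# ('11579995', 'router')
# (None, None)
```

Because a file object yields lines, the generator can be fed open("temp.log") directly.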
<<Execution>>
bash-3.2$ python temp.py > ids1.log
<<Output>>
bash-3.2$ tail -1 ids1.log
0.277968883514
<<temp.py>>
bash-3.2$ cat temp.py
from time import time as clock

start = clock()
for line in open("temp.log"):
    start_idx = line.find("Id=")
    if start_idx > -1:
        end_idx = line.find("]", start_idx)
        subs = line[start_idx+3:end_idx]
        print(subs)
    start_idx = line.find("<25104>=")
    if start_idx > -1:
        end_idx = line.find("]", start_idx)
        subs = line[start_idx+8:end_idx]
        print(subs)
diff = (clock() - start)
print(diff)
The regular expression version is wreaking havoc on the timing (0.77 s versus 0.28 s for the find() version):
<<Execute>>
bash-3.2$ python temp2.py > ids1.log
<<Output>>
bash-3.2$ tail -1 ids1.log
0.770469903946
<<Code>>
bash-3.2$ cat temp2.py
import re
log = open("temp.log")
from time import time as clock
start = clock()
while True:
    lines = log.readlines(10000)
    if not lines:
        break
    for line in lines:
        out = re.search(r"(<25104>)\=(?P<dest>\w+)", line)
        if out is not None:
            print(out.group('dest'))
        out = re.search(r"(Id)\=(?P<id>\w+)", line)
        if out is not None:
            print(out.group('id'))
diff = (clock() - start)
print(diff)
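One thing temp2.py does is run the regex engine twice per line, and each re.search() call with a pattern string goes through the re module's internal cache lookup. A possible variant, sketched here as an assumption rather than a guaranteed speed-up, precompiles a single pattern whose alternation matches either field, so each line is scanned once. BOTH_RE and id_and_dest are hypothetical names; whether this beats the find() loop has to be measured:

```python
import re

# The alternation matches either field name; "(?:<\d+>)?" skips the optional
# numeric tag such as <25104>. "\[" anchors at a field start, so RefId is not matched.
BOTH_RE = re.compile(r"\[(Id|Destination)(?:<\d+>)?=([^\]]*)\]")


def id_and_dest(line):
    """Map 'Id' and/or 'Destination' to their values for the fields present."""
    return dict(BOTH_RE.findall(line))


sample = ("[seq=178122233][src=osf][Id=11579995][RefId<25105>=1150414889]"
          "[Destination<25104>=router][ParentID<25101>=-1]")
fields = id_and_dest(sample)
print(fields["Id"], fields["Destination"])  # 11579995 router
```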
Your timing method is probably giving you bogus results.
Firstly: don't use time.time() for benchmarking code. It measures wall-clock time and its resolution depends on the platform, so it will often give inaccurate results for short runs (see here for why). Use the timeit module instead.
Secondly: don't include print statements in the code you're testing, because the I/O will mask the real performance of your algorithm.
Try running your code like this:
from timeit import timeit

def func():
    output = []
    for line in open("temp.log"):
        start_idx = line.find("Id=")
        if start_idx > -1:
            end_idx = line.find("]", start_idx)
            subs = line[start_idx+3:end_idx]
            output.append(subs)
        start_idx = line.find("<25104>=")
        if start_idx > -1:
            end_idx = line.find("]", start_idx)
            subs = line[start_idx+8:end_idx]
            output.append(subs)
    return '\n'.join(output)

elapsed = timeit('func()', 'from __main__ import func', number=3)
print('func: %.8f sec/pass' % (elapsed / 3))
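Following the same advice (timeit, no printing inside the timed code), here is a sketch that compares the find() and precompiled-regex approaches on in-memory data, so disk I/O is also kept out of the measurement. The data is hypothetical (the sample line repeated), and with_find/with_re are illustrative names; timeit.repeat() runs each measurement several times and the minimum is usually the least noisy figure:

```python
import functools
import re
import timeit

# Hypothetical in-memory data: the sample line repeated.
LINE = ("[seq=178122233][Major=O][src=osf][Id=11579995][ref=53839822]"
        "[Destination<25104>=router][Tif<59>=5][ParentID<25101>=-1]\n")
LINES = [LINE] * 1000

ID_RE = re.compile(r"\[Id=([^\]]*)\]")
DEST_RE = re.compile(r"<25104>=([^\]]*)\]")


def with_find(lines):
    out = []
    for line in lines:
        i = line.find("[Id=")
        if i > -1:
            out.append(line[i + 4:line.find("]", i)])
        i = line.find("<25104>=")
        if i > -1:
            out.append(line[i + 8:line.find("]", i)])
    return out


def with_re(lines):
    out = []
    for line in lines:
        m = ID_RE.search(line)
        if m:
            out.append(m.group(1))
        m = DEST_RE.search(line)
        if m:
            out.append(m.group(1))
    return out


for func in (with_find, with_re):
    # repeat() returns the total time of each run; report the best run per pass.
    runs = timeit.repeat(functools.partial(func, LINES), number=10, repeat=3)
    print("%s: %.6f sec/pass" % (func.__name__, min(runs) / 10))
```

The absolute numbers will vary by machine; only the relative difference between the two functions is meaningful.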
# line holds one log line (calling split on the built-in name str itself would not work)
print([item for item in line.split("][") if "Destination" in item or "ID" in item])
output:
['OrderID<25100>=11579995', 'Destination<25104>=router', 'ParentID<25101>=-1]']
Performance is O(2n), i.e. still linear, because str.split() runs over the whole string and builds the full list before the list comprehension makes its own pass over it (Python doesn't come with a lazy version of split for strings).
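A lazy split is easy to write as a generator on top of str.find(). A sketch; isplit is a hypothetical name, not a built-in. It does not change the big-O (each character is still examined about once), but it avoids building the whole list up front and lets a consumer stop early, for example once both wanted fields have been seen:

```python
def isplit(s, sep):
    """Lazily yield the pieces of s split on sep, matching s.split(sep)."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            yield s[start:]          # last piece (possibly empty)
            return
        yield s[start:end]
        start = end + len(sep)


line = ("[Id=11579995][ref=53839822][OrderID<25100>=11579995]"
        "[Destination<25104>=router][Tif<59>=5][ParentID<25101>=-1]")
print([item for item in isplit(line, "][") if "Destination" in item or "ID" in item])
# ['OrderID<25100>=11579995', 'Destination<25104>=router', 'ParentID<25101>=-1]']
```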
Last edited by Nisstyre56 (2011-09-24 19:44:49)
In Zen they say: If something is boring after two minutes, try it for four. If still boring, try it for eight, sixteen, thirty-two, and so on. Eventually one discovers that it's not boring at all but very interesting.
~ John Cage