You are not logged in.
Pages: 1
I've been learning Python by going through problems on Project Euler, and my script for Problem 14 took longer to execute then I thought it should. I then found another example for the problem, that was very similar, but took much less to execute. I modified both to make them as similar as possible, and the only difference is that the slow script stores 2 values in two int variables, and the fast one stores them in a tuple.
Here are the scripts...
#!/usr/bin/env python2
# This script takes ~4 seconds to complete.
known = {}
def countSteps(num):
steps = 0
while num > 1:
if num not in known:
if num % 2 == 0:
num /= 2
else:
num = 3*num + 1
steps += 1
else:
return steps + known[num]
return steps
highest = (0,0)
for n in xrange(1,1000000,1):
s = countSteps(n)
if s > highest[1]:
highest=(n,s)
known.update({n:s})
print "Answer:"
print "%d : %d" % (highest[0], highest[1])
This one is much slower...
#!/usr/bin/env python2
# This script takes ~50 seconds to complete.
known = {}
def countSteps(num):
steps = 0
while num > 1:
if num not in known:
if num % 2 == 0:
num /= 2
else:
num = 3*num + 1
steps += 1
else:
return steps + known[num]
return steps
high_count = 0
high_n = 0
for n in xrange(1,1000000,1):
s = countSteps(n)
if s > high_count:
high_count = s
high_n = n
known.update({high_n:high_count})
print "Answer:"
print "%d : $d" % (high_n, high_count)
Can anyone explain why one performs so much better than the other? Both of them yield a correct result.
Offline
#!/usr/bin/env python2
# This script takes ~50 (?) seconds to complete.
known = {}
def countSteps(num):
steps = 0
while num > 1:
if num not in known:
if num % 2 == 0:
num /= 2
else:
num = 3*num + 1
steps += 1
else:
return steps + known[num]
return steps
high_count = 0
high_n = 0
for n in xrange(1,1000000,1):
s = countSteps(n)
if s > high_count:
high_count = s
high_n = n
known.update({n:s})
print "Answer:"
print "%d : %d" % (high_n, high_count)
I used this version as the 'slow' one. 'Fast' one is unmodified.
$ time python slow.py
Answer:
837799 : 524
real0m7.117s
user0m6.510s
sys0m0.070s
$ time python fast.py
Answer:
837799 : 524
real0m7.149s
user0m6.553s
sys0m0.067s
I don't know python, so I would guess it's probably the typo on the last line. (?)
Offline
Ooops, yeah that was a typo.
However, I don't understand why it ran so fast for you... that slow script takes ~50 seconds to execute for me:
harry@ projecteuler > cat test-slow.py
#!/usr/bin/env python2
known = {}
def countSteps(num):
steps = 0
while num > 1:
if num not in known:
if num % 2 == 0:
num /= 2
else:
num = 3*num + 1
steps += 1
else:
return steps + known[num]
return steps
high_count = 0
high_n = 0
for n in xrange(1,1000000,1):
s = countSteps(n)
if s > high_count:
high_count = s
high_n = n
known.update({high_n:high_count})
print "Answer:"
print "%d : %d" % (high_n, high_count)
harry@ projecteuler > time python2 test-slow.py
Answer:
837799 : 524
real 0m49.408s
user 0m49.337s
sys 0m0.013s
I tested it on a few different boxes, with the same results.
Offline
Offline
The difference in execution time between the two for me is ~45 seconds. This enough of a difference that I'm sure just 'time' will suffice. I woudln't be as confused if it was just a coupel seconds. I've executed the script on 4 seperate boxes (all different distros/versions of python2), and they all yeild the same result for me...
On some advise, I ran both with cProfile:
> python2 -m cProfile test-slow.py
Answer:
837799 : 524
2000000 function calls in 52.688 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.322 3.322 52.688 52.688 test-slow.py:3(<module>)
999999 47.892 0.000 47.892 0.000 test-slow.py:4(countSteps)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
999999 1.473 0.000 1.473 0.000 {method 'update' of 'dict' objects}
> python2 -m cProfile test.py
Answer:
837799 : 524
2000000 function calls in 9.367 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.281 3.281 9.367 9.367 test.py:3(<module>)
999999 4.510 0.000 4.510 0.000 test.py:4(countSteps)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
999999 1.577 0.000 1.577 0.000 {method 'update' of 'dict' objects}
The difference shown is in the execution time of the countSteps() function, but the function in both scripts is exactly the same, and executed wiith the same exact variables...
Offline
So, when you change the line "known.update({high_n:high_count})" in the slow script to "known.update({n:s})", then it executes as quickly as the "fast" one. This is because, if you only store the lengths of those chains that qualify for being "high_count", you don't have a record of those shorter chains which are nonetheless important sub-chains for many of the starting values you need to calculate.
So, basically, a smaller table means less optimization. A fast execution time requires that you store the length of every chain you calculate, so that you can look it up later, for the cases in which it is a sub-chain later on. Remember that the length of Collatz chains is pretty unintuitive. A larger starting value does not imply a longer chain.
Last edited by austin.rbn (2011-06-03 10:47:26)
"Computer Science is embarrassed by the computer." -- Alan J. Perlis
Offline
Pages: 1