efficiency of the "in" operator in python..

vkumar · 2009-04-12 03:01:01

It's a small thing, but I'd like to know which is more efficient when checking whether object "a" equals either object "b" or object "c":

def c1():
    a = 1
    if a == 2 or a == 3:
        print "ok, no"
    else:
        print "kk"

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_FAST                0 (a)
              9 LOAD_CONST               2 (2)
             12 COMPARE_OP               2 (==)
             15 JUMP_IF_TRUE            13 (to 31)
             18 POP_TOP             
             19 LOAD_FAST                0 (a)
             22 LOAD_CONST               3 (3)
             25 COMPARE_OP               2 (==)
             28 JUMP_IF_FALSE            9 (to 40)
        >>   31 POP_TOP             

  4          32 LOAD_CONST               4 ('ok, no')
             35 PRINT_ITEM          
             36 PRINT_NEWLINE       
             37 JUMP_FORWARD             6 (to 46)
        >>   40 POP_TOP             

  6          41 LOAD_CONST               5 ('kk')
             44 PRINT_ITEM          
             45 PRINT_NEWLINE       
        >>   46 LOAD_CONST               0 (None)
             49 RETURN_VALUE

or...

def c2():
    a = 1
    if a in (2, 3):
        print "lol whut?"
    else:
        print "fine.."

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_FAST                0 (a)
              9 LOAD_CONST               6 ((2, 3))
             12 COMPARE_OP               6 (in)
             15 JUMP_IF_FALSE            9 (to 27)
             18 POP_TOP             

  4          19 LOAD_CONST               4 ('lol whut?')
             22 PRINT_ITEM          
             23 PRINT_NEWLINE       
             24 JUMP_FORWARD             6 (to 33)
        >>   27 POP_TOP             

  6          28 LOAD_CONST               5 ('fine..')
             31 PRINT_ITEM          
             32 PRINT_NEWLINE       
        >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE

c2 appears to use fewer instructions, but it also populates a tuple and uses the "in" operator, both of which appear heavier than their atom and "==" counterparts.
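For anyone who wants to reproduce this on a current interpreter, here is a minimal sketch using the standard dis module. It assumes Python 3 and rewrites the two checks as functions that take a as a parameter and return a bool instead of printing, so the versions are easy to compare. The disassembly above already shows LOAD_CONST 6 ((2, 3)), meaning the tuple is stored as a prebuilt constant of the function rather than assembled on every call. Opcode names and counts differ between CPython releases, so no output is shown here.

```python
import dis

def c1(a):
    # Two equality tests joined by a short-circuiting "or"
    return a == 2 or a == 3

def c2(a):
    # A single membership test against a constant tuple
    return a in (2, 3)

# Print the bytecode of each function to stdout
dis.dis(c1)
dis.dis(c2)

# The literal (2, 3) lives in the code object's constants,
# so it is not rebuilt each time c2 runs
print((2, 3) in c2.__code__.co_consts)
```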

Any thoughts?

Killa B · 2009-04-12 03:08:12

I think c2() is just silly.

c1() makes a lot more sense, and would be less likely to enrage the people who read your code.

I have no idea which one is faster, but either way I'd say c1() is better.

buttons · 2009-04-12 03:28:48

Killa B wrote:

I think c2() is just silly.
c1() makes a lot more sense, and would be less likely to enrage the people who read your code.
I have no idea which one is faster, but either way I'd say c1() is better.

I'm not sure I could disagree with this more.

c2 is more readable and concise.

It is also likely to be more consistent. For example, what if there were 4 values to check?

if a == 1 or a == 2 or a == 3 or a == 4:

or

if a in xrange(1, 5):

etc.
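In Python 3, xrange was renamed to range, and in CPython a range object answers membership tests for integers arithmetically instead of walking through every value. A minimal sketch, assuming Python 3, showing that the chained comparison and the range membership test agree on a few sample inputs (the function names chained and in_range are hypothetical):

```python
def chained(a):
    # One explicit comparison per value; grows with every value added
    return a == 1 or a == 2 or a == 3 or a == 4

def in_range(a):
    # range(1, 5) contains 1, 2, 3 and 4; the stop value 5 is excluded.
    # Non-int probes (2.0, "x") fall back to an element-by-element == scan.
    return a in range(1, 5)

for value in (0, 1, 2.0, 2.5, 4, 5, "x"):
    print(repr(value), chained(value), in_range(value))
```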

buttons · 2009-04-12 03:36:51

On topic, there's an easy way to test this.

I took your code (minus the print statements), wrapped it in a for loop that ran it a million times, timed that several times and recorded the average. Obviously other factors could play a part, but in general this is pretty good. You could use python -m cProfile if you REALLY wanted more detailed numbers.

c1 average over a million runs: 0.258 s
c2 average over a million runs: 0.264 s

If this is in your inner loop, it will need to execute 125,000,000 times before you would notice 1s of difference.

cactus · 2009-04-12 04:10:58

This thread is hilarious!

Last edited by cactus (2009-04-12 04:11:37)

moljac024 · 2009-04-12 07:53:41

lol @ cactus showing up with that same post over and over

vkumar · 2009-04-12 14:27:06

If this is in your inner loop, it will need to execute 125,000,000 times before you would notice 1s of difference.

!

"a == 2 or a == 3" it is, then.

edit:
I'm not great at this math stuff, but I don't get 125,000,000. 0.264 - 0.258 = 0.006 s per million runs, and 1 / 0.006 ≈ 167, so it takes 167 * 1,000,000 = 167,000,000 runs, on average.
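The same break-even arithmetic as a small helper (a hypothetical function, not something from the thread): divide the timing gap by the number of runs to get the cost of a single run, then take the reciprocal to get the number of runs needed to lose one second.

```python
def runs_per_lost_second(t_slow, t_fast, runs):
    """Return how many runs it takes for the slower version to lose 1 s.

    t_slow and t_fast are total seconds measured over `runs` executions.
    """
    cost_per_run = (t_slow - t_fast) / runs  # seconds lost per single run
    return 1.0 / cost_per_run

# buttons' averages: 0.264 s vs 0.258 s over a million runs
print(f"{runs_per_lost_second(0.264, 0.258, 1_000_000):,.0f}")
```

With buttons' figures this comes out at roughly 167 million runs, in line with the estimate above.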

edit #2:
Yeah, but still, death to "in"!!

edit #3:
Hm... but is this a valid "in" benchmark? Looking at the disassembly, Python creates "a" and assigns to it every time c{1,2} runs, so if the call were nested in a for loop, that would happen *n* times. The same is true for the (2, 3) tuple. This is overhead! So I propose:

a = 1
conds = (2, 3)
for x in range(0, 1000000):
  if a == 2 or a == 3:
  # if a in conds:
    pass
  else:
    pass
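The standard library's timeit module does this kind of hoisting for you: the setup string runs once, and only the statement itself is repeated and timed. A minimal sketch, assuming Python 3; the variable names are illustrative and timings depend on the machine and interpreter, so no numbers are shown. Note that timeit runs setup inside the timing function it generates, so a and conds are looked up as local variables in the timed loop.

```python
import timeit

# Runs once before timing; mirrors hoisting a and conds out of the loop
setup = "a = 1; conds = (2, 3)"
runs = 1_000_000

# timeit.timeit returns the total seconds taken for `number` executions
t_or = timeit.timeit("a == 2 or a == 3", setup=setup, number=runs)
t_in = timeit.timeit("a in conds", setup=setup, number=runs)

print(f"'or' chain: {t_or:.3f} s for {runs:,} runs")
print(f"'in' tuple: {t_in:.3f} s for {runs:,} runs")
```

Taking min(timeit.repeat(...)) over several repeats is a common way to reduce noise from other processes.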

Last edited by vkumar (2009-04-12 16:55:38)

vkumar · 2009-04-12 17:17:34

And the shocking results;

Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cProfile
>>> def c1():
...     for x in range(0, 1000000):
...         if a == 2 or a == 3:
...             pass
...         else:
...             pass
...
>>> def c2():
...     for x in range(0, 1000000):
...         if a in conds:
...             pass
...         else:
...             pass
...
>>> global a, conds
>>> a = 1
>>> conds = (2, 3)
>>> def t1():
... cProfile.run('c1()')
... cProfile.run('c2()')
...
>>> t1()
         4 function calls in 0.348 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.348    0.348    0.348    0.348 <stdin>:1(c1)
        1    0.000    0.000    0.348    0.348 <string>:1(<module>)
        1    0.000    0.000    0.348    0.348 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

         4 function calls in 0.274 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.274    0.274    0.274    0.274 <stdin>:1(c2)
        1    0.000    0.000    0.274    0.274 <string>:1(<module>)
        1    0.000    0.000    0.274    0.274 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> t1()
         4 function calls in 0.335 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.335    0.335    0.335    0.335 <stdin>:1(c1)
        1    0.000    0.000    0.335    0.335 <string>:1(<module>)
        1    0.000    0.000    0.335    0.335 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

         4 function calls in 0.273 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.273    0.273    0.273    0.273 <stdin>:1(c2)
        1    0.000    0.000    0.273    0.273 <string>:1(<module>)
        1    0.000    0.000    0.273    0.273 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So over a million iterations, c2 is faster than c1 by (0.074 + 0.062) / 2 = 0.068 seconds on average. 1 / 0.068 ≈ 14.706, and 14.706 * 1,000,000 = 14,706,000 runs until you lose an entire second to c1.

Conclusion?
"in" is actually great!

Dusty · 2009-04-12 18:24:32

More or less unrelated:

http://python.net/~goodger/projects/pyc … possible-1

You might also want to test the built-in any() function (new in Python 2.5) here. I'd also run it through PyPy's experimental JIT. It's always nice to have a collection of benchmarks to ignore.
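For comparison, here is a minimal sketch of the same check written with the built-in any(), assuming Python 3 (the function names are hypothetical). It gives the same answer here, but it has to drive a generator expression element by element, so in CPython it is generally not expected to beat a plain "in" on a small tuple; measure before relying on that.

```python
def with_any(a, values=(2, 3)):
    # any() consumes the generator and stops at the first True
    return any(a == v for v in values)

def with_in(a, values=(2, 3)):
    # "in" on a tuple checks each element for identity or equality, in C
    return a in values

for value in (1, 2, 3, 4):
    print(value, with_any(value), with_in(value))
```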

Python is slow. Readability is more important than efficiency, or you would be writing a C extension. Even if you are using Python, there are a lot more important things you can be optimizing. 'in' is definitely the winner when it comes to efficiency.

In short, I'm with cactus.

Dusty

vkumar · 2009-04-12 19:58:20

Its always nice to have a collection of benchmarks to ignore.

What else am I going to do with my spare time? Read? I'll stick with my useless benchmarks, thank you very much.

freakcode · 2009-04-12 20:16:20

For anything big you would be using sets or matrices anyway... "in" is a great shorthand.
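A minimal sketch of the set approach, assuming Python 3: membership in a set or frozenset uses hashing and takes constant time on average, while a list or tuple is scanned element by element. The collection size and probe value are arbitrary choices for illustration.

```python
import timeit

values = list(range(10_000))   # arbitrary example data
as_tuple = tuple(values)
as_set = frozenset(values)     # immutable, hashable set

probe = 9_999  # worst case for the tuple: the last element

# Same answer either way...
print(probe in as_tuple, probe in as_set)

# ...but very different costs as the collection grows
t_tuple = timeit.timeit(lambda: probe in as_tuple, number=1_000)
t_set = timeit.timeit(lambda: probe in as_set, number=1_000)
print(f"tuple: {t_tuple:.4f} s, frozenset: {t_set:.4f} s")
```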

tam1138 · 2009-04-13 01:13:50

Magic numbers are bad.

good_numbers = (2, 3)
a = 1
if a in good_numbers:
    print "yay"
else:
    print "boo"

pedepy · 2009-04-24 21:41:19

vkumar wrote:

And the shocking results;
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cProfile

uhhh .. win32?

BUSTED !

But seriously: maybe you need to test 'in' against different expressions, such as lists vs. tuples, and measure whether there is a greater difference with more values to compare against, etc. There might not be a big difference when searching a list of 2 or 3 values, but the difference might be more obvious when searching for a substring in a string a few thousand characters long (which is also not very elegant to do any other way).
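A rough sketch of that broader benchmark, assuming Python 3: it times "in" against a list, a tuple and a set of increasing size, plus a substring search in a long string. The sizes, probe values and the helper name time_membership are arbitrary choices for illustration; expect the list and tuple times to grow with size while the set stays roughly flat.

```python
import timeit

def time_membership(container, probe, number=1_000):
    # Total seconds for `number` evaluations of `probe in container`
    return timeit.timeit(lambda: probe in container, number=number)

for size in (3, 100, 10_000):
    data = list(range(size))
    probe = size - 1  # last element: worst case for a linear scan
    results = {
        "list": time_membership(data, probe),
        "tuple": time_membership(tuple(data), probe),
        "set": time_membership(set(data), probe),
    }
    line = "  ".join(f"{name}={secs:.5f}s" for name, secs in results.items())
    print(f"size {size:>6}: {line}")

# Substring search: "in" on strings looks for a contiguous run of characters
text = "x" * 5_000 + "needle"
print("needle" in text, time_membership(text, "needle"))
```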

Last edited by pedepy (2009-04-24 21:45:24)

Arch Linux
