Tag Archives: list-comprehension

Accessing class variables from a list comprehension within the class definition

Question: Accessing class variables from a list comprehension within the class definition

How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:

class Foo:
    x = 5
    y = [x for i in range(1)]

Python 3.2 gives the error:

NameError: global name 'x' is not defined

Trying Foo.x doesn’t work either. Any ideas on how to do this in Python 3?

A slightly more complicated motivating example:

from collections import namedtuple
class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for args in [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ]]

In this example, apply() would have been a decent workaround, but it was sadly removed from Python 3.


Answer 0

Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.

The why; or, the official word on this

In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see Python list comprehension rebind names even after scope of comprehension. Is this right?). That’s great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.

This is documented in pep 227:

Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions.

and in the class compound statement documentation:

The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.

Emphasis mine; the execution frame is the temporary scope.

Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.

Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
     a = 42
     b = list(a + i for i in range(10))

So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.

# Same error, in Python 2 or 3
y = {x: x for i in range(1)}

The (small) exception; or, why one part may still work

There’s one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it’s the range(1):

y = [x for i in range(1)]
#               ^^^^^^^^

Thus, using x in that expression would not throw an error:

# Runs fine
y = [i for i in range(x)]

This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension’s scope:

# NameError
y = [i for i in range(1) for j in range(x)]

This design decision was made so that, when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable, the error is raised at genexp creation time instead of at iteration time. Comprehensions share this behavior for consistency.
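
For illustration (my sketch, not part of the original answer), the difference shows up like this:

# The outermost iterable is evaluated eagerly, so a broken iterable
# fails right away, at creation time:
gen = (i for i in 1)        # TypeError: 'int' object is not iterable, raised here

# Errors in the body of the expression, by contrast, only surface on iteration:
gen = (1 / i for i in [0])  # no error yet
next(gen)                   # ZeroDivisionError raised here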

Looking under the hood; or, way more detail than you ever wanted

You can see this all in action using the dis module. I’m using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.

To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:

>>> import dis
>>> def foo():
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo)
  2           0 LOAD_BUILD_CLASS     
              1 LOAD_CONST               1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>) 
              4 LOAD_CONST               2 ('Foo') 
              7 MAKE_FUNCTION            0 
             10 LOAD_CONST               2 ('Foo') 
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair) 
             16 STORE_FAST               0 (Foo) 

  5          19 LOAD_FAST                0 (Foo) 
             22 RETURN_VALUE         

The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.

The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.

The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.

Let’s inspect that code object that creates the class body itself; code objects have a co_consts structure:

>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (__locals__) 
              3 STORE_LOCALS         
              4 LOAD_NAME                0 (__name__) 
              7 STORE_NAME               1 (__module__) 
             10 LOAD_CONST               0 ('foo.<locals>.Foo') 
             13 STORE_NAME               2 (__qualname__) 

  3          16 LOAD_CONST               1 (5) 
             19 STORE_NAME               3 (x) 

  4          22 LOAD_CONST               2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>) 
             25 LOAD_CONST               3 ('foo.<locals>.Foo.<listcomp>') 
             28 MAKE_FUNCTION            0 
             31 LOAD_NAME                4 (range) 
             34 LOAD_CONST               4 (1) 
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             40 GET_ITER             
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             44 STORE_NAME               5 (y) 
             47 LOAD_CONST               5 (None) 
             50 RETURN_VALUE         

The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y, is used to create the class (except that it doesn’t work, because x isn’t defined as a global). Note that after storing 5 in x, it loads another code object; that’s the list comprehension. It is wrapped in a function object just like the class body was; the created function takes one positional argument, the range(1) iterable to use for its looping code, converted to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.

From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.

Python 2.x uses inline bytecode there instead, here is output from Python 2.7:

  2           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  3           6 LOAD_CONST               0 (5)
              9 STORE_NAME               2 (x)

  4          12 BUILD_LIST               0
             15 LOAD_NAME                3 (range)
             18 LOAD_CONST               1 (1)
             21 CALL_FUNCTION            1
             24 GET_ITER            
        >>   25 FOR_ITER                12 (to 40)
             28 STORE_NAME               4 (i)
             31 LOAD_NAME                2 (x)
             34 LIST_APPEND              2
             37 JUMP_ABSOLUTE           25
        >>   40 STORE_NAME               5 (y)
             43 LOAD_LOCALS         
             44 RETURN_VALUE        

No code object is loaded, instead a FOR_ITER loop is run inline. So in Python 3.x, the list generator was given a proper code object of its own, which means it has its own scope.

However, the comprehension was compiled together with the rest of the Python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any variables referenced in a list comprehension must be looked up in the scopes surrounding the class definition, recursively. If the compiler doesn’t find the variable in any enclosing function scope, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:

>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
  4           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_GLOBAL              0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.

Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):

>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
  5           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_DEREF               0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

The LOAD_DEREF will indirectly load x from the code object’s cell objects:

>>> foo.__code__.co_cellvars               # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars  # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars  # Refers to `x` in foo
('x',)
>>> foo().y
[2]

The actual referencing looks the value up from the current frame data structures, which were initialized from a function object’s .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function’s closure. To see a closure in action, we’d have to inspect a nested function instead:

>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
... 
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5

So, to summarize:

  • List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
  • Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
  • When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn’t include the class body as a scope, the temporary function namespace is not considered.

A workaround; or, what to do about it

If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:

>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
... 
>>> Foo.y
[5]

The ‘temporary’ y function can be called directly; when we do, we replace it with its return value. Its scope is considered when resolving x:

>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)

Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.

The best work-around is to just use __init__ to create an instance variable instead:

def __init__(self):
    self.y = [self.x for i in range(1)]

and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don’t store the generated class at all), or use a global:

from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
    db = [State(*args) for args in [
       ('Alabama', 'Montgomery'),
       ('Alaska', 'Juneau'),
       # ...
    ]]

Answer 1

In my opinion it is a flaw in Python 3. I hope they change it.

Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):

class A:
    x = 4
    y = [x+i for i in range(1)]

NOTE: simply scoping it with A.x would not solve it

New Way (works in 3+):

class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()

Because the syntax is so ugly, I typically just initialize all my class variables in the constructor instead.
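
For reference, a minimal sketch of that constructor-based approach (my illustration, not part of the original answer):

class A:
    def __init__(self):
        self.x = 4
        # __init__ is an ordinary function scope, so the comprehension
        # can reach self through the enclosing scope without trouble:
        self.y = [self.x + i for i in range(1)]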


Answer 2

The accepted answer provides excellent information, but there appear to be a few other wrinkles here — differences between list comprehension and generator expressions. A demo that I played around with:

class Foo:

    # A class-level variable.
    X = 10

    # I can use that variable to define another class-level variable.
    Y = sum((X, X))

    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None

    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None

    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))

    # Then use that function.
    Z3 = compute_z3(X)

    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None

print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)

Answer 3

This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.

To illustrate why this is a bug, let’s return to the original example. This fails:

class Foo:
    x = 5
    y = [x for i in range(1)]

But this works:

def Foo():
    x = 5
    y = [x for i in range(1)]

The limitation is stated at the end of this section in the reference guide.


Answer 4

Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension’s scope:

import itertools as it

class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]

One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:

class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]

For the specific example of the OP:

from collections import namedtuple
import itertools as it

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]

Using enumerate inside a list comprehension in Python

Question: Using enumerate inside a list comprehension in Python

Let’s suppose I have a list like this:

mylist = ["a","b","c","d"]

To get the values printed along with their index I can use Python’s enumerate function like this

>>> for i,j in enumerate(mylist):
...     print i,j
...
0 a
1 b
2 c
3 d
>>>

Now, when I try to use it inside a list comprehension it gives me this error

>>> [i,j for i,j in enumerate(mylist)]
  File "<stdin>", line 1
    [i,j for i,j in enumerate(mylist)]
           ^
SyntaxError: invalid syntax

So, my question is: what is the correct way of using enumerate inside list comprehension?


Answer 0

Try this:

[(i, j) for i, j in enumerate(mylist)]

You need to put i,j inside a tuple for the list comprehension to work. Alternatively, given that enumerate() already returns a tuple, you can return it directly without unpacking it first:

[pair for pair in enumerate(mylist)]

Either way, the result that gets returned is as expected:

> [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Answer 1

Just to be really clear, this has nothing to do with enumerate and everything to do with list comprehension syntax.

This list comprehension returns a list of tuples:

[(i,j) for i in range(3) for j in 'abc']

this is a list of dicts:

[{i:j} for i in range(3) for j in 'abc']

a list of lists:

[[i,j] for i in range(3) for j in 'abc']

a syntax error:

[i,j for i in range(3) for j in 'abc']

Which is inconsistent (IMHO) and confusing when compared with dictionary comprehension syntax:

>>> {i:j for i,j in enumerate('abcdef')}
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'}

And a set of tuples:

>>> {(i,j) for i,j in enumerate('abcdef')}
set([(0, 'a'), (4, 'e'), (1, 'b'), (2, 'c'), (5, 'f'), (3, 'd')])

As Óscar López stated, you can just pass the enumerate tuple directly:

>>> [t for t in enumerate('abcdef') ] 
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

Answer 2

Or, if you don’t insist on using a list comprehension:

>>> mylist = ["a","b","c","d"]
>>> list(enumerate(mylist))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Answer 3

If you’re using long lists, it appears the list comprehension’s faster, not to mention more readable.

~$ python -mtimeit -s"mylist = ['a','b','c','d']" "list(enumerate(mylist))"
1000000 loops, best of 3: 1.61 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[(i, j) for i, j in enumerate(mylist)]"
1000000 loops, best of 3: 0.978 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[t for t in enumerate(mylist)]"
1000000 loops, best of 3: 0.767 usec per loop

Answer 4

Here’s a way to do it:

>>> mylist = ['a', 'b', 'c', 'd']
>>> [item for item in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Alternatively, you can do:

>>> [(i, j) for i, j in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

The reason you got an error was that you were missing the () around i and j to make it a tuple.


Answer 5

Be explicit about the tuples.

[(i, j) for (i, j) in enumerate(mylist)]

Answer 6

All great answers. I know the question here is specific to enumerate, but how about something like this, just another perspective:

from itertools import izip, count
a = ["5", "6", "1", "2"]
tupleList = list( izip( count(), a ) )
print(tupleList)

It becomes more powerful if one has to iterate over multiple lists in parallel. Just a thought:

a = ["5", "6", "1", "2"]
b = ["a", "b", "c", "d"]
tupleList = list( izip( count(), a, b ) )
print(tupleList)
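
Note that izip was removed in Python 3, where the built-in zip is already lazy; a Python 3 version of the same idea (my adaptation, not part of the original answer) would be:

from itertools import count

a = ["5", "6", "1", "2"]
b = ["a", "b", "c", "d"]
tupleList = list(zip(count(), a, b))
print(tupleList)  # [(0, '5', 'a'), (1, '6', 'b'), (2, '1', 'c'), (3, '2', 'd')]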

How to handle exceptions in a list comprehension?

Question: How to handle exceptions in a list comprehension?

I have a list comprehension in Python in which each iteration can throw an exception.

For instance, if I have:

eggs = (1,3,0,3,2)

[1/egg for egg in eggs]

I’ll get a ZeroDivisionError exception in the 3rd element.

How can I handle this exception and continue execution of the list comprehension?

The only way I can think of is to use a helper function:

def spam(egg):
    try:
        return 1/egg
    except ZeroDivisionError:
        # handle division by zero error
        # leave empty for now
        pass

But this looks a bit cumbersome to me.

Is there a better way to do this in Python?

Note: This is a simple example (see “for instance” above) that I contrived because my real example requires some context. I’m not interested in avoiding divide by zero errors but in handling exceptions in a list comprehension.


Answer 0

There is no built-in expression in Python that lets you ignore an exception (or return alternate values &c in case of exceptions), so it’s impossible, literally speaking, to “handle exceptions in a list comprehension” because a list comprehension is an expression containing other expressions, nothing more (i.e., no statements, and only statements can catch/ignore/handle exceptions).

Function calls are expressions, and function bodies can include all the statements you want, so delegating the evaluation of the exception-prone sub-expression to a function, as you’ve noticed, is one feasible workaround (others, when feasible, are checks on values that might provoke exceptions, as also suggested in other answers).

The correct responses to the question “how to handle exceptions in a list comprehension” all express part of this truth: 1) literally, i.e. lexically IN the comprehension itself, you can’t; 2) practically, you delegate the job to a function or check for error-prone values when that’s feasible. Your repeated claim that this is not an answer is thus unfounded.
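
To make the delegation concrete, here is the question’s own spam() helper used inside a comprehension (a sketch, assuming spam() is defined as in the question):

eggs = (1, 3, 0, 3, 2)
results = [spam(egg) for egg in eggs]  # spam() catches ZeroDivisionError internally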


Answer 1

I realize this question is quite old, but you can also create a general function to make this kind of thing easier:

def catch(func, handle=lambda e : e, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except Exception as e:
        return handle(e)

Then, in your comprehension:

eggs = (1,3,0,3,2)
[catch(lambda : 1/egg) for egg in eggs]
[1, 0, ('integer division or modulo by zero'), 0, 0]

You can of course make the default handle function whatever you want (say you’d rather return ‘None’ by default).

Hope this helps you or any future viewers of this question!

Note: in python 3, I would make the ‘handle’ argument keyword only, and put it at the end of the argument list. This would make actually passing arguments and such through catch much more natural.
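
A sketch of that keyword-only variant for Python 3 (my adaptation of the answer’s function, not the original code):

def catch(func, *args, handle=lambda e: e, **kwargs):
    # `handle` is now keyword-only, so positional arguments flow to func naturally
    try:
        return func(*args, **kwargs)
    except Exception as e:
        return handle(e)

eggs = (1, 3, 0, 3, 2)
print([catch(lambda: 1 / egg) for egg in eggs])
# The ZeroDivisionError instance appears in place of the failed item.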


Answer 2

You can use

[1/egg for egg in eggs if egg != 0]

this will simply skip elements that are zero.


Answer 3

No, there’s not a better way. In a lot of cases you can use avoidance, like Peter does.

Your other option is to not use comprehensions

eggs = (1,3,0,3,2)

result=[]
for egg in eggs:
    try:
        result.append(1/egg)
    except ZeroDivisionError:
        # handle division by zero error
        # leave empty for now
        pass

Up to you to decide whether that is more cumbersome or not


Answer 4

I think a helper function, as suggested by the original asker and by Bryan Head, is good and not cumbersome at all. A single line of magic code that does all the work is just not always possible, so a helper function is a perfect solution if one wants to avoid for loops. However, I would modify it to this one:

# A modified version of the helper function by the Question starter 
def spam(egg):
    try:
        return 1/egg, None
    except ZeroDivisionError as err:
        # handle division by zero error        
        return None, err

The output will be this [(1/1, None), (1/3, None), (None, ZeroDivisionError), (1/3, None), (1/2, None)]. With this answer you are in full control to continue in any way you want.

Alternative:

def spam2(egg):
    try:
        return 1/egg 
    except ZeroDivisionError:
        # handle division by zero error        
        return ZeroDivisionError

Yes, the error is returned, not raised.
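
A short usage sketch (mine, not the answerer’s) showing how the (value, error) pairs can be split apart afterwards:

eggs = (1, 3, 0, 3, 2)
results = [spam(egg) for egg in eggs]                      # spam as defined above
values = [val for val, err in results if err is None]      # keep only the successes
errors = [err for val, err in results if err is not None]  # collect the failures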


Answer 5

I didn’t see any answer mention this. But this example would be one way of preventing an exception from being raised for known failing cases.

eggs = (1,3,0,3,2)
[1/egg if egg > 0 else None for egg in eggs]


Output: [1, 0, None, 0, 0]

List comprehension rebinds names even after the scope of the comprehension. Is this right?

Question: List comprehension rebinds names even after the scope of the comprehension. Is this right?

Comprehensions have some unexpected interactions with scoping. Is this the expected behavior?

I’ve got a method:

def leave_room(self, uid):
  u = self.user_by_id(uid)
  r = self.rooms[u.rid]

  other_uids = [ouid for ouid in r.users_by_id.keys() if ouid != u.uid]
  other_us = [self.user_by_id(uid) for uid in other_uids]

  r.remove_user(uid) # OOPS! uid has been re-bound by the list comprehension above

  # Interestingly, it's rebound to the last uid in the list, so the error only shows
  # up when len > 1

At the risk of whining, this is a brutal source of errors. As I write new code, I just occasionally find very weird errors due to rebinding — even now that I know it’s a problem. I need to make a rule like “always preface temp vars in list comprehensions with underscore”, but even that’s not fool-proof.

The fact that there’s this random time-bomb waiting kind of negates all the nice “ease of use” of list comprehensions.


Answer 0

List comprehensions leak the loop control variable in Python 2 but not in Python 3. Here’s Guido van Rossum (creator of Python) explaining the history behind this:

We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension “leaks” the loop control variable into the surrounding scope:

x = 'before'
a = [x for x in 1, 2, 3]
print x # this prints '3', not 'before'

This was an artifact of the original implementation of list comprehensions; it was one of Python’s “dirty little secrets” for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.

However, in Python 3, we decided to fix the “dirty little secret” of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print ‘before’, proving that the ‘x’ in the list comprehension temporarily shadows but does not override the ‘x’ in the surrounding scope.


Answer 1

Yes, list comprehensions “leak” their variable in Python 2.x, just like for loops.

In retrospect, this was recognized to be a mistake, and it was avoided with generator expressions. EDIT: As Matt B. notes it was also avoided when set and dictionary comprehension syntaxes were backported from Python 3.

List comprehensions’ behavior had to be left as it is in Python 2, but it’s fully fixed in Python 3.

This means that in all of:

list(x for x in a if x>32)
set(x//4 for x in a if x>32)         # just another generator exp.
dict((x, x//16) for x in a if x>32)  # yet another generator exp.
{x//4 for x in a if x>32}            # 2.7+ syntax
{x: x//16 for x in a if x>32}        # 2.7+ syntax

the x is always local to the expression while these:

[x for x in a if x>32]
set([x//4 for x in a if x>32])         # just another list comp.
dict([(x, x//16) for x in a if x>32])  # yet another list comp.

in Python 2.x all leak the x variable to the surrounding scope.


UPDATE for Python 3.8(?): PEP 572 will introduce := assignment operator that deliberately leaks out of comprehensions and generator expressions! It’s motivated by essentially 2 use cases: capturing a “witness” from early-terminating functions like any() and all():

if any((comment := line).startswith('#') for line in lines):
    print("First comment:", comment)
else:
    print("There are no comments")

and updating mutable state:

total = 0
partial_sums = [total := total + v for v in values]

See Appendix B for exact scoping. The variable is assigned in the closest surrounding def or lambda, unless that function declares it nonlocal or global.


Answer 2

Yes, assignment occurs there, just like it would in a for loop. No new scope is being created.

This is definitely the expected behavior: on each cycle, the value is bound to the name you specify. For instance,

>>> x=0
>>> a=[1,54,4,2,32,234,5234,]
>>> [x for x in a if x>32]
[54, 234, 5234]
>>> x
5234

Once that’s recognized, it seems easy enough to avoid: don’t use existing names for the variables within comprehensions.


Answer 3

Interestingly this doesn’t affect dictionary or set comprehensions.

>>> [x for x in range(1, 10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
9
>>> {x for x in range(1, 5)}
set([1, 2, 3, 4])
>>> x
9
>>> {x:x for x in range(1, 100)}
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 24: 24, 25: 25, 26: 26, 27: 27, 28: 28, 29: 29, 30: 30, 31: 31, 32: 32, 33: 33, 34: 34, 35: 35, 36: 36, 37: 37, 38: 38, 39: 39, 40: 40, 41: 41, 42: 42, 43: 43, 44: 44, 45: 45, 46: 46, 47: 47, 48: 48, 49: 49, 50: 50, 51: 51, 52: 52, 53: 53, 54: 54, 55: 55, 56: 56, 57: 57, 58: 58, 59: 59, 60: 60, 61: 61, 62: 62, 63: 63, 64: 64, 65: 65, 66: 66, 67: 67, 68: 68, 69: 69, 70: 70, 71: 71, 72: 72, 73: 73, 74: 74, 75: 75, 76: 76, 77: 77, 78: 78, 79: 79, 80: 80, 81: 81, 82: 82, 83: 83, 84: 84, 85: 85, 86: 86, 87: 87, 88: 88, 89: 89, 90: 90, 91: 91, 92: 92, 93: 93, 94: 94, 95: 95, 96: 96, 97: 97, 98: 98, 99: 99}
>>> x
9

However it has been fixed in 3 as noted above.


Answer 4

A workaround for Python 2.6, when this behaviour is not desirable:

# python
Python 2.6.6 (r266:84292, Aug  9 2016, 06:11:56)
Type "help", "copyright", "credits" or "license" for more information.
>>> x=0
>>> a=list(x for x in xrange(9))
>>> x
0
>>> a=[x for x in xrange(9)]
>>> x
8

Answer 5

In Python 3, a variable used inside a list comprehension is not changed after the comprehension’s scope is over, but with a simple for loop the variable is reassigned outside its scope:

i = 1
print(i)
print([i for i in range(5)])
print(i)

The value of i will remain 1.

Now just use a simple for loop instead and the value of i will be reassigned.
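
For contrast, a minimal sketch of the for-loop case:

i = 1
for i in range(5):
    pass
print(i)  # 4: the for loop rebinds i in the enclosing scope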


Pythonic way to print list items

Question: Pythonic way to print list items

I would like to know if there is a better way to print all objects in a Python list than this:

myList = [Person("Foo"), Person("Bar")]
print("\n".join(map(str, myList)))
Foo
Bar

I read that this way is not really good:

myList = [Person("Foo"), Person("Bar")]
for p in myList:
    print(p)

Isn’t there something like:

print(p) for p in myList

If not, my question is… why? If we can do this kind of stuff with list comprehensions, why not as a simple statement outside a list?


Answer 0

Assuming you are using Python 3.x:

print(*myList, sep='\n')

You can get the same behavior on Python 2.x using from __future__ import print_function, as noted by mgilson in comments.

With the print statement on Python 2.x you will need iteration of some kind. Regarding your question about print(p) for p in myList not working, you can just use the following, which does the same thing and is still one line:

for p in myList: print p

For a solution that uses '\n'.join(), I prefer list comprehensions and generators over map() so I would probably use the following:

print '\n'.join(str(p) for p in myList) 

Answer 1

I use this all the time :

#!/usr/bin/python

l = [1,2,3,7] 
print "".join([str(x) for x in l])

Answer 2

[print(a) for a in list] will give a bunch of None values at the end, though it does print out all the items


Answer 3

For Python 2.*:

If you define __str__() for your Person class, you can omit the map(str, …) part. Another way is to create a function, just like you wrote:

def write_list(lst):
    for item in lst:
        print str(item) 

...

write_list(MyList)

In Python 3.* there is the sep argument for the print() function. Take a look at the documentation.


Answer 4

Expanding @lucasg’s answer (inspired by the comment it received):

To get a formatted list output, you can do something along these lines:

l = [1,2,5]
print ", ".join('%02d'%x for x in l)

01, 02, 05

Now the ", " provides the separator (only between items, not at the end), and the format string '%02d' applied to each item x via the % operator gives a formatted string – in this case an integer rendered with two digits, left-padded with zeros.


Answer 5

To display each content, I use:

mylist = ['foo', 'bar']
indexval = 0
for i in range(len(mylist)):     
    print(mylist[indexval])
    indexval += 1

Example of using in a function:

def showAll(listname, startat):
   indexval = startat
   try:
      for i in range(len(listname)):
         print(listname[indexval])
         indexval = indexval + 1
   except IndexError:
      print('That index value you gave is out of range.')

Hope I helped.


Answer 6

I think this is the most convenient if you just want to see the content in the list:

myList = ['foo', 'bar']
print('myList is %s' % str(myList))

Simple, easy to read and can be used together with format string.
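
On Python 3.6+, the same thing can also be written with an f-string (my addition, not part of the original answer):

myList = ['foo', 'bar']
print(f'myList is {myList}')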


Answer 7

OP’s question is: does something like the following exist, and if not, why?

print(p) for p in myList # doesn't work, OP's intuition

The answer is, it does exist:

[p for p in myList] #works perfectly

Basically, use [] for the list comprehension and get rid of print to avoid printing None. To see why print returns None, see this


Answer 8

我最近制作了一个密码生成器,尽管我对python还是很陌生,但我还是想把它作为一种显示列表中所有项目的方式(进行一些小的修改即可满足您的需要…

    x = 0
    up = 0
    passwordText = ""
    password = []
    userInput = int(input("Enter how many characters you want your password to be: "))
    print("\n\n\n") # spacing

    while x <= (userInput - 1): #loops as many times as the user inputs above
            password.extend([choice(groups.characters)]) #adds random character from groups file that has all lower/uppercase letters and all numbers
            x = x+1 #adds 1 to x w/o using x ++1 as I get many errors w/ that
            passwordText = passwordText + password[up]
            up = up+1 # same as x increase


    print(passwordText)

就像我说的,我对Python非常陌生,我相信在专家看来这种方式很笨拙,但我只是在这里提供另一个例子

I recently made a password generator and although I’m VERY NEW to python, I whipped this up as a way to display all items in a list (with small edits to fit your needs…

    from random import choice  # needed for choice() below
    # Note: 'groups' is the author's own module; groups.characters is
    # assumed to hold all allowed letters and digits.
    x = 0
    up = 0
    passwordText = ""
    password = []
    userInput = int(input("Enter how many characters you want your password to be: "))
    print("\n\n\n") # spacing

    while x <= (userInput - 1): #loops as many times as the user inputs above
            password.extend([choice(groups.characters)]) #adds random character from groups file that has all lower/uppercase letters and all numbers
            x = x+1 #adds 1 to x w/o using x ++1 as I get many errors w/ that
            passwordText = passwordText + password[up]
            up = up+1 # same as x increase


    print(passwordText)

Like I said, I'm VERY NEW to Python and I'm sure this is way too clunky for an expert, but I'm just here to give another example


回答 9

假设您可以接受列表以[1,2,3]这样的形式打印出来,那么Python3中的一种简单方法是:

mylist=[1,2,3,'lorem','ipsum','dolor','sit','amet']

print(f"There are {len(mylist):d} items in this lorem list: {str(mylist):s}")

运行此命令将产生以下输出:

There are 8 items in this lorem list: [1, 2, 3, 'lorem', 'ipsum', 'dolor', 'sit', 'amet']

Assuming you are fine with your list being printed [1,2,3], then an easy way in Python3 is:

mylist=[1,2,3,'lorem','ipsum','dolor','sit','amet']

print(f"There are {len(mylist):d} items in this lorem list: {str(mylist):s}")

Running this produces the following output:

There are 8 items in this lorem list: [1, 2, 3, 'lorem', 'ipsum', 'dolor', 'sit', 'amet']


使用列表推导只是副作用是Pythonic吗?

问题:使用列表推导只是副作用是Pythonic吗?

考虑一个我为了其副作用而调用的函数,而不是为了其返回值(例如打印到屏幕、更新GUI、写入文件等)。

def fun_with_side_effects(x):
    ...side effects...
    return y

现在,使用列表推导来调用这个函数是Pythonic的吗:

[fun_with_side_effects(x) for x in y if (...conditions...)]

请注意,我不会将列表保存在任何地方

还是我应该这样称呼这个函数:

for x in y:
    if (...conditions...):
        fun_with_side_effects(x)

哪个更好?为什么?

Think about a function that I’m calling for its side effects, not return values (like printing to screen, updating GUI, printing to a file, etc.).

def fun_with_side_effects(x):
    ...side effects...
    return y

Now, is it Pythonic to use list comprehensions to call this func:

[fun_with_side_effects(x) for x in y if (...conditions...)]

Note that I don’t save the list anywhere

Or should I call this func like this:

for x in y:
    if (...conditions...):
        fun_with_side_effects(x)

Which is better and why?


回答 0

这样做是非常反Python的,任何经验丰富的Pythonista都会为您带来麻烦。中间列表在创建之后就被丢弃了,它可能非常大,因此创建起来很昂贵。

It is very anti-Pythonic to do so, and any seasoned Pythonista will give you hell over it. The intermediate list is thrown away after it is created, and it could potentially be very, very large, and therefore expensive to create.


回答 1

您不应该使用列表理解,因为正如人们所说的那样,这将建立您不需要的大型临时列表。以下两种方法是等效的:

consume(side_effects(x) for x in xs)

for x in xs:
    side_effects(x)

其中consume的定义来自itertools文档:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

当然,后者更加清晰易懂。

You shouldn’t use a list comprehension, because as people have said that will build a large temporary list that you don’t need. The following two methods are equivalent:

consume(side_effects(x) for x in xs)

for x in xs:
    side_effects(x)

with the definition of consume from the itertools man page:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

Of course, the latter is clearer and easier to understand.
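For illustration, a minimal usage sketch of the consume approach, assuming the definition above (side_effects here is a hypothetical stand-in):

def side_effects(x):
    print('processing', x)  # stand-in for any side effect

# Drive the generator to exhaustion without building a list.
consume(side_effects(x) for x in range(3))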


回答 2

列表推导用于创建列表。并且,除非您实际创建列表,否则不应使用列表推导。

因此,我将选择第二个选项,即遍历列表,然后在条件适用时调用该函数。

List comprehensions are for creating lists. And unless you are actually creating a list, you should not use list comprehensions.

So I would go for the second option: just iterate over the list and call the function when the conditions apply.


回答 3

第二更好。

想想需要了解您的代码的人。您可以通过第一个方法轻松获得不良业障:)

您可以使用filter()在两者之间折中。考虑以下示例:

y=[1,2,3,4,5,6]
def func(x):
    print "call with %r"%x

for x in filter(lambda x: x>3, y):
    func(x)

Second is better.

Think of the person who would need to understand your code. You can get bad karma easily with the first :)

You could go middle between the two by using filter(). Consider the example:

y=[1,2,3,4,5,6]
def func(x):
    print "call with %r"%x

for x in filter(lambda x: x>3, y):
    func(x)

回答 4

取决于您的目标。

如果尝试对列表中的每个对象执行某些操作,则应采用第二种方法。

如果您尝试从另一个列表生成列表,则可以使用列表理解。

显式胜于隐式。简单胜于复杂。(Python Zen)

Depends on your goal.

If you are trying to do some operation on each object in a list, the second approach should be adopted.

If you are trying to generate a list from another list, you may use list comprehension.

Explicit is better than implicit. Simple is better than complex. (Python Zen)


回答 5

你可以做

for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass

但是不是很漂亮

You can do

for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass

but it’s not very pretty.


回答 6

使用列表理解来产生副作用是丑陋的,非Pythonic的,效率低下的,我不会这样做。我将使用for循环代替,因为for循环表示一种过程样式,其中副作用很重要。

但是,如果您绝对坚持为了副作用而使用列表推导,则应改用生成器表达式来避免低效。如果您绝对坚持这种风格,请使用以下两种写法之一:

any(fun_with_side_effects(x) and False for x in y if (...conditions...))

要么:

all(fun_with_side_effects(x) or True for x in y if (...conditions...))

这些是生成器表达式,它们不会生成一个用完即扔的列表。我认为all的形式也许稍微清晰一些,尽管我认为两者都令人困惑,不应该使用。

我认为这很丑陋,实际上我不会在代码中这样做。但是,如果您坚持以这种方式实现循环,那就是我要这样做的方式。

我倾向于认为列表理解和它们的同类应该表明尝试使用至少略带功能性风格的东西。放置带有破坏该假设的副作用的东西将导致人们不得不更仔细地阅读您的代码,我认为这是一件坏事。

Using a list comprehension for its side effects is ugly, non-Pythonic, inefficient, and I wouldn’t do it. I would use a for loop instead, because a for loop signals a procedural style in which side-effects are important.

But, if you absolutely insist on using a list comprehension for its side effects, you should avoid the inefficiency by using a generator expression instead. If you absolutely insist on this style, do one of these two:

any(fun_with_side_effects(x) and False for x in y if (...conditions...))

or:

all(fun_with_side_effects(x) or True for x in y if (...conditions...))

These are generator expressions, and they do not generate a random list that gets tossed out. I think the all form is perhaps slightly more clear, though I think both of them are confusing and shouldn’t be used.

I think this is ugly and I wouldn’t actually do it in code. But if you insist on implementing your loops in this fashion, that’s how I would do it.

I tend to feel that list comprehensions and their ilk should signal an attempt to use something at least faintly resembling a functional style. Putting things with side effects that break that assumption will cause people to have to read your code more carefully, and I think that’s a bad thing.
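To see why the 'and False' / 'or True' guards are needed: any() stops at the first truthy value and all() stops at the first falsy one, so the guards prevent short-circuiting. A minimal sketch (noisy is a hypothetical stand-in):

def noisy(x):
    print('side effect on', x)
    return x  # the return value is irrelevant here

# 'and False' makes every element falsy, so any() never stops early.
result_any = any(noisy(x) and False for x in [0, 1, 2])  # visits all three, returns False
# 'or True' makes every element truthy, so all() never stops early.
result_all = all(noisy(x) or True for x in [0, 1, 2])    # visits all three, returns True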


熊猫中的for循环真的不好吗?我什么时候应该在意?

问题:熊猫中的for循环真的不好吗?我什么时候应该在意?

for循环真的"坏"吗?如果不是,在什么情况下它们会比使用更常规的"向量化"方法更好?1

我熟悉“矢量化”的概念,以及熊猫如何利用矢量化技术来加快计算速度。向量化功能在整个系列或DataFrame上广播操作,以实现比传统上迭代数据快得多的加速。

但是,我很惊讶地看到很多代码(包括来自Stack Overflow的答案)提供的解决方案涉及使用for循环和列表推导来遍历数据。文档和API说循环是"不好的",并且"绝不应该"迭代数组、序列或DataFrame。那么,为什么有时我会看到用户建议基于循环的解决方案?


1 - 虽然这个问题听起来有些宽泛,但事实是,在某些非常特定的情况下,for循环通常比常规地迭代数据更好。本文旨在为后人记录这些情况。

Are for loops really “bad”? If not, in what situation(s) would they be better than using a more conventional “vectorized” approach?1

I am familiar with the concept of “vectorization”, and how pandas employs vectorized techniques to speed up computation. Vectorized functions broadcast operations over the entire series or DataFrame to achieve speedups much greater than conventionally iterating over the data.

However, I am quite surprised to see a lot of code (including from answers on Stack Overflow) offering solutions to problems that involve looping through data using for loops and list comprehensions. The documentation and API say that loops are “bad”, and that one should “never” iterate over arrays, series, or DataFrames. So, how come I sometimes see users suggesting loop-based solutions?


1 – While it is true that the question sounds somewhat broad, the truth is that there are very specific situations when for loops are usually better than conventionally iterating over data. This post aims to capture this for posterity.


回答 0

TLDR;不,for循环并不是一概而论的"坏",至少并非总是如此。更准确的说法也许是:某些向量化操作比迭代慢,而不是迭代比某些向量化操作快。知道何时以及为什么,是让代码发挥最大性能的关键。简而言之,在以下情况下,值得考虑向量化熊猫函数的替代方案:

  1. 当您的数据很小时(…取决于您的工作),
  2. 处理object/ mixed dtypes时
  3. 使用str/ regex访问器功能时

让我们分别检查这些情况。


小数据上的迭代v / s矢量化

熊猫在其API设计中遵循"约定优于配置"的方法。这意味着同一套API被设计为能适应广泛的数据和用例。

调用pandas函数时,该函数必须在内部处理以下各项(除其他事项外),以确保工作正常

  1. 索引/轴对齐
  2. 处理混合数据类型
  3. 处理丢失的数据

几乎每个函数都必须在不同程度上处理这些问题,这带来了开销。数字函数(例如Series.add)的开销较少,而字符串函数(例如Series.str.replace)的开销更为明显。

另一方面,for循环比您想象的要快。更好的是,列表推导(即通过for循环创建列表)甚至更快,因为它们是为创建列表而优化的迭代机制。

列表理解遵循模式

[f(x) for x in seq]

其中seq是熊猫Series或DataFrame的列。或者,当对多列进行操作时,

[f(x, y) for x, y in zip(seq1, seq2)]

其中seq1和seq2是列。

数值比较
考虑一个简单的布尔索引操作。这里将列表推导方法与Series.ne(!=)和query进行了计时比较。函数如下:

# Boolean indexing with Numeric value comparison.
df[df.A != df.B]                            # vectorized !=
df.query('A != B')                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

为简单起见,我在本文中使用perfplot包来运行所有timeit测试。上述操作的计时如下:

对于中等大小的N,列表推导胜过query;对于很小的N,它甚至胜过向量化的不等比较。不幸的是,列表推导是线性扩展的,因此对于较大的N,它不会带来太多性能提升。

注意
值得一提的是,列表理解的许多好处来自于不必担心索引对齐,但是这意味着,如果您的代码依赖于索引对齐,则此操作会中断。在某些情况下,可以将对基础NumPy数组的矢量化操作视为“两全其美”,从而实现了矢量化,而没有熊猫函数的所有不必要开销。这意味着您可以将上面的操作重写为

df[df.A.values != df.B.values]

它的性能优于熊猫和列表理解同等物:

NumPy矢量化不在本文讨论范围之内,但是如果性能很重要,则绝对值得考虑。

值计数
再举一个例子,这次使用另一个比for循环更快的原生python构造collections.Counter。一个常见的需求是计算值的出现次数并把结果作为字典返回。这可以通过value_counts、np.unique以及Counter来完成:

# Value Counts comparison.
ser.value_counts(sort=False).to_dict()           # value_counts
dict(zip(*np.unique(ser, return_counts=True)))   # np.unique
Counter(ser)                                     # Counter

结果更明显:在更大的小N范围内(约3500以内),Counter胜过两种向量化方法。

注意
更多琐事(由@user2357112提供)。Counter是用C加速器实现的,因此尽管它仍必须处理python对象而不是底层的C数据类型,它仍然比for循环快。Python的力量!

当然,这里的要点是:性能取决于您的数据和用例。这些示例的目的是说服您不要把这些解决方案排除在合法选项之外。如果这些仍然不能满足您的性能需求,那么总还有cython和numba。让我们把下面这个测试也加进来。

import numpy as np  # needed for np.array below
from numba import njit, prange

@njit(parallel=True)
def get_mask(x, y):
    result = [False] * len(x)
    for i in prange(len(x)):
        result[i] = x[i] != y[i]

    return np.array(result)

df[get_mask(df.A.values, df.B.values)] # numba

Numba可以将循环式的python代码JIT编译为非常强大的向量化代码。了解如何让numba发挥作用需要一个学习过程。


混合/ objectdtype操作

基于字符串的比较
重新审视第一部分的过滤示例:如果要比较的列是字符串怎么办?考虑上面相同的3个函数,但将输入DataFrame强制转换为字符串。

# Boolean indexing with string value comparison.
df[df.A != df.B]                            # vectorized !=
df.query('A != B')                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

那么,发生了什么变化?这里要注意的是,字符串操作本质上难以向量化。Pandas将字符串视为对象,并且对对象的所有操作都会回退到缓慢,循环的实现中。

现在,由于这种循环式实现被上述所有开销包围,因此即使这些解决方案的扩展趋势相同,它们之间也存在一个恒定的量级差距。

当涉及对可变/复杂对象的操作时,没有比较。列表理解胜过所有涉及字典和列表的操作。

通过键访问字典值
以下是从字典列中提取值的两个操作的计时:map和列表推导。测试设置见附录"代码段"一节。

# Dictionary value extraction.
ser.map(operator.itemgetter('value'))     # map
pd.Series([x.get('value') for x in ser])  # list comprehension


位置列表索引
以下是从列表列中提取第0个元素(并处理异常)的3个操作的计时:map、str访问器方法和列表推导:

# List positional indexing. 
def get_0th(lst):
    try:
        return lst[0]
    # Handle empty lists and NaNs gracefully.
    except (IndexError, TypeError):
        return np.nan

ser.map(get_0th)                                          # map
ser.str[0]                                                # str accessor
pd.Series([x[0] if len(x) > 0 else np.nan for x in ser])  # list comp
pd.Series([get_0th(x) for x in ser])                      # list comp safe

注意
如果索引很重要,则需要执行以下操作:

pd.Series([...], index=ser.index)

在重建Series时。

列表扁平化
最后一个例子是扁平化列表。这是另一个常见问题,它演示了纯python在这里有多么强大。

# Nested list flattening.
pd.DataFrame(ser.tolist()).stack().reset_index(drop=True)  # stack
pd.Series(list(chain.from_iterable(ser.tolist())))         # itertools.chain
pd.Series([y for x in ser for y in x])                     # nested list comp

itertools.chain.from_iterable和嵌套列表推导都是纯Python构造,它们的扩展性都比stack解决方案好得多。

这些计时有力地表明,熊猫并不适合处理混合dtypes,您可能应该避免用它来做这件事。数据应尽可能以标量值(整数/浮点数/字符串)的形式存放在单独的列中。

最后,这些解决方案的适用性在很大程度上取决于您的数据。因此,最好的办法是先决定对数据进行这些操作,然后再决定要做什么。请注意,我尚未apply对这些解决方案计时,因为它会使图形倾斜(是​​的,那太慢了)。


正则表达式操作和访问器.str方法

熊猫可以在字符串列上应用正则表达式操作,例如str.contains、str.extract和str.extractall,以及其他"向量化"的字符串操作(例如str.split、str.find、str.translate等)。这些函数比列表推导慢,更多是作为便利函数而存在。

使用re.compile预编译正则表达式模式并遍历数据通常要快得多(另请参阅:使用Python的re.compile是否值得?)。等效于str.contains的列表推导看起来像这样:

p = re.compile(...)
ser2 = pd.Series([x for x in ser if p.search(x)])

要么,

ser2 = ser[[bool(p.search(x)) for x in ser]]

如果您需要处理NaN,则可以执行以下操作

ser[[bool(p.search(x)) if pd.notnull(x) else False for x in ser]]

等效于str.extract(无分组)的列表推导看起来像:

df['col2'] = [p.search(x).group(0) for x in df['col']]

如果您需要处理不匹配和NaN,则可以使用自定义函数(速度更快!):

def matcher(x):
    m = p.search(str(x))
    if m:
        return m.group(0)
    return np.nan

df['col2'] = [matcher(x) for x in df['col']]

matcher函数是非常可扩展的。根据需要,它可以改为返回每个捕获组的列表;只需查询匹配对象的group或groups属性即可。

对于str.extractall,请更改p.searchp.findall

字符串提取
考虑一个简单的过滤操作。目标是提取紧跟在大写字母之后的4位数字。

# Extracting strings.
p = re.compile(r'(?<=[A-Z])(\d{4})')
def matcher(x):
    m = p.search(x)
    if m:
        return m.group(0)
    return np.nan

ser.str.extract(r'(?<=[A-Z])(\d{4})', expand=False)   #  str.extract
pd.Series([matcher(x) for x in ser])                  #  list comprehension

更多示例
完全公开-我是以下列出的这些帖子的作者(部分或全部)。


结论

如上面的示例所示,在处理小规模的DataFrame、混合数据类型和正则表达式时,迭代表现出色。

您获得的提速取决于您的数据和问题,因此里程可能会有所不同。最好的办法是仔细运行测试,看看是否值得付出努力。

“向量化”功能的优点在于其简单性和可读性,因此,如果性能不是很关键,则绝对应该首选这些功能。

另一个注意事项是:某些字符串操作受到一些有利于使用NumPy的约束。以下是两个示例,其中精心的NumPy向量化胜过python:

此外,有时仅通过.values在底层数组上操作(而不是在Series或DataFrame上),就可以为大多数常见场景提供足够可观的加速(请参见上面"数值比较"部分的"注意")。例如,df[df.A.values != df.B.values]会比df[df.A != df.B]立即显示出性能提升。.values可能并非在每种情况下都适用,但这是一个值得了解的有用技巧。

如上所述,由您决定这些解决方案是否值得实施。


附录:代码段

import perfplot  
import operator 
import pandas as pd
import numpy as np
import re

from collections import Counter
from itertools import chain

# Boolean indexing with Numeric value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=['A','B']),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query('A != B'),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
        lambda df: df[get_mask(df.A.values, df.B.values)]
    ],
    labels=['vectorized !=', 'query (numexpr)', 'list comp', 'numba'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N'
)

# Value Counts comparison.
perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(1000, n)),
    kernels=[
        lambda ser: ser.value_counts(sort=False).to_dict(),
        lambda ser: dict(zip(*np.unique(ser, return_counts=True))),
        lambda ser: Counter(ser),
    ],
    labels=['value_counts', 'np.unique', 'Counter'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=lambda x, y: dict(x) == dict(y)
)

# Boolean indexing with string value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=['A','B'], dtype=str),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query('A != B'),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
    ],
    labels=['vectorized !=', 'query (numexpr)', 'list comp'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# Dictionary value extraction.
ser1 = pd.Series([{'key': 'abc', 'value': 123}, {'key': 'xyz', 'value': 456}])
perfplot.show(
    setup=lambda n: pd.concat([ser1] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(operator.itemgetter('value')),
        lambda ser: pd.Series([x.get('value') for x in ser]),
    ],
    labels=['map', 'list comprehension'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# List positional indexing. 
ser2 = pd.Series([['a', 'b', 'c'], [1, 2], []])        
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(get_0th),
        lambda ser: ser.str[0],
        lambda ser: pd.Series([x[0] if len(x) > 0 else np.nan for x in ser]),
        lambda ser: pd.Series([get_0th(x) for x in ser]),
    ],
    labels=['map', 'str accessor', 'list comprehension', 'list comp safe'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# Nested list flattening.
ser3 = pd.Series([['a', 'b', 'c'], ['d', 'e'], ['f', 'g']])
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: pd.DataFrame(ser.tolist()).stack().reset_index(drop=True),
        lambda ser: pd.Series(list(chain.from_iterable(ser.tolist()))),
        lambda ser: pd.Series([y for x in ser for y in x]),
    ],
    labels=['stack', 'itertools.chain', 'nested list comp'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',    
    equality_check=None

)

# Extracting strings.
ser4 = pd.Series(['foo xyz', 'test A1234', 'D3345 xtz'])
perfplot.show(
    setup=lambda n: pd.concat([ser4] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.str.extract(r'(?<=[A-Z])(\d{4})', expand=False),
        lambda ser: pd.Series([matcher(x) for x in ser])
    ],
    labels=['str.extract', 'list comprehension'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

TLDR; No, for loops are not blanket “bad”, at least, not always. It is probably more accurate to say that some vectorized operations are slower than iterating, versus saying that iteration is faster than some vectorized operations. Knowing when and why is key to getting the most performance out of your code. In a nutshell, these are the situations where it is worth considering an alternative to vectorized pandas functions:

  1. When your data is small (…depending on what you’re doing),
  2. When dealing with object/mixed dtypes
  3. When using the str/regex accessor functions

Let’s examine these situations individually.


Iteration v/s Vectorization on Small Data

Pandas follows a “Convention Over Configuration” approach in its API design. This means that the same API has been fitted to cater to a broad range of data and use cases.

When a pandas function is called, the following things (among others) must be handled internally by the function, to ensure things work properly:

  1. Index/axis alignment
  2. Handling mixed datatypes
  3. Handling missing data

Almost every function will have to deal with these to varying extents, and this presents an overhead. The overhead is less for numeric functions (for example, Series.add), while it is more pronounced for string functions (for example, Series.str.replace).

for loops, on the other hand, are faster than you think. What's even better is that list comprehensions (which create lists through for loops) are faster still, as they are optimized iterative mechanisms for list creation.

List comprehensions follow the pattern

[f(x) for x in seq]

Where seq is a pandas series or DataFrame column. Or, when operating over multiple columns,

[f(x, y) for x, y in zip(seq1, seq2)]

Where seq1 and seq2 are columns.
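As a concrete illustration of the two patterns (a minimal sketch with made-up data):

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Single column: apply f to each value.
squares = [x ** 2 for x in df['A']]               # [1, 4, 9]
# Multiple columns: zip the columns together.
sums = [x + y for x, y in zip(df['A'], df['B'])]  # [5, 7, 9]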

Numeric Comparison
Consider a simple boolean indexing operation. The list comprehension method has been timed against Series.ne (!=) and query. Here are the functions:

# Boolean indexing with Numeric value comparison.
df[df.A != df.B]                            # vectorized !=
df.query('A != B')                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

For simplicity, I have used the perfplot package to run all the timeit tests in this post. The timings for the operations above are below:

The list comprehension outperforms query for moderately sized N, and even outperforms the vectorized not equals comparison for tiny N. Unfortunately, the list comprehension scales linearly, so it does not offer much performance gain for larger N.

Note
It is worth mentioning that much of the benefit of list comprehension comes from not having to worry about index alignment, but this means that if your code is dependent on index alignment, this approach will break. In some cases, vectorised operations over the underlying NumPy arrays can be considered as bringing in the "best of both worlds", allowing for vectorisation without all the unneeded overhead of the pandas functions. This means that you can rewrite the operation above as

df[df.A.values != df.B.values]

Which outperforms both the pandas and list comprehension equivalents:

NumPy vectorization is out of the scope of this post, but it is definitely worth considering, if performance matters.

Value Counts
Taking another example – this time, with another vanilla python construct that is faster than a for loop – collections.Counter. A common requirement is to compute the value counts and return the result as a dictionary. This is done with value_counts, np.unique, and Counter:

# Value Counts comparison.
ser.value_counts(sort=False).to_dict()           # value_counts
dict(zip(*np.unique(ser, return_counts=True)))   # np.unique
Counter(ser)                                     # Counter

The results are more pronounced: Counter wins out over both vectorized methods across a larger range of small N (up to ~3500).

Note
More trivia (courtesy @user2357112). The Counter is implemented with a C accelerator, so while it still has to work with python objects instead of the underlying C datatypes, it is still faster than a for loop. Python power!

Of course, the takeaway here is that the performance depends on your data and use case. The point of these examples is to convince you not to rule out these solutions as legitimate options. If these still don't give you the performance you need, there are always cython and numba. Let's add this test into the mix.

import numpy as np  # needed for np.array below
from numba import njit, prange

@njit(parallel=True)
def get_mask(x, y):
    result = [False] * len(x)
    for i in prange(len(x)):
        result[i] = x[i] != y[i]

    return np.array(result)

df[get_mask(df.A.values, df.B.values)] # numba

Numba offers JIT compilation of loopy python code to very powerful vectorized code. Understanding how to make numba work involves a learning curve.


Operations with Mixed/object dtypes

String-based Comparison
Revisiting the filtering example from the first section, what if the columns being compared are strings? Consider the same 3 functions above, but with the input DataFrame cast to string.

# Boolean indexing with string value comparison.
df[df.A != df.B]                            # vectorized !=
df.query('A != B')                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

So, what changed? The thing to note here is that string operations are inherently difficult to vectorize. Pandas treats strings as objects, and all operations on objects fall back to a slow, loopy implementation.

Now, because this loopy implementation is surrounded by all the overhead mentioned above, there is a constant magnitude difference between these solutions, even though they scale the same.

When it comes to operations on mutable/complex objects, there is no comparison. List comprehension outperforms all operations involving dicts and lists.

Accessing Dictionary Value(s) by Key
Here are timings for two operations that extract a value from a column of dictionaries: map and the list comprehension. The setup is in the Appendix, under the heading “Code Snippets”.

# Dictionary value extraction.
ser.map(operator.itemgetter('value'))     # map
pd.Series([x.get('value') for x in ser])  # list comprehension

Positional List Indexing
Timings for 3 operations that extract the 0th element from a column of lists (handling exceptions): map, the str accessor, and the list comprehension:

# List positional indexing. 
def get_0th(lst):
    try:
        return lst[0]
    # Handle empty lists and NaNs gracefully.
    except (IndexError, TypeError):
        return np.nan

ser.map(get_0th)                                          # map
ser.str[0]                                                # str accessor
pd.Series([x[0] if len(x) > 0 else np.nan for x in ser])  # list comp
pd.Series([get_0th(x) for x in ser])                      # list comp safe

Note
If the index matters, you would want to do:

pd.Series([...], index=ser.index)

When reconstructing the series.
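A minimal sketch of what that looks like (hypothetical data):

import pandas as pd

ser = pd.Series([['a', 'b'], ['c']], index=[10, 20])
# Passing index=ser.index keeps the original labels [10, 20]
# instead of defaulting to [0, 1].
firsts = pd.Series([x[0] for x in ser], index=ser.index)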

List Flattening
A final example is flattening lists. This is another common problem, and demonstrates just how powerful pure python is here.

# Nested list flattening.
pd.DataFrame(ser.tolist()).stack().reset_index(drop=True)  # stack
pd.Series(list(chain.from_iterable(ser.tolist())))         # itertools.chain
pd.Series([y for x in ser for y in x])                     # nested list comp

Both itertools.chain.from_iterable and the nested list comprehension are pure python constructs, and scale much better than the stack solution.

These timings are a strong indication of the fact that pandas is not equipped to work with mixed dtypes, and that you should probably refrain from using it to do so. Wherever possible, data should be present as scalar values (ints/floats/strings) in separate columns.

Lastly, the applicability of these solutions depends widely on your data. So, the best thing to do would be to test these operations on your data before deciding what to go with. Notice how I have not timed apply on these solutions, because it would skew the graph (yes, it's that slow).


Regex Operations, and .str Accessor Methods

Pandas can apply regex operations such as str.contains, str.extract, and str.extractall, as well as other "vectorized" string operations (such as str.split, str.find, str.translate, and so on) on string columns. These functions are slower than list comprehensions, and are meant to be more convenience functions than anything else.

It is usually much faster to pre-compile a regex pattern and iterate over your data with re.compile (also see Is it worth using Python’s re.compile?). The list comp equivalent to str.contains looks something like this:

p = re.compile(...)
ser2 = pd.Series([x for x in ser if p.search(x)])

Or,

ser2 = ser[[bool(p.search(x)) for x in ser]]

If you need to handle NaNs, you can do something like

ser[[bool(p.search(x)) if pd.notnull(x) else False for x in ser]]

The list comp equivalent to str.extract (without groups) will look something like:

df['col2'] = [p.search(x).group(0) for x in df['col']]

If you need to handle no-matches and NaNs, you can use a custom function (still faster!):

def matcher(x):
    m = p.search(str(x))
    if m:
        return m.group(0)
    return np.nan

df['col2'] = [matcher(x) for x in df['col']]

The matcher function is very extensible. It can be fitted to return a list for each capture group, as needed. Just query the group or groups attribute of the match object.

For str.extractall, change p.search to p.findall.

String Extraction
Consider a simple filtering operation. The idea is to extract the 4 digits that are preceded by an upper case letter.

# Extracting strings.
p = re.compile(r'(?<=[A-Z])(\d{4})')
def matcher(x):
    m = p.search(x)
    if m:
        return m.group(0)
    return np.nan

ser.str.extract(r'(?<=[A-Z])(\d{4})', expand=False)   #  str.extract
pd.Series([matcher(x) for x in ser])                  #  list comprehension
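For reference, on the sample series from the appendix (ser4), both versions should produce something like the following (not re-run here):

pd.Series([matcher(x) for x in ser4])
# 0     NaN
# 1    1234
# 2    3345
# dtype: object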

More Examples
Full disclosure – I am the author (in part or whole) of these posts listed below.


Conclusion

As shown in the examples above, iteration shines when working with small DataFrames, mixed datatypes, and regular expressions.

The speedup you get depends on your data and your problem, so your mileage may vary. The best thing to do is to carefully run tests and see if the payout is worth the effort.

The “vectorized” functions shine in their simplicity and readability, so if performance is not critical, you should definitely prefer those.

Another side note, certain string operations deal with constraints that favour the use of NumPy. Here are two examples where careful NumPy vectorization outperforms python:

Additionally, sometimes just operating on the underlying arrays via .values as opposed to on the Series or DataFrames can offer a healthy enough speedup for most usual scenarios (see the Note in the Numeric Comparison section above). So, for example df[df.A.values != df.B.values] would show instant performance boosts over df[df.A != df.B]. Using .values may not be appropriate in every situation, but it is a useful hack to know.

As mentioned above, it’s up to you to decide whether these solutions are worth the trouble of implementing.


Appendix: Code Snippets

import perfplot  
import operator 
import pandas as pd
import numpy as np
import re

from collections import Counter
from itertools import chain

# Boolean indexing with Numeric value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=['A','B']),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query('A != B'),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
        lambda df: df[get_mask(df.A.values, df.B.values)]
    ],
    labels=['vectorized !=', 'query (numexpr)', 'list comp', 'numba'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N'
)

# Value Counts comparison.
perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(1000, n)),
    kernels=[
        lambda ser: ser.value_counts(sort=False).to_dict(),
        lambda ser: dict(zip(*np.unique(ser, return_counts=True))),
        lambda ser: Counter(ser),
    ],
    labels=['value_counts', 'np.unique', 'Counter'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=lambda x, y: dict(x) == dict(y)
)

# Boolean indexing with string value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=['A','B'], dtype=str),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query('A != B'),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
    ],
    labels=['vectorized !=', 'query (numexpr)', 'list comp'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# Dictionary value extraction.
ser1 = pd.Series([{'key': 'abc', 'value': 123}, {'key': 'xyz', 'value': 456}])
perfplot.show(
    setup=lambda n: pd.concat([ser1] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(operator.itemgetter('value')),
        lambda ser: pd.Series([x.get('value') for x in ser]),
    ],
    labels=['map', 'list comprehension'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# List positional indexing. 
ser2 = pd.Series([['a', 'b', 'c'], [1, 2], []])        
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(get_0th),
        lambda ser: ser.str[0],
        lambda ser: pd.Series([x[0] if len(x) > 0 else np.nan for x in ser]),
        lambda ser: pd.Series([get_0th(x) for x in ser]),
    ],
    labels=['map', 'str accessor', 'list comprehension', 'list comp safe'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

# Nested list flattening.
ser3 = pd.Series([['a', 'b', 'c'], ['d', 'e'], ['f', 'g']])
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: pd.DataFrame(ser.tolist()).stack().reset_index(drop=True),
        lambda ser: pd.Series(list(chain.from_iterable(ser.tolist()))),
        lambda ser: pd.Series([y for x in ser for y in x]),
    ],
    labels=['stack', 'itertools.chain', 'nested list comp'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',    
    equality_check=None

)

# Extracting strings.
ser4 = pd.Series(['foo xyz', 'test A1234', 'D3345 xtz'])
perfplot.show(
    setup=lambda n: pd.concat([ser4] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.str.extract(r'(?<=[A-Z])(\d{4})', expand=False),
        lambda ser: pd.Series([matcher(x) for x in ser])
    ],
    labels=['str.extract', 'list comprehension'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N',
    equality_check=None
)

回答 1

简而言之

  • for循环 + iterrows非常慢。在约1k行时开销并不显著,但在10k行以上则很明显。
  • for循环 + itertuples比iterrows或apply快得多。
  • 向量化通常比itertuples快得多。

基准测试

In short

  • for loop + iterrows is extremely slow. Overhead isn’t significant on ~1k rows, but noticeable on 10k+ rows.
  • for loop + itertuples is much faster than iterrows or apply.
  • vectorization is usually much faster than itertuples

Benchmark
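Since the benchmark chart itself does not survive in text form, here is a minimal sketch of the three approaches being compared (a hypothetical two-column DataFrame, not the original benchmark code):

import pandas as pd

df = pd.DataFrame({'a': range(1000), 'b': range(1000)})

# iterrows: yields (index, Series) pairs; each row becomes a Series (slow).
total = 0
for _, row in df.iterrows():
    total += row['a'] + row['b']

# itertuples: yields lightweight namedtuples (much faster).
total = 0
for row in df.itertuples(index=False):
    total += row.a + row.b

# Vectorization: usually the fastest of all.
total = (df['a'] + df['b']).sum()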


python中列表推导或生成器表达式的行连续

问题:python中列表推导或生成器表达式的行连续

您应该如何分解很长的清单理解力?

[something_that_is_pretty_long for something_that_is_pretty_long in somethings_that_are_pretty_long]

我还看到某个地方的人不喜欢使用’\’来分隔行,但从不理解为什么。这是什么原因呢?

How are you supposed to break up a very long list comprehension?

[something_that_is_pretty_long for something_that_is_pretty_long in somethings_that_are_pretty_long]

I have also seen somewhere that people that dislike using ‘\’ to break up lines, but never understood why. What is the reason behind this?


回答 0

[x
 for
 x
 in
 (1,2,3)
]

效果很好,因此您几乎可以随心所欲。我个人更喜欢

 [something_that_is_pretty_long
  for something_that_is_pretty_long
  in somethings_that_are_pretty_long]

\不受欢迎的原因是它出现在行尾,要么不显眼,要么需要额外的填充,而且当行长改变时还必须重新调整:

x = very_long_term                     \
  + even_longer_term_than_the_previous \
  + a_third_term

在这种情况下,请使用括号:

x = (very_long_term
     + even_longer_term_than_the_previous
     + a_third_term)
[x
 for
 x
 in
 (1,2,3)
]

works fine, so you can pretty much do as you please. I’d personally prefer

 [something_that_is_pretty_long
  for something_that_is_pretty_long
  in somethings_that_are_pretty_long]

The reason why \ isn’t appreciated very much is that it appears at the end of a line, where it either doesn’t stand out or needs extra padding, which has to be fixed when line lengths change:

x = very_long_term                     \
  + even_longer_term_than_the_previous \
  + a_third_term

In such cases, use parens:

x = (very_long_term
     + even_longer_term_than_the_previous
     + a_third_term)

回答 1

我不反对:

variable = [something_that_is_pretty_long
            for something_that_is_pretty_long
            in somethings_that_are_pretty_long]

在这种情况下,您不需要\。总的来说,我认为人们避免使用\是因为它有点丑,而且如果它不是行尾的最后一个字符(确保它后面没有空格),还会带来问题。不过,为了控制行长,我认为用它总比不用好。

由于\在上述情况下或对于带括号的表达式不是必需的,所以我实际上很少需要使用它。

I’m not opposed to:

variable = [something_that_is_pretty_long
            for something_that_is_pretty_long
            in somethings_that_are_pretty_long]

You don’t need \ in this case. In general, I think people avoid \ because it’s slightly ugly, but also can give problems if it’s not the very last thing on the line (make sure no whitespace follows it). I think it’s much better to use it than not, though, in order to keep your line lengths down.

Since \ isn’t necessary in the above case, or for parenthesized expressions, I actually find it fairly rare that I even need to use it.


回答 2

在处理多个数据结构的列表时,也可以使用多个缩进。

new_list = [
    {
        'attribute 1': a_very_long_item.attribute1,
        'attribute 2': a_very_long_item.attribute2,
        'list_attribute': [
            {
                'dict_key_1': attribute_item.attribute2,
                'dict_key_2': attribute_item.attribute2
            }
            for attribute_item
            in a_very_long_item.list_of_items
         ]
    }
    for a_very_long_item
    in a_very_long_list
    if a_very_long_item not in [some_other_long_item
        for some_other_long_item 
        in some_other_long_list
    ]
]

请注意它还如何使用if语句依据另一个列表进行过滤。把if语句放到单独一行也很有用。

You can also make use of multiple indentations in cases where you’re dealing with a list of several data structures.

new_list = [
    {
        'attribute 1': a_very_long_item.attribute1,
        'attribute 2': a_very_long_item.attribute2,
        'list_attribute': [
            {
                'dict_key_1': attribute_item.attribute2,
                'dict_key_2': attribute_item.attribute2
            }
            for attribute_item
            in a_very_long_item.list_of_items
         ]
    }
    for a_very_long_item
    in a_very_long_list
    if a_very_long_item not in [some_other_long_item
        for some_other_long_item 
        in some_other_long_list
    ]
]

Notice how it also filters onto another list using an if statement. Dropping the if statement to its own line is useful as well.


如何在列表理解Python中构建两个for循环

问题:如何在列表理解Python中构建两个for循环

我有两个清单如下

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

当条目出现在tags中时,我想从entries中提取它们:

result = []

for tag in tags:
    for entry in entries:
        if tag in entry:
            result.extend(entry)

如何将两个循环写为单行列表理解?

I have two lists as below

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

I want to extract entries from entries when they are in tags:

result = []

for tag in tags:
    for entry in entries:
        if tag in entry:
            result.extend(entry)

How can I write the two loops as a single line list comprehension?


回答 0

应该这样做:

[entry for tag in tags for entry in entries if tag in entry]

This should do it:

[entry for tag in tags for entry in entries if tag in entry]

回答 1

记住这一点的最好方法是,列表理解中for循环的顺序基于它们在传统循环方法中出现的顺序。最外面的循环先到,然后是内部循环。

因此,等效列表理解为:

[entry for tag in tags for entry in entries if tag in entry]

通常,if-else语句位于第一个for循环之前;如果只有一个if语句,它将位于末尾。例如,如果您想在tag不在entry中时添加一个空列表,可以这样写:

[entry if tag in entry else [] for tag in tags for entry in entries]

The best way to remember this is that the order of for loop inside the list comprehension is based on the order in which they appear in traditional loop approach. Outer most loop comes first, and then the inner loops subsequently.

So, the equivalent list comprehension would be:

[entry for tag in tags for entry in entries if tag in entry]

In general, the if-else statement comes before the first for loop, and if you have just an if statement, it will come at the end. For example, if you would like to add an empty list when tag is not in entry, you would do it like this:

[entry if tag in entry else [] for tag in tags for entry in entries]

回答 2

适当的LC将是

[entry for tag in tags for entry in entries if tag in entry]

LC中循环的顺序类似于嵌套循环中的顺序,if语句移至末尾,条件表达式移至开始,例如

[a if a else b for a in sequence]

请看演示:

>>> tags = [u'man', u'you', u'are', u'awesome']
>>> entries = [[u'man', u'thats'],[ u'right',u'awesome']]
>>> [entry for tag in tags for entry in entries if tag in entry]
[[u'man', u'thats'], [u'right', u'awesome']]
>>> result = []
    for tag in tags:
        for entry in entries:
            if tag in entry:
                result.append(entry)


>>> result
[[u'man', u'thats'], [u'right', u'awesome']]

编辑 -由于您需要将结果展平,因此可以使用类似的列表理解,然后展平结果。

>>> result = [entry for tag in tags for entry in entries if tag in entry]
>>> from itertools import chain
>>> list(chain.from_iterable(result))
[u'man', u'thats', u'right', u'awesome']

把这些合在一起,您可以直接写:

>>> list(chain.from_iterable(entry for tag in tags for entry in entries if tag in entry))
[u'man', u'thats', u'right', u'awesome']

您在此处使用的是生成器表达式,而不是列表推导。(不算list调用的话,也恰好符合79个字符的限制。)

The appropriate LC would be

[entry for tag in tags for entry in entries if tag in entry]

The order of the loops in the LC is similar to the ones in nested loops, the if statements go to the end and the conditional expressions go in the beginning, something like

[a if a else b for a in sequence]

See the Demo –

>>> tags = [u'man', u'you', u'are', u'awesome']
>>> entries = [[u'man', u'thats'],[ u'right',u'awesome']]
>>> [entry for tag in tags for entry in entries if tag in entry]
[[u'man', u'thats'], [u'right', u'awesome']]
>>> result = []
    for tag in tags:
        for entry in entries:
            if tag in entry:
                result.append(entry)


>>> result
[[u'man', u'thats'], [u'right', u'awesome']]

EDIT – Since, you need the result to be flattened, you could use a similar list comprehension and then flatten the results.

>>> result = [entry for tag in tags for entry in entries if tag in entry]
>>> from itertools import chain
>>> list(chain.from_iterable(result))
[u'man', u'thats', u'right', u'awesome']

Adding this together, you could just do

>>> list(chain.from_iterable(entry for tag in tags for entry in entries if tag in entry))
[u'man', u'thats', u'right', u'awesome']

You use a generator expression here instead of a list comprehension. (Perfectly matches the 79 character limit too (without the list call))


回答 3

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

result = []
[result.extend(entry) for tag in tags for entry in entries if tag in entry]

print(result)

输出:

['man', 'thats', 'right', 'awesome']
tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

result = []
[result.extend(entry) for tag in tags for entry in entries if tag in entry]

print(result)

Output:

['man', 'thats', 'right', 'awesome']

回答 4

在列表推导中,嵌套列表的迭代顺序应与等效的嵌套for循环中的顺序相同。

为了理解,我们将以NLP为例。您想从句子列表中创建所有单词的列表,其中每个句子都是单词列表。

>>> list_of_sentences = [['The','cat','chases', 'the', 'mouse','.'],['The','dog','barks','.']]
>>> all_words = [word for sentence in list_of_sentences for word in sentence]
>>> all_words
['The', 'cat', 'chases', 'the', 'mouse', '.', 'The', 'dog', 'barks', '.']

要删除重复的单词,可以使用集合{}代替列表[]

>>> all_unique_words = list({word for sentence in list_of_sentences for word in sentence})
>>> all_unique_words
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

或申请 list(set(all_words))

>>> all_unique_words = list(set(all_words))
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

In comprehension, the nested lists iteration should follow the same order than the equivalent imbricated for loops.

To understand, we will take a simple example from NLP. You want to create a list of all words from a list of sentences where each sentence is a list of words.

>>> list_of_sentences = [['The','cat','chases', 'the', 'mouse','.'],['The','dog','barks','.']]
>>> all_words = [word for sentence in list_of_sentences for word in sentence]
>>> all_words
['The', 'cat', 'chases', 'the', 'mouse', '.', 'The', 'dog', 'barks', '.']

To remove the repeated words, you can use a set {} instead of a list []

>>> all_unique_words = list({word for sentence in list_of_sentences for word in sentence})
>>> all_unique_words
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

or apply list(set(all_words))

>>> all_unique_words = list(set(all_words))
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

回答 5

result = [word for tag in tags for entry in entries if tag in entry for word in entry]

列表理解和功能比“ for循环”快吗?

问题:列表理解和功能比“ for循环”快吗?

就Python的性能而言,列表推导或诸如map()、filter()和reduce()之类的函数是否比for循环更快?从技术上讲,为什么说它们以C的速度运行,而for循环以python虚拟机的速度运行?

假设在我正在开发的游戏中,我需要使用for循环绘制复杂而庞大的地图。这个问题无疑是相关的,因为如果列表推导确实更快,那么它将是避免卡顿的更好选择(尽管代码在视觉上更复杂)。

In terms of performance in Python, is a list-comprehension, or functions like map(), filter() and reduce() faster than a for loop? Why, technically, they run in a C speed, while the for loop runs in the python virtual machine speed?.

Suppose that in a game that I’m developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).


回答 0

以下是基于经验的粗略准则和有根据的猜测。您应该对具体用例使用timeit或进行性能剖析以获得确切的数字,这些数字有时可能与下文不符。

列表推导通常比精确等效的for循环(即确实构建列表的循环)快一点,这很可能是因为它不必在每次迭代时都查找列表及其append方法。但是,列表推导仍然执行字节码级别的循环:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
 1           0 BUILD_LIST               0
             3 LOAD_FAST                0 (.0)
       >>    6 FOR_ITER                12 (to 21)
             9 STORE_FAST               1 (x)
            12 LOAD_FAST                1 (x)
            15 LIST_APPEND              2
            18 JUMP_ABSOLUTE            6
       >>   21 RETURN_VALUE

用列表推导去代替一个并不构建列表的循环,无意义地累积一个无用值的列表然后再把它扔掉,通常会更慢,因为创建和扩展列表是有开销的。列表推导并不是天生就比好的旧式循环更快的魔法。

至于函数式的列表处理函数:虽然它们是用C编写的,可能胜过用Python编写的等效函数,但它们不一定是最快的选择。如果所传入的函数也是用C编写的,则可以期望一些加速。但在大多数使用lambda(或其他Python函数)的情况下,反复建立Python栈帧等开销会吃掉所有的节省。在不进行函数调用的情况下简单地内联完成相同的工作(例如,用列表推导代替map或filter)通常会稍快一些。

假设在我正在开发的游戏中,我需要使用for循环绘制复杂而庞大的地图。这个问题无疑是相关的,因为如果列表推导确实更快,那么它将是避免卡顿的更好选择(尽管代码在视觉上更复杂)。

很有可能,如果这样的代码在用良好的、非"优化"的Python编写时还不够快,那么再多的Python级微优化也无法使它足够快,您应该开始考虑下沉到C。虽然大量的微优化通常可以显著加快Python代码,但其上限(就绝对值而言)很低。而且,即使在达到该上限之前,硬着头皮写一些C也会变得更具成本效益(同样的工作量,15%的加速对比300%的加速)。

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (that actually builds a list), most likely because it doesn’t have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
 1           0 BUILD_LIST               0
             3 LOAD_FAST                0 (.0)
       >>    6 FOR_ITER                12 (to 21)
             9 STORE_FAST               1 (x)
            12 LOAD_FAST                1 (x)
            15 LIST_APPEND              2
            18 JUMP_ABSOLUTE            6
       >>   21 RETURN_VALUE

Using a list comprehension in place of a loop that doesn’t build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren’t magic that is inherently faster than a good old loop.

As for functional list processing functions: While these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed up is expected if the function is written in C too. But most cases using a lambda (or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter) is often slightly faster.

Suppose that in a game that I’m developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).

Chances are, if code like this isn’t already fast enough when written in good non-“optimized” Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.
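If in doubt, measure; a minimal timeit sketch along these lines (a made-up micro-benchmark, not from the original answer):

import timeit

loop_stmt = '''
result = []
for x in range(1000):
    result.append(x * 2)
'''
comp_stmt = 'result = [x * 2 for x in range(1000)]'

print(timeit.timeit(loop_stmt, number=10000))  # for loop
print(timeit.timeit(comp_stmt, number=10000))  # list comprehension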


回答 1

如果查看python.org上的信息,可以看到以下摘要:

Version                     Time (seconds)
Basic loop                  3.47
Eliminate dots              2.45
Local variable & no dots    1.79
Using map function          0.54

但是您确实应该详细阅读以上文章,以了解造成性能差异的原因。

我也强烈建议您使用timeit对代码计时。归根结底,可能会出现这样的情况:例如,需要在满足某个条件时跳出for循环。这时它可能比调用map得到结果更快。

If you check the info on python.org, you can see this summary:

Version                     Time (seconds)
Basic loop                  3.47
Eliminate dots              2.45
Local variable & no dots    1.79
Using map function          0.54

But you really should read the above article in detail to understand the cause of the performance difference.

I also strongly suggest you should time your code by using timeit. At the end of the day, there can be a situation where, for example, you may need to break out of a for loop when a condition is met. That could potentially be faster than computing the result by calling map.
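For instance, a loop can stop as soon as the condition is met, while map would process every element; a minimal sketch (first_even is a hypothetical helper):

def first_even(numbers):
    # The loop can break out early; map() has no equivalent shortcut.
    for n in numbers:
        if n % 2 == 0:
            return n
    return None

print(first_even([1, 3, 5, 4, 7, 8]))  # 4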


回答 2

您具体问的是map()、filter()和reduce(),但我想您是想了解一般的函数式编程。我自己在"计算一组点中所有点对之间的距离"这个问题上测试过,结果表明,函数式编程(使用内置itertools模块中的starmap函数)比for循环稍慢(实际上耗时1.25倍)。这是我使用的示例代码:

import itertools, time, math, random

class Point:
    def __init__(self,x,y):
        self.x, self.y = x, y

point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
n_points = 100
pick_val = lambda : 10 * random.random() - 5
large_set = [Point(pick_val(), pick_val()) for _ in range(n_points)]
    # the distance function
f_dist = lambda x0, x1, y0, y1: math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)
    # go through each point, get its distance from all remaining points 
f_pos = lambda p1, p2: (p1.x, p2.x, p1.y, p2.y)

extract_dists = lambda x: itertools.starmap(f_dist, 
                          itertools.starmap(f_pos, 
                          itertools.combinations(x, 2)))

print('Distances:', list(extract_dists(point_set)))

t0_f = time.time()
list(extract_dists(large_set))
dt_f = time.time() - t0_f

函数式版本是否比过程式版本更快?

def extract_dists_procedural(pts):
    n_pts = len(pts)
    l = []
    # Note: unlike the functional version, this skips math.sqrt and starts
    # k_p2 at k_p1 (so it also includes each point's zero distance to
    # itself); the two versions therefore do not do identical work.
    for k_p1 in range(n_pts - 1):
        for k_p2 in range(k_p1, n_pts):
            l.append((pts[k_p1].x - pts[k_p2].x) ** 2 +
                     (pts[k_p1].y - pts[k_p2].y) ** 2)
    return l

t0_p = time.time()
list(extract_dists_procedural(large_set)) 
    # using list() on the assumption that
    # it eats up as much time as in the functional version

dt_p = time.time() - t0_p

f_vs_p = dt_p / dt_f
if f_vs_p >= 1.0:
    print('Time benefit of functional progamming:', f_vs_p, 
          'times as fast for', n_points, 'points')
else:
    print('Time penalty of functional programming:', 1 / f_vs_p, 
          'times as slow for', n_points, 'points')

You ask specifically about map(), filter() and reduce(), but I assume you want to know about functional programming in general. Having tested this myself on the problem of computing distances between all points within a set of points, functional programming (using the starmap function from the built-in itertools module) turned out to be slightly slower than for-loops (taking 1.25 times as long, in fact). Here is the sample code I used:

import itertools, time, math, random

class Point:
    def __init__(self,x,y):
        self.x, self.y = x, y

point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
n_points = 100
pick_val = lambda : 10 * random.random() - 5
large_set = [Point(pick_val(), pick_val()) for _ in range(n_points)]
    # the distance function
f_dist = lambda x0, x1, y0, y1: math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)
    # go through each point, get its distance from all remaining points 
f_pos = lambda p1, p2: (p1.x, p2.x, p1.y, p2.y)

extract_dists = lambda x: itertools.starmap(f_dist, 
                          itertools.starmap(f_pos, 
                          itertools.combinations(x, 2)))

print('Distances:', list(extract_dists(point_set)))

t0_f = time.time()
list(extract_dists(large_set))
dt_f = time.time() - t0_f

Is the functional version faster than the procedural version?

def extract_dists_procedural(pts):
    n_pts = len(pts)
    l = []
    # Note: unlike the functional version, this skips math.sqrt and starts
    # k_p2 at k_p1 (so it also includes each point's zero distance to
    # itself); the two versions therefore do not do identical work.
    for k_p1 in range(n_pts - 1):
        for k_p2 in range(k_p1, n_pts):
            l.append((pts[k_p1].x - pts[k_p2].x) ** 2 +
                     (pts[k_p1].y - pts[k_p2].y) ** 2)
    return l

t0_p = time.time()
list(extract_dists_procedural(large_set)) 
    # using list() on the assumption that
    # it eats up as much time as in the functional version

dt_p = time.time() - t0_p

f_vs_p = dt_p / dt_f
if f_vs_p >= 1.0:
    print('Time benefit of functional progamming:', f_vs_p, 
          'times as fast for', n_points, 'points')
else:
    print('Time penalty of functional programming:', 1 / f_vs_p, 
          'times as slow for', n_points, 'points')

回答 3

我写了一个简单的脚本来测试速度,以下是我的发现。实际上,在我的测试中for循环最快,这真的让我很惊讶。请看下面(计算平方和)。

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([int(i)**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.390000 #List comprehension

I wrote a simple script that tests the speed, and this is what I found out. Actually, the for loop was fastest in my case. That really surprised me; check it out below (it calculates the sum of squares).

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([int(i)**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.390000 #List comprehension

回答 4

我修改了@Alisa的代码,并用cProfile来说明为什么列表推导更快:

from functools import reduce
import datetime

def reduce_(numbers):
    return reduce(lambda sum, next: sum + next * next, numbers, 0)

def for_loop(numbers):
    a = []
    for i in numbers:
        a.append(i*i)  # fixed: was i*2, inconsistent with the other sum-of-squares versions
    a = sum(a)
    return a

def map_(numbers):
    sqrt = lambda x: x*x
    return sum(map(sqrt, numbers))

def list_comp(numbers):
    return(sum([i*i for i in numbers]))

funcs = [
        reduce_,
        for_loop,
        map_,
        list_comp
        ]

if __name__ == "__main__":
    # [1, 2, 5, 3, 1, 2, 5, 3]
    import cProfile
    for f in funcs:
        print('=' * 25)
        print("Profiling:", f.__name__)
        print('=' * 25)
        pr = cProfile.Profile()
        for i in range(10**6):
            pr.runcall(f, [1, 2, 5, 3, 1, 2, 5, 3])
        pr.create_stats()
        pr.print_stats()

结果如下:

=========================
Profiling: reduce_
=========================
         11000000 function calls in 1.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.162    0.000    1.473    0.000 profiling.py:4(reduce_)
  8000000    0.461    0.000    0.461    0.000 profiling.py:5(<lambda>)
  1000000    0.850    0.000    1.311    0.000 {built-in method _functools.reduce}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: for_loop
=========================
         11000000 function calls in 1.372 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.879    0.000    1.344    0.000 profiling.py:7(for_loop)
  1000000    0.145    0.000    0.145    0.000 {built-in method builtins.sum}
  8000000    0.320    0.000    0.320    0.000 {method 'append' of 'list' objects}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: map_
=========================
         11000000 function calls in 1.470 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.264    0.000    1.442    0.000 profiling.py:14(map_)
  8000000    0.387    0.000    0.387    0.000 profiling.py:15(<lambda>)
  1000000    0.791    0.000    1.178    0.000 {built-in method builtins.sum}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: list_comp
=========================
         4000000 function calls in 0.737 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.318    0.000    0.709    0.000 profiling.py:18(list_comp)
  1000000    0.261    0.000    0.261    0.000 profiling.py:19(<listcomp>)
  1000000    0.131    0.000    0.131    0.000 {built-in method builtins.sum}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}

恕我直言:

  • reduce并且map总体上来说很慢。不仅如此,summap返回sum列表相比,在返回的迭代器上使用慢
  • for_loop 使用append,这在某种程度上当然很慢
  • 列表理解不仅花费了最少的时间来构建列表,而且与之sum相比,它也变得更快map

I modified @Alisa’s code and used cProfile to show why list comprehension is faster:

from functools import reduce
import datetime

def reduce_(numbers):
    return reduce(lambda sum, next: sum + next * next, numbers, 0)

def for_loop(numbers):
    a = []
    for i in numbers:
        a.append(i*i)  # square each element
    a = sum(a)
    return a

def map_(numbers):
    square = lambda x: x*x
    return sum(map(square, numbers))

def list_comp(numbers):
    return sum([i*i for i in numbers])

funcs = [
        reduce_,
        for_loop,
        map_,
        list_comp
        ]

if __name__ == "__main__":
    # [1, 2, 5, 3, 1, 2, 5, 3]
    import cProfile
    for f in funcs:
        print('=' * 25)
        print("Profiling:", f.__name__)
        print('=' * 25)
        pr = cProfile.Profile()
        for i in range(10**6):
            pr.runcall(f, [1, 2, 5, 3, 1, 2, 5, 3])
        pr.create_stats()
        pr.print_stats()

Here are the results:

=========================
Profiling: reduce_
=========================
         11000000 function calls in 1.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.162    0.000    1.473    0.000 profiling.py:4(reduce_)
  8000000    0.461    0.000    0.461    0.000 profiling.py:5(<lambda>)
  1000000    0.850    0.000    1.311    0.000 {built-in method _functools.reduce}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: for_loop
=========================
         11000000 function calls in 1.372 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.879    0.000    1.344    0.000 profiling.py:7(for_loop)
  1000000    0.145    0.000    0.145    0.000 {built-in method builtins.sum}
  8000000    0.320    0.000    0.320    0.000 {method 'append' of 'list' objects}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: map_
=========================
         11000000 function calls in 1.470 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.264    0.000    1.442    0.000 profiling.py:14(map_)
  8000000    0.387    0.000    0.387    0.000 profiling.py:15(<lambda>)
  1000000    0.791    0.000    1.178    0.000 {built-in method builtins.sum}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: list_comp
=========================
         4000000 function calls in 0.737 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.318    0.000    0.709    0.000 profiling.py:18(list_comp)
  1000000    0.261    0.000    0.261    0.000 profiling.py:19(<listcomp>)
  1000000    0.131    0.000    0.131    0.000 {built-in method builtins.sum}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}

IMHO:

  • reduce and map are in general pretty slow. Not only that, using sum on the iterator that map returns is slow compared to summing a list
  • for_loop uses append, which is of course slow to some extent
  • the list comprehension not only spends the least time building the list, it also makes the subsequent sum much quicker than map does (a generator expression takes this one step further; see the sketch below)
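
Following on from that last point, a generator expression avoids materialising the intermediate list at all. This variant was not part of the profiled run above, so the snippet below is a sketch of the idea rather than a measured result:

def gen_comp(numbers):
    # sum() pulls the squared values one at a time; no intermediate list is built
    return sum(i * i for i in numbers)

print(gen_comp([1, 2, 5, 3, 1, 2, 5, 3]))  # 78

Whether this beats sum([i*i for i in numbers]) depends on the input: for very short lists the list version is often slightly faster in CPython, while the generator version wins on memory for large inputs.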

Answer 5

Adding a twist to Alphii's answer: here the for loop actually comes out second best, about 6 times slower than map (but see the caveat below about what the map figure really measures).

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        a += i**2
    return a

def square_sum3(numbers):
    a = 0
    map(lambda x: a+x**2, numbers)  # map is lazy in Python 3; this iterator is never consumed, so the lambda never runs
    return a  # always 0

def square_sum4(numbers):
    a = 0
    return [a+i**2 for i in numbers]  # builds the list of squares but never sums it

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

The main changes were to eliminate the slow sum calls, as well as the probably unnecessary int() in the last case. Putting the for loop and map in the same terms changes the picture considerably, but be aware that the map figure is misleading: lambdas are functional constructs and in theory should have no side effects, and the one in square_sum3 in fact has none (a + x**2 computes a value and discards it; it never rebinds a). More importantly, map is lazy in Python 3, and since square_sum3 never consumes the iterator it creates, the lambda never runs at all; the function does almost no work and simply returns 0, which is why its timing is so small. Results in this case with Python 3.6.1, Ubuntu 14.04, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz:

0:00:00.257703 #Reduce
0:00:00.184898 #For loop
0:00:00.031718 #Map
0:00:00.212699 #List comprehension
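
A minimal demonstration of that laziness, using nothing beyond builtins:

a = 0
m = map(lambda x: a + x**2, [1, 2, 3])  # no work has happened yet
print(a)        # 0 -- the lambda has never been called
print(list(m))  # [1, 4, 9] -- work happens only when the iterator is consumed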

Answer 6

I managed to modify some of @alpiii's code and found that the list comprehension is a little faster than the for loop. The earlier result may have been skewed by int(), which made the comparison between the list comprehension and the for loop unfair.

from functools import reduce
import datetime

def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next*next, numbers, 0)

def square_sum2(numbers):
    a = []
    for i in numbers:
        a.append(i*i)  # square each element
    a = sum(a)
    return a

def square_sum3(numbers):
    square = lambda x: x*x
    return sum(map(square, numbers))

def square_sum4(numbers):
    return sum([i*i for i in numbers])

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
0:00:00.101122 #Reduce
0:00:00.089216 #For loop
0:00:00.101532 #Map
0:00:00.068916 #List comprehension
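
Taking the thread as a whole, the variants above were not always computing the same thing, which makes their timings hard to compare directly. Purely as a sketch, a like-for-like rerun could look like the following, with every function returning the same sum of squares and timeit doing the measuring; the function names and the 100000-call count are illustrative choices, not taken from any answer above:

from functools import reduce
import timeit

data = [1, 2, 5, 3, 1, 2, 5, 3]

def sq_reduce(nums):
    return reduce(lambda acc, n: acc + n * n, nums, 0)

def sq_loop(nums):
    total = 0
    for n in nums:
        total += n * n
    return total

def sq_map(nums):
    # sum() consumes the lazy map object, so the lambda genuinely runs
    return sum(map(lambda n: n * n, nums))

def sq_listcomp(nums):
    return sum([n * n for n in nums])

# Sanity check first: timings are meaningless if the results differ
assert len({f(data) for f in (sq_reduce, sq_loop, sq_map, sq_listcomp)}) == 1

for f in (sq_reduce, sq_loop, sq_map, sq_listcomp):
    print(f.__name__, timeit.timeit(lambda: f(data), number=100000))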