Local variables in nested functions

Question: Local variables in nested functions

Okay, bear with me on this, I know it’s going to look horribly convoluted, but please help me understand what’s happening.

from functools import partial

class Cage(object):
    def __init__(self, animal):
        self.animal = animal

def gotimes(do_the_petting):
    do_the_petting()

def get_petters():
    for animal in ['cow', 'dog', 'cat']:
        cage = Cage(animal)

        def pet_function():
            print("Mary pets the " + cage.animal + ".")

        yield (animal, partial(gotimes, pet_function))

funs = list(get_petters())

for name, f in funs:
    print(name + ":", end=" ")
    f()

Gives:

cow: Mary pets the cat.
dog: Mary pets the cat.
cat: Mary pets the cat.

So basically, why am I not getting three different animals? Isn’t the cage ‘packaged’ into the local scope of the nested function? If not, how does a call to the nested function look up the local variables?

I know that running into this kind of problem usually means one is ‘doing it wrong’, but I’d like to understand what happens.


Answer 0

The nested function looks up variables from the parent scope when executed, not when defined.

The function body is compiled, and the ‘free’ variables (not defined in the function itself by assignment), are verified, then bound as closure cells to the function, with the code using an index to reference each cell. pet_function thus has one free variable (cage) which is then referenced via a closure cell, index 0. The closure itself points to the local variable cage in the get_petters function.

When you actually call the function, that closure is used to look up the value of cage in the surrounding scope at the time of the call. Here lies the problem. By the time you call your functions, the get_petters function is already done computing its results. At some point during that execution the cage local variable was bound, in turn, to a Cage for each of the 'cow', 'dog', and 'cat' strings, but at the end of the function cage holds the last one, the Cage whose animal is 'cat'. Thus, when you call each of the dynamically returned functions, you get 'cat' printed.
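The late lookup described above can be observed directly. What follows is a minimal sketch in Python 3 syntax, assuming a trimmed-down get_petters that drops partial and gotimes and returns the string instead of printing it; co_freevars and __closure__ are standard attributes of code and function objects.

```python
class Cage(object):
    def __init__(self, animal):
        self.animal = animal


def get_petters():
    for animal in ['cow', 'dog', 'cat']:
        cage = Cage(animal)

        def pet_function():
            return "Mary pets the " + cage.animal + "."

        yield (animal, pet_function)


# Exhaust the generator first, as list() does in the question.
funs = list(get_petters())
first = funs[0][1]

# The compiled body records `cage` as its only free variable ...
print(first.__code__.co_freevars)                  # ('cage',)
# ... and cell 0 now holds the last Cage the loop assigned.
print(first.__closure__[0].cell_contents.animal)   # cat
```

Even the function yielded for 'cow' reports 'cat', because the cell is read when the function is called, not when it was defined.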

The work-around is to not rely on a closure over a variable that keeps being rebound. You can use a partial function instead, create a new function scope, or bind the variable as a default value for a keyword parameter.

  • Partial function example, using functools.partial():

    from functools import partial
    
    def pet_function(cage=None):
        print("Mary pets the " + cage.animal + ".")
    
    yield (animal, partial(gotimes, partial(pet_function, cage=cage)))
    
  • Creating a new scope example:

    def scoped_cage(cage=None):
        def pet_function():
            print("Mary pets the " + cage.animal + ".")
        return pet_function
    
    yield (animal, partial(gotimes, scoped_cage(cage)))
    
  • Binding the variable as a default value for a keyword parameter:

    def pet_function(cage=cage):
        print("Mary pets the " + cage.animal + ".")
    
    yield (animal, partial(gotimes, pet_function))
    

There is no need to define the scoped_cage function inside the loop; define it once outside. In either case its body is compiled only once, not on each iteration of the loop.
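For comparison, the three work-arounds can be put side by side in one runnable sketch. This is an illustration in Python 3 syntax rather than the answer's exact code: the helper pet and the name with_default are hypothetical, gotimes is left out, and the functions return strings instead of printing so the results are easy to inspect.

```python
from functools import partial


class Cage(object):
    def __init__(self, animal):
        self.animal = animal


def pet(cage):
    # Hypothetical helper: takes the cage as an explicit argument.
    return "Mary pets the " + cage.animal + "."


def scoped_cage(cage):
    # Each call creates a fresh scope, so each closure gets its own cell.
    def pet_function():
        return pet(cage)
    return pet_function


def get_petters():
    for animal in ['cow', 'dog', 'cat']:
        cage = Cage(animal)

        # The default value is evaluated when this def statement runs,
        # so it captures the Cage of the current iteration.
        def with_default(cage=cage):
            return pet(cage)

        yield (animal, partial(pet, cage), scoped_cage(cage), with_default)


# list() exhausts the generator before any of the functions is called.
results = [(name, a(), b(), c()) for name, a, b, c in list(get_petters())]
for row in results:
    print(row)
```

All three variants bind the current Cage object at the moment the function is created, so each row names its own animal even though the generator has already finished.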


Answer 1

My understanding is that cage is looked for in the parent function namespace when the yielded pet_function is actually called, not before.

So when you do

funs = list(get_petters())

you generate 3 functions, all of which will find the most recently created cage.

If you replace your last loop with:

for name, f in get_petters():
    print(name + ":", end=" ")
    f()

you will actually get:

cow: Mary pets the cow.
dog: Mary pets the dog.
cat: Mary pets the cat.

Answer 2

This stems from the following:

for i in range(2): 
    pass

print(i)  # prints 1

After the loop finishes, i is still bound to its final value.

Consumed lazily as a generator, the function works (each value is printed in turn), but converting it to a list runs the generator to completion first, so every later lookup of cage (cage.animal) returns 'cat'.
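The same effect shows up without generators at all. A small sketch, assuming a hypothetical make_callbacks function that collects lambdas inside a plain for loop:

```python
def make_callbacks():
    callbacks = []
    for i in range(3):
        # Each lambda refers to the variable i, not to its current value.
        callbacks.append(lambda: i)
    # The loop variable is still bound after the loop ends.
    return i, callbacks


last_i, callbacks = make_callbacks()
print(last_i)                        # 2
print([cb() for cb in callbacks])    # [2, 2, 2]
```

Every lambda reads i only when it is called, and by then the loop has left i at its final value.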


Answer 3

Let’s simplify the question. Define:

def get_petters():
    for animal in ['cow', 'dog', 'cat']:
        def pet_function():
            return "Mary pets the " + animal + "."

        yield (animal, pet_function)

Then, just like in the question, we get:

>>> for name, f in list(get_petters()):
...     print(name + ":", f())

cow: Mary pets the cat.
dog: Mary pets the cat.
cat: Mary pets the cat.

But if we avoid creating a list() first:

>>> for name, f in get_petters():
...     print(name + ":", f())

cow: Mary pets the cow.
dog: Mary pets the dog.
cat: Mary pets the cat.

What’s going on? Why does this subtle difference completely change our results?


If we look at list(get_petters()), it’s clear from the changing memory addresses that we do indeed yield three different functions:

>>> list(get_petters())

[('cow', <function get_petters.<locals>.pet_function at 0x7ff2b988d790>),
 ('dog', <function get_petters.<locals>.pet_function at 0x7ff2c18f51f0>),
 ('cat', <function get_petters.<locals>.pet_function at 0x7ff2c14a9f70>)]

However, take a look at the cells that these functions are bound to:

>>> for _, f in list(get_petters()):
...     print(f(), f.__closure__)

Mary pets the cat. (<cell at 0x7ff2c112a9d0: str object at 0x7ff2c3f437f0>,)
Mary pets the cat. (<cell at 0x7ff2c112a9d0: str object at 0x7ff2c3f437f0>,)
Mary pets the cat. (<cell at 0x7ff2c112a9d0: str object at 0x7ff2c3f437f0>,)

>>> for _, f in get_petters():
...     print(f(), f.__closure__)

Mary pets the cow. (<cell at 0x7ff2b86b5d00: str object at 0x7ff2c1a95670>,)
Mary pets the dog. (<cell at 0x7ff2b86b5d00: str object at 0x7ff2c1a952f0>,)
Mary pets the cat. (<cell at 0x7ff2b86b5d00: str object at 0x7ff2c3f437f0>,)

For both loops, the cell object remains the same throughout the iterations. However, as expected, the specific str it references varies in the second loop. The cell object refers to animal, which is created when get_petters() is called. However, animal changes what str object it refers to as the generator function runs.

In the first loop, list() creates all the fs up front, and we only call them after the generator get_petters() is completely exhausted and the list of functions already exists.

In the second loop, during each iteration, we pause the get_petters() generator and call f after each pause. Thus, we end up retrieving the value of animal at the moment the generator function is paused.
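The shared cell can also be checked with identity comparisons instead of reading memory addresses. A short sketch reusing the simplified get_petters from above; the names cells, eager, and lazy are only illustrative:

```python
def get_petters():
    for animal in ['cow', 'dog', 'cat']:
        def pet_function():
            return "Mary pets the " + animal + "."

        yield (animal, pet_function)


funs = [f for _, f in get_petters()]
cells = [f.__closure__[0] for f in funs]

# Three distinct function objects ...
print(len(set(id(f) for f in funs)))             # 3
# ... that all share one and the same cell object.
print(all(cell is cells[0] for cell in cells))   # True

# Calling after the generator is exhausted vs. while it is paused.
eager = [f() for _, f in list(get_petters())]
lazy = [f() for _, f in get_petters()]
print(eager)
print(lazy)
```

Here eager holds three 'cat' sentences, while lazy holds one sentence per animal, because each f is called while the generator is still paused on that animal.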

As @Claudiu puts it in an answer to a similar question:

Three separate functions are created, but they each have the closure of the environment they’re defined in – in this case, the global environment (or the outer function’s environment if the loop is placed inside another function). This is exactly the problem, though — in this environment, animal is mutated, and the closures all refer to the same animal.

[Editor note: i has been changed to animal.]