In practice, what are the main uses for the new “yield from” syntax in Python 3.3?

I’m having a hard time wrapping my brain around PEP 380.

  1. What are the situations where “yield from” is useful?
  2. What is the classic use case?
  3. Why is it compared to micro-threads?

[ update ]

Now I understand the cause of my difficulties. I’ve used generators, but never really used coroutines (introduced by PEP-342). Despite some similarities, generators and coroutines are basically two different concepts. Understanding coroutines (not only generators) is the key to understanding the new syntax.

IMHO, coroutines are the most obscure Python feature; most books make them look useless and uninteresting.

Thanks for the great answers, but special thanks to agf and his comment linking to David Beazley's presentations. David rocks.


Answer 0

Let’s get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let’s face it, if all yield from did was expand the for loop, it would not warrant adding yield from to the language and thereby precluding a whole bunch of new features from being implemented in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is “transparent” in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).

  • The connection is “bidirectional” in the sense that data can be both sent from and to a generator.
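
A minimal sketch of both properties, using two made-up generators (inner and outer) that are not part of the original answer. Values yielded by inner reach the caller, and values the caller sends reach inner. inner's return value becomes the value of the yield from expression inside outer.

```python
def inner():
    """Sub-generator: yields prompts and records what the caller sends back."""
    received = []
    for prompt in ('first?', 'second?'):
        reply = yield prompt          # value goes OUT to the caller...
        received.append(reply)        # ...and the caller's send() comes back IN
    return received                   # becomes the value of `yield from inner()`

def outer():
    """Delegating generator: no plumbing of its own."""
    result = yield from inner()       # both directions pass straight through
    yield f'inner got {result}'

gen = outer()
print(next(gen))          # first?
print(gen.send('a'))      # second?
print(gen.send('b'))      # inner got ['a', 'b']
```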

(If we were talking about TCP, yield from g might mean “now temporarily disconnect my client’s socket and reconnect it to this other server socket”.)

BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first—they’re very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley’s Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.
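
If send() is new to you, this small made-up echo coroutine shows the basic protocol. It is a sketch and is not taken from the slides. You prime it with next(). After that, each send() resumes it at the paused yield and returns whatever it yields next.

```python
def echo():
    """Coroutine that receives values via send() and yields back a reply."""
    reply = None
    while True:
        received = yield reply        # pause here; resume with the sent value
        reply = 'got %r' % (received,)

co = echo()
next(co)                  # prime: run to the first yield (same as co.send(None))
print(co.send(1))         # got 1
print(co.send('spam'))    # got 'spam'
co.close()                # raises GeneratorExit at the paused yield and finishes it
```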

Reading data from a generator using yield from

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

Instead of manually iterating over reader(), we can just yield from it.

def reader_wrapper(g):
    yield from g

That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life-changing.
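
Here is a quick sanity check, assuming only plain iteration matters: the loop-based wrapper and the yield from wrapper produce the same values. The two wrapper names are invented so both versions can coexist. yield from also accepts any iterable, not just generators.

```python
def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper_loop(g):
    for v in g:              # explicit re-yield loop
        yield v

def reader_wrapper_yf(g):
    yield from g             # delegation

print(list(reader_wrapper_loop(reader())) == list(reader_wrapper_yf(reader())))  # True

def from_list():
    yield from ['<< 0', '<< 1']   # any iterable works, e.g. a plain list

print(list(from_list()))          # ['<< 0', '<< 1']
```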

Sending data to a generator (coroutine) using yield from – Part 1

Now let’s do something more interesting. Let’s create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>>  0
>>  1
>>  2
>>  3

The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration raised when the writer is exhausted. Evidently, just doing for x in coro: yield x won’t do, because iterating only ever calls next() and never passes the sent value on. Here is a version that works.

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)  # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            return  # the writer is exhausted, so the wrapper finishes too

Or, we could do this.

def writer_wrapper(coro):
    yield from coro

That saves 6 lines of code, makes it much more readable, and it just works. Magic!
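
To make the claim checkable without reading printed output, here is the same setup with a hypothetical writer variant that appends to a list instead of printing. Priming the wrapper with send(None) is enough here: yield from starts the writer itself, so the manual coro.send(None) from the hand-written version is not needed.

```python
def writer(log):
    """Hypothetical variant of writer(): appends to log instead of printing."""
    while True:
        w = (yield)
        log.append('>>  %s' % w)

def writer_wrapper(coro):
    yield from coro           # send() on the wrapper is forwarded to coro

log = []
wrap = writer_wrapper(writer(log))
wrap.send(None)               # prime: runs the wrapper AND the writer to their first yield
for i in range(4):
    wrap.send(i)
print(log)                    # ['>>  0', '>>  1', '>>  2', '>>  3']
```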

Sending data to a generator (coroutine) using yield from – Part 2 – Exception handling

Let’s make it more complicated. What if our writer needs to handle exceptions? Let’s say the writer handles a SpamException and it prints *** if it encounters one.

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

What if we don’t change writer_wrapper? Does it work? Let’s try:

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>>  0
>>  1
>>  2
***
>>  4

# Actual Result
>>  0
>>  1
>>  2
Traceback (most recent call last):
  ... redacted ...
  File ... in writer_wrapper
    x = (yield)
__main__.SpamException

Um, it’s not working because x = (yield) just raises the exception and everything comes to a crashing halt. Let’s make it work by manually catching exceptions and throwing them into the sub-generator (writer):

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            return  # the writer is exhausted, so the wrapper finishes too

This works.

# Result
>>  0
>>  1
>>  2
***
>>  4

But so does this!

def writer_wrapper(coro):
    yield from coro

yield from transparently handles both sending values and throwing exceptions into the sub-generator.
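
A hypothetical variant of the writer above, which appends to a list instead of printing, confirms that throw() on the wrapper is re-raised inside the writer. This is a sketch for checking the behaviour, not the answer's original code.

```python
class SpamException(Exception):
    pass

def writer(log):
    """Hypothetical list-logging variant of the exception-aware writer."""
    while True:
        try:
            w = (yield)
        except SpamException:
            log.append('***')
        else:
            log.append('>>  %s' % w)

def writer_wrapper(coro):
    yield from coro           # throw() on the wrapper is re-raised inside coro

log = []
wrap = writer_wrapper(writer(log))
wrap.send(None)               # prime
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)
print(log)                    # ['>>  0', '>>  1', '>>  2', '***', '>>  4']
```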

This still does not cover all the corner cases, though. What happens if the outer generator is closed? And when the sub-generator returns a value (yes, in Python 3.3+, generators can return values), how should that return value be propagated? It is really impressive that yield from handles all of these corner cases transparently: it just magically works.
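
The following sketch covers the two corner cases just mentioned, using made-up generators. First, a sub-generator's return value becomes the value of the yield from expression. Second, closing the outer generator also closes the sub-generator, so its finally block runs. The None sentinel is just a convention picked for this example.

```python
events = []

def averager():
    """Sub-generator that returns a value when it receives None."""
    total, count = 0, 0
    while True:
        x = yield
        if x is None:               # sentinel chosen for this sketch: "no more data"
            return total / count    # becomes the value of the yield from expression
        total += x
        count += 1

def delegator():
    result = yield from averager()  # receives averager's return value
    yield result

d = delegator()
next(d)                             # prime both generators
d.send(10)
d.send(20)
print(d.send(None))                 # 15.0

def worker():
    try:
        while True:
            yield
    finally:
        events.append('worker closed')  # runs when GeneratorExit reaches worker

def wrapper():
    yield from worker()

w = wrapper()
next(w)
w.close()                           # close() on the wrapper is forwarded to worker
print(events)                       # ['worker closed']
```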

I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. Other keywords were proposed (like delegate), but they were rejected because adding a new keyword to the language is much more difficult than combining existing ones.

In summary, it’s best to think of yield from as a transparent two-way channel between the caller and the sub-generator.

References:

  1. PEP 380 – Syntax for delegating to a sub-generator (Ewing) [v3.3, 2009-02-13]
  2. PEP 342 – Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]

Answer 1

What are the situations where “yield from” is useful?

Every situation where you have a loop like this:

for x in subgenerator:
  yield x

As the PEP describes, this is a rather naive attempt at using the subgenerator: it misses several aspects, especially the proper handling of the .throw()/.send()/.close() mechanisms introduced by PEP 342. Doing this properly requires rather complicated code.
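
Here is one concrete way the naive loop falls short, sketched with made-up generators. Iterating with for only ever calls next(), so a value sent to the wrapper is silently dropped. yield from forwards it instead.

```python
def accumulator():
    """Sub-generator that yields a running total of the values sent to it."""
    total = 0
    while True:
        x = yield total
        if x is not None:
            total += x

def naive(sub):
    for x in sub:         # iteration calls next(), i.e. always resumes sub with None
        yield x           # a value sent to naive() lands here and is discarded

def delegating(sub):
    yield from sub        # send() is forwarded to sub

n = naive(accumulator())
next(n)
print(n.send(5))          # 0 (the 5 never reached accumulator)

d = delegating(accumulator())
next(d)
print(d.send(5))          # 5
```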

What is the classic use case?

Consider that you want to extract information from a recursive data structure. Let’s say we want to get all leaf nodes in a tree:

def traverse_tree(node):
  if not node.children:
    yield node
  for child in node.children:
    yield from traverse_tree(child)
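
The snippet assumes some node type with a children attribute. Here is a runnable sketch that uses a hypothetical minimal Node class:

```python
class Node:
    """Hypothetical minimal tree node: a value plus a list of children."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def traverse_tree(node):
    if not node.children:
        yield node                        # a leaf
    for child in node.children:
        yield from traverse_tree(child)   # recurse into each subtree

tree = Node('root', [Node('a', [Node('a1'), Node('a2')]), Node('b')])
print([leaf.value for leaf in traverse_tree(tree)])  # ['a1', 'a2', 'b']
```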

Even more important is the fact that until yield from was introduced, there was no simple way to refactor generator code. Suppose you have a (senseless) generator like this:

def get_list_values(lst):
  for item in lst:
    yield int(item)
  for item in lst:
    yield str(item)
  for item in lst:
    yield float(item)

Now you decide to factor out these loops into separate generators. Without yield from, this is ugly, up to the point where you will think twice whether you actually want to do it. With yield from, it’s actually nice to look at:

def get_list_values(lst):
  for sub in [get_list_values_as_int, 
              get_list_values_as_str, 
              get_list_values_as_float]:
    yield from sub(lst)
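
The answer does not define the three helpers; presumably they are the three original loops split out. Here is a runnable sketch under that assumption:

```python
def get_list_values_as_int(lst):
    for item in lst:
        yield int(item)

def get_list_values_as_str(lst):
    for item in lst:
        yield str(item)

def get_list_values_as_float(lst):
    for item in lst:
        yield float(item)

def get_list_values(lst):
    # Delegate to each helper in turn; each one runs to exhaustion.
    for sub in [get_list_values_as_int,
                get_list_values_as_str,
                get_list_values_as_float]:
        yield from sub(lst)

print(list(get_list_values(['1', '2'])))  # [1, 2, '1', '2', 1.0, 2.0]
```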

Why is it compared to micro-threads?

I think what this section in the PEP is talking about is that every generator does have its own isolated execution context. Together with the fact that execution is switched between the generator-iterator and the caller using yield and __next__(), respectively, this is similar to threads, where the operating system switches the executing thread from time to time, along with the execution context (stack, registers, …).

The effect is also comparable: both the generator-iterator and the caller progress in their execution at the same time, because their executions are interleaved. For example, if the generator does some kind of computation and the caller prints out the results, you’ll see each result as soon as it’s available. This is a form of concurrency.
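
A small illustration of the interleaving uses a made-up squares generator. It records when each value is computed, and the caller records when it consumes that value:

```python
trace = []

def squares(n):
    for i in range(n):
        trace.append(f'computed {i * i}')
        yield i * i                       # hand control back to the caller

for value in squares(3):
    trace.append(f'printed {value}')      # the caller runs between computations

print(trace)
# ['computed 0', 'printed 0', 'computed 1', 'printed 1', 'computed 4', 'printed 4']
```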

That analogy isn’t anything specific to yield from, though – it’s rather a general property of generators in Python.


Answer 2

Wherever you invoke a generator from within a generator you need a “pump” to re-yield the values: for v in inner_generator: yield v. As the PEP points out there are subtle complexities to this which most people ignore. Non-local flow-control like throw() is one example given in the PEP. The new syntax yield from inner_generator is used wherever you would have written the explicit for loop before. It’s not merely syntactic sugar, though: It handles all of the corner cases that are ignored by the for loop. Being “sugary” encourages people to use it and thus get the right behaviors.

This message in the discussion thread talks about these complexities:

With the additional generator features introduced by PEP 342, that is no longer the case: as described in Greg’s PEP, simple iteration doesn’t support send() and throw() correctly. The gymnastics needed to support send() and throw() actually aren’t that complex when you break them down, but they aren’t trivial either.
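
To make the "gymnastics" concrete, here is a simplified hand-written version of what yield from does, loosely following the expansion given in PEP 380. It is only a sketch. The name delegate is made up here, and the code assumes the sub-generator is a real generator. It also skips details the real implementation handles, such as sub-iterators that lack throw() or close().

```python
def delegate(sub):
    """Hand-written approximation of `return (yield from sub)`.

    Simplified from the PEP 380 expansion: assumes `sub` is a generator
    (so it has send/throw/close) and skips some rarer edge cases.
    """
    try:
        value = next(sub)                  # start the sub-generator
    except StopIteration as stop:
        return stop.value                  # it finished without yielding
    while True:
        try:
            sent = yield value             # hand the value out to our caller
        except GeneratorExit:
            sub.close()                    # closing us must close the sub-generator
            raise
        except BaseException as exc:
            try:
                value = sub.throw(exc)     # forward any other exception inward
            except StopIteration as stop:
                return stop.value
        else:
            try:
                value = sub.send(sent)     # forward sent values (send(None) acts like next())
            except StopIteration as stop:
                return stop.value


# Usage: behaves like yield from for a writer-style coroutine.
class SpamException(Exception):
    pass

log = []

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            log.append('***')
        else:
            log.append(w)

wrap = delegate(writer())
wrap.send(None)                # prime
wrap.send(0)
wrap.throw(SpamException)      # re-raised inside writer, which handles it
wrap.send(4)
print(log)                     # [0, '***', 4]
```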

I can’t speak to a comparison with micro-threads, other than to observe that generators are a type of parallelism. You can consider the suspended generator to be a thread which sends values via yield to a consumer thread. The actual implementation may be nothing like this (and the actual implementation is obviously of great interest to the Python developers) but this does not concern the users.

The new yield from syntax does not add any additional capability to the language in terms of threading, it just makes it easier to use existing features correctly. Or more precisely it makes it easier for a novice consumer of a complex inner generator written by an expert to pass through that generator without breaking any of its complex features.


Answer 3

A short example will help you understand one of yield from’s use cases: getting values from another generator.

def flatten(sequence):
    """flatten a multi level list or something
    >>> list(flatten([1, [2], 3]))
    [1, 2, 3]
    >>> list(flatten([1, [2], [3, [4]]]))
    [1, 2, 3, 4]
    """
    for element in sequence:
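        # Strings are iterable too, and a one-character string yields itself,
        # so treat str/bytes as atoms to avoid infinite recursion.
        if hasattr(element, '__iter__') and not isinstance(element, (str, bytes)):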
            yield from flatten(element)
        else:
            yield element

print(list(flatten([1, [2], [3, [4]]])))

Answer 4

yield from basically chains iterators in an efficient way:

# a pure-Python equivalent of itertools.chain:
def chain(*iters):
    for it in iters:
        for item in it:
            yield item

# with the new keyword
def chain(*iters):
    for it in iters:
        yield from it

As you can see it removes one pure Python loop. That’s pretty much all it does, but chaining iterators is a pretty common pattern in Python.
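
The following is a quick check that the yield from version behaves like the standard library's itertools.chain for a few mixed iterables:

```python
import itertools

def chain(*iters):
    for it in iters:
        yield from it                     # delegate to each iterable in turn

print(list(chain('ab', [1, 2], range(3))))
# ['a', 'b', 1, 2, 0, 1, 2]
print(list(chain('ab', [1, 2], range(3))) == list(itertools.chain('ab', [1, 2], range(3))))
# True
```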

Threads are basically a feature that allows you to jump out of functions at completely random points and jump back into the state of another function. The thread supervisor does this very often, so the program appears to run all these functions at the same time. The problem is that the points are random, so you need to use locking to prevent the supervisor from stopping the function at a problematic point.

Generators are pretty similar to threads in this sense: They allow you to specify specific points (whenever they yield) where you can jump in and out. When used this way, generators are called coroutines.
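
To make the micro-thread comparison concrete, here is a toy round-robin scheduler that resumes generator-based tasks in turn. It is hypothetical and not a real library API. The compound task uses yield from to run two sub-tasks in sequence. Because yield from passes each pause through, the sub-tasks' yields still act as switch points for the scheduler.

```python
from collections import deque

def task(name, steps, log):
    """A 'micro-thread': each bare yield is a point where it lets others run."""
    for i in range(steps):
        log.append(f'{name} step {i}')
        yield                        # explicit switch point

def compound(log):
    """Runs two sub-tasks in sequence; their yields still reach the scheduler."""
    yield from task('C1', 1, log)
    yield from task('C2', 1, log)

def run(tasks):
    """Toy round-robin scheduler: resume each task until its next yield."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue                 # task finished, drop it
        queue.append(current)        # still running, back of the line

log = []
run([task('A', 2, log), compound(log)])
print(log)  # ['A step 0', 'C1 step 0', 'A step 1', 'C2 step 0']
```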

Read this excellent tutorial on coroutines in Python for more details.


Answer 5

In asyncio’s generator-based coroutines, yield from behaves much like await does in a native coroutine function: both are used to suspend the execution of the coroutine.

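For asyncio, if there’s no need to support older Python versions (i.e. you only target Python 3.5+), async def/await is the recommended syntax for defining a coroutine, so yield from is no longer needed in coroutines. (The generator-based @asyncio.coroutine style was deprecated in Python 3.8 and removed in 3.11.)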
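
For comparison, here is a minimal sketch of the modern syntax using the standard asyncio API. fetch is a made-up stand-in for real I/O. Each await suspends the coroutine at that point, which is the role yield from used to play in generator-based coroutines.

```python
import asyncio

async def fetch(delay, value):
    """Stand-in for real I/O: suspends at await, much like yield from did."""
    await asyncio.sleep(delay)
    return value

async def main():
    # Both coroutines are suspended at their awaits and run concurrently;
    # gather returns results in argument order, not completion order.
    return await asyncio.gather(fetch(0.02, 'slow'), fetch(0.01, 'fast'))

print(asyncio.run(main()))  # ['slow', 'fast']
```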

But in general, outside of asyncio, yield from <sub-generator> still has other uses, such as iterating over a sub-generator, as mentioned in the earlier answers.


Answer 6

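This code defines a function fixed_sum_digits that returns a generator enumerating all digit strings of a given length whose digits add up to a given total. For example, fixed_sum_digits(6, 20) yields every six-digit string (leading zeros included) whose digit sum is 20.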

def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:  
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i,deepness - 1,myString + str(i),Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0,digits,"",Tot) 

Try to write it without yield from. If you find an effective way to do it let me know.

I think that for cases like this one (visiting trees), yield from makes the code simpler and cleaner.
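
For plain iteration like this, one way to write it without yield from is to re-yield each result by hand. The sketch below does that; iter_fun_loop is a name invented here, and it produces the same strings as the original iter_fun. What the manual loop gives up is the forwarding of send(), throw(), close() and return values, which this example does not need.

```python
def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i, deepness - 1, myString + str(i), Total)

def iter_fun_loop(sum, deepness, myString, Total):
    """Same search, but re-yielding each result by hand instead of yield from."""
    if deepness == 0:
        if sum == Total:
            yield myString
    else:
        for i in range(min(10, Total - sum + 1)):
            for s in iter_fun_loop(sum + i, deepness - 1, myString + str(i), Total):
                yield s

print(list(iter_fun_loop(0, 2, "", 3)))                                   # ['03', '12', '21', '30']
print(list(iter_fun_loop(0, 4, "", 10)) == list(iter_fun(0, 4, "", 10)))  # True
```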


Answer 7

Simply put, yield from provides tail recursion for iterator functions.
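
The sketch below shows what this looks like, using a made-up count_down generator that delegates to itself in tail position. One caveat: CPython does not eliminate these frames. Every level of delegation stays alive while the innermost generator runs, so a deep enough chain still raises RecursionError.

```python
import sys

def count_down(n):
    """Recursive generator with yield from in tail position."""
    if n < 0:
        return
    yield n
    yield from count_down(n - 1)   # delegation; each level keeps its own frame

print(list(count_down(3)))  # [3, 2, 1, 0]

# The frames are not eliminated, so going deeper than the recursion limit fails.
try:
    list(count_down(sys.getrecursionlimit() + 100))
except RecursionError:
    print('RecursionError')
```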