标签归档:generator

我如何知道生成器从一开始是否为空?

问题:我如何知道生成器从一开始是否为空?

有没有测试,如果生成器组没有项目,如一个简单的方法peekhasNextisEmpty,类似的规定?

Is there a simple way of testing if the generator has no items, like peek, hasNext, isEmpty, something along those lines?


回答 0

您问题的简单答案:不,没有简单的方法。有很多解决方法。

实际上,不应该有一种简单的方法,因为生成器是什么:一种在不将序列保存在内存中的情况下输出值序列的方法。因此,没有向后遍历。

您可以编写has_next函数,甚至可以将其作为带有精美装饰器的方法添加到生成器上。

The simple answer to your question: no, there is no simple way. There are a whole lot of work-arounds.

There really shouldn’t be a simple way, because of what generators are: a way to output a sequence of values without holding the sequence in memory. So there’s no backward traversal.

You could write a has_next function or maybe even slap it on to a generator as a method with a fancy decorator if you wanted to.


回答 1

建议:

def peek(iterable):
    try:
        first = next(iterable)
    except StopIteration:
        return None
    return first, itertools.chain([first], iterable)

用法:

res = peek(mysequence)
if res is None:
    # sequence is empty.  Do stuff.
else:
    first, mysequence = res
    # Do something with first, maybe?
    # Then iterate over the sequence:
    for element in mysequence:
        # etc.

Suggestion:

def peek(iterable):
    try:
        first = next(iterable)
    except StopIteration:
        return None
    return first, itertools.chain([first], iterable)

Usage:

res = peek(mysequence)
if res is None:
    # sequence is empty.  Do stuff.
else:
    first, mysequence = res
    # Do something with first, maybe?
    # Then iterate over the sequence:
    for element in mysequence:
        # etc.

回答 2

一种简单的方法是将可选参数用于next(),如果生成器用尽(或为空),则使用该参数。例如:

iterable = some_generator()

_exhausted = object()

if next(iterable, _exhausted) == _exhausted:
    print('generator is empty')

编辑:更正了mehtunguh注释中指出的问题。

A simple way is to use the optional parameter for next() which is used if the generator is exhausted (or empty). For example:

iterable = some_generator()

_exhausted = object()

if next(iterable, _exhausted) == _exhausted:
    print('generator is empty')

Edit: Corrected the problem pointed out in mehtunguh’s comment.


回答 3

next(generator, None) is not None

或替换,None但是无论您知道什么值都不在您的生成器中。

编辑:是的,这将跳过生成器中的1个项目。但是,通常我会检查生成器是否仅出于验证目的而为空,然后才真正不使用它。否则我会做类似的事情:

def foo(self):
    if next(self.my_generator(), None) is None:
        raise Exception("Not initiated")

    for x in self.my_generator():
        ...

也就是说,如果您的生成器来自函数,则此方法有效,如中所述generator()

next(generator, None) is not None

Or replace None but whatever value you know it’s not in your generator.

Edit: Yes, this will skip 1 item in the generator. Often, however, I check whether a generator is empty only for validation purposes, then don’t really use it. Or otherwise I do something like:

def foo(self):
    if next(self.my_generator(), None) is None:
        raise Exception("Not initiated")

    for x in self.my_generator():
        ...

That is, this works if your generator comes from a function, as in generator().


回答 4

最好的方法,恕我直言,将避免特殊的测试。大多数情况下,使用生成器一种测试:

thing_generated = False

# Nothing is lost here. if nothing is generated, 
# the for block is not executed. Often, that's the only check
# you need to do. This can be done in the course of doing
# the work you wanted to do anyway on the generated output.
for thing in my_generator():
    thing_generated = True
    do_work(thing)

如果这还不够好,您仍然可以执行显式测试。此时,thing将包含最后生成的值。如果未生成任何内容,则它将是未定义的-除非您已经定义了变量。您可以检查的值thing,但这有点不可靠。相反,只需在块内设置一个标志,然后再检查它:

if not thing_generated:
    print "Avast, ye scurvy dog!"

The best approach, IMHO, would be to avoid a special test. Most times, use of a generator is the test:

thing_generated = False

# Nothing is lost here. if nothing is generated, 
# the for block is not executed. Often, that's the only check
# you need to do. This can be done in the course of doing
# the work you wanted to do anyway on the generated output.
for thing in my_generator():
    thing_generated = True
    do_work(thing)

If that’s not good enough, you can still perform an explicit test. At this point, thing will contain the last value generated. If nothing was generated, it will be undefined – unless you’ve already defined the variable. You could check the value of thing, but that’s a bit unreliable. Instead, just set a flag within the block and check it afterward:

if not thing_generated:
    print "Avast, ye scurvy dog!"

回答 5

我讨厌提供第二种解决方案,尤其是我自己不会使用的解决方案,但是,如果您绝对必须这样做并且不消耗生成器,那么在其他答案中:

def do_something_with_item(item):
    print item

empty_marker = object()

try:
     first_item = my_generator.next()     
except StopIteration:
     print 'The generator was empty'
     first_item = empty_marker

if first_item is not empty_marker:
    do_something_with_item(first_item)
    for item in my_generator:
        do_something_with_item(item)

现在我真的不喜欢这种解决方案,因为我认为这不是生成器的使用方式。

I hate to offer a second solution, especially one that I would not use myself, but, if you absolutely had to do this and to not consume the generator, as in other answers:

def do_something_with_item(item):
    print item

empty_marker = object()

try:
     first_item = my_generator.next()     
except StopIteration:
     print 'The generator was empty'
     first_item = empty_marker

if first_item is not empty_marker:
    do_something_with_item(first_item)
    for item in my_generator:
        do_something_with_item(item)

Now I really don’t like this solution, because I believe that this is not how generators are to be used.


回答 6

我意识到该帖子目前已有5年历史了,但是我在寻找惯用的方法时发现了它,并且没有看到我的解决方案发布。因此,对于后代:

import itertools

def get_generator():
    """
    Returns (bool, generator) where bool is true iff the generator is not empty.
    """
    gen = (i for i in [0, 1, 2, 3, 4])
    a, b = itertools.tee(gen)
    try:
        a.next()
    except StopIteration:
        return (False, b)
    return (True, b)

当然,正如我敢肯定的,很多评论员都会指出,这很hacky,并且仅在某些有限的情况下才起作用(例如,生成器是无副作用的)。YMMV。

I realize that this post is 5 years old at this point, but I found it while looking for an idiomatic way of doing this, and did not see my solution posted. So for posterity:

import itertools

def get_generator():
    """
    Returns (bool, generator) where bool is true iff the generator is not empty.
    """
    gen = (i for i in [0, 1, 2, 3, 4])
    a, b = itertools.tee(gen)
    try:
        a.next()
    except StopIteration:
        return (False, b)
    return (True, b)

Of course, as I’m sure many commentators will point out, this is hacky and only works at all in certain limited situations (where the generators are side-effect free, for example). YMMV.


回答 7

很抱歉使用明显的方法,但是最好的方法是:

for item in my_generator:
     print item

现在,您已经检测到生成器在使用时是空的。当然,如果生成器为空,则将永远不会显示项目。

这可能并不完全适合您的代码,但这是生成器的惯用法:迭代,因此也许您可能会稍微改变方法,或者根本不使用生成器。

Sorry for the obvious approach, but the best way would be to do:

for item in my_generator:
     print item

Now you have detected that the generator is empty while you are using it. Of course, item will never be displayed if the generator is empty.

This may not exactly fit in with your code, but this is what the idiom of the generator is for: iterating, so perhaps you might change your approach slightly, or not use generators at all.


回答 8

您需要查看生成器是否为空的所有方法是尝试获取下一个结果。当然,如果您还没有准备好使用该结果,则必须将其存储起来,以便以后再次返回。

这是一个包装器类,可以将其添加到现有迭代器中以添加__nonzero__测试,因此您可以使用simple来查看生成器是否为空if。它也可能会变成装饰器。

class GenWrapper:
    def __init__(self, iter):
        self.source = iter
        self.stored = False

    def __iter__(self):
        return self

    def __nonzero__(self):
        if self.stored:
            return True
        try:
            self.value = next(self.source)
            self.stored = True
        except StopIteration:
            return False
        return True

    def __next__(self):  # use "next" (without underscores) for Python 2.x
        if self.stored:
            self.stored = False
            return self.value
        return next(self.source)

使用方法如下:

with open(filename, 'r') as f:
    f = GenWrapper(f)
    if f:
        print 'Not empty'
    else:
        print 'Empty'

请注意,您可以随时检查是否为空,而不仅仅是在迭代开始时。

All you need to do to see if a generator is empty is to try to get the next result. Of course if you’re not ready to use that result then you have to store it to return it again later.

Here’s a wrapper class that can be added to an existing iterator to add an __nonzero__ test, so you can see if the generator is empty with a simple if. It can probably also be turned into a decorator.

class GenWrapper:
    def __init__(self, iter):
        self.source = iter
        self.stored = False

    def __iter__(self):
        return self

    def __nonzero__(self):
        if self.stored:
            return True
        try:
            self.value = next(self.source)
            self.stored = True
        except StopIteration:
            return False
        return True

    def __next__(self):  # use "next" (without underscores) for Python 2.x
        if self.stored:
            self.stored = False
            return self.value
        return next(self.source)

Here’s how you’d use it:

with open(filename, 'r') as f:
    f = GenWrapper(f)
    if f:
        print 'Not empty'
    else:
        print 'Empty'

Note that you can check for emptiness at any time, not just at the start of the iteration.


回答 9

在马克·兰瑟姆(Mark Ransom)的提示下,这是一个可用于包装任何迭代器的类,以便您可以窥视,将值推回到流中并检查是否为空。这是一个简单的想法,具有一个简单的实现,过去我很方便。

class Pushable:

    def __init__(self, iter):
        self.source = iter
        self.stored = []

    def __iter__(self):
        return self

    def __bool__(self):
        if self.stored:
            return True
        try:
            self.stored.append(next(self.source))
        except StopIteration:
            return False
        return True

    def push(self, value):
        self.stored.append(value)

    def peek(self):
        if self.stored:
            return self.stored[-1]
        value = next(self.source)
        self.stored.append(value)
        return value

    def __next__(self):
        if self.stored:
            return self.stored.pop()
        return next(self.source)

Prompted by Mark Ransom, here’s a class that you can use to wrap any iterator so that you can peek ahead, push values back onto the stream and check for empty. It’s a simple idea with a simple implementation that I’ve found very handy in the past.

class Pushable:

    def __init__(self, iter):
        self.source = iter
        self.stored = []

    def __iter__(self):
        return self

    def __bool__(self):
        if self.stored:
            return True
        try:
            self.stored.append(next(self.source))
        except StopIteration:
            return False
        return True

    def push(self, value):
        self.stored.append(value)

    def peek(self):
        if self.stored:
            return self.stored[-1]
        value = next(self.source)
        self.stored.append(value)
        return value

    def __next__(self):
        if self.stored:
            return self.stored.pop()
        return next(self.source)

回答 10

刚好落在这个线程上,并意识到缺少一个非常简单易读的答案:

def is_empty(generator):
    for item in generator:
        return False
    return True

如果我们不打算消耗任何物品,那么我们需要将第一个物品重新注入到生成器中:

def is_empty_no_side_effects(generator):
    try:
        item = next(generator)
        def my_generator():
            yield item
            yield from generator
        return my_generator(), False
    except StopIteration:
        return (_ for _ in []), True

例:

>>> g=(i for i in [])
>>> g,empty=is_empty_no_side_effects(g)
>>> empty
True
>>> g=(i for i in range(10))
>>> g,empty=is_empty_no_side_effects(g)
>>> empty
False
>>> list(g)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Just fell on this thread and realized that a very simple and easy to read answer was missing:

def is_empty(generator):
    for item in generator:
        return False
    return True

If we are not suppose to consume any item then we need to re-inject the first item into the generator:

def is_empty_no_side_effects(generator):
    try:
        item = next(generator)
        def my_generator():
            yield item
            yield from generator
        return my_generator(), False
    except StopIteration:
        return (_ for _ in []), True

Example:

>>> g=(i for i in [])
>>> g,empty=is_empty_no_side_effects(g)
>>> empty
True
>>> g=(i for i in range(10))
>>> g,empty=is_empty_no_side_effects(g)
>>> empty
False
>>> list(g)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 11

>>> gen = (i for i in [])
>>> next(gen)
Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    next(gen)
StopIteration

在生成器结尾处StopIteration引发,因为在您的情况下立即到达结尾,所以引发异常。但通常您不应该检查是否存在下一个值。

您可以做的另一件事是:

>>> gen = (i for i in [])
>>> if not list(gen):
    print('empty generator')
>>> gen = (i for i in [])
>>> next(gen)
Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    next(gen)
StopIteration

At the end of generator StopIteration is raised, since in your case end is reached immediately, exception is raised. But normally you shouldn’t check for existence of next value.

another thing you can do is:

>>> gen = (i for i in [])
>>> if not list(gen):
    print('empty generator')

回答 12

如果您使用生成器之前需要知道,那么没有,没有简单的方法。如果您可以等到使用生成器后再使用,则有一种简单的方法:

was_empty = True

for some_item in some_generator:
    was_empty = False
    do_something_with(some_item)

if was_empty:
    handle_already_empty_generator_case()

If you need to know before you use the generator, then no, there is no simple way. If you can wait until after you have used the generator, there is a simple way:

was_empty = True

for some_item in some_generator:
    was_empty = False
    do_something_with(some_item)

if was_empty:
    handle_already_empty_generator_case()

回答 13

只需用itertools.chain包装生成器,然后将表示可迭代结束的内容作为第二个可迭代,然后进行简单检查即可。

例如:

import itertools

g = some_iterable
eog = object()
wrap_g = itertools.chain(g, [eog])

现在剩下的就是检查我们附加到iterable末尾的值,当您读取它时,它将表示末尾

for value in wrap_g:
    if value == eog: # DING DING! We just found the last element of the iterable
        pass # Do something

Simply wrap the generator with itertools.chain, put something that will represent the end of the iterable as the second iterable, then simply check for that.

Ex:

import itertools

g = some_iterable
eog = object()
wrap_g = itertools.chain(g, [eog])

Now all that’s left is to check for that value we appended to the end of the iterable, when you read it then that will signify the end

for value in wrap_g:
    if value == eog: # DING DING! We just found the last element of the iterable
        pass # Do something

回答 14

在我的情况下,我需要先了解是否填充了许多生成器,然后再将其传递给一个函数,该函数合并了各个项,即zip(...)。解决方案与接受的答案相似但足够不同:

定义:

def has_items(iterable):
    try:
        return True, itertools.chain([next(iterable)], iterable)
    except StopIteration:
        return False, []

用法:

def filter_empty(iterables):
    for iterable in iterables:
        itr_has_items, iterable = has_items(iterable)
        if itr_has_items:
            yield iterable


def merge_iterables(iterables):
    populated_iterables = filter_empty(iterables)
    for items in zip(*populated_iterables):
        # Use items for each "slice"

我的特定问题具有以下属性:可迭代项为空或具有完全相同的条目数。

In my case I needed to know if a host of generators was populated before I passed it on to a function, which merged the items, i.e., zip(...). The solution is similar, but different enough, from the accepted answer:

Definition:

def has_items(iterable):
    try:
        return True, itertools.chain([next(iterable)], iterable)
    except StopIteration:
        return False, []

Usage:

def filter_empty(iterables):
    for iterable in iterables:
        itr_has_items, iterable = has_items(iterable)
        if itr_has_items:
            yield iterable


def merge_iterables(iterables):
    populated_iterables = filter_empty(iterables)
    for items in zip(*populated_iterables):
        # Use items for each "slice"

My particular problem has the property that the iterables are either empty or has exactly the same number of entries.


回答 15

我发现只有这种解决方案也可以用于空迭代。

def is_generator_empty(generator):
    a, b = itertools.tee(generator)
    try:
        next(a)
    except StopIteration:
        return True, b
    return False, b

is_empty, generator = is_generator_empty(generator)

或者,如果您不想为此使用异常,请尝试使用

def is_generator_empty(generator):
    a, b = itertools.tee(generator)
    for item in a:
        return False, b
    return True, b

is_empty, generator = is_generator_empty(generator)

标记的解决方案中,您无法将其用于空发生器,例如

def get_empty_generator():
    while False:
        yield None 

generator = get_empty_generator()

I found only this solution as working for empty iterations as well.

def is_generator_empty(generator):
    a, b = itertools.tee(generator)
    try:
        next(a)
    except StopIteration:
        return True, b
    return False, b

is_empty, generator = is_generator_empty(generator)

Or if you do not want to use exception for this try to use

def is_generator_empty(generator):
    a, b = itertools.tee(generator)
    for item in a:
        return False, b
    return True, b

is_empty, generator = is_generator_empty(generator)

In the marked solution you are not possible to use it for empty generators like

def get_empty_generator():
    while False:
        yield None 

generator = get_empty_generator()

回答 16

这是一个古老且已回答的问题,但正如以前没有人显示过的那样,它来了:

for _ in generator:
    break
else:
    print('Empty')

你可以在这里阅读更多

This is an old and answered question, but as no one has shown it before, here it goes:

for _ in generator:
    break
else:
    print('Empty')

You can read more here


回答 17

这是我使用的简单方法,用于在检查是否产生某些结果时继续返回迭代器,而只是检查循环是否运行:

        n = 0
        for key, value in iterator:
            n+=1
            yield key, value
        if n == 0:
            print ("nothing found in iterator)
            break

Here is my simple approach that i use to keep on returning an iterator while checking if something was yielded I just check if the loop runs:

        n = 0
        for key, value in iterator:
            n+=1
            yield key, value
        if n == 0:
            print ("nothing found in iterator)
            break

回答 18

这是一个包装生成器的简单装饰器,因此如果为空,则返回None。如果您的代码需要循环遍历之前知道生成器是否会生成任何东西这将很有用。

def generator_or_none(func):
    """Wrap a generator function, returning None if it's empty. """

    def inner(*args, **kwargs):
        # peek at the first item; return None if it doesn't exist
        try:
            next(func(*args, **kwargs))
        except StopIteration:
            return None

        # return original generator otherwise first item will be missing
        return func(*args, **kwargs)

    return inner

用法:

import random

@generator_or_none
def random_length_generator():
    for i in range(random.randint(0, 10)):
        yield i

gen = random_length_generator()
if gen is None:
    print('Generator is empty')

其中一个有用的示例是在模板代码中-即jinja2

{% if content_generator %}
  <section>
    <h4>Section title</h4>
    {% for item in content_generator %}
      {{ item }}
    {% endfor %
  </section>
{% endif %}

Here’s a simple decorator which wraps the generator, so it returns None if empty. This can be useful if your code needs to know whether the generator will produce anything before looping through it.

def generator_or_none(func):
    """Wrap a generator function, returning None if it's empty. """

    def inner(*args, **kwargs):
        # peek at the first item; return None if it doesn't exist
        try:
            next(func(*args, **kwargs))
        except StopIteration:
            return None

        # return original generator otherwise first item will be missing
        return func(*args, **kwargs)

    return inner

Usage:

import random

@generator_or_none
def random_length_generator():
    for i in range(random.randint(0, 10)):
        yield i

gen = random_length_generator()
if gen is None:
    print('Generator is empty')

One example where this is useful is in templating code – i.e. jinja2

{% if content_generator %}
  <section>
    <h4>Section title</h4>
    {% for item in content_generator %}
      {{ item }}
    {% endfor %
  </section>
{% endif %}

回答 19

使用islice,您只需要检查第一次迭代即可发现它是否为空。

从itertools导入islice

def isempty(iterable):
    返回列表(islice(iterable,1))== []

using islice you need only check up to the first iteration to discover if it is empty.

from itertools import islice

def isempty(iterable):
    return list(islice(iterable,1)) == []


回答 20

怎么样使用any()?我将其与生成器配合使用,并且工作正常。这里有人解释一下

What about using any()? I use it with generators and it’s working fine. Here there is guy explaining a little about this


回答 21

在cytoolz中使用偷看功能。

from cytoolz import peek
from typing import Tuple, Iterable

def is_empty_iterator(g: Iterable) -> Tuple[Iterable, bool]:
    try:
        _, g = peek(g)
        return g, False
    except StopIteration:
        return g, True

此函数返回的迭代器将等效于作为参数传入的原始迭代器。

Use the peek function in cytoolz.

from cytoolz import peek
from typing import Tuple, Iterable

def is_empty_iterator(g: Iterable) -> Tuple[Iterable, bool]:
    try:
        _, g = peek(g)
        return g, False
    except StopIteration:
        return g, True

The iterator returned by this function will be equivalent to the original one passed in as an argument.


回答 22

我通过使用sum函数解决了它。请参阅下面的示例,我使用了glob.iglob(它返回一个生成器)。

def isEmpty():
    files = glob.iglob(search)
    if sum(1 for _ in files):
        return True
    return False

*这可能不适用于巨大的生成器,但对于较小的列表应该表现良好

I solved it by using the sum function. See below for an example I used with glob.iglob (which returns a generator).

def isEmpty():
    files = glob.iglob(search)
    if sum(1 for _ in files):
        return True
    return False

*This will probably not work for HUGE generators but should perform nicely for smaller lists


了解Python中的生成器

问题:了解Python中的生成器

我目前正在阅读Python食谱,目前正在研究生成器。我发现很难回头。

我来自Java的背景,是否有Java的等效语言?这本书讲的是“生产者/消费者”,但是当我听说线程的时候。

什么是生成器,为什么要使用它?显然,无需引用任何书籍(除非您可以直接从书籍中找到一个体面,简单的答案)。也许举一些例子,如果您感到慷慨!

I am reading the Python cookbook at the moment and am currently looking at generators. I’m finding it hard to get my head round.

As I come from a Java background, is there a Java equivalent? The book was speaking about ‘Producer / Consumer’, however when I hear that I think of threading.

What is a generator and why would you use it? Without quoting any books, obviously (unless you can find a decent, simplistic answer direct from a book). Perhaps with examples, if you’re feeling generous!


回答 0

注意:本文采用Python 3.x语法。

一个生成器仅仅是它返回一个对象,你可以调用一个函数next,这样在每次调用它返回一定的价值,直到它提出了一个StopIterationexceptions,这表明所有值已经产生。这样的对象称为迭代器

普通函数使用来返回单个值return,就像Java中一样。但是,在Python中,有一个替代方法称为yieldyield在函数中的任何地方使用它都会使其生成器。遵守以下代码:

>>> def myGen(n):
...     yield n
...     yield n + 1
... 
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

如您所见,myGen(n)是一个产生n和的函数n + 1。每次调用都会next产生一个值,直到产生所有值为止。for循环next在后台调用,因此:

>>> for n in myGen(6):
...     print(n)
... 
6
7

同样,还有生成器表达式,它们提供了一种方法来简要描述某些常见的生成器类型:

>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

请注意,生成器表达式非常类似于列表推导

>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]

观察到生成器对象仅生成一次,但是其代码并非一次运行。仅调用以next实际执行(部分)代码。一旦yield到达语句,生成器中的代码将停止执​​行,并在该语句上返回值。next然后,对下一个调用会导致执行在生成器在最后一个生成器被保留的状态下继续执行yield。这是常规函数的根本区别:常规函数始终在“顶部”开始执行,并在返回值时丢弃其状态。

关于这个主题还有更多的事情要说。例如,可以将send数据返回到生成器(参考)中。但这是我建议您在了解生成器的基本概念之前不要研究的东西。

现在您可能会问:为什么使用生成器?有两个很好的理由:

  • 使用生成器可以更简洁地描述某些概念。
  • 无需创建返回值列表的函数,而是可以编写生成器以动态生成值。这意味着不需要构造任何列表,这意味着生成的代码具有更高的内存效率。这样,甚至可以描述太大而无法容纳在内存中的数据流。
  • 生成器提供了一种自然的方式来描述无限流。考虑例如斐波那契数

    >>> def fib():
    ...     a, b = 0, 1
    ...     while True:
    ...         yield a
    ...         a, b = b, a + b
    ... 
    >>> import itertools
    >>> list(itertools.islice(fib(), 10))
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

    该代码用于itertools.islice从无限流中获取有限数量的元素。建议您仔细看一下itertools模块中的功能,因为它们是轻松编写高级生成器的基本工具。


   关于Python <= 2.6:在上面的示例中next是一个函数,该函数__next__在给定对象上调用方法。在Python <= 2.6中,使用了一种稍有不同的技术,即o.next()代替next(o)。Python 2.7具有next()call,.next因此您无需在2.7中使用以下内容:

>>> g = (n for n in range(3, 5))
>>> g.next()
3

Note: this post assumes Python 3.x syntax.

A generator is simply a function which returns an object on which you can call next, such that for every call it returns some value, until it raises a StopIteration exception, signaling that all values have been generated. Such an object is called an iterator.

Normal functions return a single value using return, just like in Java. In Python, however, there is an alternative, called yield. Using yield anywhere in a function makes it a generator. Observe this code:

>>> def myGen(n):
...     yield n
...     yield n + 1
... 
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

As you can see, myGen(n) is a function which yields n and n + 1. Every call to next yields a single value, until all values have been yielded. for loops call next in the background, thus:

>>> for n in myGen(6):
...     print(n)
... 
6
7

Likewise there are generator expressions, which provide a means to succinctly describe certain common types of generators:

>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Note that generator expressions are much like list comprehensions:

>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]

Observe that a generator object is generated once, but its code is not run all at once. Only calls to next actually execute (part of) the code. Execution of the code in a generator stops once a yield statement has been reached, upon which it returns a value. The next call to next then causes execution to continue in the state in which the generator was left after the last yield. This is a fundamental difference with regular functions: those always start execution at the “top” and discard their state upon returning a value.

There are more things to be said about this subject. It is e.g. possible to send data back into a generator (reference). But that is something I suggest you do not look into until you understand the basic concept of a generator.

Now you may ask: why use generators? There are a couple of good reasons:

  • Certain concepts can be described much more succinctly using generators.
  • Instead of creating a function which returns a list of values, one can write a generator which generates the values on the fly. This means that no list needs to be constructed, meaning that the resulting code is more memory efficient. In this way one can even describe data streams which would simply be too large to fit in memory.
  • Generators allow for a natural way to describe infinite streams. Consider for example the Fibonacci numbers:

    >>> def fib():
    ...     a, b = 0, 1
    ...     while True:
    ...         yield a
    ...         a, b = b, a + b
    ... 
    >>> import itertools
    >>> list(itertools.islice(fib(), 10))
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    

    This code uses itertools.islice to take a finite number of elements from an infinite stream. You are advised to have a good look at the functions in the itertools module, as they are essential tools for writing advanced generators with great ease.


   About Python <=2.6: in the above examples next is a function which calls the method __next__ on the given object. In Python <=2.6 one uses a slightly different technique, namely o.next() instead of next(o). Python 2.7 has next() call .next so you need not use the following in 2.7:

>>> g = (n for n in range(3, 5))
>>> g.next()
3

回答 1

生成器实际上是一个在完成之前返回(数据)的函数,但是它会在那一点暂停,您可以在那一点恢复该函数。

>>> def myGenerator():
...     yield 'These'
...     yield 'words'
...     yield 'come'
...     yield 'one'
...     yield 'at'
...     yield 'a'
...     yield 'time'

>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
These
>>> next(myGeneratorInstance)
words

等等。生成器的(或一个)好处是,因为生成器一次处理一个数据,所以您可以处理大量数据;对于列表,过多的内存需求可能会成为问题。生成器与列表一样,都是可迭代的,因此可以以相同的方式使用它们:

>>> for word in myGeneratorInstance:
...     print word
These
words
come
one
at 
a 
time

请注意,生成器提供了另一种处理无穷大的方法,例如

>>> from time import gmtime, strftime
>>> def myGen():
...     while True:
...         yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())    
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:17:15 +0000
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:18:02 +0000   

生成器封装了一个无限循环,但这不是问题,因为每次请求它时,您只会得到每个答案。

A generator is effectively a function that returns (data) before it is finished, but it pauses at that point, and you can resume the function at that point.

>>> def myGenerator():
...     yield 'These'
...     yield 'words'
...     yield 'come'
...     yield 'one'
...     yield 'at'
...     yield 'a'
...     yield 'time'

>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
These
>>> next(myGeneratorInstance)
words

and so on. The (or one) benefit of generators is that because they deal with data one piece at a time, you can deal with large amounts of data; with lists, excessive memory requirements could become a problem. Generators, just like lists, are iterable, so they can be used in the same ways:

>>> for word in myGeneratorInstance:
...     print word
These
words
come
one
at 
a 
time

Note that generators provide another way to deal with infinity, for example

>>> from time import gmtime, strftime
>>> def myGen():
...     while True:
...         yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())    
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:17:15 +0000
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:18:02 +0000   

The generator encapsulates an infinite loop, but this isn’t a problem because you only get each answer every time you ask for it.


回答 2

首先,术语“ 生成器”最初在Python中定义不清,从而引起很多混乱。您可能是指迭代器可迭代对象(请参阅此处)。然后在Python中还有生成器函数(返回生成器对象),生成器对象(即迭代器)和生成器表达式(它们被评估为生成器对象)。

根据生成器的词汇表条目,似乎正式的术语是生成器是“生成器功能”的缩写。过去,文档中对术语的定义不一致,但是幸运的是,此问题已得到解决。

精确一点,避免在没有进一步说明的情况下使用术语“生成器”可能仍然是一个好主意。

First of all, the term generator originally was somewhat ill-defined in Python, leading to lots of confusion. You probably mean iterators and iterables (see here). Then in Python there are also generator functions (which return a generator object), generator objects (which are iterators) and generator expressions (which are evaluated to a generator object).

According to the glossary entry for generator it seems that the official terminology is now that generator is short for “generator function”. In the past the documentation defined the terms inconsistently, but fortunately this has been fixed.

It might still be a good idea to be precise and avoid the term “generator” without further specification.


回答 3

生成器可以被认为是创建迭代器的简写。它们的行为类似于Java迭代器。例:

>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> list(g)   # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> g.next()  # iterator is at the end; calling next again will throw
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

希望这对您有所帮助。

更新:

正如许多其他答案所示,创建生成器的方法有很多。您可以像上面的示例一样使用括号语法,也可以使用yield。另一个有趣的功能是生成器可以是“无限的”-不会停止的迭代器:

>>> def infinite_gen():
...     n = 0
...     while True:
...         yield n
...         n = n + 1
... 
>>> g = infinite_gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
...

Generators could be thought of as shorthand for creating an iterator. They behave like a Java Iterator. Example:

>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> list(g)   # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> g.next()  # iterator is at the end; calling next again will throw
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Hope this helps/is what you are looking for.

Update:

As many other answers are showing, there are different ways to create a generator. You can use the parentheses syntax as in my example above, or you can use yield. Another interesting feature is that generators can be “infinite” — iterators that don’t stop:

>>> def infinite_gen():
...     n = 0
...     while True:
...         yield n
...         n = n + 1
... 
>>> g = infinite_gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
...

回答 4

没有等效的Java。

这是一个人为的例子:

#! /usr/bin/python
def  mygen(n):
    x = 0
    while x < n:
        x = x + 1
        if x % 3 == 0:
            yield x

for a in mygen(100):
    print a

生成器中存在一个从0到n的循环,如果循环变量是3的倍数,它将产生该变量。

for循环的每次迭代期间,都会执行生成器。如果这是生成器的第一次执行,则它从头开始,否则从生成的上一时间开始继续。

There is no Java equivalent.

Here is a bit of a contrived example:

#! /usr/bin/python
def  mygen(n):
    x = 0
    while x < n:
        x = x + 1
        if x % 3 == 0:
            yield x

for a in mygen(100):
    print a

There is a loop in the generator that runs from 0 to n, and if the loop variable is a multiple of 3, it yields the variable.

During each iteration of the for loop the generator is executed. If it is the first time the generator executes, it starts at the beginning, otherwise it continues from the previous time it yielded.


回答 5

我喜欢用堆栈框架向那些在编程语言和计算领域具有良好背景的人描述生成器。

在许多语言中,有一个堆栈,其上方是当前堆栈“框架”。堆栈框架包括为函数本地变量分配的空间,包括传递给该函数的参数。

调用函数时,当前执行点(“程序计数器”或等效程序)被压入堆栈,并创建一个新的堆栈框架。然后执行将转移到被调用函数的开头。

对于常规函数,函数有时会返回一个值,并且堆栈会“弹出”。该函数的堆栈帧将被丢弃,并在先前的位置恢复执行。

当函数是生成器时,它可以使用yield语句返回一个值,不会丢弃堆栈帧。函数中局部变量的值和程序计数器将保留。这允许生成器在以后的时间恢复,并从yield语句继续执行,并且可以执行更多代码并返回另一个值。

在Python 2.5之前,这是所有生成器所做的。Python 2.5的加入到传回值的能力到生成器为好。这样,传入的值就可以用作由yield语句生成的表达式,该语句从生成器临时返回了控制(和值)。

生成器的主要优点是保留了函数的“状态”,这与常规函数不同,在常规函数中,每次丢弃堆栈帧时,您都会丢失所有的“状态”。第二个优点是避免了某些函数调用开销(创建和删除堆栈帧),尽管这通常是次要的优点。

I like to describe generators, to those with a decent background in programming languages and computing, in terms of stack frames.

In many languages, there is a stack on top of which is the current stack “frame”. The stack frame includes space allocated for variables local to the function including the arguments passed in to that function.

When you call a function, the current point of execution (the “program counter” or equivalent) is pushed onto the stack, and a new stack frame is created. Execution then transfers to the beginning of the function being called.

With regular functions, at some point the function returns a value, and the stack is “popped”. The function’s stack frame is discarded and execution resumes at the previous location.

When a function is a generator, it can return a value without the stack frame being discarded, using the yield statement. The values of local variables and the program counter within the function are preserved. This allows the generator to be resumed at a later time, with execution continuing from the yield statement, and it can execute more code and return another value.

Before Python 2.5 this was all generators did. Python 2.5 added the ability to pass values back in to the generator as well. In doing so, the passed-in value is available as an expression resulting from the yield statement which had temporarily returned control (and a value) from the generator.

The key advantage to generators is that the “state” of the function is preserved, unlike with regular functions where each time the stack frame is discarded, you lose all that “state”. A secondary advantage is that some of the function call overhead (creating and deleting stack frames) is avoided, though this is a usually a minor advantage.


回答 6

我可以添加到Stephan202答案中的唯一一件事是建议您看一下David Beazley在PyCon ’08上的演讲“系统程序员的生成器技巧”,这是我所见过的生成器的方式和原因的最好的单一解释。任何地方。这就是让我从“ Python看起来很有趣”到“这就是我一直在寻找的东西”的原因。在http://www.dabeaz.com/generators/上

The only thing I can add to Stephan202’s answer is a recommendation that you take a look at David Beazley’s PyCon ’08 presentation “Generator Tricks for Systems Programmers,” which is the best single explanation of the how and why of generators that I’ve seen anywhere. This is the thing that took me from “Python looks kind of fun” to “This is what I’ve been looking for.” It’s at http://www.dabeaz.com/generators/.


回答 7

有助于清楚地区分函数foo和生成器foo(n):

def foo(n):
    yield n
    yield n+1

foo是一个函数。foo(6)是一个生成器对象。

使用生成器对象的典型方法是在循环中:

for n in foo(6):
    print(n)

循环打印

# 6
# 7

将生成器视为可恢复的功能。

yield类似的行为return在某种意义上说,那些取得了获取值的“返回”由生成器。但是,与return不同的是,下一次与生成器不同的是,生成器的函数foo在上一个yield语句之后恢复从上次中断的位置继续运行,直到遇到另一个yield语句为止。

在后台,当您调用bar=foo(6)生成器对象栏时,便会为其定义一个next属性。

您可以自己调用它以检索从foo产生的值:

next(bar)    # Works in Python 2.6 or Python 3.x
bar.next()   # Works in Python 2.5+, but is deprecated. Use next() if possible.

当foo结束时(并且不再有产生的值),调用next(bar)将引发StopInteration错误。

It helps to make a clear distinction between the function foo, and the generator foo(n):

def foo(n):
    yield n
    yield n+1

foo is a function. foo(6) is a generator object.

The typical way to use a generator object is in a loop:

for n in foo(6):
    print(n)

The loop prints

# 6
# 7

Think of a generator as a resumable function.

yield behaves like return in the sense that values that are yielded get “returned” by the generator. Unlike return, however, the next time the generator gets asked for a value, the generator’s function, foo, resumes where it left off — after the last yield statement — and continues to run until it hits another yield statement.

Behind the scenes, when you call bar=foo(6) the generator object bar is defined for you to have a next attribute.

You can call it yourself to retrieve values yielded from foo:

next(bar)    # Works in Python 2.6 or Python 3.x
bar.next()   # Works in Python 2.5+, but is deprecated. Use next() if possible.

When foo ends (and there are no more yielded values), calling next(bar) throws a StopInteration error.


回答 8

这篇文章将使用斐波那契数作为工具来解释Python生成器的有用性。

这篇文章将同时介绍C ++和Python代码。

斐波那契数定义为以下顺序:0、1、1、2、3、5、8、13、21、34,…。

或一般来说:

F0 = 0
F1 = 1
Fn = Fn-1 + Fn-2

这可以非常容易地转移到C ++函数中:

size_t Fib(size_t n)
{
    //Fib(0) = 0
    if(n == 0)
        return 0;

    //Fib(1) = 1
    if(n == 1)
        return 1;

    //Fib(N) = Fib(N-2) + Fib(N-1)
    return Fib(n-2) + Fib(n-1);
}

但是,如果要打印前六个斐波那契数,则将使用上述函数重新计算很多值。

例如:Fib(3) = Fib(2) + Fib(1),而且Fib(2)还会重新计算Fib(1)。您想要计算的值越高,您的收益就越差。

因此,可能会想通过跟踪中的状态来重写上面的内容main

// Not supported for the first two elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
    int result = pp + p;
    pp = p;
    p = result;
    return result;
}

int main(int argc, char *argv[])
{
    size_t pp = 0;
    size_t p = 1;
    std::cout << "0 " << "1 ";
    for(size_t i = 0; i <= 4; ++i)
    {
        size_t fibI = GetNextFib(pp, p);
        std::cout << fibI << " ";
    }
    return 0;
}

但这很丑陋,并且使中的逻辑复杂化main。最好不必担心我们main功能的状态。

我们可以返回一个vector值a ,并使用一个iterator来遍历该组值,但是对于大量的返回值,这一次需要大量内存。

回到我们以前的方法,如果我们除了打印数字还想做其他事情,会发生什么?我们必须复制并粘贴整个代码块,main然后将输出语句更改为我们想要执行的其他任何操作。而且,如果您复制并粘贴代码,则应该被枪杀。你不想被枪杀,是吗?

为了解决这些问题并避免被枪杀,我们可以使用回调函数重写此代码块。每次遇到新的斐波那契数字时,我们都会调用回调函数。

void GetFibNumbers(size_t max, void(*FoundNewFibCallback)(size_t))
{
    if(max-- == 0) return;
    FoundNewFibCallback(0);
    if(max-- == 0) return;
    FoundNewFibCallback(1);

    size_t pp = 0;
    size_t p = 1;
    for(;;)
    {
        if(max-- == 0) return;
        int result = pp + p;
        pp = p;
        p = result;
        FoundNewFibCallback(result);
    }
}

void foundNewFib(size_t fibI)
{
    std::cout << fibI << " ";
}

int main(int argc, char *argv[])
{
    GetFibNumbers(6, foundNewFib);
    return 0;
}

显然,这是一种改进,您的输入逻辑main并不那么混乱,您可以使用斐波那契数字进行任何操作,只需定义新的回调即可。

但这仍然不是完美的。如果您只想获取前两个斐波那契数,然后做某事,然后再获取更多,然后再做其他事情,该怎么办?

好吧,我们可以像main往常一样继续,我们可以再次将状态添加到中,从而允许GetFibNumbers从任意点开始。但这将使我们的代码更加膨胀,对于像打印斐波那契数字这样的简单任务而言,它看起来已经太大了。

我们可以通过几个线程来实现生产者和消费者模型。但这使代码更加复杂。

相反,让我们谈论生成器。

Python具有很好的语言功能,可以解决诸如此类的生成器之类的问题。

生成器允许您执行功能,在任意点处停止,然后从上次中断的地方继续执行。每次返回一个值。

考虑以下使用生成器的代码:

def fib():
    pp, p = 0, 1
    while 1:
        yield pp
        pp, p = p, pp+p

g = fib()
for i in range(6):
    g.next()

这给了我们结果:

0 1 1 2 3 5

yield语句与Python生成器结合使用。它保存函数的状态并返回yeilded值。下次您在生成器上调用next()函数时,它将在中断收益率的地方继续。

到目前为止,这比回调函数代码更干净。我们拥有更干净的代码,更小的代码,更不用说更多的功能代码了(Python允许任意大的整数)。

资源

This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators.

This post will feature both C++ and Python code.

Fibonacci numbers are defined as the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ….

Or in general:

F0 = 0
F1 = 1
Fn = Fn-1 + Fn-2

This can be transferred into a C++ function extremely easily:

size_t Fib(size_t n)
{
    //Fib(0) = 0
    if(n == 0)
        return 0;

    //Fib(1) = 1
    if(n == 1)
        return 1;

    //Fib(N) = Fib(N-2) + Fib(N-1)
    return Fib(n-2) + Fib(n-1);
}

But if you want to print the first six Fibonacci numbers, you will be recalculating a lot of the values with the above function.

For example: Fib(3) = Fib(2) + Fib(1), but Fib(2) also recalculates Fib(1). The higher the value you want to calculate, the worse off you will be.

So one may be tempted to rewrite the above by keeping track of the state in main.

// Not supported for the first two elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
    int result = pp + p;
    pp = p;
    p = result;
    return result;
}

int main(int argc, char *argv[])
{
    size_t pp = 0;
    size_t p = 1;
    std::cout << "0 " << "1 ";
    for(size_t i = 0; i <= 4; ++i)
    {
        size_t fibI = GetNextFib(pp, p);
        std::cout << fibI << " ";
    }
    return 0;
}

But this is very ugly, and it complicates our logic in main. It would be better to not have to worry about state in our main function.

We could return a vector of values and use an iterator to iterate over that set of values, but this requires a lot of memory all at once for a large number of return values.

So back to our old approach, what happens if we wanted to do something else besides print the numbers? We’d have to copy and paste the whole block of code in main and change the output statements to whatever else we wanted to do. And if you copy and paste code, then you should be shot. You don’t want to get shot, do you?

To solve these problems, and to avoid getting shot, we may rewrite this block of code using a callback function. Every time a new Fibonacci number is encountered, we would call the callback function.

void GetFibNumbers(size_t max, void(*FoundNewFibCallback)(size_t))
{
    if(max-- == 0) return;
    FoundNewFibCallback(0);
    if(max-- == 0) return;
    FoundNewFibCallback(1);

    size_t pp = 0;
    size_t p = 1;
    for(;;)
    {
        if(max-- == 0) return;
        int result = pp + p;
        pp = p;
        p = result;
        FoundNewFibCallback(result);
    }
}

void foundNewFib(size_t fibI)
{
    std::cout << fibI << " ";
}

int main(int argc, char *argv[])
{
    GetFibNumbers(6, foundNewFib);
    return 0;
}

This is clearly an improvement, your logic in main is not as cluttered, and you can do anything you want with the Fibonacci numbers, simply define new callbacks.

But this is still not perfect. What if you wanted to only get the first two Fibonacci numbers, and then do something, then get some more, then do something else?

Well, we could go on like we have been, and we could start adding state again into main, allowing GetFibNumbers to start from an arbitrary point. But this will further bloat our code, and it already looks too big for a simple task like printing Fibonacci numbers.

We could implement a producer and consumer model via a couple of threads. But this complicates the code even more.

Instead let’s talk about generators.

Python has a very nice language feature that solves problems like these called generators.

A generator allows you to execute a function, stop at an arbitrary point, and then continue again where you left off. Each time returning a value.

Consider the following code that uses a generator:

def fib():
    pp, p = 0, 1
    while 1:
        yield pp
        pp, p = p, pp+p

g = fib()
for i in range(6):
    g.next()

Which gives us the results:

0 1 1 2 3 5

The yield statement is used in conjuction with Python generators. It saves the state of the function and returns the yeilded value. The next time you call the next() function on the generator, it will continue where the yield left off.

This is by far more clean than the callback function code. We have cleaner code, smaller code, and not to mention much more functional code (Python allows arbitrarily large integers).

Source


回答 9

我相信迭代器和生成器的首次出现是在20年前的Icon编程语言中。

您可能会喜欢Icon概述,它使您可以专心围绕它们,而不必专注于语法(因为Icon是您可能不知道的语言,并且Griswold向其他语言的人解释了他的语言的好处)。

在这里仅阅读了几段之后,生成器和迭代器的实用程序可能会变得更加明显。

I believe the first appearance of iterators and generators were in the Icon programming language, about 20 years ago.

You may enjoy the Icon overview, which lets you wrap your head around them without concentrating on the syntax (since Icon is a language you probably don’t know, and Griswold was explaining the benefits of his language to people coming from other languages).

After reading just a few paragraphs there, the utility of generators and iterators might become more apparent.


回答 10

列表理解的经验表明它们在整个Python中具有广泛的实用性。但是,许多用例不需要在内存中创建完整列表。相反,它们只需要一次遍历一个元素。

例如,以下求和代码将在内存中构建一个完整的正方形列表,遍历这些值,并且在不再需要引用时,删除该列表:

sum([x*x for x in range(10)])

通过使用生成器表达式来节省内存:

sum(x*x for x in range(10))

容器对象的构造函数具有类似的好处:

s = Set(word  for line in page  for word in line.split())
d = dict( (k, func(k)) for k in keylist)

生成器表达式对于诸如sum(),min()和max()之类的函数特别有用,这些函数将可迭代的输入减少为单个值:

max(len(line)  for line in file  if line.strip())

更多

Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.

For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:

sum([x*x for x in range(10)])

Memory is conserved by using a generator expression instead:

sum(x*x for x in range(10))

Similar benefits are conferred on constructors for container objects:

s = Set(word  for line in page  for word in line.split())
d = dict( (k, func(k)) for k in keylist)

Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:

max(len(line)  for line in file  if line.strip())

more


回答 11

我编写了这段代码,解释了有关生成器的3个关键概念:

def numbers():
    for i in range(10):
            yield i

gen = numbers() #this line only returns a generator object, it does not run the code defined inside numbers

for i in gen: #we iterate over the generator and the values are printed
    print(i)

#the generator is now empty

for i in gen: #so this for block does not print anything
    print(i)

I put up this piece of code which explains 3 key concepts about generators:

def numbers():
    for i in range(10):
            yield i

gen = numbers() #this line only returns a generator object, it does not run the code defined inside numbers

for i in gen: #we iterate over the generator and the values are printed
    print(i)

#the generator is now empty

for i in gen: #so this for block does not print anything
    print(i)

您可以使用Python生成器函数做什么?

问题:您可以使用Python生成器函数做什么?

我开始学习Python,并且遇到过生成器函数,这些函数中包含yield语句。我想知道这些功能确实可以解决哪些类型的问题。

I’m starting to learn Python and I’ve come across generator functions, those that have a yield statement in them. I want to know what types of problems that these functions are really good at solving.


回答 0

生成器为您提供懒惰的评估。您可以通过遍历它们来使用它们,可以显式地使用“ for”,也可以通过将其传递给任何迭代的函数或构造来隐式使用。您可以将生成器视为返回多个项目,就像它们返回一个列表一样,但是它们不是一次全部返回它们而是一一返回,而是暂停生成器功能,直到请求下一个项目。

生成器适用于计算大量结果(特别是涉及循环本身的计算),在这些情况下您不知道是否需要所有结果,或者不想在同一时间为所有结果分配内存。或者对于生成器使用另一台生成器或消耗某些其他资源的情况,如果这种情况发生得越晚越方便。

生成器的另一个用途(实际上是相同的)是用迭代替换回调。在某些情况下,您希望函数执行大量工作,并偶尔向调用者报告。传统上,您将为此使用回调函数。您将此回调传递给工作函数,它将定期调用此回调。生成器方法是工作函数(现在是生成器)对回调一无所知,仅在需要报告某些内容时才产生。调用者没有编写单独的回调并将其传递给工作函数,而是在生成器周围的一个“ for”循环中完成所有报告工作。

例如,假设您编写了一个“文件系统搜索”程序。您可以完整地执行搜索,收集结果,然后一次显示一个。在显示第一个结果之前,必须先收集所有结果,并且所有结果将同时存储在内存中。或者,您可以在找到结果时显示结果,这样可以提高内存效率,并且对用户友好得多。后者可以通过将结果打印功能传递给文件系统搜索功能来完成,也可以仅使搜索功能为生成器并迭代结果来完成。

如果要查看后两种方法的示例,请参见os.path.walk()(带有回调的旧文件系统行走功能)和os.walk()(新的文件系统行走生成器)。当然,如果您确实想将所有结果收集到列表中,生成器方法可以轻松转换为大列表方法:

big_list = list(the_generator)

Generators give you lazy evaluation. You use them by iterating over them, either explicitly with ‘for’ or implicitly by passing it to any function or construct that iterates. You can think of generators as returning multiple items, as if they return a list, but instead of returning them all at once they return them one-by-one, and the generator function is paused until the next item is requested.

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don’t know if you are going to need all results, or where you don’t want to allocate the memory for all results at the same time. Or for situations where the generator uses another generator, or consumes some other resource, and it’s more convenient if that happened as late as possible.

Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you’d use a callback function for this. You pass this callback to the work-function and it would periodically call this callback. The generator approach is that the work-function (now a generator) knows nothing about the callback, and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing that to the work-function, does all the reporting work in a little ‘for’ loop around the generator.

For example, say you wrote a ‘filesystem search’ program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.

If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with callback) and os.walk() (the new filesystem-walking generator.) Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:

big_list = list(the_generator)

回答 1

使用生成器的原因之一是使某种解决方案的解决方案更清晰。

另一种是一次处理一个结果,避免构建庞大的结果列表,而这些结果无论如何都要分开处理。

如果您有一个像这样的fibonacci-up-to-n函数:

# function version
def fibon(n):
    a = b = 1
    result = []
    for i in xrange(n):
        result.append(a)
        a, b = b, a + b
    return result

您可以这样更轻松地编写函数:

# generator version
def fibon(n):
    a = b = 1
    for i in xrange(n):
        yield a
        a, b = b, a + b

功能更清晰。如果您使用这样的功能:

for x in fibon(1000000):
    print x,

在此示例中,如果使用生成器版本,则将完全不会创建整个1000000项列表,一次只能创建一个值。使用列表版本时,情况并非如此,先创建列表。

One of the reasons to use generator is to make the solution clearer for some kind of solutions.

The other is to treat results one at a time, avoiding building huge lists of results that you would process separated anyway.

If you have a fibonacci-up-to-n function like this:

# function version
def fibon(n):
    a = b = 1
    result = []
    for i in xrange(n):
        result.append(a)
        a, b = b, a + b
    return result

You can more easily write the function as this:

# generator version
def fibon(n):
    a = b = 1
    for i in xrange(n):
        yield a
        a, b = b, a + b

The function is clearer. And if you use the function like this:

for x in fibon(1000000):
    print x,

in this example, if using the generator version, the whole 1000000 item list won’t be created at all, just one value at a time. That would not be the case when using the list version, where a list would be created first.


回答 2

请参阅PEP 255中的“动机”部分。

生成器的一种非显而易见的用法是创建可中断的函数,使您可以在不使用线程的情况下执行诸如更新UI或“同时”(实际上是交错)运行多个作业的操作。

See the “Motivation” section in PEP 255.

A non-obvious use of generators is creating interruptible functions, which lets you do things like update UI or run several jobs “simultaneously” (interleaved, actually) while not using threads.


回答 3

我发现这种解释消除了我的怀疑。因为有可能不认识的Generators人也不知道yield

返回

return语句将销毁所有局部变量,并将结果值返回(返回)给调用方。如果稍后再调用同一函数,则该函数将获得一组新的变量。

Yield

但是,如果在退出函数时不丢弃局部变量该怎么办?这意味着我们可以resume the function停下来。这是generators引入概念的地方,yield语句从function上次中断的地方继续。

  def generate_integers(N):
    for i in xrange(N):
    yield i

    In [1]: gen = generate_integers(3)
    In [2]: gen
    <generator object at 0x8117f90>
    In [3]: gen.next()
    0
    In [4]: gen.next()
    1
    In [5]: gen.next()

这就是Python中returnyield语句之间的区别。

Yield语句使函数成为生成器函数。

因此,生成器是用于创建迭代器的简单而强大的工具。它们的编写方式类似于常规函数,但是yield只要要返回数据,就使用该语句。每次调用next()时,生成器将从上次中断的地方恢复(它会记住所有数据值以及最后执行的语句)。

I find this explanation which clears my doubt. Because there is a possibility that person who don’t know Generators also don’t know about yield

Return

The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.

Yield

But what if the local variables aren’t thrown away when we exit a function? This implies that we can resume the function where we left off. This is where the concept of generators are introduced and the yield statement resumes where the function left off.

  def generate_integers(N):
    for i in xrange(N):
    yield i

    In [1]: gen = generate_integers(3)
    In [2]: gen
    <generator object at 0x8117f90>
    In [3]: gen.next()
    0
    In [4]: gen.next()
    1
    In [5]: gen.next()

So that’s the difference between return and yield statements in Python.

Yield statement is what makes a function a generator function.

So generators are a simple and powerful tool for creating iterators. They are written like regular functions, but they use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).


回答 4

真实的例子

假设您的MySQL表中有1亿个域,并且您想更新每个域的Alexa排名。

您需要做的第一件事是从数据库中选择域名。

假设您的表名是domains,列名是domain

如果您使用 SELECT domain FROM domains它,将返回1亿行,这将消耗大量内存。因此您的服务器可能会崩溃。

因此,您决定分批运行该程序。假设我们的批量为1000。

在第一批中,我们将查询前1000行,检查每个域的Alexa排名并更新数据库行。

在第二批中,我们将处理接下来的1000行。在第三批中,它将是从2001年到3000年,依此类推。

现在我们需要一个生成器函数来生成批处理。

这是我们的生成器函数:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result

如您所见,我们的功能不断yield取得结果。如果您使用关键字return而不是yield,则整个函数在返回时将结束。

return - returns only once
yield - returns multiple times

如果函数使用关键字 yield那么它就是生成器。

现在您可以像这样迭代:

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()

Real World Example

Let’s say you have 100 million domains in your MySQL table, and you would like to update Alexa rank for each domain.

First thing you need is to select your domain names from the database.

Let’s say your table name is domains and column name is domain.

If you use SELECT domain FROM domains it’s going to return 100 million rows which is going to consume lot of memory. So your server might crash.

So you decided to run the program in batches. Let’s say our batch size is 1000.

In our first batch we will query the first 1000 rows, check Alexa rank for each domain and update the database row.

In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.

Now we need a generator function which generates our batches.

Here is our generator function:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result

As you can see, our function keeps yielding the results. If you used the keyword return instead of yield, then the whole function would be ended once it reached return.

return - returns only once
yield - returns multiple times

If a function uses the keyword yield then it’s a generator.

Now you can iterate like this:

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()

回答 5

缓冲。当可以大块地获取数据但以小块来处理数据时,生成器可能会有所帮助:

def bufferedFetch():
  while True:
     buffer = getBigChunkOfData()
     # insert some code to break on 'end of data'
     for i in buffer:    
          yield i

上面的内容使您可以轻松地将缓冲与处理分开。现在,消费者函数可以只逐个获取值,而不必担心缓冲。

Buffering. When it is efficient to fetch data in large chunks, but process it in small chunks, then a generator might help:

def bufferedFetch():
  while True:
     buffer = getBigChunkOfData()
     # insert some code to break on 'end of data'
     for i in buffer:    
          yield i

The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.


回答 6

我发现生成器在清理代码以及为您提供封装和模块化代码的独特方法方面非常有帮助。在你需要的东西,不停地吐了价值基于其自身的内部处理和情况时,从任何地方在你的代码调用的东西需要(而不仅仅是内部的循环或例如块),生成器功能,用。

一个抽象的例子是斐波那契数字生成器,它不存在于循环中,当从任何地方调用它时,它将始终返回序列中的下一个数字:

def fib():
    first = 0
    second = 1
    yield first
    yield second

    while 1:
        next = first + second
        yield next
        first = second
        second = next

fibgen1 = fib()
fibgen2 = fib()

现在,您有两个斐波那契数字生成器对象,可以在代码中的任何位置调用它们,它们将始终按以下顺序依次返回更大的斐波那契数字:

>>> fibgen1.next(); fibgen1.next(); fibgen1.next(); fibgen1.next()
0
1
1
2
>>> fibgen2.next(); fibgen2.next()
0
1
>>> fibgen1.next(); fibgen1.next()
3
5

生成器的妙处在于它们封装状态而无需经历创建对象的麻烦。考虑它们的一种方法是记住它们内部状态的“功能”。

我从Python Generators获得了Fibonacci示例-它们是什么?稍加想象,您就可以想到许多其他情况,其中生成器为for循环和其他传统迭代构造提供了绝佳的替代方案。

I have found that generators are very helpful in cleaning up your code and by giving you a very unique way to encapsulate and modularize code. In a situation where you need something to constantly spit out values based on its own internal processing and when that something needs to be called from anywhere in your code (and not just within a loop or a block for example), generators are the feature to use.

An abstract example would be a Fibonacci number generator that does not live within a loop and when it is called from anywhere will always return the next number in the sequence:

def fib():
    first = 0
    second = 1
    yield first
    yield second

    while 1:
        next = first + second
        yield next
        first = second
        second = next

fibgen1 = fib()
fibgen2 = fib()

Now you have two Fibonacci number generator objects which you can call from anywhere in your code and they will always return ever larger Fibonacci numbers in sequence as follows:

>>> fibgen1.next(); fibgen1.next(); fibgen1.next(); fibgen1.next()
0
1
1
2
>>> fibgen2.next(); fibgen2.next()
0
1
>>> fibgen1.next(); fibgen1.next()
3
5

The lovely thing about generators is that they encapsulate state without having to go through the hoops of creating objects. One way of thinking about them is as “functions” which remember their internal state.

I got the Fibonacci example from Python Generators – What are they? and with a little imagination, you can come up with a lot of other situations where generators make for a great alternative to for loops and other traditional iteration constructs.


回答 7

简单的解释:考虑一条for语句

for item in iterable:
   do_stuff()

很多时候,其中的所有项目iterable并不需要一开始就存在,而是可以根据需要即时生成。两者都可以更有效率

  • 空间(您无需同时存储所有物品)和
  • 时间(迭代可能会在需要所有项目之前完成)。

有时,您甚至都不知道所有项目。例如:

for command in user_input():
   do_stuff_with(command)

您无法事先知道所有用户的命令,但是如果您有生成器来处理命令,则可以使用类似这样的循环:

def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd

使用生成器,您还可以在无限序列上进行迭代,这当然在对容器进行迭代时是不可能的。

The simple explanation: Consider a for statement

for item in iterable:
   do_stuff()

A lot of the time, all the items in iterable doesn’t need to be there from the start, but can be generated on the fly as they’re required. This can be a lot more efficient in both

  • space (you never need to store all the items simultaneously) and
  • time (the iteration may finish before all the items are needed).

Other times, you don’t even know all the items ahead of time. For example:

for command in user_input():
   do_stuff_with(command)

You have no way of knowing all the user’s commands beforehand, but you can use a nice loop like this if you have a generator handing you commands:

def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd

With generators you can also have iteration over infinite sequences, which is of course not possible when iterating over containers.


回答 8

我最喜欢的用法是“过滤”和“减少”操作。

假设我们正在读取文件,只需要以“ ##”开头的行。

def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l

然后,我们可以在适当的循环中使用generator函数

source= file( ... )
for line in filter2sharps( source.readlines() ):
    print line
source.close()

减少示例类似。假设我们有一个文件,需要在其中定位<Location>...</Location>行块。[不是HTML标签,而是恰好看起来像标签的行。]

def reduceLocation( aSequence ):
    keep= False
    block= None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block= None
            keep= False
        elif line.startsWith("<Location"):
            block= [ line ]
            keep= True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky

同样,我们可以在适当的for循环中使用此生成器。

source = file( ... )
for b in reduceLocation( source.readlines() ):
    print b
source.close()

这个想法是,生成器函数允许我们过滤或减少序列,一次生成另一个序列一个值。

My favorite uses are “filter” and “reduce” operations.

Let’s say we’re reading a file, and only want the lines which begin with “##”.

def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l

We can then use the generator function in a proper loop

source= file( ... )
for line in filter2sharps( source.readlines() ):
    print line
source.close()

The reduce example is similar. Let’s say we have a file where we need to locate blocks of <Location>...</Location> lines. [Not HTML tags, but lines that happen to look tag-like.]

def reduceLocation( aSequence ):
    keep= False
    block= None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block= None
            keep= False
        elif line.startsWith("<Location"):
            block= [ line ]
            keep= True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky

Again, we can use this generator in a proper for loop.

source = file( ... )
for b in reduceLocation( source.readlines() ):
    print b
source.close()

The idea is that a generator function allows us to filter or reduce a sequence, producing a another sequence one value at a time.


回答 9

一个可以使用生成器的实际示例是,如果您具有某种形状,并且想要遍历生成器的角,边缘或其他任何东西。对于我自己的项目(这里的源代码),我有一个矩形:

class Rect():

    def __init__(self, x, y, width, height):
        self.l_top  = (x, y)
        self.r_top  = (x+width, y)
        self.r_bot  = (x+width, y+height)
        self.l_bot  = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

现在,我可以创建一个矩形并在其角上循环:

myrect=Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)

相反,__iter__您可以有一个方法iter_corners并用调用它for corner in myrect.iter_corners()__iter__从那时起,使用起来更加优雅,我们可以直接在for表达式中使用类实例名称。

A practical example where you could make use of a generator is if you have some kind of shape and you want to iterate over its corners, edges or whatever. For my own project (source code here) I had a rectangle:

class Rect():

    def __init__(self, x, y, width, height):
        self.l_top  = (x, y)
        self.r_top  = (x+width, y)
        self.r_bot  = (x+width, y+height)
        self.l_bot  = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

Now I can create a rectangle and loop over its corners:

myrect=Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)

Instead of __iter__ you could have a method iter_corners and call that with for corner in myrect.iter_corners(). It’s just more elegant to use __iter__ since then we can use the class instance name directly in the for expression.


回答 10

遍历输入保持状态时,基本上避免使用回调函数。

请参见此处此处以概述使用生成器可以完成的操作。

Basically avoiding call-back functions when iterating over input maintaining state.

See here and here for an overview of what can be done using generators.


回答 11

但是,这里有一些很好的答案,我还建议您完整阅读Python Functional Programming教程,该教程有助于解释一些更强大的生成器用例。

Some good answers here, however, I’d also recommend a complete read of the Python Functional Programming tutorial which helps explain some of the more potent use-cases of generators.


回答 12

由于未提及生成器的send方法,因此下面是一个示例:

def test():
    for i in xrange(5):
        val = yield
        print(val)

t = test()

# Proceed to 'yield' statement
next(t)

# Send value to yield
t.send(1)
t.send('2')
t.send([3])

它显示了将值发送到正在运行的生成器的可能性。以下视频中有关生成器的更高级的类(包括yield爆炸,并行处理的生成器,逃避递归限制等)

David Beazley在PyCon 2014上的生成器

Since the send method of a generator has not been mentioned, here is an example:

def test():
    for i in xrange(5):
        val = yield
        print(val)

t = test()

# Proceed to 'yield' statement
next(t)

# Send value to yield
t.send(1)
t.send('2')
t.send([3])

It shows the possibility to send a value to a running generator. A more advanced course on generators in the video below (including yield from explination, generators for parallel processing, escaping the recursion limit, etc.)

David Beazley on generators at PyCon 2014


回答 13

当我们的Web服务器充当代理时,我使用生成器:

  1. 客户端从服务器请求代理的URL
  2. 服务器开始加载目标网址
  3. 服务器屈服于将结果尽快返回给客户端

I use generators when our web server is acting as a proxy:

  1. The client requests a proxied url from the server
  2. The server begins to load the target url
  3. The server yields to return the results to the client as soon as it gets them

回答 14

成堆的东西。任何时候您想要生成一系列项目,但又不想一次将它们“物化”到一个列表中。例如,您可能有一个简单的生成器返回质数:

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        if not all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate

然后,您可以使用它来生成后续素数的乘积:

def prime_products():
    primeiter = primes()
    prev = primeiter.next()
    for prime in primeiter:
        yield prime * prev
        prev = prime

这些是相当琐碎的示例,但是您可以看到它在不预先生成大型数据集(可能无限!)的情况下如何有用,这只是更明显的用途之一。

Piles of stuff. Any time you want to generate a sequence of items, but don’t want to have to ‘materialize’ them all into a list at once. For example, you could have a simple generator that returns prime numbers:

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        if not all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate

You could then use that to generate the products of subsequent primes:

def prime_products():
    primeiter = primes()
    prev = primeiter.next()
    for prime in primeiter:
        yield prime * prev
        prev = prime

These are fairly trivial examples, but you can see how it can be useful for processing large (potentially infinite!) datasets without generating them in advance, which is only one of the more obvious uses.


回答 15

也适合打印最多n的质数:

def genprime(n=10):
    for num in range(3, n+1):
        for factor in range(2, num):
            if num%factor == 0:
                break
        else:
            yield(num)

for prime_num in genprime(100):
    print(prime_num)

Also good for printing the prime numbers up to n:

def genprime(n=10):
    for num in range(3, n+1):
        for factor in range(2, num):
            if num%factor == 0:
                break
        else:
            yield(num)

for prime_num in genprime(100):
    print(prime_num)

如何从生成器中仅选择一项(在python中)?

问题:如何从生成器中仅选择一项(在python中)?

我有一个类似下面的生成器函数:

def myfunct():
  ...
  yield result

调用此函数的常用方法是:

for r in myfunct():
  dostuff(r)

我的问题是,有什么方法可以随时从生成器中获取一个元素吗?例如,我想做类似的事情:

while True:
  ...
  if something:
      my_element = pick_just_one_element(myfunct())
      dostuff(my_element)
  ...

I have a generator function like the following:

def myfunct():
  ...
  yield result

The usual way to call this function would be:

for r in myfunct():
  dostuff(r)

My question, is there a way to get just one element from the generator whenever I like? For example, I’d like to do something like:

while True:
  ...
  if something:
      my_element = pick_just_one_element(myfunct())
      dostuff(my_element)
  ...

回答 0

使用创建一个生成器

g = myfunct()

每当您想要一个项目时,请使用

next(g)

(或g.next()在Python 2.5或更低版本中)。

如果生成器退出,它将升高StopIteration。您可以根据需要捕获此异常,也可以将default参数用于next()

next(g, default_value)

Create a generator using

g = myfunct()

Everytime you would like an item, use

next(g)

(or g.next() in Python 2.5 or below).

If the generator exits, it will raise StopIteration. You can either catch this exception if necessary, or use the default argument to next():

next(g, default_value)

回答 1

要仅选择生成器的一个元素,请breakfor语句中使用,或list(itertools.islice(gen, 1))

根据您的示例(从字面上看),您可以执行以下操作:

while True:
  ...
  if something:
      for my_element in myfunct():
          dostuff(my_element)
          break
      else:
          do_generator_empty()

如果您想“ 每当我喜欢的时候就从 [生成的] 生成器中仅获取一个元素 ”(我想是最初意图的50%,也是最常见的意图),那么:

gen = myfunct()
while True:
  ...
  if something:
      for my_element in gen:
          dostuff(my_element)
          break
      else:
          do_generator_empty()

这样generator.next()可以避免显式使用,并且输入结束处理不需要(神秘的)StopIteration异常处理或额外的默认值比较。

else:for,如果你想要做一些特别的结束产生的case语句段时,才需要。

注意上next()/ .next()

在Python3中,该.next()方法被重命名.__next__()为有充分的理由:它被认为是低级的(PEP 3114)。在Python 2.6之前,内置函数next()不存在。甚至讨论过迁移next()到该operator模块(这本来是明智的做法),因为它很少需要,并且内置名称的可疑膨胀。

next()没有默认值的情况下使用仍然是非常低级的实践- StopIteration在普通的应用程序代码中公开地将神秘的东西扔掉。而且使用next()默认的哨兵-最好是next()直接输入的唯一选择builtins-受限制,并且通常会给出奇怪的非Python逻辑/可读性的原因。

底线:很少使用next()-就像使用operator模块的功能一样。使用for x in iteratorislicelist(iterator)等功能接受一个迭代器无缝地使用是在应用层上的迭代器的自然方式-而且相当总是可能的。next()是低级的,一个额外的概念,很明显-正如该线程的问题所示。虽然例如,使用breakfor是常规的。

For picking just one element of a generator use break in a for statement, or list(itertools.islice(gen, 1))

According to your example (literally) you can do something like:

while True:
  ...
  if something:
      for my_element in myfunct():
          dostuff(my_element)
          break
      else:
          do_generator_empty()

If you want “get just one element from the [once generated] generator whenever I like” (I suppose 50% thats the original intention, and the most common intention) then:

gen = myfunct()
while True:
  ...
  if something:
      for my_element in gen:
          dostuff(my_element)
          break
      else:
          do_generator_empty()

This way explicit use of generator.next() can be avoided, and end-of-input handling doesn’t require (cryptic) StopIteration exception handling or extra default value comparisons.

The else: of for statement section is only needed if you want do something special in case of end-of-generator.

Note on next() / .next():

In Python3 the .next() method was renamed to .__next__() for good reason: its considered low-level (PEP 3114). Before Python 2.6 the builtin function next() did not exist. And it was even discussed to move next() to the operator module (which would have been wise), because of its rare need and questionable inflation of builtin names.

Using next() without default is still very low-level practice – throwing the cryptic StopIteration like a bolt out of the blue in normal application code openly. And using next() with default sentinel – which best should be the only option for a next() directly in builtins – is limited and often gives reason to odd non-pythonic logic/readablity.

Bottom line: Using next() should be very rare – like using functions of operator module. Using for x in iterator , islice, list(iterator) and other functions accepting an iterator seamlessly is the natural way of using iterators on application level – and quite always possible. next() is low-level, an extra concept, unobvious – as the question of this thread shows. While e.g. using break in for is conventional.


回答 2

我不认为有一种便捷的方法可以从生成器中检索任意值。生成器将提供next()方法来遍历自身,但是不会立即生成完整序列以节省内存。那就是生成器和列表之间的功能差异。

I don’t believe there’s a convenient way to retrieve an arbitrary value from a generator. The generator will provide a next() method to traverse itself, but the full sequence is not produced immediately to save memory. That’s the functional difference between a generator and a list.


回答 3

对于那些浏览这些答案的人来说,它们是Python3的完整工作示例…在这里,您可以继续:

def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen() # because it must be the _same_ generator

for n in range(3):
    numnext = next(nums)
    print(numnext)

输出:

1001
1002
1003

For those of you scanning through these answers for a complete working example for Python3… well here ya go:

def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen() # because it must be the _same_ generator

for n in range(3):
    numnext = next(nums)
    print(numnext)

This outputs:

1001
1002
1003

回答 4

Generator是产生迭代器的函数。因此,一旦有了迭代器实例,就可以使用next()从迭代器中获取下一项。例如,使用next()函数来获取第一个项目,然后for in用于处理剩余的项目:

# create new instance of iterator by calling a generator function
items = generator_function()

# fetch and print first item
first = next(items)
print('first item:', first)

# process remaining items:
for item in items:
    print('next item:', item)

Generator is a function that produces an iterator. Therefore, once you have iterator instance, use next() to fetch the next item from the iterator. As an example, use next() function to fetch the first item, and later use for in to process remaining items:

# create new instance of iterator by calling a generator function
items = generator_function()

# fetch and print first item
first = next(items)
print('first item:', first)

# process remaining items:
for item in items:
    print('next item:', item)

回答 5

generator = myfunct()
while True:
   my_element = generator.next()

确保捕获采用最后一个元素后引发的异常

generator = myfunct()
while True:
   my_element = generator.next()

make sure to catch the exception thrown after the last element is taken


回答 6

我相信唯一的方法是从迭代器中获取一个列表,然后从该列表中获取所需的元素。

l = list(myfunct())
l[4]

I believe the only way is to get a list from the iterator then get the element you want from that list.

l = list(myfunct())
l[4]

如何在Python中加入两个生成器?

问题:如何在Python中加入两个生成器?

我想更改以下代码

for directory, dirs, files in os.walk(directory_1):
    do_something()

for directory, dirs, files in os.walk(directory_2):
    do_something()

此代码:

for directory, dirs, files in os.walk(directory_1) + os.walk(directory_2):
    do_something()

我得到了错误:

+不支持的操作数类型:“ generator”和“ generator”

如何在Python中加入两个生成器?

I want to change the following code

for directory, dirs, files in os.walk(directory_1):
    do_something()

for directory, dirs, files in os.walk(directory_2):
    do_something()

to this code:

for directory, dirs, files in os.walk(directory_1) + os.walk(directory_2):
    do_something()

I get the error:

unsupported operand type(s) for +: ‘generator’ and ‘generator’

How to join two generators in Python?


回答 0

我认为itertools.chain()应该这样做。


回答 1

代码示例:

from itertools import chain

def generator1():
    for item in 'abcdef':
        yield item

def generator2():
    for item in '123456':
        yield item

generator3 = chain(generator1(), generator2())
for item in generator3:
    print item

A example of code:

from itertools import chain

def generator1():
    for item in 'abcdef':
        yield item

def generator2():
    for item in '123456':
        yield item

generator3 = chain(generator1(), generator2())
for item in generator3:
    print item

回答 2

在Python(3.5或更高版本)中,您可以执行以下操作:

def concat(a, b):
    yield from a
    yield from b

In Python (3.5 or greater) you can do:

def concat(a, b):
    yield from a
    yield from b

回答 3

简单的例子:

from itertools import chain
x = iter([1,2,3])      #Create Generator Object (listiterator)
y = iter([3,4,5])      #another one
result = chain(x, y)   #Chained x and y

Simple example:

from itertools import chain
x = iter([1,2,3])      #Create Generator Object (listiterator)
y = iter([3,4,5])      #another one
result = chain(x, y)   #Chained x and y

回答 4

使用itertools.chain.from_iterable,您可以执行以下操作:

def genny(start):
  for x in range(start, start+3):
    yield x

y = [1, 2]
ab = [o for o in itertools.chain.from_iterable(genny(x) for x in y)]
print(ab)

With itertools.chain.from_iterable you can do things like:

def genny(start):
  for x in range(start, start+3):
    yield x

y = [1, 2]
ab = [o for o in itertools.chain.from_iterable(genny(x) for x in y)]
print(ab)

回答 5

在这里,它使用带有s 的生成器表达式for

a = range(3)
b = range(5)
ab = (i for it in (a, b) for i in it)
assert list(ab) == [0, 1, 2, 0, 1, 2, 3, 4]

Here it is using a generator expression with nested fors:

a = range(3)
b = range(5)
ab = (i for it in (a, b) for i in it)
assert list(ab) == [0, 1, 2, 0, 1, 2, 3, 4]

回答 6

也可以使用unpack运算符*

concat = (*gen1(), *gen2())

注意:对于“非惰性”可迭代对象,工作效率最高。也可以与其他种类的理解一起使用。生成器concat的首选方法是来自@Uduse的答案

One can also use unpack operator *:

concat = (*gen1(), *gen2())

NOTE: Works most efficiently for ‘non-lazy’ iterables. Can also be used with different kind of comprehensions. Preferred way for generator concat would be from the answer from @Uduse


回答 7

如果要使生成器分开,但仍要同时遍历它们,则可以使用zip():

注意:迭代在两个生成器中的较短者处停止

例如:

for (root1, dir1, files1), (root2, dir2, files2) in zip(os.walk(path1), os.walk(path2)):

    for file in files1:
        #do something with first list of files

    for file in files2:
        #do something with second list of files

If you want to keep the generators separate but still iterate over them at the same time you can use zip():

NOTE: Iteration stops at the shorter of the two generators

For example:

for (root1, dir1, files1), (root2, dir2, files2) in zip(os.walk(path1), os.walk(path2)):

    for file in files1:
        #do something with first list of files

    for file in files2:
        #do something with second list of files

回答 8

假设我们必须使用生成器(gen1和gen 2),并且我们想要执行一些额外的计算,这需要两者的结果。我们可以通过map方法返回这种函数/计算的结果,而map方法又返回我们可以循环使用的生成器。

在这种情况下,需要通过lambda函数来实现函数/计算。棘手的部分是我们打算在地图及其lambda函数中进行的操作。

建议解决方案的一般形式:

def function(gen1,gen2):
        for item in map(lambda x, y: do_somethin(x,y), gen1, gen2):
            yield item

Lets say that we have to generators (gen1 and gen 2) and we want to perform some extra calculation that requires the outcome of both. We can return the outcome of such function/calculation through the map method, which in turn returns a generator that we can loop upon.

In this scenario, the function/calculation needs to be implemented via the lambda function. The tricky part is what we aim to do inside the map and its lambda function.

General form of proposed solution:

def function(gen1,gen2):
        for item in map(lambda x, y: do_somethin(x,y), gen1, gen2):
            yield item

回答 9

所有那些复杂的解决方案…

做就是了:

for dir in director_1, directory_2:
    for directory, dirs, files in os.walk(dir):
        do_something()

如果您确实要“加入”两个生成器,请执行以下操作:

for directory, dirs, files in 
        [x for osw in [os.walk(director_1), os.walk(director_2)] 
               for x in osw]:
    do_something()

All those complicated solutions…

just do:

for dir in directory_1, directory_2:
    for directory, dirs, files in os.walk(dir):
        do_something()

If you really want to “join” both generators, then do :

for directory, dirs, files in [
        x for osw in [os.walk(directory_1), os.walk(directory_2)] 
               for x in osw
        ]:
    do_something()

在Python中读取大文件的惰性方法?

问题:在Python中读取大文件的惰性方法?

我有一个很大的文件4GB,当我尝试读取它时,计算机挂起了。因此,我想逐个读取它,并且在处理完每个块之后,将处理后的块存储到另一个文件中并读取下一个块。

yield这些零件有什么方法吗?

我很想有一个懒惰的方法

I have a very big file 4GB and when I try to read it my computer hangs. So I want to read it piece by piece and after processing each piece store the processed piece into another file and read next piece.

Is there any method to yield these pieces ?

I would love to have a lazy method.


回答 0

要编写一个惰性函数,只需使用yield

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)

另一个选择是使用iter和辅助功能:

f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)

如果文件是基于行的,则文件对象已经是行的惰性生成器:

for line in open('really_big_file.dat'):
    process_data(line)

To write a lazy function, just use yield:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)

Another option would be to use iter and a helper function:

f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)

If the file is line-based, the file object is already a lazy generator of lines:

for line in open('really_big_file.dat'):
    process_data(line)

回答 1

如果您的计算机,操作系统和python是64位的,则可以使用mmap模块将文件的内容映射到内存中,并使用索引和切片对其进行访问。以下是文档中的示例:

import mmap
with open("hello.txt", "r+") as f:
    # memory-map the file, size 0 means whole file
    map = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print map.readline()  # prints "Hello Python!"
    # read content via slice notation
    print map[:5]  # prints "Hello"
    # update content using slice notation;
    # note that new content must have same size
    map[6:] = " world!\n"
    # ... and read again using standard file methods
    map.seek(0)
    print map.readline()  # prints "Hello  world!"
    # close the map
    map.close()

如果您的计算机,操作系统或python是32位的,则映射大型文件可能会保留地址空间的大部分,并使内存程序饿死

If your computer, OS and python are 64-bit, then you can use the mmap module to map the contents of the file into memory and access it with indices and slices. Here an example from the documentation:

import mmap
with open("hello.txt", "r+") as f:
    # memory-map the file, size 0 means whole file
    map = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print map.readline()  # prints "Hello Python!"
    # read content via slice notation
    print map[:5]  # prints "Hello"
    # update content using slice notation;
    # note that new content must have same size
    map[6:] = " world!\n"
    # ... and read again using standard file methods
    map.seek(0)
    print map.readline()  # prints "Hello  world!"
    # close the map
    map.close()

If either your computer, OS or python are 32-bit, then mmap-ing large files can reserve large parts of your address space and starve your program of memory.


回答 2

file.readlines() 接受一个可选的size参数,该参数近似返回的行中读取的行数。

bigfile = open('bigfilename','r')
tmp_lines = bigfile.readlines(BUF_SIZE)
while tmp_lines:
    process([line for line in tmp_lines])
    tmp_lines = bigfile.readlines(BUF_SIZE)

file.readlines() takes in an optional size argument which approximates the number of lines read in the lines returned.

bigfile = open('bigfilename','r')
tmp_lines = bigfile.readlines(BUF_SIZE)
while tmp_lines:
    process([line for line in tmp_lines])
    tmp_lines = bigfile.readlines(BUF_SIZE)

回答 3

已经有很多不错的答案,但是如果您的整个文件都在一行上,并且您仍要处理“行”(与固定大小的块相对),那么这些答案将无济于事。

99%的时间,可以逐行处理文件。然后,按照此答案的建议,您可以将文件对象本身用作延迟生成器:

with open('big.csv') as f:
    for line in f:
        process(line)

然而,有一次我遇到了一个非常非常大的(几乎)单行文件,其中的行分隔符实际上没有'\n',但是'|'

  • 不能逐行读取,但是我仍然需要逐行处理它。
  • 转换'|''\n'处理前也是不可能的,因为此csv的某些字段包含'\n'(自由文本用户输入)。
  • 还排除了使用csv库的原因,因为至少在lib的早期版本中,已对其进行了硬编码以逐行读取输入

对于这种情况,我创建了以下代码段:

def rows(f, chunksize=1024, sep='|'):
    """
    Read a file where the row separator is '|' lazily.

    Usage:

    >>> with open('big.csv') as f:
    >>>     for r in rows(f):
    >>>         process(row)
    """
    curr_row = ''
    while True:
        chunk = f.read(chunksize)
        if chunk == '': # End of file
            yield curr_row
            break
        while True:
            i = chunk.find(sep)
            if i == -1:
                break
            yield curr_row + chunk[:i]
            curr_row = ''
            chunk = chunk[i+1:]
        curr_row += chunk

我能够成功使用它来解决我的问题。它已通过各种块大小的广泛测试。


测试套件,适合那些想要说服自己的人。

test_file = 'test_file'

def cleanup(func):
    def wrapper(*args, **kwargs):
        func(*args, **kwargs)
        os.unlink(test_file)
    return wrapper

@cleanup
def test_empty(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1_char_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_1_char(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1025_chars_1_row(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1025):
            f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1024_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1023):
            f.write('a')
        f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_1025_chars_1026_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1025):
            f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1026

@cleanup
def test_2048_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1022):
            f.write('a')
        f.write('|')
        f.write('a')
        # -- end of 1st chunk --
        for i in range(1024):
            f.write('a')
        # -- end of 2nd chunk
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_2049_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1022):
            f.write('a')
        f.write('|')
        f.write('a')
        # -- end of 1st chunk --
        for i in range(1024):
            f.write('a')
        # -- end of 2nd chunk
        f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

if __name__ == '__main__':
    for chunksize in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
        test_empty(chunksize)
        test_1_char_2_rows(chunksize)
        test_1_char(chunksize)
        test_1025_chars_1_row(chunksize)
        test_1024_chars_2_rows(chunksize)
        test_1025_chars_1026_rows(chunksize)
        test_2048_chars_2_rows(chunksize)
        test_2049_chars_2_rows(chunksize)

There are already many good answers, but if your entire file is on a single line and you still want to process “rows” (as opposed to fixed-size blocks), these answers will not help you.

99% of the time, it is possible to process files line by line. Then, as suggested in this answer, you can to use the file object itself as lazy generator:

with open('big.csv') as f:
    for line in f:
        process(line)

However, I once ran into a very very big (almost) single line file, where the row separator was in fact not '\n' but '|'.

  • Reading line by line was not an option, but I still needed to process it row by row.
  • Converting'|' to '\n' before processing was also out of the question, because some of the fields of this csv contained '\n' (free text user input).
  • Using the csv library was also ruled out because the fact that, at least in early versions of the lib, it is hardcoded to read the input line by line.

For these kind of situations, I created the following snippet:

def rows(f, chunksize=1024, sep='|'):
    """
    Read a file where the row separator is '|' lazily.

    Usage:

    >>> with open('big.csv') as f:
    >>>     for r in rows(f):
    >>>         process(row)
    """
    curr_row = ''
    while True:
        chunk = f.read(chunksize)
        if chunk == '': # End of file
            yield curr_row
            break
        while True:
            i = chunk.find(sep)
            if i == -1:
                break
            yield curr_row + chunk[:i]
            curr_row = ''
            chunk = chunk[i+1:]
        curr_row += chunk

I was able to use it successfully to solve my problem. It has been extensively tested, with various chunk sizes.


Test suite, for those who want to convince themselves.

test_file = 'test_file'

def cleanup(func):
    def wrapper(*args, **kwargs):
        func(*args, **kwargs)
        os.unlink(test_file)
    return wrapper

@cleanup
def test_empty(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1_char_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_1_char(chunksize=1024):
    with open(test_file, 'w') as f:
        f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1025_chars_1_row(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1025):
            f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1

@cleanup
def test_1024_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1023):
            f.write('a')
        f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_1025_chars_1026_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1025):
            f.write('|')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 1026

@cleanup
def test_2048_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1022):
            f.write('a')
        f.write('|')
        f.write('a')
        # -- end of 1st chunk --
        for i in range(1024):
            f.write('a')
        # -- end of 2nd chunk
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

@cleanup
def test_2049_chars_2_rows(chunksize=1024):
    with open(test_file, 'w') as f:
        for i in range(1022):
            f.write('a')
        f.write('|')
        f.write('a')
        # -- end of 1st chunk --
        for i in range(1024):
            f.write('a')
        # -- end of 2nd chunk
        f.write('a')
    with open(test_file) as f:
        assert len(list(rows(f, chunksize=chunksize))) == 2

if __name__ == '__main__':
    for chunksize in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
        test_empty(chunksize)
        test_1_char_2_rows(chunksize)
        test_1_char(chunksize)
        test_1025_chars_1_row(chunksize)
        test_1024_chars_2_rows(chunksize)
        test_1025_chars_1026_rows(chunksize)
        test_2048_chars_2_rows(chunksize)
        test_2049_chars_2_rows(chunksize)

回答 4

f = ... # file-like object, i.e. supporting read(size) function and 
        # returning empty string '' when there is nothing to read

def chunked(file, chunk_size):
    return iter(lambda: file.read(chunk_size), '')

for data in chunked(f, 65536):
    # process the data

更新:最好在https://stackoverflow.com/a/4566523/38592中解释该方法

f = ... # file-like object, i.e. supporting read(size) function and 
        # returning empty string '' when there is nothing to read

def chunked(file, chunk_size):
    return iter(lambda: file.read(chunk_size), '')

for data in chunked(f, 65536):
    # process the data

UPDATE: The approach is best explained in https://stackoverflow.com/a/4566523/38592


回答 5

请参阅python的官方文档 https://docs.python.org/zh-cn/3/library/functions.html?#iter

也许这种方法更pythonic:

from functools import partial

"""A file object returned by open() is a iterator with
read method which could specify current read's block size"""
with open('mydata.db', 'r') as f_in:

    part_read = partial(f_in.read, 1024*1024)
    iterator = iter(part_read, b'')

    for index, block in enumerate(iterator, start=1):
        block = process_block(block)    # process block data
        with open(f'{index}.txt', 'w') as f_out:
            f_out.write(block)

Refer to python’s official documentation https://docs.python.org/3/library/functions.html#iter

Maybe this method is more pythonic:

from functools import partial

"""A file object returned by open() is a iterator with
read method which could specify current read's block size"""
with open('mydata.db', 'r') as f_in:

    part_read = partial(f_in.read, 1024*1024)
    iterator = iter(part_read, b'')

    for index, block in enumerate(iterator, start=1):
        block = process_block(block)    # process your block data
        
        with open(f'{index}.txt', 'w') as f_out:
            f_out.write(block)

回答 6

我认为我们可以这样写:

def read_file(path, block_size=1024): 
    with open(path, 'rb') as f: 
        while True: 
            piece = f.read(block_size) 
            if piece: 
                yield piece 
            else: 
                return

for piece in read_file(path):
    process_piece(piece)

I think we can write like this:

def read_file(path, block_size=1024): 
    with open(path, 'rb') as f: 
        while True: 
            piece = f.read(block_size) 
            if piece: 
                yield piece 
            else: 
                return

for piece in read_file(path):
    process_piece(piece)

回答 7

由于声誉低下,我不允许发表评论,但是SilentGhosts解决方案应该可以通过file.readlines([sizehint])轻松得多

python文件方法

编辑:SilentGhost是正确的,但这应该比:

s = "" 
for i in xrange(100): 
   s += file.next()

i am not allowed to comment due to my low reputation, but SilentGhosts solution should be much easier with file.readlines([sizehint])

python file methods

edit: SilentGhost is right, but this should be better than:

s = "" 
for i in xrange(100): 
   s += file.next()

回答 8

我处于类似情况。目前尚不清楚您是否知道块大小(以字节为单位)。我通常不知道,但是所需的记录(行)数是已知的:

def get_line():
     with open('4gb_file') as file:
         for i in file:
             yield i

lines_required = 100
gen = get_line()
chunk = [i for i, j in zip(gen, range(lines_required))]

更新:谢谢nosklo。这就是我的意思。它几乎起作用了,只是它丢失了块之间的一行。

chunk = [next(gen) for i in range(lines_required)]

技巧不丢失任何行,但看起来不是很好。

I’m in a somewhat similar situation. It’s not clear whether you know chunk size in bytes; I usually don’t, but the number of records (lines) that is required is known:

def get_line():
     with open('4gb_file') as file:
         for i in file:
             yield i

lines_required = 100
gen = get_line()
chunk = [i for i, j in zip(gen, range(lines_required))]

Update: Thanks nosklo. Here’s what I meant. It almost works, except that it loses a line ‘between’ chunks.

chunk = [next(gen) for i in range(lines_required)]

Does the trick w/o losing any lines, but it doesn’t look very nice.


回答 9

要逐行处理,这是一个很好的解决方案:

  def stream_lines(file_name):
    file = open(file_name)
    while True:
      line = file.readline()
      if not line:
        file.close()
        break
      yield line

只要没有空行。

To process line by line, this is an elegant solution:

  def stream_lines(file_name):
    file = open(file_name)
    while True:
      line = file.readline()
      if not line:
        file.close()
        break
      yield line

As long as there’re no blank lines.


回答 10

您可以使用以下代码。

file_obj = open('big_file') 

open()返回一个文件对象

然后使用os.stat获取大小

file_size = os.stat('big_file').st_size

for i in range( file_size/1024):
    print file_obj.read(1024)

you can use following code.

file_obj = open('big_file') 

open() returns a file object

then use os.stat for getting size

file_size = os.stat('big_file').st_size

for i in range( file_size/1024):
    print file_obj.read(1024)

Python的生成器和迭代器之间的区别

问题:Python的生成器和迭代器之间的区别

迭代器和生成器有什么区别?有关何时使用每种情况的一些示例会有所帮助。

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.


回答 0

iterator是一个更笼统的概念:其类具有next方法(__next__在Python 3中)和具有__iter__方法的任何对象return self

每个生成器都是一个迭代器,但反之亦然。生成器是通过调用具有一个或多个yield表达式(yield在Python 2.5及更早版本中为语句)的函数而构建的,并且该函数是满足上一段对的定义的对象iterator

当您需要一个具有某种复杂状态维护行为的类,或者想要公开除next(和__iter____init__)之外的其他方法时,您可能想使用自定义迭代器,而不是生成器。通常,一个生成器(有时,对于足够简单的需求,一个生成器表达式)就足够了,并且它更容易编写代码,因为状态维护(在合理范围内)基本上是由挂起和恢复帧“为您完成的”。

例如,一个生成器,例如:

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

或等效的生成器表达式(genexp)

generator = (i*i for i in range(a, b))

将需要更多代码来构建为自定义迭代器:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop
    def __iter__(self): return self
    def next(self): # __next__ in Python 3
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)

但是,当然,有了类,Squares您可以轻松地提供其他方法,即

    def current(self):
       return self.start

如果您在应用程序中实际需要这种额外功能。

iterator is a more general concept: any object whose class has a next method (__next__ in Python 3) and an __iter__ method that does return self.

Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph’s definition of an iterator.

You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides next (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it’s simpler to code because state maintenance (within reasonable limits) is basically “done for you” by the frame getting suspended and resumed.

For example, a generator such as:

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

or the equivalent generator expression (genexp)

generator = (i*i for i in range(a, b))

would take more code to build as a custom iterator:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop
    def __iter__(self): return self
    def next(self): # __next__ in Python 3
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)

But, of course, with class Squares you could easily offer extra methods, i.e.

    def current(self):
       return self.start

if you have any actual need for such extra functionality in your application.


回答 1

迭代器和生成器有什么区别?有关何时使用每种情况的一些示例会有所帮助。

总结:迭代器是具有__iter__和方法的对象__next__next在Python 2中)。生成器提供了一种简单的内置方法来创建Iterator的实例。

包含yield的函数仍然是一个函数,在调用该函数时,它会返回生成器对象的实例:

def a_function():
    "when called, returns generator object"
    yield

生成器表达式还返回生成器:

a_generator = (i for i in range(0))

有关更深入的说明和示例,请继续阅读。

生成器迭代器

具体来说,生成器是迭代器的子类型。

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

我们可以通过几种方式创建生成器。一种非常普遍且简单的方法是使用函数。

具体来说,其中包含yield的函数是一个函数,在调用该函数时会返回生成器:

>>> def a_function():
        "just a function definition with yield in it"
        yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function()  # when called
>>> type(a_generator)           # returns a generator
<class 'generator'>

同样,生成器是迭代器:

>>> isinstance(a_generator, collections.Iterator)
True

迭代器可迭代的

迭代器是可迭代的

>>> issubclass(collections.Iterator, collections.Iterable)
True

这需要一个__iter__返回Iterator 的方法:

>>> collections.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__

内置元组,列表,字典,集合,冻结集合,字符串,字节字符串,字节数组,范围和内存视图是可迭代对象的一些示例:

>>> all(isinstance(element, collections.Iterable) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

迭代器需要一个next__next__方法

在Python 2中:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next

在Python 3中:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__

我们可以使用以下函数从内置对象(或自定义对象)中获取迭代器iter

>>> all(isinstance(iter(element), collections.Iterator) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

__iter__当您尝试将对象与for循环一起使用时,将调用该方法。然后__next__,在迭代器对象上调用该方法以获取循环中的每个项目。StopIteration耗尽后,迭代器会上升,并且此时无法重用。

从文档中

在“内置类型” 文档的“迭代器类型”部分的“生成器类型”部分中:

Python的生成器提供了一种实现迭代器协议的便捷方法。如果容器对象的__iter__()方法作为生成器实现的,它会自动返回一个迭代器对象(在技术上,一个生成器对象)供给__iter__()next()[ __next__()在Python 3]的方法。有关生成器的更多信息,可以在yield表达式的文档中找到。

(已添加重点。)

因此,我们从中了解到生成器是(便捷的)迭代器类型。

示例迭代器对象

您可以通过创建或扩展自己的对象来创建实现Iterator协议的对象。

class Yes(collections.Iterator):

    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next # Python 3 compatibility

但是,使用Generator来执行此操作会更容易:

def yes(stop):
    for _ in range(stop):
        yield 'yes'

也许更简单一些,生成器表达式(类似于列表推导):

yes_expr = ('yes' for _ in range(stop))

它们都可以以相同的方式使用:

>>> stop = 4             
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop), 
                             ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...     
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes

结论

当需要将Python对象扩展为可以迭代的对象时,可以直接使用Iterator协议。

但是,在大多数情况下,最适合yield用于定义返回生成器迭代器或考虑生成器表达式的函数。

最后,请注意,生成器提供了更多的协同程序功能。我将yield在有关“ yield”关键字的作用?”的回答中深入解释Generators和该语句。

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.

In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.

A function with yield in it is still a function, that, when called, returns an instance of a generator object:

def a_function():
    "when called, returns generator object"
    yield

A generator expression also returns a generator:

a_generator = (i for i in range(0))

For a more in-depth exposition and examples, keep reading.

A Generator is an Iterator

Specifically, generator is a subtype of iterator.

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

We can create a generator several ways. A very common and simple way to do so is with a function.

Specifically, a function with yield in it is a function, that, when called, returns a generator:

>>> def a_function():
        "just a function definition with yield in it"
        yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function()  # when called
>>> type(a_generator)           # returns a generator
<class 'generator'>

And a generator, again, is an Iterator:

>>> isinstance(a_generator, collections.Iterator)
True

An Iterator is an Iterable

An Iterator is an Iterable,

>>> issubclass(collections.Iterator, collections.Iterable)
True

which requires an __iter__ method that returns an Iterator:

>>> collections.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__

Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:

>>> all(isinstance(element, collections.Iterable) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

Iterators require a next or __next__ method

In Python 2:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next

And in Python 3:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__

We can get the iterators from the built-in objects (or custom objects) with the iter function:

>>> all(isinstance(iter(element), collections.Iterator) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.

From the documentation

From the Generator Types section of the Iterator Types section of the Built-in Types documentation:

Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.

(Emphasis added.)

So from this we learn that Generators are a (convenient) type of Iterator.

Example Iterator Objects

You might create object that implements the Iterator protocol by creating or extending your own object.

class Yes(collections.Iterator):

    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next # Python 3 compatibility

But it’s easier to simply use a Generator to do this:

def yes(stop):
    for _ in range(stop):
        yield 'yes'

Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):

yes_expr = ('yes' for _ in range(stop))

They can all be used in the same way:

>>> stop = 4             
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop), 
                             ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...     
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes

Conclusion

You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.

However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.

Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to “What does the “yield” keyword do?”.


回答 2

迭代器:

迭代器是使用next()方法获取序列的下一个值的对象。

生成器:

生成器是一种使用yield方法生成或产生值序列的函数。

生成器函数(如以下示例中的ex:函数)返回的生成next()器对象(如ex f中的示例)的每个方法调用都将按foo()顺序生成下一个值。

调用生成器函数时,它甚至不开始执行函数就返回生成器对象。当next()方法被称为首次,函数开始执行,直到它到达它返回产生值yield语句。收益跟踪(即记住上一次执行)。第二个next()调用从先前的值继续。

下面的示例演示了yield和生成器对象上的next方法的调用之间的相互作用。

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0            # Control is in for loop
0
>>> f.next()
after yield 0             
before yield 1            # Continue for loop
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Iterators:

Iterator are objects which uses next() method to get next value of sequence.

Generators:

A generator is a function that produces or yields a sequence of values using yield method.

Every next() method call on generator object(for ex: f as in below example) returned by generator function(for ex: foo() function in below example), generates next value in sequence.

When a generator function is called, it returns an generator object without even beginning execution of the function. When next() method is called for the first time, the function starts executing until it reaches yield statement which returns the yielded value. The yield keeps track of i.e. remembers last execution. And second next() call continues from previous value.

The following example demonstrates the interplay between yield and call to next method on generator object.

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0            # Control is in for loop
0
>>> f.next()
after yield 0             
before yield 1            # Continue for loop
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

回答 3

添加答案是因为现有答案中没有一个专门解决官方文献中的混乱。

生成器函数是使用yield代替定义的普通函数return。调用时,生成器函数将返回一个生成器对象,它是一种迭代器-它具有一个next()方法。当您调用时next(),将返回生成器函数产生的下一个值。

函数或对象都可以称为“生成器”,具体取决于您阅读的是哪个Python源文档。在Python的词汇说发生器功能,而Python的维基意味着生成器对象。在Python的教程非常设法暗示用三句话的空间用法:

生成器是用于创建迭代器的简单而强大的工具。它们的编写方式与常规函数类似,但是只要要返回数据就使用yield语句。每次在其上调用next()时,生成器都会从上次中断的地方继续(它会记住所有数据值以及最后执行的语句)。

前两个句子用生成器函数标识生成器,而第三句话用生成器对象标识它们。

尽管存在所有这些困惑,但您仍然可以找到Python语言参考来获得清晰明确的词:

yield表达式仅在定义生成器函数时使用,并且只能在函数定义的主体中使用。在函数定义中使用yield表达式足以使该定义创建一个生成器函数,而不是普通函数。

调用生成器函数时,它将返回称为生成器的迭代器。然后,该生成器控制生成器功能的执行。

因此,在正式和精确的用法中,“生成器”不合格表示生成器对象,而不是生成器功能。

上面的参考是针对Python 2的,但是Python 3语言参考却说了同样的话。但是,Python 3词汇表指出

generator …通常是指生成器函数,但在某些情况下可能是指生成器迭代器。在预期含义不明确的情况下,使用完整术语可以避免歧义。

Adding an answer because none of the existing answers specifically address the confusion in the official literature.

Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator – it has a next() method. When you call next(), the next value yielded by the generator function is returned.

Either the function or the object may be called the “generator” depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:

Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).

The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.

Despite all this confusion, one can seek out the Python language reference for the clear and final word:

The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.

When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.

So, in formal and precise usage, “generator” unqualified means generator object, not generator function.

The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that

generator … Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.


回答 4

每个人都有一个非常好的示例冗长的答案,我对此表示感谢。我只是想为在概念上仍不太清楚的人提供简短的答案:

如果创建自己的迭代器,则涉及到一点点-您必须创建一个类并至少实现iter和next方法。但是,如果您不想经历这种麻烦并想快速创建迭代器,该怎么办。幸运的是,Python提供了一种定义迭代器的捷径。您需要做的就是定义一个至少调用一次yield的函数,现在当您调用该函数时,它将返回“ something ”,其作用类似于迭代器(您可以调用next方法并在for循环中使用它)。这个东西在Python中有一个名字叫做Generator

希望可以澄清一下。

Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:

If you create your own iterator, it is a little bit involved – you have to create a class and at least implement the iter and the next methods. But what if you don’t want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return “something” which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator

Hope that clarifies a bit.


回答 5

先前的答案未添加此功能:生成器具有close方法,而典型的迭代器则没有。该close方法StopIteration在生成器中触发一个异常,该异常可能会finally在该迭代器的子句中捕获,从而有机会运行一些清理操作。这种抽象使它比简单的迭代器在大型迭代器中最有用。一个人可以关闭一个生成器,就像一个人可以关闭一个文件一样,而不必担心底层内容。

也就是说,我对第一个问题的个人回答是:iteratable __iter__仅具有一个方法,典型的迭代器__next__仅具有一个方法,生成器具有an __iter__和a __next__以及一个extra close

对于第二个问题,我个人的回答是:在公共界面中,我倾向于偏爱生成器,因为它更具弹性:该close方法具有更大的可组合性yield from。在本地,我可以使用迭代器,但前提是它是一个平面且简单的结构(迭代器不容易编写),并且有理由相信该序列很短,尤其是在序列结束之前可以将其停止的情况。我倾向于将迭代器视为低级原语,而不是文字。

对于控制流而言,生成器是一个与承诺一样重要的概念:两者都是抽象的且可组合的。

Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.

That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.

For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.

For control flow matters, generators are an as much important concept as promises: both are abstract and composable.


回答 6

生成器功能,生成器对象,生成器:

一个生成器的功能就像Python中的常规功能,但它包含一个或多个yield语句。生成器函数是一个很好的工具,它可以尽可能轻松地创建 Iterator对象。通过generator函数返回的Iterator对象也称为Generator对象Generator

在此示例中,我创建了一个Generator函数,该函数返回Generator对象<generator object fib at 0x01342480>。就像其他迭代器一样,Generator对象可以在for循环中使用,也可以与内置函数一起使用,该 函数next()从generator返回下一个值。

def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10))             #<generator object fib at 0x01342480>

for i in fib(10):
    print(i)               # 0 1 1 2 3 5 8 13 21 34


print(next(myfib))         #0
print(next(myfib))         #1
print(next(myfib))         #1
print(next(myfib))         #2

因此,生成器函数是创建Iterator对象的最简单方法。

迭代器

每个生成器对象都是一个迭代器,但反之则不是。如果自定义迭代器对象的类实现__iter____next__方法(也称为迭代器协议),则可以创建该对象 。

但是,使用生成器函数来创建迭代器要容易得多,因为它们可以简化迭代器的创建,但是自定义迭代器为您提供了更大的自由度,并且您还可以根据需要实现其他方法,如下例所示。

class Fib:
    def __init__(self,max):
        self.current=0
        self.next=1
        self.max=max
        self.count=0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count>self.max:
            raise StopIteration
        else:
            self.current,self.next=self.next,(self.current+self.next)
            self.count+=1
            return self.next-self.current

    def __str__(self):
        return "Generator object"

itobj=Fib(4)
print(itobj)               #Generator object

for i in Fib(4):  
    print(i)               #0 1 1 2

print(next(itobj))         #0
print(next(itobj))         #1
print(next(itobj))         #1

Generator Function, Generator Object, Generator:

A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.

In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.

def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10))             #<generator object fib at 0x01342480>

for i in fib(10):
    print(i)               # 0 1 1 2 3 5 8 13 21 34


print(next(myfib))         #0
print(next(myfib))         #1
print(next(myfib))         #1
print(next(myfib))         #2

So a generator function is the easiest way to create an Iterator object.

Iterator:

Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).

However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.

class Fib:
    def __init__(self,max):
        self.current=0
        self.next=1
        self.max=max
        self.count=0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count>self.max:
            raise StopIteration
        else:
            self.current,self.next=self.next,(self.current+self.next)
            self.count+=1
            return self.next-self.current

    def __str__(self):
        return "Generator object"

itobj=Fib(4)
print(itobj)               #Generator object

for i in Fib(4):  
    print(i)               #0 1 1 2

print(next(itobj))         #0
print(next(itobj))         #1
print(next(itobj))         #1

回答 7

强烈推荐Ned Batchelder的示例 用于迭代器和生成器

没有生成器的方法会做一些偶数运算

def evens(stream):
   them = []
   for n in stream:
      if n % 2 == 0:
         them.append(n)
   return them

而使用生成器

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
  • 我们不需要任何清单return声明
  • 对于大/无限长的流有效…它只是走并产生价值

evens照常调用方法(生成器)

num = [...]
for n in evens(num):
   do_smth(n)
  • 生成器也用于打破双回路

迭代器

一整页的书是可迭代的,书签是 迭代器

这个书签除了移动外别无其他 next

litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration  (Exception) as we got end of the iterator

要使用Generator …我们需要一个函数

要使用迭代器……我们需要nextiter

如前所述:

Generator函数返回迭代器对象

迭代器的全部优点:

一次将一个元素存储在内存中

Examples from Ned Batchelder highly recommended for iterators and generators

A method without generators that do something to even numbers

def evens(stream):
   them = []
   for n in stream:
      if n % 2 == 0:
         them.append(n)
   return them

while by using a generator

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
  • We don’t need any list nor a return statement
  • Efficient for large/ infinite length stream … it just walks and yield the value

Calling the evens method (generator) is as usual

num = [...]
for n in evens(num):
   do_smth(n)
  • Generator also used to Break double loop

Iterator

A book full of pages is an iterable, A bookmark is an iterator

and this bookmark has nothing to do except to move next

litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration  (Exception) as we got end of the iterator

To use Generator … we need a function

To use Iterator … we need next and iter

As been said:

A Generator function returns an iterator object

The Whole benefit of Iterator:

Store one element a time in memory


回答 8

您可以将两种方法比较相同的数据:

def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

此外,如果检查内存占用量,生成器将占用更少的内存,因为它不需要同时将所有值存储在内存中。

You can compare both approaches for the same data:

def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

Besides, if you check the memory footprint, the generator takes much less memory as it doesn’t need to store all the values in memory at the same time.


回答 9

我用一种非常简单的方式专门为Python新手编写了代码,尽管Python深入人心地做了很多事情。

让我们从最基本的开始:

考虑一个清单,

l = [1,2,3]

让我们编写一个等效的函数:

def f():
    return [1,2,3]

o / p为print(l): [1,2,3]&o / p为print(f()) : [1,2,3]

让列表l变得可迭代:在python中,列表始终是可迭代的,这意味着您可以随时使用迭代器。

让我们在列表上应用迭代器:

iter_l = iter(l) # iterator applied explicitly

让我们迭代一个函数,即编写一个等效的生成器函数。 在python中,一旦您引入了关键字yield;它成为生成器函数,并且迭代器将被隐式应用。

注意:每个生成器始终可以应用隐式迭代器进行迭代,此处隐式迭代器是关键, 因此生成器函数将是:

def f():
  yield 1 
  yield 2
  yield 3

iter_f = f() # which is iter(f) as iterator is already applied implicitly

因此,如果您观察到,一旦创建函数fa generator,它已经是iter(f)

现在,

l是列表,应用迭代器方法“ iter”后,它变为iter(l)

f已经是iter(f),在应用迭代器方法“ iter”之后,它变为iter(iter(f)),再次是iter(f)

您将int强制转换为已经为int的int(x)并保留为int(x)。

例如:

print(type(iter(iter(l))))

<class 'list_iterator'>

永远不要忘记这是Python,而不是C或C ++

因此,以上解释得出的结论是:

列出l〜= iter(l)

生成器函数f == iter(f)

I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.

Let’s start with the very basic:

Consider a list,

l = [1,2,3]

Let’s write an equivalent function:

def f():
    return [1,2,3]

o/p of print(l): [1,2,3] & o/p of print(f()) : [1,2,3]

Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.

Let’s apply iterator on list:

iter_l = iter(l) # iterator applied explicitly

Let’s make a function iterable, i.e. write an equivalent generator function. In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.

Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux So the generator function will be:

def f():
  yield 1 
  yield 2
  yield 3

iter_f = f() # which is iter(f) as iterator is already applied implicitly

So if you have observed, as soon as you made function f a generator, it is already iter(f)

Now,

l is the list, after applying iterator method “iter” it becomes, iter(l)

f is already iter(f), after applying iterator method “iter” it becomes, iter(iter(f)), which is again iter(f)

It’s kinda you are casting int to int(x) which is already int and it will remain int(x).

For example o/p of :

print(type(iter(iter(l))))

is

<class 'list_iterator'>

Never forget this is Python and not C or C++

Hence the conclusion from above explanation is:

list l ~= iter(l)

generator function f == iter(f)


生成器表达式与列表理解

问题:生成器表达式与列表理解

什么时候应该使用生成器表达式,什么时候应该在Python中使用列表推导?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

回答 0

John的答案很好(当您要迭代多次时,列表理解会更好)。但是,还应注意,如果要使用任何列表方法,都应使用列表。例如,以下代码将不起作用:

def gen():
    return (something for something in get_some_stuff())

print gen()[:2]     # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists

基本上,如果您要做的只是迭代一次,则使用生成器表达式。如果要存储和使用生成的结果,则最好使用列表理解功能。

由于性能是选择彼此的最常见原因,所以我的建议是不要担心它,而只选择一个即可。如果您发现程序运行速度太慢,则只有这样,您才应回去担心调整代码。

John’s answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it’s also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won’t work:

def gen():
    return (something for something in get_some_stuff())

print gen()[:2]     # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists

Basically, use a generator expression if all you’re doing is iterating once. If you want to store and use the generated results, then you’re probably better off with a list comprehension.

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.


回答 1

遍历生成器表达式列表理解将执行相同的操作。但是,列表理解将首先在内存中创建整个列表,而生成器表达式将在运行中创建项目,因此您可以将其用于非常大的(也可以是无限的!)序列。

Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.


回答 2

当结果需要多次迭代或速度至关重要时,请使用列表推导。使用范围较大或无限的生成器表达式。

有关更多信息,请参见生成器表达式和列表推导。

Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.

See Generator expressions and list comprehensions for more info.


回答 3

重要的是列表理解会创建一个新列表。生成器创建一个可迭代的对象,当您使用这些位时,它将动态“过滤”源材料。

假设您有一个名为“ hugefile.txt”的2TB日志文件,并且想要以单词“ ENTRY”开头的所有行的内容和长度。

因此,您尝试通过编写列表理解来开始:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

这样会抓取整个文件,处理每一行,并将匹配的行存储在数组中。因此,此阵列最多可以包含2TB的内容。那会占用很多RAM,对于您的目的可能不切实际。

因此,我们可以使用生成器将“过滤器”应用于我们的内容。直到我们开始遍历结果之前,才实际读取任何数据。

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

甚至没有从我们的文件中读取任何一行。实际上,假设我们想进一步过滤结果:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

仍未读取任何内容,但是我们现在指定了两个生成器,它们将根据需要对数据起作用。

让我们将过滤后的行写到另一个文件中:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

现在我们读取输入文件。随着for循环继续请求其他行,long_entries生成器要求生成器提供行entry_lines,仅返回长度大于80个字符的行。然后,entry_lines生成器从logfile迭代迭代器读取文件。

因此,不是以完全填充列表的形式将数据“推送”到输出函数,而是为输出函数提供了一种仅在需要时才“拉”数据的方法。在我们的情况下,这要高效得多,但不够灵活。生成器是一种方式,一次通过。我们读取的日志文件中的数据会立即被丢弃,因此我们无法返回上一行。另一方面,完成数据后,我们不必担心保留数据。

The important point is that the list comprehension creates a new list. The generator creates a an iterable object that will “filter” the source material on-the-fly as you consume the bits.

Imagine you have a 2TB log file called “hugefile.txt”, and you want the content and length for all the lines that start with the word “ENTRY”.

So you try starting out by writing a list comprehension:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

This slurps up the whole file, processes each line, and stores the matching lines in your array. This array could therefore contain up to 2TB of content. That’s a lot of RAM, and probably not practical for your purposes.

So instead we can use a generator to apply a “filter” to our content. No data is actually read until we start iterating over the result.

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

Still nothing has been read, but we’ve specified now two generators that will act on our data as we wish.

Lets write out our filtered lines to another file:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

So instead of “pushing” data to your output function in the form of a fully-populated list, you’re giving the output function a way to “pull” data only when its needed. This is in our case much more efficient, but not quite as flexible. Generators are one way, one pass; the data from the log file we’ve read gets immediately discarded, so we can’t go back to a previous line. On the other hand, we don’t have to worry about keeping data around once we’re done with it.


回答 4

生成器表达式的好处是它使用较少的内存,因为它不会立即构建整个列表。当列表是中间变量时,最好使用生成器表达式,例如对结果求和或根据结果创建字典。

例如:

sum(x*2 for x in xrange(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

这样做的好处是列表不会完全生成,因此使用的内存很少(而且应该更快)

但是,当所需的最终产品是列表时,应该使用列表推导。您将不会使用生成器表达式保存任何内存,因为您需要生成的列表。您还可以获得能够使用任何列表功能(如已排序或反转)的好处。

例如:

reversed( [x*2 for x in xrange(256)] )

The benefit of a generator expression is that it uses less memory since it doesn’t build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.

For example:

sum(x*2 for x in xrange(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

The advantage there is that the list isn’t completely generated, and thus little memory is used (and should also be faster)

You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memeory using generator expressions, since you want the generated list. You also get the benefit of being able to use any of the list functions like sorted or reversed.

For example:

reversed( [x*2 for x in xrange(256)] )

回答 5

从可变对象(如列表)创建生成器时,请注意,生成器将在使用生成器时(而不是在创建生成器时)根据列表的状态进行评估:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

如果您的列表有可能被修改(或列表中的可变对象),但是您需要在生成器创建时的状态,则需要使用列表推导。

When creating a generator from a mutable object (like a list) be aware that the generator will get evaluated on the state of the list at time of using the generator, not at time of the creation of the generator:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

If there is any chance of your list getting modified (or a mutable object inside that list) but you need the state at creation of the generator you need to use a list comprehension instead.


回答 6

有时,您可以从itertools中使用tee函数,它为同一生成器返回多个迭代器,这些迭代器可以独立使用。

Sometimes you can get away with the tee function from itertools, it returns multiple iterators for the same generator that can be used independently.


回答 7

我正在使用Hadoop Mincemeat模块。我认为这是一个值得注意的好例子:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k,v): 
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
       std+=pow(abs(v[i]-m),2)  
    res=pow((std/r2),0.5)
    return r1,r2,res

在这里,生成器从文本文件(最大为15GB)中获取数字,并使用Hadoop的map-reduce对这些数字进行简单的数学运算。如果我没有使用yield函数,而是使用列表理解,那么计算总和和平均值将花费更长的时间(更不用说空间复杂性了)。

Hadoop是利用Generators的所有优点的一个很好的例子。

I’m using the Hadoop Mincemeat module. I think this is a great example to take a note of:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k,v): 
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
       std+=pow(abs(v[i]-m),2)  
    res=pow((std/r2),0.5)
    return r1,r2,res

Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math on those numbers using Hadoop’s map-reduce. If I had not used the yield function, but instead a list comprehension, it would have taken a much longer time calculating the sums and average (not to mention the space complexity).

Hadoop is a great example for using all the advantages of Generators.


“ yield”关键字有什么作用?

问题:“ yield”关键字有什么作用?

yield关键字在Python中的用途是什么?

例如,我试图理解这段代码1

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

这是调用方法:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

_get_child_candidates调用该方法会怎样?是否返回列表?一个元素?再叫一次吗?后续通话何时停止?


1.这段代码是由Jochen Schulz(jrschulz)编写的,Jochen Schulz是一个很好的用于度量空间的Python库。这是完整源代码的链接:Module mspace

What is the use of the yield keyword in Python, and what does it do?

For example, I’m trying to understand this code1:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

And this is the caller:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When will subsequent calls stop?


1. This piece of code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.


回答 0

要了解其yield作用,您必须了解什么是生成器。而且,在您了解生成器之前,您必须了解iterables

可迭代

创建列表时,可以一一阅读它的项目。逐一读取其项称为迭代:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist是一个可迭代的。当您使用列表推导时,您将创建一个列表,因此是可迭代的:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

您可以使用的所有“ for... in...”都是可迭代的;listsstrings,文件…

这些可迭代的方法很方便,因为您可以随意读取它们,但是您将所有值都存储在内存中,当拥有很多值时,这并不总是想要的。

生成器

生成器是迭代器,一种迭代,您只能迭代一次。生成器不会将所有值存储在内存中,它们会即时生成值

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

只是您使用()代替一样[]。但是,由于生成器只能使用一次,因此您无法执行for i in mygenerator第二次:生成器计算0,然后忽略它,然后计算1,最后一次计算4,最后一次。

Yield

yield是与一样使用的关键字return,不同之处在于该函数将返回生成器。

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

这是一个无用的示例,但是当您知道函数将返回大量的值(只需要读取一次)时,它就很方便。

要掌握yield,您必须了解在调用函数时,在函数主体中编写的代码不会运行。该函数仅返回生成器对象,这有点棘手:-)

然后,您的代码将在每次for使用生成器时从中断处继续。

现在最困难的部分是:

第一次for调用从您的函数创建的生成器对象时,它将从头开始运行函数中的代码,直到命中为止yield,然后它将返回循环的第一个值。然后,每个后续调用将运行您在函数中编写的循环的另一个迭代,并返回下一个值。这将一直持续到生成器被认为是空的为止,这在函数运行时没有命中时就会发生yield。那可能是因为循环已经结束,或者是因为您不再满足"if/else"


您的代码说明

生成器:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

调用方法:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

该代码包含几个智能部分:

  • 循环在一个列表上迭代,但是循环在迭代时列表会扩展:-)这是浏览所有这些嵌套数据的一种简洁方法,即使这样做有点危险,因为您可能会遇到无限循环。在这种情况下,请candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))耗尽所有生成器的值,但是while继续创建新的生成器对象,因为它们未应用于同一节点,因此将产生与先前值不同的值。

  • extend()方法是期望可迭代并将其值添加到列表的列表对象方法。

通常我们将一个列表传递给它:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

但是在您的代码中,它得到了一个生成器,这很好,因为:

  1. 您无需两次读取值。
  2. 您可能有很多孩子,并且您不希望所有孩子都存储在内存中。

它之所以有效,是因为Python不在乎方法的参数是否为列表。Python期望可迭代,因此它将与字符串,列表,元组和生成器一起使用!这就是所谓的鸭子输入,这是Python如此酷的原因之一。但这是另一个故事,还有另一个问题…

您可以在这里停止,或者阅读一点以了解生成器的高级用法:

控制生成器耗尽

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

注意:对于Python 3,请使用print(corner_street_atm.__next__())print(next(corner_street_atm))

对于诸如控制对资源的访问之类的各种事情,它可能很有用。

Itertools,您最好的朋友

itertools模块包含用于操纵可迭代对象的特殊功能。曾经希望复制一个生成器吗?连锁两个生成器?用一行代码对嵌套列表中的值进行分组?Map / Zip没有创建另一个列表?

然后就import itertools

一个例子?让我们看一下四马比赛的可能到达顺序:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

了解迭代的内部机制

迭代是一个隐含可迭代(实现__iter__()方法)和迭代器(实现__next__()方法)的过程。可迭代对象是可以从中获取迭代器的任何对象。迭代器是使您可以迭代的对象。

本文还提供了有关循环如何for工作的更多信息

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

Everything you can use “for... in...” on is an iterable; lists, strings, files…

These iterables are handy because you can read them as much as you wish, but you store all the values in memory and this is not always what you want when you have a lot of values.

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it’s a useless example, but it’s handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)

Then, your code will continue from where it left off each time for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it’ll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".


Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated :-) It’s a concise way to go through all these nested data even if it’s a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhaust all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it’s not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don’t need to read the values twice.
  2. You may have a lot of children and you don’t want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question…

You can stop here, or read a little bit to see an advanced use of a generator:

Controlling a generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, useprint(corner_street_atm.__next__()) or print(next(corner_street_atm))

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let’s see the possible orders of arrival for a four-horse race:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.

There is more about it in this article about how for loops work.


回答 1

理解的捷径 yield

当您看到带有yield语句的函数时,请应用以下简单技巧,以了解将发生的情况:

  1. result = []在函数的开头插入一行。
  2. 替换每个yield exprresult.append(expr)
  3. return result在函数底部插入一行。
  4. 是的-不再yield声明!阅读并找出代码。
  5. 将功能与原始定义进行比较。

这个技巧可能会让您对函数背后的逻辑yield有所了解,但是实际发生的事情与基于列表的方法发生的事情明显不同。在许多情况下,yield方法也将具有更高的内存效率和更快的速度。在其他情况下,即使原始函数运行正常,此技巧也会使您陷入无限循环。请继续阅读以了解更多信息…

不要混淆您的Iterable,Iterators和Generators

首先,迭代器协议 -当您编写时

for x in mylist:
    ...loop body...

Python执行以下两个步骤:

  1. 获取一个迭代器 mylist

    调用iter(mylist)->这将返回一个带有next()方法(或__next__()在Python 3中)。

    [这是大多数人忘记告诉您的步骤]

  2. 使用迭代器遍历项目:

    继续next()在从步骤1返回的迭代器上调用该方法。从的返回值next()被分配给x并执行循环体。如果StopIteration从内部引发异常next(),则意味着迭代器中没有更多值,并且退出了循环。

事实是,Python在想要遍历对象内容的任何时候都执行上述两个步骤-因此它可以是for循环,但也可以是类似的代码otherlist.extend(mylist)(其中otherlist是Python列表)。

mylist是一个可迭代的,因为它实现了迭代器协议。在用户定义的类中,可以实现该__iter__()方法以使您的类的实例可迭代。此方法应返回迭代器。迭代器是带有next()方法的对象。它可以同时实现__iter__(),并next()在同一类,并有__iter__()回报self。这适用于简单的情况,但是当您希望两个迭代器同时在同一个对象上循环时,则不能使用。

这就是迭代器协议,许多对象都实现了该协议:

  1. 内置列表,字典,元组,集合,文件。
  2. 实现的用户定义的类__iter__()
  3. 生成器。

请注意,for循环不知道它要处理的是哪种对象-它仅遵循迭代器协议,并且很高兴在调用时逐项获取next()。内置列表一一返回它们的项,词典一一返回,文件一一返回,依此类推。生成器返回…就是这样yield

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

yield如果没有三个return语句,f123()则只执行第一个语句,而不是语句,然后函数将退出。但是f123()没有普通的功能。当f123()被调用时,它不会返回yield语句中的任何值!它返回一个生成器对象。另外,该函数并没有真正退出-进入了挂起状态。当for循环尝试遍历生成器对象时,该函数从yield先前返回的下一行从其挂起状态恢复,执行下一行代码(在这种情况下为yield语句),并将其作为下一行返回项目。这会一直发生,直到函数退出,此时生成器将引发StopIteration,然后循环退出。

因此,生成器对象有点像适配器-在一端,它通过公开__iter__()next()保持for循环满意的方法来展示迭代器协议。但是,在另一端,它仅运行该函数以从中获取下一个值,然后将其放回暂停模式。

为什么使用生成器?

通常,您可以编写不使用生成器但实现相同逻辑的代码。一种选择是使用我之前提到的临时列表“技巧”。这并非在所有情况下都可行,例如,如果您有无限循环,或者当您的列表很长时,这可能会导致内存使用效率低下。另一种方法是实现一个新的可迭代类SomethingIter,该类将状态保留在实例成员中,并在其next()(或__next__()Python 3)方法中执行下一个逻辑步骤。根据逻辑,next()方法中的代码可能最终看起来非常复杂并且容易出现错误。在这里,生成器提供了一种干净而简单的解决方案。

Shortcut to understanding yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function.
  4. Yay – no more yield statements! Read and figure out code.
  5. Compare function to the original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different than what happens in the list based approach. In many cases, the yield approach will be a lot more memory efficient and faster too. In other cases, this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more…

Don’t confuse your Iterables, Iterators, and Generators

First, the iterator protocol – when you write

for x in mylist:
    ...loop body...

Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over items:

    Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is Python performs the above two steps anytime it wants to loop over the contents of an object – so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).

Here mylist is an iterable because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.

So that’s the iterator protocol, many objects implement this protocol:

  1. Built-in lists, dictionaries, tuples, sets, files.
  2. User-defined classes that implement __iter__().
  3. Generators.

Note that a for loop doesn’t know what kind of object it’s dealing with – it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return… well that’s where yield comes in:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

Instead of yield statements, if you had three return statements in f123() only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit – it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the very next line after the yield it previously returned from, executes the next line of code, in this case, a yield statement, and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.

So the generator object is sort of like an adapter – at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.

Why Use Generators?

Usually, you can write code that doesn’t use generators but implements the same logic. One option is to use the temporary list ‘trick’ I mentioned before. That will not work in all cases, for e.g. if you have infinite loops, or it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps the state in instance members and performs the next logical step in it’s next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.


回答 2

这样想:

迭代器只是一个带有next()方法的对象的美化名词。因此,产生收益的函数最终是这样的:

原始版本:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

这基本上是Python解释器使用上面的代码执行的操作:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

为了更深入地了解幕后发生的事情,for可以将循环重写为:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

这是否更有意义,还是会让您更加困惑?:)

我要指出,这为了说明的目的过于简单化。:)

Think of it this way:

An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:

Original version:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

This is basically what the Python interpreter does with the above code:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

For more insight as to what’s happening behind the scenes, the for loop can be rewritten to this:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

Does that make more sense or just confuse you more? :)

I should note that this is an oversimplification for illustrative purposes. :)


回答 3

yield关键字被减少到两个简单的事实:

  1. 如果编译器在函数内部的任何位置检测到yield关键字,则该函数不再通过该语句返回。相反,它立即返回一个懒惰的“待处理列表”对象,称为生成器return
  2. 生成器是可迭代的。什么是可迭代的?就像是listor或setor range或dict-view一样,它带有用于以特定顺序访问每个元素内置协议

简而言之:生成器是一个懒惰的,增量待定的list,并且yield语句允许您使用函数符号来编程生成器应逐渐吐出的列表值

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

让我们定义一个makeRange类似于Python的函数range。调用makeRange(n)“返回生成器”:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

要强制生成器立即返回其待处理的值,可以将其传递给list()(就像您可以进行任何迭代一样):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

将示例与“仅返回列表”进行比较

可以将上面的示例视为仅创建一个列表,并将其附加并返回:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

但是,有一个主要区别。请参阅最后一节。


您如何使用生成器

可迭代是列表理解的最后一部分,并且所有生成器都是可迭代的,因此经常像这样使用它们:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

为了使生成器更好地使用,您可以使用该itertools模块(一定要使用chain.from_iterable而不是chain在保修期内)。例如,您甚至可以使用生成器来实现无限长的惰性列表,例如itertools.count()。您可以实现自己的def enumerate(iterable): zip(count(), iterable),也可以yield在while循环中使用关键字来实现。

请注意:生成器实际上可以用于更多事情,例如实现协程或不确定性编程或其他优雅的事情。但是,我在这里提出的“惰性列表”观点是您会发现的最常见用法。


幕后花絮

这就是“ Python迭代协议”的工作方式。就是说,当你做什么的时候list(makeRange(5))。这就是我之前所说的“懒惰的增量列表”。

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

内置函数next()仅调用对象.next()函数,它是“迭代协议”的一部分,可以在所有迭代器上找到。您可以手动使用next()函数(以及迭代协议的其他部分)来实现奇特的事情,通常是以牺牲可读性为代价的,因此请避免这样做。


细节

通常,大多数人不会关心以下区别,并且可能想在这里停止阅读。

用Python来说,可迭代对象是“了解for循环的概念”的任何对象,例如列表[1,2,3],而迭代器是所请求的for循环的特定实例,例如[1,2,3].__iter__()。一个生成器是完全一样的任何迭代器,除了它是写(带有功能语法)的方式。

当您从列表中请求迭代器时,它将创建一个新的迭代器。但是,当您从迭代器请求迭代器时(很少这样做),它只会为您提供自身的副本。

因此,在极少数情况下,您可能无法执行此类操作…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

…然后记住生成器是迭代器 ; 即是一次性使用。如果要重用它,则应myRange(...)再次调用。如果需要两次使用结果,请将结果转换为列表并将其存储在变量中x = list(myRange(5))。那些绝对需要克隆生成器的人(例如,正在可怕地修改程序的人)可以itertools.tee在绝对必要的情况下使用,因为可复制的迭代器Python PEP标准建议已被推迟。

The yield keyword is reduced to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy “pending list” object called a generator
  2. A generator is iterable. What is an iterable? It’s anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.

In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

Example

Let’s define a function makeRange that’s just like Python’s range. Calling makeRange(n) RETURNS A GENERATOR:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

Comparing example to “just returning a list”

The above example can be thought of as merely creating a list which you append to and return:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

There is one major difference, though; see the last section.


How you might use generators

An iterable is the last part of a list comprehension, and all generators are iterable, so they’re often used like so:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely-long lazy lists like itertools.count(). You could implement your own def enumerate(iterable): zip(count(), iterable), or alternatively do so with the yield keyword in a while-loop.

Please note: generators can actually be used for many more things, such as implementing coroutines or non-deterministic programming or other elegant things. However, the “lazy lists” viewpoint I present here is the most common use you will find.


Behind the scenes

This is how the “Python iteration protocol” works. That is, what is going on when you do list(makeRange(5)). This is what I describe earlier as a “lazy, incremental list”.

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The built-in function next() just calls the objects .next() function, which is a part of the “iteration protocol” and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that…


Minutiae

Normally, most people would not care about the following distinctions and probably want to stop reading here.

In Python-speak, an iterable is any object which “understands the concept of a for-loop” like a list [1,2,3], and an iterator is a specific instance of the requested for-loop like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).

When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just gives you a copy of itself.

Thus, in the unlikely event that you are failing to do something like this…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

… then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable x = list(myRange(5)). Those who absolutely need to clone a generator (for example, who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.


回答 4

什么是yield关键词在Python呢?

答案大纲/摘要

  • 具有的函数yield在被调用时将返回Generator
  • 生成器是迭代器,因为它们实现了迭代器协议,因此您可以对其进行迭代。
  • 也可以生成器发送信息,使其在概念上成为协程
  • 在Python 3中,您可以使用双向一个生成器委托给另一个生成器yield from
  • (附录对几个答案进行了评论,包括最上面的一个,并讨论了return在生成器中的用法。)

生成器:

yield仅在函数定义内部合法,并且函数定义中包含yield使其返回生成器。

生成器的想法来自具有不同实现方式的其他语言(请参见脚注1)。在Python的Generators中,代码的执行会在收益率点冻结。调用生成器时(下面将讨论方法),恢复执行,然后冻结下一个Yield。

yield提供了一种实现迭代器协议的简便方法,该协议由以下两种方法定义: __iter__next(Python 2)或__next__(Python 3)。这两种方法都使对象成为迭代器,您可以使用模块中的IteratorAbstract Base Class对其进行类型检查collections

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

生成器类型是迭代器的子类型:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

并且如有必要,我们可以像这样进行类型检查:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

的一个功能Iterator 是,一旦用尽,您将无法重复使用或重置它:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

如果要再次使用其功能,则必须另做一个(请参见脚注2):

>>> list(func())
['I am', 'a generator!']

一个人可以通过编程方式产生数据,例如:

def func(an_iterable):
    for item in an_iterable:
        yield item

上面的简单生成器也等效于下面的生成器-从Python 3.3开始(在Python 2中不可用),您可以使用yield from

def func(an_iterable):
    yield from an_iterable

但是,yield from还允许委派给子生成器,这将在以下有关使用子协程进行合作委派的部分中进行解释。

协程:

yield 形成一个表达式,该表达式允许将数据发送到生成器中(请参见脚注3)

这是一个示例,请注意该received变量,该变量将指向发送到生成器的数据:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

首先,我们必须使内置函数生成器排队next。它将调用适当的next__next__方法,具体取决于您所使用的Python版本:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

现在我们可以将数据发送到生成器中。(发送None与呼叫相同next。):

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

合作协办小组 yield from

现在,回想一下yield fromPython 3中可用的功能。这使我们可以将协程委托给子协程:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

现在我们可以将功能委派给子生成器,并且生成器可以像上面一样使用它:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

你可以阅读更多的精确语义yield fromPEP 380。

其他方法:关闭并抛出

close方法GeneratorExit在函数执行被冻结的时候引发。这也将由调用,__del__因此您可以将任何清理代码放在处理位置GeneratorExit

>>> my_account.close()

您还可以引发异常,该异常可以在生成器中处理或传播回用户:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError

结论

我相信我已经涵盖了以下问题的各个方面:

什么是yield关键词在Python呢?

事实证明,这样yield做确实很有帮助。我相信我可以为此添加更详尽的示例。如果您想要更多或有建设性的批评,请在下面评论中告诉我。


附录:

对最佳/可接受答案的评论**

  • 仅以列表为例,它对使可迭代的内容感到困惑。请参阅上面的参考资料,但总而言之:iterable具有__iter__返回iterator的方法。一个迭代器提供了一个.next(Python 2里或.__next__(Python 3的)方法,它是隐式由称为for循环,直到它提出StopIteration,并且一旦这样做,将继续这样做。
  • 然后,它使用生成器表达式来描述什么是生成器。由于生成器只是创建迭代器的一种简便方法,因此它只会使事情变得混乱,而我们仍然没有涉及到这一yield部分。
  • 控制生成器的排气中,他调用了.next方法,而应该使用内置函数next。这将是一个适当的间接层,因为他的代码在Python 3中不起作用。
  • Itertools?这根本与做什么无关yield
  • 没有讨论yieldyield fromPython 3中的新功能一起提供的方法。最高/可接受的答案是非常不完整的答案。

yield生成器表达或理解中提出的答案的评论。

该语法当前允许列表理解中的任何表达式。

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

由于yield是一种表达,因此尽管没有特别好的用例,但有人认为它可以用于理解或生成器表达中。

CPython核心开发人员正在讨论弃用其津贴。这是邮件列表中的相关帖子:

2017年1月30日19:05,布雷特·坎农写道:

2017年1月29日星期日,克雷格·罗德里格斯(Craig Rodrigues)在星期日写道:

两种方法我都可以。恕我直言,把事情留在Python 3中是不好的。

我的投票是SyntaxError,因为您没有从语法中得到期望。

我同意这对我们来说是一个明智的选择,因为依赖当前行为的任何代码确实太聪明了,无法维护。

在到达目的地方面,我们可能需要:

  • 3.7中的语法警告或弃用警告
  • 2.7.x中的Py3k警告
  • 3.8中的SyntaxError

干杯,尼克。

-Nick Coghlan | gmail.com上的ncoghlan | 澳大利亚布里斯班

此外,还有一个悬而未决的问题(10544),似乎正说明这绝不是一个好主意(PyPy,用Python编写的Python实现,已经在发出语法警告。)

最重要的是,直到CPython的开发人员另行告诉我们为止:不要放入yield生成器表达式或理解。

return生成器中的语句

Python 2中

在生成器函数中,该return语句不允许包含expression_list。在这种情况下,裸露return表示生成器已完成并且将引起StopIteration提升。

An expression_list基本上是由逗号分隔的任意数量的表达式-本质上,在Python 2中,您可以使用停止生成器return,但不能返回值。

Python 3中

在生成器函数中,该return语句指示生成器完成并且将引起StopIteration提升。返回的值(如果有)用作构造的参数,StopIteration并成为StopIteration.value属性。

脚注

  1. 提案中引用了CLU,Sather和Icon语言,以将生成器的概念引入Python。总体思路是,一个函数可以维护内部状态并根据用户的需要产生中间数据点。这有望在性能上优于其他方法,包括Python线程,该方法甚至在某些系统上不可用。

  2. 例如,这意味着xrange对象(range在Python 3中)不是Iterator,即使它们是可迭代的,因为它们可以被重用。像列表一样,它们的__iter__方法返回迭代器对象。

  3. yield最初是作为语句引入的,这意味着它只能出现在代码块的一行的开头。现在yield创建一个yield表达式。 https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt 提出 此更改是为了允许用户将数据发送到生成器中,就像接收数据一样。要发送数据,必须能够将其分配给某物,为此,一条语句就行不通了。

What does the yield keyword do in Python?

Answer Outline/Summary

  • A function with yield, when called, returns a Generator.
  • Generators are iterators because they implement the iterator protocol, so you can iterate over them.
  • A generator can also be sent information, making it conceptually a coroutine.
  • In Python 3, you can delegate from one generator to another in both directions with yield from.
  • (Appendix critiques a couple of answers, including the top one, and discusses the use of return in a generator.)

Generators:

yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator.

The idea for generators comes from other languages (see footnote 1) with varying implementations. In Python’s Generators, the execution of the code is frozen at the point of the yield. When the generator is called (methods are discussed below) execution resumes and then freezes at the next yield.

yield provides an easy way of implementing the iterator protocol, defined by the following two methods: __iter__ and next (Python 2) or __next__ (Python 3). Both of those methods make an object an iterator that you could type-check with the Iterator Abstract Base Class from the collections module.

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

The generator type is a sub-type of iterator:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

And if necessary, we can type-check like this:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

A feature of an Iterator is that once exhausted, you can’t reuse or reset it:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

You’ll have to make another if you want to use its functionality again (see footnote 2):

>>> list(func())
['I am', 'a generator!']

One can yield data programmatically, for example:

def func(an_iterable):
    for item in an_iterable:
        yield item

The above simple generator is also equivalent to the below – as of Python 3.3 (and not available in Python 2), you can use yield from:

def func(an_iterable):
    yield from an_iterable

However, yield from also allows for delegation to subgenerators, which will be explained in the following section on cooperative delegation with sub-coroutines.

Coroutines:

yield forms an expression that allows data to be sent into the generator (see footnote 3)

Here is an example, take note of the received variable, which will point to the data that is sent to the generator:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

First, we must queue up the generator with the builtin function, next. It will call the appropriate next or __next__ method, depending on the version of Python you are using:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

And now we can send data into the generator. (Sending None is the same as calling next.) :

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

Cooperative Delegation to Sub-Coroutine with yield from

Now, recall that yield from is available in Python 3. This allows us to delegate coroutines to a subcoroutine:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

And now we can delegate functionality to a sub-generator and it can be used by a generator just as above:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

You can read more about the precise semantics of yield from in PEP 380.

Other Methods: close and throw

The close method raises GeneratorExit at the point the function execution was frozen. This will also be called by __del__ so you can put any cleanup code where you handle the GeneratorExit:

>>> my_account.close()

You can also throw an exception which can be handled in the generator or propagated back to the user:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError

Conclusion

I believe I have covered all aspects of the following question:

What does the yield keyword do in Python?

It turns out that yield does a lot. I’m sure I could add even more thorough examples to this. If you want more or have some constructive criticism, let me know by commenting below.


Appendix:

Critique of the Top/Accepted Answer**

  • It is confused on what makes an iterable, just using a list as an example. See my references above, but in summary: an iterable has an __iter__ method returning an iterator. An iterator provides a .next (Python 2 or .__next__ (Python 3) method, which is implicitly called by for loops until it raises StopIteration, and once it does, it will continue to do so.
  • It then uses a generator expression to describe what a generator is. Since a generator is simply a convenient way to create an iterator, it only confuses the matter, and we still have not yet gotten to the yield part.
  • In Controlling a generator exhaustion he calls the .next method, when instead he should use the builtin function, next. It would be an appropriate layer of indirection, because his code does not work in Python 3.
  • Itertools? This was not relevant to what yield does at all.
  • No discussion of the methods that yield provides along with the new functionality yield from in Python 3. The top/accepted answer is a very incomplete answer.

Critique of answer suggesting yield in a generator expression or comprehension.

The grammar currently allows any expression in a list comprehension.

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

Since yield is an expression, it has been touted by some as interesting to use it in comprehensions or generator expression – in spite of citing no particularly good use-case.

The CPython core developers are discussing deprecating its allowance. Here’s a relevant post from the mailing list:

On 30 January 2017 at 19:05, Brett Cannon wrote:

On Sun, 29 Jan 2017 at 16:39 Craig Rodrigues wrote:

I’m OK with either approach. Leaving things the way they are in Python 3 is no good, IMHO.

My vote is it be a SyntaxError since you’re not getting what you expect from the syntax.

I’d agree that’s a sensible place for us to end up, as any code relying on the current behaviour is really too clever to be maintainable.

In terms of getting there, we’ll likely want:

  • SyntaxWarning or DeprecationWarning in 3.7
  • Py3k warning in 2.7.x
  • SyntaxError in 3.8

Cheers, Nick.

— Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Further, there is an outstanding issue (10544) which seems to be pointing in the direction of this never being a good idea (PyPy, a Python implementation written in Python, is already raising syntax warnings.)

Bottom line, until the developers of CPython tell us otherwise: Don’t put yield in a generator expression or comprehension.

The return statement in a generator

In Python 2:

In a generator function, the return statement is not allowed to include an expression_list. In that context, a bare return indicates that the generator is done and will cause StopIteration to be raised.

An expression_list is basically any number of expressions separated by commas – essentially, in Python 2, you can stop the generator with return, but you can’t return a value.

In Python 3:

In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes the StopIteration.value attribute.

Footnotes

  1. The languages CLU, Sather, and Icon were referenced in the proposal to introduce the concept of generators to Python. The general idea is that a function can maintain internal state and yield intermediate data points on demand by the user. This promised to be superior in performance to other approaches, including Python threading, which isn’t even available on some systems.

  2. This means, for example, that xrange objects (range in Python 3) aren’t Iterators, even though they are iterable, because they can be reused. Like lists, their __iter__ methods return iterator objects.

  3. yield was originally introduced as a statement, meaning that it could only appear at the beginning of a line in a code block. Now yield creates a yield expression. https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt This change was proposed to allow a user to send data into the generator just as one might receive it. To send data, one must be able to assign it to something, and for that, a statement just won’t work.


回答 5

yield就像return-它返回您告诉的内容(作为生成器)。不同之处在于,下一次您调用生成器时,执行将从上一次对yield语句的调用开始。与return不同的是,在产生良率时不会清除堆栈帧,但是会将控制权转移回调用方,因此下次调用该函数时,其状态将恢复。

就您的代码而言,该函数get_child_candidates的作用就像一个迭代器,以便在扩展列表时,它一次将一个元素添加到新列表中。

list.extend调用迭代器,直到耗尽为止。在您发布的代码示例的情况下,只返回一个元组并将其添加到列表中会更加清楚。

yield is just like return – it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.

In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.

list.extend calls an iterator until it’s exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.


回答 6

还有另外一件事要提及:yield的函数实际上不必终止。我写了这样的代码:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

然后我可以在其他代码中使用它:

for f in fib():
    if some_condition: break
    coolfuncs(f);

它确实有助于简化某些问题,并使某些事情更易于使用。

There’s one extra thing to mention: a function that yields doesn’t actually have to terminate. I’ve written code like this:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

Then I can use it in other code like this:

for f in fib():
    if some_condition: break
    coolfuncs(f);

It really helps simplify some problems, and makes some things easier to work with.


回答 7

对于那些偏爱简单示例的人,请在此交互式Python会话中进行冥想:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

For those who prefer a minimal working example, meditate on this interactive Python session:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

回答 8

TL; DR

代替这个:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

做这个:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

每当您发现自己从头开始建立清单时,就yield逐一列出。

这是我第一次屈服。


yield是一种含蓄的说法

建立一系列的东西

相同的行为:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

不同的行为:

收益是单次通过:您只能迭代一次。当一个函数包含一个yield时,我们称其为Generator函数。还有一个迭代器就是它返回的内容。这些术语在揭示。我们失去了容器的便利性,但获得了按需计算且任意长的序列的功效。

Yield懒惰,推迟了计算。当您调用函数时,其中包含yield的函数实际上根本不会执行。它返回一个迭代器对象,该对象记住它从何处中断。每次您调用next()迭代器(这在for循环中发生)时,执行都会向前推进到下一个收益。return引发StopIteration并结束序列(这是for循环的自然结束)。

Yield多才多艺。数据不必全部存储在一起,可以一次存储一次。它可以是无限的。

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

如果您需要多次通过,而系列又不太长,只需调用list()它:

>>> list(square_yield(4))
[0, 1, 4, 9]

单词的出色选择,yield因为两种含义都适用:

Yield —生产或提供(如在农业中)

…提供系列中的下一个数据。

屈服 —让步或放弃(如在政治权力中一样)

…放弃CPU执行,直到迭代器前进。

TL;DR

Instead of this:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

do this:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

Whenever you find yourself building a list from scratch, yield each piece instead.

This was my first “aha” moment with yield.


yield is a sugary way to say

build a series of stuff

Same behavior:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

Different behavior:

Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that’s computed as needed, and arbitrarily long.

Yield is lazy, it puts off computation. A function with a yield in it doesn’t actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. return raises StopIteration and ends the series (this is the natural end of a for-loop).

Yield is versatile. Data doesn’t have to be stored all together, it can be made available one at a time. It can be infinite.

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

If you need multiple passes and the series isn’t too long, just call list() on it:

>>> list(square_yield(4))
[0, 1, 4, 9]

Brilliant choice of the word yield because both meanings apply:

yield — produce or provide (as in agriculture)

…provide the next data in the series.

yield — give way or relinquish (as in political power)

…relinquish CPU execution until the iterator advances.


回答 9

Yield可以为您提供生成器。

def get_odd_numbers(i):
    return range(1, i, 2)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
bar.next()
1
bar.next()
3
bar.next()
5

如您所见,在第一种情况下,foo将整个列表立即保存在内存中。对于包含5个元素的列表来说,这不是什么大问题,但是如果您想要500万个列表,该怎么办?这不仅是一个巨大的内存消耗者,而且在调用该函数时还花费大量时间来构建。

在第二种情况下,bar只需为您提供一个生成器。生成器是可迭代的-这意味着您可以在for循环等中使用它,但是每个值只能被访问一次。所有的值也不会同时存储在存储器中。生成器对象“记住”您上次调用它时在循环中的位置-这样,如果您使用的是一个迭代的(例如)计数为500亿,则不必计数为500亿立即存储500亿个数字以进行计算。

再次,这是一个非常人为的示例,如果您真的想计数到500亿,则可能会使用itertools。:)

这是生成器最简单的用例。如您所说,它可以用来编写有效的排列,使用yield可以将内容推入调用堆栈,而不是使用某种堆栈变量。生成器还可以用于特殊的树遍历以及所有其他方式。

Yield gives you a generator.

def get_odd_numbers(i):
    return range(1, i, 2)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
bar.next()
1
bar.next()
3
bar.next()
5

As you can see, in the first case foo holds the entire list in memory at once. It’s not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called.

In the second case, bar just gives you a generator. A generator is an iterable–which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object “remembers” where it was in the looping the last time you called it–this way, if you’re using an iterable to (say) count to 50 billion, you don’t have to count to 50 billion all at once and store the 50 billion numbers to count through.

Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)

This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.


回答 10

它正在返回生成器。我对Python并不是特别熟悉,但是如果您熟悉C#的迭代器块,我相信它与C#的迭代器块一样

关键思想是,编译器/解释器/无论做什么都做一些技巧,以便就调用者而言,他们可以继续调用next(),并且将继续返回值- 就像Generator方法已暂停一样。现在显然您不能真正地“暂停”方法,因此编译器构建了一个状态机,供您记住您当前所​​在的位置以及局部变量等的外观。这比自己编写迭代器要容易得多。

It’s returning a generator. I’m not particularly familiar with Python, but I believe it’s the same kind of thing as C#’s iterator blocks if you’re familiar with those.

The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values – as if the generator method was paused. Now obviously you can’t really “pause” a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.


回答 11

在描述如何使用生成器的许多很棒的答案中,我还没有给出一种答案。这是编程语言理论的答案:

yieldPython中的语句返回一个生成器。Python中的生成器是一个返回延续的函数(特别是协程类型,但是延续代表了一种更通用的机制来了解正在发生的事情)。

编程语言理论中的连续性是一种更为基础的计算,但是由于它们很难推理而且也很难实现,因此并不经常使用。但是,关于延续是什么的想法很简单:只是尚未完成的计算状态。在此状态下,将保存变量的当前值,尚未执行的操作等。然后,在稍后的某个时刻,可以在程序中调用继续,以便将程序的变量重置为该状态,并执行保存的操作。

以这种更一般的形式进行的延续可以两种方式实现。在call/cc方式,程序的堆栈字面上保存,然后调用延续时,堆栈恢复。

在延续传递样式(CPS)中,延续只是普通的函数(仅在函数是第一类的语言中),程序员明确地对其进行管理并传递给子例程。以这种方式,程序状态由闭包(以及恰好在其中编码的变量)表示,而不是驻留在堆栈中某个位置的变量。管理控制流的函数接受连续作为参数(在CPS的某些变体中,函数可以接受多个连续),并通过简单地调用它们并随后返回来调用它们来操纵控制流。延续传递样式的一个非常简单的示例如下:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

在这个(非常简单的)示例中,程序员保存了将文件实际写入连续的操作(该操作可能是非常复杂的操作,需要写出许多细节),然后传递该连续(例如,首先类闭包)给另一个进行更多处理的运算符,然后在必要时调用它。(我在实际的GUI编程中经常使用这种设计模式,这是因为它节省了我的代码行,或更重要的是,在GUI事件触发后管理了控制流。)

在不失一般性的前提下,本文的其余部分将连续性概念化为CPS,因为它很容易理解和阅读。


现在让我们谈谈Python中的生成器。生成器是延续的特定子类型。而延续能够在一般的保存状态计算(即程序调用堆栈),生成器只能保存迭代的状态经过一个迭代器。虽然,对于生成器的某些用例,此定义有些误导。例如:

def f():
  while True:
    yield 4

显然,这是一个合理的迭代器,其行为已得到很好的定义-每次生成器对其进行迭代时,它都会返回4(并永远这样做)。但是,在考虑迭代器(即for x in collection: do_something(x))时,可能并没有想到可迭代的原型类型。此示例说明了生成器的功能:如果有什么是迭代器,生成器可以保存其迭代状态。

重申一下:连续可以保存程序堆栈的状态,而生成器可以保存迭代的状态。这意味着延续比生成器强大得多,但是生成器也非常简单。它们对于语言设计者来说更容易实现,对程序员来说也更容易使用(如果您有时间要燃烧,请尝试阅读并理解有关延续和call / cc的本页)。

但是您可以轻松地将生成器实现(并概念化)为连续传递样式的一种简单的特定情况:

每当yield调用时,它告诉函数返回一个延续。再次调用该函数时,将从中断处开始。因此,在伪伪代码(即不是伪代码,而不是代码)中,生成器的next方法基本上如下:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

其中,yield关键字实际上是真正的生成器功能语法糖,基本上是这样的:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

请记住,这只是伪代码,Python中生成器的实际实现更为复杂。但是,作为练习以了解发生了什么,请尝试使用连续传递样式来实现生成器对象,而不使用yield关键字。

There is one type of answer that I don’t feel has been given yet, among the many great answers that describe how to use generators. Here is the programming language theory answer:

The yield statement in Python returns a generator. A generator in Python is a function that returns continuations (and specifically a type of coroutine, but continuations represent the more general mechanism to understand what is going on).

Continuations in programming languages theory are a much more fundamental kind of computation, but they are not often used, because they are extremely hard to reason about and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished. In this state, the current values of variables, the operations that have yet to be performed, and so on, are saved. Then at some point later in the program the continuation can be invoked, such that the program’s variables are reset to that state and the operations that were saved are carried out.

Continuations, in this more general form, can be implemented in two ways. In the call/cc way, the program’s stack is literally saved and then when the continuation is invoked, the stack is restored.

In continuation passing style (CPS), continuations are just normal functions (only in languages where functions are first class) which the programmer explicitly manages and passes around to subroutines. In this style, program state is represented by closures (and the variables that happen to be encoded in them) rather than variables that reside somewhere on the stack. Functions that manage control flow accept continuation as arguments (in some variations of CPS, functions may accept multiple continuations) and manipulate control flow by invoking them by simply calling them and returning afterwards. A very simple example of continuation passing style is as follows:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

In this (very simplistic) example, the programmer saves the operation of actually writing the file into a continuation (which can potentially be a very complex operation with many details to write out), and then passes that continuation (i.e, as a first-class closure) to another operator which does some more processing, and then calls it if necessary. (I use this design pattern a lot in actual GUI programming, either because it saves me lines of code or, more importantly, to manage control flow after GUI events trigger.)

The rest of this post will, without loss of generality, conceptualize continuations as CPS, because it is a hell of a lot easier to understand and read.


Now let’s talk about generators in Python. Generators are a specific subtype of continuation. Whereas continuations are able in general to save the state of a computation (i.e., the program’s call stack), generators are only able to save the state of iteration over an iterator. Although, this definition is slightly misleading for certain use cases of generators. For instance:

def f():
  while True:
    yield 4

This is clearly a reasonable iterable whose behavior is well defined — each time the generator iterates over it, it returns 4 (and does so forever). But it isn’t probably the prototypical type of iterable that comes to mind when thinking of iterators (i.e., for x in collection: do_something(x)). This example illustrates the power of generators: if anything is an iterator, a generator can save the state of its iteration.

To reiterate: Continuations can save the state of a program’s stack and generators can save the state of iteration. This means that continuations are more a lot powerful than generators, but also that generators are a lot, lot easier. They are easier for the language designer to implement, and they are easier for the programmer to use (if you have some time to burn, try to read and understand this page about continuations and call/cc).

But you could easily implement (and conceptualize) generators as a simple, specific case of continuation passing style:

Whenever yield is called, it tells the function to return a continuation. When the function is called again, it starts from wherever it left off. So, in pseudo-pseudocode (i.e., not pseudocode, but not code) the generator’s next method is basically as follows:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

where the yield keyword is actually syntactic sugar for the real generator function, basically something like:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

Remember that this is just pseudocode and the actual implementation of generators in Python is more complex. But as an exercise to understand what is going on, try to use continuation passing style to implement generator objects without use of the yield keyword.


回答 12

这是简单语言的示例。我将提供高级人类概念与低级Python概念之间的对应关系。

我想对数字序列进行运算,但是我不想为创建该序列而烦恼自己,我只想着重于自己想做的运算。因此,我执行以下操作:

  • 我打电话给你,告诉你我想要一个以特定方式产生的数字序列,让您知道算法是什么。
    此步骤对应于def生成器函数,即包含a的函数yield
  • 稍后,我告诉您,“好,准备告诉我数字的顺序”。
    此步骤对应于调用生成器函数,该函数返回生成器对象。请注意,您还没有告诉我任何数字。你只要拿起纸和铅笔。
  • 我问你,“告诉我下一个号码”,然后你告诉我第一个号码;之后,您等我问您下一个电话号码。记住您的位置,已经说过的电话号码以及下一个电话号码是您的工作。我不在乎细节。
    此步骤对应于调用.next()生成器对象。
  • …重复上一步,直到…
  • 最终,您可能会走到尽头。你不告诉我电话号码;您只是大声喊道:“抱马!我做完了!没有数字了!”
    此步骤对应于生成器对象结束其工作并引发StopIteration异常。生成器函数不需要引发异常。函数结束或发出时,它将自动引发return

这就是生成器的功能(包含的函数yield);它开始执行,在执行时暂停yield,并在要求输入.next()值时从上一个点继续执行。根据设计,它与Python的迭代器协议完美契合,该协议描述了如何顺序请求值。

迭代器协议最著名的用户是forPython中的命令。因此,无论何时执行以下操作:

for item in sequence:

不管sequence是列表,字符串,字典还是如上所述的生成器对象,都没有关系;结果是相同的:您从一个序列中逐个读取项目。

注意,def包含一个yield关键字的函数并不是创建生成器的唯一方法;这是创建一个的最简单的方法。

有关更准确的信息,请阅读Python文档中有关迭代器类型yield语句生成器的信息。

Here is an example in plain language. I will provide a correspondence between high-level human concepts to low-level Python concepts.

I want to operate on a sequence of numbers, but I don’t want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:

  • I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.
    This step corresponds to defining the generator function, i.e. the function containing a yield.
  • Sometime later, I tell you, “OK, get ready to tell me the sequence of numbers”.
    This step corresponds to calling the generator function which returns a generator object. Note that you don’t tell me any numbers yet; you just grab your paper and pencil.
  • I ask you, “tell me the next number”, and you tell me the first number; after that, you wait for me to ask you for the next number. It’s your job to remember where you were, what numbers you have already said, and what is the next number. I don’t care about the details.
    This step corresponds to calling .next() on the generator object.
  • … repeat previous step, until…
  • eventually, you might come to an end. You don’t tell me a number; you just shout, “hold your horses! I’m done! No more numbers!”
    This step corresponds to the generator object ending its job, and raising a StopIteration exception The generator function does not need to raise the exception. It’s raised automatically when the function ends or issues a return.

This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of Python, which describes how to sequentially request values.

The most famous user of the iterator protocol is the for command in Python. So, whenever you do a:

for item in sequence:

it doesn’t matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.

Note that defining a function which contains a yield keyword is not the only way to create a generator; it’s just the easiest way to create one.

For more accurate information, read about iterator types, the yield statement and generators in the Python documentation.


回答 13

尽管有许多答案说明了为什么要使用a yield来生成生成器,但是的使用更多了yield。创建协程非常容易,这使信息可以在两个代码块之间传递。我不会重复任何有关使用yield生成器的优秀示例。

为了帮助理解yield以下代码中的功能,您可以用手指在带有的任何代码中跟踪循环yield。每次手指触摸时yield,您都必须等待输入a next或a send。当next被调用时,您通过跟踪代码,直到你打yield…上的右边的代码yield进行评估,并返回给调用者…那你就等着。当next再次被调用时,您将在代码中执行另一个循环。但是,您会注意到,在协程中,yield也可以与send… 一起使用,它将从调用方将值发送 yielding函数。如果send给出a,则yield接收到发送的值,然后将其吐到左侧…然后遍历代码,直到您yield再次单击为止(返回值,就像next被调用一样)。

例如:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()

While a lot of answers show why you’d use a yield to create a generator, there are more uses for yield. It’s quite easy to make a coroutine, which enables the passing of information between two blocks of code. I won’t repeat any of the fine examples that have already been given about using yield to create a generator.

To help understand what a yield does in the following code, you can use your finger to trace the cycle through any code that has a yield. Every time your finger hits the yield, you have to wait for a next or a send to be entered. When a next is called, you trace through the code until you hit the yield… the code on the right of the yield is evaluated and returned to the caller… then you wait. When next is called again, you perform another loop through the code. However, you’ll note that in a coroutine, yield can also be used with a send… which will send a value from the caller into the yielding function. If a send is given, then yield receives the value sent, and spits it out the left hand side… then the trace through the code progresses until you hit the yield again (returning the value at the end, as if next was called).

For example:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()

回答 14

还有另一个yield用途和含义(自Python 3.3起):

yield from <expr>

PEP 380-委托给子生成器的语法

提出了一种语法,供生成器将其部分操作委托给另一生成器。这允许包含“ yield”的一段代码被分解出来并放置在另一个生成器中。此外,允许子生成器返回一个值,并且该值可用于委派生成器。

当一个生成器重新产生由另一个生成器生成的值时,新语法还为优化提供了一些机会。

此外,将引入(自Python 3.5起):

async def new_coroutine(data):
   ...
   await blocking_action()

为了避免将协程与常规生成器混淆(今天yield在两者中都使用)。

There is another yield use and meaning (since Python 3.3):

yield from <expr>

From PEP 380 — Syntax for Delegating to a Subgenerator:

A syntax is proposed for a generator to delegate part of its operations to another generator. This allows a section of code containing ‘yield’ to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.

The new syntax also opens up some opportunities for optimisation when one generator re-yields values produced by another.

Moreover this will introduce (since Python 3.5):

async def new_coroutine(data):
   ...
   await blocking_action()

to avoid coroutines being confused with a regular generator (today yield is used in both).


回答 15

所有好的答案,但是对于新手来说有点困难。

我认为您已经了解了该return声明。

作为一个比喻,returnyield是一对双胞胎。return表示“返回并停止”,而“收益”则表示“返回但继续”

  1. 尝试使用获取num_list return
def num_list(n):
    for i in range(n):
        return i

运行:

In [5]: num_list(3)
Out[5]: 0

看,您只会得到一个数字,而不是列表。return永远不要让你高高兴兴,只实现一次就退出。

  1. 来了 yield

替换returnyield

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

现在,您将赢得所有数字。

与计划return一次运行和停止yield运行的时间进行比较。你可以理解returnreturn one of them,和yield作为return all of them。这称为iterable

  1. 我们可以yield使用以下步骤重写语句return
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

这是关于 yield

列表return输出和对象之间的区别yield输出是:

您将始终从列表对象获取[0,1,2],但只能从“对象yield输出”中检索一次。因此,它具有一个新的名称generator对象,如Out[11]: <generator object num_list at 0x10327c990>

总之,作为一个隐喻,它可以:

  • return并且yield是双胞胎
  • list并且generator是双胞胎

All great answers, however a bit difficult for newbies.

I assume you have learned the return statement.

As an analogy, return and yield are twins. return means ‘return and stop’ whereas ‘yield` means ‘return, but continue’

  1. Try to get a num_list with return.
def num_list(n):
    for i in range(n):
        return i

Run it:

In [5]: num_list(3)
Out[5]: 0

See, you get only a single number rather than a list of them. return never allows you prevail happily, just implements once and quit.

  1. There comes yield

Replace return with yield:

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

Now, you win to get all the numbers.

Comparing to return which runs once and stops, yield runs times you planed. You can interpret return as return one of them, and yield as return all of them. This is called iterable.

  1. One more step we can rewrite yield statement with return
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

It’s the core about yield.

The difference between a list return outputs and the object yield output is:

You will always get [0, 1, 2] from a list object but only could retrieve them from ‘the object yield output’ once. So, it has a new name generator object as displayed in Out[11]: <generator object num_list at 0x10327c990>.

In conclusion, as a metaphor to grok it:

  • return and yield are twins
  • list and generator are twins

回答 16

以下是一些Python示例,这些示例说明如何实际实现生成器,就像Python没有为其提供语法糖一样:

作为Python生成器:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

使用词法闭包而不是生成器

def ftake(fnext, last):
    return [fnext() for _ in xrange(last)]

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

使用对象闭包而不是生成器(因为ClosuresAndObjectsAreEquivalent

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)

Here are some Python examples of how to actually implement generators as if Python did not provide syntactic sugar for them:

As a Python generator:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

Using lexical closures instead of generators

def ftake(fnext, last):
    return [fnext() for _ in xrange(last)]

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent)

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)

回答 17

我打算发布“阅读Beazley的“ Python:基本参考”的第19页,以快速了解生成器”,但是已经有许多其他人发布了不错的描述。

另外,请注意,它们yield可以在协程中用作生成函数的双重功能。尽管它与您的代码段用法不同,(yield)但是可以用作函数中的表达式。当调用者使用该send()方法向该方法发送值时,协程将执行直到(yield)遇到下一条语句。

生成器和协程是设置数据流类型应用程序的一种很酷的方法。我认为有必要了解该yield语句在函数中的其他用法。

I was going to post “read page 19 of Beazley’s ‘Python: Essential Reference’ for a quick description of generators”, but so many others have posted good descriptions already.

Also, note that yield can be used in coroutines as the dual of their use in generator functions. Although it isn’t the same use as your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the method using the send() method, then the coroutine will execute until the next (yield) statement is encountered.

Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.


回答 18

从编程的角度来看,迭代器被实现为thunk

为了将迭代器,生成器和线程池实现为并发执行等,作为重击(也称为匿名函数),人们使用发送到具有分派器的闭包对象的消息,然后分派器对“消息”做出响应。

http://en.wikipedia.org/wiki/Message_passing

next ”是发送给闭包的消息,由“ iter ”创建 ”调用。

有很多方法可以实现此计算。我使用了变异,但是通过返回当前值和下一个生成器,很容易做到无变异。

这是一个使用R6RS结构的演示,但是其语义与Python完全相同。它是相同的计算模型,只需要更改语法就可以用Python重写它。

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->

From a programming viewpoint, the iterators are implemented as thunks.

To implement iterators, generators, and thread pools for concurrent execution, etc. as thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to “messages”.

http://en.wikipedia.org/wiki/Message_passing

next” is a message sent to a closure, created by the “iter” call.

There are lots of ways to implement this computation. I used mutation, but it is easy to do it without mutation, by returning the current value and the next yielder.

Here is a demonstration which uses the structure of R6RS, but the semantics is absolutely identical to Python’s. It’s the same model of computation, and only a change in syntax is required to rewrite it in Python.

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->

回答 19

这是一个简单的示例:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "wiriting result {}".format(n)

输出:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
wiriting result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
wiriting result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
wiriting result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

我不是Python开发人员,但在我看来 yield保持着程序流程的位置,并且下一个循环从“ yield”位置开始。似乎它正在那个位置等待,就在那之前,在外面返回一个值,下一次继续工作。

这似乎是一种有趣而又不错的能力:D

Here is a simple example:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "wiriting result {}".format(n)

Output:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
wiriting result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
wiriting result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
wiriting result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

I am not a Python developer, but it looks to me yield holds the position of program flow and the next loop start from “yield” position. It seems like it is waiting at that position, and just before that, returning a value outside, and next time continues to work.

It seems to be an interesting and nice ability :D


回答 20

这是做什么事情的心理yield印象。

我喜欢将线程视为具有堆栈(即使未以这种方式实现)。

调用普通函数时,它将其局部变量放在堆栈上,进行一些计算,然后清除堆栈并返回。再也看不到其局部变量的值。

对于一个yield函数,当其代码开始运行时(即,在调用该函数之后,返回生成器对象,next()然后调用该方法的生成器对象),它类似地将其局部变量放入堆栈中并进行一段时间的计算。但是,当它命中该yield语句时,在清除堆栈的一部分并返回之前,它会对其局部变量进行快照,并将其存储在生成器对象中。它还在代码中写下了当前位置(即特定的yield语句)。

因此,这是生成器挂起的一种冻结函数。

next()随后被调用时,它检索功能的物品入堆栈,重新蓬勃生机。该函数从中断处继续进行计算,而忽略了它刚刚在冷库中度过了一个永恒的事实。

比较以下示例:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

当我们调用第二个函数时,它的行为与第一个函数非常不同。该yield语句可能无法到达,但是如果它存在于任何地方,它将改变我们正在处理的内容的性质。

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

调用yielderFunction()不会运行其代码,而是使代码生成器。(yielder为便于阅读,以这样的名称命名可能是个好主意。)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

gi_codegi_frame字段是冻结状态的存储位置。用探索它们dir(..),我们可以确认我们上面的心理模型是可信的。

Here is a mental image of what yield does.

I like to think of a thread as having a stack (even when it’s not implemented that way).

When a normal function is called, it puts its local variables on the stack, does some computation, then clears the stack and returns. The values of its local variables are never seen again.

With a yield function, when its code begins to run (i.e. after the function is called, returning a generator object, whose next() method is then invoked), it similarly puts its local variables onto the stack and computes for a while. But then, when it hits the yield statement, before clearing its part of the stack and returning, it takes a snapshot of its local variables and stores them in the generator object. It also writes down the place where it’s currently up to in its code (i.e. the particular yield statement).

So it’s a kind of a frozen function that the generator is hanging onto.

When next() is called subsequently, it retrieves the function’s belongings onto the stack and re-animates it. The function continues to compute from where it left off, oblivious to the fact that it had just spent an eternity in cold storage.

Compare the following examples:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

When we call the second function, it behaves very differently to the first. The yield statement might be unreachable, but if it’s present anywhere, it changes the nature of what we’re dealing with.

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

Calling yielderFunction() doesn’t run its code, but makes a generator out of the code. (Maybe it’s a good idea to name such things with the yielder prefix for readability.)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

The gi_code and gi_frame fields are where the frozen state is stored. Exploring them with dir(..), we can confirm that our mental model above is credible.


回答 21

就像每个答案所建议的那样,yield用于创建序列生成器。它用于动态生成一些序列。例如,在网络上逐行读取文件时,可以使用以下yield功能:

def getNextLines():
   while con.isOpen():
       yield con.read()

您可以在代码中使用它,如下所示:

for line in getNextLines():
    doSomeThing(line)

执行控制转移陷阱

执行foryield时,执行控制将从getNextLines()转移到循环中。因此,每次调用getNextLines()时,都会从上次暂停的位置开始执行。

因此,简而言之,具有以下代码的函数

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

将打印

"first time"
"second time"
"third time"
"Now some useful value 12"

Like every answer suggests, yield is used for creating a sequence generator. It’s used for generating some sequence dynamically. For example, while reading a file line by line on a network, you can use the yield function as follows:

def getNextLines():
   while con.isOpen():
       yield con.read()

You can use it in your code as follows:

for line in getNextLines():
    doSomeThing(line)

Execution Control Transfer gotcha

The execution control will be transferred from getNextLines() to the for loop when yield is executed. Thus, every time getNextLines() is invoked, execution begins from the point where it was paused last time.

Thus in short, a function with the following code

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

will print

"first time"
"second time"
"third time"
"Now some useful value 12"

回答 22

一个简单的例子来了解它是什么: yield

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

输出为:

1 2 1 2 1 2 1 2

An easy example to understand what it is: yield

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

The output is:

1 2 1 2 1 2 1 2

回答 23

(我下面的回答仅从使用Python生成器的角度讲,而不是生成器机制基础实现,它涉及堆栈和堆操作的一些技巧。)

在python函数中yield使用when 代替a return时,该函数变成了一个特殊的名称generator function。该函数将返回一个generator类型的对象。yield关键字是一个标志,通知Python编译器将特殊对待这样的功能。普通函数将在返回一些值后终止。但是在编译器的帮助下,可以将 generator函数视为可恢复的。也就是说,将恢复执行上下文,并且将从上次运行继续执行。在您显式调用return之前,它将引发StopIteration异常(这也是迭代器协议的一部分),或到达函数的结尾。我发现了很多关于引用的generator,但是这一个从中functional programming perspective最容易消化。

(现在,我想根据我自己的理解来讨论其背后的原理generatoriterator基础。我希望这可以帮助您掌握迭代器和生成器的基本动机。这种概念也出现在其他语言中,例如C#。)

据我了解,当我们要处理一堆数据时,通常先将数据存储在某个地方,然后再逐一处理。但是这种幼稚的方法是有问题的。如果数据量巨大,则预先存储它们是很昂贵的。因此data,为什么不直接存储自身,为什么不metadata间接存储某种形式,即the logic how the data is computed

有两种包装此类元数据的方法。

  1. 面向对象的方法,我们包装了元数据as a class。这就是所谓的iterator实现迭代器协议的人(即__next__()__iter__()方法)。这也是常见的迭代器设计模式
  2. 在功能方法上,我们包装了元数据as a function。这就是所谓的generator function。但是在后台,返回的generator object静态IS-A迭代器仍然存在,因为它也实现了迭代器协议。

无论哪种方式,都会创建一个迭代器,即某个可以为您提供所需数据的对象。OO方法可能有点复杂。无论如何,要使用哪一个取决于您。

(My below answer only speaks from the perspective of using Python generator, not the underlying implementation of generator mechanism, which involves some tricks of stack and heap manipulation.)

When yield is used instead of a return in a python function, that function is turned into something special called generator function. That function will return an object of generator type. The yield keyword is a flag to notify the python compiler to treat such function specially. Normal functions will terminate once some value is returned from it. But with the help of the compiler, the generator function can be thought of as resumable. That is, the execution context will be restored and the execution will continue from last run. Until you explicitly call return, which will raise a StopIteration exception (which is also part of the iterator protocol), or reach the end of the function. I found a lot of references about generator but this one from the functional programming perspective is the most digestable.

(Now I want to talk about the rationale behind generator, and the iterator based on my own understanding. I hope this can help you grasp the essential motivation of iterator and generator. Such concept shows up in other languages as well such as C#.)

As I understand, when we want to process a bunch of data, we usually first store the data somewhere and then process it one by one. But this naive approach is problematic. If the data volume is huge, it’s expensive to store them as a whole beforehand. So instead of storing the data itself directly, why not store some kind of metadata indirectly, i.e. the logic how the data is computed.

There are 2 approaches to wrap such metadata.

  1. The OO approach, we wrap the metadata as a class. This is the so-called iterator who implements the iterator protocol (i.e. the __next__(), and __iter__() methods). This is also the commonly seen iterator design pattern.
  2. The functional approach, we wrap the metadata as a function. This is the so-called generator function. But under the hood, the returned generator object still IS-A iterator because it also implements the iterator protocol.

Either way, an iterator is created, i.e. some object that can give you the data you want. The OO approach may be a bit complex. Anyway, which one to use is up to you.


回答 24

总之,该yield语句将您的函数转换为一个工厂,该工厂产生一个称为a的特殊对象,该对象generator环绕原始函数的主体。当generator被重复,直到它到达下一个执行的功能yield后停止执行,计算结果为传递给值yield。它将在每次迭代中重复此过程,直到执行路径退出函数为止。例如,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

简单地输出

one
two
three

动力来自将生成器与计算序列的循环配合使用,生成器每次执行循环都会停止,以“产生”下一个计算结果,这样就可以即时计算列表,而好处是可以存储保存用于特别大的计算

假设您想创建自己的range函数来产生可迭代的数字范围,则可以这样做,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

像这样使用

for i in myRangeNaive(10):
    print i

但这是低效的,因为

  • 您创建只使用一次的数组(这会浪费内存)
  • 这段代码实际上在该数组上循环了两次!:(

幸运的是,Guido和他的团队足够慷慨地开发生成器,因此我们可以做到这一点。

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

现在,每次迭代时,生成器上的一个称为next()函数的函数都会执行该函数,直到达到“ yield”语句为止,该语句在该语句中停止并“屈服”值或到达函数的末尾。在这种情况下,在第一次调用时,next()执行到yield语句并产生yield’n’,在下一次调用时,它将执行递增语句,跳回到’while’,对其求值,如果为true,它将停止并再次产生yield’n’,它将继续以这种方式,直到while条件返回false且生成器跳到函数的末尾。

In summary, the yield statement transforms your function into a factory that produces a special object called a generator which wraps around the body of your original function. When the generator is iterated, it executes your function until it reaches the next yield then suspends execution and evaluates to the value passed to yield. It repeats this process on each iteration until the path of execution exits the function. For instance,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

simply outputs

one
two
three

The power comes from using the generator with a loop that calculates a sequence, the generator executes the loop stopping each time to ‘yield’ the next result of the calculation, in this way it calculates a list on the fly, the benefit being the memory saved for especially large calculations

Say you wanted to create a your own range function that produces an iterable range of numbers, you could do it like so,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

and use it like this;

for i in myRangeNaive(10):
    print i

But this is inefficient because

  • You create an array that you only use once (this wastes memory)
  • This code actually loops over that array twice! :(

Luckily Guido and his team were generous enough to develop generators so we could just do this;

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

Now upon each iteration a function on the generator called next() executes the function until it either reaches a ‘yield’ statement in which it stops and ‘yields’ the value or reaches the end of the function. In this case on the first call, next() executes up to the yield statement and yield ‘n’, on the next call it will execute the increment statement, jump back to the ‘while’, evaluate it, and if true, it will stop and yield ‘n’ again, it will continue that way until the while condition returns false and the generator jumps to the end of the function.


回答 25

Yield是一个对象

return函数中的A 将返回单个值。

如果您希望函数返回大量值,请使用yield

更重要的yield是,是一个障碍

就像CUDA语言中的barrier一样,它在完成之前不会转移控制权。

也就是说,它将从头开始运行函数中的代码,直到命中为止yield。然后,它将返回循环的第一个值。

然后,其他所有调用将再次运行您在函数中编写的循环,返回下一个值,直到没有任何值可返回为止。

Yield is an object

A return in a function will return a single value.

If you want a function to return a huge set of values, use yield.

More importantly, yield is a barrier.

like barrier in the CUDA language, it will not transfer control until it gets completed.

That is, it will run the code in your function from the beginning until it hits yield. Then, it’ll return the first value of the loop.

Then, every other call will run the loop you have written in the function one more time, returning the next value until there isn’t any value to return.


回答 26

许多人使用return而不是yield,但是在某些情况下yield可以更高效,更轻松地工作。

这是yield绝对适合的示例:

返回(函数中)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

Yield(以功能计)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

通话功能

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

这两个函数执行相同的操作,但是yield使用三行而不是五行,并且少担心一个变量。

这是代码的结果:

输出量

如您所见,两个函数都做同样的事情。唯一的区别是return_dates()提供列表和yield_dates()生成器。

现实生活中的例子可能是像逐行读取文件,或者只是想生成一个生成器。

Many people use return rather than yield, but in some cases yield can be more efficient and easier to work with.

Here is an example which yield is definitely best for:

return (in function)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

yield (in function)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

Calling functions

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

Both functions do the same thing, but yield uses three lines instead of five and has one less variable to worry about.

This is the result from the code:

Output

As you can see both functions do the same thing. The only difference is return_dates() gives a list and yield_dates() gives a generator.

A real life example would be something like reading a file line by line or if you just want to make a generator.


回答 27

yield就像函数的返回元素一样。不同之处在于,yield元素将功能转换为生成器。生成器的行为就像一个函数,直到“屈服”为止。生成器停止运行,直到下一次调用为止,并从与启动完全相同的点继续运行。您可以通过调用来获得所有“屈服”值的序列list(generator())

yield is like a return element for a function. The difference is, that the yield element turns a function into a generator. A generator behaves just like a function until something is ‘yielded’. The generator stops until it is next called, and continues from exactly the same point as it started. You can get a sequence of all the ‘yielded’ values in one, by calling list(generator()).


回答 28

yield关键字简单地收集返回结果。想想yieldreturn +=

The yield keyword simply collects returning results. Think of yield like return +=


回答 29

这是一种yield基于简单的方法来计算斐波那契数列,解释如下:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

当您将其输入到REPL中并尝试调用它时,您将得到一个神秘的结果:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

这是因为存在yield向您发送信号的Python,您想要创建一个生成器,即一个按需生成值的对象。

那么,如何生成这些值?这可以通过使用内置函数直接完成,也可以next通过将其提供给使用值的构造间接完成。

使用内置next()函数,您可以直接调用.next/ __next__,强制生成器生成一个值:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

间接地,如果您提供fibfor循环,list初始化程序,tuple初始化程序或其他任何期望对象生成/产生值的对象,则将“消耗”生成器,直到无法再生成任何值(并且返回) :

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

同样,使用tuple初始化程序:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

生成器在延迟方面与功能有所不同。它通过保持其本地状态并允许您在需要时恢复来实现此目的。

首次调用fib时:

f = fib()

Python编译函数,遇到yield关键字,然后简单地将生成器对象返回给您。看起来不是很有帮助。

然后,当您请求它直接或间接生成第一个值时,它将执行找到的所有语句,直到遇到a为止yield,然后返回您提供给它的值yield并暂停。为了更好地说明这一点,让我们使用一些print调用(print "text"在Python 2上用if 代替):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

现在,输入REPL:

>>> gen = yielder("Hello, yield!")

您现在有了一个生成器对象,等待一个命令来生成一个值。使用next并查看打印出的内容:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

未报价的结果是所打印的内容。引用的结果是从返回的结果yieldnext现在再次调用:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

生成器会记住它在此处暂停yield value并从那里继续。打印下一条消息yield,并再次执行搜索以使其暂停的语句(由于while循环)。

Here’s a simple yield based approach, to compute the fibonacci series, explained:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

When you enter this into your REPL and then try and call it, you’ll get a mystifying result:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

This is because the presence of yield signaled to Python that you want to create a generator, that is, an object that generates values on demand.

So, how do you generate these values? This can either be done directly by using the built-in function next, or, indirectly by feeding it to a construct that consumes values.

Using the built-in next() function, you directly invoke .next/__next__, forcing the generator to produce a value:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

Indirectly, if you provide fib to a for loop, a list initializer, a tuple initializer, or anything else that expects an object that generates/produces values, you’ll “consume” the generator until no more values can be produced by it (and it returns):

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

Similarly, with a tuple initializer:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

A generator differs from a function in the sense that it is lazy. It accomplishes this by maintaining it’s local state and allowing you to resume whenever you need to.

When you first invoke fib by calling it:

f = fib()

Python compiles the function, encounters the yield keyword and simply returns a generator object back at you. Not very helpful it seems.

When you then request it generates the first value, directly or indirectly, it executes all statements that it finds, until it encounters a yield, it then yields back the value you supplied to yield and pauses. For an example that better demonstrates this, let’s use some print calls (replace with print "text" if on Python 2):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

Now, enter in the REPL:

>>> gen = yielder("Hello, yield!")

you have a generator object now waiting for a command for it to generate a value. Use next and see what get’s printed:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The unquoted results are what’s printed. The quoted result is what is returned from yield. Call next again now:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The generator remembers it was paused at yield value and resumes from there. The next message is printed and the search for the yield statement to pause at it performed again (due to the while loop).