What can you use Python generator functions for?

Question: What can you use Python generator functions for?

I’m starting to learn Python and I’ve come across generator functions, those that have a yield statement in them. I want to know what types of problems these functions are really good at solving.


Answer 0

Generators give you lazy evaluation. You use them by iterating over them, either explicitly with ‘for’ or implicitly by passing them to any function or construct that iterates. You can think of generators as returning multiple items, as if they returned a list, but instead of returning them all at once they return them one by one, and the generator function is paused until the next item is requested.

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don’t know if you are going to need all results, or where you don’t want to allocate the memory for all results at the same time. They also suit situations where the generator uses another generator, or consumes some other resource, and it’s more convenient if that happens as late as possible.
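A minimal sketch of that lazy behaviour, assuming Python 3. The names squares and computed are illustrative only and are not part of the original answer; the side-effect list exists purely so the laziness can be observed.

```python
from itertools import islice

computed = []  # records which inputs were actually processed


def squares(numbers):
    """Yield the square of each number, one at a time."""
    for n in numbers:
        computed.append(n)  # side effect so we can observe laziness
        yield n * n


# Nothing is computed when the generator object is created.
gen = squares(range(1_000_000))

# Only the first three items are requested, so only three are computed.
first_three = list(islice(gen, 3))
print(first_three)  # [0, 1, 4]
print(computed)     # [0, 1, 2]
```

Even though the input range has a million entries, the work done is proportional to what the consumer actually asks for.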

Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you’d use a callback function for this. You pass this callback to the work-function and it would periodically call this callback. The generator approach is that the work-function (now a generator) knows nothing about the callback, and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing that to the work-function, does all the reporting work in a little ‘for’ loop around the generator.

For example, say you wrote a ‘filesystem search’ program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.
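The two styles can be sketched side by side. This is only an illustration under simplifying assumptions: the "filesystem" is a plain list of names, and search_with_callback and search are hypothetical helpers, not functions from any library.

```python
def search_with_callback(names, keyword, on_match):
    """Callback style: the worker calls on_match() for every hit."""
    for name in names:
        if keyword in name:
            on_match(name)


def search(names, keyword):
    """Generator style: the worker just yields each hit."""
    for name in names:
        if keyword in name:
            yield name


files = ["notes.txt", "report.pdf", "notes_old.txt", "image.png"]

# Callback style: the reporting logic has to be passed in as a function.
found_via_callback = []
search_with_callback(files, "notes", found_via_callback.append)

# Generator style: the reporting logic sits in the body of the loop.
found_via_generator = []
for match in search(files, "notes"):
    found_via_generator.append(match)

print(found_via_callback == found_via_generator)  # True
```

In the generator version the worker knows nothing about how results are reported, which is the point made above.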

If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with a callback, which exists only in Python 2) and os.walk() (the new filesystem-walking generator). Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:

big_list = list(the_generator)

Answer 1

One of the reasons to use generators is to make certain kinds of solutions clearer.

The other is to treat results one at a time, avoiding building huge lists of results that you would process separately anyway.

If you have a fibonacci-up-to-n function like this:

# function version
def fibon(n):
    a = b = 1
    result = []
    for i in range(n):
        result.append(a)
        a, b = b, a + b
    return result

You can more easily write the function like this:

# generator version
def fibon(n):
    a = b = 1
    for i in range(n):
        yield a
        a, b = b, a + b

The function is clearer. And if you use the function like this:

for x in fibon(1000000):
    print(x, end=" ")

then with the generator version the whole 1000000-item list is never created; values are produced one at a time. That would not be the case with the list version, where the full list would be created first.
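One rough way to see the difference is to compare the size of the two return values. This is a sketch assuming Python 3; the two functions are renamed fibon_list and fibon_gen here (my naming) so they can coexist, and sys.getsizeof only reports the size of the container object itself, not of the integers inside it.

```python
import sys


def fibon_list(n):
    """List version: builds every value before returning."""
    a = b = 1
    result = []
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result


def fibon_gen(n):
    """Generator version: produces values on demand."""
    a = b = 1
    for _ in range(n):
        yield a
        a, b = b, a + b


as_list = fibon_list(1000)
as_gen = fibon_gen(1000)

# The list holds 1000 references; the generator holds only its paused state.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
```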


Answer 2

See the “Motivation” section in PEP 255.

A non-obvious use of generators is creating interruptible functions, which lets you do things like update UI or run several jobs “simultaneously” (interleaved, actually) while not using threads.
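A minimal sketch of such interleaving, assuming Python 3. The job and run_interleaved names are hypothetical, and the round-robin scheduler is just one simple way to do it, not the approach PEP 255 prescribes.

```python
from collections import deque


def job(name, steps, log):
    """A job that does one unit of work, then yields control back."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # pause here so another job can run


def run_interleaved(jobs):
    """Round-robin scheduler: advance each job by one step in turn."""
    queue = deque(jobs)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # job finished; drop it from the queue
        queue.append(current)


log = []
run_interleaved([job("A", 2, log), job("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Each yield is a point where the function can be interrupted, so the two jobs make progress in turns on a single thread.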


Answer 3

I found this explanation, which cleared up my doubts, because someone who doesn’t know generators may not know about yield either.

Return

The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.

Yield

But what if the local variables aren’t thrown away when we exit a function? This implies that we can resume the function where we left off. This is where the concept of generators is introduced: after a yield statement, the function resumes where it left off.

  def generate_integers(N):
      for i in range(N):
          yield i

    In [1]: gen = generate_integers(3)
    In [2]: gen
    <generator object at 0x8117f90>
    In [3]: next(gen)
    0
    In [4]: next(gen)
    1
    In [5]: next(gen)
    2

So that’s the difference between return and yield statements in Python.

The yield statement is what makes a function a generator function.

So generators are a simple and powerful tool for creating iterators. They are written like regular functions, but they use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
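A small sketch of that resume behaviour, assuming Python 3. The countdown function is my own illustration, not part of the quoted explanation.

```python
def countdown(start):
    """Yield start, start-1, ..., 1; local state survives between calls."""
    current = start
    while current > 0:
        yield current
        current -= 1  # runs only when the next value is requested


gen = countdown(2)
print(next(gen))  # 2
print(next(gen))  # 1

# Once the function body finishes, next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

The variable current is never reset between calls to next(), which is exactly what a plain return could not offer.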


Answer 4

Real World Example

Let’s say you have 100 million domains in your MySQL table, and you would like to update Alexa rank for each domain.

First thing you need is to select your domain names from the database.

Let’s say your table name is domains and column name is domain.

If you use SELECT domain FROM domains, it’s going to return 100 million rows, which is going to consume a lot of memory. So your server might crash.

So you decide to run the program in batches. Let’s say our batch size is 1000.

In our first batch we will query the first 1000 rows, check Alexa rank for each domain and update the database row.

In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.

Now we need a generator function which generates our batches.

Here is our generator function:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result

As you can see, our function keeps yielding the results. If you used the keyword return instead of yield, then the whole function would be ended once it reached return.

return - returns only once
yield - returns multiple times

If a function uses the keyword yield then it’s a generator.

Now you can iterate like this:

import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)  # placeholder for the per-row work
db.close()
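To try the pattern without a MySQL server, here is a self-contained sketch that uses the standard-library sqlite3 module as a stand-in. It assumes Python 3; the in-memory table and its sample rows are made up for illustration, and result_generator is the same idea as ResultGenerator above under a different name. Whether a given driver buffers the full result set internally depends on the driver and cursor type, which this sketch does not address.

```python
import sqlite3


def result_generator(cursor, batchsize=1000):
    """Fetch rows in batches, but hand them to the caller one at a time."""
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        yield from results


# In-memory stand-in for the real table (hypothetical sample data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE domains (domain TEXT)")
db.executemany(
    "INSERT INTO domains VALUES (?)",
    [(f"example{i}.com",) for i in range(5)],
)

cursor = db.cursor()
cursor.execute("SELECT domain FROM domains ORDER BY rowid")
domains = [row[0] for row in result_generator(cursor, batchsize=2)]
db.close()

print(domains)
```

The caller sees a flat stream of rows and never has to know that they arrived two at a time.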

Answer 5

Buffering. When it is efficient to fetch data in large chunks, but process it in small chunks, then a generator might help:

def bufferedFetch():
    while True:
        buffer = getBigChunkOfData()
        if not buffer:  # assumption: an empty chunk signals 'end of data'
            break
        for i in buffer:
            yield i

The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.
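A runnable variant of the same idea, assuming Python 3 and using an in-memory io.BytesIO stream in place of the unspecified data source. The buffered_bytes name and the tiny chunk size are illustrative choices only.

```python
import io


def buffered_bytes(stream, chunk_size=4):
    """Read the stream in chunks, but yield one byte value at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means end of data
            break
        yield from chunk


stream = io.BytesIO(b"generator")
values = list(buffered_bytes(stream))
print(bytes(values))  # b'generator'
```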


Answer 6

I have found that generators are very helpful in cleaning up your code and in giving you a unique way to encapsulate and modularize code. In a situation where you need something to constantly spit out values based on its own internal processing, and where that something needs to be called from anywhere in your code (and not just within a loop or a block, for example), generators are the feature to use.

An abstract example would be a Fibonacci number generator that does not live within a loop and when it is called from anywhere will always return the next number in the sequence:

def fib():
    first = 0
    second = 1
    yield first
    yield second

    while True:
        # named nxt so the built-in next() is not shadowed
        nxt = first + second
        yield nxt
        first = second
        second = nxt

fibgen1 = fib()
fibgen2 = fib()

Now you have two Fibonacci number generator objects which you can call from anywhere in your code and they will always return ever larger Fibonacci numbers in sequence as follows:

>>> next(fibgen1); next(fibgen1); next(fibgen1); next(fibgen1)
0
1
1
2
>>> next(fibgen2); next(fibgen2)
0
1
>>> next(fibgen1); next(fibgen1)
3
5

The lovely thing about generators is that they encapsulate state without having to go through the hoops of creating objects. One way of thinking about them is as “functions” which remember their internal state.
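For comparison, here is a sketch of what "creating objects" would look like for the same sequence, assuming Python 3. FibIterator is a hypothetical class written only for this comparison, and the generator is a slightly condensed form of the one above.

```python
from itertools import islice


class FibIterator:
    """Class-based equivalent: state lives in instance attributes."""

    def __init__(self):
        self.first, self.second = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        value = self.first
        self.first, self.second = self.second, self.first + self.second
        return value


def fib():
    """Generator equivalent: state lives in ordinary local variables."""
    first, second = 0, 1
    while True:
        yield first
        first, second = second, first + second


print(list(islice(FibIterator(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
print(list(islice(fib(), 8)))          # [0, 1, 1, 2, 3, 5, 8, 13]
```

Both produce the same values; the generator simply lets Python manage the saved state.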

I got the Fibonacci example from “Python Generators – What are they?”. With a little imagination, you can come up with a lot of other situations where generators make for a great alternative to for loops and other traditional iteration constructs.


Answer 7

The simple explanation: Consider a for statement

for item in iterable:
   do_stuff()

A lot of the time, the items in iterable don’t all need to be there from the start, but can be generated on the fly as they’re required. This can be a lot more efficient in both

  • space (you never need to store all the items simultaneously) and
  • time (the iteration may finish before all the items are needed).

Other times, you don’t even know all the items ahead of time. For example:

for command in user_input():
   do_stuff_with(command)

You have no way of knowing all the user’s commands beforehand, but you can use a nice loop like this if you have a generator handing you commands:

def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd

With generators you can also have iteration over infinite sequences, which is of course not possible when iterating over containers.
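A short sketch of iterating over an endless sequence, assuming Python 3. The powers_of_two generator is an illustrative example of my own; count and takewhile come from the standard itertools module.

```python
from itertools import count, takewhile


def powers_of_two():
    """An endless sequence: 1, 2, 4, 8, ..."""
    for exponent in count():
        yield 2 ** exponent


# The sequence never ends, so the consumer decides when to stop.
small = list(takewhile(lambda value: value < 100, powers_of_two()))
print(small)  # [1, 2, 4, 8, 16, 32, 64]
```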


Answer 8

My favorite uses are “filter” and “reduce” operations.

Let’s say we’re reading a file, and only want the lines which begin with “##”.

def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l

We can then use the generator function in a proper loop:

source = open( ... )
for line in filter2sharps( source.readlines() ):
    print(line)
source.close()

The reduce example is similar. Let’s say we have a file where we need to locate blocks of <Location>...</Location> lines. [Not HTML tags, but lines that happen to look tag-like.]

def reduceLocation( aSequence ):
    keep= False
    block= None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block= None
            keep= False
        elif line.startswith("<Location"):
            block= [ line ]
            keep= True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky

Again, we can use this generator in a proper for loop.

source = open( ... )
for b in reduceLocation( source.readlines() ):
    print(b)
source.close()

The idea is that a generator function allows us to filter or reduce a sequence, producing another sequence one value at a time.
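Because each such generator consumes one sequence and produces another, they can be chained. A minimal sketch assuming Python 3, with an in-memory list standing in for the file; strip_newlines and only_sharps are hypothetical names chosen for this example.

```python
def strip_newlines(lines):
    """First stage: remove the trailing newline from each line."""
    for line in lines:
        yield line.rstrip("\n")


def only_sharps(lines):
    """Second stage: keep only lines that begin with '##'."""
    for line in lines:
        if line.startswith("##"):
            yield line


raw = ["## title\n", "body\n", "## section\n", "more\n"]

# Each line flows through both stages before the next line is read.
headings = list(only_sharps(strip_newlines(raw)))
print(headings)  # ['## title', '## section']
```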


Answer 9

A practical example where you could make use of a generator is if you have some kind of shape and you want to iterate over its corners, edges or whatever. For my own project (source code here) I had a rectangle:

class Rect():

    def __init__(self, x, y, width, height):
        self.l_top  = (x, y)
        self.r_top  = (x+width, y)
        self.r_bot  = (x+width, y+height)
        self.l_bot  = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

Now I can create a rectangle and loop over its corners:

myrect=Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)

Instead of __iter__ you could have a method iter_corners and call that with for corner in myrect.iter_corners(). It’s just more elegant to use __iter__ since then we can use the class instance name directly in the for expression.
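Defining __iter__ as a generator also makes the instance usable anywhere Python expects an iterable, not only in a for loop. A brief sketch assuming Python 3; the class is repeated here so the snippet runs on its own, and the smaller rectangle is just sample data.

```python
class Rect:
    def __init__(self, x, y, width, height):
        self.l_top = (x, y)
        self.r_top = (x + width, y)
        self.r_bot = (x + width, y + height)
        self.l_bot = (x, y + height)

    def __iter__(self):
        # A generator method: each yield hands out one corner.
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot


rect = Rect(0, 0, 2, 1)

corners = list(rect)                  # collect all corners
l_top, r_top, r_bot, l_bot = rect     # tuple unpacking
widest_x = max(x for x, _ in rect)    # generator expression over corners

print(corners)   # [(0, 0), (2, 0), (2, 1), (0, 1)]
print(widest_x)  # 2
```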


Answer 10

Basically, avoiding callback functions when iterating over input while maintaining state.

See here and here for an overview of what can be done using generators.


Answer 11

There are some good answers here; however, I’d also recommend a complete read of the Python Functional Programming tutorial, which helps explain some of the more potent use cases of generators.


Answer 12

Since the send method of a generator has not been mentioned, here is an example:

def test():
    for i in range(5):
        val = yield
        print(val)

t = test()

# Proceed to 'yield' statement
next(t)

# Send value to yield
t.send(1)
t.send('2')
t.send([3])

It shows that it is possible to send a value to a running generator. A more advanced course on generators is in the video below (including an explanation of yield from, generators for parallel processing, escaping the recursion limit, etc.)

David Beazley on generators at PyCon 2014
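As a slightly fuller sketch of send, here is a generator that both receives values and yields a result back. It assumes Python 3, and running_average is an illustrative example of my own, not taken from the talk.

```python
def running_average():
    """Receives numbers via send() and yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # send(value) resumes execution here
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # advance to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```

Each send both delivers a value into the paused generator and returns whatever the generator yields next.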


Answer 13

I use generators when our web server is acting as a proxy:

  1. The client requests a proxied URL from the server
  2. The server begins to load the target URL
  3. The server yields to return the results to the client as soon as it gets them
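The three steps above can be sketched without any network code. This assumes Python 3; fetch_upstream and proxy are hypothetical stand-ins, and a list of byte chunks plays the role of the target URL's response.

```python
def fetch_upstream(chunks, log):
    """Stand-in for loading the target URL piece by piece."""
    for chunk in chunks:
        log.append(f"received {chunk!r}")
        yield chunk


def proxy(upstream, log):
    """Forward each chunk to the client as soon as it arrives."""
    for chunk in upstream:
        log.append(f"forwarded {chunk!r}")
        yield chunk


log = []
upstream = fetch_upstream([b"<html>", b"</html>"], log)
body = b"".join(proxy(upstream, log))

print(body)  # b'<html></html>'
for entry in log:
    print(entry)
```

The log alternates between "received" and "forwarded", showing that the first chunk goes out before the second one has been loaded.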

Answer 14

Piles of stuff. Any time you want to generate a sequence of items, but don’t want to have to ‘materialize’ them all into a list at once. For example, you could have a simple generator that returns prime numbers:

import itertools

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        # candidate is prime if no previously found prime divides it
        if all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate

You could then use that to generate the products of consecutive primes:

def prime_products():
    primeiter = primes()
    prev = next(primeiter)
    for prime in primeiter:
        yield prime * prev
        prev = prime

These are fairly trivial examples, but you can see how it can be useful for processing large (potentially infinite!) datasets without generating them in advance, which is only one of the more obvious uses.


Answer 15

Also good for printing the prime numbers up to n:

def genprime(n=10):
    # start at 2 so the first prime is not skipped
    for num in range(2, n+1):
        for factor in range(2, num):
            if num%factor == 0:
                break
        else:
            yield num

for prime_num in genprime(100):
    print(prime_num)