问题:在Python中重置生成器对象

我有一个由多个yield返回的生成器对象。准备调用此生成器是相当耗时的操作。这就是为什么我想多次重用生成器。

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

当然,我会考虑将内容复制到简单列表中。有没有办法重置我的生成器?

I have a generator object returned by multiple yield. Preparation to call this generator is rather time-consuming operation. That is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I’m taking in mind copying content into simple list. Is there a way to reset my generator?


回答 0

另一个选择是使用该itertools.tee()函数来创建生成器的第二版本:

y = FunctionWithYield()
y, y_backup = tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

如果原始迭代可能未处理所有项目,则从内存使用的角度来看这可能是有益的。

Another option is to use the itertools.tee() function to create a second version of your generator:

y = FunctionWithYield()
y, y_backup = tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from memory usage point of view if the original iteration might not process all the items.


回答 1

生成器不能倒退。您有以下选择:

  1. 再次运行生成器功能,重新开始生成:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
  2. 将生成器结果存储在内存或磁盘上的数据结构中,可以再次进行迭代:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)

选项1的缺点是它会再次计算值。如果那是CPU密集型的,那么您最终将计算两次。另一方面,2的缺点是存储空间。整个值列表将存储在内存中。如果值太多,那将是不切实际的。

因此,您将获得经典的内存与处理权衡。我无法想象一种在不存储值或再次计算值的情况下倒带生成器的方法。

Generators can’t be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
    
  2. Store the generator results in a data structure on memory or disk which you can iterate over again:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)
    

The downside of option 1 is that it computes the values again. If that’s CPU-intensive you end up calculating twice. On the other hand, the downside of 2 is the storage. The entire list of values will be stored on memory. If there are too many values, that can be unpractical.

So you have the classic memory vs. processing tradeoff. I can’t imagine a way of rewinding the generator without either storing the values or calculating them again.


回答 2

>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val=='restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.send('restart')
0
>>> g.next()
1
>>> g.next()
2
>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val=='restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.send('restart')
0
>>> g.next()
1
>>> g.next()
2

回答 3

可能最简单的解决方案是将昂贵的零件包装在一个对象中,然后将其传递给生成器:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

这样,您可以缓存昂贵的计算。

如果可以将所有结果同时保存在RAM中,请使用,list()以将生成器的结果具体化为简单列表并进行处理。

Probably the most simple solution is to wrap the expensive part in an object and pass that to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive calculations.

If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.


回答 4

我想为旧问题提供其他解决方案

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

与类似的东西相比,这样做的好处list(iterator)O(1)空间复杂度list(iterator)O(n)。缺点是,如果您只能访问迭代器,而不能访问生成迭代器的函数,则无法使用此方法。例如,执行以下操作似乎很合理,但它不起作用。

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)

I want to offer a different solution to an old problem

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)

回答 5

如果GrzegorzOledzki的答案不足够,您可能会send()用来实现目标。有关增强的生成器和yield表达式的更多详细信息,请参见PEP-0342

更新:另请参见。它涉及到上面提到的一些内存与处理权衡,但是它可能比仅将生成器结果存储在list;中节省一些内存。这取决于您使用生成器的方式。

If GrzegorzOledzki’s answer won’t suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.

UPDATE: Also see . It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you’re using the generator.


回答 6

如果您的生成器是纯粹的,从某种意义上说,它的输出仅取决于传递的参数和步骤号,并且您希望生成的生成器可重新启动,则下面的排序片段可能会很方便:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

输出:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

If your generator is pure in a sense that its output only depends on passed arguments and the step number, and you want the resulting generator to be restartable, here’s a sort snippet that might be handy:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

回答 7

tee的官方文档中

通常,如果一个迭代器在另一个迭代器启动之前使用了大部分或全部数据,则使用list()而不是tee()更快。

因此,最好list(iterable)在您的情况下使用。

From official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it’s best to use list(iterable) instead in your case.


回答 8

使用包装函数处理 StopIteration

您可以在生成器的函数中编写一个简单的包装函数,以跟踪生成器何时耗尽。它将使用StopIteration生成器到达迭代结束时抛出的异常来执行此操作。

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func

正如您在上面看到的那样,当包装函数捕获StopIteration异常时,它只是简单地重新初始化了生成器对象(使用函数调用的另一个实例)。

然后,假设您在如下所示的位置定义了生成器提供的函数,则可以使用Python函数装饰器语法隐式包装它:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]
        yield item

Using a wrapper function to handle StopIteration

You could write a simple wrapper function to your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches end of iteration.

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func

As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).

And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]
        yield item

回答 9

您可以定义一个返回生成器的函数

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

现在,您可以随意进行多次:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

You can define a function that returns your generator

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

Now you can just do as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

回答 10

我不确定您所说的昂贵准备是什么意思,但我想您实际上有

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

如果是这样,为什么不重用data

I’m not sure what you meant by expensive preparation, but I guess you actually have

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that’s the case, why not reuse data?


回答 11

没有重置迭代器的选项。通过next()函数进行迭代时,通常会弹出迭代器。唯一的方法是在迭代器对象上进行迭代之前进行备份。检查下面。

创建项目0到9的迭代器对象

i=iter(range(10))

遍历将弹出的next()函数

print(next(i))

将迭代器对象转换为列表

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

因此项目0已经弹出。当我们将迭代器转换为列表时,所有项目也会弹出。

next(L) 

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(L)
StopIteration

因此,您需要在开始迭代之前将迭代器转换为要备份的列表。列表可以用转换为迭代器iter(<list-object>)

There is no option to reset iterators. Iterator usually pops out when it iterate through next() function. Only way is to take a backup before iterate on the iterator object. Check below.

Creating iterator object with items 0 to 9

i=iter(range(10))

Iterating through next() function which will pop out

print(next(i))

Converting the iterator object to list

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

so item 0 is already popped out. Also all the items are popped as we converted the iterator to list.

next(L) 

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(L)
StopIteration

So you need to convert the iterator to lists for backup before start iterating. List could be converted to iterator with iter(<list-object>)


回答 12

现在,您可以使用more_itertools.seekable(第三方工具)启用重置迭代器的功能。

通过安装 > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

注意:前进迭代器时内存消耗会增加,因此请警惕大型可迭代对象。

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.


回答 13

您可以通过使用itertools.cycle()来实现, 可以使用此方法创建一个迭代器,然后在该迭代器上执行for循环,该循环将遍历其值。

例如:

def generator():
for j in cycle([i for i in range(5)]):
    yield j

gen = generator()
for i in range(20):
    print(next(gen))

会生成20个数字,重复0到4。

来自文档的注释:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

You can do that by using itertools.cycle() you can create an iterator with this method and then execute a for loop over the iterator which will loop over its values.

For example:

def generator():
for j in cycle([i for i in range(5)]):
    yield j

gen = generator()
for i in range(20):
    print(next(gen))

will generate 20 numbers, 0 to 4 repeatedly.

A note from the docs:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

回答 14

好的,您说您想多次调用一个生成器,但是初始化非常昂贵。

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in xrange(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print x

for x in y():
    print x

另外,您可以只创建遵循迭代器协议并定义某种“重置”功能的类。

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def next(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print x

print 'resetting...'
my_iterator.reset()

for x in my_iterator:
    print x

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html

Ok, you say you want to call a generator multiple times, but initialization is expensive… What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in xrange(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print x

for x in y():
    print x

Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of ‘reset’ function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def next(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print x

print 'resetting...'
my_iterator.reset()

for x in my_iterator:
    print x

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html


回答 15

我的答案解决了一个稍微不同的问题:如果生成器的初始化成本很高,而每个生成的对象的生成成本却很高。但是我们需要在多个函数中多次使用生成器。为了精确地调用生成器和每个生成的对象一次,我们可以使用线程,并在不同的线程中运行每个使用方法。由于GIL,我们可能无法实现真正​​的并行性,但是我们将实现我们的目标。

在以下情况下,该方法做得很好:深度学习模型处理了大量图像。结果是图像上许多对象的大量蒙版。每个掩码都会占用内存。我们大约有10种方法可以进行不同的统计和度量,但是它们会同时拍摄所有图像。所有图像都无法容纳在内存中。这些方法可以轻松地重写为接受迭代器。

class GeneratorSplitter:
'''
Split a generator object into multiple generators which will be sincronised. Each call to each of the sub generators will cause only one call in the input generator. This way multiple methods on threads can iterate the input generator , and the generator will cycled only once.
'''

def __init__(self, gen):
    self.gen = gen
    self.consumers: List[GeneratorSplitter.InnerGen] = []
    self.thread: threading.Thread = None
    self.value = None
    self.finished = False
    self.exception = None

def GetConsumer(self):
    # Returns a generator object. 
    cons = self.InnerGen(self)
    self.consumers.append(cons)
    return cons

def _Work(self):
    try:
        for d in self.gen:
            for cons in self.consumers:
                cons.consumed.wait()
                cons.consumed.clear()

            self.value = d

            for cons in self.consumers:
                cons.readyToRead.set()

        for cons in self.consumers:
            cons.consumed.wait()

        self.finished = True

        for cons in self.consumers:
            cons.readyToRead.set()
    except Exception as ex:
        self.exception = ex
        for cons in self.consumers:
            cons.readyToRead.set()

def Start(self):
    self.thread = threading.Thread(target=self._Work)
    self.thread.start()

class InnerGen:
    def __init__(self, parent: "GeneratorSplitter"):
        self.parent: "GeneratorSplitter" = parent
        self.readyToRead: threading.Event = threading.Event()
        self.consumed: threading.Event = threading.Event()
        self.consumed.set()

    def __iter__(self):
        return self

    def __next__(self):
        self.readyToRead.wait()
        self.readyToRead.clear()
        if self.parent.finished:
            raise StopIteration()
        if self.parent.exception:
            raise self.parent.exception
        val = self.parent.value
        self.consumed.set()
        return val

用法:

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

My answer solves slightly different problem: If the generator is expensive to initialize and each generated object is expensive to generate. But we need to consume the generator multiple times in multiple functions. In order to call the generator and each generated object exactly once we can use threads and Run each of the consuming methods in different thread. We may not achieve true parallelism due to GIL, but we will achieve our goal.

This approach did a good job in the following case: deep learning model processes a lot of images. The result is a lot of masks for a lot of objects on the image. Each mask consumes memory. We have around 10 methods which make different statistics and metrics, but they take all the images at once. All the images cannot fit in memory. The moethods can easily be rewritten to accept iterator.

class GeneratorSplitter:
'''
Split a generator object into multiple generators which will be sincronised. Each call to each of the sub generators will cause only one call in the input generator. This way multiple methods on threads can iterate the input generator , and the generator will cycled only once.
'''

def __init__(self, gen):
    self.gen = gen
    self.consumers: List[GeneratorSplitter.InnerGen] = []
    self.thread: threading.Thread = None
    self.value = None
    self.finished = False
    self.exception = None

def GetConsumer(self):
    # Returns a generator object. 
    cons = self.InnerGen(self)
    self.consumers.append(cons)
    return cons

def _Work(self):
    try:
        for d in self.gen:
            for cons in self.consumers:
                cons.consumed.wait()
                cons.consumed.clear()

            self.value = d

            for cons in self.consumers:
                cons.readyToRead.set()

        for cons in self.consumers:
            cons.consumed.wait()

        self.finished = True

        for cons in self.consumers:
            cons.readyToRead.set()
    except Exception as ex:
        self.exception = ex
        for cons in self.consumers:
            cons.readyToRead.set()

def Start(self):
    self.thread = threading.Thread(target=self._Work)
    self.thread.start()

class InnerGen:
    def __init__(self, parent: "GeneratorSplitter"):
        self.parent: "GeneratorSplitter" = parent
        self.readyToRead: threading.Event = threading.Event()
        self.consumed: threading.Event = threading.Event()
        self.consumed.set()

    def __iter__(self):
        return self

    def __next__(self):
        self.readyToRead.wait()
        self.readyToRead.clear()
        if self.parent.finished:
            raise StopIteration()
        if self.parent.exception:
            raise self.parent.exception
        val = self.parent.value
        self.consumed.set()
        return val

Ussage:

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

回答 16

可以通过代码对象来完成。这是例子。

code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print i

1 2 3 4

for i in y: print i


exec(code1)
for i in y: print i

1 2 3 4

It can be done by code object. Here is the example.

code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print i

1 2 3 4

for i in y: print i


exec(code1)
for i in y: print i

1 2 3 4


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。