Tag Archives: yield

C++ equivalent to the Python generator pattern

Question: C++ equivalent to the Python generator pattern


I’ve got some example Python code that I need to mimic in C++. I do not require any specific solution (such as co-routine based yield solutions, although they would be acceptable answers as well), I simply need to reproduce the semantics in some manner.

Python

This is a basic sequence generator, clearly too large to store a materialized version.

def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)

The goal is to maintain two instances of the sequence above and iterate over them in semi-lockstep, but in chunks. In the example below, first_pass uses the sequence of pairs to initialize the buffer, and second_pass regenerates the same exact sequence and processes the buffer again.

def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()

    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    ... repeat ...

C++

The only thing I can find for a solution in C++ is to mimic yield with C++ coroutines, but I haven’t found any good reference on how to do this. I’m also interested in alternative (non-general) solutions to this problem. I do not have enough memory budget to keep a copy of the sequence between passes.


Answer 0


Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.

You simply need to understand what a generator does:

  • there is a blob of data: the local variables define a state
  • there is an init method
  • there is a “next” method
  • there is a way to signal termination

In your trivial example, it’s easy enough. Conceptually:

struct State { unsigned i, j; };

State make();

void next(State&);

bool isDone(State const&);

Of course, we wrap this as a proper class:

#include <cassert>   // assert
#include <cstddef>   // ptrdiff_t
#include <iterator>  // std::iterator, std::input_iterator_tag
#include <limits>    // std::numeric_limits
#include <utility>   // std::pair

class PairSequence:
    // (implicit aliases)
    public std::iterator<
        std::input_iterator_tag,
        std::pair<unsigned, unsigned>
    >
{
  // C++03
  typedef void (PairSequence::*BoolLike)();
  void non_comparable();
public:
  // C++11 (explicit aliases)
  using iterator_category = std::input_iterator_tag;
  using value_type = std::pair<unsigned, unsigned>;
  using reference = value_type const&;
  using pointer = value_type const*;
  using difference_type = ptrdiff_t;

  // C++03 (explicit aliases)
  typedef std::input_iterator_tag iterator_category;
  typedef std::pair<unsigned, unsigned> value_type;
  typedef value_type const& reference;
  typedef value_type const* pointer;
  typedef ptrdiff_t difference_type;

  PairSequence(): done(false) {}

  // C++11
  explicit operator bool() const { return !done; }

  // C++03
  // Safe Bool idiom
  operator BoolLike() const {
    return done ? 0 : &PairSequence::non_comparable;
  }

  reference operator*() const { return ij; }
  pointer operator->() const { return &ij; }

  PairSequence& operator++() {
    static unsigned const Max = std::numeric_limits<unsigned>::max();

    assert(!done);

    if (ij.second != Max) { ++ij.second; return *this; }
    if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }

    done = true;
    return *this;
  }

  PairSequence operator++(int) {
    PairSequence const tmp(*this);
    ++*this;
    return tmp;
  }

private:
  bool done;
  value_type ij;
};

So hum yeah… might be that C++ is a tad more verbose :)
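A sketch of how an input-iterator-style generator like the one above is driven. This is a trimmed-down, self-contained analog with a small bound (Max = 2, chosen here purely for illustration, in place of numeric_limits<unsigned>::max()) so the loop terminates quickly; SmallPairSequence and all_pairs are made-up names:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same shape as PairSequence above, but counting only 0..2 in each
// position so the full sequence is small enough to enumerate here.
class SmallPairSequence {
public:
    using value_type = std::pair<unsigned, unsigned>;

    explicit operator bool() const { return !done; }
    value_type const& operator*() const { return ij; }

    SmallPairSequence& operator++() {
        static unsigned const Max = 2;  // stand-in for the real bound
        if (ij.second != Max) { ++ij.second; return *this; }
        if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }
        done = true;
        return *this;
    }

private:
    bool done = false;
    value_type ij{0, 0};
};

// Collect the whole sequence, mimicking "for pair in pair_sequence()".
std::vector<std::pair<unsigned, unsigned>> all_pairs() {
    std::vector<std::pair<unsigned, unsigned>> out;
    for (SmallPairSequence seq; seq; ++seq)
        out.push_back(*seq);
    return out;
}
```

The `for (SmallPairSequence seq; seq; ++seq)` loop is the iterator-world spelling of Python’s `for pair in pair_sequence()`.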


Answer 1


In C++ there are iterators, but implementing an iterator isn’t straightforward: one has to consult the iterator concepts and carefully design the new iterator class to implement them. Thankfully, Boost has an iterator_facade template which should help with implementing iterators and iterator-compatible generators.

Sometimes a stackless coroutine can be used to implement an iterator.

P.S. See also this article, which mentions both a switch hack by Christopher M. Kohlhoff and Boost.Coroutine by Oliver Kowalke. Oliver Kowalke’s work is a follow-up on Boost.Coroutine by Giovanni P. Deretta.
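The switch hack mentioned above (the Duff’s-device trick also used by protothreads) can be sketched roughly like this; this is not Kohlhoff’s actual code, and PairGen, n and drain are illustrative names. The case label inside the loops lets execution resume mid-loop, while the loop counters survive between calls because they are members rather than locals:

```cpp
#include <utility>
#include <vector>

// A stackless "coroutine": the switch jumps back to the case label
// inside the loops; i and j persist across calls as members.
struct PairGen {
    unsigned n;
    unsigned i = 0, j = 0;
    int state = 0;

    explicit PairGen(unsigned n) : n(n) {}

    bool next(std::pair<unsigned, unsigned>& out) {
        switch (state) {
        case 0:
            for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j) {
                    out = {i, j};
                    state = 1;
                    return true;   // "yield"
        case 1:;                   // execution resumes here on the next call
                }
        }
        state = 2;
        return false;              // exhausted
    }
};

std::vector<std::pair<unsigned, unsigned>> drain(unsigned n) {
    std::vector<std::pair<unsigned, unsigned>> out;
    PairGen g(n);
    std::pair<unsigned, unsigned> p;
    while (g.next(p)) out.push_back(p);
    return out;
}
```

This is legal C++ because a case label may appear anywhere inside the switch statement, and jumping into the loops only skips assignments, not variable initializations.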

P.S. I think you can also write a kind of generator with lambdas:

std::function<int()> generator = []{
  int i = 0;
  return [=]() mutable {
    return i < 10 ? i++ : -1;
  };
}();
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

Or with a functor:

struct generator_t {
  int i = 0;
  int operator() () {
    return i < 10 ? i++ : -1;
  }
} generator;
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

P.S. Here’s a generator implemented with the Mordor coroutines:

#include <iostream>
using std::cout; using std::endl;
#include <mordor/coroutine.h>
using Mordor::Coroutine; using Mordor::Fiber;

void testMordor() {
  Coroutine<int> coro ([](Coroutine<int>& self) {
    int i = 0; while (i < 9) self.yield (i++);
  });
  for (int i = coro.call(); coro.state() != Fiber::TERM; i = coro.call()) cout << i << endl;
}

Answer 2


Since Boost.Coroutine2 now supports it very well (I found it because I wanted to solve exactly the same yield problem), I am posting the C++ code that matches your original intention:

#include <stdint.h>
#include <iostream>
#include <memory>
#include <boost/coroutine2/all.hpp>

typedef boost::coroutines2::coroutine<std::pair<uint16_t, uint16_t>> coro_t;

void pair_sequence(coro_t::push_type& yield)
{
    uint16_t i = 0;
    uint16_t j = 0;
    for (;;) {
        for (;;) {
            yield(std::make_pair(i, j));
            if (++j == 0)
                break;
        }
        if (++i == 0)
            break;
    }
}

int main()
{
    coro_t::pull_type seq(boost::coroutines2::fixedsize_stack(),
                          pair_sequence);
    for (auto pair : seq) {
        std::cout << pair.first << ", " << pair.second << "\n";
    }
    //while (seq) {
    //    print_pair(seq.get());
    //    seq();
    //}
}

In this example, pair_sequence does not take additional arguments. If it needs to, std::bind or a lambda should be used to generate a function object that takes only one argument (of push_type), when it is passed to the coro_t::pull_type constructor.


Answer 3


All answers that involve writing your own iterator are completely wrong. Such answers entirely miss the point of Python generators (one of the language’s greatest and most unique features). The most important thing about generators is that execution picks up where it left off. This does not happen with iterators. Instead, you must manually store state information so that when operator++ or operator* is called anew, the right information is in place at the very beginning of the call. This is why writing your own C++ iterator is a gigantic pain, whereas generators are elegant and easy to read and write.

I don’t think there is a good analog for Python generators in native C++, at least not yet (there is a rumor that yield will land in C++17). You can get something similar by resorting to third-party libraries (e.g. Yongwei’s Boost suggestion) or by rolling your own.

I would say the closest thing in native C++ is threads. A thread can maintain a suspended set of local variables, and can continue execution where it left off, very much like generators, but you need to roll a little bit of additional infrastructure to support communication between the generator object and its caller. E.g.

// Infrastructure

template <typename Element>
class Channel { ... };

// Application

using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
  for (int i = 0; i < end_i; ++i) {
    for (int j = 0; j < end_j; ++j) {
      out->send(IntPair{i, j});  // "yield"
    }
  }
  out->close();
}

void MyApp() {
  Channel<IntPair> pairs;
  std::thread generator(yield_pairs, 32, 32, &pairs);
  for (IntPair pair : pairs) {
    UsePair(pair);
  }
  generator.join();
}

This solution has several downsides though:

  1. Threads are “expensive”. Most people would consider this to be an “extravagant” use of threads, especially when your generator is so simple.
  2. There are a couple of clean-up actions that you need to remember. These could be automated, but you’d need even more infrastructure, which again is likely to be seen as “too extravagant”. Anyway, the clean-ups you need are:
    1. out->close()
    2. generator.join()
  3. This does not allow you to stop the generator. You could make some modifications to add that ability, but doing so adds clutter to the code. It would never be as clean as Python’s yield statement.
  4. In addition to 2, there are other bits of boilerplate needed each time you want to “instantiate” a generator object:
    1. Channel* out parameter
    2. Additional variables in main: pairs, generator
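The elided Channel above could be filled in roughly as follows. This is a minimal sketch using a mutex, a condition variable and a std::queue; the answer’s range-for over the channel would additionally require begin()/end() iterators, so this version exposes a blocking receive() instead, and collect_pairs is an illustrative driver, not part of the original answer:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <typename Element>
class Channel {
public:
    void send(Element e) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(e));
        cv_.notify_one();
    }
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }
    // Blocks until a value is available; an empty optional means
    // the channel is closed and fully drained.
    std::optional<Element> receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        Element e = std::move(q_.front());
        q_.pop();
        return e;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Element> q_;
    bool closed_ = false;
};

using IntPair = std::pair<int, int>;

// Generator thread produces pairs; the caller drains the channel.
std::vector<IntPair> collect_pairs(int end_i, int end_j) {
    Channel<IntPair> pairs;
    std::thread generator([&] {
        for (int i = 0; i < end_i; ++i)
            for (int j = 0; j < end_j; ++j)
                pairs.send(IntPair{i, j});   // "yield"
        pairs.close();
    });
    std::vector<IntPair> out;
    while (auto p = pairs.receive()) out.push_back(*p);
    generator.join();
    return out;
}
```

An unbounded queue as sketched here lets the producer run arbitrarily far ahead; a production version would bound the queue so the generator blocks, which is what makes the approach memory-friendly.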

Answer 4


You should probably check the generators in std::experimental in Visual Studio 2015, e.g.: https://blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/

I think it’s exactly what you are looking for. Generators should become generally available in C++17, as this is currently only an experimental Microsoft VC feature.


Answer 5


If you only need to do this for a relatively small number of specific generators, you can implement each as a class, where the member data is equivalent to the local variables of the Python generator function. Then you have a next function that returns the next thing the generator would yield, updating the internal state as it does so.

This is basically similar to how Python generators are implemented, I believe. The major difference being they can remember an offset into the bytecode for the generator function as part of the “internal state”, which means the generators can be written as loops containing yields. You would have to instead calculate the next value from the previous. In the case of your pair_sequence, that’s pretty trivial. It may not be for complex generators.

You also need some way of indicating termination. If what you’re returning is “pointer-like”, and NULL should not be a valid yieldable value, you could use a NULL pointer as a termination indicator. Otherwise you need an out-of-band signal.
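For the pair_sequence case, such a hand-rolled class might look like this. It is a sketch under two assumptions not in the original: the bound is a constructor parameter (rather than 2**32) so it can be exercised, and termination is signaled out-of-band by next() returning false. PairGenerator and take_all are made-up names:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Member data plays the role of the Python generator's locals; next()
// returning false is the out-of-band termination signal.
class PairGenerator {
public:
    explicit PairGenerator(uint64_t limit) : limit_(limit) {}

    bool next(std::pair<uint64_t, uint64_t>& out) {
        if (i_ == limit_) return false;        // exhausted
        out = {i_, j_};
        if (++j_ == limit_) { j_ = 0; ++i_; }  // advance to the next state
        return true;
    }

private:
    uint64_t limit_;
    uint64_t i_ = 0, j_ = 0;
};

std::vector<std::pair<uint64_t, uint64_t>> take_all(uint64_t limit) {
    std::vector<std::pair<uint64_t, uint64_t>> out;
    PairGenerator seq(limit);
    std::pair<uint64_t, uint64_t> p;
    while (seq.next(p)) out.push_back(p);
    return out;
}
```

Computing the next value from the previous one is trivial here, which is exactly the point the answer makes; for a generator with complicated control flow, this manual state machine gets painful fast.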


Answer 6


Something like this is very similar:

#include <limits>
#include <stdexcept>
#include <utility>

struct pair_sequence
{
    typedef std::pair<unsigned int, unsigned int> result_type;
    static const unsigned int limit = std::numeric_limits<unsigned int>::max();

    pair_sequence() : i(0), j(0) {}

    result_type operator()()
    {
        result_type r(i, j);
        if (j < limit) j++;
        else if (i < limit)
        {
            j = 0;
            i++;
        }
        else throw std::out_of_range("end of iteration");
        return r;
    }

private:
    unsigned int i;
    unsigned int j;
};

Using operator() is just a matter of what you want to do with this generator; you could also build it as a stream and make sure it adapts to an istream_iterator, for example.


Answer 7


Using range-v3:

#include <iostream>
#include <tuple>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

auto generator = [x = view::iota(0) | view::take(3)] {
    return view::cartesian_product(x, x);
};

int main () {
    for (auto x : generator()) {
        cout << get<0>(x) << ", " << get<1>(x) << endl;
    }

    return 0;
}

Answer 8


Something like this:

Example use:

using ull = unsigned long long;

auto main() -> int {
    for (ull val : range_t<ull>(100)) {
        std::cout << val << std::endl;
    }

    return 0;
}

Will print the numbers from 0 to 99
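The range_t implementation itself did not survive in this archive; a minimal sketch consistent with the usage above (begin()/end() returning an incrementable, dereferenceable, comparable iterator) could be:

```cpp
// Minimal iterable integer range: range_t<T>(n) yields 0, 1, ..., n-1.
template <typename T>
class range_t {
public:
    explicit range_t(T end) : end_(end) {}

    class iterator {
    public:
        explicit iterator(T v) : v_(v) {}
        T operator*() const { return v_; }
        iterator& operator++() { ++v_; return *this; }
        bool operator!=(iterator const& o) const { return v_ != o.v_; }
    private:
        T v_;
    };

    iterator begin() const { return iterator(0); }
    iterator end() const { return iterator(static_cast<T>(end_)); }

private:
    T end_;
};

// Helper for exercising the range: sums 0 + 1 + ... + (n-1).
template <typename T>
T range_sum(T n) {
    T s = 0;
    for (T v : range_t<T>(n)) s += v;
    return s;
}
```

This is only enough machinery for range-for; a full standard-conforming iterator would also need the usual member typedefs, operator==, and postfix increment.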


Answer 9


Well, today I was also looking for an easy lazy-collection implementation under C++11. I was actually disappointed, because everything I found was either too far from things like Python generators or the C# yield operator, or too complicated.

The purpose is to make a collection that emits its items only when they are required.

I wanted it to be like this:

auto emitter = on_range<int>(a, b).yield(
    [](int i) {
         /* do something with i */
         return i * 2;
    });

I found this post; IMHO the best answer here is the one about Boost.Coroutine2, by Yongwei Wu, since it is the nearest to what the author wanted.

It is worth learning Boost coroutines, and I will perhaps do so on a weekend. But so far I am using my very small implementation. Hope it helps someone else.

Below is an example of use, followed by the implementation.

Example.cpp

#include <iostream>
#include "Generator.h"
int main() {
    typedef std::pair<int, int> res_t;

    auto emitter = Generator<res_t, int>::on_range(0, 3)
        .yield([](int i) {
            return std::make_pair(i, i * i);
        });

    for (auto kv : emitter) {
        std::cout << kv.first << "^2 = " << kv.second << std::endl;
    }

    return 0;
}

Generator.h

#include <functional>  // std::function

template<typename ResTy, typename IndexTy>
struct yield_function{
    typedef std::function<ResTy(IndexTy)> type;
};

template<typename ResTy, typename IndexTy>
class YieldConstIterator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef YieldConstIterator<ResTy, IndexTy> mytype_t;
    typedef ResTy value_type;

    YieldConstIterator(index_t index, yield_function_t yieldFunction) :
            mIndex(index),
            mYieldFunction(yieldFunction) {}

    mytype_t &operator++() {
        ++mIndex;
        return *this;
    }

    const value_type operator*() const {
        return mYieldFunction(mIndex);
    }

    bool operator!=(const mytype_t &r) const {
        return mIndex != r.mIndex;
    }

protected:

    index_t mIndex;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class YieldIterator : public YieldConstIterator<ResTy, IndexTy> {
public:

    typedef YieldConstIterator<ResTy, IndexTy> parent_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef ResTy value_type;

    YieldIterator(index_t index, yield_function_t yieldFunction) :
            parent_t(index, yieldFunction) {}

    value_type operator*() {
        return parent_t::mYieldFunction(parent_t::mIndex);
    }
};

template<typename IndexTy>
struct Range {
public:
    typedef IndexTy index_t;
    typedef Range<IndexTy> mytype_t;

    index_t begin;
    index_t end;
};

template<typename ResTy, typename IndexTy>
class GeneratorCollection {
public:

    typedef Range<IndexTy> range_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef YieldIterator<ResTy, IndexTy> iterator;
    typedef YieldConstIterator<ResTy, IndexTy> const_iterator;

    GeneratorCollection(range_t range, const yield_function_t &yieldF) :
            mRange(range),
            mYieldFunction(yieldF) {}

    iterator begin() {
        return iterator(mRange.begin, mYieldFunction);
    }

    iterator end() {
        return iterator(mRange.end, mYieldFunction);
    }

    const_iterator begin() const {
        return const_iterator(mRange.begin, mYieldFunction);
    }

    const_iterator end() const {
        return const_iterator(mRange.end, mYieldFunction);
    }

private:
    range_t mRange;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class Generator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef Generator<ResTy, IndexTy> mytype_t;
    typedef Range<IndexTy> parent_t;
    typedef GeneratorCollection<ResTy, IndexTy> finalized_emitter_t;
    typedef  Range<IndexTy> range_t;

protected:
    Generator(range_t range) : mRange(range) {}
public:
    static mytype_t on_range(index_t begin, index_t end) {
        return mytype_t({ begin, end });
    }

    finalized_emitter_t yield(yield_function_t f) {
        return finalized_emitter_t(mRange, f);
    }
protected:

    range_t mRange;
};      

Answer 10


This answer works in C (and hence, I think, works in C++ too):

#include <stdio.h>
#include <stdint.h>

const uint64_t MAX = 1ull << 32;

typedef struct {
    uint64_t i, j;
} Pair;

/* Returns the next pair, or NULL when the sequence is exhausted.
   The result points to static storage and is overwritten by the next
   call (the original returned a pointer to a local, which is
   undefined behavior). */
Pair* generate_pairs(void)
{
    static uint64_t i = 0;
    static uint64_t j = 0;
    static Pair p;

    p.i = i;
    p.j = j;
    if (j < MAX)
    {
        j++;
        return &p;
    }
    else if (++i < MAX)
    {
        p.i = i;
        p.j = 0;
        j = 1;  /* {i, 0} is emitted now, so the next pair is {i, 1} */
        return &p;
    }
    else
    {
        return NULL;
    }
}

int main(void)
{
    while (1)
    {
        Pair *p = generate_pairs();
        if (p != NULL)
        {
            /* printf("%llu,%llu\n", (unsigned long long)p->i,
                      (unsigned long long)p->j); */
        }
        else
        {
            /* printf("end\n"); */
            break;
        }
    }
    return 0;
}

This is a simple, non-object-oriented way to mimic a generator. It worked as expected for me.


Answer 11


Just as a function simulates the concept of a stack, generators simulate the concept of a queue. The rest is semantics.

As a side note, you can always simulate a queue with a stack by using a stack of operations instead of data. In practice that means you can implement queue-like behavior by returning a pair whose second value either holds the next function to be called or indicates that we are out of values. This is more general than what yield vs. return does: it lets you simulate a queue of arbitrary values, rather than the homogeneous values you expect from a generator, without keeping a full internal queue.

More specifically, since C++ does not have a natural abstraction for a queue, you need to use constructs which implement a queue internally. So the answer that gave the example with iterators is a decent implementation of the concept.

What this practically means is that, if you just want something quick, you can implement something with bare-bones queue functionality and then consume the queue’s values just as you would consume values yielded from a generator.
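The pair-returning idea above can be sketched as follows; Step, count_from and run are illustrative names, not from the answer. Each step carries a value plus the thunk that produces the following step, and an empty value ends the sequence:

```cpp
#include <functional>
#include <optional>
#include <vector>

// A "queue of operations": each step holds a value and the function
// that computes the next step; an empty optional ends the sequence.
struct Step {
    std::optional<int> value;
    std::function<Step()> next;
};

Step count_from(int i, int limit) {
    if (i >= limit) return {std::nullopt, nullptr};
    return {i, [i, limit] { return count_from(i + 1, limit); }};
}

// Consume the steps just as one would consume yielded values.
std::vector<int> run(int limit) {
    std::vector<int> out;
    for (Step s = count_from(0, limit); s.value; s = s.next())
        out.push_back(*s.value);
    return out;
}
```

No internal queue is ever materialized: only the current step and its continuation exist at any moment, which mirrors the laziness of a generator.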


Resetting a generator object in Python

Question: Resetting a generator object in Python


I have a generator object returned by a function with multiple yields. Preparing to call this generator is a rather time-consuming operation, which is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I have considered copying the content into a simple list. Is there a way to reset my generator?


Answer 0


Another option is to use the itertools.tee() function to create a second version of your generator:

from itertools import tee

y = FunctionWithYield()
y, y_backup = tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from a memory-usage point of view if the original iteration might not process all the items.


Answer 1


Generators can’t be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
    
  2. Store the generator results in a data structure on memory or disk which you can iterate over again:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)
    

The downside of option 1 is that it computes the values again. If that’s CPU-intensive you end up calculating twice. On the other hand, the downside of 2 is the storage. The entire list of values will be stored on memory. If there are too many values, that can be unpractical.

So you have the classic memory vs. processing tradeoff. I can’t imagine a way of rewinding the generator without either storing the values or calculating them again.


回答 2

>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val=='restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.send('restart')
0
>>> g.next()
1
>>> g.next()
2

回答 3

可能最简单的解决方案是将昂贵的部分包装在一个对象中,然后将其传递给生成器:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

这样,您可以缓存昂贵的计算。

如果可以将所有结果同时保存在RAM中,则使用list()将生成器的结果具体化为普通列表并处理它。

Probably the most simple solution is to wrap the expensive part in an object and pass that to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive calculations.

If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.
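A minimal sketch of this pattern; ExpensiveSetup and FunctionWithYield are only described in the answer, so the names below are hypothetical stand-ins:

```python
import time

class ExpensiveSetup:
    """Hypothetical stand-in: do the costly work once, in __init__."""
    def __init__(self):
        time.sleep(0.01)          # pretend this is the expensive part
        self.items = [1, 2, 3]

def function_with_yield(data):
    # Cheap generator: each call creates a fresh iterator over cached data
    for item in data.items:
        yield item * 2

data = ExpensiveSetup()           # paid exactly once
first_pass = list(function_with_yield(data))
second_pass = list(function_with_yield(data))
print(first_pass, second_pass)
```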


回答 4

我想为旧问题提供其他解决方案

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

与list(iterator)之类的做法相比,这样做的好处是空间复杂度为O(1),而list(iterator)是O(n)。缺点是,如果您只能访问迭代器,而不能访问生成迭代器的函数,则无法使用此方法。例如,执行以下操作似乎很合理,但它不起作用。

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)

I want to offer a different solution to an old problem

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)
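To make the caveat concrete, here is a small runnable demonstration: wrapping a factory restarts the iteration each time, while wrapping an already-created generator silently yields nothing on the second pass:

```python
class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

# Works: the factory builds a fresh generator on every __iter__ call
squares = IterableAdapter(lambda: (x * x for x in range(3)))
ok_first, ok_second = list(squares), list(squares)

# Fails silently: the lambda keeps returning the same, eventually exhausted generator
g = (x * x for x in range(3))
broken = IterableAdapter(lambda: g)
bad_first, bad_second = list(broken), list(broken)
print(ok_first, ok_second, bad_first, bad_second)
```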

回答 5

如果GrzegorzOledzki的答案还不够,您或许可以使用send()来实现目标。有关增强型生成器和yield表达式的更多详细信息,请参见PEP-0342。

更新:另请参见itertools.tee()。它涉及上面提到的内存与处理之间的权衡,但相比把生成器结果完整存入list,它可能节省一些内存;这取决于您使用生成器的方式。

If GrzegorzOledzki’s answer won’t suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.

UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you’re using the generator.


回答 6

如果您的生成器是纯粹的(即输出仅取决于传入的参数和步骤号),并且您希望生成的生成器可重新启动,那么下面这段简短的代码片段可能会很方便:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

输出:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

If your generator is pure in the sense that its output only depends on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

回答 7

tee的官方文档中

通常,如果一个迭代器在另一个迭代器启动之前使用了大部分或全部数据,则使用list()而不是tee()更快。

因此,最好list(iterable)在您的情况下使用。

From official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it’s best to use list(iterable) instead in your case.


回答 8

使用包装函数处理 StopIteration

您可以为生成生成器的函数编写一个简单的包装函数,以跟踪生成器何时耗尽。它会利用生成器到达迭代末尾时抛出的StopIteration异常来做到这一点。

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func

正如您在上面看到的那样,当包装函数捕获StopIteration异常时,它只是简单地重新初始化了生成器对象(使用函数调用的另一个实例)。

然后,假设您在如下所示的位置定义了生成器提供的函数,则可以使用Python函数装饰器语法隐式包装它:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item

Using a wrapper function to handle StopIteration

You could write a simple wrapper function to your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches end of iteration.

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func

As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).

And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item

回答 9

您可以定义一个返回生成器的函数

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

现在,您可以随意进行多次:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

You can define a function that returns your generator

def f():
  def FunctionWithYield(generator_args):
    code here...

  return FunctionWithYield

Now you can just do as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

回答 10

我不确定您所说的昂贵准备是什么意思,但我想您实际上有

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

如果是这样,为什么不重用data

I’m not sure what you meant by expensive preparation, but I guess you actually have

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that’s the case, why not reuse data?
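A runnable sketch of this idea; expensive_computation and function_with_yield are hypothetical stand-ins for the code described above:

```python
def expensive_computation():
    # Hypothetical stand-in for the costly preparation
    return [1, 2, 3]

def function_with_yield(data):
    # Cheap generator over already-computed data
    for x in data:
        yield x * x

data = expensive_computation()            # computed once
pass1 = list(function_with_yield(data))
pass2 = list(function_with_yield(data))   # reuses data, cheap to recreate
print(pass1, pass2)
```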


回答 11

没有重置迭代器的选项。迭代器在通过next()函数迭代时会逐个弹出元素。唯一的办法是在迭代之前先对迭代器对象做备份。请看下面的示例。

创建项目0到9的迭代器对象

i=iter(range(10))

遍历将弹出的next()函数

print(next(i))

将迭代器对象转换为列表

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

因此项目0已经弹出。当我们将迭代器转换为列表时,所有项目也会弹出。

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

因此,您需要在开始迭代之前把迭代器转换为列表作为备份。列表可以用iter(<list-object>)转换回迭代器。

There is no option to reset iterators. Iterator usually pops out when it iterate through next() function. Only way is to take a backup before iterate on the iterator object. Check below.

Creating iterator object with items 0 to 9

i=iter(range(10))

Iterating through next() function which will pop out

print(next(i))

Converting the iterator object to list

L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

so item 0 is already popped out. Also all the items are popped as we converted the iterator to list.

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

So you need to convert the iterator to a list for backup before you start iterating. A list can be converted back to an iterator with iter(<list-object>).
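The backup idea above in compact, runnable form:

```python
backup = list(range(10))   # materialize once, before any iteration
it1 = iter(backup)
it2 = iter(backup)         # a list can hand out fresh iterators at will
first_two = [next(it1), next(it1)]
restarted = next(it2)      # independent iterator, starts from the beginning
print(first_two, restarted)
```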


回答 12

现在,您可以使用more_itertools.seekable(第三方工具)启用重置迭代器的功能。

通过 > pip install more_itertools 安装

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

注意:前进迭代器时内存消耗会增加,因此请警惕大型可迭代对象。

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.


回答 13

您可以使用itertools.cycle()来实现:用此方法创建一个迭代器,然后对该迭代器执行for循环,循环将遍历其值。

例如:

from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))

会生成20个数字,重复0到4。

来自文档的注释:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

You can do that by using itertools.cycle(): create an iterator with this method and then execute a for loop over the iterator, which will loop over its values.

For example:

from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))

will generate 20 numbers, 0 to 4 repeatedly.

A note from the docs:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).
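If you want a bounded number of repeated values rather than an endless stream, itertools.islice can cap the cycle (a small addition not in the original answer):

```python
from itertools import cycle, islice

# Take exactly 12 values from the infinite cycle over 0..4
values = list(islice(cycle(range(5)), 12))
print(values)
```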

回答 14

好的,您说您想多次调用一个生成器,但初始化非常昂贵……那么像下面这样的方法怎么样?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in xrange(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print x

for x in y():
    print x

另外,您可以只创建遵循迭代器协议并定义某种“重置”功能的类。

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def next(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print x

print 'resetting...'
my_iterator.reset()

for x in my_iterator:
    print x

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html

Ok, you say you want to call a generator multiple times, but initialization is expensive… What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in xrange(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print x

for x in y():
    print x

Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of ‘reset’ function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def next(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print x

print 'resetting...'
my_iterator.reset()

for x in my_iterator:
    print x

https://docs.python.org/2/library/stdtypes.html#iterator-types http://anandology.com/python-practice-book/iterators.html


回答 15

我的答案解决的是一个稍有不同的问题:生成器的初始化成本很高,而每个生成对象的生成成本也很高,但我们需要在多个函数中多次消费这个生成器。为了让生成器和每个生成的对象都只被调用一次,我们可以使用线程,并在不同的线程中运行每个消费方法。由于GIL,我们可能无法实现真正的并行,但可以达成目标。

这种方法在以下场景中效果很好:深度学习模型处理大量图像,结果是图像上许多对象的大量蒙版,每个蒙版都占用内存。我们大约有10个方法分别计算不同的统计和度量,但它们都需要一次性获取全部图像,而所有图像无法同时放进内存。这些方法可以很容易地改写为接受迭代器。

import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised.
    Each call to each of the sub-generators will cause only one call in the input
    generator. This way multiple methods on threads can iterate the input
    generator, and the generator will be cycled only once.
    '''

    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()

                self.value = d

                for cons in self.consumers:
                    cons.readyToRead.set()

            for cons in self.consumers:
                cons.consumed.wait()

            self.finished = True

            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val

用法:

from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate, but we need to consume the generator multiple times in multiple functions. In order to call the generator and generate each object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.

This approach did a good job in the following case: a deep learning model processes a lot of images, and the result is a lot of masks for the many objects in each image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they all take the images at once, and all the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.

import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised.
    Each call to each of the sub-generators will cause only one call in the input
    generator. This way multiple methods on threads can iterate the input
    generator, and the generator will be cycled only once.
    '''

    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()

                self.value = d

                for cons in self.consumers:
                    cons.readyToRead.set()

            for cons in self.consumers:
                cons.consumed.wait()

            self.finished = True

            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val

Usage:

from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

回答 16

可以通过代码对象来完成。这是例子。

code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print i

1 2 3 4

for i in y: print i


exec(code1)
for i in y: print i

1 2 3 4

It can be done by code object. Here is the example.

code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print i

1 2 3 4

for i in y: print i


exec(code1)
for i in y: print i

1 2 3 4


实际上,Python 3.3中新的“ yield from”语法的主要用途是什么?

问题:实际上,Python 3.3中新的“ yield from”语法的主要用途是什么?

我很难理解PEP 380。

  1. 在什么情况下“yield from”有用?
  2. 经典用例是什么?
  3. 为什么将它与微线程相比较?

[更新]

现在,我了解了造成困难的原因。我曾经使用过生成器,但从未真正使用过协程(由PEP-342引入)。尽管有一些相似之处,但生成器和协程基本上是两个不同的概念。了解协程(不仅是生成器)是了解新语法的关键。

恕我直言,协程是最晦涩的Python功能,大多数书籍使它看起来毫无用处且无趣。

感谢大家的出色回答,特别感谢agf以及他链接到David Beazley演讲的评论。David很棒。

I’m having a hard time wrapping my brain around PEP 380.

  1. What are the situations where “yield from” is useful?
  2. What is the classic use case?
  3. Why is it compared to micro-threads?

[ update ]

Now I understand the cause of my difficulties. I’ve used generators, but never really used coroutines (introduced by PEP-342). Despite some similarities, generators and coroutines are basically two different concepts. Understanding coroutines (not only generators) is the key to understanding the new syntax.

IMHO coroutines are the most obscure Python feature, most books make it look useless and uninteresting.

Thanks for the great answers, but special thanks to agf and his comment linking to David Beazley presentations. David rocks.


回答 0

让我们先解决一件事。认为yield from g等价于for v in g: yield v的解释,甚至都没有触及yield from的真正意义。因为说实话,如果yield from所做的一切只是展开for循环,那就不值得为它在语言中新增语法,也谈不上实现Python 2.x中无法实现的一大批新特性。

yield from所做的,就是在调用方和子生成器之间建立一条透明的双向连接:

  • 从某种意义上说,该连接是“透明的”,它也将正确地传播所有内容,而不仅仅是所生成的元素(例如,传播异常)。

  • 该连接是“双向”的,意思是数据既可以从生成器发出,也可以发送给生成器。

如果我们在谈论TCP,yield from g可能意味着“现在暂时断开客户端的套接字,然后将其重新连接到该其他服务器套接字”。

顺便说一句,如果您不确定向生成器发送数据意味着什么,就需要先放下一切,首先了解协程,它们非常有用(可以与子例程对比),但不幸的是在Python中鲜为人知。Dave Beazley的《A Curious Course on Coroutines》是一个很好的起点,阅读幻灯片24-33可以快速入门。

使用yield from从生成器读取数据

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

无需手动迭代reader(),我们可以直接对它使用yield from。

def reader_wrapper(g):
    yield from g

那行得通,而且我们省掉了一行代码。意图也许更清晰了一点(也许没有)。但这并不是什么翻天覆地的变化。

使用yield from将数据发送到生成器(协程):第1部分

现在,让我们做一些更有趣的事情。让我们创建一个名为writer的协程,它接受发送给它的数据并写入套接字、fd等。

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

现在的问题是:包装函数应当如何处理向writer发送数据,使得任何发送到包装器的数据都被透明地转发给writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>>  0
>>  1
>>  2
>>  3

包装器需要(显然)接受发送给它的数据,还应当处理for循环耗尽时的StopIteration。显然,仅仅写for x in coro: yield x是不够的。下面是一个可行的版本。

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)  # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass

或者,我们可以这样做。

def writer_wrapper(coro):
    yield from coro

这样可以节省6行代码,使其更具可读性,并且可以正常工作。魔法!

使用yield from将数据发送到生成器:第2部分,异常处理

让我们使其更加复杂。如果我们的writer需要处理异常怎么办?假设writer要处理SpamException:遇到该异常时打印***。

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

如果我们不改变writer_wrapper怎么办?它行得通吗?我们试试吧

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>>  0
>>  1
>>  2
***
>>  4

# Actual Result
>>  0
>>  1
>>  2
Traceback (most recent call last):
  ... redacted ...
  File ... in writer_wrapper
    x = (yield)
__main__.SpamException

嗯,它不起作用,因为x = (yield)只是引发了异常,一切都崩溃了。让我们让它工作起来:手动捕获异常,并把它们发送或抛入子生成器(writer)中。

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

这可行。

# Result
>>  0
>>  1
>>  2
***
>>  4

但是,这也是!

def writer_wrapper(coro):
    yield from coro

yield from会透明地把发送的值或抛入的异常传递给子生成器。

但是,这仍然没有涵盖所有极端情况。如果外部生成器被关闭会发生什么?如果子生成器返回一个值(是的,在Python 3.3+中,生成器可以返回值),返回值又该如何传播?yield from能透明地处理所有这些极端情况,着实令人印象深刻:它就是能神奇地工作并处理所有这些情况。

我个人认为yield from是一个糟糕的关键字选择,因为它没有让双向性显而易见。曾有人提出其他关键字(例如delegate),但被拒绝了,因为向语言中添加新关键字比组合现有关键字困难得多。

总之,最好将yield from视为调用方和子生成器之间的一条透明双向通道(transparent two-way channel)。

参考文献:

  1. PEP 380-委派给子生成器的语法(尤因)[v3.3,2009-02-13]
  2. PEP 342-通过增强型生成器进行协同程序(GvR,Eby)[v2.5,2005-05-10]

Let’s get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let’s face it, if all yield from does is expand the for loop, then it does not warrant adding yield from to the language and preclude a whole bunch of new features from being implemented in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is “transparent” in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).

  • The connection is “bidirectional” in the sense that data can be both sent from and to a generator.

(If we were talking about TCP, yield from g might mean “now temporarily disconnect my client’s socket and reconnect it to this other server socket”.)

BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first—they’re very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley’s Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.

Reading data from a generator using yield from

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

Instead of manually iterating over reader(), we can just yield from it.

def reader_wrapper(g):
    yield from g

That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life changing.

Sending data to a generator (coroutine) using yield from – Part 1

Now let’s do something more interesting. Let’s create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>>  0
>>  1
>>  2
>>  3

The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration when the for loop is exhausted. Evidently just doing for x in coro: yield x won’t do. Here is a version that works.

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)  # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass

Or, we could do this.

def writer_wrapper(coro):
    yield from coro

That saves 6 lines of code, makes it much more readable, and it just works. Magic!

Sending data to a generator using yield from – Part 2 – Exception handling

Let’s make it more complicated. What if our writer needs to handle exceptions? Let’s say the writer handles a SpamException and it prints *** if it encounters one.

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

What if we don’t change writer_wrapper? Does it work? Let’s try

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>>  0
>>  1
>>  2
***
>>  4

# Actual Result
>>  0
>>  1
>>  2
Traceback (most recent call last):
  ... redacted ...
  File ... in writer_wrapper
    x = (yield)
__main__.SpamException

Um, it’s not working because x = (yield) just raises the exception and everything comes to a crashing halt. Let’s make it work, but manually handling exceptions and sending them or throwing them into the sub-generator (writer)

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

This works.

# Result
>>  0
>>  1
>>  2
***
>>  4

But so does this!

def writer_wrapper(coro):
    yield from coro

The yield from transparently handles sending the values or throwing values into the sub-generator.

This still does not cover all the corner cases though. What happens if the outer generator is closed? What about the case when the sub-generator returns a value (yes, in Python 3.3+, generators can return values), how should the return value be propagated? That yield from transparently handles all the corner cases is really impressive. yield from just magically works and handles all those cases.

I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. There were other keywords proposed (like delegate), but they were rejected because adding a new keyword to the language is much more difficult than combining existing ones.

In summary, it’s best to think of yield from as a transparent two way channel between the caller and the sub-generator.

References:

  1. PEP 380 – Syntax for delegating to a sub-generator (Ewing) [v3.3, 2009-02-13]
  2. PEP 342 – Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]
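One of the corner cases mentioned above, a sub-generator returning a value, can be sketched like this (Python 3.3+):

```python
def subgen():
    yield 1
    yield 2
    return "done"                  # becomes StopIteration.value

def delegator():
    result = yield from subgen()   # yield from hands the return value back
    yield f"subgen said: {result}"

out = list(delegator())
print(out)
```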

回答 1

在什么情况下“yield from”是有用的?

您遇到这样的循环的每种情况:

for x in subgenerator:
  yield x

正如PEP所描述的,这是使用子生成器的一种相当幼稚的尝试,它缺少几个方面,特别是对PEP 342引入的.throw()/.send()/.close()机制的妥善处理。要正确做到这一点,需要相当复杂的代码。

什么是经典用例?

考虑您要从递归数据结构中提取信息。假设我们要获取树中的所有叶节点:

def traverse_tree(node):
  if not node.children:
    yield node
  for child in node.children:
    yield from traverse_tree(child)

更重要的是,在有yield from之前,没有简单的方法来重构生成器代码。假设您有一个(无意义的)生成器,如下所示:

def get_list_values(lst):
  for item in lst:
    yield int(item)
  for item in lst:
    yield str(item)
  for item in lst:
    yield float(item)

现在,您决定把这些循环拆分成单独的生成器。没有yield from,这会很丑陋,丑到让您再三考虑是否真的要这么做。有了yield from,代码实际上看起来很不错:

def get_list_values(lst):
  for sub in [get_list_values_as_int, 
              get_list_values_as_str, 
              get_list_values_as_float]:
    yield from sub(lst)

为什么与微线程相比?

我认为 PEP 中的这一部分谈论的是,每个生成器确实都有其自己的隔离执行上下文。再加上分别使用 yield 和 __next__() 在生成器迭代器和调用者之间切换执行这一事实,这类似于线程:操作系统会不时切换正在执行的线程及其执行上下文(堆栈、寄存器……)。

其效果也相当:生成器迭代器和调用者都同时在其执行状态中进行,它们的执行是交错的。例如,如果生成器进行某种计算,并且调用方打印出结果,则结果可用时,您将立即看到它们。这是一种并发形式。

这种类比不是特定于的yield from-而是Python中生成器的一般属性。

What are the situations where “yield from” is useful?

Every situation where you have a loop like this:

for x in subgenerator:
  yield x

As the PEP describes, this is a rather naive attempt at using the subgenerator, it’s missing several aspects, especially the proper handling of the .throw()/.send()/.close() mechanisms introduced by PEP 342. To do this properly, rather complicated code is necessary.
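As a small illustration (not from the PEP itself) of what the naive for-loop misses, the sketch below shows `send()` passing transparently through `yield from`; the accumulator is a made-up example:

```python
def accumulator():
    # a coroutine-style generator: receives values via send()
    total = 0
    while True:
        value = yield total
        total += value

def wrapper():
    # the naive `for x in accumulator(): yield x` would NOT forward
    # send() to the sub-generator; yield from does
    yield from accumulator()

coro = wrapper()
print(next(coro))     # 0  (prime the generator)
print(coro.send(10))  # 10
print(coro.send(5))   # 15
```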

What is the classic use case?

Consider that you want to extract information from a recursive data structure. Let’s say we want to get all leaf nodes in a tree:

def traverse_tree(node):
  if not node.children:
    yield node
  for child in node.children:
    yield from traverse_tree(child)

Even more important is the fact that until the yield from, there was no simple method of refactoring the generator code. Suppose you have a (senseless) generator like this:

def get_list_values(lst):
  for item in lst:
    yield int(item)
  for item in lst:
    yield str(item)
  for item in lst:
    yield float(item)

Now you decide to factor out these loops into separate generators. Without yield from, this is ugly, up to the point where you will think twice whether you actually want to do it. With yield from, it’s actually nice to look at:

def get_list_values(lst):
  for sub in [get_list_values_as_int, 
              get_list_values_as_str, 
              get_list_values_as_float]:
    yield from sub(lst)

Why is it compared to micro-threads?

I think what this section in the PEP is talking about is that every generator does have its own isolated execution context. Together with the fact that execution is switched between the generator-iterator and the caller using yield and __next__(), respectively, this is similar to threads, where the operating system switches the executing thread from time to time, along with the execution context (stack, registers, …).

The effect of this is also comparable: Both the generator-iterator and the caller progress in their execution state at the same time, their executions are interleaved. For example, if the generator does some kind of computation and the caller prints out the results, you’ll see the results as soon as they’re available. This is a form of concurrency.

That analogy isn’t anything specific to yield from, though – it’s rather a general property of generators in Python.
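A tiny sketch of the interleaving described above; the `log` list and the function names are illustrative only:

```python
log = []

def producer():
    for i in range(3):
        log.append("produce %d" % i)
        yield i

# consumer and producer advance in lockstep: each value is consumed
# as soon as it is produced, before the next one is computed
for value in producer():
    log.append("consume %d" % value)

print(log)
# ['produce 0', 'consume 0', 'produce 1', 'consume 1', 'produce 2', 'consume 2']
```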


回答 2

无论您在生成器内部的什么位置调用另一个生成器,都需要一个“泵”来重新 yield 这些值:for v in inner_generator: yield v。正如 PEP 所指出的,这里有一些大多数人都会忽略的微妙复杂性,例如 throw() 这样的非局部流程控制。新语法 yield from inner_generator 可以用在您以前编写显式 for 循环的任何地方。它不仅仅是语法糖:它还处理了 for 循环所忽略的所有极端情况。正是这种“甜味”会鼓励人们使用它,从而获得正确的行为。

讨论线程中的此消息讨论了以下复杂性:

有了PEP 342引入的其他生成器功能,情况已不再如此:如Greg的PEP中所述,简单的迭代不正确地支持send()和throw()。当分解它们时,支持send()和throw()所需的体操实际上并不那么复杂,但是它们也不是简单的。

除了指出生成器是一种并行机制之外,我无法对微线程做更多比较。您可以将挂起的生成器视为一个线程,它通过 yield 向消费者线程发送值。实际的实现可能完全不是这样(实际实现显然是 Python 开发人员非常关心的),但这与用户无关。

新的 yield from 语法并没有在线程方面为语言增加任何新能力,它只是让正确使用现有功能变得更容易。或者更准确地说,它让新手在使用专家编写的复杂内部生成器时,可以更轻松地穿过该生成器,而不破坏其任何复杂功能。

Wherever you invoke a generator from within a generator you need a “pump” to re-yield the values: for v in inner_generator: yield v. As the PEP points out there are subtle complexities to this which most people ignore. Non-local flow-control like throw() is one example given in the PEP. The new syntax yield from inner_generator is used wherever you would have written the explicit for loop before. It’s not merely syntactic sugar, though: It handles all of the corner cases that are ignored by the for loop. Being “sugary” encourages people to use it and thus get the right behaviors.

This message in the discussion thread talks about these complexities:

With the additional generator features introduced by PEP 342, that is no longer the case: as described in Greg’s PEP, simple iteration doesn’t support send() and throw() correctly. The gymnastics needed to support send() and throw() actually aren’t that complex when you break them down, but they aren’t trivial either.

I can’t speak to a comparison with micro-threads, other than to observe that generators are a type of paralellism. You can consider the suspended generator to be a thread which sends values via yield to a consumer thread. The actual implementation may be nothing like this (and the actual implementation is obviously of great interest to the Python developers) but this does not concern the users.

The new yield from syntax does not add any additional capability to the language in terms of threading, it just makes it easier to use existing features correctly. Or more precisely it makes it easier for a novice consumer of a complex inner generator written by an expert to pass through that generator without breaking any of its complex features.


回答 3

一个简短的示例将帮助您理解 yield from 的一个用例:从另一个生成器获取值

def flatten(sequence):
    """flatten a multi level list or something
    >>> list(flatten([1, [2], 3]))
    [1, 2, 3]
    >>> list(flatten([1, [2], [3, [4]]]))
    [1, 2, 3, 4]
    """
    for element in sequence:
        if hasattr(element, '__iter__'):
            yield from flatten(element)
        else:
            yield element

print(list(flatten([1, [2], [3, [4]]])))

A short example will help you understand one of yield from‘s use case: get value from another generator

def flatten(sequence):
    """flatten a multi level list or something
    >>> list(flatten([1, [2], 3]))
    [1, 2, 3]
    >>> list(flatten([1, [2], [3, [4]]]))
    [1, 2, 3, 4]
    """
    for element in sequence:
        if hasattr(element, '__iter__'):
            yield from flatten(element)
        else:
            yield element

print(list(flatten([1, [2], [3, [4]]])))

回答 4

yield from 基本上以有效的方式链接迭代器:

# chain from itertools:
def chain(*iters):
    for it in iters:
        for item in it:
            yield item

# with the new keyword
def chain(*iters):
    for it in iters:
        yield from it

如您所见,它删除了一个纯Python循环。这几乎就是它的全部工作,但是链接迭代器是Python中很常见的模式。

线程基本上是一种功能,使您可以在完全随机的点跳出函数,然后跳回另一个函数的状态。线程管理器经常执行此操作,因此该程序似乎可以同时运行所有这些功能。问题是这些点是随机的,因此您需要使用锁定来防止主管在有问题的点停止该功能。

在这种意义上,生成器与线程非常相似:它们允许您指定特定的点(即 yield 所在之处),您可以在这些点跳入和跳出。以这种方式使用时,生成器被称为协程。

阅读有关Python中协程的出色教程,以了解更多详细信息

yield from basically chains iterators in an efficient way:

# chain from itertools:
def chain(*iters):
    for it in iters:
        for item in it:
            yield item

# with the new keyword
def chain(*iters):
    for it in iters:
        yield from it

As you can see it removes one pure Python loop. That’s pretty much all it does, but chaining iterators is a pretty common pattern in Python.
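A quick usage sketch of the `yield from` version; the inputs are arbitrary, chosen to show it accepts any mix of iterables:

```python
def chain(*iters):
    for it in iters:
        yield from it

# works lazily with any mix of iterables
print(list(chain("ab", [1, 2], range(3))))  # ['a', 'b', 1, 2, 0, 1, 2]
```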

Threads are basically a feature that allow you to jump out of functions at completely random points and jump back into the state of another function. The thread supervisor does this very often, so the program appears to run all these functions at the same time. The problem is that the points are random, so you need to use locking to prevent the supervisor from stopping the function at a problematic point.

Generators are pretty similar to threads in this sense: They allow you to specify specific points (whenever they yield) where you can jump in and out. When used this way, generators are called coroutines.

Read this excellent tutorials about coroutines in Python for more details


回答 5

在异步 IO 协程的实际使用中,yield from 的行为与协程函数中的 await 类似:两者都用于挂起协程的执行。

对于 asyncio,如果不需要支持较旧的 Python 版本(即只需支持 3.5+),则建议使用 async def/await 语法来定义协程,因此协程中不再需要 yield from。

但通常在asyncio之外,如先前答案中所述,yield from <sub-generator>在迭代子生成器方面还有其他用途。

In applied usage for asyncio coroutines, yield from has similar behavior to await in a coroutine function: both are used to suspend the execution of the coroutine.

For Asyncio, if there’s no need to support an older Python version (i.e. >3.5), async def/await is the recommended syntax to define a coroutine. Thus yield from is no longer needed in a coroutine.

But in general outside of asyncio, yield from <sub-generator> has still some other usage in iterating the sub-generator as mentioned in the earlier answer.
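A minimal sketch of the modern syntax (assumes Python 3.7+ for `asyncio.run`; the coroutine itself is illustrative):

```python
import asyncio

async def fetch():
    # `await` here plays the role that `yield from` played in
    # pre-3.5 @asyncio.coroutine-style code
    await asyncio.sleep(0)  # a suspension point
    return 42

print(asyncio.run(fetch()))  # 42
```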


回答 6

该代码定义了一个函数 fixed_sum_digits,它返回一个生成器,枚举所有数字之和为 20 的六位数字串。

def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:  
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i,deepness - 1,myString + str(i),Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0,digits,"",Tot) 

试着不用 yield from 来写它。如果您找到了有效的方法,请告诉我。

我认为对于访问树这类情况,yield from 使代码更简单、更清晰。

This code defines a function fixed_sum_digits returning a generator that enumerates all six-digit strings whose digits sum to 20.

def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:  
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i,deepness - 1,myString + str(i),Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0,digits,"",Tot) 

Try to write it without yield from. If you find an effective way to do it let me know.

I think that for cases like this one: visiting trees, yield from makes the code simpler and cleaner.
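A usage sketch (repeating the definitions so it runs standalone); the expected first value below is my own observation of this code's depth-first, ascending-digit order, not the author's claim:

```python
def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:
        # prune digits that would overshoot the target sum
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i, deepness - 1, myString + str(i), Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0, digits, "", Tot)

gen = fixed_sum_digits(6, 20)
print(next(gen))  # '000299' -- the lexicographically smallest result
```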


回答 7

简而言之,yield from 为迭代器函数提供了尾递归。

Simply put, yield from provides tail recursion for iterator functions.


“ yield”关键字有什么作用?

问题:“ yield”关键字有什么作用?

yield关键字在Python中的用途是什么?

例如,我试图理解这段代码1

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

这是调用方法:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

调用方法 _get_child_candidates 时会发生什么?是返回一个列表?还是单个元素?它会被再次调用吗?后续调用何时停止?


1.这段代码由 Jochen Schulz(jrschulz)编写,他制作了一个很棒的用于度量空间的 Python 库。这是完整源代码的链接:Module mspace

What is the use of the yield keyword in Python, and what does it do?

For example, I’m trying to understand this code1:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

And this is the caller:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When will subsequent calls stop?


1. This piece of code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.


回答 0

要了解其yield作用,您必须了解什么是生成器。而且,在您了解生成器之前,您必须了解iterables

可迭代

创建列表时,可以一一阅读它的项目。逐一读取其项称为迭代:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist是一个可迭代的。当您使用列表推导时,您将创建一个列表,因此是可迭代的:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

您可以对其使用“for... in...”的一切都是可迭代的:列表、字符串、文件……

这些可迭代对象很方便,因为您可以随意读取它们,但所有的值都存储在内存中;当值很多时,这并不总是您想要的。

生成器

生成器是迭代器,是一种只能迭代一次的可迭代对象。生成器不会将所有值存储在内存中,而是即时生成值:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

这与前面相同,只是您使用了 () 代替 []。但是,由于生成器只能使用一次,您无法第二次执行 for i in mygenerator:生成器逐个计算出 0,然后忘掉它,再计算 1,最后计算 4。

Yield

yield 是一个用法类似 return 的关键字,不同之处在于函数将返回一个生成器。

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

这是一个无用的示例,但是当您知道函数将返回大量的值(只需要读取一次)时,它就很方便。

要掌握yield,您必须了解在调用函数时,在函数主体中编写的代码不会运行。该函数仅返回生成器对象,这有点棘手:-)

然后,每次 for 使用该生成器时,您的代码都会从上次中断的地方继续。

现在最困难的部分是:

第一次 for 调用从您的函数创建的生成器对象时,它将从头开始运行函数中的代码,直到命中 yield,然后返回循环的第一个值。此后,每次后续调用都会运行您在函数中编写的循环的下一轮迭代,并返回下一个值。这会一直持续到生成器被认为是空的为止,即函数运行结束而不再命中 yield。这可能是因为循环已经结束,或者是因为不再满足某个 "if/else" 条件。


您的代码说明

生成器:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

调用方法:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

该代码包含几个智能部分:

  • 循环在一个列表上迭代,但列表在迭代过程中不断扩展 :-) 这是遍历所有这些嵌套数据的一种简洁方法,尽管有点危险,因为可能会陷入无限循环。在这种情况下,candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) 会耗尽生成器的所有值,但 while 循环会不断创建新的生成器对象;由于它们不是作用在同一个节点上,因此会产生与之前不同的值。

  • extend() 是列表对象的方法,它接受一个可迭代对象,并将其中的值添加到列表中。

通常我们将一个列表传递给它:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

但是在您的代码中,它得到了一个生成器,这很好,因为:

  1. 您无需两次读取值。
  2. 您可能有很多孩子,并且您不希望所有孩子都存储在内存中。

它之所以有效,是因为 Python 不在乎方法的参数是否为列表。Python 期望的是可迭代对象,因此它适用于字符串、列表、元组和生成器!这就是所谓的鸭子类型(duck typing),也是 Python 如此酷的原因之一。但这是另一个故事,属于另一个问题……

您可以在这里停止,或者阅读一点以了解生成器的高级用法:

控制生成器耗尽

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

注意:对于 Python 3,请使用 print(corner_street_atm.__next__()) 或 print(next(corner_street_atm))

对于诸如控制对资源的访问之类的各种事情,它可能很有用。

Itertools,您最好的朋友

itertools模块包含用于操纵可迭代对象的特殊功能。曾经希望复制一个生成器吗?链接两个生成器?用一行代码对嵌套列表中的值进行分组?不创建另一个列表就进行 Map/Zip?

然后就import itertools

一个例子?让我们看一下四马比赛的可能到达顺序:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

了解迭代的内部机制

迭代是一个涉及可迭代对象(实现 __iter__() 方法)和迭代器(实现 __next__() 方法)的过程。可迭代对象是任何可以从中获取迭代器的对象;迭代器则是让您可以在可迭代对象上进行迭代的对象。

这篇关于 for 循环如何工作的文章提供了更多相关信息。

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

Everything you can use “for... in...” on is an iterable; lists, strings, files…

These iterables are handy because you can read them as much as you wish, but you store all the values in memory and this is not always what you want when you have a lot of values.

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it’s a useless example, but it’s handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)

Then, your code will continue from where it left off each time for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it’ll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".
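A minimal sketch of this behavior (the `log` list is only there to make the lazy execution visible):

```python
log = []

def create_generator():
    log.append("body started")
    yield 1
    log.append("resumed after first yield")
    yield 2

g = create_generator()  # nothing in the body has run yet
assert log == []
print(next(g))          # runs the body up to the first yield -> 1
assert log == ["body started"]
print(next(g))          # resumes after the first yield -> 2
assert log == ["body started", "resumed after first yield"]
```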


Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated :-) It’s a concise way to go through all these nested data even if it’s a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhaust all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it’s not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don’t need to read the values twice.
  2. You may have a lot of children and you don’t want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question…
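A small sketch of the duck-typing point: `extend()` accepts any iterable, not just lists:

```python
out = []
out.extend([1, 2])                    # a list
out.extend("ab")                      # a string yields its characters
out.extend(x * x for x in range(3))   # a generator, consumed lazily
print(out)  # [1, 2, 'a', 'b', 0, 1, 4]
```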

You can stop here, or read a little bit to see an advanced use of a generator:

Controlling a generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm))

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let’s see the possible orders of arrival for a four-horse race:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.

There is more about it in this article about how for loops work.


回答 1

理解的捷径 yield

当您看到带有yield语句的函数时,请应用以下简单技巧,以了解将发生的情况:

  1. result = []在函数的开头插入一行。
  2. 替换每个yield exprresult.append(expr)
  3. return result在函数底部插入一行。
  4. 是的-不再yield声明!阅读并找出代码。
  5. 将功能与原始定义进行比较。

这个技巧可能会让您对 yield 函数背后的逻辑有所了解,但实际发生的事情与基于列表的方法有显著不同。在许多情况下,yield 方法的内存效率更高,速度也更快。在其他情况下,即使原始函数运行正常,这个技巧也会让您陷入无限循环。请继续阅读以了解更多……

不要混淆您的Iterable,Iterators和Generators

首先,迭代器协议 -当您编写时

for x in mylist:
    ...loop body...

Python执行以下两个步骤:

  1. 获取一个迭代器 mylist

    调用 iter(mylist) -> 这将返回一个带有 next() 方法(Python 3 中为 __next__())的对象。

    [这是大多数人忘记告诉您的步骤]

  2. 使用迭代器遍历项目:

    在步骤 1 返回的迭代器上不断调用 next() 方法。next() 的返回值被赋给 x,然后执行循环体。如果 next() 内部引发了 StopIteration 异常,则意味着迭代器中没有更多的值,循环随之退出。

事实是,Python 在想要遍历对象内容的任何时候都会执行上述两个步骤,所以它可以是 for 循环,也可以是像 otherlist.extend(mylist) 这样的代码(其中 otherlist 是一个 Python 列表)。

mylist 是可迭代的,因为它实现了迭代器协议。在用户定义的类中,您可以实现 __iter__() 方法使类的实例可迭代。该方法应返回一个迭代器。迭代器是带有 next() 方法的对象。可以在同一个类上同时实现 __iter__() 和 next(),并让 __iter__() 返回 self。这适用于简单情况,但当您希望两个迭代器同时在同一个对象上循环时就不行了。

这就是迭代器协议,许多对象都实现了该协议:

  1. 内置列表,字典,元组,集合,文件。
  2. 实现的用户定义的类__iter__()
  3. 生成器。

请注意,for 循环并不知道它处理的是哪种对象,它只是遵循迭代器协议,并乐于在调用 next() 时逐项获取。内置列表逐个返回其元素,字典逐个返回键,文件逐行返回,等等。而生成器返回的是……这正是 yield 登场的地方:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

如果 f123() 中是三个 return 语句而不是 yield 语句,那么只有第一个会被执行,然后函数就会退出。但 f123() 不是普通函数。当 f123() 被调用时,它并不会返回 yield 语句中的任何值!它返回的是一个生成器对象。而且,该函数并没有真正退出,而是进入了挂起状态。当 for 循环尝试遍历生成器对象时,函数会从上次返回的那个 yield 之后的下一行恢复,执行下一行代码(在本例中是一个 yield 语句),并将其作为下一个元素返回。这会一直持续到函数退出,此时生成器会引发 StopIteration,循环随之退出。

因此,生成器对象有点像适配器-在一端,它通过公开__iter__()next()保持for循环满意的方法来展示迭代器协议。但是,在另一端,它仅运行该函数以从中获取下一个值,然后将其放回暂停模式。

为什么使用生成器?

通常,您可以编写不使用生成器但实现相同逻辑的代码。一种选择是使用我之前提到的临时列表“技巧”。这并非在所有情况下都可行,例如,如果您有无限循环,或者当您的列表很长时,这可能会导致内存使用效率低下。另一种方法是实现一个新的可迭代类SomethingIter,该类将状态保留在实例成员中,并在其next()(或__next__()Python 3)方法中执行下一个逻辑步骤。根据逻辑,next()方法中的代码可能最终看起来非常复杂并且容易出现错误。在这里,生成器提供了一种干净而简单的解决方案。

Shortcut to understanding yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function.
  4. Yay – no more yield statements! Read and figure out code.
  5. Compare function to the original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different than what happens in the list based approach. In many cases, the yield approach will be a lot more memory efficient and faster too. In other cases, this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more…
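Applying the trick to a tiny (hypothetical) generator makes the mapping concrete:

```python
# Original generator
def squares_gen(n):
    for i in range(n):
        yield i * i

# The same function after applying steps 1-3 of the trick
def squares_list(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

print(list(squares_gen(3)), squares_list(3))  # [0, 1, 4] [0, 1, 4]
```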

Don’t confuse your Iterables, Iterators, and Generators

First, the iterator protocol – when you write

for x in mylist:
    ...loop body...

Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over items:

    Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is Python performs the above two steps anytime it wants to loop over the contents of an object – so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).

Here mylist is an iterable because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.
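The two-iterators point is easy to see with a list (a minimal sketch):

```python
nums = [1, 2, 3]
it1 = iter(nums)
it2 = iter(nums)   # a second, independent iterator over the same object

print(next(it1))   # 1
print(next(it1))   # 2
print(next(it2))   # 1 -- it2 keeps its own position
```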

So that’s the iterator protocol, many objects implement this protocol:

  1. Built-in lists, dictionaries, tuples, sets, files.
  2. User-defined classes that implement __iter__().
  3. Generators.

Note that a for loop doesn’t know what kind of object it’s dealing with – it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return… well that’s where yield comes in:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

Instead of yield statements, if you had three return statements in f123() only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit – it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the very next line after the yield it previously returned from, executes the next line of code, in this case, a yield statement, and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.

So the generator object is sort of like an adapter – at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.

Why Use Generators?

Usually, you can write code that doesn’t use generators but implements the same logic. One option is to use the temporary list ‘trick’ I mentioned before. That will not work in all cases, e.g. if you have infinite loops, or it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps the state in instance members and performs the next logical step in its next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.
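A side-by-side sketch of the two approaches; `SquaresIter` is a hypothetical stand-in for the `SomethingIter` class mentioned above:

```python
class SquaresIter:
    """Hand-rolled iterator: state is kept in instance members."""
    def __init__(self, n):
        self.n = n
        self.i = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.i += 1
        if self.i >= self.n:
            raise StopIteration
        return self.i * self.i

def squares(n):
    """Same logic as a generator: state lives in local variables."""
    for i in range(n):
        yield i * i

print(list(SquaresIter(4)), list(squares(4)))  # [0, 1, 4, 9] [0, 1, 4, 9]
```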


回答 2

这样想:

迭代器只是“带有 next() 方法的对象”的一个花哨说法。因此,带 yield 的函数最终是这样的:

原始版本:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

这基本上是Python解释器使用上面的代码执行的操作:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

为了更深入地了解幕后发生的事情,可以将 for 循环重写为:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

这是否更有意义,还是会让您更加困惑?:)

我要指出,这为了说明的目的过于简单化。:)

Think of it this way:

An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:

Original version:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

This is basically what the Python interpreter does with the above code:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

For more insight as to what’s happening behind the scenes, the for loop can be rewritten to this:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

Does that make more sense or just confuse you more? :)

I should note that this is an oversimplification for illustrative purposes. :)
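For completeness, here is a Python 3 sketch of the same hand-written iterator (the method is named __next__ instead of next; It and make_it are illustrative names, not from the answer):

```python
# Python 3 version of the hand-written iterator class above.
class It:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    def __iter__(self):
        return self

    def __next__(self):  # was 'next' in Python 2
        self.count += 1
        if self.count < 4:
            return self.count
        # Signal that the iterator is done.
        raise StopIteration

def make_it():
    return It()

for i in make_it():
    print(i)  # prints 0, 1, 2, 3
```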


回答 3

yield关键字被减少到两个简单的事实:

  1. 如果编译器在函数内部的任何位置检测到 yield 关键字,则该函数不再通过 return 语句返回。相反,它会立即返回一个惰性的“待处理列表”对象,称为生成器(generator)。
  2. 生成器是可迭代的。什么是可迭代的?就像 list、set、range 或 dict 视图一样,它带有用于按特定顺序访问每个元素的内置协议。

简而言之:生成器是一个惰性的、按需递增生成的列表,而 yield 语句允许您用函数的写法来描述生成器应逐步吐出的列表值。

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

示例

让我们定义一个类似于 Python range 的函数 makeRange。调用 makeRange(n) 会返回一个生成器:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

要强制生成器立即返回其待处理的值,可以将其传递给 list()(就像对任何可迭代对象一样):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

将示例与“仅返回列表”进行比较

可以将上面的示例视为仅创建一个列表,并将其附加并返回:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

但是,有一个主要区别。请参阅最后一节。


您如何使用生成器

可迭代是列表理解的最后一部分,并且所有生成器都是可迭代的,因此经常像这样使用它们:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

为了更好地体会生成器,您可以试用 itertools 模块(在需要时一定要使用 chain.from_iterable 而不是 chain)。例如,您甚至可以用生成器来实现无限长的惰性列表,例如 itertools.count()。您可以实现自己的 def enumerate(iterable): return zip(count(), iterable),也可以在 while 循环中使用 yield 关键字来实现。

请注意:生成器实际上可以用于更多事情,例如实现协程或不确定性编程或其他优雅的事情。但是,我在这里提出的“惰性列表”观点是您会发现的最常见用法。


幕后花絮

这就是“Python 迭代协议”的工作方式,也就是当您执行 list(makeRange(5)) 时发生的事情。这就是我之前所说的“惰性的、按需递增的列表”。

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

内置函数 next() 只是调用对象的 .next()(Python 3 中为 .__next__())方法,它是所有迭代器都具备的“迭代协议”的一部分。您可以手动使用 next() 函数(以及迭代协议的其他部分)来实现一些花哨的功能,但通常会牺牲可读性,因此请尽量避免这样做。


细节

通常,大多数人不会关心以下区别,并且可能想在这里停止阅读。

用 Python 的话说,可迭代对象(iterable)是任何“理解 for 循环概念”的对象,例如列表 [1,2,3];而迭代器(iterator)是所请求的 for 循环的一个具体实例,例如 [1,2,3].__iter__()。生成器与任何迭代器完全相同,区别只在于它的编写方式(使用函数语法)。

当您从列表请求迭代器时,它会创建一个新的迭代器。但是,当您从迭代器请求迭代器时(很少这样做),它只会把自身返回给您。

因此,在极少数情况下,您可能无法执行此类操作…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

…然后请记住生成器就是迭代器,也就是说,它是一次性的。如果要重用它,应再次调用 myRange(...)。如果需要使用结果两次,请把结果转换为列表并存入变量:x = list(myRange(5))。那些确实需要克隆生成器的人(例如正在进行骇人的元编程的人)可以在绝对必要时使用 itertools.tee,因为可复制迭代器的 Python PEP 标准提案已被推迟。

The yield keyword is reduced to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy “pending list” object called a generator.
  2. A generator is iterable. What is an iterable? It’s anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.

In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

Example

Let’s define a function makeRange that’s just like Python’s range. Calling makeRange(n) RETURNS A GENERATOR:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

Comparing example to “just returning a list”

The above example can be thought of as merely creating a list which you append to and return:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

There is one major difference, though; see the last section.


How you might use generators

An iterable is the last part of a list comprehension, and all generators are iterable, so they’re often used like so:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely-long lazy lists like itertools.count(). You could implement your own enumerate as def enumerate(iterable): return zip(count(), iterable), or alternatively do so with the yield keyword in a while-loop.
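As a sketch of the two enumerate implementations suggested above (the names enumerate_zip and enumerate_yield are made up to avoid shadowing the builtin):

```python
from itertools import count

# zip-based version: pair an infinite counter with the iterable.
def enumerate_zip(iterable):
    return zip(count(), iterable)

# yield-based version with an explicit while loop.
def enumerate_yield(iterable):
    i = 0
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except StopIteration:
            return  # ends the generator
        yield (i, item)
        i += 1

print(list(enumerate_zip("ab")))    # [(0, 'a'), (1, 'b')]
print(list(enumerate_yield("ab")))  # [(0, 'a'), (1, 'b')]
```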

Please note: generators can actually be used for many more things, such as implementing coroutines or non-deterministic programming or other elegant things. However, the “lazy lists” viewpoint I present here is the most common use you will find.


Behind the scenes

This is how the “Python iteration protocol” works. That is, what is going on when you do list(makeRange(5)). This is what I described earlier as a “lazy, incremental list”.

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The built-in function next() just calls the object’s .next() (.__next__() in Python 3) function, which is a part of the “iteration protocol” and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that…


Minutiae

Normally, most people would not care about the following distinctions and probably want to stop reading here.

In Python-speak, an iterable is any object which “understands the concept of a for-loop” like a list [1,2,3], and an iterator is a specific instance of the requested for-loop like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).

When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just gives you itself back; an iterator is its own iterator.

Thus, in the unlikely event that you are failing to do something like this…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

… then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable x = list(myRange(5)). Those who absolutely need to clone a generator (for example, who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.
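For the itertools.tee escape hatch mentioned above, a minimal sketch (note that tee buffers values internally, so it does not save memory over list() when one copy runs far ahead of the other):

```python
from itertools import tee

def gen():
    yield from (0, 1, 2)

# tee splits one iterator into two independent ones.
a, b = tee(gen())
print(list(a))  # [0, 1, 2]
print(list(b))  # [0, 1, 2]  (tee buffered the values for b)
```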


回答 4

yield 关键字在 Python 中是做什么的?

答案大纲/摘要

  • 带有 yield 的函数在被调用时会返回一个生成器(Generator)。
  • 生成器是迭代器,因为它们实现了迭代器协议,所以您可以对其进行迭代。
  • 也可以向生成器发送信息,使其在概念上成为协程。
  • 在 Python 3 中,您可以使用 yield from 将一个生成器双向委托给另一个生成器。
  • (附录点评了几个答案,包括排名最高的那个,并讨论了在生成器中使用 return 的问题。)

生成器:

yield 仅在函数定义内部合法,而函数定义中包含 yield 会使该函数返回一个生成器。

生成器的想法来自其他语言(见脚注 1),实现方式各不相同。在 Python 的生成器中,代码的执行会在 yield 处冻结。当调用生成器时(方法在下面讨论),执行恢复,然后在下一个 yield 处再次冻结。

yield 提供了一种实现迭代器协议的简便方法。该协议由以下两种方法定义:__iter__ 和 next(Python 2)或 __next__(Python 3)。这两种方法使对象成为迭代器,您可以用 collections 模块中的 Iterator 抽象基类对其进行类型检查。

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

生成器类型是迭代器的子类型:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

并且如有必要,我们可以像这样进行类型检查:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

Iterator 的一个特性是:一旦耗尽,您就无法重用或重置它:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

如果要再次使用其功能,则必须再创建一个(见脚注 2):

>>> list(func())
['I am', 'a generator!']

一个人可以通过编程方式产生数据,例如:

def func(an_iterable):
    for item in an_iterable:
        yield item

上面的简单生成器也等效于下面的生成器-从Python 3.3开始(在Python 2中不可用),您可以使用yield from

def func(an_iterable):
    yield from an_iterable

但是,yield from还允许委派给子生成器,这将在以下有关使用子协程进行合作委派的部分中进行解释。

协程:

yield 形成一个表达式,该表达式允许将数据发送到生成器中(请参见脚注3)

这是一个示例,请注意该received变量,该变量将指向发送到生成器的数据:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

首先,我们必须用内置函数 next 启动生成器。它会根据您使用的 Python 版本调用相应的 next 或 __next__ 方法:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

现在我们可以将数据发送到生成器中。(发送 None 与调用 next 相同。):

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

使用 yield from 协作委托给子协程

现在,回想一下 yield from 在 Python 3 中可用。它使我们可以将协程委托给子协程:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

现在我们可以将功能委派给子生成器,并且生成器可以像上面一样使用它:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

您可以在 PEP 380 中阅读 yield from 的精确语义。

其他方法:close 和 throw

close 方法会在函数执行被冻结的位置引发 GeneratorExit。它也会被 __del__ 调用,因此您可以把清理代码放在处理 GeneratorExit 的地方:

>>> my_account.close()

您还可以引发异常,该异常可以在生成器中处理或传播回用户:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError

结论

我相信我已经涵盖了以下问题的各个方面:

yield 关键字在 Python 中是做什么的?

事实证明,yield 做了很多事情。我相信我还可以为此添加更详尽的示例。如果您想要更多内容或有建设性的批评,请在下面评论告诉我。


附录:

对排名最高/已接受答案的点评

  • 它在解释什么使对象可迭代时含糊不清,仅以列表为例。请参阅上面我的参考资料,但总而言之:可迭代对象具有返回迭代器的 __iter__ 方法。迭代器提供 .next(Python 2)或 .__next__(Python 3)方法,for 循环会隐式调用它,直到它引发 StopIteration;而且一旦引发,之后就会继续引发。
  • 然后它用生成器表达式来描述什么是生成器。由于生成器只是创建迭代器的一种简便方式,这只会让问题更混乱,而我们甚至还没谈到 yield 本身。
  • 在“控制生成器耗尽”一节中,他调用了 .next 方法,而应该使用内置函数 next。使用内置函数才是恰当的间接层,因为他的代码在 Python 3 中无法工作。
  • itertools?这与 yield 的作用毫无关系。
  • 没有讨论 yield 提供的方法,也没有讨论 Python 3 中的新功能 yield from。排名最高/已接受的答案是非常不完整的答案。

yield生成器表达或理解中提出的答案的评论。

该语法当前允许列表理解中的任何表达式。

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

由于 yield 是一个表达式,有人鼓吹在推导式或生成器表达式中使用它会很有趣,尽管并没有举出特别好的用例。

CPython 核心开发人员正在讨论弃用这种允许用法。以下是邮件列表中的相关帖子:

2017年1月30日19:05,布雷特·坎农写道:

2017年1月29日星期日,克雷格·罗德里格斯(Craig Rodrigues)在星期日写道:

两种方法我都可以接受。恕我直言,让 Python 3 保持现状是不好的。

我的投票是将其改为 SyntaxError,因为这种语法得不到您期望的结果。

我同意这对我们来说是一个明智的选择,因为依赖当前行为的任何代码确实太聪明了,无法维护。

在到达目的地方面,我们可能需要:

  • 3.7中的语法警告或弃用警告
  • 2.7.x中的Py3k警告
  • 3.8中的SyntaxError

干杯,尼克。

-Nick Coghlan | gmail.com上的ncoghlan | 澳大利亚布里斯班

此外,还有一个悬而未决的问题(10544),似乎正说明这绝不是一个好主意(PyPy,用Python编写的Python实现,已经在发出语法警告。)

最重要的是,直到CPython的开发人员另行告诉我们为止:不要放入yield生成器表达式或理解。

生成器中的 return 语句

在 Python 2 中:

在生成器函数中,return 语句不允许包含 expression_list。在这种情况下,裸 return 表示生成器已完成,并会导致引发 StopIteration。

expression_list 基本上是由逗号分隔的任意数量的表达式。本质上,在 Python 2 中,您可以用 return 停止生成器,但不能返回值。

在 Python 3 中:

在生成器函数中,return 语句表示生成器已完成,并会导致引发 StopIteration。返回的值(如果有)用作构造 StopIteration 的参数,并成为 StopIteration.value 属性。

脚注

  1. 提案中引用了CLU,Sather和Icon语言,以将生成器的概念引入Python。总体思路是,一个函数可以维护内部状态并根据用户的需要产生中间数据点。这有望在性能上优于其他方法,包括Python线程,该方法甚至在某些系统上不可用。

  2. 例如,这意味着 xrange 对象(Python 3 中的 range)不是迭代器,即使它们是可迭代的,因为它们可以被重用。像列表一样,它们的 __iter__ 方法会返回迭代器对象。

  3. yield 最初是作为语句引入的,这意味着它只能出现在代码块中一行的开头;现在 yield 创建一个 yield 表达式。https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt 提出此更改是为了允许用户像接收数据一样向生成器发送数据。要发送数据,必须能够将其赋值给某个东西,为此,语句是行不通的。

What does the yield keyword do in Python?

Answer Outline/Summary

  • A function with yield, when called, returns a Generator.
  • Generators are iterators because they implement the iterator protocol, so you can iterate over them.
  • A generator can also be sent information, making it conceptually a coroutine.
  • In Python 3, you can delegate from one generator to another in both directions with yield from.
  • (Appendix critiques a couple of answers, including the top one, and discusses the use of return in a generator.)

Generators:

yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator.

The idea for generators comes from other languages (see footnote 1) with varying implementations. In Python’s Generators, the execution of the code is frozen at the point of the yield. When the generator is called (methods are discussed below) execution resumes and then freezes at the next yield.

yield provides an easy way of implementing the iterator protocol, defined by the following two methods: __iter__ and next (Python 2) or __next__ (Python 3). Both of those methods make an object an iterator that you could type-check with the Iterator Abstract Base Class from the collections module.

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

The generator type is a sub-type of iterator:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

And if necessary, we can type-check like this:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

A feature of an Iterator is that once exhausted, you can’t reuse or reset it:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

You’ll have to make another if you want to use its functionality again (see footnote 2):

>>> list(func())
['I am', 'a generator!']

One can yield data programmatically, for example:

def func(an_iterable):
    for item in an_iterable:
        yield item

The above simple generator is also equivalent to the below – as of Python 3.3 (and not available in Python 2), you can use yield from:

def func(an_iterable):
    yield from an_iterable

However, yield from also allows for delegation to subgenerators, which will be explained in the following section on cooperative delegation with sub-coroutines.

Coroutines:

yield forms an expression that allows data to be sent into the generator (see footnote 3)

Here is an example, take note of the received variable, which will point to the data that is sent to the generator:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

First, we must queue up the generator with the builtin function, next. It will call the appropriate next or __next__ method, depending on the version of Python you are using:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

And now we can send data into the generator. (Sending None is the same as calling next.) :

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

Cooperative Delegation to Sub-Coroutine with yield from

Now, recall that yield from is available in Python 3. This allows us to delegate coroutines to a subcoroutine:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

And now we can delegate functionality to a sub-generator and it can be used by a generator just as above:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

You can read more about the precise semantics of yield from in PEP 380.

Other Methods: close and throw

The close method raises GeneratorExit at the point the function execution was frozen. This will also be called by __del__ so you can put any cleanup code where you handle the GeneratorExit:

>>> my_account.close()

You can also throw an exception which can be handled in the generator or propagated back to the user:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError
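A thrown exception can also be caught inside the generator, in which case throw() resumes execution and returns the next yielded value. A minimal sketch (the resilient generator name is illustrative, not from the answer):

```python
def resilient():
    while True:
        try:
            yield "ok"
        except ValueError:
            # The exception raised by throw() lands at the suspended
            # yield and is handled here; we yield a recovery value.
            yield "recovered"

g = resilient()
print(next(g))              # ok
print(g.throw(ValueError))  # recovered
print(next(g))              # ok  (the generator keeps running)
```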

Conclusion

I believe I have covered all aspects of the following question:

What does the yield keyword do in Python?

It turns out that yield does a lot. I’m sure I could add even more thorough examples to this. If you want more or have some constructive criticism, let me know by commenting below.


Appendix:

Critique of the Top/Accepted Answer

  • It is confused on what makes an iterable, just using a list as an example. See my references above, but in summary: an iterable has an __iter__ method returning an iterator. An iterator provides a .next (Python 2) or .__next__ (Python 3) method, which is implicitly called by for loops until it raises StopIteration, and once it does, it will continue to do so.
  • It then uses a generator expression to describe what a generator is. Since a generator is simply a convenient way to create an iterator, it only confuses the matter, and we still have not yet gotten to the yield part.
  • In Controlling a generator exhaustion he calls the .next method, when instead he should use the builtin function, next. It would be an appropriate layer of indirection, because his code does not work in Python 3.
  • Itertools? This was not relevant to what yield does at all.
  • No discussion of the methods that yield provides along with the new functionality yield from in Python 3. The top/accepted answer is a very incomplete answer.

Critique of answer suggesting yield in a generator expression or comprehension.

The grammar currently allows any expression in a list comprehension.

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

Since yield is an expression, it has been touted by some as interesting to use it in comprehensions or generator expressions – in spite of citing no particularly good use-case.

The CPython core developers are discussing deprecating its allowance. Here’s a relevant post from the mailing list:

On 30 January 2017 at 19:05, Brett Cannon wrote:

On Sun, 29 Jan 2017 at 16:39 Craig Rodrigues wrote:

I’m OK with either approach. Leaving things the way they are in Python 3 is no good, IMHO.

My vote is it be a SyntaxError since you’re not getting what you expect from the syntax.

I’d agree that’s a sensible place for us to end up, as any code relying on the current behaviour is really too clever to be maintainable.

In terms of getting there, we’ll likely want:

  • SyntaxWarning or DeprecationWarning in 3.7
  • Py3k warning in 2.7.x
  • SyntaxError in 3.8

Cheers, Nick.

— Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Further, there is an outstanding issue (10544) which seems to be pointing in the direction of this never being a good idea (PyPy, a Python implementation written in Python, is already raising syntax warnings.)

Bottom line, until the developers of CPython tell us otherwise: Don’t put yield in a generator expression or comprehension.

The return statement in a generator

In Python 2:

In a generator function, the return statement is not allowed to include an expression_list. In that context, a bare return indicates that the generator is done and will cause StopIteration to be raised.

An expression_list is basically any number of expressions separated by commas – essentially, in Python 2, you can stop the generator with return, but you can’t return a value.

In Python 3:

In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes the StopIteration.value attribute.
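The Python 3 behavior can be observed directly by catching StopIteration yourself; a small sketch (the function name finisher is illustrative):

```python
def finisher():
    yield 1
    return "all done"  # becomes StopIteration.value in Python 3

g = finisher()
print(next(g))  # 1
try:
    next(g)
except StopIteration as e:
    print(e.value)  # all done
```

This mechanism is also how `yield from` receives the return value of a delegated subgenerator.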

Footnotes

  1. The languages CLU, Sather, and Icon were referenced in the proposal to introduce the concept of generators to Python. The general idea is that a function can maintain internal state and yield intermediate data points on demand by the user. This promised to be superior in performance to other approaches, including Python threading, which isn’t even available on some systems.

  2. This means, for example, that xrange objects (range in Python 3) aren’t Iterators, even though they are iterable, because they can be reused. Like lists, their __iter__ methods return iterator objects.

  3. yield was originally introduced as a statement, meaning that it could only appear at the beginning of a line in a code block. Now yield creates a yield expression. https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt This change was proposed to allow a user to send data into the generator just as one might receive it. To send data, one must be able to assign it to something, and for that, a statement just won’t work.


回答 5

yield 就像 return:它返回您指定的内容(作为生成器)。不同之处在于,下次调用生成器时,执行会从上一次 yield 语句处继续。与 return 不同,发生 yield 时栈帧不会被清理,而是把控制权交还给调用方,因此下次调用该函数时,它的状态会恢复。

就您的代码而言,函数 get_child_candidates 的行为就像一个迭代器,因此当您扩展列表时,它会一次向新列表添加一个元素。

list.extend 会调用迭代器直到其耗尽。就您发布的代码示例而言,直接返回一个元组并把它追加到列表会清楚得多。

yield is just like return – it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.

In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.

list.extend calls an iterator until it’s exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.


回答 6

还有另外一件事要提及:yield的函数实际上不必终止。我写了这样的代码:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

然后我可以在其他代码中使用它:

for f in fib():
    if some_condition: break
    coolfuncs(f);

它确实有助于简化某些问题,并使某些事情更易于使用。

There’s one extra thing to mention: a function that yields doesn’t actually have to terminate. I’ve written code like this:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

Then I can use it in other code like this:

for f in fib():
    if some_condition: break
    coolfuncs(f);

It really helps simplify some problems, and makes some things easier to work with.
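Since the series is infinite, you take a finite slice rather than looping forever; one convenient way (using the same fib generator as above) is itertools.islice:

```python
from itertools import islice

def fib():
    last, cur = 0, 1
    while True:
        yield cur
        last, cur = cur, last + cur

# islice lazily takes the first 8 values from the infinite generator.
print(list(islice(fib(), 8)))  # [1, 1, 2, 3, 5, 8, 13, 21]
```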


回答 7

对于那些偏爱简单示例的人,请在此交互式Python会话中进行冥想:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

For those who prefer a minimal working example, meditate on this interactive Python session:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

回答 8

TL; DR

代替这个:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

做这个:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

每当您发现自己从头开始构建列表时,就改为逐个 yield 每一项。

这是我第一次对 yield 恍然大悟的时刻。


yield 是一种语法糖式的说法:

构建一系列的东西

相同的行为:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

不同的行为:

yield 是单次通过的:您只能迭代一次。当一个函数包含 yield 时,我们称其为生成器函数,而它返回的则是一个迭代器。这些术语很能说明问题。我们失去了容器的便利,但获得了一种按需计算、可以任意长的序列的能力。

yield 是惰性的,它推迟了计算。包含 yield 的函数在您调用它时实际上根本不会执行,而是返回一个记住中断位置的迭代器对象。每次您对迭代器调用 next()(这发生在 for 循环中),执行就会前进到下一个 yield。return 会引发 StopIteration 并结束序列(这是 for 循环的自然终点)。

yield 是多才多艺的。数据不必一次性全部存储,而可以一次提供一个,甚至可以是无限的。

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

如果您需要多次通过,而系列又不太长,只需调用list()它:

>>> list(square_yield(4))
[0, 1, 4, 9]

yield 这个词选得非常好,因为它的两种含义都适用:

yield——生产或提供(如在农业中)

……提供序列中的下一个数据。

yield——让步或放弃(如在政治权力中)

……让出 CPU 执行权,直到迭代器前进。

TL;DR

Instead of this:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

do this:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

Whenever you find yourself building a list from scratch, yield each piece instead.

This was my first “aha” moment with yield.


yield is a sugary way to say

build a series of stuff

Same behavior:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

Different behavior:

Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that’s computed as needed, and arbitrarily long.

Yield is lazy, it puts off computation. A function with a yield in it doesn’t actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. return raises StopIteration and ends the series (this is the natural end of a for-loop).
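The deferred execution described above is easy to observe: the function body does not run until the first next(). A small illustrative sketch (noisy is a made-up name):

```python
def noisy():
    print("body started")
    yield 1
    print("body resumed")
    yield 2

g = noisy()     # nothing printed yet: the body has not run
print(next(g))  # prints "body started", then 1
print(next(g))  # prints "body resumed", then 2
```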

Yield is versatile. Data doesn’t have to be stored all together, it can be made available one at a time. It can be infinite.

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

If you need multiple passes and the series isn’t too long, just call list() on it:

>>> list(square_yield(4))
[0, 1, 4, 9]

Brilliant choice of the word yield because both meanings apply:

yield — produce or provide (as in agriculture)

…provide the next data in the series.

yield — give way or relinquish (as in political power)

…relinquish CPU execution until the iterator advances.


回答 9

Yield可以为您提供生成器。

def get_odd_numbers(i):
    return range(1, i, 2)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
bar.next()
1
bar.next()
3
bar.next()
5

如您所见,在第一种情况下,foo将整个列表立即保存在内存中。对于包含5个元素的列表来说,这不是什么大问题,但是如果您想要500万个列表,该怎么办?这不仅是一个巨大的内存消耗者,而且在调用该函数时还花费大量时间来构建。

在第二种情况下,bar只是给你一个生成器。生成器是可迭代对象——也就是说你可以在for循环等地方使用它,但每个值只能被访问一次。所有的值也不会同时保存在内存中。生成器对象会“记住”上一次调用时循环进行到的位置——这样,如果你用一个可迭代对象(比如)数到500亿,就不必一次性数完,也不必把500亿个数字都存起来再去遍历。

再次,这是一个非常人为的示例,如果您真的想计数到500亿,则可能会使用itertools。:)

这是生成器最简单的用例。如您所说,它可以用来编写有效的排列,使用yield可以将内容推入调用堆栈,而不是使用某种堆栈变量。生成器还可以用于特殊的树遍历以及所有其他方式。

Yield gives you a generator.

def get_odd_numbers(i):
    return range(1, i, 2)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
bar.next()
1
bar.next()
3
bar.next()
5

As you can see, in the first case foo holds the entire list in memory at once. It’s not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called.

In the second case, bar just gives you a generator. A generator is an iterable–which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object “remembers” where it was in the looping the last time you called it–this way, if you’re using an iterable to (say) count to 50 billion, you don’t have to count to 50 billion all at once and store the 50 billion numbers to count through.

Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)

This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
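To make the tree-traversal remark concrete, here is a hedged sketch (the nested-tuple tree shape is my own convention, not from the answer): a generator walks the tree recursively without ever building a list.

```python
def inorder(node):
    # node is either None or a (left, value, right) tuple -- my own convention
    if node is None:
        return
    left, value, right = node
    yield from inorder(left)    # Python 3.3+; on older versions, loop and re-yield
    yield value
    yield from inorder(right)

tree = ((None, 1, None), 2, (None, 3, None))
print(list(inorder(tree)))  # [1, 2, 3]
```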


回答 10

它返回的是一个生成器。我对 Python 并不特别熟悉,但如果你熟悉 C# 的迭代器块,我相信这与它们是同一类东西。

关键思想是,编译器/解释器(不管是什么)会做一些技巧,使得就调用者而言,他们可以不断调用 next(),而它会不断返回值——就好像生成器方法被暂停了一样。显然你不能真正“暂停”一个方法,因此编译器会为你构建一个状态机,用来记住你当前所在的位置以及局部变量等的状态。这比自己编写迭代器容易得多。

It’s returning a generator. I’m not particularly familiar with Python, but I believe it’s the same kind of thing as C#’s iterator blocks if you’re familiar with those.

The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values – as if the generator method was paused. Now obviously you can’t really “pause” a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.
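To see what that compiler-built state machine roughly amounts to, here is a sketch of the iterator you would otherwise write by hand (class and method names are illustrative):

```python
class OddNumbers:
    """Hand-written equivalent of a `yield`-based odd-number generator."""
    def __init__(self, stop):
        self.current = 1          # the "where am I" state the compiler would track
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration   # the natural end of a for-loop
        value = self.current
        self.current += 2
        return value

print(list(OddNumbers(10)))  # [1, 3, 5, 7, 9]
```

The generator function version says the same thing in three lines; the compiler writes the bookkeeping above for you.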


回答 11

在描述如何使用生成器的许多很棒的答案中,我还没有给出一种答案。这是编程语言理论的答案:

Python 中的 yield 语句返回一个生成器。Python 中的生成器是一个返回延续的函数(具体来说是一种协程,但延续代表了一种更通用的机制,有助于理解正在发生的事情)。

编程语言理论中的连续性是一种更为基础的计算,但是由于它们很难推理而且也很难实现,因此并不经常使用。但是,关于延续是什么的想法很简单:只是尚未完成的计算状态。在此状态下,将保存变量的当前值,尚未执行的操作等。然后,在稍后的某个时刻,可以在程序中调用继续,以便将程序的变量重置为该状态,并执行保存的操作。

以这种更一般的形式进行的延续可以两种方式实现。在call/cc方式,程序的堆栈字面上保存,然后调用延续时,堆栈恢复。

在延续传递样式(CPS)中,延续只是普通的函数(仅在函数是第一类的语言中),程序员明确地对其进行管理并传递给子例程。以这种方式,程序状态由闭包(以及恰好在其中编码的变量)表示,而不是驻留在堆栈中某个位置的变量。管理控制流的函数接受连续作为参数(在CPS的某些变体中,函数可以接受多个连续),并通过简单地调用它们并随后返回来调用它们来操纵控制流。延续传递样式的一个非常简单的示例如下:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

在这个(非常简单的)示例中,程序员保存了将文件实际写入连续的操作(该操作可能是非常复杂的操作,需要写出许多细节),然后传递该连续(例如,首先类闭包)给另一个进行更多处理的运算符,然后在必要时调用它。(我在实际的GUI编程中经常使用这种设计模式,这是因为它节省了我的代码行,或更重要的是,在GUI事件触发后管理了控制流。)

在不失一般性的前提下,本文的其余部分将连续性概念化为CPS,因为它很容易理解和阅读。


现在让我们谈谈Python中的生成器。生成器是延续的特定子类型。而延续能够在一般的保存状态计算(即程序调用堆栈),生成器只能保存迭代的状态经过一个迭代器。虽然,对于生成器的某些用例,此定义有些误导。例如:

def f():
  while True:
    yield 4

显然,这是一个合理的迭代器,其行为已得到很好的定义-每次生成器对其进行迭代时,它都会返回4(并永远这样做)。但是,在考虑迭代器(即for x in collection: do_something(x))时,可能并没有想到可迭代的原型类型。此示例说明了生成器的功能:如果有什么是迭代器,生成器可以保存其迭代状态。

重申一下:连续可以保存程序堆栈的状态,而生成器可以保存迭代的状态。这意味着延续比生成器强大得多,但是生成器也非常简单。它们对于语言设计者来说更容易实现,对程序员来说也更容易使用(如果您有时间要燃烧,请尝试阅读并理解有关延续和call / cc的本页)。

但是您可以轻松地将生成器实现(并概念化)为连续传递样式的一种简单的特定情况:

每当调用 yield 时,它告诉函数返回一个延续。再次调用该函数时,将从中断处继续。因此,用伪伪代码(既不算伪代码,也不算真正的代码)来写,生成器的 next 方法基本如下:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

其中,yield关键字实际上是真正的生成器功能语法糖,基本上是这样的:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

请记住,这只是伪代码,Python中生成器的实际实现更为复杂。但是,作为练习以了解发生了什么,请尝试使用连续传递样式来实现生成器对象,而不使用yield关键字。

There is one type of answer that I don’t feel has been given yet, among the many great answers that describe how to use generators. Here is the programming language theory answer:

The yield statement in Python returns a generator. A generator in Python is a function that returns continuations (and specifically a type of coroutine, but continuations represent the more general mechanism to understand what is going on).

Continuations in programming languages theory are a much more fundamental kind of computation, but they are not often used, because they are extremely hard to reason about and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished. In this state, the current values of variables, the operations that have yet to be performed, and so on, are saved. Then at some point later in the program the continuation can be invoked, such that the program’s variables are reset to that state and the operations that were saved are carried out.

Continuations, in this more general form, can be implemented in two ways. In the call/cc way, the program’s stack is literally saved and then when the continuation is invoked, the stack is restored.

In continuation passing style (CPS), continuations are just normal functions (only in languages where functions are first class) which the programmer explicitly manages and passes around to subroutines. In this style, program state is represented by closures (and the variables that happen to be encoded in them) rather than variables that reside somewhere on the stack. Functions that manage control flow accept continuation as arguments (in some variations of CPS, functions may accept multiple continuations) and manipulate control flow by invoking them by simply calling them and returning afterwards. A very simple example of continuation passing style is as follows:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

In this (very simplistic) example, the programmer saves the operation of actually writing the file into a continuation (which can potentially be a very complex operation with many details to write out), and then passes that continuation (i.e, as a first-class closure) to another operator which does some more processing, and then calls it if necessary. (I use this design pattern a lot in actual GUI programming, either because it saves me lines of code or, more importantly, to manage control flow after GUI events trigger.)

The rest of this post will, without loss of generality, conceptualize continuations as CPS, because it is a hell of a lot easier to understand and read.


Now let’s talk about generators in Python. Generators are a specific subtype of continuation. Whereas continuations are able in general to save the state of a computation (i.e., the program’s call stack), generators are only able to save the state of iteration over an iterator. Although, this definition is slightly misleading for certain use cases of generators. For instance:

def f():
  while True:
    yield 4

This is clearly a reasonable iterable whose behavior is well defined — each time the generator iterates over it, it returns 4 (and does so forever). But it isn’t probably the prototypical type of iterable that comes to mind when thinking of iterators (i.e., for x in collection: do_something(x)). This example illustrates the power of generators: if anything is an iterator, a generator can save the state of its iteration.

To reiterate: Continuations can save the state of a program’s stack and generators can save the state of iteration. This means that continuations are a lot more powerful than generators, but also that generators are a lot, lot easier. They are easier for the language designer to implement, and they are easier for the programmer to use (if you have some time to burn, try to read and understand this page about continuations and call/cc).

But you could easily implement (and conceptualize) generators as a simple, specific case of continuation passing style:

Whenever yield is called, it tells the function to return a continuation. When the function is called again, it starts from wherever it left off. So, in pseudo-pseudocode (i.e., not pseudocode, but not code) the generator’s next method is basically as follows:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

where the yield keyword is actually syntactic sugar for the real generator function, basically something like:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

Remember that this is just pseudocode and the actual implementation of generators in Python is more complex. But as an exercise to understand what is going on, try to use continuation passing style to implement generator objects without use of the yield keyword.
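Taking up that exercise, here is one runnable rendering of the pseudocode above. It is a sketch under simplifying assumptions: it only handles finite sequences, and real Python generators are not implemented this way.

```python
class Generator:
    def __init__(self, iterable):
        # the continuation: a thunk that, when called, returns
        # (value, next_continuation) or raises StopIteration
        self.next_continuation = lambda: generatorfun(list(iterable))

    def __iter__(self):
        return self

    def __next__(self):
        value, self.next_continuation = self.next_continuation()
        return value

def generatorfun(items):
    # return the current value plus the continuation for the rest
    if not items:
        raise StopIteration
    return items[0], (lambda: generatorfun(items[1:]))

print(list(Generator([1, 2, 3])))  # [1, 2, 3]
```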


回答 12

这是简单语言的示例。我将提供高级人类概念与低级Python概念之间的对应关系。

我想对数字序列进行运算,但是我不想为创建该序列而烦恼自己,我只想着重于自己想做的运算。因此,我执行以下操作:

  • 我打电话给你,告诉你我想要一个以特定方式产生的数字序列,并让你知道算法是什么。
    此步骤对应于用 def 定义生成器函数,即包含 yield 的函数。
  • 稍后,我告诉你:“好,准备告诉我数字序列吧”。
    此步骤对应于调用生成器函数,它返回一个生成器对象。注意,此时你还没有告诉我任何数字;你只是拿起了纸和铅笔。
  • 我问你:“告诉我下一个号码”,然后你告诉我第一个号码;之后,你等我问你下一个号码。记住你说到哪里、已经说过哪些号码、下一个号码是什么,是你的工作。我不在乎细节。
    此步骤对应于在生成器对象上调用 .next()。
  • …重复上一步,直到…
  • 最终,你可能会走到尽头。你不再告诉我号码;你只是大喊:“且慢!我说完了!没有号码了!”
    此步骤对应于生成器对象结束其工作,并引发 StopIteration 异常。生成器函数不需要显式引发该异常;当函数结束或执行 return 时,它会被自动引发。

这就是生成器(包含 yield 的函数)所做的事情:它开始执行,每当执行到 yield 时暂停,当被要求提供下一个 .next() 值时,再从上次的位置继续。它在设计上与 Python 的迭代器协议完美契合,该协议描述了如何按顺序请求值。

迭代器协议最著名的使用者是 Python 中的 for 语句。因此,每当你执行:

for item in sequence:

不管sequence是列表,字符串,字典还是如上所述的生成器对象,都没有关系;结果是相同的:您从一个序列中逐个读取项目。

注意,用 def 定义一个包含 yield 关键字的函数并不是创建生成器的唯一方法;这只是创建生成器最简单的一种方法。

有关更准确的信息,请阅读 Python 文档中关于迭代器类型、yield 语句和生成器的内容。

Here is an example in plain language. I will provide a correspondence between high-level human concepts to low-level Python concepts.

I want to operate on a sequence of numbers, but I don’t want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:

  • I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.
    This step corresponds to defining the generator function, i.e. the function containing a yield.
  • Sometime later, I tell you, “OK, get ready to tell me the sequence of numbers”.
    This step corresponds to calling the generator function which returns a generator object. Note that you don’t tell me any numbers yet; you just grab your paper and pencil.
  • I ask you, “tell me the next number”, and you tell me the first number; after that, you wait for me to ask you for the next number. It’s your job to remember where you were, what numbers you have already said, and what is the next number. I don’t care about the details.
    This step corresponds to calling .next() on the generator object.
  • … repeat previous step, until…
  • eventually, you might come to an end. You don’t tell me a number; you just shout, “hold your horses! I’m done! No more numbers!”
    This step corresponds to the generator object ending its job, and raising a StopIteration exception. The generator function does not need to raise the exception explicitly; it’s raised automatically when the function ends or issues a return.

This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of Python, which describes how to sequentially request values.

The most famous user of the iterator protocol is the for command in Python. So, whenever you do a:

for item in sequence:

it doesn’t matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.

Note that defining a function which contains a yield keyword is not the only way to create a generator; it’s just the easiest way to create one.

For more accurate information, read about iterator types, the yield statement and generators in the Python documentation.
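The conversation above maps onto code like this (a minimal sketch; the function name is my own):

```python
def count_up_to(n):
    # "I let you know what the algorithm is" -- defining the generator function
    for i in range(1, n + 1):
        yield i

it = count_up_to(3)        # "get ready" -- nothing has been computed yet
print(next(it))            # "tell me the next number" -> 1
print(next(it))            # -> 2
print(next(it))            # -> 3
try:
    next(it)
except StopIteration:      # "hold your horses! No more numbers!"
    print("no more numbers!")
```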


回答 13

尽管有许多答案说明了为什么要使用a yield来生成生成器,但是的使用更多了yield。创建协程非常容易,这使信息可以在两个代码块之间传递。我不会重复任何有关使用yield生成器的优秀示例。

为了帮助理解 yield 在下面代码中的作用,你可以用手指沿着任何含有 yield 的代码追踪执行流程。每次手指碰到 yield,你都必须等待 next 或 send 被调用。当调用 next 时,你沿着代码追踪,直到碰到 yield……yield 右边的代码被求值并返回给调用者……然后你就等着。当再次调用 next 时,你在代码中又执行一圈。不过你会注意到,在协程中,yield 也可以与 send 一起使用……它会把一个值从调用方发送到正在 yield 的函数中。如果调用了 send,那么 yield 就会接收到发送的值,并把它“吐”到左边……然后继续沿代码追踪,直到再次碰到 yield(并像调用 next 一样在结束时返回值)。

例如:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()

While a lot of answers show why you’d use a yield to create a generator, there are more uses for yield. It’s quite easy to make a coroutine, which enables the passing of information between two blocks of code. I won’t repeat any of the fine examples that have already been given about using yield to create a generator.

To help understand what a yield does in the following code, you can use your finger to trace the cycle through any code that has a yield. Every time your finger hits the yield, you have to wait for a next or a send to be entered. When a next is called, you trace through the code until you hit the yield… the code on the right of the yield is evaluated and returned to the caller… then you wait. When next is called again, you perform another loop through the code. However, you’ll note that in a coroutine, yield can also be used with a send… which will send a value from the caller into the yielding function. If a send is given, then yield receives the value sent, and spits it out the left hand side… then the trace through the code progresses until you hit the yield again (returning the value at the end, as if next was called).

For example:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()
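The session above is Python 2 (`sequence.next()`); in Python 3 the same coroutine works with the built-in `next()`:

```python
def coroutine():
    i = -1
    while True:
        i += 1
        val = (yield i)          # receives whatever send() passes in
        print("Received %s" % val)

sequence = coroutine()
print(next(sequence))            # 0
print(next(sequence))            # prints "Received None", then 1
print(sequence.send('hello'))    # prints "Received hello", then 2
sequence.close()
```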

回答 14

还有另一个yield用途和含义(自Python 3.3起):

yield from <expr>

PEP 380-委托给子生成器的语法

提出了一种语法,供生成器将其部分操作委托给另一生成器。这允许包含“ yield”的一段代码被分解出来并放置在另一个生成器中。此外,允许子生成器返回一个值,并且该值可用于委派生成器。

当一个生成器重新产生由另一个生成器生成的值时,新语法还为优化提供了一些机会。

此外,将引入(自Python 3.5起):

async def new_coroutine(data):
   ...
   await blocking_action()

为了避免将协程与常规生成器混淆(今天yield在两者中都使用)。

There is another yield use and meaning (since Python 3.3):

yield from <expr>

From PEP 380 — Syntax for Delegating to a Subgenerator:

A syntax is proposed for a generator to delegate part of its operations to another generator. This allows a section of code containing ‘yield’ to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.

The new syntax also opens up some opportunities for optimisation when one generator re-yields values produced by another.

Moreover this will introduce (since Python 3.5):

async def new_coroutine(data):
   ...
   await blocking_action()

to avoid coroutines being confused with a regular generator (today yield is used in both).
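A small sketch of the delegation that `yield from` buys you (Python 3.3+): the outer generator re-yields the inner one’s values and receives its return value.

```python
def inner():
    yield 1
    yield 2
    return 'done'                 # becomes the value of the `yield from` expression

def outer():
    result = yield from inner()   # re-yields 1 and 2, then captures 'done'
    yield result

print(list(outer()))  # [1, 2, 'done']
```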


回答 15

所有好的答案,但是对于新手来说有点困难。

我认为您已经了解了该return声明。

作为一个比喻,return 和 yield 是一对双胞胎。return 表示“返回并停止”,而 yield 则表示“返回,但继续”。

  1. 尝试使用获取num_list return
def num_list(n):
    for i in range(n):
        return i

运行:

In [5]: num_list(3)
Out[5]: 0

看,你只得到了一个数字,而不是一个列表。return 从不让你如愿得到整个序列,它只执行一次就退出了。

  2. 来了 yield

替换returnyield

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

现在,您就能得到所有数字了。

与只运行一次就停止的 return 相比,yield 可以运行你计划的那么多次。你可以把 return 理解为 return one of them,把 yield 理解为 return all of them。这称为 iterable。

  3. 我们还可以用 return 重写上面的 yield 语句:
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

这就是 yield 的核心。

return 输出的列表对象和 yield 输出的对象之间的区别是:

您始终可以从列表对象中取出 [0, 1, 2],但只能从 yield 输出的对象中把它们取出一次。因此,它有一个新名字:generator 对象,如 Out[11]: <generator object num_list at 0x10327c990> 所示。

总之,作为一个隐喻,它可以:

  • return并且yield是双胞胎
  • list并且generator是双胞胎

All great answers, however a bit difficult for newbies.

I assume you have learned the return statement.

As an analogy, return and yield are twins. return means ‘return and stop’ whereas yield means ‘return, but continue’.

  1. Try to get a num_list with return.
def num_list(n):
    for i in range(n):
        return i

Run it:

In [5]: num_list(3)
Out[5]: 0

See, you get only a single number rather than a list of them. return never lets you get the whole sequence; it just runs once and quits.

  2. There comes yield

Replace return with yield:

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

Now, you win to get all the numbers.

Compared to return, which runs once and stops, yield runs as many times as you planned. You can interpret return as return one of them, and yield as return all of them. This is called an iterable.

  3. One more step: we can rewrite the yield version with return
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

That’s the core idea behind yield.

The difference between the list that return outputs and the object that yield outputs is:

You will always get [0, 1, 2] from the list object, but can only retrieve them from the yield output object once. That’s why it has a new name, generator object, as displayed in Out[11]: <generator object num_list at 0x10327c990>.

In conclusion, as a metaphor to grok it:

  • return and yield are twins
  • list and generator are twins
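The “twins” metaphor also holds for comprehension syntax: square brackets build the list twin, parentheses build the generator twin. A quick sketch of the single-use behavior:

```python
gen = (i * i for i in range(3))   # generator expression: the lazy twin
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] -- a generator can only be consumed once

lst = [i * i for i in range(3)]   # list comprehension: the eager twin
print(lst)        # [0, 1, 4], as many times as you like
```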

回答 16

以下是一些Python示例,这些示例说明如何实际实现生成器,就像Python没有为其提供语法糖一样:

作为Python生成器:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

使用词法闭包而不是生成器

def ftake(fnext, last):
    return [fnext() for _ in xrange(last)]

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

使用对象闭包而不是生成器(因为ClosuresAndObjectsAreEquivalent

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)

Here are some Python examples of how to actually implement generators as if Python did not provide syntactic sugar for them:

As a Python generator:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

Using lexical closures instead of generators

def ftake(fnext, last):
    return [fnext() for _ in xrange(last)]

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent)

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)
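The `#for python 3.x use nonlocal` comment above can be made concrete; here is the same lexical-closure version rewritten for Python 3 (a sketch, with `step` as my own name for the inner function):

```python
def ftake(fnext, last):
    return [fnext() for _ in range(last)]   # range, not xrange, in Python 3

def fib_gen2_py3():
    a, b = 0, 1
    def step():
        nonlocal a, b               # Python 3 replaces the function-attribute trick
        a, b = b, a + b
        return a
    return step

assert [1, 1, 2, 3, 5] == ftake(fib_gen2_py3(), 5)
```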

回答 17

我打算发布“阅读Beazley的“ Python:基本参考”的第19页,以快速了解生成器”,但是已经有许多其他人发布了不错的描述。

另外,请注意 yield 还可以在协程中使用,作为它在生成器函数中用法的对偶。尽管这与你的代码片段的用法不同,(yield) 可以在函数中用作表达式。当调用者使用 send() 方法向该函数发送一个值时,协程就会执行到遇到下一条 (yield) 语句为止。

生成器和协程是设置数据流类型应用程序的一种很酷的方法。我认为有必要了解该yield语句在函数中的其他用法。

I was going to post “read page 19 of Beazley’s ‘Python: Essential Reference’ for a quick description of generators”, but so many others have posted good descriptions already.

Also, note that yield can be used in coroutines as the dual of their use in generator functions. Although it isn’t the same use as your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the method using the send() method, then the coroutine will execute until the next (yield) statement is encountered.

Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.
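As a tiny sketch of that data-flow idea (my own example, not from the book): a running-total coroutine that consumes values pushed in with `send()`.

```python
def running_total():
    total = 0
    while True:
        value = (yield total)   # receive the next value from send()
        total += value

acc = running_total()
next(acc)            # prime the coroutine: run to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15
```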


回答 18

从编程的角度来看,迭代器被实现为thunk

为了将迭代器、生成器以及用于并发执行的线程池等实现为 thunk(也称为匿名函数),人们向一个带有分派器的闭包对象发送消息,分派器再对这些“消息”作出响应。

http://en.wikipedia.org/wiki/Message_passing

“next”是发送给闭包的消息,该闭包由“iter”调用创建。

有很多方法可以实现此计算。我使用了变异,但是通过返回当前值和下一个生成器,很容易做到无变异。

这是一个使用R6RS结构的演示,但是其语义与Python完全相同。它是相同的计算模型,只需要更改语法就可以用Python重写它。

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->

From a programming viewpoint, the iterators are implemented as thunks.

To implement iterators, generators, and thread pools for concurrent execution, etc. as thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to “messages”.

http://en.wikipedia.org/wiki/Message_passing

“next” is a message sent to a closure, created by the “iter” call.

There are lots of ways to implement this computation. I used mutation, but it is easy to do it without mutation, by returning the current value and the next yielder.

Here is a demonstration which uses the structure of R6RS, but the semantics is absolutely identical to Python’s. It’s the same model of computation, and only a change in syntax is required to rewrite it in Python.

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->
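As the answer says, only the syntax needs to change; here is one way the same dispatcher-closure model might look in Python (a sketch, keeping the same 'yield'/'init' messages and using mutation, as the original does):

```python
def gen(l):
    items = list(l)

    def dispatch(m):              # the dispatcher answering to "messages"
        if m == 'yield':
            if not items:
                return 'END'
            return items.pop(0)   # mutate: drop the head, return it
        if m == 'init':
            def init(data):
                items[:] = list(data)
                return 'OK'
            return init

    return dispatch

stream = gen([1, 2, 3])
print(stream('yield'))           # 1
print(stream('yield'))           # 2
print(stream('yield'))           # 3
print(stream('yield'))           # END
print(stream('init')([10, 20]))  # OK
print(stream('yield'))           # 10
```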

回答 19

这是一个简单的示例:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "writing result {}".format(n)

输出:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
writing result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
writing result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
writing result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

我不是Python开发人员,但在我看来 yield保持着程序流程的位置,并且下一个循环从“ yield”位置开始。似乎它正在那个位置等待,就在那之前,在外面返回一个值,下一次继续工作。

这似乎是一种有趣而又不错的能力:D

Here is a simple example:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "writing result {}".format(n)

Output:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
writing result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
writing result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
writing result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

I am not a Python developer, but it looks to me like yield holds the position in the program flow, and the next loop starts from the “yield” position. It seems to wait at that position, returning a value to the outside just before pausing, and then continues working the next time it is asked.

It seems to be an interesting and nice ability :D


回答 20

这是做什么事情的心理yield印象。

我喜欢将线程视为具有堆栈(即使未以这种方式实现)。

调用普通函数时,它将其局部变量放在堆栈上,进行一些计算,然后清除堆栈并返回。再也看不到其局部变量的值。

对于一个 yield 函数,当它的代码开始运行时(即调用该函数返回生成器对象之后,再调用该对象的 next() 方法时),它同样会把局部变量放入堆栈并计算一段时间。但当它遇到 yield 语句时,在清除自己那部分堆栈并返回之前,它会对局部变量做一个快照,并把快照存储在生成器对象中。它还会记下当前执行到代码中的哪个位置(即具体是哪条 yield 语句)。

因此,生成器保存着的是一种被冻结的函数。

next()随后被调用时,它检索功能的物品入堆栈,重新蓬勃生机。该函数从中断处继续进行计算,而忽略了它刚刚在冷库中度过了一个永恒的事实。

比较以下示例:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

当我们调用第二个函数时,它的行为与第一个函数非常不同。该yield语句可能无法到达,但是如果它存在于任何地方,它将改变我们正在处理的内容的性质。

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

调用 yielderFunction() 并不会运行其代码,而是用这些代码创建一个生成器。(为了便于阅读,用 yielder 这样的前缀来命名这类函数也许是个好主意。)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

gi_codegi_frame字段是冻结状态的存储位置。用探索它们dir(..),我们可以确认我们上面的心理模型是可信的。

Here is a mental image of what yield does.

I like to think of a thread as having a stack (even when it’s not implemented that way).

When a normal function is called, it puts its local variables on the stack, does some computation, then clears the stack and returns. The values of its local variables are never seen again.

With a yield function, when its code begins to run (i.e. after the function is called, returning a generator object, whose next() method is then invoked), it similarly puts its local variables onto the stack and computes for a while. But then, when it hits the yield statement, before clearing its part of the stack and returning, it takes a snapshot of its local variables and stores them in the generator object. It also writes down the place where it’s currently up to in its code (i.e. the particular yield statement).

So it’s a kind of a frozen function that the generator is hanging onto.

When next() is called subsequently, it retrieves the function’s belongings onto the stack and re-animates it. The function continues to compute from where it left off, oblivious to the fact that it had just spent an eternity in cold storage.

Compare the following examples:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

When we call the second function, it behaves very differently to the first. The yield statement might be unreachable, but if it’s present anywhere, it changes the nature of what we’re dealing with.

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

Calling yielderFunction() doesn’t run its code, but makes a generator out of the code. (Maybe it’s a good idea to name such things with the yielder prefix for readability.)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

The gi_code and gi_frame fields are where the frozen state is stored. Exploring them with dir(..), we can confirm that our mental model above is credible.
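You can even peek at the frozen locals through `gi_frame` (a small sketch; the function is my own):

```python
def yielder():
    x = 10
    yield x
    x += 1
    yield x

g = yielder()
next(g)                            # run up to the first yield, then freeze
print(g.gi_frame.f_locals['x'])    # 10 -- the snapshotted local variable
print(next(g))                     # 11 -- re-animated from where it left off
```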


回答 21

就像每个答案所建议的那样,yield 用于创建序列生成器,用来动态生成某个序列。例如,在网络上逐行读取文件时,可以这样使用 yield:

def getNextLines():
   while con.isOpen():
       yield con.read()

您可以在代码中使用它,如下所示:

for line in getNextLines():
    doSomeThing(line)

执行控制转移陷阱

当执行到 yield 时,执行控制权会从 getNextLines() 转移到 for 循环。因此,每次调用 getNextLines() 时,都会从上次暂停的位置继续执行。

因此,简而言之,具有以下代码的函数

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

将打印

"first time"
"second time"
"third time"
"Now some useful value 12"

Like every answer suggests, yield is used for creating a sequence generator. It’s used for generating some sequence dynamically. For example, while reading a file line by line on a network, you can use the yield function as follows:

def getNextLines():
   while con.isOpen():
       yield con.read()

You can use it in your code as follows:

for line in getNextLines():
    doSomeThing(line)

Execution Control Transfer gotcha

The execution control will be transferred from getNextLines() to the for loop when yield is executed. Thus, every time getNextLines() is invoked, execution begins from the point where it was paused last time.

Thus in short, a function with the following code

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

will print

"first time"
"second time"
"third time"
"Now some useful value 12"
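Since con in the answer above is a hypothetical network connection, here is a self-contained Python 3 sketch of the same pattern using a made-up MockConnection stand-in, just to make the control transfer observable:

```python
class MockConnection:
    """Stand-in for the hypothetical 'con' object above.
    Its isOpen()/read() API is assumed, not a real library."""
    def __init__(self, lines):
        self._lines = list(lines)

    def isOpen(self):
        return bool(self._lines)

    def read(self):
        return self._lines.pop(0)

con = MockConnection(["alpha\n", "beta\n", "gamma\n"])

def getNextLines():
    while con.isOpen():
        yield con.read()  # control transfers to the caller here...

received = []
for line in getNextLines():
    received.append(line)  # ...and resumes inside the while loop afterwards

print(received)
```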

回答 22

一个简单的例子来了解它是什么: yield

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

输出为:

1 2 1 2 1 2 1 2

An easy example to understand what it is: yield

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

The output is:

1 2 1 2 1 2 1 2

回答 23

(我下面的回答仅从使用Python生成器的角度讲,而不是生成器机制基础实现,它涉及堆栈和堆操作的一些技巧。)

当在Python函数中使用yield代替return时,该函数就变成了一个特殊的东西,称为generator function。该函数将返回一个generator类型的对象。yield关键字是一个标志,通知Python编译器对这样的函数做特殊处理。普通函数在返回某个值后即终止;但在编译器的帮助下,生成器函数可以被视为可恢复的。也就是说,执行上下文会被恢复,并从上次运行处继续执行,直到您显式调用return(这会引发StopIteration异常,它也是迭代器协议的一部分)或到达函数末尾。我找到了很多关于generator的参考资料,但从functional programming perspective出发的这一篇最容易消化。

(现在,我想根据我自己的理解来讨论generator背后的原理,以及作为其基础的iterator。我希望这可以帮助您掌握迭代器和生成器的基本动机。这种概念也出现在其他语言中,例如C#。)

据我了解,当我们要处理一堆数据时,通常先将数据存储在某个地方,然后再逐一处理。但是这种幼稚的方法是有问题的。如果数据量巨大,预先将它们整体存储是很昂贵的。因此,与其直接存储data本身,为什么不间接存储某种metadata,即the logic how the data is computed(数据是如何被计算出来的逻辑)?

有两种包装此类元数据的方法。

  1. 面向对象的方法:我们将元数据包装为一个类。这就是所谓的iterator,它实现了迭代器协议(即__next__()和__iter__()方法)。这也是常见的迭代器设计模式。
  2. 函数式的方法:我们将元数据包装为一个函数。这就是所谓的generator function。但在底层,返回的generator object仍然IS-A迭代器,因为它同样实现了迭代器协议。

无论哪种方式,都会创建一个迭代器,即某个可以为您提供所需数据的对象。OO方法可能有点复杂。无论如何,要使用哪一个取决于您。

(My below answer only speaks from the perspective of using Python generator, not the underlying implementation of generator mechanism, which involves some tricks of stack and heap manipulation.)

When yield is used instead of a return in a Python function, that function is turned into something special called a generator function. That function will return an object of generator type. The yield keyword is a flag that notifies the Python compiler to treat such a function specially. Normal functions terminate once some value is returned. But with the help of the compiler, a generator function can be thought of as resumable: the execution context is restored and execution continues from the last run, until you explicitly call return (which raises a StopIteration exception, also part of the iterator protocol) or reach the end of the function. I found a lot of references about generators, but this one from the functional programming perspective is the most digestible.

(Now I want to talk about the rationale behind generator, and the iterator based on my own understanding. I hope this can help you grasp the essential motivation of iterator and generator. Such concept shows up in other languages as well such as C#.)

As I understand, when we want to process a bunch of data, we usually first store the data somewhere and then process it one by one. But this naive approach is problematic. If the data volume is huge, it’s expensive to store them as a whole beforehand. So instead of storing the data itself directly, why not store some kind of metadata indirectly, i.e. the logic how the data is computed.

There are 2 approaches to wrap such metadata.

  1. The OO approach, we wrap the metadata as a class. This is the so-called iterator who implements the iterator protocol (i.e. the __next__(), and __iter__() methods). This is also the commonly seen iterator design pattern.
  2. The functional approach, we wrap the metadata as a function. This is the so-called generator function. But under the hood, the returned generator object still IS-A iterator because it also implements the iterator protocol.

Either way, an iterator is created, i.e. some object that can give you the data you want. The OO approach may be a bit complex. Anyway, which one to use is up to you.
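To make the two approaches concrete, here is a minimal Python 3 sketch (the Squares class and squares function are invented for illustration) showing that the OO iterator and the generator function wrap the same "how to compute" metadata and produce the same sequence:

```python
class Squares:
    """OO approach: the computation logic wrapped as a class
    implementing the iterator protocol."""
    def __init__(self, limit):
        self.limit = limit
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        value = self.n ** 2
        self.n += 1
        return value

def squares(limit):
    """Functional approach: the same logic as a generator function."""
    for n in range(limit):
        yield n ** 2

print(list(Squares(5)))  # [0, 1, 4, 9, 16]
print(list(squares(5)))  # same sequence, far less code
```

Both objects satisfy the iterator protocol, so either can be fed to a for loop, list(), or any other consumer.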


回答 24

总之,yield语句将您的函数转换为一个工厂,它产生一个称为generator的特殊对象,该对象包装了原始函数的函数体。当generator被迭代时,它会执行您的函数,直到到达下一个yield,然后暂停执行,并求值为传递给yield的值。它在每次迭代中重复此过程,直到执行路径退出函数为止。例如,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

简单地输出

one
two
three

强大之处在于将生成器与计算序列的循环配合使用:生成器每次执行循环时都会停下来,以"产生"(yield)下一个计算结果。通过这种方式,它可以即时计算出一个列表,其好处是节省内存,对于特别大的计算尤为重要。

假设您想创建自己的range函数来产生可迭代的数字范围,则可以这样做,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

像这样使用

for i in myRangeNaive(10):
    print i

但这是低效的,因为

  • 您创建只使用一次的数组(这会浪费内存)
  • 这段代码实际上在该数组上循环了两次!:(

幸运的是,Guido和他的团队足够慷慨地开发生成器,因此我们可以做到这一点。

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

现在,每次迭代时,生成器上名为next()的函数都会执行该函数,直到到达yield语句(在此停止并"产生"值)或到达函数末尾为止。在这种情况下,第一次调用时,next()执行到yield语句并产生n;下一次调用时,它先执行递增语句,跳回while,对条件求值,如果为真,则再次停止并产生n。它将以这种方式继续,直到while条件为假、生成器运行到函数末尾为止。

In summary, the yield statement transforms your function into a factory that produces a special object called a generator, which wraps around the body of your original function. When the generator is iterated, it executes your function until it reaches the next yield, then suspends execution and evaluates to the value passed to yield. It repeats this process on each iteration until the path of execution exits the function. For instance,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

simply outputs

one
two
three

The power comes from using the generator with a loop that calculates a sequence: the generator executes the loop, stopping each time to ‘yield’ the next result of the calculation. In this way it calculates a list on the fly, the benefit being the memory saved, especially for large calculations.

Say you wanted to create your own range function that produces an iterable range of numbers. You could do it like so,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

and use it like this:

for i in myRangeNaive(10):
    print i

But this is inefficient because

  • You create an array that you only use once (this wastes memory)
  • This code actually loops over that array twice! :(

Luckily, Guido and his team were generous enough to develop generators so we could just do this:

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

Now upon each iteration, a call to next() on the generator executes the function until it either reaches a ‘yield’ statement, where it stops and ‘yields’ the value, or reaches the end of the function. In this case, on the first call next() executes up to the yield statement and yields ‘n’; on the next call it executes the increment statement, jumps back to the ‘while’, evaluates it, and if true, stops and yields ‘n’ again. It continues that way until the while condition returns false and the generator runs off the end of the function.
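The suspension points described above can be driven by hand with next() on the same myRangeSmart (a Python 3 sketch; names unchanged from the answer):

```python
def myRangeSmart(i):
    n = 0
    while n < i:
        yield n
        n = n + 1

gen = myRangeSmart(2)
first = next(gen)   # runs until the first yield: 0
second = next(gen)  # resumes after yield, increments, loops, yields: 1
try:
    next(gen)       # while condition is now false; the function returns
    exhausted = False
except StopIteration:
    exhausted = True
print(first, second, exhausted)
```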


回答 25

Yield是一个对象

函数中的return将返回单个值。

如果您希望函数返回大量值,请使用yield

更重要的是,yield是一个屏障(barrier)。

就像CUDA语言中的barrier一样,它在完成之前不会转移控制权。

也就是说,它会从头开始运行函数中的代码,直到遇到yield为止。然后,它返回循环的第一个值。

然后,其他所有调用将再次运行您在函数中编写的循环,返回下一个值,直到没有任何值可返回为止。

Yield is an object

A return in a function will return a single value.

If you want a function to return a huge set of values, use yield.

More importantly, yield is a barrier.

Like a barrier in the CUDA language, it will not transfer control until it completes.

That is, it will run the code in your function from the beginning until it hits yield. Then, it’ll return the first value of the loop.

Then, every other call will run the loop you have written in the function one more time, returning the next value until there isn’t any value to return.
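A small Python 3 sketch (the stepper generator is made up for illustration) makes the "run from the beginning until it hits yield, then stop" behaviour observable by recording side effects:

```python
trace = []  # records which parts of the body have actually executed

def stepper():
    trace.append("start")
    yield 1
    trace.append("between")
    yield 2

gen = stepper()
before_any = list(trace)    # nothing runs at call time: []
first = next(gen)           # runs from the top until the first yield
after_first = list(trace)   # ["start"]
second = next(gen)          # resumes right after the first yield
after_second = list(trace)  # ["start", "between"]
print(before_any, first, after_first, second, after_second)
```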


回答 26

许多人使用return而不是yield,但是在某些情况下yield可以更高效,更轻松地工作。

这是yield绝对适合的示例:

返回(函数中)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

Yield(以功能计)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

通话功能

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

这两个函数执行相同的操作,但是yield使用三行而不是五行,并且少担心一个变量。

这是代码的结果:

输出

如您所见,两个函数都做同样的事情。唯一的区别是return_dates()给出一个列表,而yield_dates()给出一个生成器。

现实生活中的例子可能是逐行读取文件,或者仅仅是想创建一个生成器。

Many people use return rather than yield, but in some cases yield can be more efficient and easier to work with.

Here is an example for which yield is definitely best:

return (in function)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

yield (in function)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

Calling functions

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

Both functions do the same thing, but yield uses three lines instead of five and has one less variable to worry about.

This is the result from the code:

Output

As you can see both functions do the same thing. The only difference is return_dates() gives a list and yield_dates() gives a generator.

A real life example would be something like reading a file line by line or if you just want to make a generator.


回答 27

yield就像函数中的return元素。不同之处在于,yield元素会将函数转换为生成器。生成器的行为与函数类似,直到某个值被"产生"(yield)为止。生成器随即暂停,直到下一次被调用,并从与暂停时完全相同的位置继续运行。您可以通过调用list(generator())一次性获得所有被产生的值的序列。

yield is like a return element for a function. The difference is, that the yield element turns a function into a generator. A generator behaves just like a function until something is ‘yielded’. The generator stops until it is next called, and continues from exactly the same point as it started. You can get a sequence of all the ‘yielded’ values in one, by calling list(generator()).
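A minimal Python 3 sketch of the list(generator()) idiom mentioned above (the letters generator is invented for illustration):

```python
def letters():
    yield 'a'
    yield 'b'
    yield 'c'

# Collect every yielded value at once:
all_values = list(letters())

# Each call creates a fresh, independent generator; calling list() on a
# partially consumed one continues from the pause point:
g = letters()
first = next(g)
rest = list(g)
print(all_values, first, rest)
```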


回答 28

yield关键字简单地收集返回结果。可以把yield想象成return +=。

The yield keyword simply collects returning results. Think of yield like return +=


回答 29

下面是一种基于yield的计算斐波那契数列的简单方法,并附有解释:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

当您将其输入到REPL中并尝试调用它时,您将得到一个神秘的结果:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

这是因为yield的存在向Python发出信号:您想要创建一个生成器,即一个按需生成值的对象。

那么,如何生成这些值?这可以通过内置函数next直接完成,也可以通过将生成器提供给消耗值的构造来间接完成。

使用内置的next()函数,您可以直接调用.next/__next__,强制生成器生成一个值:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

间接地,如果您将fib提供给for循环、list初始化程序、tuple初始化程序或任何其他期望对象生成/产生值的构造,生成器将被"消耗",直到无法再产生任何值为止(然后它返回):

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

同样,使用tuple初始化程序:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

生成器与函数的不同之处在于它是惰性的。它通过保持自己的本地状态,并允许您在需要时恢复执行来实现这一点。

首次调用fib时:

f = fib()

Python编译函数,遇到yield关键字,然后简单地将生成器对象返回给您。看起来不是很有帮助。

然后,当您直接或间接请求它生成第一个值时,它将执行它找到的所有语句,直到遇到一个yield,随后返回您提供给yield的值并暂停。为了更好地说明这一点,让我们使用一些print调用(如果在Python 2上,请替换为print "text"):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

现在,输入REPL:

>>> gen = yielder("Hello, yield!")

您现在有了一个生成器对象,等待一个命令来生成一个值。使用next并查看打印出的内容:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

未加引号的结果是打印出来的内容;加引号的结果是从yield返回的内容。现在再次调用next:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

生成器会记住它在yield value处暂停,并从那里继续。下一条消息被打印出来,然后(由于while循环)再次寻找要在其处暂停的yield语句。

Here’s a simple yield based approach, to compute the fibonacci series, explained:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

When you enter this into your REPL and then try and call it, you’ll get a mystifying result:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

This is because the presence of yield signaled to Python that you want to create a generator, that is, an object that generates values on demand.

So, how do you generate these values? This can either be done directly by using the built-in function next, or indirectly by feeding it to a construct that consumes values.

Using the built-in next() function, you directly invoke .next/__next__, forcing the generator to produce a value:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

Indirectly, if you provide fib to a for loop, a list initializer, a tuple initializer, or anything else that expects an object that generates/produces values, you’ll “consume” the generator until no more values can be produced by it (and it returns):

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

Similarly, with a tuple initializer:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

A generator differs from a function in the sense that it is lazy. It accomplishes this by maintaining its local state and allowing you to resume whenever you need to.

When you first invoke fib by calling it:

f = fib()

Python compiles the function, encounters the yield keyword, and simply returns a generator object back to you. Not very helpful, it seems.

When you then request that it generate the first value, directly or indirectly, it executes all the statements that it finds until it encounters a yield; it then yields back the value you supplied to yield and pauses. For an example that better demonstrates this, let’s use some print calls (replace with print "text" if on Python 2):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

Now, enter in the REPL:

>>> gen = yielder("Hello, yield!")

You now have a generator object waiting for a command to generate a value. Use next and see what gets printed:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The unquoted results are what’s printed. The quoted result is what is returned from yield. Call next again now:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The generator remembers it was paused at yield value and resumes from there. The next message is printed, and the search for the yield statement to pause at is performed again (due to the while loop).
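As a follow-up sketch (Python 3), here is what happens when the fib generator from earlier in this answer runs out of values: the for loop inside it finishes, the function returns, and the consumer sees StopIteration:

```python
def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
        yield b
        a, b = b, a + b

g = fib(5)
values = [next(g) for _ in range(5)]  # [1, 1, 2, 3, 5]
try:
    next(g)            # the for loop is finished, so the function returns...
    exhausted = False
except StopIteration:  # ...which surfaces to the caller as StopIteration
    exhausted = True
print(values, exhausted)
```

Constructs like for loops and list() catch this StopIteration for you, which is why they simply stop instead of raising.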