Tag Archives: Python

How does asyncio actually work?

Question: How does asyncio actually work?


This question is motivated by another question of mine: How to await in cdef?

There are tons of articles and blog posts on the web about asyncio, but they are all very superficial. I couldn’t find any information about how asyncio is actually implemented and what makes I/O asynchronous. I was trying to read the source code, but it’s thousands of lines of not-the-highest-grade C code, a lot of which deals with auxiliary objects, but most crucially, it is hard to connect the Python syntax with the C code it would translate into.

Asyncio’s own documentation is even less helpful. There’s no information there about how it works, only some guidelines on how to use it, which are also sometimes misleading / very poorly written.

I’m familiar with Go’s implementation of coroutines, and was kind of hoping that Python did the same thing. If that were the case, the code I came up with in the post linked above would have worked. Since it didn’t, I’m now trying to figure out why. My best guess so far is as follows; please correct me where I’m wrong:

  1. Procedure definitions of the form async def foo(): ... are actually interpreted as methods of a class inheriting from coroutine.
  2. Perhaps async def is actually split into multiple methods by await statements, where the object on which these methods are called is able to keep track of the progress it has made through the execution so far.
  3. If the above is true, then, essentially, execution of a coroutine boils down to some global manager calling methods of a coroutine object (a loop?).
  4. The global manager is somehow (how?) aware of when I/O operations are performed by Python (only?) code and is able to choose one of the pending coroutine methods to execute after the currently executing method relinquishes control (hits an await statement).

In other words, here’s my attempt at “desugaring” of some asyncio syntax into something more understandable:

async def coro(name):
    print('before', name)
    await asyncio.sleep()
    print('after', name)

asyncio.gather(coro('first'), coro('second'))

# translated from async def coro(name)
class Coro(coroutine):
    def before(self, name):
        print('before', name)

    def after(self, name):
        print('after', name)

    def __init__(self, name):
        self.name = name
        self.parts = self.before, self.after
        self.pos = 0

    def __call__():
        self.parts[self.pos](self.name)
        self.pos += 1

    def done(self):
        return self.pos == len(self.parts)


# translated from asyncio.gather()
class AsyncIOManager:

    def gather(*coros):
        while not every(c.done() for c in coros):
            coro = random.choice(coros)
            coro()

Should my guess prove correct, then I have a problem. How does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended while I/O happens outside the interpreter? What exactly is meant by I/O? If my Python procedure calls the C open() procedure, and it in turn sends an interrupt to the kernel, relinquishing control to it, how does the Python interpreter know about this, and how is it able to continue running some other code while the kernel code does the actual I/O, until it wakes up the Python procedure which sent the interrupt originally? How can the Python interpreter, in principle, be aware of this happening?


Answer 0


How does asyncio work?

Before answering this question we need to understand a few base terms, skip these if you already know any of them.

Generators

Generators are objects that allow us to suspend the execution of a Python function. User-defined generators are implemented using the keyword yield. By creating a normal function containing the yield keyword, we turn that function into a generator:

>>> def test():
...     yield 1
...     yield 2
...
>>> gen = test()
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

As you can see, calling next() on the generator causes the interpreter to load test’s frame and return the yielded value. Calling next() again causes the frame to be loaded onto the interpreter stack again, where it continues and yields another value.

By the third call to next(), our generator is finished and StopIteration is raised.

Communicating with a generator

A lesser-known feature of generators is the fact that you can communicate with them using two methods: send() and throw().

>>> def test():
...     val = yield 1
...     print(val)
...     yield 2
...     yield 3
...
>>> gen = test()
>>> next(gen)
1
>>> gen.send("abc")
abc
2
>>> gen.throw(Exception())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in test
Exception

Upon calling gen.send(), the value is passed into the generator and becomes the return value of the yield expression.

gen.throw(), on the other hand, allows raising exceptions inside the generator; the exception is raised at the spot where the generator was suspended at yield.

Returning values from generators

Returning a value from a generator results in the value being put inside the StopIteration exception. We can later recover the value from the exception and use it as needed.

>>> def test():
...     yield 1
...     return "abc"
...
>>> gen = test()
>>> next(gen)
1
>>> try:
...     next(gen)
... except StopIteration as exc:
...     print(exc.value)
...
abc

Behold, a new keyword: yield from

Python 3.4 came with the addition of a new keyword: yield from. What that keyword allows us to do is pass any next(), send() and throw() on to an innermost nested generator. If the inner generator returns a value, it also becomes the return value of yield from:

>>> def inner():
...     inner_result = yield 2
...     print('inner', inner_result)
...     return 3
...
>>> def outer():
...     yield 1
...     val = yield from inner()
...     print('outer', val)
...     yield 4
...
>>> gen = outer()
>>> next(gen)
1
>>> next(gen) # Goes inside inner() automatically
2
>>> gen.send("abc")
inner abc
outer 3
4

I’ve written an article to further elaborate on this topic.

Putting it all together

With the introduction of the new keyword yield from in Python 3.4, we became able to create generators inside generators that, just like a tunnel, pass data back and forth between the innermost and the outermost generators. This spawned a new meaning for generators – coroutines.

Coroutines are functions that can be stopped and resumed while being run. In Python, they are defined using the async def keyword. Much like generators, they too use their own form of yield from, which is await. Before async and await were introduced in Python 3.5, we created coroutines in the exact same way generators were created (with yield from instead of await).

async def inner():
    return 1

async def outer():
    await inner()
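
For illustration, here is a minimal sketch of the pre-3.5 generator-based style mentioned above, using types.coroutine to mark a generator as awaitable (the function names are made up for this example):

import types

@types.coroutine
def inner_gen():
    # a generator-based coroutine: yield suspends, return passes a value back
    yield
    return 1

async def outer():
    # awaiting the generator-based coroutine works like awaiting a native one
    return await inner_gen()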

Like every iterator or generator that implements the __iter__() method, coroutines implement __await__(), which allows them to continue every time await coro is called.

There’s a nice sequence diagram inside the Python docs that you should check out.

In asyncio, apart from coroutine functions, we have 2 important objects: tasks and futures.

Futures

Futures are objects that have the __await__() method implemented, and their job is to hold a certain state and result. The state can be one of the following:

  1. PENDING – future does not have any result or exception set.
  2. CANCELLED – future was cancelled using fut.cancel()
  3. FINISHED – future was finished, either by a result set using fut.set_result() or by an exception set using fut.set_exception()

The result, just as you have guessed, can either be a Python object that will be returned, or an exception that may be raised.

Another important feature of future objects is that they contain a method called add_done_callback(). This method allows functions to be called as soon as the future is done – whether it finished with a result or raised an exception.
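
As a minimal sketch of that behavior (assuming Python 3.7+ for asyncio.run; the callback fires once the future is marked done):

import asyncio

async def main():
    fut = asyncio.get_running_loop().create_future()

    # the callback receives the completed future as its only argument
    fut.add_done_callback(lambda f: print('future done:', f.result()))

    # setting a result marks the future FINISHED and schedules the callback
    fut.set_result(42)
    print(await fut)  # awaiting an already-finished future returns its result

asyncio.run(main())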

Tasks

Task objects are special futures, which wrap around coroutines and communicate with the innermost and outermost coroutines. Every time a coroutine awaits a future, the future is passed all the way back to the task (just like in yield from), and the task receives it.

Next, the task binds itself to the future. It does so by calling add_done_callback() on the future. From now on, if the future is ever done – by being cancelled, passed an exception or passed a Python object as a result – the task’s callback will be called, and the task will rise back up to life.

Asyncio

The final burning question we must answer is – how is the IO implemented?

Deep inside asyncio, we have an event loop. An event loop of tasks. The event loop’s job is to call tasks every time they are ready and coordinate all that effort into one single working machine.

The IO part of the event loop is built upon a single crucial function called select. Select is a blocking function, implemented by the operating system underneath, that allows waiting on sockets for incoming or outgoing data. Upon data being received it wakes up and returns the sockets that received data, or the sockets that are ready for writing.

When you try to receive or send data over a socket through asyncio, what actually happens below is that asyncio first checks whether the socket has any data that can be immediately read or sent. If its .send() buffer is full, or its .recv() buffer is empty, the socket is registered with the select function (by simply adding it to one of the lists, rlist for recv and wlist for send), and the appropriate function awaits a newly created future object tied to that socket.

When all available tasks are waiting for futures, the event loop calls select and waits. When one of the sockets has incoming data, or its send buffer has drained, asyncio checks the future object tied to that socket and sets it to done.

Now all the magic happens. The future is set to done, the task that added itself before with add_done_callback() rises back up to life, and calls .send() on the coroutine, which resumes the innermost coroutine (because of the await chain), and you read the newly received data from the nearby buffer it was spilled into.

The chain of calls again, in the case of recv():

  1. select.select waits.
  2. A ready socket, with data, is returned.
  3. Data from the socket is moved into a buffer.
  4. future.set_result() is called.
  5. The task that added itself with add_done_callback() is now woken up.
  6. The task calls .send() on the coroutine, which goes all the way into the innermost coroutine and wakes it up.
  7. Data is read from the buffer and returned to our humble user.

In summary, asyncio uses generator capabilities that allow pausing and resuming functions. It uses yield from capabilities that allow passing data back and forth between the innermost and the outermost generators. It uses all of those in order to halt a function’s execution while it waits for IO to complete (by using the OS select function).

And the best of all? While one function is paused, another may run and interleave with it – that delicate fabric is asyncio.


Answer 1


Talking about async/await and asyncio is not the same thing. The first is a fundamental, low-level construct (coroutines) while the latter is a library using these constructs. Consequently, there is no single, ultimate answer.

The following is a general description of how async/await and asyncio-like libraries work. That is, there may be other tricks on top (there are…) but they are inconsequential unless you build them yourself. The difference should be negligible unless you already know enough to not have to ask such a question.

1. Coroutines versus subroutines in a nut shell

Just like subroutines (functions, procedures, …), coroutines (generators, …) are an abstraction of call stack and instruction pointer: there is a stack of executing code pieces, and each is at a specific instruction.

The distinction of def versus async def is merely for clarity. The actual difference is return versus yield. From this, await and yield from lift that difference from individual calls to entire stacks.

1.1. Subroutines

A subroutine represents a new stack level to hold local variables, and a single traversal of its instructions to reach an end. Consider a subroutine like this:

def subfoo(bar):
     qux = 3
     return qux * bar

When you run it, that means

  1. allocate stack space for bar and qux
  2. recursively execute the first statement and jump to the next statement
  3. once at a return, push its value to the calling stack
  4. clear the stack (1.) and instruction pointer (2.)

Notably, 4. means that a subroutine always starts at the same state. Everything exclusive to the function itself is lost upon completion. A function cannot be resumed, even if there are instructions after return.

root -\
  :    \- subfoo --\
  :/--<---return --/
  |
  V

1.2. Coroutines as persistent subroutines

A coroutine is like a subroutine, but can exit without destroying its state. Consider a coroutine like this:

 def cofoo(bar):
      qux = yield bar  # yield marks a break point
      return qux

When you run it, that means

  1. allocate stack space for bar and qux
  2. recursively execute the first statement and jump to the next statement
    1. once at a yield, push its value to the calling stack but store the stack and instruction pointer
    2. once calling into yield, restore stack and instruction pointer and push arguments to qux
  3. once at a return, push its value to the calling stack
  4. clear the stack (1.) and instruction pointer (2.)

Note the addition of 2.1 and 2.2 – a coroutine can be suspended and resumed at predefined points. This is similar to how a subroutine is suspended during calling another subroutine. The difference is that the active coroutine is not strictly bound to its calling stack. Instead, a suspended coroutine is part of a separate, isolated stack.

root -\
  :    \- cofoo --\
  :/--<+--yield --/
  |    :
  V    :

This means that suspended coroutines can be freely stored or moved between stacks. Any call stack that has access to a coroutine can decide to resume it.
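
A minimal sketch of this property, using a plain generator as the coroutine (the suspended object can be resumed from a completely different call site):

def cofoo(bar):
    qux = yield bar  # suspension point
    return qux

suspended = cofoo(3)
print(next(suspended))      # prints 3; cofoo is now suspended at yield

def elsewhere(coro):
    # resume the stored coroutine from an unrelated call stack
    try:
        coro.send(7)
    except StopIteration as stop:
        print(stop.value)   # prints 7, cofoo's return value

elsewhere(suspended)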

1.3. Traversing the call stack

So far, our coroutine only goes down the call stack with yield. A subroutine can go down and up the call stack with return and (). For completeness, coroutines also need a mechanism to go up the call stack. Consider a coroutine like this:

def wrap():
    yield 'before'
    yield from cofoo()
    yield 'after'

When you run it, that means it still allocates the stack and instruction pointer like a subroutine. When it suspends, that still is like storing a subroutine.

However, yield from does both. It suspends stack and instruction pointer of wrap and runs cofoo. Note that wrap stays suspended until cofoo finishes completely. Whenever cofoo suspends or something is sent, cofoo is directly connected to the calling stack.

1.4. Coroutines all the way down

As established, yield from allows connecting two scopes across another, intermediate one. When applied recursively, that means the top of the stack can be connected to the bottom of the stack.

root -\
  :    \-> coro_a -yield-from-> coro_b --\
  :/ <-+------------------------yield ---/
  |    :
  :\ --+-- coro_a.send----------yield ---\
  :                             coro_b <-/

Note that root and coro_b do not know about each other. This makes coroutines much cleaner than callbacks: coroutines are still built on a 1:1 relation, like subroutines. Coroutines suspend and resume their entire existing execution stack up until a regular call point.

Notably, root could have an arbitrary number of coroutines to resume. Yet, it can never resume more than one at the same time. Coroutines of the same root are concurrent but not parallel!

1.5. Python’s async and await

The explanation has so far explicitly used the yield and yield from vocabulary of generators – the underlying functionality is the same. The new Python 3.5 syntax async and await exists mainly for clarity.

def foo():  # subroutine?
     return None

def foo():  # coroutine?
     yield from foofoo()  # generator? coroutine?

async def foo():  # coroutine!
     await foofoo()  # coroutine!
     return None

The async for and async with statements are needed because you would break the yield from/await chain with the bare for and with statements.
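
As an illustrative sketch (the Ticker class is made up for this example), an async iterator implements __aiter__/__anext__ so that every step may await, which a bare for loop could not suspend on:

import asyncio

class Ticker:
    def __init__(self, count):
        self.count = count

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.count <= 0:
            raise StopAsyncIteration
        self.count -= 1
        await asyncio.sleep(0.01)  # each step suspends the coroutine
        return self.count

async def main():
    async for tick in Ticker(3):  # async for keeps the await chain intact
        print(tick)

asyncio.run(main())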

2. Anatomy of a simple event loop

By itself, a coroutine has no concept of yielding control to another coroutine. It can only yield control to the caller at the bottom of a coroutine stack. This caller can then switch to another coroutine and run it.

This root node of several coroutines is commonly an event loop: on suspension, a coroutine yields an event on which it wants to resume. In turn, the event loop is capable of efficiently waiting for these events to occur. This allows it to decide which coroutine to run next, or how to wait before resuming.

Such a design implies that there is a set of pre-defined events that the loop understands. Several coroutines await each other, until finally an event is awaited. This event can communicate directly with the event loop by yielding control.

loop -\
  :    \-> coroutine --await--> event --\
  :/ <-+----------------------- yield --/
  |    :
  |    :  # loop waits for event to happen
  |    :
  :\ --+-- send(reply) -------- yield --\
  :        coroutine <--yield-- event <-/

The key is that coroutine suspension allows the event loop and events to directly communicate. The intermediate coroutine stack does not require any knowledge about which loop is running it, nor how events work.

2.1.1. Events in time

The simplest event to handle is reaching a point in time. This is a fundamental building block of threaded code as well: a thread repeatedly sleeps until a condition is true. However, a regular sleep blocks execution by itself – we want other coroutines to not be blocked. Instead, we want to tell the event loop when it should resume the current coroutine stack.

2.1.2. Defining an Event

An event is simply a value we can identify – be it via an enum, a type or some other identity. We can define this with a simple class that stores our target time. In addition to storing the event information, we can allow the class to be awaited directly.

class AsyncSleep:
    """Event to sleep until a point in time"""
    def __init__(self, until: float):
        self.until = until

    # used whenever someone ``await``s an instance of this Event
    def __await__(self):
        # yield this Event to the loop
        yield self
    
    def __repr__(self):
        return '%s(until=%.1f)' % (self.__class__.__name__, self.until)

This class only stores the event – it does not say how to actually handle it.

The only special feature is __await__ – it is what the await keyword looks for. Practically, it is an iterator, but one that is not available to the regular iteration machinery.

2.2.1. Awaiting an event

Now that we have an event, how do coroutines react to it? We should be able to express the equivalent of sleep by awaiting our event. To better see what is going on, we wait twice for half the time:

import time

async def asleep(duration: float):
    """await that ``duration`` seconds pass"""
    await AsyncSleep(time.time() + duration / 2)
    await AsyncSleep(time.time() + duration / 2)

We can directly instantiate and run this coroutine. Similar to a generator, using coroutine.send runs the coroutine until it yields a result.

coroutine = asleep(100)
while True:
    print(coroutine.send(None))
    time.sleep(0.1)

This gives us two AsyncSleep events and then a StopIteration when the coroutine is done. Notice that the only delay is from time.sleep in the loop! Each AsyncSleep only stores an offset from the current time.

2.2.2. Event + Sleep

At this point, we have two separate mechanisms at our disposal:

  • AsyncSleep Events that can be yielded from inside a coroutine
  • time.sleep that can wait without impacting coroutines

Notably, these two are orthogonal: neither one affects or triggers the other. As a result, we can come up with our own strategy to sleep to meet the delay of an AsyncSleep.

2.3. A naive event loop

If we have several coroutines, each can tell us when it wants to be woken up. We can then wait until the first of them wants to be resumed, then for the one after, and so on. Notably, at each point we only care about which one is next.

This makes for a straightforward scheduling:

  1. sort coroutines by their desired wake up time
  2. pick the first that wants to wake up
  3. wait until this point in time
  4. run this coroutine
  5. repeat from 1.

A trivial implementation does not need any advanced concepts. A list allows sorting coroutines by wake-up time. Waiting is a regular time.sleep. Running coroutines works just like before with coroutine.send.

def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    # store wake-up-time and coroutines
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting:
        # 2. pick the first coroutine that wants to wake up
        until, coroutine = waiting.pop(0)
        # 3. wait until this point in time
        time.sleep(max(0.0, until - time.time()))
        # 4. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])

Of course, this has ample room for improvement. We can use a heap for the wait queue or a dispatch table for events. We could also fetch return values from the StopIteration and assign them to the coroutine. However, the fundamental principle remains the same.
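
For instance, a minimal sketch of the heap-based wait queue (AsyncSleep and the coroutine protocol stay exactly as above; the counter only breaks ties so coroutines are never compared):

import heapq
import time

def run(*coroutines):
    """Cooperatively run all ``coroutines``, with a heap as the wait queue"""
    waiting = [(0, i, coroutine) for i, coroutine in enumerate(coroutines)]
    heapq.heapify(waiting)
    counter = len(waiting)  # tie-breaker so coroutines are never compared
    while waiting:
        # pop the earliest wake-up time instead of sorting the whole list
        until, _, coroutine = heapq.heappop(waiting)
        time.sleep(max(0.0, until - time.time()))
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        if isinstance(command, AsyncSleep):
            heapq.heappush(waiting, (command.until, counter, coroutine))
            counter += 1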

2.4. Cooperative Waiting

The AsyncSleep event and run event loop are a fully working implementation of timed events.

async def sleepy(identifier: str = "coroutine", count=5):
    for i in range(count):
        print(identifier, 'step', i + 1, 'at %.2f' % time.time())
        await asleep(0.1)

run(*(sleepy("coroutine %d" % j) for j in range(5)))

This cooperatively switches between each of the five coroutines, suspending each for 0.1 seconds. Even though the event loop is synchronous, it still executes the work in 0.5 seconds instead of 2.5 seconds. Each coroutine holds state and acts independently.

3. I/O event loop

An event loop that supports sleep is suitable for polling. However, waiting for I/O on a file handle can be done more efficiently: the operating system implements I/O and thus knows which handles are ready. Ideally, an event loop should support an explicit “ready for I/O” event.

3.1. The select call

Python already has an interface to query the OS for I/O-ready handles. When called with handles to read or write, it returns the handles that are ready to read or write:

readable, writeable, _ = select.select(rlist, wlist, xlist, timeout)

For example, we can open a file for writing and wait for it to be ready:

write_target = open('/tmp/foo', 'w')
readable, writeable, _ = select.select([], [write_target], [])

Once select returns, writeable contains our open file.

3.2. Basic I/O event

Similar to the AsyncSleep request, we need to define an event for I/O. With the underlying select logic, the event must refer to a readable object – say an open file. In addition, we store how much data to read.

class AsyncRead:
    def __init__(self, file, amount=1):
        self.file = file
        self.amount = amount
        self._buffer = b''  # bytes, since the demo below opens files in binary mode

    def __await__(self):
        while len(self._buffer) < self.amount:
            yield self
            # we only get here if ``read`` should not block
            self._buffer += self.file.read(1)
        return self._buffer

    def __repr__(self):
        return '%s(file=%s, amount=%d, progress=%d)' % (
            self.__class__.__name__, self.file, self.amount, len(self._buffer)
        )

As with AsyncSleep we mostly just store the data required for the underlying system call. This time, __await__ is capable of being resumed multiple times – until our desired amount has been read. In addition, we return the I/O result instead of just resuming.

3.3. Augmenting an event loop with read I/O

The basis for our event loop is still the run defined previously. First, we need to track the read requests. This is no longer a sorted schedule; we simply map read requests to coroutines.

# new
waiting_read = {}  # type: Dict[file, coroutine]

Since select.select takes a timeout parameter, we can use it in place of time.sleep.

# old
time.sleep(max(0.0, until - time.time()))
# new
readable, _, _ = select.select(list(waiting_read), [], [])

This gives us all readable files – if there are any, we run the corresponding coroutine. If there are none, we have waited long enough for our current coroutine to run.

# new - reschedule waiting coroutine, run readable coroutine
if readable:
    waiting.append((until, coroutine))
    waiting.sort()
    coroutine = waiting_read[readable[0]]

Finally, we have to actually listen for read requests.

# new
if isinstance(command, AsyncSleep):
    ...
elif isinstance(command, AsyncRead):
    ...

3.4. Putting it together

The above was a bit of a simplification. We need to do some switching so as not to starve sleeping coroutines when we can always read. We need to handle having nothing to read or nothing to wait for. However, the end result still fits into 30 LOC.

def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    waiting_read = {}  # type: Dict[file, coroutine]
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting or waiting_read:
        # 2. wait until the next coroutine may run or read ...
        try:
            until, coroutine = waiting.pop(0)
        except IndexError:
            until, coroutine = float('inf'), None
            readable, _, _ = select.select(list(waiting_read), [], [])
        else:
            readable, _, _ = select.select(list(waiting_read), [], [], max(0.0, until - time.time()))
        # ... and select the appropriate one
        if readable and time.time() < until:
            if until and coroutine:
                waiting.append((until, coroutine))
                waiting.sort()
            coroutine = waiting_read.pop(readable[0])
        # 3. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension ...
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])
        # ... or register reads
        elif isinstance(command, AsyncRead):
            waiting_read[command.file] = coroutine

3.5. Cooperative I/O

The AsyncSleep, AsyncRead and run implementations are now fully functional, able to sleep and/or read. Same as for sleepy, we can define a helper to test reading:

async def ready(path, amount=1024*32):
    print('read', path, 'at', '%d' % time.time())
    with open(path, 'rb') as file:
        result = await AsyncRead(file, amount)
    print('done', path, 'at', '%d' % time.time())
    print('got', len(result), 'B')

run(sleepy('background', 5), ready('/dev/urandom'))

Running this, we can see that our I/O is interleaved with the waiting task:

id background round 1
read /dev/urandom at 1530721148
id background round 2
id background round 3
id background round 4
id background round 5
done /dev/urandom at 1530721148
got 1024 B

4. Non-Blocking I/O

While I/O on files gets the concept across, it is not really suitable for a library like asyncio: the select call always returns for files, and both open and read may block indefinitely. This blocks all coroutines of an event loop – which is bad. Libraries like aiofiles use threads and synchronization to fake non-blocking I/O and events on files.

However, sockets do allow for non-blocking I/O – and their inherent latency makes it much more critical. When used in an event loop, waiting for data and retrying can be wrapped without blocking anything.

4.1. Non-Blocking I/O event

Similar to our AsyncRead, we can define a suspend-and-read event for sockets. Instead of taking a file, we take a socket – which must be non-blocking. Also, our __await__ uses socket.recv instead of file.read.

class AsyncRecv:
    def __init__(self, connection, amount=1, read_buffer=1024):
        assert not connection.getblocking(), 'connection must be non-blocking for async recv'
        self.connection = connection
        self.amount = amount
        self.read_buffer = read_buffer
        self._buffer = b''

    def __await__(self):
        while len(self._buffer) < self.amount:
            try:
                self._buffer += self.connection.recv(self.read_buffer)
            except BlockingIOError:
                yield self
        return self._buffer

    def __repr__(self):
        return '%s(file=%s, amount=%d, progress=%d)' % (
            self.__class__.__name__, self.connection, self.amount, len(self._buffer)
        )

In contrast to AsyncRead, __await__ performs truly non-blocking I/O. When data is available, it always reads. When no data is available, it always suspends. That means the event loop is only blocked while we perform useful work.

4.2. Un-Blocking the event loop

As far as the event loop is concerned, nothing changes much. The event to listen for is still the same as for files – a file descriptor marked ready by select.

# old
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine
# new
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine
elif isinstance(command, AsyncRecv):
    waiting_read[command.connection] = coroutine

At this point, it should be obvious that AsyncRead and AsyncRecv are the same kind of event. We could easily refactor them to be one event with an exchangeable I/O component. In effect, the event loop, coroutines and events cleanly separate a scheduler, arbitrary intermediate code and the actual I/O.
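
A minimal sketch of such a refactoring (the class name and the step parameter are made up for this example; it follows the try-first flavor of AsyncRecv):

class AsyncReadEvent:
    """Unified read event: ``step`` performs one non-blocking read attempt"""
    def __init__(self, source, step, amount=1):
        self.source = source  # file or socket, watched via ``select``
        self._step = step     # e.g. ``lambda: sock.recv(1024)``
        self.amount = amount
        self._buffer = b''

    def __await__(self):
        while len(self._buffer) < self.amount:
            try:
                self._buffer += self._step()
            except BlockingIOError:
                yield self    # not ready: suspend until ``select`` says so
        return self._buffer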

4.3. The ugly side of non-blocking I/O

In principle, what you should do at this point is replicate the logic of read as a recv for AsyncRecv. However, this is much uglier now: you have to handle early returns when functions would block inside the kernel and instead yield control back to you. For example, opening a connection takes considerably more code than opening a file:

# file
file = open(path, 'rb')
# non-blocking socket
connection = socket.socket()
connection.setblocking(False)
# open without blocking - retry on failure
try:
    connection.connect((url, port))
except BlockingIOError:
    pass

Long story short, what remains is a few dozen lines of exception handling. The events and event loop already work at this point.

id background round 1
read localhost:25000 at 1530783569
read /dev/urandom at 1530783569
done localhost:25000 at 1530783569 got 32768 B
id background round 2
id background round 3
id background round 4
done /dev/urandom at 1530783569 got 4096 B
id background round 5

Addendum

Example code at github


Answer 2


Your coro desugaring is conceptually correct, but slightly incomplete.

await doesn’t suspend unconditionally, but only if it encounters a blocking call. How does it know that a call is blocking? This is decided by the code being awaited. For example, an awaitable implementation of socket read could be desugared to:

def read(sock, n):
    # sock must be in non-blocking mode
    try:
        return sock.recv(n)
    except EWOULDBLOCK:
        event_loop.add_reader(sock.fileno, current_task())
        return SUSPEND

In real asyncio the equivalent code modifies the state of a Future instead of returning magic values, but the concept is the same. When appropriately adapted to a generator-like object, the above code can be awaited.
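
A minimal sketch of such an adaptation, reusing the hypothetical read() and SUSPEND from above (types.coroutine marks the wrapper as awaitable):

import types

@types.coroutine
def read_awaitable(sock, n):
    # wraps the SUSPEND-returning read() into something await can drive
    while True:
        data = read(sock, n)
        if data is SUSPEND:
            yield  # suspend; the event loop resumes us when sock is ready
        else:
            return data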

On the caller side, when your coroutine contains:

data = await read(sock, 1024)

It desugars into something close to:

data = read(sock, 1024)
if data is SUSPEND:
    return SUSPEND
self.pos += 1
self.parts[self.pos](...)

People familiar with generators tend to describe the above in terms of yield from, which does the suspension automatically.

The suspension chain continues all the way up to the event loop, which notices that the coroutine is suspended, removes it from the runnable set, and goes on to execute coroutines that are runnable, if any. If no coroutines are runnable, the loop waits in select() until a file descriptor that some coroutine is interested in becomes ready for IO. (The event loop maintains a file-descriptor-to-coroutine mapping.)

In the above example, once select() tells the event loop that sock is readable, it will re-add coro to the runnable set, so it will be continued from the point of suspension.

In other words:

  1. Everything happens in the same thread by default.

  2. The event loop is responsible for scheduling the coroutines and waking them up when whatever they were waiting for (typically an IO call that would normally block, or a timeout) becomes ready.

For insight into coroutine-driving event loops, I recommend this talk by Dave Beazley, where he demonstrates coding an event loop from scratch in front of a live audience.


Answer 3


It all boils down to the two main challenges that asyncio is addressing:

  • How to perform multiple I/O in a single thread?
  • How to implement cooperative multitasking?

The answer to the first point has been around for a long while and is called a select loop. In Python, it is implemented in the selectors module.
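
A minimal sketch of a select loop built on the selectors module (the server socket here is only for illustration):

import selectors
import socket

selector = selectors.DefaultSelector()

server = socket.socket()
server.bind(('localhost', 0))  # any free port, for illustration
server.listen()
server.setblocking(False)

# register interest in read-readiness; ``data`` may carry any label
selector.register(server, selectors.EVENT_READ, data='accept')

# one pass of the loop: blocks until a registered handle is ready or timeout
for key, events in selector.select(timeout=1.0):
    print('ready:', key.data, key.fileobj)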

The second question is related to the concept of coroutines, i.e. functions that can stop their execution and be restored later on. In Python, coroutines are implemented using generators and the yield from statement. That’s what is hiding behind the async/await syntax.

More resources in this answer.


EDIT: Addressing your comment about goroutines:

The closest equivalent to a goroutine in asyncio is actually not a coroutine but a task (see the difference in the documentation). In Python, a coroutine (or a generator) knows nothing about the concepts of an event loop or I/O. It is simply a function that can stop its execution using yield while keeping its current state, so it can be restored later on. The yield from syntax allows chaining them in a transparent way.

Now, within an asyncio task, the coroutine at the very bottom of the chain always ends up yielding a future. This future then bubbles up to the event loop, and gets integrated into the inner machinery. When the future is set to done by some other inner callback, the event loop can restore the task by sending the future back into the coroutine chain.


EDIT: Addressing some of the questions in your post:

How does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended and I/O happens outside the interpreter?

No, nothing happens in a thread. I/O is always managed by the event loop, mostly through file descriptors. However, the registration of those file descriptors is usually hidden by high-level coroutines, which do the dirty work for you.

What exactly is meant by I/O? If my python procedure called C open() procedure, and it in turn sent interrupt to kernel, relinquishing control to it, how does Python interpreter know about this and is able to continue running some other code, while kernel code does the actual I/O and until it wakes up the Python procedure which sent the interrupt originally? How can Python interpreter in principle, be aware of this happening?

An I/O is any blocking call. In asyncio, all the I/O operations should go through the event loop, because as you said, the event loop has no way to be aware that a blocking call is being performed in some synchronous code. That means you’re not supposed to use a synchronous open within the context of a coroutine. Instead, use a dedicated library such as aiofiles, which provides an asynchronous version of open.
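
For illustration, a short sketch of the aiofiles usage pattern (assuming aiofiles is installed; the file path is arbitrary):

import asyncio
import aiofiles

async def main():
    # aiofiles.open returns an async context manager; reads are awaitable
    async with aiofiles.open('/etc/hostname') as f:
        contents = await f.read()
    print(contents)

asyncio.run(main())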


Make sure only one instance of a program is running

Question: Make sure only one instance of a program is running


Is there a Pythonic way to have only one instance of a program running?

The only reasonable solution I’ve come up with is trying to run it as a server on some port; a second program trying to bind to the same port then fails. But it’s not really a great idea – maybe there’s something more lightweight than this?

(Take into consideration that the program is expected to fail sometimes, e.g. segfault – so things like a “lock file” won’t work.)


Answer 0


The following code should do the job; it is cross-platform and runs on Python 2.4-3.2. I tested it on Windows, OS X and Linux.

from tendo import singleton
me = singleton.SingleInstance() # will sys.exit(-1) if other instance is running

The latest version of the code is available in singleton.py. Please file bugs here.

You can install tendo using one of the following methods:
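
For example, with pip (other install methods, such as downloading from PyPI, also work):

pip install tendo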


Answer 1


Simple, cross-platform solution, found in another question by zgoda:

import fcntl
import os
import sys

def instance_already_running(label="default"):
    """
    Detect if an instance with the label is already running, globally
    at the operating system level.

    Using `os.open` ensures that the file pointer won't be closed
    by Python's garbage collector after the function's scope is exited.

    The lock will be released when the program exits, or could be
    released if the file pointer were closed.
    """

    lock_file_pointer = os.open(f"/tmp/instance_{label}.lock", os.O_WRONLY | os.O_CREAT)

    try:
        fcntl.lockf(lock_file_pointer, fcntl.LOCK_EX | fcntl.LOCK_NB)
        already_running = False
    except IOError:
        already_running = True

    return already_running

A lot like S.Lott’s suggestion, but with the code.
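
For illustration, a minimal usage sketch of the function above (the label string "my-app" is an arbitrary example):

import sys

if instance_already_running("my-app"):
    sys.exit("Another instance is already running; exiting.")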


回答 2

此代码特定于Linux。它使用“抽象” UNIX域套接字,但是它很简单并且不会留下过时的锁定文件。与上面的解决方案相比,我更喜欢它,因为它不需要专门保留的TCP端口。

import sys

try:
    import socket
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    ## Create an abstract socket, by prefixing it with null.
    s.bind('\0postconnect_gateway_notify_lock')
except socket.error as e:
    error_code = e.args[0]
    error_string = e.args[1]
    print "Process already running (%d:%s). Exiting" % (error_code, error_string)
    sys.exit(0)

可以更改唯一字符串 postconnect_gateway_notify_lock,以允许多个需要强制单实例的程序各自使用。

This code is Linux specific. It uses ‘abstract’ UNIX domain sockets, but it is simple and won’t leave stale lock files around. I prefer it to the solution above because it doesn’t require a specially reserved TCP port.

import sys

try:
    import socket
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    ## Create an abstract socket, by prefixing it with null.
    s.bind('\0postconnect_gateway_notify_lock')
except socket.error as e:
    error_code = e.args[0]
    error_string = e.args[1]
    print "Process already running (%d:%s). Exiting" % (error_code, error_string)
    sys.exit(0)

The unique string postconnect_gateway_notify_lock can be changed to allow multiple programs that need a single instance enforced.


回答 3

我不知道这是否足够 pythonic,但是在 Java 世界中,在指定端口上监听是一种使用非常广泛的解决方案,因为它适用于所有主要平台,并且在程序崩溃时也不会有问题。

侦听端口的另一个优点是可以将命令发送到正在运行的实例。例如,当用户第二次启动该程序时,您可以向运行中的实例发送命令以告诉它打开另一个窗口(例如Firefox就是这样做的。我不知道他们是否使用TCP端口或命名管道,或者这样的东西,虽然)。

I don’t know if it’s pythonic enough, but in the Java world listening on a defined port is a pretty widely used solution, as it works on all major platforms and doesn’t have any problems with crashing programs.

Another advantage of listening to a port is that you can send a command to the running instance. For example, when the user starts the program a second time, you could send the running instance a command to tell it to open another window (that's what Firefox does, for example; I don't know if they use TCP ports or named pipes or something like that, though).
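
As a rough Python sketch of this port-based approach (the port number 47200 is an arbitrary assumption; any fixed, otherwise-unused port works):

import socket
import sys

try:
    # Bind to a fixed localhost port; a second instance fails here
    # with "address already in use" and exits.
    lock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lock_socket.bind(("127.0.0.1", 47200))
    lock_socket.listen(1)
except OSError:
    sys.exit("Another instance is already running")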


回答 4

以前从未编写过python,但这是我刚刚在mycheckpoint中实现的功能,以防止crond将其启动两次或更多次:

import os
import sys
import fcntl
fh=0
def run_once():
    global fh
    fh=open(os.path.realpath(__file__),'r')
    try:
        fcntl.flock(fh,fcntl.LOCK_EX|fcntl.LOCK_NB)
    except:
        os._exit(0)

run_once()

在另一个问题(http://stackoverflow.com/questions/2959474)发布之后,找到了 Slava-N 的建议。它以函数的形式调用,锁定正在执行的脚本文件(而不是 pid 文件),并保持锁定直到脚本结束(无论是正常结束还是出错)。

Never written python before, but this is what I’ve just implemented in mycheckpoint, to prevent it being started twice or more by crond:

import os
import sys
import fcntl
fh=0
def run_once():
    global fh
    fh=open(os.path.realpath(__file__),'r')
    try:
        fcntl.flock(fh,fcntl.LOCK_EX|fcntl.LOCK_NB)
    except:
        os._exit(0)

run_once()

Found Slava-N's suggestion after posting this in another issue (http://stackoverflow.com/questions/2959474). This one is called as a function, locks the executing script's file (not a pid file) and maintains the lock until the script ends (normally or with an error).


回答 5

使用一个 pid 文件。您有一个已知的位置,"/path/to/pidfile",并且在启动时执行类似下面的操作(部分是伪代码,因为我还没喝咖啡,不想太费劲):

import os, os.path
pidfilePath = """/path/to/pidfile"""
if os.path.exists(pidfilePath):
   pidfile = open(pidfilePath,"r")
   pidString = pidfile.read()
   if <pidString is equal to os.getpid()>:
      # something is real weird
      Sys.exit(BADCODE)
   else:
      <use ps or pidof to see if the process with pid pidString is still running>
      if  <process with pid == 'pidString' is still running>:
          Sys.exit(ALREADAYRUNNING)
      else:
          # the previous server must have crashed
          <log server had crashed>
          <reopen pidfilePath for writing>
          pidfile.write(os.getpid())
else:
    <open pidfilePath for writing>
    pidfile.write(os.getpid())

因此,换句话说,您正在检查是否存在pidfile。如果不是,请将您的pid写入该文件。如果pidfile存在,则检查pid是否为正在运行的进程的pid;如果是这样,则您有另一个正在运行的实时进程,因此只需关闭。如果不是,则先前的进程崩溃了,因此将其记录下来,然后将您自己的pid写入旧文件中。然后继续。

Use a pid file. You have some known location, “/path/to/pidfile” and at startup you do something like this (partially pseudocode because I’m pre-coffee and don’t want to work all that hard):

import os, os.path
pidfilePath = """/path/to/pidfile"""
if os.path.exists(pidfilePath):
   pidfile = open(pidfilePath,"r")
   pidString = pidfile.read()
   if <pidString is equal to os.getpid()>:
      # something is real weird
      Sys.exit(BADCODE)
   else:
      <use ps or pidof to see if the process with pid pidString is still running>
      if  <process with pid == 'pidString' is still running>:
          Sys.exit(ALREADAYRUNNING)
      else:
          # the previous server must have crashed
          <log server had crashed>
          <reopen pidfilePath for writing>
          pidfile.write(os.getpid())
else:
    <open pidfilePath for writing>
    pidfile.write(os.getpid())

So, in other words, you’re checking if a pidfile exists; if not, write your pid to that file. If the pidfile does exist, then check to see if the pid is the pid of a running process; if so, then you’ve got another live process running, so just shut down. If not, then the previous process crashed, so log it, and then write your own pid to the file in place of the old one. Then continue.
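
For reference, a runnable sketch of this pid-file approach, assuming a Unix-like system; os.kill(pid, 0) stands in for the <use ps or pidof ...> placeholder, and the pidfile path is the same hypothetical location as above:

import os
import sys

PIDFILE = "/path/to/pidfile"

def pid_is_running(pid):
    # Signal 0 performs error checking only: it raises OSError if no
    # process with that pid exists (or if we lack permission).
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True

if os.path.exists(PIDFILE):
    with open(PIDFILE) as f:
        old_pid = int(f.read().strip() or "0")
    if old_pid == os.getpid():
        sys.exit("pidfile contains our own pid; something is really weird")
    if old_pid and pid_is_running(old_pid):
        sys.exit("Already running")
    # Otherwise the previous instance must have crashed; fall through.

with open(PIDFILE, "w") as f:
    f.write(str(os.getpid()))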


回答 6

您已经在另一个线程中找到了对类似问题的答复,因此为了完整起见,请参见如何在 Windows 上使用命名互斥锁(named mutex)实现相同目的。

http://code.activestate.com/recipes/474070/

You already found a reply to a similar question in another thread, so for completeness' sake see how to achieve the same on Windows using a named mutex.

http://code.activestate.com/recipes/474070/


回答 7

这可能有效。

  1. 尝试在已知位置创建 PID 文件。如果失败,则说明有人锁定了该文件,您直接退出即可。

  2. 正常完成后,请关闭并删除PID文件,以便其他人可以覆盖它。

您可以将程序包装在Shell脚本中,即使程序崩溃,该脚本也会删除PID文件。

如果程序挂起,也可以使用PID文件将其杀死。

This may work.

  1. Attempt to create a PID file at a known location. If you fail, someone has the file locked and you're done.

  2. When you finish normally, close and remove the PID file, so someone else can overwrite it.

You can wrap your program in a shell script that removes the PID file even if your program crashes.

You can, also, use the PID file to kill the program if it hangs.
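
For step 2, the standard-library atexit module can handle the normal-exit cleanup; a small sketch (the PIDFILE path is a hypothetical example):

import atexit
import os

PIDFILE = "/tmp/myapp.pid"

def remove_pidfile():
    # Runs on normal interpreter exit, but not after a hard crash such
    # as a segfault; hence the shell-script wrapper suggested above.
    try:
        os.remove(PIDFILE)
    except OSError:
        pass

atexit.register(remove_pidfile)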


回答 8

在 UNIX 上,使用锁定文件是一种非常普遍的方法。如果程序崩溃,则必须手动清理。您可以将 PID 存储在文件中,并在启动时检查是否有使用此 PID 的进程,如果没有则覆盖锁定文件。(但是,您还需要对"读文件-检查 pid-重写文件"这一过程整体加锁。)您可以在 os 包中找到获取和检查 pid 所需的内容。检查具有给定 pid 的进程是否存在的常见方法是向其发送非致命信号。

其他替代方法是将其与 flock 或 POSIX 信号量结合使用。

如 saua 所建议的那样,打开网络套接字可能是最简单、最可移植的方法。

Using a lock-file is a quite common approach on Unix. If it crashes, you have to clean up manually. You could store the PID in the file, and on startup check if there is a process with this PID, overriding the lock-file if not. (However, you also need a lock around the read-file-check-pid-rewrite-file sequence.) You will find what you need for getting and checking a pid in the os package. The common way of checking if there exists a process with a given pid is to send it a non-fatal signal.

Other alternatives could be combining this with flock or posix semaphores.

Opening a network socket, as saua proposed, would probably be the easiest and most portable.


回答 9

对于在应用程序中使用 wxPython 的任何人,您可以使用此处记录的 wx.SingleInstanceChecker 功能。

我个人使用 wx.App 的一个子类,它利用 wx.SingleInstanceChecker,当应用程序已有实例在运行时从 OnInit() 返回 False,如下所示:

import wx

class SingleApp(wx.App):
    """
    class that extends wx.App and only permits a single running instance.
    """

    def OnInit(self):
        """
        wx.App init function that returns False if the app is already running.
        """
        self.name = "SingleApp-%s".format(wx.GetUserId())
        self.instance = wx.SingleInstanceChecker(self.name)
        if self.instance.IsAnotherRunning():
            wx.MessageBox(
                "An instance of the application is already running", 
                "Error", 
                 wx.OK | wx.ICON_WARNING
            )
            return False
        return True

这是一个可直接替换 wx.App 的简单类,用于禁止多个实例。要使用它,只需在代码中将 wx.App 替换为 SingleApp,如下所示:

app = SingleApp(redirect=False)
frame = wx.Frame(None, wx.ID_ANY, "Hello World")
frame.Show(True)
app.MainLoop()

For anybody using wxPython for their application, you can use the function wx.SingleInstanceChecker documented here.

I personally use a subclass of wx.App which makes use of wx.SingleInstanceChecker and returns False from OnInit() if there is an existing instance of the app already executing like so:

import wx

class SingleApp(wx.App):
    """
    class that extends wx.App and only permits a single running instance.
    """

    def OnInit(self):
        """
        wx.App init function that returns False if the app is already running.
        """
        self.name = "SingleApp-%s".format(wx.GetUserId())
        self.instance = wx.SingleInstanceChecker(self.name)
        if self.instance.IsAnotherRunning():
            wx.MessageBox(
                "An instance of the application is already running", 
                "Error", 
                 wx.OK | wx.ICON_WARNING
            )
            return False
        return True

This is a simple drop-in replacement for wx.App that prohibits multiple instances. To use it simply replace wx.App with SingleApp in your code like so:

app = SingleApp(redirect=False)
frame = wx.Frame(None, wx.ID_ANY, "Hello World")
frame.Show(True)
app.MainLoop()

回答 10

这是我最终采用的仅适用于 Windows 的解决方案。将以下内容放入一个模块中,例如命名为 'onlyone.py' 或其他任何名称,然后将该模块直接包含在您的 __main__ python 脚本文件中。

import win32event, win32api, winerror, time, sys, os
main_path = os.path.abspath(sys.modules['__main__'].__file__).replace("\\", "/")

first = True
while True:
        mutex = win32event.CreateMutex(None, False, main_path + "_{<paste YOUR GUID HERE>}")
        if win32api.GetLastError() == 0:
            break
        win32api.CloseHandle(mutex)
        if first:
            print "Another instance of %s running, please wait for completion" % main_path
            first = False
        time.sleep(1)

说明

该代码尝试创建一个互斥锁,其名称来自脚本的完整路径。我们使用正斜杠来避免与实际文件系统的潜在混淆。

优点

  • 不需要配置或“魔术”标识符,请根据需要在许多不同的脚本中使用它。
  • 周围没有陈旧的文件,互斥锁将与您一起消亡。
  • 等待时打印有用的消息

Here is my eventual Windows-only solution. Put the following into a module, perhaps called 'onlyone.py', or whatever. Include that module directly into your __main__ python script file.

import win32event, win32api, winerror, time, sys, os
main_path = os.path.abspath(sys.modules['__main__'].__file__).replace("\\", "/")

first = True
while True:
        mutex = win32event.CreateMutex(None, False, main_path + "_{<paste YOUR GUID HERE>}")
        if win32api.GetLastError() == 0:
            break
        win32api.CloseHandle(mutex)
        if first:
            print "Another instance of %s running, please wait for completion" % main_path
            first = False
        time.sleep(1)

Explanation

The code attempts to create a mutex with name derived from the full path to the script. We use forward-slashes to avoid potential confusion with the real file system.

Advantages

  • No configuration or ‘magic’ identifiers needed, use it in as many different scripts as needed.
  • No stale files left around, the mutex dies with you.
  • Prints a helpful message when waiting

回答 11

Windows上对此的最佳解决方案是使用@zgoda建议的互斥锁。

import win32event
import win32api
from winerror import ERROR_ALREADY_EXISTS

mutex = win32event.CreateMutex(None, False, 'name')
last_error = win32api.GetLastError()

if last_error == ERROR_ALREADY_EXISTS:
   print("App instance already running")

有些答案使用了 Windows 上不可用的 fcntl(它也包含在 @sorin 的 tendo 软件包中);如果您尝试使用像 pyinstaller 这样做静态导入的软件包来冻结您的 python 应用,就会引发错误。

此外,使用锁定文件方法还会导致read-only数据库文件出现问题(sqlite3)。

The best solution for this on windows is to use mutexes as suggested by @zgoda.

import win32event
import win32api
from winerror import ERROR_ALREADY_EXISTS

mutex = win32event.CreateMutex(None, False, 'name')
last_error = win32api.GetLastError()

if last_error == ERROR_ALREADY_EXISTS:
   print("App instance already running")

Some answers use fcntl (also included in @sorin's tendo package), which is not available on Windows; and should you try to freeze your Python app using a package like pyinstaller, which does static imports, it throws an error.

Also, using the lock-file method creates a read-only problem with database files (I experienced this with sqlite3).


回答 12

我将其发布为答案,因为我是新用户,并且Stack Overflow尚未允许我投票。

Sorin Sbarnea的解决方案可在OS X,Linux和Windows下为我工作,对此我深表感谢。

但是,tempfile.gettempdir() 在 OS X 和 Windows 下是一种行为,而在其他一些/许多/所有(?)*nix 下是另一种行为(先忽略 OS X 也是 Unix 这一事实!)。这一区别对这段代码很重要。

OS X和Windows具有用户特定的临时目录,因此,一个用户创建的临时文件对另一用户不可见。相比之下,在许多版本的* nix(我测试过Ubuntu 9,RHEL 5,OpenSolaris 2008和FreeBSD 8)下,临时目录对于所有用户都是/ tmp。

这意味着在多用户计算机上创建锁文件时,它是在/ tmp中创建的,只有第一次创建锁文件的用户才能运行该应用程序。

一种可能的解决方案是将当前用户名嵌入到锁定文件的名称中。

值得注意的是,OP抢占端口的解决方案在多用户计算机上也会出现异常。

I’m posting this as an answer because I’m a new user and Stack Overflow won’t let me vote yet.

Sorin Sbarnea’s solution works for me under OS X, Linux and Windows, and I am grateful for it.

However, tempfile.gettempdir() behaves one way under OS X and Windows and another under other some/many/all(?) *nixes (ignoring the fact that OS X is also Unix!). The difference is important to this code.

OS X and Windows have user-specific temp directories, so a tempfile created by one user isn’t visible to another user. By contrast, under many versions of *nix (I tested Ubuntu 9, RHEL 5, OpenSolaris 2008 and FreeBSD 8), the temp dir is /tmp for all users.

That means that when the lockfile is created on a multi-user machine, it’s created in /tmp and only the user who creates the lockfile the first time will be able to run the application.

A possible solution is to embed the current username in the name of the lock file.

It’s worth noting that the OP’s solution of grabbing a port will also misbehave on a multi-user machine.
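
A small sketch of that possible solution; getpass.getuser() is in the standard library, and the file-name pattern is an arbitrary example:

import getpass
import os
import tempfile

# Per-user lock path: on systems with a shared /tmp, each user still
# gets an independent lock file.
lock_path = os.path.join(tempfile.gettempdir(),
                         "myapp_{}.lock".format(getpass.getuser()))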


回答 13

我在我的 gentoo 上使用 single_process;

pip install single_process

例如

from single_process import single_process

@single_process
def main():
    print 1

if __name__ == "__main__":
    main()   

参考:https://pypi.python.org/pypi/single_process/1.0

I use single_process on my gentoo;

pip install single_process

example:

from single_process import single_process

@single_process
def main():
    print 1

if __name__ == "__main__":
    main()   

refer: https://pypi.python.org/pypi/single_process/1.0


回答 14

我一直怀疑使用进程组应该有一个不错的POSIXy解决方案,而不必使用文件系统,但是我不太确定。就像是:

启动时,您的进程会向特定组中的所有进程发送“ kill -0”。如果存在任何此类进程,则退出。然后,它加入该组。没有其他进程使用该组。

但是,这存在一个竞争条件:多个进程可能恰好同时执行此操作,最终全部加入该组并同时运行。而等到您添加了某种互斥锁把它做到严密无误时,您就不再需要进程组了。

如果您的进程仅由 cron 启动,每分钟或每小时一次,这可能是可以接受的,但让我有些紧张的是,它恰恰会在您最不希望出错的那天出错。

我猜毕竟这不是一个很好的解决方案,除非有人可以改进?

I keep suspecting there ought to be a good POSIXy solution using process groups, without having to hit the file system, but I can’t quite nail it down. Something like:

On startup, your process sends a ‘kill -0’ to all processes in a particular group. If any such processes exist, it exits. Then it joins the group. No other processes use that group.

However, this has a race condition – multiple processes could all do this at precisely the same time and all end up joining the group and running simultaneously. By the time you’ve added some sort of mutex to make it watertight, you no longer need the process groups.

This might be acceptable if your process only gets started by cron, once every minute or every hour, but it makes me a bit nervous that it would go wrong precisely on the day when you don’t want it to.

I guess this isn’t a very good solution after all, unless someone can improve on it?


回答 15

上周,我遇到了这个确切的问题,尽管我确实找到了一些好的解决方案,但我还是决定制作一个非常简单干净的python软件包并将其上传到PyPI。它与tendo的不同之处在于它可以锁定任何字符串资源名称。尽管您当然可以锁定__file__以达到相同的效果。

安装方式: pip install quicklock

使用它非常简单:

[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> # Let's create a lock so that only one instance of a script will run
...
>>> singleton('hello world')
>>>
>>> # Let's try to do that again, this should fail
...
>>> singleton('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nate/live/gallery/env/lib/python2.7/site-packages/quicklock/quicklock.py", line 47, in singleton
    raise RuntimeError('Resource <{}> is currently locked by <Process {}: "{}">'.format(resource, other_process.pid, other_process.name()))
RuntimeError: Resource <hello world> is currently locked by <Process 24801: "python">
>>>
>>> # But if we quit this process, we release the lock automatically
...
>>> ^D
[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> singleton('hello world')
>>>
>>> # No exception was thrown, we own 'hello world'!

看一下:https://pypi.python.org/pypi/quicklock

I ran into this exact problem last week, and although I did find some good solutions, I decided to make a very simple and clean python package and uploaded it to PyPI. It differs from tendo in that it can lock any string resource name. Although you could certainly lock __file__ to achieve the same effect.

Install with: pip install quicklock

Using it is extremely simple:

[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> # Let's create a lock so that only one instance of a script will run
...
>>> singleton('hello world')
>>>
>>> # Let's try to do that again, this should fail
...
>>> singleton('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nate/live/gallery/env/lib/python2.7/site-packages/quicklock/quicklock.py", line 47, in singleton
    raise RuntimeError('Resource <{}> is currently locked by <Process {}: "{}">'.format(resource, other_process.pid, other_process.name()))
RuntimeError: Resource <hello world> is currently locked by <Process 24801: "python">
>>>
>>> # But if we quit this process, we release the lock automatically
...
>>> ^D
[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> singleton('hello world')
>>>
>>> # No exception was thrown, we own 'hello world'!

Take a look: https://pypi.python.org/pypi/quicklock


回答 16

在罗伯托·罗萨里奥(Roberto Rosario)的回答的基础上,我提出了以下功能:

SOCKET = None
def run_single_instance(uniq_name):
    try:
        import socket
        global SOCKET
        SOCKET = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        ## Create an abstract socket, by prefixing it with null.
        # this relies on a feature only in linux, when current process quits, the
        # socket will be deleted.
        SOCKET.bind('\0' + uniq_name)
        return True
    except socket.error as e:
        return False

我们需要定义全局 SOCKET 变量,因为它只有在整个进程退出时才会被垃圾回收。如果我们在函数中声明局部变量,它会在函数退出后超出作用域,套接字也就随之被删除。

所有的功劳应该归功于罗伯托·罗萨里奥(Roberto Rosario),因为我只是澄清和阐述了他的代码。而且此代码仅在Linux上有效,如https://troydhanson.github.io/network/Unix_domain_sockets.html中以下引用的文字所述:

Linux 具有一项特殊功能:如果 UNIX 域套接字的路径名以空字节 \0 开头,则其名称不会映射到文件系统中,因此不会与文件系统中的其他名称冲突。同样,当服务器关闭抽象命名空间中的 UNIX 域侦听套接字时,该名称即被删除;而使用常规的 UNIX 域套接字时,文件在服务器关闭后仍然存在。

Building upon Roberto Rosario’s answer, I come up with the following function:

SOCKET = None
def run_single_instance(uniq_name):
    try:
        import socket
        global SOCKET
        SOCKET = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        ## Create an abstract socket, by prefixing it with null.
        # this relies on a feature only in linux, when current process quits, the
        # socket will be deleted.
        SOCKET.bind('\0' + uniq_name)
        return True
    except socket.error as e:
        return False

We need to define the global SOCKET variable since it will only be garbage collected when the whole process quits. If we declared a local variable in the function, it would go out of scope after the function exits, and thus the socket would be deleted.

All the credit should go to Roberto Rosario, since I only clarify and elaborate upon his code. And this code will work only on Linux, as the following quoted text from https://troydhanson.github.io/network/Unix_domain_sockets.html explains:

Linux has a special feature: if the pathname for a UNIX domain socket begins with a null byte \0, its name is not mapped into the filesystem. Thus it won’t collide with other names in the filesystem. Also, when a server closes its UNIX domain listening socket in the abstract namespace, its file is deleted; with regular UNIX domain sockets, the file persists after the server closes it.
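
A possible way to use the function above at program start (the label string is arbitrary):

import sys

if not run_single_instance("my-app"):
    sys.exit("Another instance is already running")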


回答 17

linux示例

此方法基于创建一个在应用程序关闭后自动删除的临时文件。程序启动时,我们检查该文件是否存在;如果文件存在(说明已有一个实例在运行),则程序关闭;否则它将创建该文件并继续执行程序。

from tempfile import NamedTemporaryFile
import os
import sys

# Exit if a 'lock01_' temp file already exists in /tmp; otherwise create one
# that is deleted automatically when the program exits.
if [f for f in os.listdir('/tmp') if f.find('lock01_') != -1]:
    sys.exit()
f = NamedTemporaryFile(prefix='lock01_', delete=True)

YOUR CODE COMES HERE

linux example

This method is based on the creation of a temporary file that is automatically deleted after you close the application. On program launch we verify the existence of the file; if the file exists (there is a pending execution), the program is closed; otherwise it creates the file and continues the execution of the program.

from tempfile import NamedTemporaryFile
import os
import sys

# Exit if a 'lock01_' temp file already exists in /tmp; otherwise create one
# that is deleted automatically when the program exits.
if [f for f in os.listdir('/tmp') if f.find('lock01_') != -1]:
    sys.exit()
f = NamedTemporaryFile(prefix='lock01_', delete=True)

YOUR CODE COMES HERE

回答 18

在 Linux 系统上,还可以通过 pgrep 查询进程列表中该脚本出现的实例数(选项 -a 会显示完整的命令行字符串)。例如

import os
import sys
import subprocess

procOut = subprocess.check_output( "/bin/pgrep -u $UID -a python", shell=True, 
                                   executable="/bin/bash", universal_newlines=True)

if procOut.count( os.path.basename(__file__)) > 1 :        
    sys.exit( ("found another instance of >{}<, quitting."
              ).format( os.path.basename(__file__)))

如果该限制应适用于所有用户,请删除 -u $UID。免责声明:a) 假定脚本的(基本)名称是唯一的;b) 可能存在竞争条件。

On a Linux system one could also ask pgrep for the number of instances of the script found in the process list (option -a reveals the full command-line string). E.g.

import os
import sys
import subprocess

procOut = subprocess.check_output( "/bin/pgrep -u $UID -a python", shell=True, 
                                   executable="/bin/bash", universal_newlines=True)

if procOut.count( os.path.basename(__file__)) > 1 :        
    sys.exit( ("found another instance of >{}<, quitting."
              ).format( os.path.basename(__file__)))

Remove -u $UID if the restriction should apply to all users. Disclaimer: a) it is assumed that the script’s (base)name is unique, b) there might be race conditions.


回答 19

import sys,os

# start program
try:  # (1)
    os.unlink('lock')  # (2)
    fd=os.open("lock", os.O_CREAT|os.O_EXCL) # (3)  
except: 
    try: fd=os.open("lock", os.O_CREAT|os.O_EXCL) # (4) 
    except:  
        print "Another Program running !.."  # (5)
        sys.exit()  

# your program  ...
# ...

# exit program
try: os.close(fd)  # (6)
except: pass
try: os.unlink('lock')  
except: pass
sys.exit()  

Google App Engine的项目结构

问题:Google App Engine的项目结构

我刚问世时就在Google App Engine中启动了一个应用程序,以使用该技术并从事一个我一直想了很久但从未尝试过的宠物项目。结果是BowlSK。但是,随着它的增长和功能的添加,使其变得井井有条变得非常困难-主要是因为这是我的第一个python项目,在开始工作之前我对此一无所知。

我有的:

  • 主级别包含:
    • 所有.py文件(不知道如何使程序包正常工作)
    • 主页面的所有.html模板
  • 子目录:
    • 用于CSS,图片,JS等的单独文件夹。
    • 包含用于子目录类型网址的.html模板的文件夹

示例:
http://www.bowlsk.com/ 映射到 HomePage(默认包),模板位于 "index.html"
http://www.bowlsk.com/games/view-series.html?series=7130 映射到 ViewSeriesPage(同样是默认包),模板位于 "games/view-series.html"

真讨厌 我如何重组?我有两个想法:

  • 主文件夹包含:appdef,索引,main.py?

    • 代码的子文件夹。这一定是我的第一个包裹吗?
    • 模板的子文件夹。文件夹层次结构将与包层次结构匹配
    • CSS,图像,JS等的单个子文件夹。
  • 主文件夹包含appdef,索引,main.py?

    • 代码+模板的子文件夹。这样,我就在模板旁边设置了处理程序类,因为在此阶段,我要添加许多功能,因此对一个进行修改意味着对另一个进行了修改。同样,我必须将此文件夹名称作为Class的第一个软件包名称吗?我希望文件夹为“ src”,但我不希望我的Class为“ src.WhateverPage”

有最佳做法吗?随着Django 1.0的出现,当它成为正式的GAE模板引擎时,我现在可以做些什么来提高与它的集成能力?我将简单地开始尝试这些事情,然后看一看似乎更好,但是pyDev的重构支持似乎不能很好地处理程序包的移动,因此使所有这些再次工作可能不是一件容易的事。

I started an application in Google App Engine right when it came out, to play with the technology and work on a pet project that I had been thinking about for a long time but never gotten around to starting. The result is BowlSK. However, as it has grown, and features have been added, it has gotten really difficult to keep things organized – mainly due to the fact that this is my first python project, and I didn’t know anything about it until I started working.

What I have:

  • Main Level contains:
    • all .py files (didn’t know how to make packages work)
    • all .html templates for main level pages
  • Subdirectories:
    • separate folders for css, images, js, etc.
    • folders that hold .html templates for subdirecty-type urls

Example:
http://www.bowlsk.com/ maps to HomePage (default package), template at “index.html”
http://www.bowlsk.com/games/view-series.html?series=7130 maps to ViewSeriesPage (again, default package), template at “games/view-series.html”

It’s nasty. How do I restructure? I had 2 ideas:

  • Main Folder containing: appdef, indexes, main.py?

    • Subfolder for code. Does this have to be my first package?
    • Subfolder for templates. Folder heirarchy would match package heirarchy
    • Individual subfolders for css, images, js, etc.
  • Main Folder containing appdef, indexes, main.py?

    • Subfolder for code + templates. This way I have the handler class right next to the template, because in this stage, I’m adding lots of features, so modifications to one mean modifications to the other. Again, do I have to have this folder name be the first package name for my classes? I’d like the folder to be “src”, but I don’t want my classes to be “src.WhateverPage”

Is there a best practice? With Django 1.0 on the horizon, is there something I can do now to improve my ability to integrate with it when it becomes the official GAE templating engine? I would simply start trying these things, and seeing which seems better, but pyDev’s refactoring support doesn’t seem to handle package moves very well, so it will likely be a non-trivial task to get all of this working again.


回答 0

首先,我建议您看看《使用 Python、Django 和 Google App Engine 进行快速开发》。

GvR在幻灯片演示文稿的第10页上描述了常规/标准项目布局。

在这里,我将从该页面发布布局/结构的略微修改版本。我本人几乎遵循这种模式。您还提到了打包方面的问题。只要确保您的每个子文件夹都有一个__init__.py文件即可。如果它为空也可以。

样板文件

  • 这些项目之间几乎没有差异
  • app.yaml:将所有非静态请求定向到main.py
  • main.py:初始化应用并发送所有请求

项目布局

  • static/*:静态文件;由 App Engine 直接提供
  • myapp/*.py:特定于应用的 python 代码
    • views.py、models.py、tests.py、__init__.py 等
  • templates/*.html:模板(或 myapp/templates/*.html)

以下是一些可能也有帮助的代码示例:

main.py

import wsgiref.handlers

from google.appengine.ext import webapp
from myapp.views import *

application = webapp.WSGIApplication([
  ('/', IndexHandler),
  ('/foo', FooHandler)
], debug=True)

def main():
  wsgiref.handlers.CGIHandler().run(application)

myapp / views.py

import os
import datetime
import logging
import time

from google.appengine.api import urlfetch
from google.appengine.ext.webapp import template
from google.appengine.api import users
from google.appengine.ext import webapp
from models import *

class IndexHandler(webapp.RequestHandler):
  def get(self):
    date = "foo"
    # Do some processing        
    template_values = {'data': data }
    path = os.path.join(os.path.dirname(__file__) + '/../templates/', 'main.html')
    self.response.out.write(template.render(path, template_values))

class FooHandler(webapp.RequestHandler):
  def get(self):
    #logging.debug("start of handler")
    pass

myapp / models.py

from google.appengine.ext import db

class SampleModel(db.Model):
  pass

我认为这种布局非常适合新的和相对较小的中型项目。对于较大的项目,我建议分解视图和模型以使其具有以下子文件夹:

项目布局

  • static/:静态文件;由 App Engine 直接提供
    • js/*.js
    • images/*.gif|png|jpg
    • css/*.css
  • myapp/:应用程序结构
    • models/*.py
    • views/*.py
    • tests/*.py
    • templates/*.html:模板

First, I would suggest you have a look at "Rapid Development with Python, Django, and Google App Engine".

GvR describes a general/standard project layout on page 10 of his slide presentation.

Here I’ll post a slightly modified version of the layout/structure from that page. I pretty much follow this pattern myself. You also mentioned you had trouble with packages. Just make sure each of your sub folders has an __init__.py file. It’s ok if its empty.

Boilerplate files

  • These hardly vary between projects
  • app.yaml: direct all non-static requests to main.py
  • main.py: initialize app and send it all requests

Project lay-out

  • static/*: static files; served directly by App Engine
  • myapp/*.py: app-specific python code
    • views.py, models.py, tests.py, __init__.py, and more
  • templates/*.html: templates (or myapp/templates/*.html)

Here are some code examples that may help as well:

main.py

import wsgiref.handlers

from google.appengine.ext import webapp
from myapp.views import *

application = webapp.WSGIApplication([
  ('/', IndexHandler),
  ('/foo', FooHandler)
], debug=True)

def main():
  wsgiref.handlers.CGIHandler().run(application)

myapp/views.py

import os
import datetime
import logging
import time

from google.appengine.api import urlfetch
from google.appengine.ext.webapp import template
from google.appengine.api import users
from google.appengine.ext import webapp
from models import *

class IndexHandler(webapp.RequestHandler):
  def get(self):
    date = "foo"
    # Do some processing        
    template_values = {'data': data }
    path = os.path.join(os.path.dirname(__file__) + '/../templates/', 'main.html')
    self.response.out.write(template.render(path, template_values))

class FooHandler(webapp.RequestHandler):
  def get(self):
    #logging.debug("start of handler")
    pass

myapp/models.py

from google.appengine.ext import db

class SampleModel(db.Model):
  pass

I think this layout works great for new and relatively small to medium projects. For larger projects I would suggest breaking up the views and models to have their own sub-folders with something like:

Project lay-out

  • static/: static files; served directly by App Engine
    • js/*.js
    • images/*.gif|png|jpg
    • css/*.css
  • myapp/: app structure
    • models/*.py
    • views/*.py
    • tests/*.py
    • templates/*.html: templates

回答 1

我通常的布局如下所示:

  • app.yaml
  • index.yaml
  • request.py - 包含基本的 WSGI 应用
  • lib
    • __init__.py - 通用功能,包括请求处理程序基类
  • controllers - 包含所有处理程序。request.py 导入这些。
  • templates
    • 控制器使用的所有 django 模板
  • model
    • 所有数据存储区模型类
  • static
    • 静态文件(css、图像等)。由 app.yaml 映射到 /static

如果不清楚,我可以提供一些示例,说明我的 app.yaml、request.py、lib/__init__.py 和示例控制器是什么样子。

My usual layout looks something like this:

  • app.yaml
  • index.yaml
  • request.py – contains the basic WSGI app
  • lib
    • __init__.py – common functionality, including a request handler base class
  • controllers – contains all the handlers. request.py imports these.
  • templates
    • all the django templates, used by the controllers
  • model
    • all the datastore model classes
  • static
    • static files (css, images, etc). Mapped to /static by app.yaml

I can provide examples of what my app.yaml, request.py, lib/__init__.py, and sample controllers look like, if this isn't clear.


回答 2

我今天实现了一个 Google App Engine 样板,并将其签入到了 github 上。这与上面 Nick Johnson(曾任职于 Google)所描述的思路一致。

点击此链接gae-boilerplate

I implemented a google app engine boilerplate today and checked it on github. This is along the lines described by Nick Johnson above (who used to work for Google).

Follow this link gae-boilerplate


回答 3

我认为第一种选择被认为是最佳实践,并将代码文件夹作为您的第一个软件包。Guido van Rossum 开发的 Rietveld 项目是一个很好的学习范例。看看吧:http://code.google.com/p/rietveld

关于 Django 1.0,我建议您开始使用 Django 主干代码,而不是 GAE 内置的 django 移植版本。再次建议看看 Rietveld 是怎么做的。

I think the first option is considered the best practice. And make the code folder your first package. The Rietveld project developed by Guido van Rossum is a very good model to learn from. Have a look at it: http://code.google.com/p/rietveld

With regard to Django 1.0, I suggest you start using the Django trunk code instead of the GAE built in django port. Again, have a look at how it’s done in Rietveld.


回答 4

我喜欢webpy,因此在Google App Engine上将其用作模板框架。
我的软件包文件夹通常是这样组织的:

app.yaml
application.py
index.yaml
/app
   /config
   /controllers
   /db
   /lib
   /models
   /static
        /docs
        /images
        /javascripts
        /stylesheets
   test/
   utility/
   views/

这是一个例子。

I like webpy so I’ve adopted it as templating framework on Google App Engine.
My package folders are typically organized like this:

app.yaml
application.py
index.yaml
/app
   /config
   /controllers
   /db
   /lib
   /models
   /static
        /docs
        /images
        /javascripts
        /stylesheets
   test/
   utility/
   views/

Here is an example.


回答 5

在代码布局方面,我并没有完全了解最新的最佳实践等,但是当我完成第一个GAE应用程序时,我在第二个选项中使用了一些东西,其中代码和模板彼此相邻。

造成这种情况的原因有两个:一是将代码和模板放在一起;二是我的目录结构布局模仿了网站的布局,这(对我而言)让我更容易记住所有内容的位置。

I am not entirely up to date on the latest best practices, et cetera, when it comes to code layout, but when I did my first GAE application, I used something along the lines of your second option, where the code and templates are next to each other.

There were two reasons for this – one, it kept the code and templates nearby, and secondly, I had the directory structure layout mimic that of the website – making it (for me) a bit easier to remember where everything was.


无法通过pip安装Scipy

问题:无法通过pip安装Scipy

当通过pip安装scipy时:

pip install scipy

Pip无法构建scipy,并引发以下错误:

Cleaning up...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Storing debug log for failure in /Users/administrator/.pip/pip.log

如何才能让 scipy 成功构建?这可能是 OSX Yosemite 下的新问题,因为我刚刚升级,而之前安装 scipy 从未遇到过问题。


调试日志:

Cleaning up...
  Removing temporary dir /Users/administrator/dev/KaggleAux/env/build...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Exception information:
Traceback (most recent call last):
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/commands/install.py", line 283, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 1435, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 706, in install
    cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/util.py", line 697, in call_subprocess
    % (command_desc, proc.returncode, cwd))
InstallationError: Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy

When installing scipy through pip with :

pip install scipy

Pip fails to build scipy and throws the following error:

Cleaning up...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Storing debug log for failure in /Users/administrator/.pip/pip.log

How can I get scipy to build successfully? This may be a new issue with OSX Yosemite since I just upgraded and haven’t had issues installing scipy before.


Debug log:

Cleaning up...
  Removing temporary dir /Users/administrator/dev/KaggleAux/env/build...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Exception information:
Traceback (most recent call last):
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/commands/install.py", line 283, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 1435, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 706, in install
    cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/util.py", line 697, in call_subprocess
    % (command_desc, proc.returncode, cwd))
InstallationError: Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy

回答 0

向 SciPy 团队提交 issue 后,我们发现您需要使用以下命令升级 pip:

pip install --upgrade pip

在 Python 3 中则是:

python3 -m pip install --upgrade pip

这样 SciPy 才能正确安装。为什么?因为:

必须告知较旧版本的 pip 使用 wheel,IIRC 是通过 --use-wheel。或者,您可以升级 pip 本身,之后它应该会自动使用 wheel。

升级 pip 可以解决此问题,但您也可以只使用 --use-wheel 标志。

After opening up an issue with the SciPy team, we found that you need to upgrade pip with:

pip install --upgrade pip

And in Python 3 this works:

python3 -m pip install --upgrade pip

for SciPy to install properly. Why? Because:

Older versions of pip have to be told to use wheels, IIRC with --use-wheel. Or you can upgrade pip itself; then it should pick up the wheels.

Upgrading pip solves the issue, but you might be able to just use the --use-wheel flag as well.
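
For reference, the flag route on an older pip would look like this (note that --use-wheel existed only on older pip releases and was later removed once wheels became the default):

pip install --use-wheel scipy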


回答 1

安装了 64 位 Python 的 Microsoft Windows 用户需要从这里下载 64 位的 Scipy .whl 文件,然后 cd 进入您下载 .whl 文件的文件夹并运行:

pip install scipy-0.16.1-cp27-none-win_amd64.whl

Microsoft Windows users of 64-bit Python installations will need to download the 64-bit .whl of Scipy from here, then simply cd into the folder you've downloaded the .whl file to and run:

pip install scipy-0.16.1-cp27-none-win_amd64.whl

回答 2

在ubuntu下安装Scipy时遇到相同的问题。
我不得不使用命令:

$ sudo apt-get install libatlas-base-dev gfortran
$ sudo pip3 install scipy

您可以在此处获得更多详细信息:使用 pip 安装 SciPy。
抱歉,不知道在 OS X Yosemite 下该如何操作。

I faced the same problem when installing Scipy under Ubuntu.
I had to use the commands:

$ sudo apt-get install libatlas-base-dev gfortran
$ sudo pip3 install scipy

You can get more details here Installing SciPy with pip
Sorry don’t know how to do it under OS X Yosemite.


回答 3

在Windows 10中,大多数选项将不起作用。跟着这些步骤:

在 Windows 10 的 CMD 中,您无法直接使用大多数常见命令来下载 scipy,例如 wget、克隆 scipy 的 github、pip install scipy 等。

要安装,请转到 pythonlibs 的 .whl 文件页面:如果您使用 32 位的 python 2.7,请下载 numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl 和 scipy-0.18.1-cp27-cp27m-win32.whl;如果是 64 位的 python 2.7,请下载 numpy-1.11.2rc1+mkl-cp27-cp27m-win_amd64.whl 和 scipy-0.18.1-cp27-cp27m-win_amd64.whl。

下载后,将文件保存在您的 python 目录下,在我的情况下是 c:\>python27

然后运行:

pip install C:\Python27\numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl 
pip install C:\Python27\scipy-0.18.1-cp27-cp27m-win32.whl

注意:

  1. scipy 需要 numpy 作为依赖项,这就是为什么我们在 scipy 之前先下载 numpy。
  2. .whl 文件名中的 cp27 表示这些文件专门用于 python 2.7,而 cp33 表示 python 3.x(确切地说 >= 3.3)。

In windows 10, most options will not work. Follow these steps:

In Windows 10 with CMD, you cannot download scipy directly using most of the well known commands like wget, cloning scipy github, pip install scipy, etc

To install, go to the pythonlibs .whl files, and if you are using python 2.7 32 bit then download numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl and scipy-0.18.1-cp27-cp27m-win32.whl, or if python 2.7 64 bit then download numpy-1.11.2rc1+mkl-cp27-cp27m-win_amd64.whl and scipy-0.18.1-cp27-cp27m-win_amd64.whl

After downloading, save the files under your python directory; in my case it was c:\>python27

Then run:

pip install C:\Python27\numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl 
pip install C:\Python27\scipy-0.18.1-cp27-cp27m-win32.whl

Note:

  1. scipy needs numpy as a dependency, so that's why we are downloading numpy before scipy.
  2. cp27 in .whl file names means that these files are meant for python 2.7, and cp33 stands for python 3.x, specifically >=3.3

回答 4

在这个答案中找到一些线索后,我通过执行以下操作解决了问题:

brew install gcc 
pip install scipy

(其中第一步在我的 2011 Mac Book Air 上花了 96 分钟,所以希望您不着急!)

After finding this answer for some clues, I got this working by doing

brew install gcc 
pip install scipy

(The first of these steps took 96 minutes on my 2011 Mac Book Air so I hope you’re not in a hurry!)


回答 5

如果您是python的新手,请分步阅读或直接进入最后一步。请按照以下方法在Windows 64位,Python 64位上安装scipy 0.18.1。如果以下命令不起作用,请继续进行操作

pip install scipy

注意以下版本

  1. Python

  2. Windows

  3. .whl版本的numpy和scipy文件

  4. 首先安装numpy和scipy。

    pip install FileName.whl
  5. 对于 Numpy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy 对于 Scipy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

注意文件名(检查版本号)。

例如:scipy-0.18.1-cp35-cp35m-win_amd64.whl

要检查您的点子支持哪个版本,请转到下面的第2点。

如果您正在使用.whl文件。可能会发生以下错误。

  1. 您正在使用pip版本7.1.0,但是版本8.1.2可用。

您应该考虑通过 'python -m pip install --upgrade pip' 命令进行升级

  1. 在此平台上不支持scipy-0.15.1-cp33-none-win_amd64.whl.whl

对于上述错误:启动Python并输入:

import pip
print(pip.pep425tags.get_supported())

输出:

[(’cp35’,’cp35m’,’win_amd64’),(’cp35’,’none’,’win_amd64’),(’py3’,’none’,’win_amd64’),(’cp35’,’none ‘,’any’),(’cp3’,’none’,’any’),(’py35’,’none’,’any’),(’py3’,’none’,’any’),( ‘py34’,’none’,’any’),(’py33’,’none’,’any’),(’py32’,’none’,’any’),(’py31’,’none’, ‘any’),(’py30’,’none’,’any’)]

在输出中,您会看到cp35在那里,因此为numpy和scipy下载cp35,欢迎进一步编辑。

If you are totally new to python read step by step or go directly to last step. Follow the below method to install scipy 0.18.1 on Windows 64-bit , Python 64-bit . If below command is not working then proceed further

pip install scipy

Be careful with the versions of

  1. Python

  2. Windows

  3. .whl version of numpy and scipy files

  4. First install numpy and scipy.

    pip install FileName.whl
    
  5. For Numpy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy For Scipy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Be aware of the file name (check the version number).

Ex :scipy-0.18.1-cp35-cp35m-win_amd64.whl

To check which version is supported by your pip, go to point No 2 below.

If you are using .whl file . Following errors are likely to occur .

  1. You are using pip version 7.1.0, however version 8.1.2 is available.

You should consider upgrading via the 'python -m pip install --upgrade pip' command

  1. scipy-0.15.1-cp33-none-win_amd64.whl.whl is not supported wheel on this platform

For the above error: start Python and type :

import pip
print(pip.pep425tags.get_supported())

Output:

[(‘cp35’, ‘cp35m’, ‘win_amd64’), (‘cp35’, ‘none’, ‘win_amd64’), (‘py3’, ‘none’, ‘win_amd64’), (‘cp35’, ‘none’, ‘any’), (‘cp3’, ‘none’, ‘any’), (‘py35’, ‘none’, ‘any’), (‘py3’, ‘none’, ‘any’), (‘py34’, ‘none’, ‘any’), (‘py33’, ‘none’, ‘any’), (‘py32’, ‘none’, ‘any’), (‘py31’, ‘none’, ‘any’), (‘py30’, ‘none’, ‘any’)]

In the output you will observe cp35 is there, so download cp35 for numpy as well as scipy. Further edits are most welcome.


回答 6

对于Windows 10

C:\目录> pip install scipy-0.19.0rc2-cp35-cp35m-win_amd64.whl

For Windows 10

C:\directory> pip install scipy-0.19.0rc2-cp35-cp35m-win_amd64.whl


回答 7

与其费力地下载特定的软件包,我更喜欢使用 Conda 这条更快的路。pip 有它自己的问题。

  • Python -v(3.6.0)
  • Windows 10(64位)

Conda,从以下位置安装 conda:https://conda.io/docs/install/quick.html#windows-miniconda-install

命令提示符

C:\Users\xyz>conda install -c anaconda scipy=0.18.1
Fetching package metadata .............
Solving package specifications:

在环境 C:\Users\xyz\Miniconda3 中安装的软件包计划:

将安装以下新软件包:

mkl:       2017.0.1-0         anaconda
numpy:     1.12.0-py36_0      anaconda
scipy:     0.18.1-np112py36_1 anaconda

以下软件包将被更高优先级的 channel 所取代:

conda:     4.3.11-py36_0               --> 4.3.11-py36_0 anaconda
conda-env: 2.6.0-0                     --> 2.6.0-0       anaconda

是否继续([y]/n)?y

conda-env-2.6. 100% |###############################| Time: 0:00:00  32.92 kB/s
mkl-2017.0.1-0 100% |###############################| Time: 0:00:24   5.45 MB/s
numpy-1.12.0-p 100% |###############################| Time: 0:00:00   5.09 MB/s
scipy-0.18.1-n 100% |###############################| Time: 0:00:02   5.59 MB/s
conda-4.3.11-p 100% |###############################| Time: 0:00:00   4.70 MB/s

Rather than going the harder route of downloading specific packages. I prefer to go the faster route of using Conda. pip has its issues.

  • Python -v (3.6.0)
  • Windows 10 (64 bit)

Conda , install conda from : https://conda.io/docs/install/quick.html#windows-miniconda-install

command prompt

C:\Users\xyz>conda install -c anaconda scipy=0.18.1
Fetching package metadata .............
Solving package specifications:

Package plan for installation in environment C:\Users\xyz\Miniconda3:

The following NEW packages will be INSTALLED:

mkl:       2017.0.1-0         anaconda
numpy:     1.12.0-py36_0      anaconda
scipy:     0.18.1-np112py36_1 anaconda

The following packages will be SUPERCEDED by a higher-priority channel:

conda:     4.3.11-py36_0               --> 4.3.11-py36_0 anaconda
conda-env: 2.6.0-0                     --> 2.6.0-0       anaconda

Proceed ([y]/n)? y

conda-env-2.6. 100% |###############################| Time: 0:00:00  32.92 kB/s
mkl-2017.0.1-0 100% |###############################| Time: 0:00:24   5.45 MB/s
numpy-1.12.0-p 100% |###############################| Time: 0:00:00   5.09 MB/s
scipy-0.18.1-n 100% |###############################| Time: 0:00:02   5.59 MB/s
conda-4.3.11-p 100% |###############################| Time: 0:00:00   4.70 MB/s

回答 8

  1. 从 http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy 下载 SciPy
  2. 进入下载文件所在的目录,并用 pip install 安装该文件。
  3. 转到python shell,运行import scipy;它对我没有任何错误。
  1. Download SciPy from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. Go into the directory the downloaded file is in and pip install the file.
  3. Go to python shell, run import scipy; it worked for me with no errors.

回答 9

这是 pip 之外的替代方法。使用 pip 安装 scipy 时我也遇到了相同的错误。

然后我下载并安装了MiniConda。然后,我使用以下命令安装pytables。

conda install -c conda-forge scipy

请参考以下屏幕截图。

This is an alternative to pip. I also had the same error when installing scipy with pip.

Then I downloaded and installed MiniConda. And then I used the below command to install pytables.

conda install -c conda-forge scipy

Please refer to the screenshot below.


回答 10

我可以建议的最好方法是

  1. 从此位置下载wheel文件以获取您的python版本

  2. 将文件移动到主驱动器,例如C:>

  3. 运行Cmd并输入以下内容

    • pip install scipy-1.0.0rc1-cp36-none-win_amd64.whl

请注意,这是我在 python 3.6.2 上使用的版本,应该可以正常安装

之后您可能需要运行以下命令,以确保您所有的 python 附加组件都是最新的:

pip list --outdated

The best method I could suggest is this:

  1. Download the wheel file from this location for your version of python

  2. Move the file to your Main Drive eg C:>

  3. Run Cmd and enter the following

    • pip install scipy-1.0.0rc1-cp36-none-win_amd64.whl

Please note this is the version I am using for my Python 3.6.2; it should install fine.

You may want to run this command afterwards to make sure all your python add-ons are up to date:

pip list --outdated

回答 11

或者,从 http://www.lfd.uci.edu/~gohlke/pythonlibs 手动下载并安装适合您的 Scipy 版本。考虑您的 Python 版本(python --version)和系统架构(32/64 位),并相应地选择 Scipy 版本:scipy-0.18.1-cp27-cp27m-win32 用于 Python 2.7 32 位;scipy-0.18.1-cp27-cp27m-win_amd64 用于 Python 2.7 64 位。否则安装时会弹出错误 scipy-0.15.1-cp33-none-win_amd64.whl.whl is not supported wheel on this platform。

现在将目录切换到下载文件所在位置,并执行命令 pip install scipy-0.15.1-cp33-none-win_amd64.whl.whl(适当更改文件名)。

我添加此答案的唯一原因是Arun的答案(对我自己有用)没有提及我所遇到的有关32/64位匹配的任何内容。

Alternatively, manually download and execute the http://www.lfd.uci.edu/~gohlke/pythonlibs Scipy version suitable for you. Consider your Python version (python --version) and your system architecture (32/64 bit). Choose the Scipy version accordingly: scipy-0.18.1-cp27-cp27m-win32 for Python 2.7 32 bit; scipy-0.18.1-cp27-cp27m-win_amd64 for Python 2.7 64 bit. Otherwise the error "scipy-0.15.1-cp33-none-win_amd64.whl.whl is not supported wheel on this platform" will pop up on installation.

Now change directory to the downloaded file and execute command pip install scipy-0.15.1-cp33-none-win_amd64.whl.whl (change file name appropriately)

I have added this answer only because Arun's answer (which I found useful) did not mention anything about the 32/64-bit matching issue which I faced.


回答 12

如果您使用的是CentOS,则需要按以下方式安装lapack-devel:

 $ yum install lapack-devel

If you are using CentOS you need to install lapack-devel like so:

 $ yum install lapack-devel

回答 13

尝试从下面的链接下载scipy文件

https://sourceforge.net/projects/scipy/?source=typ_redirect

这将是一个.exe文件,您只需要运行它即可。但请确保选择与您的python版本相对应的scipy版本。

运行scipy.exe文件时,它将找到python目录并进行安装。

Try downloading the scipy file from the below link

https://sourceforge.net/projects/scipy/?source=typ_redirect

It will be a .exe file and you just need to run it. But be sure to choose the scipy version corresponding to your python version.

When the scipy.exe file is run, it will locate the python directory and scipy will be installed.


回答 14

使用 wheel 文件进行安装:从 http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy 下载,然后安装:

pip install c:\jjjj\ggg\fdadf.whl

Use the wheel file to install: download it from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy, then install with:

pip install c:\jjjj\ggg\fdadf.whl

回答 15

我遇到了同样的问题,并且成功使用了sudo

$ sudo pip install scipy

I was having the same issue, and I succeeded using sudo:

$ sudo pip install scipy

回答 16

最简单的方法是执行以下步骤:为 python [2.n < python < 3.n] 修复 scipy。

从以下位置下载必要的文件:http : //www.lfd.uci.edu/~gohlke/pythonlibs/

下载numpy + mkl的版本(需要运行scipy),然后为您的python类型(2.n python编写为2n)或(3.n python编写为3n)下载scipy,n是一个变量。请注意,您必须知道您拥有32位还是64位处理器。

在计算机上的某个位置创建一个目录,例如 [C:\DIRECTORY],用于存放文件 numpy+mkl.whl 和 scipy.whl

下载完两个文件后,在计算机上找到文件的位置,然后将其移动到您创建的目录中。

示例:首先需要安装文件才能安装scipy

C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

示例:第二个文件安装在

C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

转到命令提示符并针对python 2.n版本继续以下示例:

py -2.n -m pip install C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

应该安装

py -2.n -m pip install C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

应该安装

如下测试python IDLE上的两个模块:

import numpy

import scipy

如果没有错误返回,则模块正在工作。

IFDAAS

The easiest way is in the following steps: Fixing scipy for python [ 2.n < python < 3.n ]

Download the necessary files from: http://www.lfd.uci.edu/~gohlke/pythonlibs/

Download the version of numpy+mkl (needed to run scipy) and then download scipy for your python type (2.n python written as 2n) or (3.n python written as 3n), n is a variable. Note you must know whether you have a 32bit or 64bit processor.

Create a directory somewhere on your computer, for example [C:\DIRECTORY], to hold the files numpy+mkl.whl and scipy.whl

Once both file are downloaded, find the location of the file on your computer and move it to the directory you created.

Example: First file installation is needed for scipy is in

C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

Example: Second file installation is in

C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

Go to your command prompt and proceed the following example for a python version 2.n:

py -2.n -m pip install C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

should install

py -2.n -m pip install C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

should install

Test both modules on your python IDLE as following:

import numpy

import scipy

the modules are working if no errors are returned.

IFDAAS


回答 17

对于Windows(在我的情况下为7):

  1. http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy下载scipy-0.19.1-cp36-cp36m-win32.whl
  2. 创建一个带有内容的some.bat文件

    @echo off
    C:\Python36\python.exe -m pip -V
    C:\Python36\python.exe -m pip install scipy-0.19.1-cp36-cp36m-win32.whl
    C:\Python36\python.exe -m pip list
    pause

  3. 然后运行此批处理文件some.bat

  4. 调用 python shell "C:\Python36\pythonw.exe" "C:\Python36\Lib\idlelib\idle.pyw",并用以下语句测试 scipy 是否已安装:

import scipy

For windows(7 in my case):

  1. download scipy-0.19.1-cp36-cp36m-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. create one some.bat file with content

    @echo off
    C:\Python36\python.exe -m pip -V
    C:\Python36\python.exe -m pip install scipy-0.19.1-cp36-cp36m-win32.whl
    C:\Python36\python.exe -m pip list
    pause

  3. then run this batch file some.bat

  4. call the python shell "C:\Python36\pythonw.exe" "C:\Python36\Lib\idlelib\idle.pyw" and test if scipy was installed with

import scipy


回答 18

在 Windows 10 上 100% 成功安装 scipy 的简单方法是:只需执行 ====> pip install scipy==1.0.0rc2

晚点再谢我 :)

The easy way to install scipy on Windows 10 100% is this: Just pip this ====> pip install scipy==1.0.0rc2

Thank me later :)


回答 19

我在Python 3.7(3.7.0b4)中遇到了类似的问题。这是由于有关某些编码假设的某些更改(Python 3.6 >> Python 3.7)

结果,许多软件包安装(例如通过pip)失败。

I experienced similar issues with Python 3.7 (3.7.0b4). This was due to some changes regarding some encoding assumptions (Python 3.6 >> Python 3.7)

As a result lots of package installations (e.g. via pip) failed.


回答 20

您可以测试以下答案:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

You can test this answer:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

Python中“(1,)== 1”是什么意思?

问题:Python中“(1,)== 1”是什么意思?

我正在测试元组结构，当我像这样使用==运算符时发现很奇怪：

>>>  (1,) == 1,
Out: (False,)

当我将这两个表达式分配给变量时,结果为true:

>>> a = (1,)
>>> b = 1,
>>> a==b
Out: True

在我看来，这个问题与《Python元组尾随逗号语法规则》那个问题不同。我问的是==运算符两侧表达式如何分组。

I’m testing the tuple structure, and I found it’s strange when I use the == operator like:

>>>  (1,) == 1,
Out: (False,)

When I assign these two expressions to a variable, the result is true:

>>> a = (1,)
>>> b = 1,
>>> a==b
Out: True

This question is different from Python tuple trailing comma syntax rule, in my view. I am asking about how the expressions around the == operator are grouped.


回答 0

其他答案已经向您展示了该行为是由运算符优先级导致的，如这里的文档所述。

我要向您展示的是，下次遇到类似问题时如何自己找到答案。您可以使用ast模块来解构表达式的解析过程：

>>> import ast
>>> source_code = '(1,) == 1,'
>>> print(ast.dump(ast.parse(source_code), annotate_fields=False))
Module([Expr(Tuple([Compare(Tuple([Num(1)], Load()), [Eq()], [Num(1)])], Load()))])

从中我们可以看到代码已按照Tim Peters的解释进行了解析:

Module([Expr(
    Tuple([
        Compare(
            Tuple([Num(1)], Load()), 
            [Eq()], 
            [Num(1)]
        )
    ], Load())
)])

Other answers have already shown you that the behaviour is due to operator precedence, as documented here.

I’m going to show you how to find the answer yourself next time you have a question similar to this. You can deconstruct how the expression parses using the ast module:

>>> import ast
>>> source_code = '(1,) == 1,'
>>> print(ast.dump(ast.parse(source_code), annotate_fields=False))
Module([Expr(Tuple([Compare(Tuple([Num(1)], Load()), [Eq()], [Num(1)])], Load()))])

From this we can see that the code gets parsed as Tim Peters explained:

Module([Expr(
    Tuple([
        Compare(
            Tuple([Num(1)], Load()), 
            [Eq()], 
            [Num(1)]
        )
    ], Load())
)])

回答 1

这只是运算符的优先级。你的第一个

(1,) == 1,

分组方式如下：

((1,) == 1),

因此，它用“将单元素元组1,与整数1进行相等比较”的结果构建了一个单元素元组。两者并不相等，所以最终得到的是单元素元组(False,)。

This is just operator precedence. Your first

(1,) == 1,

groups like so:

((1,) == 1),

so builds a tuple with a single element from the result of comparing the one-element tuple (1,) to the integer 1 for equality. They're not equal, so you get the 1-tuple (False,) as a result.
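
To see the grouping concretely, here is a minimal sketch (the variable names are illustrative, not from the answer):

# Explicit grouping reproduces the surprising result:
a = ((1,) == 1),      # a 1-tuple containing the comparison result
print(a)              # (False,)

# Comparing two 1-tuples is what was probably intended:
b = (1,) == (1,)
print(b)              # True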


回答 2

当你做

>>> (1,) == 1,

它将元组(1,)与一个整数进行比较得到False，并由此构建出一个元组。

相反,当您分配变量时,两个相等的元组会相互比较。

你可以试试:

>>> x = 1,
>>> x
(1,)

When you do

>>> (1,) == 1,

it builds a tuple with the result from comparing the tuple (1,) with an integer and thus returning False.

Instead when you assign to variables, the two equal tuples are compared with each other.

You can try:

>>> x = 1,
>>> x
(1,)

遍历字符串

问题:遍历字符串

我有这样定义的多行字符串:

foo = """
this is 
a multi-line string.
"""

我们将这个字符串用作我正在编写的解析器的测试输入。解析器函数接收一个file-object作为输入并对其进行迭代。它还会直接调用next()方法来跳过行，因此我确实需要一个迭代器作为输入，而不是一个可迭代对象。我需要一个迭代器，它能像file-object遍历文本文件的行那样遍历该字符串的各行。我当然可以这样做：

lineiterator = iter(foo.splitlines())

是否有更直接的方法？在这种情况下，字符串必须先被遍历一次来完成拆分，然后再被解析器遍历一次。在我的测试用例中这无关紧要，因为那里的字符串很短；我只是出于好奇才问。Python为这类需求提供了很多有用且高效的内置功能，但我找不到适合这个需求的。

I have a multi-line string defined like this:

foo = """
this is 
a multi-line string.
"""

We use this string as test input for a parser I am writing. The parser function receives a file-object as input and iterates over it. It also calls the next() method directly to skip lines, so I really need an iterator as input, not an iterable. I need an iterator that iterates over the individual lines of that string like a file-object would over the lines of a text file. I could of course do it like this:

lineiterator = iter(foo.splitlines())

Is there a more direct way of doing this? In this scenario the string has to be traversed once for the splitting, and then again by the parser. It doesn't matter in my test case, since the string is very short there; I am just asking out of curiosity. Python has so many useful and efficient built-ins for such stuff, but I could find nothing that suits this need.


回答 0

这是三种可能性:

foo = """
this is 
a multi-line string.
"""

def f1(foo=foo): return iter(foo.splitlines())

def f2(foo=foo):
    retval = ''
    for char in foo:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

def f3(foo=foo):
    prevnl = -1
    while True:
      nextnl = foo.find('\n', prevnl + 1)
      if nextnl < 0: break
      yield foo[prevnl + 1:nextnl]
      prevnl = nextnl

if __name__ == '__main__':
  for f in f1, f2, f3:
    print list(f())

将其作为主脚本运行，可确认这三个函数是等效的。使用timeit测量（并将foo乘以100，以获得足够长的字符串来进行更精确的测量）：

$ python -mtimeit -s'import asp' 'list(asp.f3())'
1000 loops, best of 3: 370 usec per loop
$ python -mtimeit -s'import asp' 'list(asp.f2())'
1000 loops, best of 3: 1.36 msec per loop
$ python -mtimeit -s'import asp' 'list(asp.f1())'
10000 loops, best of 3: 61.5 usec per loop

注意,我们需要list()调用以确保遍历迭代器,而不仅仅是构建迭代器。

换句话说，朴素的实现快得多，简直令人难以置信：比我用find调用的尝试快6倍，而后者又比更底层的方法快4倍。

经验教训：测量永远是好事（但必须准确）；像splitlines这样的字符串方法是以非常快的方式实现的；通过非常底层的编程方式（尤其是用+=循环拼接非常小的片段）来组装字符串可能会相当慢。

编辑：添加了@Jacob的提案，并稍作修改以使其与其他方案给出相同的结果（保留行尾的空白），即：

from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip('\n')
        else:
            raise StopIteration

测量得出:

$ python -mtimeit -s'import asp' 'list(asp.f4())'
1000 loops, best of 3: 406 usec per loop

不如基于.find的方法快，但仍然值得牢记，因为它可能更不容易出现细小的差一（off-by-one）错误（任何出现+1和-1的循环，比如上面我的f3，都应该自动引起对差一错误的怀疑；许多缺少这类调整、而本应有它们的循环也是如此。不过我相信我的代码也是正确的，因为我能够用其他函数核对它的输出）。

但是基于拆分的方法仍然占主导地位。

顺便说一句：f4可能更好的写法是：

from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl == '': break
        yield nl.strip('\n')

至少，它不那么冗长。不幸的是，需要去掉行尾的\n，这使得无法用更清晰、更快的return iter(stri)来替换while循环（其中iter部分在现代版本的Python中是多余的，我相信从2.3或2.4起就是如此，但它也是无害的）。也许也值得一试：

    return itertools.imap(lambda s: s.strip('\n'), stri)

或其变体。但我就到此为止，因为相对于基于strip的那个最简单、最快的方案，这基本上只是一个理论练习。

Here are three possibilities:

foo = """
this is 
a multi-line string.
"""

def f1(foo=foo): return iter(foo.splitlines())

def f2(foo=foo):
    retval = ''
    for char in foo:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

def f3(foo=foo):
    prevnl = -1
    while True:
      nextnl = foo.find('\n', prevnl + 1)
      if nextnl < 0: break
      yield foo[prevnl + 1:nextnl]
      prevnl = nextnl

if __name__ == '__main__':
  for f in f1, f2, f3:
    print list(f())

Running this as the main script confirms the three functions are equivalent. With timeit (and a * 100 for foo to get substantial strings for more precise measurement):

$ python -mtimeit -s'import asp' 'list(asp.f3())'
1000 loops, best of 3: 370 usec per loop
$ python -mtimeit -s'import asp' 'list(asp.f2())'
1000 loops, best of 3: 1.36 msec per loop
$ python -mtimeit -s'import asp' 'list(asp.f1())'
10000 loops, best of 3: 61.5 usec per loop

Note we need the list() call to ensure the iterators are traversed, not just built.

IOW, the naive implementation is so much faster it isn’t even funny: 6 times faster than my attempt with find calls, which in turn is 4 times faster than a lower-level approach.

Lessons to retain: measurement is always a good thing (but must be accurate); string methods like splitlines are implemented in very fast ways; putting strings together by programming at a very low level (esp. by loops of += of very small pieces) can be quite slow.

Edit: added @Jacob’s proposal, slightly modified to give the same results as the others (trailing blanks on a line are kept), i.e.:

from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip('\n')
        else:
            raise StopIteration

Measuring gives:

$ python -mtimeit -s'import asp' 'list(asp.f4())'
1000 loops, best of 3: 406 usec per loop

not quite as good as the .find based approach — still, worth keeping in mind because it might be less prone to small off-by-one bugs (any loop where you see occurrences of +1 and -1, like my f3 above, should automatically trigger off-by-one suspicions — and so should many loops which lack such tweaks and should have them — though I believe my code is also right since I was able to check its output with other functions’).

But the split-based approach still rules.

An aside: possibly better style for f4 would be:

from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl == '': break
        yield nl.strip('\n')

at least, it’s a bit less verbose. The need to strip trailing \ns unfortunately prohibits the clearer and faster replacement of the while loop with return iter(stri) (the iter part whereof is redundant in modern versions of Python, I believe since 2.3 or 2.4, but it’s also innocuous). Maybe worth trying, also:

    return itertools.imap(lambda s: s.strip('\n'), stri)

or variations thereof — but I’m stopping here since it’s pretty much a theoretical exercise wrt the strip based, simplest and fastest, one.


回答 1

我不确定您的意思是“然后再由解析器”。拆分完成后,将不再遍历字符串,而仅遍历拆分字符串列表。只要您的字符串的大小不是绝对很大,这实际上可能是最快的方法。python使用不可变字符串的事实意味着您必须始终创建一个新字符串,因此无论如何都必须这样做。

如果字符串很大,则不利之处在于内存使用情况:您将同时在内存中拥有原始字符串和拆分字符串列表,从而使所需的内存增加了一倍。迭代器方法可以节省您的开销,可以根据需要构建字符串,尽管它仍然要付出“分割”的代价。但是,如果您的字符串太大,则通常甚至要避免将未拆分的字符串存储在内存中。最好只从文件中读取字符串,该文件已经允许您以行形式遍历该字符串。

但是,如果您确实已经在内存中存储了一个巨大的字符串,则一种方法是使用StringIO,它为字符串提供了一个类似于文件的接口,包括允许逐行迭代(内部使用.find查找下一个换行符)。您将得到:

import StringIO
s = StringIO.StringIO(myString)
for line in s:
    do_something_with(line)

I’m not sure what you mean by “then again by the parser”. After the splitting has been done, there’s no further traversal of the string, only a traversal of the list of split strings. This will probably actually be the fastest way to accomplish this, so long as the size of your string isn’t absolutely huge. The fact that python uses immutable strings means that you must always create a new string, so this has to be done at some point anyway.

If your string is very large, the disadvantage is in memory usage: you’ll have the original string and a list of split strings in memory at the same time, doubling the memory required. An iterator approach can save you this, building a string as needed, though it still pays the “splitting” penalty. However, if your string is that large, you generally want to avoid even the unsplit string being in memory. It would be better just to read the string from a file, which already allows you to iterate through it as lines.

However if you do have a huge string in memory already, one approach would be to use StringIO, which presents a file-like interface to a string, including allowing iterating by line (internally using .find to find the next newline). You then get:

import StringIO
s = StringIO.StringIO(myString)
for line in s:
    do_something_with(line)

回答 2

如果我没有看错Modules/cStringIO.c,这应该是非常有效的(尽管有些冗长):

from cStringIO import StringIO

def iterbuf(buf):
    stri = StringIO(buf)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip()
        else:
            raise StopIteration

If I read Modules/cStringIO.c correctly, this should be quite efficient (although somewhat verbose):

from cStringIO import StringIO

def iterbuf(buf):
    stri = StringIO(buf)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip()
        else:
            raise StopIteration

回答 3

基于正则表达式的搜索有时比生成器方法要快:

RRR = re.compile(r'(.*)\n')
def f4(arg):
    return (i.group(1) for i in RRR.finditer(arg))

Regex-based searching is sometimes faster than generator approach:

RRR = re.compile(r'(.*)\n')
def f4(arg):
    return (i.group(1) for i in RRR.finditer(arg))

回答 4

我想你可以自己动手:

def parse(string):
    retval = ''
    for char in string:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

我不确定此实现的效率如何,但这只会在您的字符串上迭代一次。

嗯,生成器。

编辑:

当然,您还想添加想要执行的任何类型的解析操作,但这很简单。

I suppose you could roll your own:

def parse(string):
    retval = ''
    for char in string:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

I’m not sure how efficient this implementation is, but that will only iterate over your string once.

Mmm, generators.

Edit:

Of course you’ll also want to add in whatever type of parsing actions you want to take, but that’s pretty simple.


回答 5

您可以遍历“一个文件”，它会产生包括尾随换行符在内的各行。要用字符串制作一个“虚拟文件”，可以使用StringIO：

import io  # for Py2.7 that would be import cStringIO as io

for line in io.StringIO(foo):
    print(repr(line))

You can iterate over “a file”, which produces lines, including the trailing newline character. To make a “virtual file” out of a string, you can use StringIO:

import io  # for Py2.7 that would be import cStringIO as io

for line in io.StringIO(foo):
    print(repr(line))
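
As a supplement, here is a minimal Python 3 sketch (the string value is an illustrative assumption) showing that this also covers the question's requirement of calling next() directly to skip lines, just like a real file object:

import io

foo = "header\nline 1\nline 2\n"

lines = io.StringIO(foo)
next(lines)                # skip the header line, as the parser in the question does
for line in lines:
    print(repr(line))      # each line keeps its trailing '\n', like a text file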

使用python list comprehension根据条件查找元素的索引

问题:使用python list comprehension根据条件查找元素的索引

当来自Matlab背景时,以下Python代码似乎很冗长

>>> a = [1, 2, 3, 1, 2, 3]
>>> [index for index,value in enumerate(a) if value > 2]
[2, 5]

在Matlab中,我可以写:

>> a = [1, 2, 3, 1, 2, 3];
>> find(a>2)
ans =
     3     6

是否有使用Python编写此代码的简便方法,还是只保留长版本?


感谢您对Python语法原理的所有建议和解释。

在numpy网站上找到以下内容后,我想我已经找到了喜欢的解决方案:

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays

将来自该网站的信息应用于上述我的问题,将得到以下结果:

>>> from numpy import array
>>> a = array([1, 2, 3, 1, 2, 3])
>>> b = a>2 
array([False, False, True, False, False, True], dtype=bool)
>>> r = array(range(len(b)))
>>> r[b]
array([2, 5])

然后,下面的内容应该可以工作(但是我手头没有Python解释器来对其进行测试):

class my_array(numpy.array):
    def find(self, b):
        r = array(range(len(b)))
        return r[b]


>>> a = my_array([1, 2, 3, 1, 2, 3])
>>> a.find(a>2)
[2, 5]

The following Python code appears to be very long winded when coming from a Matlab background

>>> a = [1, 2, 3, 1, 2, 3]
>>> [index for index,value in enumerate(a) if value > 2]
[2, 5]

When in Matlab I can write:

>> a = [1, 2, 3, 1, 2, 3];
>> find(a>2)
ans =
     3     6

Is there a short hand method of writing this in Python, or do I just stick with the long version?


Thank you for all the suggestions and explanation of the rationale for Python’s syntax.

After finding the following on the numpy website, I think I have found a solution I like:

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays

Applying the information from that website to my problem above, would give the following:

>>> from numpy import array
>>> a = array([1, 2, 3, 1, 2, 3])
>>> b = a>2 
array([False, False, True, False, False, True], dtype=bool)
>>> r = array(range(len(b)))
>>> r[b]
array([2, 5])

The following should then work (but I haven’t got a Python interpreter on hand to test it):

class my_array(numpy.array):
    def find(self, b):
        r = array(range(len(b)))
        return r[b]


>>> a = my_array([1, 2, 3, 1, 2, 3])
>>> a.find(a>2)
[2, 5]

回答 0

  • 在Python中，您根本不会为此使用索引，而只是处理值：[value for value in a if value > 2]。通常，需要处理索引就意味着您没有采用最佳做法。

  • 如果确实需要类似于Matlab的API,则可以使用numpy,这是Python中用于多维数组和数值数学的软件包,受Matlab的启发很大。您将使用numpy数组而不是列表。

    >>> import numpy
    >>> a = numpy.array([1, 2, 3, 1, 2, 3])
    >>> a
    array([1, 2, 3, 1, 2, 3])
    >>> numpy.where(a > 2)
    (array([2, 5]),)
    >>> a > 2
    array([False, False,  True, False, False,  True], dtype=bool)
    >>> a[numpy.where(a > 2)]
    array([3, 3])
    >>> a[a > 2]
    array([3, 3])
  • In Python, you wouldn’t use indexes for this at all, but just deal with the values—[value for value in a if value > 2]. Usually dealing with indexes means you’re not doing something the best way.

  • If you do need an API similar to Matlab’s, you would use numpy, a package for multidimensional arrays and numerical math in Python which is heavily inspired by Matlab. You would be using a numpy array instead of a list.

    >>> import numpy
    >>> a = numpy.array([1, 2, 3, 1, 2, 3])
    >>> a
    array([1, 2, 3, 1, 2, 3])
    >>> numpy.where(a > 2)
    (array([2, 5]),)
    >>> a > 2
    array([False, False,  True, False, False,  True], dtype=bool)
    >>> a[numpy.where(a > 2)]
    array([3, 3])
    >>> a[a > 2]
    array([3, 3])
    

回答 1

其他方式:

>>> [i for i in range(len(a)) if a[i] > 2]
[2, 5]

通常，请记住：虽然find是一个现成的函数，但列表推导是一种通用因而非常强大的解决方案。没有什么能阻止您用Python自己编写一个find函数，并在以后按需使用它。即：

>>> def find_indices(lst, condition):
...   return [i for i, elem in enumerate(lst) if condition(elem)]
... 
>>> find_indices(a, lambda e: e > 2)
[2, 5]

请注意,我在这里使用列表来模仿Matlab。使用生成器和迭代器会更Pythonic。

Another way:

>>> [i for i in range(len(a)) if a[i] > 2]
[2, 5]

In general, remember that while find is a ready-cooked function, list comprehensions are a general, and thus very powerful solution. Nothing prevents you from writing a find function in Python and use it later as you wish. I.e.:

>>> def find_indices(lst, condition):
...   return [i for i, elem in enumerate(lst) if condition(elem)]
... 
>>> find_indices(a, lambda e: e > 2)
[2, 5]

Note that I’m using lists here to mimic Matlab. It would be more Pythonic to use generators and iterators.
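
For instance, a hedged generator variant of find_indices (the name ifind_indices is my own, not from the answer):

def ifind_indices(lst, condition):
    # lazy variant: yields matching indices one at a time instead of building a list
    return (i for i, elem in enumerate(lst) if condition(elem))

idxs = ifind_indices([1, 2, 3, 1, 2, 3], lambda e: e > 2)
print(next(idxs))   # 2, consume one index at a time
print(list(idxs))   # [5], the remaining matches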


回答 2

对我来说,效果很好:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 1, 2, 3])
>>> np.where(a > 2)[0]
[2 5]

For me it works well:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 1, 2, 3])
>>> np.where(a > 2)[0]
[2 5]

回答 3

也许另一个问题是：“一旦获得这些索引，您打算用它们做什么？”如果要用它们创建另一个列表，那么在Python中它们是不必要的中间步骤。如果想要所有与给定条件匹配的值，只需使用内置的filter：

matchingVals = filter(lambda x : x>2, a)

或编写您自己的列表推导：

matchingVals = [x for x in a if x > 2]

如果要把它们从列表中删除，那么Pythonic的方式不一定是真的从列表中删除，而是像创建新列表一样编写列表推导，并通过在赋值左侧使用listvar[:]进行就地赋值：

a[:] = [x for x in a if x <= 2]

Matlab之所以提供find，是因为它以数组为中心的模型是通过数组索引来选择项目的。当然，您也可以在Python中这样做，但正如@EliBendersky已经提到的，更Pythonic的方式是使用迭代器和生成器。

Maybe another question is, “what are you going to do with those indices once you get them?” If you are going to use them to create another list, then in Python, they are an unnecessary middle step. If you want all the values that match a given condition, just use the builtin filter:

matchingVals = filter(lambda x : x>2, a)

Or write your own list comprehension:

matchingVals = [x for x in a if x > 2]

If you want to remove them from the list, then the Pythonic way is not to necessarily remove from the list, but write a list comprehension as if you were creating a new list, and assigning back in-place using the listvar[:] on the left-hand-side:

a[:] = [x for x in a if x <= 2]

Matlab supplies find because its array-centric model works by selecting items using their array indices. You can do this in Python, certainly, but the more Pythonic way is using iterators and generators, as already mentioned by @EliBendersky.


回答 4

即使回答得很晚：我认为这仍然是一个很好的问题，而且恕我直言，Python（在没有numpy之类的额外库或工具包的情况下）仍然缺乏一种方便的方法，来根据手动定义的过滤器获取列表元素的索引。

您可以手动定义一个提供该功能的函数：

def indices(list, filtr=lambda x: bool(x)):
    return [i for i,x in enumerate(list) if filtr(x)]

print(indices([1,0,3,5,1], lambda x: x==1))

输出：[0, 4]

在我的想象中，完美的方式是创建list的一个子类，并把indices添加为类方法。这样只需要提供过滤函数即可：

class MyList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
    def indices(self, filtr=lambda x: bool(x)):
        return [i for i,x in enumerate(self) if filtr(x)]

my_list = MyList([1,0,3,5,1])
my_list.indices(lambda x: x==1)

我在这里对该主题做了更详细的阐述：http://tinyurl.com/jajrr87

Even if it’s a late answer: I think this is still a very good question and IMHO Python (without additional libraries or toolkits like numpy) is still lacking a convenient method to access the indices of list elements according to a manually defined filter.

You could manually define a function, which provides that functionality:

def indices(list, filtr=lambda x: bool(x)):
    return [i for i,x in enumerate(list) if filtr(x)]

print(indices([1,0,3,5,1], lambda x: x==1))

Yields: [0, 4]

In my imagination the perfect way would be making a child class of list and adding the indices function as class method. In this way only the filter method would be needed:

class MyList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
    def indices(self, filtr=lambda x: bool(x)):
        return [i for i,x in enumerate(self) if filtr(x)]

my_list = MyList([1,0,3,5,1])
my_list.indices(lambda x: x==1)

I elaborated a bit more on that topic here: http://tinyurl.com/jajrr87


保存交互式Matplotlib图形

问题:保存交互式Matplotlib图形

有没有一种方法可以保存Matplotlib图形,以便可以重新打开它并恢复典型的交互作用?(就像MATLAB中的.fig格式一样?)

我发现自己为了生成这些交互式图形而多次运行同一脚本。或者，我会向同事发送多个静态PNG文件来展示绘图的不同方面。我宁愿发送图形对象，让他们自己与之交互。

Is there a way to save a Matplotlib figure such that it can be re-opened and have typical interaction restored? (Like the .fig format in MATLAB?)

I find myself running the same scripts many times to generate these interactive figures. Or I’m sending my colleagues multiple static PNG files to show different aspects of a plot. I’d rather send the figure object and have them interact with it themselves.


回答 0

这将是一个很棒的功能，但据我所知（AFAIK），Matplotlib并没有实现它，而且由于图形的存储方式，自己实现可能也很困难。

我建议：（a）将数据处理与图形生成分开（用唯一的名称保存数据），并编写一个图形生成脚本（加载保存数据的指定文件），再按需编辑；或（b）另存为PDF/SVG/PostScript格式，并在Adobe Illustrator（或Inkscape）之类的高级图形编辑器中编辑。

2012年秋季后的编辑：正如其他人在下面指出的（尽管在此提及，因为这是公认的答案），Matplotlib从1.2版开始允许您对图形进行pickle。如发行说明所述，这是一项实验性功能，不支持在一个matplotlib版本中保存图形并在另一个版本中打开。从不受信任的来源恢复pickle通常也是不安全的。

对于共享/以后编辑的绘图（需要先进行大量数据处理，并且可能需要在几个月后调整，比如在科学出版物的同行评审期间），我仍然建议这样的工作流程：（1）编写一个数据处理脚本，在生成绘图之前将处理后的数据（即进入绘图的数据）保存到文件中；（2）编写一个单独的绘图生成脚本（可按需调整）来重新创建绘图。这样，对于每个绘图，您都可以快速运行脚本并重新生成它（并用新数据快速复用您的绘图设置）。话虽如此，对图形进行pickle对于短期/交互式/探索性数据分析可能会很方便。

This would be a great feature, but AFAIK it isn’t implemented in Matplotlib and likely would be difficult to implement yourself due to the way figures are stored.

I'd suggest either (a) separating the data processing from generating the figure (saving the data with a unique name) and writing a figure-generating script (loading a specified file of the saved data) and editing as you see fit, or (b) saving as PDF/SVG/PostScript format and editing in some fancy figure editor like Adobe Illustrator (or Inkscape).

EDIT post Fall 2012: As others pointed out below (though mentioning here as this is the accepted answer), Matplotlib since version 1.2 allowed you to pickle figures. As the release notes state, it is an experimental feature and does not support saving a figure in one matplotlib version and opening in another. It’s also generally unsecure to restore a pickle from an untrusted source.

For sharing/later editing plots (that require significant data processing first and may need to be tweaked months later say during peer review for a scientific publication), I still recommend the workflow of (1) have a data processing script that before generating a plot saves the processed data (that goes into your plot) into a file, and (2) have a separate plot generation script (that you adjust as necessary) to recreate the plot. This way for each plot you can quickly run a script and re-generate it (and quickly copy over your plot settings with new data). That said, pickling a figure could be convenient for short term/interactive/exploratory data analysis.
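
A minimal sketch of the two-script workflow described above (the file names, the .npy format, and the sine data are illustrative assumptions, not part of the answer):

# process_data.py -- step (1): save the processed data that goes into the plot
import numpy as np

x = np.linspace(0, 10, 200)
y = np.sin(x)                   # stand-in for expensive data processing
np.save('processed_x.npy', x)
np.save('processed_y.npy', y)

# make_plot.py -- step (2): regenerate (and tweak) the figure at any time
import numpy as np
import matplotlib.pyplot as plt

x = np.load('processed_x.npy')
y = np.load('processed_y.npy')
plt.plot(x, y)
plt.xlabel('x')
plt.show()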


回答 1

我刚刚发现了如何做到这一点。@pelson提到的“实验性pickle支持”效果很好。

试试这个:

# Plot something
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([1,2,3],[10,-10,30])

交互式调整后,将图形对象另存为二进制文件:

import pickle
pickle.dump(fig, open('FigureObject.fig.pickle', 'wb')) # This is for Python 3 - py2 may need `file` instead of `open`

稍后打开该图形时，您的调整会被保留，GUI交互性也依然存在：

import pickle
figx = pickle.load(open('FigureObject.fig.pickle', 'rb'))

figx.show() # Show the figure, edit it, etc.!

您甚至可以从图中提取数据:

data = figx.axes[0].lines[0].get_data()

（它适用于线条、pcolor和imshow；pcolormesh可以通过一些技巧来重建展平的数据。）

这个出色的技巧来自《Saving Matplotlib Figures Using Pickle》一文。

I just found out how to do this. The “experimental pickle support” mentioned by @pelson works quite well.

Try this:

# Plot something
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([1,2,3],[10,-10,30])

After your interactive tweaking, save the figure object as a binary file:

import pickle
pickle.dump(fig, open('FigureObject.fig.pickle', 'wb')) # This is for Python 3 - py2 may need `file` instead of `open`

Later, open the figure and the tweaks should be saved and GUI interactivity should be present:

import pickle
figx = pickle.load(open('FigureObject.fig.pickle', 'rb'))

figx.show() # Show the figure, edit it, etc.!

You can even extract the data from the plots:

data = figx.axes[0].lines[0].get_data()

(It works for lines, pcolor & imshow – pcolormesh works with some tricks to reconstruct the flattened data.)

I got the excellent tip from Saving Matplotlib Figures Using Pickle.


回答 2

从Matplotlib 1.2开始，我们提供了实验性的pickle支持。试一试，看看它是否适合您的情况。如果您遇到任何问题，请通过Matplotlib邮件列表或在github.com/matplotlib/matplotlib上开一个issue告知我们。

As of Matplotlib 1.2, we now have experimental pickle support. Give that a go and see if it works well for your case. If you have any issues, please let us know on the Matplotlib mailing list or by opening an issue on github.com/matplotlib/matplotlib.


回答 3

为什么不发送Python脚本呢?MATLAB的.fig文件要求收件人具有MATLAB才能显示它们,因此,这等效于发送需要Matplotlib显示的Python脚本。

或者（免责声明：我还没有尝试过），您可以尝试对该图进行pickle：

import pickle
output = open('interactive figure.pickle', 'wb')
pickle.dump(gcf(), output)
output.close()

Why not just send the Python script? MATLAB’s .fig files require the recipient to have MATLAB to display them, so that’s about equivalent to sending a Python script that requires Matplotlib to display.

Alternatively (disclaimer: I haven’t tried this yet), you could try pickling the figure:

import pickle
output = open('interactive figure.pickle', 'wb')
pickle.dump(gcf(), output)
output.close()

回答 4

好问题。这是pylab.save的文档文本：

pylab不再提供save函数，尽管旧的pylab函数仍然可以作为matplotlib.mlab.save使用（您仍然可以在pylab中以“mlab.save”引用它）。但是，对于纯文本文件，我们建议使用numpy.savetxt。对于保存numpy数组，我们建议使用numpy.save及与之对应的numpy.load，它们在pylab中以np.save和np.load的形式提供。

Good question. Here is the doc text from pylab.save:

pylab no longer provides a save function, though the old pylab function is still available as matplotlib.mlab.save (you can still refer to it in pylab as “mlab.save”). However, for plain text files, we recommend numpy.savetxt. For saving numpy arrays, we recommend numpy.save, and its analog numpy.load, which are available in pylab as np.save and np.load.


回答 5

我想出了一种相对简单（但稍微有些不寻常）的方法来保存我的matplotlib图形。它是这样工作的：

import sys  # 下面的save_plot调用需要sys.argv
import libscript

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)

#<plot>
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.show()
#</plot>

save_plot(fileName='plot_01.py',obj=sys.argv[0],sel='plot',ctx=libscript.get_ctx(ctx_global=globals(),ctx_local=locals()))

其中save_plot函数定义如下（便于理解逻辑的简单版本）：

def save_plot(fileName='',obj=None,sel='',ctx={}):
    """
    Save of matplolib plot to a stand alone python script containing all the data and configuration instructions to regenerate the interactive matplotlib figure.

    Parameters
    ----------
    fileName : [string] Path of the python script file to be created.
    obj : [object] Function or python object containing the lines of code to create and configure the plot to be saved.
    sel : [string] Name of the tag enclosing the lines of code to create and configure the plot to be saved.
    ctx : [dict] Dictionary containing the execution context. Values for variables not defined in the lines of code for the plot will be fetched from the context.

    Returns
    -------
    Return ``'done'`` once the plot has been saved to a python script file. This file contains all the input data and configuration to re-create the original interactive matplotlib figure.
    """
    import os
    import libscript

    N_indent=4

    src=libscript.get_src(obj=obj,sel=sel)
    src=libscript.prepend_ctx(src=src,ctx=ctx,debug=False)
    src='\n'.join([' '*N_indent+line for line in src.split('\n')])

    if(os.path.isfile(fileName)): os.remove(fileName)
    with open(fileName,'w') as f:
        f.write('import sys\n')
        f.write('sys.dont_write_bytecode=True\n')
        f.write('def main():\n')
        f.write(src+'\n')

        f.write('if(__name__=="__main__"):\n')
        f.write(' '*N_indent+'main()\n')

    return 'done'

或者像下面这样定义save_plot函数（使用zip压缩以生成更小的图形文件的改进版本）：

def save_plot(fileName='',obj=None,sel='',ctx={}):

    import os
    import json
    import zlib
    import base64
    import libscript

    N_indent=4
    level=9#0 to 9, default: 6
    src=libscript.get_src(obj=obj,sel=sel)
    obj=libscript.load_obj(src=src,ctx=ctx,debug=False)
    bin=base64.b64encode(zlib.compress(json.dumps(obj),level))

    if(os.path.isfile(fileName)): os.remove(fileName)
    with open(fileName,'w') as f:
        f.write('import sys\n')
        f.write('sys.dont_write_bytecode=True\n')
        f.write('def main():\n')
        f.write(' '*N_indent+'import base64\n')
        f.write(' '*N_indent+'import zlib\n')
        f.write(' '*N_indent+'import json\n')
        f.write(' '*N_indent+'import libscript\n')
        f.write(' '*N_indent+'bin="'+str(bin)+'"\n')
        f.write(' '*N_indent+'obj=json.loads(zlib.decompress(base64.b64decode(bin)))\n')
        f.write(' '*N_indent+'libscript.exec_obj(obj=obj,tempfile=False)\n')

        f.write('if(__name__=="__main__"):\n')
        f.write(' '*N_indent+'main()\n')

    return 'done'

这使用了我自己的libscript模块，该模块主要依赖inspect和ast模块。如果有人感兴趣，我可以尝试在Github上分享它（首先需要做一些清理，并且我需要先开始使用Github）。

这个save_plot函数和libscript模块背后的思想是：获取创建图形的python指令（使用inspect模块），分析它们（使用ast模块）以提取其依赖的所有变量、函数和模块导入，从执行上下文中提取这些内容并将它们序列化为python指令（变量的代码类似t=[0.0,2.0,0.01]…，模块的代码类似import matplotlib.pyplot as plt…），前置在图形指令之前。生成的python指令被保存为一个python脚本，执行它就会重新构建原始的matplotlib图形。

可以想象,这对于大多数(如果不是全部)matplotlib图形都适用。

I figured out a relatively simple way (yet slightly unconventional) to save my matplotlib figures. It works like this:

import sys  # needed below for sys.argv
import libscript

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)

#<plot>
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.show()
#</plot>

save_plot(fileName='plot_01.py',obj=sys.argv[0],sel='plot',ctx=libscript.get_ctx(ctx_global=globals(),ctx_local=locals()))

with function save_plot defined like this (simple version to understand the logic):

def save_plot(fileName='',obj=None,sel='',ctx={}):
    """
    Save of matplolib plot to a stand alone python script containing all the data and configuration instructions to regenerate the interactive matplotlib figure.

    Parameters
    ----------
    fileName : [string] Path of the python script file to be created.
    obj : [object] Function or python object containing the lines of code to create and configure the plot to be saved.
    sel : [string] Name of the tag enclosing the lines of code to create and configure the plot to be saved.
    ctx : [dict] Dictionary containing the execution context. Values for variables not defined in the lines of code for the plot will be fetched from the context.

    Returns
    -------
    Return ``'done'`` once the plot has been saved to a python script file. This file contains all the input data and configuration to re-create the original interactive matplotlib figure.
    """
    import os
    import libscript

    N_indent=4

    src=libscript.get_src(obj=obj,sel=sel)
    src=libscript.prepend_ctx(src=src,ctx=ctx,debug=False)
    src='\n'.join([' '*N_indent+line for line in src.split('\n')])

    if(os.path.isfile(fileName)): os.remove(fileName)
    with open(fileName,'w') as f:
        f.write('import sys\n')
        f.write('sys.dont_write_bytecode=True\n')
        f.write('def main():\n')
        f.write(src+'\n')

        f.write('if(__name__=="__main__"):\n')
        f.write(' '*N_indent+'main()\n')

    return 'done'

or defining function save_plot like this (better version using zip compression to produce lighter figure files):

def save_plot(fileName='',obj=None,sel='',ctx={}):

    import os
    import json
    import zlib
    import base64
    import libscript

    N_indent=4
    level=9#0 to 9, default: 6
    src=libscript.get_src(obj=obj,sel=sel)
    obj=libscript.load_obj(src=src,ctx=ctx,debug=False)
    bin=base64.b64encode(zlib.compress(json.dumps(obj),level))

    if(os.path.isfile(fileName)): os.remove(fileName)
    with open(fileName,'w') as f:
        f.write('import sys\n')
        f.write('sys.dont_write_bytecode=True\n')
        f.write('def main():\n')
        f.write(' '*N_indent+'import base64\n')
        f.write(' '*N_indent+'import zlib\n')
        f.write(' '*N_indent+'import json\n')
        f.write(' '*N_indent+'import libscript\n')
        f.write(' '*N_indent+'bin="'+str(bin)+'"\n')
        f.write(' '*N_indent+'obj=json.loads(zlib.decompress(base64.b64decode(bin)))\n')
        f.write(' '*N_indent+'libscript.exec_obj(obj=obj,tempfile=False)\n')

        f.write('if(__name__=="__main__"):\n')
        f.write(' '*N_indent+'main()\n')

    return 'done'

This makes use of a module libscript of my own, which mostly relies on the modules inspect and ast. I can try to share it on Github if interest is expressed (it would first require some cleanup and me to get started with Github).

The idea behind this save_plot function and libscript module is to fetch the python instructions that create the figure (using module inspect), analyze them (using module ast) to extract all variables, functions and modules import it relies on, extract these from the execution context and serialize them as python instructions (code for variables will be like t=[0.0,2.0,0.01] … and code for modules will be like import matplotlib.pyplot as plt …) prepended to the figure instructions. The resulting python instructions are saved as a python script whose execution will re-build the original matplotlib figure.

As you can imagine, this works well for most (if not all) matplotlib figures.


virtualenvwrapper和Python 3

问题:virtualenvwrapper和Python 3

我在ubuntu lucid上安装了python 3.3.1并成功创建了virtualenv,如下所示

virtualenv envpy331 --python=/usr/local/bin/python3.3

这在我的主目录下创建了一个envpy331文件夹。

我也安装了virtualenvwrapper。但是文档中只支持python的2.4-2.7版本。有人尝试过用它管理python3的virtualenv吗？如果有，能告诉我怎么做吗？

I installed python 3.3.1 on ubuntu lucid and successfully created a virtualenv as below

virtualenv envpy331 --python=/usr/local/bin/python3.3

this created a folder envpy331 on my home dir.

I also have virtualenvwrapper installed. But in the docs only 2.4-2.7 versions of python are supported. Has anyone tried to organize the python3 virtualenv? If so, can you tell me how?


回答 0

virtualenvwrapper的最新版本是在Python 3.2下测试的。它很有可能也适用于Python 3.3。

The latest version of virtualenvwrapper is tested under Python3.2. Chances are good it will work with Python3.3 too.


回答 1

如果您已经安装了python3以及virtualenvwrapper,那么在虚拟环境中使用python3的唯一操作就是使用以下命令创建环境:

which python3 #Output: /usr/bin/python3
mkvirtualenv --python=/usr/bin/python3 nameOfEnvironment

或者,(至少在使用brew的OSX上):

mkvirtualenv --python=`which python3` nameOfEnvironment

开始使用该环境后，您会看到只要键入python，用的就是python3

If you already have python3 installed as well virtualenvwrapper the only thing you would need to do to use python3 with the virtual environment is creating an environment using:

which python3 #Output: /usr/bin/python3
mkvirtualenv --python=/usr/bin/python3 nameOfEnvironment

Or, (at least on OSX using brew):

mkvirtualenv --python=`which python3` nameOfEnvironment

Start using the environment and you’ll see that as soon as you type python you’ll start using python3


回答 2

您可以让virtualenvwrapper使用自定义的Python二进制文件，而不是运行virtualenvwrapper自身所用的那个。为此，您需要使用virtualenv所使用的VIRTUALENV_PYTHON变量：

$ export VIRTUALENV_PYTHON=/usr/bin/python3
$ mkvirtualenv -a myproject myenv
Running virtualenv with interpreter /usr/bin/python3
New python executable in myenv/bin/python3
Also creating executable in myenv/bin/python
(myenv)$ python
Python 3.2.3 (default, Oct 19 2012, 19:53:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

You can make virtualenvwrapper use a custom Python binary instead of the one virtualenvwrapper is run with. To do that you need to use VIRTUALENV_PYTHON variable which is utilized by virtualenv:

$ export VIRTUALENV_PYTHON=/usr/bin/python3
$ mkvirtualenv -a myproject myenv
Running virtualenv with interpreter /usr/bin/python3
New python executable in myenv/bin/python3
Also creating executable in myenv/bin/python
(myenv)$ python
Python 3.2.3 (default, Oct 19 2012, 19:53:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

回答 3

virtualenvwrapper现在允许您指定不带路径的python可执行文件。

因此(至少在OSX上)mkvirtualenv --python=python3 nameOfEnvironment就足够了。

virtualenvwrapper now lets you specify the python executable without the path.

So (on OSX at least)mkvirtualenv --python=python3 nameOfEnvironment will suffice.


回答 4

在Ubuntu上，使用mkvirtualenv -p python3 env_name即可创建使用python3的virtualenv。

在环境内部，使用python --version进行验证。

On Ubuntu; using mkvirtualenv -p python3 env_name loads the virtualenv with python3.

Inside the env, use python --version to verify.


回答 5

您可以将其添加到您的.bash_profile或类似文件中:

alias mkvirtualenv3='mkvirtualenv --python=`which python3`'

然后在要创建python 3环境时使用mkvirtualenv3代替mkvirtualenv

You can add this to your .bash_profile or similar:

alias mkvirtualenv3='mkvirtualenv --python=`which python3`'

Then use mkvirtualenv3 instead of mkvirtualenv when you want to create a python 3 environment.


回答 6

我发现在Ubuntu的命令行中运行

export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3

和

export VIRTUALENVWRAPPER_VIRTUALENV=/usr/bin/virtualenv-3.4

会强制mkvirtualenv使用python3和virtualenv-3.4。之后仍需执行

mkvirtualenv --python=/usr/bin/python3 nameOfEnvironment

来创建环境。这里假设您的python3位于/usr/bin/python3，virtualenv-3.4位于/usr/local/bin/virtualenv-3.4。

I find that running

export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3

and

export VIRTUALENVWRAPPER_VIRTUALENV=/usr/bin/virtualenv-3.4

in the command line on Ubuntu forces mkvirtualenv to use python3 and virtualenv-3.4. One still has to do

mkvirtualenv --python=/usr/bin/python3 nameOfEnvironment

to create the environment. This is assuming that you have python3 in /usr/bin/python3 and virtualenv-3.4 in /usr/local/bin/virtualenv-3.4.


回答 7

virtualenvwrapper在bitbucket问题跟踪器上的这篇帖子可能会让您感兴趣。那里提到，virtualenvwrapper的大多数功能都可以与Python 3.3中的venv虚拟环境一起使用。

This post on the bitbucket issue tracker of virtualenvwrapper may be of interest. It is mentioned there that most of virtualenvwrapper’s functions work with the venv virtual environments in Python 3.3.


回答 8

我像这样将export VIRTUALENV_PYTHON=/usr/bin/python3添加到我的~/.bashrc中：

export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENV_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

然后运行source .bashrc

之后您就可以为每个新环境指定python版本：mkvirtualenv --python=python2 env_name

I added export VIRTUALENV_PYTHON=/usr/bin/python3 to my ~/.bashrc like this:

export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENV_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

then run source .bashrc

and you can specify the python version for each new env mkvirtualenv --python=python2 env_name


如何在Python中获得对当前模块属性的引用

问题:如何在Python中获得对当前模块属性的引用

我想做的事情在命令行中看起来像这样:

>>> import mymodule
>>> names = dir(mymodule)

我如何在mymodule自身内部获得对mymodule中定义的所有名称的引用？

像这样:

# mymodule.py
names = dir(__thismodule__)

What I’m trying to do would look like this in the command line:

>>> import mymodule
>>> names = dir(mymodule)

How can I get a reference to all the names defined in mymodule from within mymodule itself?

Something like this:

# mymodule.py
names = dir(__thismodule__)

回答 0

只需使用globals()

globals()—返回表示当前全局符号表的字典。这始终是当前模块的字典(在函数或方法内部,这是定义它的模块,而不是从中调用它的模块)。

http://docs.python.org/library/functions.html#globals

Just use globals()

globals() — Return a dictionary representing the current global symbol table. This is always the dictionary of the current module (inside a function or method, this is the module where it is defined, not the module from which it is called).

http://docs.python.org/library/functions.html#globals
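
A minimal sketch of using globals() from inside a module (the module contents here are illustrative):

# mymodule.py
x = 1

def f():
    pass

# collect the names defined so far, skipping dunder entries such as __name__
names = [name for name in globals() if not name.startswith('__')]
print(names)   # ['x', 'f']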


回答 1

如前所述，globals()为您提供的是一个字典，而dir()给出的是模块中已定义名称的列表。我通常看到的做法是这样的：

import sys
dir(sys.modules[__name__])

As previously mentioned, globals gives you a dictionary as opposed to dir() which gives you a list of the names defined in the module. The way I typically see this done is like this:

import sys
dir(sys.modules[__name__])

回答 2

现在回答可能为时已晚，但我没能为自己找到正确的答案。在python 3.7.x中，最接近且最精确的解决方案（比inspect.stack()更快）如下：

  # search for first module in the stack
  import inspect
  import sys

  stack_frame = inspect.currentframe()
  while stack_frame:
    print('***', stack_frame.f_code.co_name, stack_frame.f_code.co_filename, stack_frame.f_lineno)
    if stack_frame.f_code.co_name == '<module>':
      if stack_frame.f_code.co_filename != '<stdin>':
        caller_module = inspect.getmodule(stack_frame)
      else:
        # piped or interactive import
        caller_module = sys.modules['__main__']
      if caller_module is not None:
        pass  # ... do something here ...
      break
    stack_frame = stack_frame.f_back

优点：

  • 比globals()方法更精确。
  • 不依赖于中间的堆栈帧，这些帧可能被hook或pytest之类的第三方工具添加，例如：
*** foo ... ..
*** boo ... ..
*** runtest c:\python\x86\37\lib\site-packages\xonsh\pytest_plugin.py 58
*** pytest_runtest_call c:\python\x86\37\lib\site-packages\_pytest\runner.py 125
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** <lambda> c:\python\x86\37\lib\site-packages\_pytest\runner.py 201
*** from_call c:\python\x86\37\lib\site-packages\_pytest\runner.py 229
*** call_runtest_hook c:\python\x86\37\lib\site-packages\_pytest\runner.py 201
*** call_and_report c:\python\x86\37\lib\site-packages\_pytest\runner.py 176
*** runtestprotocol c:\python\x86\37\lib\site-packages\_pytest\runner.py 95
*** pytest_runtest_protocol c:\python\x86\37\lib\site-packages\_pytest\runner.py 80
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** pytest_runtestloop c:\python\x86\37\lib\site-packages\_pytest\main.py 258
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** _main c:\python\x86\37\lib\site-packages\_pytest\main.py 237
*** wrap_session c:\python\x86\37\lib\site-packages\_pytest\main.py 193
*** pytest_cmdline_main c:\python\x86\37\lib\site-packages\_pytest\main.py 230
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** main c:\python\x86\37\lib\site-packages\_pytest\config\__init__.py 90
*** <module> c:\Python\x86\37\Scripts\pytest.exe\__main__.py 7
  • 可以处理python管道或交互式会话。

缺点:

  • 相当精确，因此可能返回在可执行文件（如pytest.exe）中注册的模块，而这可能不是您想要的。
  • inspect.getmodule在有效模块上仍可能返回None，具体取决于hook的情况

我有一个针对python的扩展：如何在给定完整路径的情况下导入模块？

该扩展包含针对这种情况的包装函数：

def tkl_get_stack_frame_module_by_offset(skip_stack_frames = 0, use_last_frame_on_out_of_stack = False):
  ...

def tkl_get_stack_frame_module_by_name(name = '<module>'):
  ...

您只需要正确初始化该扩展即可：

# portable import to the global space
sys.path.append(<path-to-tacklelib-module-directory>)
import tacklelib as tkl

tkl.tkl_init(tkl, global_config = {'log_import_module':os.environ.get('TACKLELIB_LOG_IMPORT_MODULE')})

# cleanup
del tkl # must be instead of `tkl = None`, otherwise the variable would be still persist
sys.path.pop()

# use `tkl_*` functions directly from here ...

It might be late to answer, but I didn't find the correct answer for myself. The closest and most precise solution (faster than inspect.stack()) in Python 3.7.x:

  # search for first module in the stack
  import inspect
  import sys

  stack_frame = inspect.currentframe()
  while stack_frame:
    print('***', stack_frame.f_code.co_name, stack_frame.f_code.co_filename, stack_frame.f_lineno)
    if stack_frame.f_code.co_name == '<module>':
      if stack_frame.f_code.co_filename != '<stdin>':
        caller_module = inspect.getmodule(stack_frame)
      else:
        # piped or interactive import
        caller_module = sys.modules['__main__']
      if caller_module is not None:
        pass  # ... do something here ...
      break
    stack_frame = stack_frame.f_back

Pros:

  • More precise than the globals() method.
  • Does not depend on intermediate stack frames, which can be added, for example, via hooking or by 3rd-party tools like pytest:
*** foo ... ..
*** boo ... ..
*** runtest c:\python\x86\37\lib\site-packages\xonsh\pytest_plugin.py 58
*** pytest_runtest_call c:\python\x86\37\lib\site-packages\_pytest\runner.py 125
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** <lambda> c:\python\x86\37\lib\site-packages\_pytest\runner.py 201
*** from_call c:\python\x86\37\lib\site-packages\_pytest\runner.py 229
*** call_runtest_hook c:\python\x86\37\lib\site-packages\_pytest\runner.py 201
*** call_and_report c:\python\x86\37\lib\site-packages\_pytest\runner.py 176
*** runtestprotocol c:\python\x86\37\lib\site-packages\_pytest\runner.py 95
*** pytest_runtest_protocol c:\python\x86\37\lib\site-packages\_pytest\runner.py 80
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** pytest_runtestloop c:\python\x86\37\lib\site-packages\_pytest\main.py 258
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** _main c:\python\x86\37\lib\site-packages\_pytest\main.py 237
*** wrap_session c:\python\x86\37\lib\site-packages\_pytest\main.py 193
*** pytest_cmdline_main c:\python\x86\37\lib\site-packages\_pytest\main.py 230
*** _multicall c:\python\x86\37\lib\site-packages\pluggy\callers.py 187
*** <lambda> c:\python\x86\37\lib\site-packages\pluggy\manager.py 86
*** _hookexec c:\python\x86\37\lib\site-packages\pluggy\manager.py 92
*** __call__ c:\python\x86\37\lib\site-packages\pluggy\hooks.py 286
*** main c:\python\x86\37\lib\site-packages\_pytest\config\__init__.py 90
*** <module> c:\Python\x86\37\Scripts\pytest.exe\__main__.py 7
  • Can handle python piped or interactive session.

Cons:

  • Rather precise: it can return modules registered in an executable (like pytest.exe), which might not be what you want.
  • inspect.getmodule may still return None on valid modules, depending on hooking

I have an extension for Python: How to import a module given the full path?

The extension has wrapper functions for that case:

def tkl_get_stack_frame_module_by_offset(skip_stack_frames = 0, use_last_frame_on_out_of_stack = False):
  ...

def tkl_get_stack_frame_module_by_name(name = '<module>'):
  ...

You just have to initialize the extension properly:

# portable import to the global space
sys.path.append(<path-to-tacklelib-module-directory>)
import tacklelib as tkl

tkl.tkl_init(tkl, global_config = {'log_import_module':os.environ.get('TACKLELIB_LOG_IMPORT_MODULE')})

# cleanup
del tkl # must be instead of `tkl = None`, otherwise the variable would be still persist
sys.path.pop()

# use `tkl_*` functions directly from here ...