Tag Archives: Python

Should import statements always be at the top of a module?

Question: Should import statements always be at the top of a module?

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?

Isn’t this:

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

more efficient than this?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

Answer 0

Module importing is quite fast, but not instant. This means that:

  • Putting the imports at the top of the module is fine, because it’s a trivial cost that’s only paid once.
  • Putting the imports within a function will cause calls to that function to take longer.

So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)


The best reasons I’ve seen to perform lazy imports are:

  • Optional library support. If your code has multiple paths that use different libraries, don’t break if an optional library is not installed.
  • In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib‘s lazy-loading framework.
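A minimal sketch of the “optional library support” bullet above (ujson is just an assumed example of an optional dependency):

try:
    import ujson as json  # use the faster parser if it happens to be installed
except ImportError:
    import json           # fall back to the standard library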

Answer 1

Putting the import statement inside of a function can prevent circular dependencies. For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won’t try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more – effbot.org/zone/import-confusion.htm
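A minimal sketch of that situation, with hypothetical modules x.py and y.py:

# x.py (hypothetical)
import y

def from_x():
    return 'x'

# y.py (hypothetical) -- its import of x is deferred into the function,
# so loading y does not require x to have finished importing
def from_y():
    from x import from_x   # by the time this runs, x is fully imported
    return from_x() + 'y'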


Answer 2

I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.

The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module’s imports complete and minimal. A refactoring IDE might make this irrelevant.

There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.

It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I’m installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I’m going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.

I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
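A small sketch of that layout:

def main(args):
    print(args)

if __name__ == '__main__':
    import sys            # only needed when the file is run as a script
    main(sys.argv[1:])    # main() stays usable in contexts where this module never imports sys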


Answer 3

Most of the time putting imports at the top is useful for clarity and sensible to do, but it’s not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.

Firstly, you could have a module with a unit test of the form:

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

Secondly, you might have a requirement to conditionally import some different module at runtime.

if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]

There are probably other situations where you might place imports in other parts in the code.


Answer 4

The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the “import every call” approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a “lazy import”.

But there are reasons other than efficiency why you might prefer one over the other. One approach makes it much clearer to someone reading the code what dependencies this module has. They also have very different failure characteristics — the first will fail at load time if there’s no “datetime” module while the second won’t fail until the method is called.

Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it’s being imported.


Answer 5

Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.

Normally I don’t worry about the efficiency of loading modules, since it’s (a) pretty fast, and (b) mostly only happens at startup.

If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
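A short sketch of that pattern, using matplotlib purely as an assumed example of a heavyweight, possibly missing module:

def get_plotting_module():
    try:
        # __import__ with fromlist returns the submodule itself
        return __import__('matplotlib.pyplot', fromlist=['pyplot'])
    except ImportError:
        return None   # caller can fall back to, e.g., plain text output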


Answer 6

I wouldn’t worry about the efficiency of loading the module up front too much. The memory taken up by the module won’t be very big (assuming it’s modular enough) and the startup cost will be negligible.

In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.

One good reason to import a module elsewhere in the code is if it’s used in a debugging statement.

For example:

do_something_with_x(x)

I could debug this with:

from pprint import pprint
pprint(x)
do_something_with_x(x)

Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don’t have any choice.



Answer 7

It’s a tradeoff, that only the programmer can decide to make.

Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import ‘only when called’ also means doing it ‘every time when called’, so each call after the first one is still incurring the additional overhead of doing the import.

Case 2 saves some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.

Besides efficiency, it’s easier to see module dependencies up front if the import statements are … up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.

Personally I generally follow the PEP except for things like unit tests and such that I don’t want always loaded because I know they aren’t going to be used except for test code.


Answer 8

Here’s an example where all the imports are at the very top (this is the only time I’ve needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.

import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function

(On review: what John Millikin said.)


Answer 9

This is like many other optimizations – you sacrifice some readability for speed. As John mentioned, if you’ve done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It’d probably be good to put a note up with all the other imports:

from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below

Answer 10

Module initialization only occurs once – on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module initialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.

Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
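A quick way to see this caching: after the first import, the module object lives in sys.modules, and later import statements only bind that existing object to a name.

import sys
import datetime                   # first import: module initialization runs

import datetime as dt_again       # later import: cheap name binding only
assert dt_again is sys.modules['datetime']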


Answer 11

Just to complete Moe’s answer and the original question:

When we have to deal with circular dependencies we can do some “tricks”. Assume we’re working with modules a.py and b.py that contain x() and y(), respectively. Then:

  1. We can move one of the from imports to the bottom of the module.
  2. We can move one of the from imports inside the function or method that is actually requiring the import (this isn’t always possible, as you may use it from several places).
  3. We can change one of the two from imports to be an import that looks like: import a

So, to conclude: if you aren’t dealing with circular dependencies and doing some kind of trick to avoid them, then it’s better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when doing these “tricks”, include a comment; it’s always welcome! :)
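As an illustration of trick 3 above, with the hypothetical a.py/b.py pair:

# b.py (hypothetical) -- plain "import a" instead of "from a import x"
import a

def y():
    return a.x()   # a.x is looked up at call time, once a is fully loaded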


Answer 12

In addition to the excellent answers already given, it’s worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.

This issue often comes up in Apache Spark’s Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It’s best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
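A hedged sketch of that idea: keep the pyspark import inside the function that creates the context, so merely importing your own module never touches pyspark.

def count_lines(path):
    from pyspark import SparkContext      # deferred until the job actually runs
    sc = SparkContext(appName='example')
    try:
        return sc.textFile(path).count()
    finally:
        sc.stop()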


Answer 13

I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.

If you import at the top, you take the load hit no matter what. That’s pretty small, but commonly in the milliseconds, not nanoseconds.

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn’t happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don’t know why).

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

The code:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1

Answer 14

I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During application startup, the application walks through all the modules in that location and imports them; then it looks inside each module and, if it finds a mounting point for a plugin (in my case, a subclass of a certain base class having a unique ID), it registers it. The number of plugins is large (dozens now, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules meant a bit of a penalty during application startup. In particular, some third-party libraries are heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which was adding about one second to startup). By optimizing the imports in the plugins (calling them only in the functions where they are used) I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.

So my answer is no, do not always put the imports at the top of your modules.


Answer 15

It’s interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. in the case of ipyparallel.
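A sketch of the self-contained-function pattern (ipyparallel, mentioned above, is one such case; the standard-library ProcessPoolExecutor is used here only to show the shape): the function does its own import, so it does not rely on module-level names being present wherever it ends up executing.

from concurrent.futures import ProcessPoolExecutor

def row_count(path):
    import csv                        # imported where the function runs
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(row_count, ['a.csv', 'b.csv'])))   # hypothetical files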


Answer 16

There can be a performance gain from importing names into a function’s local scope. It depends on how the imported thing is used inside the function. If you are looping many times and accessing a module-global object, importing it as a local can help.

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

A time on Linux shows a small gain

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

real is wall clock. user is time in program. sys is time for system calls.

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names


Answer 17

Readability

In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:

listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET

xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])

If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.

Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.

Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.


Answer 18

I would like to mention a usecase of mine, very similar to those mentioned by @John Millikin and @V.K. :

Optional Imports

I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn’t set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.

This way, I could do “restart-and-run-all” without having to wait for the import, or having to resume the rest of the cells when it fails.


Answer 19

This is a fascinating discussion. Like many others, I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes, and because this was at the top of the file, it was being dragged into completely non-Django library code because of the IoC injector construction.

I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren’t at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody’s take on this.

I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It’s only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I’m passing through and have the time.
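A hedged sketch of that arrangement (the app and model names are hypothetical): django.setup() and the model import both happen inside the function, so importing the library module itself never requires Django to be configured.

import django

def load_bookings():
    django.setup()                      # assumes DJANGO_SETTINGS_MODULE is configured
    from myapp.models import Booking    # hypothetical app/model; safe only after setup()
    return list(Booking.objects.all())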


Import multiple csv files into pandas and concatenate into one DataFrame

Question: Import multiple csv files into pandas and concatenate into one DataFrame

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:

import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

I guess I need some help within the for loop???


Answer 0

If you have the same columns in all your csv files, then you can try the code below. I have added header=0 so that after reading the csv, the first row can be assigned as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

Answer 1

An alternative to darindaCoder’s answer:

import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one

Answer 2

import glob, os
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))

Answer 3

The Dask library can read a dataframe from multiple files:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

(Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)

The Dask dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute() to convert the dataframe into a Pandas dataframe.


Answer 4

Almost all of the answers here are either unnecessarily complex (glob pattern matching) or rely on additional 3rd party libraries. You can do this in 2 lines using everything Pandas and python (all versions) already have built in.

For a few files – 1 liner:

df = pd.concat(map(pd.read_csv, ['data/d1.csv', 'data/d2.csv','data/d3.csv']))

For many files:

from os import listdir

filepaths = [f for f in listdir("./data") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))

This pandas line which sets the df utilizes 3 things:

  1. Python’s map(function, iterable) sends each element of the iterable (our list of file paths) to the function (pd.read_csv()).
  2. Pandas’ read_csv() function reads in each CSV file as normal.
  3. Pandas’ concat() brings all these together under one df variable.

Answer 5

Edit: I googled my way into https://stackoverflow.com/a/21232849/186078. However of late I am finding it faster to do any manipulation using numpy and then assigning it once to dataframe rather than manipulating the dataframe itself on an iterative basis and it seems to work in this solution too.

I do sincerely want anyone hitting this page to consider this approach, but don’t want to attach this huge piece of code as a comment and making it less readable.

You can leverage numpy to really speed up the dataframe concatenation.

import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path,"*.csv"))


np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    np_array_list.append(df.as_matrix())

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)

big_frame.columns = ["col1","col2"....]

Timing stats:

total files :192
avg lines per file :8492
--approach 1 without numpy -- 8.248656988143921 seconds ---
total records old :1630571
--approach 2 with numpy -- 2.289292573928833 seconds ---

Answer 6

If you want to search recursively (Python 3.5 or above), you can do the following:

from glob import iglob
import pandas as pd

path = r'C:\user\your\path\**\*.csv'

all_rec = iglob(path, recursive=True)     
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)

Note that the three last lines can be expressed in one single line:

df = pd.concat((pd.read_csv(f) for f in iglob(path, recursive=True)), ignore_index=True)

You can find the documentation of ** here. Also, I used iglob instead of glob, as it returns an iterator instead of a list.



EDIT: Multiplatform recursive function:

You can wrap the above into a multiplatform function (Linux, Windows, Mac), so you can do:

df = read_df_rec(r'C:\user\your\path', '*.csv')

Here is the function:

from glob import iglob
from os.path import join
import pandas as pd

def read_df_rec(path, fn_regex=r'*.csv'):
    return pd.concat((pd.read_csv(f) for f in iglob(
        join(path, '**', fn_regex), recursive=True)), ignore_index=True)

Answer 7

Easy and Fast

Import two or more csv‘s without having to make a list of names.

import glob
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))

Answer 8

A one-liner using map, but if you’d like to specify additional args, you could do:

import pandas as pd
import glob
import functools

df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None), 
                    glob.glob("data/*.csv")))

Note: map by itself does not let you supply additional args.


Answer 9

If the multiple csv files are zipped, you may use zipfile to read all and concatenate as below:

import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')

train=[]

for f in range(0,len(ziptrain.namelist())):
    if (f == 0):
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = (pd.DataFrame(np.concatenate((train,my_df),axis=0), 
                          columns=list(my_df.columns.values)))

Answer 10

Another one-liner, using a list comprehension, which allows you to pass arguments to read_csv.

import os
import pandas as pd

df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])

Answer 11

Based on @Sid’s good answer.

Before concatenating, you can load csv files into an intermediate dictionary which gives access to each data set based on the file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, when column names are not aligned for example.

Import modules and locate file paths:

import os
import glob
import pandas
from collections import OrderedDict
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

Note: OrderedDict is not necessary, but it’ll keep the order of files which might be useful for analysis.

Load csv files into a dictionary. Then concatenate:

dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)

Keys are file names f and values are the data frame content of csv files. Instead of using f as a dictionary key, you can also use os.path.basename(f) or other os.path methods to reduce the size of the key in the dictionary to only the smaller part that is relevant.
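For example, keying the dictionary by base name instead of the full path (same idea, shorter keys):

dict_of_df = OrderedDict((os.path.basename(f), pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)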


Answer 12

Alternative using the pathlib library (often preferred over os.path).

This method avoids iterative use of pandas concat()/append().

From the pandas documentation:
It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

import pandas as pd
from pathlib import Path

dir = Path("../relevant_directory")

df = (pd.read_csv(f) for f in dir.glob("*.csv"))
df = pd.concat(df)

Answer 13

This is how you can do it using Colab on Google Drive:

import pandas as pd
import glob

path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True,sort=True)
frame.to_csv('/content/drive/onefile.csv')

Answer 14

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
file_path_list = glob.glob(path + "/*.csv")

file_iter = iter(file_path_list)

list_df_csv = []
list_df_csv.append(pd.read_csv(next(file_iter)))

for file in file_iter:
    list_df_csv.append(pd.read_csv(file, header=0))
df = pd.concat(list_df_csv, ignore_index=True)

How to find out if a Python object is a string?

Question: How to find out if a Python object is a string?

How can I check if a Python object is a string (either regular or Unicode)?


Answer 0

Python 2

Use isinstance(obj, basestring) for an object-to-test obj.

Docs.


Answer 1

Python 2

To check if an object o is of a string type or a subclass of a string type:

isinstance(o, basestring)

because both str and unicode are subclasses of basestring.

To check if the type of o is exactly str:

type(o) is str

To check if o is an instance of str or any subclass of str:

isinstance(o, str)

The above also work for Unicode strings if you replace str with unicode.

However, you may not need to do explicit type checking at all. “Duck typing” may fit your needs. See http://docs.python.org/glossary.html#term-duck-typing.

See also What’s the canonical way to check for type in python?


Answer 2

Python 3

In Python 3.x basestring is not available anymore, as str is the sole string type (with the semantics of Python 2.x’s unicode).

So the check in Python 3.x is just:

isinstance(obj_to_test, str)

This follows the fix of the official 2to3 conversion tool: converting basestring to str.


Answer 3

Python 2 and 3

(cross-compatible)

If you want to check with no regard for Python version (2.x vs 3.x), use six (PyPI) and its string_types attribute:

import six

if isinstance(obj, six.string_types):
    print('obj is a string!')

Within six (a very light-weight single-file module), it’s simply doing this:

import sys
PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str
else:
    string_types = basestring

Answer 4

I found this answer more pythonic:

if type(aObject) is str:
    #do your stuff here
    pass

since type objects are singletons, is can be used to compare the type of the object to the str type


Answer 5

If one wants to stay away from explicit type-checking (and there are good reasons to stay away from it), probably the safest part of the string protocol to check is:

str(maybe_string) == maybe_string

It won’t iterate through an iterable or iterator, it won’t call a list-of-strings a string and it correctly detects a stringlike as a string.

Of course there are drawbacks. For example, str(maybe_string) may be a heavy calculation. As so often, the answer is it depends.

EDIT: As @Tcll points out in the comments, the question actually asks for a way to detect both unicode strings and bytestrings. On Python 2 this answer will fail with an exception for unicode strings that contain non-ASCII characters, and on Python 3 it will return False for all bytestrings.
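If you do use this check, a small wrapper that treats conversion errors as “not a string” at least keeps the comparison from raising (the Python 2 caveat above still applies, since such strings would then report False):

def looks_like_string(maybe_string):
    try:
        return str(maybe_string) == maybe_string
    except Exception:
        return False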


Answer 6

In order to check whether your variable is a string, you could go like this:

s = 'Hello World'
if isinstance(s, str):
    # do something here
    pass

The output of isinstance will give you a boolean True or False value so you can adjust accordingly. You can check the expected type of your value beforehand by using type(s); this will tell you it is of type ‘str’, so you can use that in the isinstance call.


Answer 7

I might deal with this in the duck-typing style, like others mention. How do I know a string is really a string? well, obviously by converting it to a string!

def myfunc(word):
    word = unicode(word)
    ...

If the arg is already a string or unicode type, word will hold its value unmodified. If the object passed implements a __unicode__ method, that is used to get its unicode representation. If the object passed cannot be used as a string, the unicode builtin raises an exception.


Answer 8

isinstance(your_object, basestring)

will be True if your object is indeed a string type. ‘str’ is a reserved word.

My apologies: the correct answer is to use ‘basestring’ instead of ‘str’, in order for it to include unicode strings as well, as noted above by one of the other responders.


Answer 9

This evening I ran into a situation in which I thought I was going to have to check against the str type, but it turned out I did not.

My approach to solving the problem will probably work in many situations, so I offer it below in case others reading this question are interested (Python 3 only).

# NOTE: fields is an object that COULD be any number of things, including:
# - a single string-like object
# - a string-like object that needs to be converted to a sequence of 
# string-like objects at some separator, sep
# - a sequence of string-like objects
def getfields(*fields, sep=' ', validator=lambda f: True):
    '''Take a field sequence definition and yield from a validated
     field sequence. Accepts a string, a string with separators, 
     or a sequence of strings'''
    if fields:
        try:
            # single unpack in the case of a single argument
            fieldseq, = fields
            try:
                # convert to string sequence if string
                fieldseq = fieldseq.split(sep)
            except AttributeError:
                # not a string; assume other iterable
                pass
        except ValueError:
            # not a single argument and not a string
            fieldseq = fields
        invalid_fields = [field for field in fieldseq if not validator(field)]
        if invalid_fields:
            raise ValueError('One or more field names is invalid:\n'
                             '{!r}'.format(invalid_fields))
    else:
        raise ValueError('No fields were provided')
    try:
        yield from fieldseq
    except TypeError as e:
        raise ValueError('Single field argument must be a string '
                         'or an iterable') from e

Some tests:

from . import getfields

def test_getfields_novalidation():
    result = ['a', 'b']
    assert list(getfields('a b')) == result
    assert list(getfields('a,b', sep=',')) == result
    assert list(getfields('a', 'b')) == result
    assert list(getfields(['a', 'b'])) == result

Answer 10

It’s simple; use the following code (we assume the object mentioned is obj):

if type(obj) == str:
    print('It is a string')
else:
    print('It is not a string.')

Answer 11

You can test it by concatenating with an empty string:

def is_string(s):
  try:
    s += ''
  except:
    return False
  return True

Edit:

Correcting my answer after comments pointing out that this fails with lists

def is_string(s):
  return isinstance(s, basestring)

Answer 12

For a nice duck-typing approach for string-likes that has the bonus of working with both Python 2.x and 3.x:

def is_string(obj):
    try:
        obj + ''
        return True
    except TypeError:
        return False

wisefish was close with the duck-typing before he switched to the isinstance approach, except that += has a different meaning for lists than + does.


Answer 13

if type(varA) == str or type(varB) == str:
    print 'string involved'

from EDX – online course MITx: 6.00.1x Introduction to Computer Science and Programming Using Python


Decorators with parameters?

Question: Decorators with parameters?

I have a problem with the transfer of variable ‘insurance_mode’ by the decorator. I would do it by the following decorator statement:

 @execute_complete_reservation(True)
 def test_booking_gta_object(self):
     self.test_select_gta_object()

but unfortunately, this statement does not work. Perhaps maybe there is better way to solve this problem.

def execute_complete_reservation(test_case,insurance_mode):
    def inner_function(self,*args,**kwargs):
        self.test_create_qsf_query()
        test_case(self,*args,**kwargs)
        self.test_select_room_option()
        if insurance_mode:
            self.test_accept_insurance_crosseling()
        else:
            self.test_decline_insurance_crosseling()
        self.test_configure_pax_details()
        self.test_configure_payer_details

    return inner_function

Answer 0

The syntax for decorators with arguments is a bit different – the decorator with arguments should return a function that will take a function and return another function. So it should really return a normal decorator. A bit confusing, right? What I mean is:

def decorator_factory(argument):
    def decorator(function):
        def wrapper(*args, **kwargs):
            funny_stuff()
            something_with_argument(argument)
            result = function(*args, **kwargs)
            more_funny_stuff()
            return result
        return wrapper
    return decorator

Here you can read more on the subject – it’s also possible to implement this using callable objects and that is also explained there.
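A sketch of the callable-object variant mentioned there: the class takes the decorator arguments, and __call__ receives the function being decorated (the print stands in for real work on the argument).

class decorator_factory(object):
    def __init__(self, argument):
        self.argument = argument              # the decorator's own argument

    def __call__(self, function):
        def wrapper(*args, **kwargs):
            print("decorator argument is %s" % self.argument)
            return function(*args, **kwargs)
        return wrapper

@decorator_factory("some value")
def foo(*args, **kwargs):
    pass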


Answer 1

Edit : for an in-depth understanding of the mental model of decorators, take a look at this awesome Pycon Talk. well worth the 30 minutes.

One way of thinking about decorators with arguments is

@decorator
def foo(*args, **kwargs):
    pass

translates to

foo = decorator(foo)

So if the decorator had arguments,

@decorator_with_args(arg)
def foo(*args, **kwargs):
    pass

translates to

foo = decorator_with_args(arg)(foo)

decorator_with_args is a function which accepts a custom argument and which returns the actual decorator (that will be applied to the decorated function).

I use a simple trick with partials to make my decorators easy

from functools import partial

def _pseudo_decor(fun, argument):
    def ret_fun(*args, **kwargs):
        #do stuff here, for eg.
        print ("decorator arg is %s" % str(argument))
        return fun(*args, **kwargs)
    return ret_fun

real_decorator = partial(_pseudo_decor, argument=arg)

@real_decorator
def foo(*args, **kwargs):
    pass

Update:

Above, foo becomes real_decorator(foo)

One effect of decorating a function is that the name foo is overridden upon decorator declaration. foo is “overridden” by whatever is returned by real_decorator. In this case, a new function object.

All of foo‘s metadata is overridden, notably docstring and function name.

>>> print(foo)
<function _pseudo_decor.<locals>.ret_fun at 0x10666a2f0>

functools.wraps gives us a convenient method to “lift” the docstring and name to the returned function.

from functools import partial, wraps

def _pseudo_decor(fun, argument):
    # magic sauce to lift the name and doc of the function
    @wraps(fun)
    def ret_fun(*args, **kwargs):
        #do stuff here, for eg.
        print ("decorator arg is %s" % str(argument))
        return fun(*args, **kwargs)
    return ret_fun

real_decorator = partial(_pseudo_decor, argument=arg)

@real_decorator
def bar(*args, **kwargs):
    pass

>>> print(bar)
<function __main__.bar(*args, **kwargs)>
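
Note that arg is never defined in the snippets above; a concrete instantiation of the partial trick, with "some value" as a purely illustrative argument, might look like:

real_decorator = partial(_pseudo_decor, argument="some value")

@real_decorator
def baz():
    pass

baz()   # prints: decorator arg is some value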

回答 2

我想展示一个想法,恕我直言,非常优雅。t.dubrownik提出的解决方案显示了一种始终不变的模式:无论装饰器做什么,都需要三层包装器。

所以我认为这是元装饰器的工作,即装饰器的装饰器。装饰器是一个函数,它实际上可以用作带有参数的常规装饰器:

def parametrized(dec):
    def layer(*args, **kwargs):
        def repl(f):
            return dec(f, *args, **kwargs)
        return repl
    return layer

可以将其应用于常规装饰器以添加参数。例如,假设我们有一个装饰器,它将一个函数的结果加倍:

def double(f):
    def aux(*xs, **kws):
        return 2 * f(*xs, **kws)
    return aux

@double
def function(a):
    return 10 + a

print function(3)    # Prints 26, namely 2 * (10 + 3)

通过 @parametrized,我们可以构建一个带参数的通用装饰器 @multiply

@parametrized
def multiply(f, n):
    def aux(*xs, **kws):
        return n * f(*xs, **kws)
    return aux

@multiply(2)
def function(a):
    return 10 + a

print function(3)    # Prints 26

@multiply(3)
def function_again(a):
    return 10 + a

print function(3)          # Keeps printing 26
print function_again(3)    # Prints 39, namely 3 * (10 + 3)

通常,参数化装饰器的第一个参数是函数,其余参数对应于参数化装饰器的各个参数。

一个有趣的用法示例是类型安全的断言装饰器:

import itertools as it

@parametrized
def types(f, *types):
    def rep(*args):
        for a, t, n in zip(args, types, it.count()):
            if type(a) is not t:
                raise TypeError('Value %d has not type %s. %s instead' %
                    (n, t, type(a))
                )
        return f(*args)
    return rep

@types(str, int)  # arg1 is str, arg2 is int
def string_multiply(text, times):
    return text * times

print(string_multiply('hello', 3))    # Prints hellohellohello
print(string_multiply(3, 3))          # Fails miserably with TypeError

最后一点:这里我没有为这些包装函数使用 functools.wraps,但我建议始终使用它。

I’d like to show an idea which is IMHO quite elegant. The solution proposed by t.dubrownik shows a pattern which is always the same: you need the three-layered wrapper regardless of what the decorator does.

So I thought this is a job for a meta-decorator, that is, a decorator for decorators. As a decorator is a function, it actually works as a regular decorator with arguments:

def parametrized(dec):
    def layer(*args, **kwargs):
        def repl(f):
            return dec(f, *args, **kwargs)
        return repl
    return layer

This can be applied to a regular decorator in order to add parameters. So for instance, say we have the decorator which doubles the result of a function:

def double(f):
    def aux(*xs, **kws):
        return 2 * f(*xs, **kws)
    return aux

@double
def function(a):
    return 10 + a

print function(3)    # Prints 26, namely 2 * (10 + 3)

With @parametrized we can build a generic @multiply decorator having a parameter

@parametrized
def multiply(f, n):
    def aux(*xs, **kws):
        return n * f(*xs, **kws)
    return aux

@multiply(2)
def function(a):
    return 10 + a

print function(3)    # Prints 26

@multiply(3)
def function_again(a):
    return 10 + a

print function(3)          # Keeps printing 26
print function_again(3)    # Prints 39, namely 3 * (10 + 3)

Conventionally the first parameter of a parametrized decorator is the function, while the remaining arguments will correspond to the parameter of the parametrized decorator.

An interesting usage example could be a type-safe assertive decorator:

import itertools as it

@parametrized
def types(f, *types):
    def rep(*args):
        for a, t, n in zip(args, types, it.count()):
            if type(a) is not t:
                raise TypeError('Value %d has not type %s. %s instead' %
                    (n, t, type(a))
                )
        return f(*args)
    return rep

@types(str, int)  # arg1 is str, arg2 is int
def string_multiply(text, times):
    return text * times

print(string_multiply('hello', 3))    # Prints hellohellohello
print(string_multiply(3, 3))          # Fails miserably with TypeError

A final note: here I’m not using functools.wraps for the wrapper functions, but I would recommend using it all the times.
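
One way to follow that advice in this pattern is to apply functools.wraps inside the concrete decorator, as in this sketch (add_ten is an illustrative function name):

import functools

@parametrized
def multiply(f, n):
    @functools.wraps(f)          # preserve f's __name__ and __doc__ on the wrapper
    def aux(*xs, **kws):
        return n * f(*xs, **kws)
    return aux

@multiply(3)
def add_ten(a):
    """Adds ten."""
    return 10 + a

print(add_ten.__name__, add_ten(3))    # add_ten 39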


回答 3

这是t.dubrownik的答案的略微修改版本。为什么?

  1. 作为常规模板,您应该从原始函数返回返回值。
  2. 这会更改函数的名称,这可能会影响其他修饰符/代码。

因此使用@functools.wraps()

from functools import wraps

def decorator(argument):
    def real_decorator(function):
        @wraps(function)
        def wrapper(*args, **kwargs):
            funny_stuff()
            something_with_argument(argument)
            retval = function(*args, **kwargs)
            more_funny_stuff()
            return retval
        return wrapper
    return real_decorator

Here is a slightly modified version of t.dubrownik’s answer. Why?

  1. As a general template, you should return the return value from the original function.
  2. This changes the name of the function, which could affect other decorators / code.

So use @functools.wraps():

from functools import wraps

def decorator(argument):
    def real_decorator(function):
        @wraps(function)
        def wrapper(*args, **kwargs):
            funny_stuff()
            something_with_argument(argument)
            retval = function(*args, **kwargs)
            more_funny_stuff()
            return retval
        return wrapper
    return real_decorator

回答 4

我想您的问题是将参数传递给装饰器。这有点棘手,并不简单。

这是如何执行此操作的示例:

class MyDec(object):
    def __init__(self,flag):
        self.flag = flag
    def __call__(self, original_func):
        decorator_self = self
        def wrappee( *args, **kwargs):
            print 'in decorator before wrapee with flag ',decorator_self.flag
            original_func(*args,**kwargs)
            print 'in decorator after wrapee with flag ',decorator_self.flag
        return wrappee

@MyDec('foo de fa fa')
def bar(a,b,c):
    print 'in bar',a,b,c

bar('x','y','z')

输出:

in decorator before wrapee with flag  foo de fa fa
in bar x y z
in decorator after wrapee with flag  foo de fa fa

有关更多详细信息,请参见Bruce Eckel的文章。

I presume your problem is passing arguments to your decorator. This is a little tricky and not straightforward.

Here’s an example of how to do this:

class MyDec(object):
    def __init__(self,flag):
        self.flag = flag
    def __call__(self, original_func):
        decorator_self = self
        def wrappee( *args, **kwargs):
            print 'in decorator before wrapee with flag ',decorator_self.flag
            original_func(*args,**kwargs)
            print 'in decorator after wrapee with flag ',decorator_self.flag
        return wrappee

@MyDec('foo de fa fa')
def bar(a,b,c):
    print 'in bar',a,b,c

bar('x','y','z')

Prints:

in decorator before wrapee with flag  foo de fa fa
in bar x y z
in decorator after wrapee with flag  foo de fa fa

See Bruce Eckel’s article for more details.


回答 5

def decorator(argument):
    def real_decorator(function):
        def wrapper(*args):
            for arg in args:
                assert type(arg)==int,f'{arg} is not an interger'
            result = function(*args)
            result = result*argument
            return result
        return wrapper
    return real_decorator

装饰器的用法

@decorator(2)
def adder(*args):
    sum=0
    for i in args:
        sum+=i
    return sum

然后

adder(2,3)

产生

10

adder('hi',3)

产生

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-143-242a8feb1cc4> in <module>
----> 1 adder('hi',3)

<ipython-input-140-d3420c248ebd> in wrapper(*args)
      3         def wrapper(*args):
      4             for arg in args:
----> 5                 assert type(arg)==int,f'{arg} is not an interger'
      6             result = function(*args)
      7             result = result*argument

AssertionError: hi is not an interger
def decorator(argument):
    def real_decorator(function):
        def wrapper(*args):
            for arg in args:
                assert type(arg)==int,f'{arg} is not an interger'
            result = function(*args)
            result = result*argument
            return result
        return wrapper
    return real_decorator

Usage of the decorator

@decorator(2)
def adder(*args):
    sum=0
    for i in args:
        sum+=i
    return sum

Then the

adder(2,3)

produces

10

but

adder('hi',3)

produces

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-143-242a8feb1cc4> in <module>
----> 1 adder('hi',3)

<ipython-input-140-d3420c248ebd> in wrapper(*args)
      3         def wrapper(*args):
      4             for arg in args:
----> 5                 assert type(arg)==int,f'{arg} is not an interger'
      6             result = function(*args)
      7             result = result*argument

AssertionError: hi is not an interger

回答 6

这是一个函数装饰器的模板,当不需要传入参数时,它不要求写 ():

import functools


def decorator(x_or_func=None, *decorator_args, **decorator_kws):
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kws):
            if 'x_or_func' not in locals() \
                    or callable(x_or_func) \
                    or x_or_func is None:
                x = ...  # <-- default `x` value
            else:
                x = x_or_func
            return func(*args, **kws)

        return wrapper

    return _decorator(x_or_func) if callable(x_or_func) else _decorator

下面是一个示例:

def multiplying(factor_or_func=None):
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if 'factor_or_func' not in locals() \
                    or callable(factor_or_func) \
                    or factor_or_func is None:
                factor = 1
            else:
                factor = factor_or_func
            return factor * func(*args, **kwargs)
        return wrapper
    return _decorator(factor_or_func) if callable(factor_or_func) else _decorator


@multiplying
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying()
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying(10)
def summing(x): return sum(x)

print(summing(range(10)))
# 450

This is a template for a function decorator that does not require () if no parameters are to be given:

import functools


def decorator(x_or_func=None, *decorator_args, **decorator_kws):
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kws):
            if 'x_or_func' not in locals() \
                    or callable(x_or_func) \
                    or x_or_func is None:
                x = ...  # <-- default `x` value
            else:
                x = x_or_func
            return func(*args, **kws)

        return wrapper

    return _decorator(x_or_func) if callable(x_or_func) else _decorator

an example of this is given below:

def multiplying(factor_or_func=None):
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if 'factor_or_func' not in locals() \
                    or callable(factor_or_func) \
                    or factor_or_func is None:
                factor = 1
            else:
                factor = factor_or_func
            return factor * func(*args, **kwargs)
        return wrapper
    return _decorator(factor_or_func) if callable(factor_or_func) else _decorator


@multiplying
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying()
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying(10)
def summing(x): return sum(x)

print(summing(range(10)))
# 450

回答 7

在我的实例中,我决定通过单行lambda解决此问题,以创建一个新的装饰器函数:

def finished_message(function, message="Finished!"):

    def wrapper(*args, **kwargs):
        output = function(*args,**kwargs)
        print(message)
        return output

    return wrapper

@finished_message
def func():
    pass

my_finished_message = lambda f: finished_message(f, "All Done!")

@my_finished_message
def my_func():
    pass

if __name__ == '__main__':
    func()
    my_func()

执行后,将打印:

Finished!
All Done!

也许不如其他解决方案那样可扩展,但对我有用。

In my instance, I decided to solve this via a one-line lambda to create a new decorator function:

def finished_message(function, message="Finished!"):

    def wrapper(*args, **kwargs):
        output = function(*args,**kwargs)
        print(message)
        return output

    return wrapper

@finished_message
def func():
    pass

my_finished_message = lambda f: finished_message(f, "All Done!")

@my_finished_message
def my_func():
    pass

if __name__ == '__main__':
    func()
    my_func()

When executed, this prints:

Finished!
All Done!

Perhaps not as extensible as other solutions, but worked for me.


回答 8

编写一个既可以带参数又可以不带参数使用的装饰器是一个挑战,因为 Python 在这两种情况下期望完全不同的行为!许多答案都试图解决此问题,以下是对 @norok2 答案的改进。具体来说,这个变体消除了对 locals() 的使用。

遵循@ norok2给出的相同示例:

import functools

def multiplying(f_py=None, factor=1):
    assert callable(f_py) or f_py is None
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return factor * func(*args, **kwargs)
        return wrapper
    return _decorator(f_py) if callable(f_py) else _decorator


@multiplying
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying()
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying(factor=10)
def summing(x): return sum(x)

print(summing(range(10)))
# 450

玩这个代码

要注意的是,用户必须提供键,值对参数而不是位置参数,并且保留第一个参数。

Writing a decorator that works with and without parameter is a challenge because Python expects completely different behavior in these two cases! Many answers have tried to work around this and below is an improvement of answer by @norok2. Specifically, this variation eliminates the use of locals().

Following the same example as given by @norok2:

import functools

def multiplying(f_py=None, factor=1):
    assert callable(f_py) or f_py is None
    def _decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return factor * func(*args, **kwargs)
        return wrapper
    return _decorator(f_py) if callable(f_py) else _decorator


@multiplying
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying()
def summing(x): return sum(x)

print(summing(range(10)))
# 45


@multiplying(factor=10)
def summing(x): return sum(x)

print(summing(range(10)))
# 450

Play with this code.

The catch is that the user must supply key,value pairs of parameters instead of positional parameters and the first parameter is reserved.
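
A short sketch of that restriction, reusing the multiplying decorator defined above:

@multiplying(factor=10)            # fine: the parameter is passed by keyword
def summing(x): return sum(x)

# @multiplying(10)                 # would fail: 10 gets bound to f_py, so the
# def summing(x): return sum(x)    # assert callable(f_py) or f_py is None trips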


回答 9

众所周知,以下两段代码几乎等效:

@dec
def foo():
    pass

############################################
foo = dec(foo)

一个常见的错误是认为 @ 只是隐藏了最左边的参数。

@dec(1, 2, 3)
def foo():
    pass    
###########################################
foo = dec(foo, 1, 2, 3)

如果 @ 像上面那样工作,编写装饰器就会容易得多。不幸的是,事情并不是这样运作的。


考虑一个装饰器 Wait,它会将程序执行暂停几秒钟。如果您没有传入等待时间,则默认值为 1 秒。用例如下所示。

##################################################
@Wait
def print_something(something):
    print(something)

##################################################
@Wait(3)
def print_something_else(something_else):
    print(something_else)

##################################################
@Wait(delay=3)
def print_something_else(something_else):
    print(something_else)

Wait具有参数(例如)时@Wait(3),将其他任何事情发生之前Wait(3) 执行调用。

也就是说,以下两段代码是等效的

@Wait(3)
def print_something_else(something_else):
    print(something_else)

###############################################
return_value = Wait(3)
@return_value
def print_something_else(something_else):
    print(something_else)

这是个问题。

if `Wait` has no arguments:
    `Wait` is the decorator.
else: # `Wait` receives arguments
    `Wait` is not the decorator itself.
    Instead, `Wait` ***returns*** the decorator

一种解决方案如下所示:

让我们从创建下面这个 DelayedDecorator 类开始:

class DelayedDecorator:
    def __init__(i, cls, *args, **kwargs):
        print("Delayed Decorator __init__", cls, args, kwargs)
        i._cls = cls
        i._args = args
        i._kwargs = kwargs
    def __call__(i, func):
        print("Delayed Decorator __call__", func)
        if not (callable(func)):
            import io
            with io.StringIO() as ss:
                print(
                    "If only one input, input must be callable",
                    "Instead, received:",
                    repr(func),
                    sep="\n",
                    file=ss
                )
                msg = ss.getvalue()
            raise TypeError(msg)
        return i._cls(func, *i._args, **i._kwargs)

现在我们可以编写如下内容:

 dec = DelayedDecorator(Wait, delay=4)
 @dec
 def delayed_print(something):
    print(something)

注意:

  • dec 不接受多个参数。
  • dec 仅接受要包装的函数。

import inspect

class PolyArgDecoratorMeta(type):
    def __call__(Wait, *args, **kwargs):
        try:
            arg_count = len(args)
            if (arg_count == 1):
                if callable(args[0]):
                    SuperClass = inspect.getmro(PolyArgDecoratorMeta)[1]
                    r = SuperClass.__call__(Wait, args[0])
                else:
                    r = DelayedDecorator(Wait, *args, **kwargs)
            else:
                r = DelayedDecorator(Wait, *args, **kwargs)
        finally:
            pass
        return r

import time

class Wait(metaclass=PolyArgDecoratorMeta):
    def __init__(i, func, delay = 2):
        i._func = func
        i._delay = delay

    def __call__(i, *args, **kwargs):
        time.sleep(i._delay)
        r = i._func(*args, **kwargs)
        return r

以下两段代码是等效的:

@Wait
def print_something(something):
     print (something)

##################################################

def print_something(something):
    print(something)
print_something = Wait(print_something)

我们可以像下面这样,非常缓慢地将 "something" 打印到控制台:

print_something("something")

#################################################
@Wait(delay=1)
def print_something_else(something_else):
    print(something_else)

##################################################
def print_something_else(something_else):
    print(something_else)

dd = DelayedDecorator(Wait, delay=1)
print_something_else = dd(print_something_else)

##################################################

print_something_else("something")

最后说明

这看起来像是很多代码,但您并不需要每次都重写 DelayedDecorator 和 PolyArgDecoratorMeta 这两个类。您唯一需要亲自编写的只是类似下面这样的代码,相当简短:

from PolyArgDecoratorMeta import PolyArgDecoratorMeta
import time
class Wait(metaclass=PolyArgDecoratorMeta):
 def __init__(i, func, delay = 2):
     i._func = func
     i._delay = delay

 def __call__(i, *args, **kwargs):
     time.sleep(i._delay)
     r = i._func(*args, **kwargs)
     return r

It is well known that the following two pieces of code are nearly equivalent:

@dec
def foo():
    pass

############################################
foo = dec(foo)

A common mistake is to think that @ simply hides the leftmost argument.

@dec(1, 2, 3)
def foo():
    pass    
###########################################
foo = dec(foo, 1, 2, 3)

It would be much easier to write decorators if the above is how @ worked. Unfortunately, that’s not the way things are done.


Consider a decorator Wait which halts program execution for a few seconds. If you don’t pass in a wait time then the default value is 1 second. Use-cases are shown below.

##################################################
@Wait
def print_something(something):
    print(something)

##################################################
@Wait(3)
def print_something_else(something_else):
    print(something_else)

##################################################
@Wait(delay=3)
def print_something_else(something_else):
    print(something_else)

When Wait has an argument, such as @Wait(3), then the call Wait(3) is executed before anything else happens.

That is, the following two pieces of code are equivalent

@Wait(3)
def print_something_else(something_else):
    print(something_else)

###############################################
return_value = Wait(3)
@return_value
def print_something_else(something_else):
    print(something_else)

This is a problem.

if `Wait` has no arguments:
    `Wait` is the decorator.
else: # `Wait` receives arguments
    `Wait` is not the decorator itself.
    Instead, `Wait` ***returns*** the decorator

One solution is shown below:

Let us begin by creating the following class, DelayedDecorator:

class DelayedDecorator:
    def __init__(i, cls, *args, **kwargs):
        print("Delayed Decorator __init__", cls, args, kwargs)
        i._cls = cls
        i._args = args
        i._kwargs = kwargs
    def __call__(i, func):
        print("Delayed Decorator __call__", func)
        if not (callable(func)):
            import io
            with io.StringIO() as ss:
                print(
                    "If only one input, input must be callable",
                    "Instead, received:",
                    repr(func),
                    sep="\n",
                    file=ss
                )
                msg = ss.getvalue()
            raise TypeError(msg)
        return i._cls(func, *i._args, **i._kwargs)

Now we can write things like:

 dec = DelayedDecorator(Wait, delay=4)
 @dec
 def delayed_print(something):
    print(something)

Note that:

  • dec does not accept multiple arguments.
  • dec only accepts the function to be wrapped.

import inspect

class PolyArgDecoratorMeta(type):
    def __call__(Wait, *args, **kwargs):
        try:
            arg_count = len(args)
            if (arg_count == 1):
                if callable(args[0]):
                    SuperClass = inspect.getmro(PolyArgDecoratorMeta)[1]
                    r = SuperClass.__call__(Wait, args[0])
                else:
                    r = DelayedDecorator(Wait, *args, **kwargs)
            else:
                r = DelayedDecorator(Wait, *args, **kwargs)
        finally:
            pass
        return r

import time

class Wait(metaclass=PolyArgDecoratorMeta):
    def __init__(i, func, delay = 2):
        i._func = func
        i._delay = delay

    def __call__(i, *args, **kwargs):
        time.sleep(i._delay)
        r = i._func(*args, **kwargs)
        return r

The following two pieces of code are equivalent:

@Wait
def print_something(something):
     print (something)

##################################################

def print_something(something):
    print(something)
print_something = Wait(print_something)

We can print "something" to the console very slowly, as follows:

print_something("something")

#################################################
@Wait(delay=1)
def print_something_else(something_else):
    print(something_else)

##################################################
def print_something_else(something_else):
    print(something_else)

dd = DelayedDecorator(Wait, delay=1)
print_something_else = dd(print_something_else)

##################################################

print_something_else("something")

Final Notes

It may look like a lot of code, but you don’t have to write the classes DelayedDecorator and PolyArgDecoratorMeta every time. The only code you have to write yourself is something like the following, which is fairly short:

from PolyArgDecoratorMeta import PolyArgDecoratorMeta
import time
class Wait(metaclass=PolyArgDecoratorMeta):
 def __init__(i, func, delay = 2):
     i._func = func
     i._delay = delay

 def __call__(i, *args, **kwargs):
     time.sleep(i._delay)
     r = i._func(*args, **kwargs)
     return r

回答 10

定义此“ decoratorize函数”以生成定制的装饰器函数:

def decoratorize(FUN, **kw):
    def foo(*args, **kws):
        return FUN(*args, **kws, **kw)
    return foo

使用这种方式:

    @decoratorize(FUN, arg1 = , arg2 = , ...)
    def bar(...):
        ...

define this “decoratorize function” to generate customized decorator function:

def decoratorize(FUN, **kw):
    def foo(*args, **kws):
        return FUN(*args, **kws, **kw)
    return foo

use it this way:

    @decoratorize(FUN, arg1 = , arg2 = , ...)
    def bar(...):
        ...
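
The arguments in the template above are deliberately left unspecified; a purely illustrative, hypothetical instance (log_call and its label parameter are made up) could look like:

def log_call(fun, label):
    # plays the role of FUN: receives the decorated function plus the keyword
    # arguments baked in by decoratorize, and returns the wrapper
    def wrapper(*args, **kwargs):
        print(label, args, kwargs)
        return fun(*args, **kwargs)
    return wrapper

@decoratorize(log_call, label="calling bar:")
def bar(x, y):
    return x + y

print(bar(1, 2))   # prints "calling bar: (1, 2) {}" and then 3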

回答 11

上面的答案都很好。这个示例还演示了 @wraps,它从原始函数中获取文档字符串和函数名,并将其应用到新的包装版本上:

from functools import wraps

def decorator_func_with_args(arg1, arg2):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            print("Before orginal function with decorator args:", arg1, arg2)
            result = f(*args, **kwargs)
            print("Ran after the orginal function")
            return result
        return wrapper
    return decorator

@decorator_func_with_args("foo", "bar")
def hello(name):
    """A function which prints a greeting to the name provided.
    """
    print('hello ', name)
    return 42

print("Starting script..")
x = hello('Bob')
print("The value of x is:", x)
print("The wrapped functions docstring is:", hello.__doc__)
print("The wrapped functions name is:", hello.__name__)

输出:

Starting script..
Before orginal function with decorator args: foo bar
hello  Bob
Ran after the orginal function
The value of x is: 42
The wrapped functions docstring is: A function which prints a greeting to the name provided.
The wrapped functions name is: hello

Great answers above. This one also illustrates @wraps, which takes the doc string and function name from the original function and applies it to the new wrapped version:

from functools import wraps

def decorator_func_with_args(arg1, arg2):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            print("Before orginal function with decorator args:", arg1, arg2)
            result = f(*args, **kwargs)
            print("Ran after the orginal function")
            return result
        return wrapper
    return decorator

@decorator_func_with_args("foo", "bar")
def hello(name):
    """A function which prints a greeting to the name provided.
    """
    print('hello ', name)
    return 42

print("Starting script..")
x = hello('Bob')
print("The value of x is:", x)
print("The wrapped functions docstring is:", hello.__doc__)
print("The wrapped functions name is:", hello.__name__)

Prints:

Starting script..
Before orginal function with decorator args: foo bar
hello  Bob
Ran after the orginal function
The value of x is: 42
The wrapped functions docstring is: A function which prints a greeting to the name provided.
The wrapped functions name is: hello

回答 12

如果函数和装饰器都必须接受参数,则可以采用以下方法。

例如,有一个名为 decorator1 的装饰器,它接受一个参数

@decorator1(5)
def func1(arg1, arg2):
    print (arg1, arg2)

func1(1, 2)

现在,如果decorator1参数必须是动态的,或者在调用函数时传递的,

def func1(arg1, arg2):
    print (arg1, arg2)


a = 1
b = 2
seconds = 10

decorator1(seconds)(func1)(a, b)

在上面的代码中

  • seconds 是 decorator1 的参数
  • a, b 是 func1 的参数

In case both the function and the decorator have to take arguments you can follow the below approach.

For example there is a decorator named decorator1 which takes an argument

@decorator1(5)
def func1(arg1, arg2):
    print (arg1, arg2)

func1(1, 2)

Now if the decorator1 argument has to be dynamic, or passed while calling the function,

def func1(arg1, arg2):
    print (arg1, arg2)


a = 1
b = 2
seconds = 10

decorator1(seconds)(func1)(a, b)

In the above code

  • seconds is the argument for decorator1
  • a, b are the arguments of func1
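
The answer does not show decorator1 itself; a minimal sketch consistent with the call decorator1(seconds)(func1)(a, b) might look like the following (the sleeping behaviour is only an assumption for illustration):

import time

def decorator1(seconds):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            time.sleep(seconds)              # assumed behaviour, purely illustrative
            return function(*args, **kwargs)
        return wrapper
    return real_decorator

decorator1(10)(func1)(1, 2)   # waits 10 seconds, then prints 1 2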

查找使用easy_install / pip安装的所有软件包?

问题:查找使用easy_install / pip安装的所有软件包?

有没有办法找到所有通过easy_install或pip安装的Python PyPI软件包?我的意思是,排除分发工具已经安装的所有东西(在本例中为Debian上的apt-get)。

Is there a way to find all Python PyPI packages that were installed with easy_install or pip? I mean, excluding everything that was/is installed with the distributions tools (in this case apt-get on Debian).


回答 0

pip freeze将输出已安装软件包及其版本的列表。它还允许您将那些程序包写入文件,以便以后用于设置新环境。

https://pip.pypa.io/zh_CN/stable/reference/pip_freeze/#pip-freeze

pip freeze will output a list of installed packages and their versions. It also allows you to write those packages to a file that can later be used to set up a new environment.

https://pip.pypa.io/en/stable/reference/pip_freeze/#pip-freeze
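
For example, the output uses the pinned package==version format that requirements files expect (the package names and versions below are only illustrative):

$ pip freeze
Django==1.11.29
requests==2.18.4

$ pip freeze > requirements.txt   # save the list so it can be reinstalled later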


回答 1

从1.3版本的pip开始,您现在可以使用 pip list

它具有一些有用的选项,包括显示过期软件包的能力。这是文档:https : //pip.pypa.io/en/latest/reference/pip_list/

As of version 1.3 of pip you can now use pip list

It has some useful options including the ability to show outdated packages. Here’s the documentation: https://pip.pypa.io/en/latest/reference/pip_list/
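
For instance (exact output will vary by environment):

$ pip list             # all installed packages with their versions
$ pip list --outdated  # only packages that have a newer release available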


回答 2

如果有人想知道的话:您可以使用 ‘pip show’ 命令。

pip show [options] <package>

这将列出给定软件包的安装目录。

If anyone is wondering you can use the ‘pip show’ command.

pip show [options] <package>

This will list the install directory of the given package.


回答 3

如果 Debian 在 pip install 的默认目标上的行为与最近的 Ubuntu 版本类似,那就非常简单:它安装到 /usr/local/lib/ 而不是 /usr/lib(apt 的默认目标)。请查看 https://askubuntu.com/questions/173323/how-do-i-detect-and-remove-python-packages-installed-via-pip/259747#259747

我是ArchLinux用户,在尝试pip时遇到了同样的问题。这是我在Arch中解决问题的方法。

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs pacman -Qo | grep 'No package'

此处的关键是 /usr/lib/python2.7/site-packages,这是 pip 安装到的目录(YMMV)。pacman -Qo 是 Arch 的包管理器 pacman 用来检查文件所有权的方式。No package 是没有任何软件包拥有该文件时返回信息 error: No package owns $FILENAME 的一部分。一个取巧的变通办法:我查询的是 __init__.py,因为 pacman -Qo 在处理目录时有点力不从心 :(

为了在其他发行版中做到这一点,您必须找出 pip 把东西安装到哪里(只需 sudo pip install 随便装点什么即可),如何查询某个文件的所有权(Debian/Ubuntu 的方法是 dpkg -S),以及“没有软件包拥有该路径”时返回什么(Debian/Ubuntu 是 no path found matching pattern)。Debian/Ubuntu 用户请注意:dpkg -S 在遇到符号链接时会失败,只需先用 realpath 解析它即可。像这样:

find /usr/local/lib/python2.7/dist-packages -maxdepth 2 -name __init__.py | xargs realpath | xargs dpkg -S 2>&1 | grep 'no path found'

Fedora用户可以尝试(感谢@eddygeek):

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs rpm -qf | grep 'not owned by any package'

If Debian behaves like recent Ubuntu versions regarding pip install default target, it’s dead easy: it installs to /usr/local/lib/ instead of /usr/lib (apt default target). Check https://askubuntu.com/questions/173323/how-do-i-detect-and-remove-python-packages-installed-via-pip/259747#259747

I am an ArchLinux user and as I experimented with pip I met this same problem. Here’s how I solved it in Arch.

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs pacman -Qo | grep 'No package'

Key here is /usr/lib/python2.7/site-packages, which is the directory pip installs to, YMMV. pacman -Qo is how Arch’s pac kage man ager checks for ownership of the file. No package is part of the return it gives when no package owns the file: error: No package owns $FILENAME. Tricky workaround: I’m querying about __init__.py because pacman -Qo is a little bit ignorant when it comes to directories :(

In order to do it for other distros, you have to find out where pip installs stuff (just sudo pip install something), how to query ownership of a file (Debian/Ubuntu method is dpkg -S) and what is the “no package owns that path” return (Debian/Ubuntu is no path found matching pattern). Debian/Ubuntu users, beware: dpkg -S will fail if you give it a symbolic link. Just resolve it first by using realpath. Like this:

find /usr/local/lib/python2.7/dist-packages -maxdepth 2 -name __init__.py | xargs realpath | xargs dpkg -S 2>&1 | grep 'no path found'

Fedora users can try (thanks @eddygeek):

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs rpm -qf | grep 'not owned by any package'

回答 4

从…开始:

$ pip list

列出所有软件包。找到所需的软件包后,请使用:

$ pip show <package-name>

这将显示有关此软件包的详细信息,包括其文件夹。如果您已经知道软件包名称,则可以跳过第一部分

点击这里查看有关 pip show 的更多信息,点击这里查看有关 pip list 的更多信息。

例:

$ pip show jupyter
Name: jupyter
Version: 1.0.0
Summary: Jupyter metapackage. Install all the Jupyter components in one go.
Home-page: http://jupyter.org
Author: Jupyter Development Team
Author-email: jupyter@googlegroups.org
License: BSD
Location: /usr/local/lib/python2.7/site-packages
Requires: ipywidgets, nbconvert, notebook, jupyter-console, qtconsole, ipykernel    

Start with:

$ pip list

To list all packages. Once you found the package you want, use:

$ pip show <package-name>

This will show you details about this package, including its folder. You can skip the first part if you already know the package name

Click here for more information on pip show and here for more information on pip list.

Example:

$ pip show jupyter
Name: jupyter
Version: 1.0.0
Summary: Jupyter metapackage. Install all the Jupyter components in one go.
Home-page: http://jupyter.org
Author: Jupyter Development Team
Author-email: jupyter@googlegroups.org
License: BSD
Location: /usr/local/lib/python2.7/site-packages
Requires: ipywidgets, nbconvert, notebook, jupyter-console, qtconsole, ipykernel    

回答 5

pip.get_installed_distributions() 将给出已安装软件包的列表

import pip
from os.path import join

for package in pip.get_installed_distributions():
    print(package.location) # you can exclude packages that's in /usr/XXX
    print(join(package.location, package._get_metadata("top_level.txt"))) # root directory of this package

pip.get_installed_distributions() will give a list of installed packages

import pip
from os.path import join

for package in pip.get_installed_distributions():
    print(package.location) # you can exclude packages that's in /usr/XXX
    print(join(package.location, package._get_metadata("top_level.txt"))) # root directory of this package

回答 6

下面的命令有点慢,但它会给出一个格式良好的、pip 能识别的软件包列表。也就是说,并不是所有这些包都是通过 pip 安装的,但它们都应该能够通过 pip 升级。

$ pip search . | egrep -B1 'INSTALLED|LATEST'

之所以慢,是因为它列出了整个pypi存储库的内容。我提交了一张票,建议pip list提供类似的功能,但效率更高。

样本输出:(将搜索限制为一个子集,而不是用 ‘.’ 搜索全部。)

$ pip search selenium | egrep -B1 'INSTALLED|LATEST'

selenium                  - Python bindings for Selenium
  INSTALLED: 2.24.0
  LATEST:    2.25.0
--
robotframework-selenium2library - Web testing library for Robot Framework
  INSTALLED: 1.0.1 (latest)
$

The below is a little slow, but it gives a nicely formatted list of packages that pip is aware of. That is to say, not all of them were installed “by” pip, but all of them should be able to be upgraded by pip.

$ pip search . | egrep -B1 'INSTALLED|LATEST'

The reason it is slow is that it lists the contents of the entire pypi repo. I filed a ticket suggesting pip list provide similar functionality but more efficiently.

Sample output: (restricted the search to a subset instead of ‘.’ for all.)

$ pip search selenium | egrep -B1 'INSTALLED|LATEST'

selenium                  - Python bindings for Selenium
  INSTALLED: 2.24.0
  LATEST:    2.25.0
--
robotframework-selenium2library - Web testing library for Robot Framework
  INSTALLED: 1.0.1 (latest)
$

回答 7

作为对 @Paul Woolcock 答案的补充,

pip freeze > requirements.txt

将在当前位置创建一个需求文件,其中包含当前激活环境中所有已安装的软件包及其版本号。运行

pip install -r requirements.txt

将安装需求文件中指定的软件包。

Adding to @Paul Woolcock’s answer,

pip freeze > requirements.txt

will create a requirements file with all installed packages along with the installed version numbers in the active environment at the current location. Running

pip install -r requirements.txt

will install the packages specified in the requirements file.


回答 8

较新版本的 pip 可以通过 pip list -l 或 pip freeze -l(--local)执行 OP 所需的操作。
在 Debian 上(至少),手册页对此没有明确说明,而我只是在假设该功能必定存在的前提下,通过 pip list --help 才发现了它。

最近有评论表明此功能在文档或现有答案中均不明显(尽管有人暗示),所以我认为应该发布。我本来希望以此作为评论,但我没有信誉点。

Newer versions of pip have the ability to do what the OP wants via pip list -l or pip freeze -l (--local).
On Debian (at least) the man page doesn’t make this clear, and I only discovered it – under the assumption that the feature must exist – with pip list --help.

There are recent comments that suggest this feature is not obvious in either the documentation or the existing answers (although hinted at by some), so I thought I should post. I would have preferred to do so as a comment, but I don’t have the reputation points.
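
In other words, the -l short option maps to --local, which matters mainly inside a virtualenv that was created with access to the global site-packages:

$ pip list -l      # same as: pip list --local
$ pip freeze -l    # same as: pip freeze --local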


回答 9

请注意,如果您的计算机上安装了多个版本的 Python,则可能有多个 pip 版本分别与之关联。

取决于这些关联,您可能需要非常谨慎地选择所使用的 pip 命令:

pip3 list 

在我运行 Python 3.4 的环境中有效。简单地使用 pip list 会返回错误 The program 'pip' is currently not installed. You can install it by typing: sudo apt-get install python-pip。

Take note that if you have multiple versions of Python installed on your computer, you may have a few versions of pip associated with each.

Depending on your associations, you might need to be very cautious of what pip command you use:

pip3 list 

Worked for me, where I’m running Python3.4. Simply using pip list returned the error The program 'pip' is currently not installed. You can install it by typing: sudo apt-get install python-pip.


回答 10

正如 @almenon 指出的那样,这种方法已不再有效,它也不是在代码中获取包信息的受支持方式。以下代码会引发异常:

import pip
installed_packages = dict([(package.project_name, package.version) 
                           for package in pip.get_installed_distributions()])

为此,您可以import pkg_resources。这是一个例子:

import pkg_resources
installed_packages = dict([(package.project_name, package.version)
                           for package in pkg_resources.working_set])

我使用的是 v3.6.5

As @almenon pointed out, this no longer works and it is not the supported way to get package information in your code. The following raises an exception:

import pip
installed_packages = dict([(package.project_name, package.version) 
                           for package in pip.get_installed_distributions()])

To accomplish this, you can import pkg_resources. Here’s an example:

import pkg_resources
installed_packages = dict([(package.project_name, package.version)
                           for package in pkg_resources.working_set])

I’m on v3.6.5


回答 11

这是适用于 Fedora 或其他 rpm 发行版的单行命令(基于 @barraponto 的技巧):

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs rpm -qf | grep 'not owned by any package'

将此附加到上一个命令以获取更干净的输出:

 | sed -r 's:.*/(\w+)/__.*:\1:'

Here is the one-liner for fedora or other rpm distros (based on @barraponto tips):

find /usr/lib/python2.7/site-packages -maxdepth 2 -name __init__.py | xargs rpm -qf | grep 'not owned by any package'

Append this to the previous command to get cleaner output:

 | sed -r 's:.*/(\w+)/__.*:\1:'

回答 12

获取 site-packages/(以及 dist-packages/,如果存在)中所有文件/文件夹的名称,然后使用您的包管理器剔除那些通过系统软件包安装的文件/文件夹。

Get all file/folder names in site-packages/ (and dist-packages/ if it exists), and use your package manager to strip the ones that were installed via package.
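
A rough sketch of the first half of that idea (site.getsitepackages is not available in every virtualenv, so treat this as an approximation):

import os
import site

# print the top-level entries of each site-packages / dist-packages directory
for directory in site.getsitepackages():
    if os.path.isdir(directory):
        print(directory)
        for name in sorted(os.listdir(directory)):
            print("   ", name)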


回答 13

pip freeze 列出了所有已安装的软件包,即使不是通过 pip/easy_install 安装的也是如此。在 CentOS/Redhat 上,通过 rpm 安装的软件包也能被找到。

pip freeze lists all installed packages even if not by pip/easy_install. On CentOs/Redhat a package installed through rpm is found.


回答 14

如果使用Anaconda python发行版,则可以使用以下conda list命令查看通过什么方法安装了什么:

user@pc:~ $ conda list
# packages in environment at /anaconda3:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0            py36h2fc01ae_0
alabaster                 0.7.10           py36h174008c_0
amqp                      2.2.2                     <pip>
anaconda                  5.1.0                    py36_2
anaconda-client           1.6.9                    py36_0

要获取由 pip 安装的条目(可能包括 pip 自身):

user@pc:~ $ conda list | grep \<pip
amqp                      2.2.2                     <pip>
astroid                   1.6.2                     <pip>
billiard                  3.5.0.3                   <pip>
blinker                   1.4                       <pip>
ez-setup                  0.9                       <pip>
feedgenerator             1.9                       <pip>

当然,您可能只想选取第一列,可以用以下命令做到(如果需要,可排除 pip):

user@pc:~ $ conda list | awk '$3 ~ /pip/ {if ($1 != "pip") print $1}'
amqp        
astroid
billiard
blinker
ez-setup
feedgenerator 

最后,您可以获取这些包名,并使用以下命令通过 pip 全部卸载:

user@pc:~ $ conda list | awk '$3 ~ /pip/ {if ($1 != "pip") print $1}' | xargs pip uninstall -y

请注意,pip uninstall 使用了 -y 标志,以避免删除时需要逐一确认。

If you use the Anaconda python distribution, you can use the conda list command to see what was installed by what method:

user@pc:~ $ conda list
# packages in environment at /anaconda3:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0            py36h2fc01ae_0
alabaster                 0.7.10           py36h174008c_0
amqp                      2.2.2                     <pip>
anaconda                  5.1.0                    py36_2
anaconda-client           1.6.9                    py36_0

To grab the entries installed by pip (including possibly pip itself):

user@pc:~ $ conda list | grep \<pip
amqp                      2.2.2                     <pip>
astroid                   1.6.2                     <pip>
billiard                  3.5.0.3                   <pip>
blinker                   1.4                       <pip>
ez-setup                  0.9                       <pip>
feedgenerator             1.9                       <pip>

Of course you probably want to just select the first column, which you can do with (excluding pip if needed):

user@pc:~ $ conda list | awk '$3 ~ /pip/ {if ($1 != "pip") print $1}'
amqp        
astroid
billiard
blinker
ez-setup
feedgenerator 

Finally you can grab these values and pip uninstall all of them using the following:

user@pc:~ $ conda list | awk '$3 ~ /pip/ {if ($1 != "pip") print $1}' | xargs pip uninstall -y

Note the use of the -y flag for the pip uninstall to avoid having to give confirmation to delete.


回答 15

对于那些没有安装pip的人,我在github上找到了这个快速脚本(适用于Python 2.7.13):

import pkg_resources
distros = pkg_resources.AvailableDistributions()
for key in distros:
  print distros[key]

For those who don’t have pip installed, I found this quick script on github (works with Python 2.7.13):

import pkg_resources
distros = pkg_resources.AvailableDistributions()
for key in distros:
  print distros[key]

回答 16

pip list [options]。您可以在此处查看完整的参考。

pip list [options] You can see the complete reference here


回答 17

至少对于 Ubuntu(也许还有其他发行版)来说,这是可行的(受此帖中前面一篇回答的启发):

printf "Installed with pip:";
pip list 2>/dev/null | gawk '{print $1;}' | while read; do pip show "${REPLY}" 2>/dev/null | grep 'Location: /usr/local/lib/python2.7/dist-packages' >/dev/null; if (( $? == 0 )); then printf " ${REPLY}"; fi; done; echo

At least for Ubuntu (maybe also others) works this (inspired by a previous post in this thread):

printf "Installed with pip:";
pip list 2>/dev/null | gawk '{print $1;}' | while read; do pip show "${REPLY}" 2>/dev/null | grep 'Location: /usr/local/lib/python2.7/dist-packages' >/dev/null; if (( $? == 0 )); then printf " ${REPLY}"; fi; done; echo

没有名为MySQLdb的模块

问题:没有名为MySQLdb的模块

我正在使用 Python 2.5.4 版,并安装了 MySQL 5.0 和 Django。Django 与 Python 配合良好,但与 MySQL 不行。我在 Windows Vista 中使用它。

I am using Python version 2.5.4 and install MySQL version 5.0 and Django. Django is working fine with Python, but not MySQL. I am using it in Windows Vista.


回答 0

您需要使用以下命令之一。哪一个取决于您拥有和使用的操作系统和软件。

  1. easy_install mysql-python(各种操作系统)
  2. pip install mysql-python(各种操作系统 / python 2)
  3. pip install mysqlclient(各种操作系统 / python 3)
  4. apt-get install python-mysqldb(Linux Ubuntu,…)
  5. cd /usr/ports/databases/py-MySQLdb && make install clean(FreeBSD)
  6. yum install MySQL-python(Linux Fedora,CentOS …)

对于Windows,请参见以下答案:安装mysql-python(Windows)

You need to use one of the following commands. Which one depends on what OS and software you have and use.

  1. easy_install mysql-python (mix os)
  2. pip install mysql-python (mix os/ python 2)
  3. pip install mysqlclient (mix os/ python 3)
  4. apt-get install python-mysqldb (Linux Ubuntu, …)
  5. cd /usr/ports/databases/py-MySQLdb && make install clean (FreeBSD)
  6. yum install MySQL-python (Linux Fedora, CentOS …)

For Windows, see this answer: Install mysql-python (Windows)


回答 1

…并且记住没有针对python3.x的MySQLdb

(我知道问题是关于python2.x的,但是谷歌对这篇文章的评价很高)


编辑:如评论中所述,有一个MySQLdb的fork添加了Python 3支持:github.com/PyMySQL/mysqlclient-python

…and remember there is no MySQLdb for python3.x

(I know the question is about python2.x but google rates this post quite high)


EDIT: As stated in the comments, there’s a MySQLdb’s fork that adds Python 3 support: github.com/PyMySQL/mysqlclient-python


回答 2

如果您的python版本是3.5,请执行pip install mysqlclient,其他操作对我不起作用

if your python version is 3.5, do a pip install mysqlclient, other things didn’t work for me


回答 3

mysqldb 是一个 Python 模块,它既不会预装,也不会随 Django 一起提供。您可以在此处下载 mysqldb。

mysqldb is a module for Python that doesn’t come pre-installed or with Django. You can download mysqldb here.


回答 4

Ubuntu:

sudo apt-get install python-mysqldb

Ubuntu:

sudo apt-get install python-mysqldb

回答 5

请注意,这并未针对python 3.x进行测试

在CMD中

pip install wheel
pip install pymysql

在settings.py中

import pymysql
pymysql.install_as_MySQLdb()

这对我有效

Note this is not tested for python 3.x

In CMD

pip install wheel
pip install pymysql

in settings.py

import pymysql
pymysql.install_as_MySQLdb()

It worked with me


回答 6

pip install PyMySQL

然后将这两行添加到您的 Project/Project/__init__.py

import pymysql
pymysql.install_as_MySQLdb()

适用于WIN和python 3.3+

pip install PyMySQL

and then add these two lines to your Project/Project/__init__.py

import pymysql
pymysql.install_as_MySQLdb()

Works on WIN and python 3.3+


回答 7

尝试这个。

pip install MySQL-python

Try this.

pip install MySQL-python

回答 8

对于 Windows:

pip install mysqlclient pymysql

然后:

import pymysql
pymysql.install_as_MySQLdb()

对于python 3 Ubuntu

sudo apt-get install -y python3-mysqldb

for Windows:

pip install mysqlclient pymysql

then:

import pymysql
pymysql.install_as_MySQLdb()

for python 3 Ubuntu

sudo apt-get install -y python3-mysqldb

回答 9

如果pip install mysqlclient产生错误并且您使用Ubuntu,请尝试:

sudo apt-get install -y python-dev libmysqlclient-dev && sudo pip install mysqlclient

If pip install mysqlclient produces an error and you use Ubuntu, try:

sudo apt-get install -y python-dev libmysqlclient-dev && sudo pip install mysqlclient

回答 10

我在Windows下遇到了同样的情况,并寻找了解决方案。

参见这篇文章:安装 mysql-python(Windows)。

它指出,安装这样的pip环境是困难的,需要许多其他依赖项。

但我最终发现,如果将 mysqlclient 的版本降到 1.3.4,就不再需要那些依赖了,请尝试:

pip install mysqlclient==1.3.4

I met the same situation under windows, and searched for the solution.

Seeing this post Install mysql-python (Windows).

It points out that installing such a pip environment is difficult and needs many other dependencies.

But I finally found that if we use mysqlclient with a version down to 1.3.4, it doesn’t need those requirements any more, so try:

pip install mysqlclient==1.3.4

回答 11

  • 使用 cd 进入您的项目目录。
  • source/bin/activate(如果之前没有激活,请先激活您的环境)。
  • 运行命令 easy_install MySQL-python
  • Go to your project directory with cd.
  • source/bin/activate (activate your env. if not previously).
  • Run the command easy_install MySQL-python

回答 12

pip install --user mysqlclient 

上面的命令对我来说非常有效。我的错误实际上来自 sqlalchemy。环境信息:

Python:3.6,Ubuntu:16.04,conda 4.6.8

pip install --user mysqlclient 

The above works like a charm for me. I actually got the error from sqlalchemy. Environment information:

Python : 3.6, Ubuntu : 16.04,conda 4.6.8


回答 13

感谢derevo,但我认为还有另一种好方法:

  1. 下载并安装ActivePython
  2. 打开命令提示符
  3. 类型 pypm install mysql-python
  4. 阅读特定于此软件包的注释。

我认为 pypm 比 easy_install 更强大、更可靠。

Thanks to derevo but I think there’s another good way for doing this:

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install mysql-python
  4. Read the notes specific to this package.

I think pypm is more powerful and reliable than easy_install.


回答 14

对于Python 3+版本

安装mysql-connector为:

pip3 install mysql-connector 

示例Python DB连接代码:

import mysql.connector
db_connection = mysql.connector.connect(
  host="localhost",
  user="root",
  passwd=""
)
print(db_connection)

输出:

> <mysql.connector.connection.MySQLConnection object at > 0x000002338A4C6B00>

这意味着数据库已正确连接。

For Python 3+ version

install mysql-connector as:

pip3 install mysql-connector 

Sample Python DB connection code:

import mysql.connector
db_connection = mysql.connector.connect(
  host="localhost",
  user="root",
  passwd=""
)
print(db_connection)

Output:

> <mysql.connector.connection.MySQLConnection object at > 0x000002338A4C6B00>

This means, database is correctly connected.


回答 15

我个人建议使用 pymysql,而不是原生的 MySQL 连接器;它为您提供独立于平台的接口,并且可以通过 pip 安装。

您可以这样编辑SQLAlchemy URL模式: mysql+pymysql://username:passwd@host/database

I personally recommend using pymysql instead of using the genuine MySQL connector, which provides you with a platform independent interface and could be installed through pip.

And you could edit the SQLAlchemy URL schema like this: mysql+pymysql://username:passwd@host/database
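
For example, a minimal SQLAlchemy sketch using that scheme (the credentials are placeholders):

from sqlalchemy import create_engine

# mysql+pymysql:// tells SQLAlchemy to use the PyMySQL driver
engine = create_engine("mysql+pymysql://username:passwd@host/database")
connection = engine.connect()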


回答 16

对于 Python 3.6 或更高版本,先执行 sudo apt-get install libmysqlclient-dev,再执行 pip3 install mysqlclient 即可解决问题

For Python 3.6+ sudo apt-get install libmysqlclient-dev and pip3 install mysqlclient does the trick


回答 17

如果您在Vista上运行,则可能要签出Bitnami Django堆栈。它是由Apache,Python,MySQL等组成的一站式堆栈,与Bitrock跨平台安装程序打包在一起,使上手非常容易。它可以在Windows,Mac和Linux上运行。哦,是完全免费的:)

If you are running on Vista, you may want to check out the Bitnami Django stack. It is an all-in-one stack of Apache, Python, MySQL, etc. packaged with Bitrock crossplatform installers to make it really easy to get started. It runs on Windows, Mac and Linux. Oh, and is completely free :)


回答 18

我已经尝试了上面的方法,但是仍然没有名为“ MySQLdb”的模块,最后,我成功了

easy_install mysql-python

我的环境是 ubuntu 14.04

I have tried methods above, but still no module named ‘MySQLdb’, finally, I succeed with

easy_install mysql-python

my env is ubuntu 14.04


回答 19

在OSX上,这些命令对我有用

brew install mysql-connector-c 
pip install MySQL-python

On OSX these commands worked for me

brew install mysql-connector-c 
pip install MySQL-python

回答 20

如果您使用的是SQLAlchemy,并且错误位于/site-packages/sqlalchemy/dialects/mysql/mysqldb.py

from ...connectors.mysqldb import (
                        MySQLDBExecutionContext,
                        MySQLDBCompiler,
                        MySQLDBIdentifierPreparer,
                        MySQLDBConnector
                    )

因此您可能缺少 SQLAlchemy 的 mysqldb 连接器,解决方法是在安装 mysql-python 模块之后重新安装 sqlalchemy。

If your are using SQLAlchemy and the error is in /site-packages/sqlalchemy/dialects/mysql/mysqldb.py:

from ...connectors.mysqldb import (
                        MySQLDBExecutionContext,
                        MySQLDBCompiler,
                        MySQLDBIdentifierPreparer,
                        MySQLDBConnector
                    )

so you may have missed mysqldb connector for SQLAlchemy and the solution is to re-install sqlalchemy after installing mysql-python module.


回答 21

Win10 / Python27对我有用:

easy_install mysql-python

所有其他 ‘pip install…’ 均失败,并出现依赖性错误

Win10 / Python27 this worked for me:

easy_install mysql-python

all other ‘pip install…’ failed with dependency errors


回答 22

在通过 docker 镜像全新安装的 Ubuntu 18.04 上,以上方法都对我无效。

以下为我解决了它:

apt-get install holland python3-mysqldb

None of the above worked for me on an Ubuntu 18.04 fresh install via docker image.

The following solved it for me:

apt-get install holland python3-mysqldb


回答 23

在运行Catalina v10.15.2的Mac上,出现以下MySQLdb版本冲突:

ImportError: this is MySQLdb version (1, 2, 5, 'final', 1), but _mysql is version (1, 4, 6, 'final', 0)

为了解决这个问题,我做了以下工作:

pip uninstall MySQL-python
pip install MySQL-python

On my mac running Catalina v10.15.2, I had the following MySQLdb version conflict:

ImportError: this is MySQLdb version (1, 2, 5, 'final', 1), but _mysql is version (1, 4, 6, 'final', 0)

To resolve it, I did the following:

pip uninstall MySQL-python
pip install MySQL-python

回答 24

在Debian Buster上,以下解决方案适用于python 3.7:

sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
pip install mysqlclient

On Debian Buster, the following solution worked for me with python 3.7:

sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
pip install mysqlclient

回答 25

我用的是 Ubuntu(Linux),对我有用的是

sudo apt-get install python3-dev default-libmysqlclient-dev build-essential

然后最后

pip install mysqlclient

I am at ubuntu (linux) and what worked for me was

sudo apt-get install python3-dev default-libmysqlclient-dev build-essential

and then finally

pip install mysqlclient

重命名字典键

问题:重命名字典键

有没有一种方法可以重命名字典键,而无需将其值重新分配给新名称并删除旧名称键;而不迭代字典键/值?

对于OrderedDict,在保持键的位置的同时执行相同的操作。

Is there a way to rename a dictionary key, without reassigning its value to a new name and removing the old name key; and without iterating through dict key/value?

In case of OrderedDict, do the same, while keeping that key’s position.


回答 0

对于常规的 dict,可以使用:

mydict[new_key] = mydict.pop(old_key)

对于 OrderedDict,我认为您必须用推导式构建一个全新的字典。

>>> OrderedDict(zip('123', 'abc'))
OrderedDict([('1', 'a'), ('2', 'b'), ('3', 'c')])
>>> oldkey, newkey = '2', 'potato'
>>> OrderedDict((newkey if k == oldkey else k, v) for k, v in _.viewitems())
OrderedDict([('1', 'a'), ('potato', 'b'), ('3', 'c')])

正如这个问题似乎要求的那样,修改键本身是不切实际的,因为 dict 的键通常是不可变对象,例如数字、字符串或元组。与其尝试修改键,不如将值重新赋给新键并删除旧键,这就是在 Python 中实现“重命名”的方式。

For a regular dict, you can use:

mydict[new_key] = mydict.pop(old_key)

For an OrderedDict, I think you must build an entirely new one using a comprehension.

>>> OrderedDict(zip('123', 'abc'))
OrderedDict([('1', 'a'), ('2', 'b'), ('3', 'c')])
>>> oldkey, newkey = '2', 'potato'
>>> OrderedDict((newkey if k == oldkey else k, v) for k, v in _.viewitems())
OrderedDict([('1', 'a'), ('potato', 'b'), ('3', 'c')])

Modifying the key itself, as this question seems to be asking, is impractical because dict keys are usually immutable objects such as numbers, strings or tuples. Instead of trying to modify the key, reassigning the value to a new key and removing the old key is how you can achieve the “rename” in python.


回答 1

一行内实现的最佳方法:

>>> d = {'test':[0,1,2]}
>>> d['test2'] = d.pop('test')
>>> d
{'test2': [0, 1, 2]}

best method in 1 line:

>>> d = {'test':[0,1,2]}
>>> d['test2'] = d.pop('test')
>>> d
{'test2': [0, 1, 2]}

回答 2

通过检查 newkey != oldkey,您可以这样做:

if newkey!=oldkey:  
    dictionary[newkey] = dictionary[oldkey]
    del dictionary[oldkey]

Using a check for newkey!=oldkey, this way you can do:

if newkey!=oldkey:  
    dictionary[newkey] = dictionary[oldkey]
    del dictionary[oldkey]

回答 3

如果重命名所有字典键:

target_dict = {'k1':'v1', 'k2':'v2', 'k3':'v3'}
new_keys = ['k4','k5','k6']

for key,n_key in zip(target_dict.keys(), new_keys):
    target_dict[n_key] = target_dict.pop(key)

In case of renaming all dictionary keys:

target_dict = {'k1':'v1', 'k2':'v2', 'k3':'v3'}
new_keys = ['k4','k5','k6']

for key,n_key in zip(target_dict.keys(), new_keys):
    target_dict[n_key] = target_dict.pop(key)

回答 4

您可以使用 Raymond Hettinger 编写的这个 OrderedDict recipe,并对其进行修改以添加一个 rename 方法,但其复杂度将是 O(N):

def rename(self,key,new_key):
    ind = self._keys.index(key)  #get the index of old key, O(N) operation
    self._keys[ind] = new_key    #replace old key with new key in self._keys
    self[new_key] = self[key]    #add the new key, this is added at the end of self._keys
    self._keys.pop(-1)           #pop the last item in self._keys

例:

dic = OrderedDict((("a",1),("b",2),("c",3)))
print dic
dic.rename("a","foo")
dic.rename("b","bar")
dic["d"] = 5
dic.rename("d","spam")
for k,v in  dic.items():
    print k,v

输出:

OrderedDict({'a': 1, 'b': 2, 'c': 3})
foo 1
bar 2
c 3
spam 5

You can use this OrderedDict recipe written by Raymond Hettinger and modify it to add a rename method, but this is going to be a O(N) in complexity:

def rename(self,key,new_key):
    ind = self._keys.index(key)  #get the index of old key, O(N) operation
    self._keys[ind] = new_key    #replace old key with new key in self._keys
    self[new_key] = self[key]    #add the new key, this is added at the end of self._keys
    self._keys.pop(-1)           #pop the last item in self._keys

Example:

dic = OrderedDict((("a",1),("b",2),("c",3)))
print dic
dic.rename("a","foo")
dic.rename("b","bar")
dic["d"] = 5
dic.rename("d","spam")
for k,v in  dic.items():
    print k,v

output:

OrderedDict({'a': 1, 'b': 2, 'c': 3})
foo 1
bar 2
c 3
spam 5

回答 5

在我之前已有几个人提到了用 .pop 在一行内删除旧键并创建新键的技巧。

我个人认为更明确的实现更具可读性:

d = {'a': 1, 'b': 2}
v = d['b']
del d['b']
d['c'] = v

上面的代码返回 {'a': 1, 'c': 2}

A few people before me mentioned the .pop trick to delete and create a key in a one-liner.

I personally find the more explicit implementation more readable:

d = {'a': 1, 'b': 2}
v = d['b']
del d['b']
d['c'] = v

The code above returns {'a': 1, 'c': 2}


回答 6

其他答案都很好。但是在 Python 3.6 中,常规字典也是有序的,因此在一般情况下很难保持键的位置不变。

def rename(old_dict,old_name,new_name):
    new_dict = {}
    for key,value in zip(old_dict.keys(),old_dict.values()):
        new_key = key if key != old_name else new_name
        new_dict[new_key] = old_dict[key]
    return new_dict

Other answers are pretty good. But in python3.6, regular dict also has order. So it’s hard to keep key’s position in normal case.

def rename(old_dict,old_name,new_name):
    new_dict = {}
    for key,value in zip(old_dict.keys(),old_dict.values()):
        new_key = key if key != old_name else new_name
        new_dict[new_key] = old_dict[key]
    return new_dict

回答 7

在 Python 3.6(及之后的版本?)中,我会采用下面这种单行写法

test = {'a': 1, 'old': 2, 'c': 3}
old_k = 'old'
new_k = 'new'
new_v = 4  # optional

print(dict((new_k, new_v) if k == old_k else (k, v) for k, v in test.items()))

产生

{'a': 1, 'new': 4, 'c': 3}

可能值得注意的是,如果没有 print 语句,ipython console / jupyter notebook 会按它们自己选择的顺序显示字典……

In Python 3.6 (onwards?) I would go for the following one-liner

test = {'a': 1, 'old': 2, 'c': 3}
old_k = 'old'
new_k = 'new'
new_v = 4  # optional

print(dict((new_k, new_v) if k == old_k else (k, v) for k, v in test.items()))

which produces

{'a': 1, 'new': 4, 'c': 3}

May be worth noting that without the print statement the ipython console/jupyter notebook present the dictionary in an order of their choosing…


回答 8

我写了下面这个函数,它不会修改原始字典。该函数也支持由字典组成的列表。

import functools
from typing import Union, Dict, List


def rename_dict_keys(
    data: Union[Dict, List[Dict]], old_key: str, new_key: str
):
    """
    This function renames dictionary keys

    :param data:
    :param old_key:
    :param new_key:
    :return: Union[Dict, List[Dict]]
    """
    if isinstance(data, dict):
        res = {k: v for k, v in data.items() if k != old_key}
        try:
            res[new_key] = data[old_key]
        except KeyError:
            raise KeyError(
                "cannot rename key as old key '%s' is not present in data"
                % old_key
            )
        return res
    elif isinstance(data, list):
        return list(
            map(
                functools.partial(
                    rename_dict_keys, old_key=old_key, new_key=new_key
                ),
                data,
            )
        )
    raise ValueError("expected type List[Dict] or Dict got '%s' for data" % type(data))

I came up with this function which does not mutate the original dictionary. This function also supports list of dictionaries too.

import functools
from typing import Union, Dict, List


def rename_dict_keys(
    data: Union[Dict, List[Dict]], old_key: str, new_key: str
):
    """
    This function renames dictionary keys

    :param data:
    :param old_key:
    :param new_key:
    :return: Union[Dict, List[Dict]]
    """
    if isinstance(data, dict):
        res = {k: v for k, v in data.items() if k != old_key}
        try:
            res[new_key] = data[old_key]
        except KeyError:
            raise KeyError(
                "cannot rename key as old key '%s' is not present in data"
                % old_key
            )
        return res
    elif isinstance(data, list):
        return list(
            map(
                functools.partial(
                    rename_dict_keys, old_key=old_key, new_key=new_key
                ),
                data,
            )
        )
    raise ValueError("expected type List[Dict] or Dict got '%s' for data" % type(data))
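
一个简单的调用示意(沿用上面的 rename_dict_keys,示例数据为假设):

data = {"old": 1, "other": 2}
print(rename_dict_keys(data, "old", "new"))
# {'other': 2, 'new': 1}

records = [{"old": 1}, {"old": 2}]
print(rename_dict_keys(records, "old", "new"))
# [{'new': 1}, {'new': 2}]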

回答 9

重命名键时,我采用了上面 @wim 的答案,使用 dict.pop(),但发现了一个坑。在遍历 dict 修改键时,如果不把旧键列表与 dict 实例完全分开,就会把新改出来的键又带进循环,并漏掉一些已有的键。

首先,我这样做:

for current_key in my_dict:
    new_key = current_key.replace(':','_')
    fixed_metadata[new_key] = fixed_metadata.pop(current_key)

我发现这样遍历时,字典会不断找到一些不该找到的键,例如新键,也就是我已经改过的键!我需要把两个实例彼此完全分开,以便 (a) 避免在 for 循环中找到自己刚改的键,以及 (b) 找到一些不知为何在循环中找不到的键。

我现在正在这样做:

current_keys = list(my_dict.keys())
for current_key in current_keys:
    and so on...

把 my_dict.keys() 转换为列表是必要的,这样才能摆脱对正在变化的 dict 的引用。只使用 my_dict.keys() 会让我仍然绑在原始实例上,并带来奇怪的副作用。

I am using @wim ‘s answer above, with dict.pop() when renaming keys, but I found a gotcha. Cycling through the dict to change the keys, without separating the list of old keys completely from the dict instance, resulted in cycling new, changed keys into the loop, and missing some existing keys.

To start with, I did it this way:

for current_key in my_dict:
    new_key = current_key.replace(':','_')
    fixed_metadata[new_key] = fixed_metadata.pop(current_key)

I found that cycling through the dict in this way, the dictionary kept finding keys even when it shouldn’t, i.e., the new keys, the ones I had changed! I needed to separate the instances completely from each other to (a) avoid finding my own changed keys in the for loop, and (b) find some keys that were not being found within the loop for some reason.

I am doing this now:

current_keys = list(my_dict.keys())
for current_key in current_keys:
    and so on...

Converting the my_dict.keys() to a list was necessary to get free of the reference to the changing dict. Just using my_dict.keys() kept me tied to the original instance, with the strange side effects.
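
把这个经验整理成一个完整的小例子(仅作示意,my_dict 及键名均为假设):

my_dict = {'a:x': 1, 'b:y': 2}

# 先把键快照成列表再遍历,避免在迭代期间修改 dict 带来的副作用
for current_key in list(my_dict.keys()):
    new_key = current_key.replace(':', '_')
    my_dict[new_key] = my_dict.pop(current_key)

print(my_dict)  # {'a_x': 1, 'b_y': 2}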


回答 10

如果有人想一次重命名所有键,并提供一个包含新名称的列表:

def rename_keys(dict_, new_keys):
    """
     new_keys: type List(), must match length of dict_
    """

    # dict_ = {oldK: value}
    # d1={oldK:newK,} maps old keys to the new ones:  
    d1 = dict( zip( list(dict_.keys()), new_keys) )

    # d1[oldK] == new_key
    return {d1[oldK]: value for oldK, value in dict_.items()}

In case someone wants to rename all the keys at once providing a list with the new names:

def rename_keys(dict_, new_keys):
    """
     new_keys: type List(), must match length of dict_
    """

    # dict_ = {oldK: value}
    # d1={oldK:newK,} maps old keys to the new ones:  
    d1 = dict( zip( list(dict_.keys()), new_keys) )

    # d1[oldK] == new_key
    return {d1[oldK]: value for oldK, value in dict_.items()}

回答 11

@helloswift123 我喜欢你的函数。下面是一个修改版,可以在一次调用中重命名多个键:

def rename(d, keymap):
    """
    :param d: old dict
    :type d: dict
    :param keymap: [{:keys from-keys :values to-keys} keymap]
    :returns: new dict
    :rtype: dict
    """
    new_dict = {}
    for key, value in zip(d.keys(), d.values()):
        new_key = keymap.get(key, key)
        new_dict[new_key] = d[key]
    return new_dict

@helloswift123 I like your function. Here is a modification to rename multiple keys in a single call:

def rename(d, keymap):
    """
    :param d: old dict
    :type d: dict
    :param keymap: [{:keys from-keys :values to-keys} keymap]
    :returns: new dict
    :rtype: dict
    """
    new_dict = {}
    for key, value in zip(d.keys(), d.values()):
        new_key = keymap.get(key, key)
        new_dict[new_key] = d[key]
    return new_dict
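
一个简单的调用示意(沿用上面的 rename 函数,键名映射为假设):

d = {'a': 1, 'b': 2, 'c': 3}
print(rename(d, {'a': 'x', 'c': 'z'}))
# {'x': 1, 'b': 2, 'z': 3}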

回答 12

假设您要将键k3重命名为k4:

temp_dict = {'k1':'v1', 'k2':'v2', 'k3':'v3'}
temp_dict['k4']= temp_dict.pop('k3')

Suppose you want to rename key k3 to k4:

temp_dict = {'k1':'v1', 'k2':'v2', 'k3':'v3'}
temp_dict['k4']= temp_dict.pop('k3')

如何查看pytest运行期间创建的正常打印输出?

问题:如何查看pytest运行期间创建的正常打印输出?

有时候,我只想在代码中插入一些 print 语句,然后看看运行代码时会打印出什么。我通常“运行”它的方式是执行现有的 pytest 测试。但是当我运行这些测试时,我似乎看不到任何标准输出(至少在我的 IDE PyCharm 里看不到)。

有没有一种简单的方法可以在pytest运行期间查看标准输出?

Sometimes I want to just insert some print statements in my code, and see what gets printed out when I exercise it. My usual way to “exercise” it is with existing pytest tests. But when I run these, I don’t seem able to see any standard output (at least from within PyCharm, my IDE).

Is there a simple way to see standard output during a pytest run?


回答 0

-s 开关会禁用对每个测试的输出捕获。

The -s switch disables per-test capturing.
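
一个最小示意(test_example.py 这个文件名与测试内容均为假设):

# test_example.py
def test_demo():
    print("加 -s 运行时,这行输出会直接显示在控制台")
    assert 1 + 1 == 2

然后这样运行:

pytest -s test_example.py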


回答 1

在已接受答案下方一条获得好评的评论中,Joe 问道:

有什么方法可以打印到控制台捕获输出,以便将其显示在junit报告中?

在 UNIX 中,这通常被称为 teeing(把输出分流一份)。理想情况下,py.test 默认就应该分流而不是捕获。不理想的是,无论 py.test 还是任何现有的第三方 py.test 插件(至少据我所知)都不支持分流,尽管 Python 本身开箱即用就能轻松做到。

用Monkey修补py.test来做不受支持的事情并非易事。为什么?因为:

  • 大多数 py.test 功能都被锁在一个并不打算被外部导入的私有 _pytest 包后面。在不清楚自己在做什么的情况下尝试导入它,通常会导致公共 pytest 包在运行时抛出晦涩的异常。多谢了,py.test,你这架构可真“稳健”。
  • 即使您真的想出了如何安全地给私有 _pytest API 打 Monkey 补丁,也必须在外部 py.test 命令运行公共 pytest 包之前完成。您无法在插件(例如测试套件中顶层的 conftest 模块)里做到这一点。等到 py.test 慢悠悠地动态导入您的插件时,您想打 Monkey 补丁的任何 py.test 类早就实例化完了,而您拿不到那个实例。这意味着,如果想让 Monkey 补丁真正生效,就不能再直接运行外部 py.test 命令,而必须用一个自定义的 setuptools test 命令来包装该命令的运行,该命令(按顺序):
    1. 给私有 _pytest API 打 Monkey 补丁。
    2. 调用公共的 pytest.main() 函数来运行 py.test 命令。

这个答案给 py.test 的 -s 和 --capture=no 选项打了 Monkey 补丁,使其捕获 stderr 而不捕获 stdout。默认情况下,这些选项既不捕获 stderr 也不捕获 stdout。当然,这还算不上真正的分流(teeing)。不过,每一段伟大的旅程都始于一段五年后就被所有人遗忘的乏味前传。

为什么要这么做?我现在就告诉你。我那套由 py.test 驱动的测试套件包含一些很慢的功能测试。显示这些测试的 stdout 是有益且令人安心的,能避免 leycec 在又一个长时间运行的功能测试几个星期毫无动静地失败时去执行 killall -9 py.test。然而,显示这些测试的 stderr 会让 py.test 无法在测试失败时报告异常回溯,这完全没有帮助。因此,我们强制 py.test 捕获 stderr 而不捕获 stdout。

在开始之前,此答案假定您已经有一个调用 py.test 的自定义 setuptools test 命令。如果没有,请参阅 py.test 写得很好的“良好实践”页面中的“手动集成”小节。

不要安装 pytest-runner,这是一个第三方 setuptools 插件,它提供的自定义 setuptools test 命令同样会调用 py.test。如果已经安装了 pytest-runner,您可能需要先用 pip3 卸载该软件包,然后采用上面链接的手动方法。

假设您按照上面强调的“手动集成”中的说明进行了操作,您的代码库现在应该包含一个 PyTest.run_tests() 方法。将该方法修改成类似这样:

class PyTest(TestCommand):
             .
             .
             .
    def run_tests(self):
        # Import the public "pytest" package *BEFORE* the private "_pytest"
        # package. While importation order is typically ignorable, imports can
        # technically have side effects. Tragicomically, that is the case here.
        # Importing the public "pytest" package establishes runtime
        # configuration required by submodules of the private "_pytest" package.
        # The former *MUST* always be imported before the latter. Failing to do
        # so raises obtuse exceptions at runtime... which is bad.
        import pytest
        from _pytest.capture import CaptureManager, FDCapture, MultiCapture

        # If the private method to be monkey-patched no longer exists, py.test
        # is either broken or unsupported. In either case, raise an exception.
        if not hasattr(CaptureManager, '_getcapture'):
            from distutils.errors import DistutilsClassError
            raise DistutilsClassError(
                'Class "pytest.capture.CaptureManager" method _getcapture() '
                'not found. The current version of py.test is either '
                'broken (unlikely) or unsupported (likely).'
            )

        # Old method to be monkey-patched.
        _getcapture_old = CaptureManager._getcapture

        # New method applying this monkey-patch. Note the use of:
        #
        # * "out=False", *NOT* capturing stdout.
        # * "err=True", capturing stderr.
        def _getcapture_new(self, method):
            if method == "no":
                return MultiCapture(
                    out=False, err=True, in_=False, Capture=FDCapture)
            else:
                return _getcapture_old(self, method)

        # Replace the old with the new method.
        CaptureManager._getcapture = _getcapture_new

        # Run py.test with all passed arguments.
        errno = pytest.main(self.pytest_args)
        sys.exit(errno)

要启用此Monkey补丁,请运行py.test,如下所示:

python setup.py test -a "-s"

现在会捕获 stderr 而不会捕获 stdout。很妙吧!

把上面的 Monkey 补丁扩展成同时分流(tee)stdout 和 stderr,就留给有大把空闲时间的读者作为练习吧。

In an upvoted comment to the accepted answer, Joe asks:

Is there any way to print to the console AND capture the output so that it shows in the junit report?

In UNIX, this is commonly referred to as teeing. Ideally, teeing rather than capturing would be the py.test default. Non-ideally, neither py.test nor any existing third-party py.test plugin (…that I know of, anyway) supports teeing – despite Python trivially supporting teeing out-of-the-box.

Monkey-patching py.test to do anything unsupported is non-trivial. Why? Because:

  • Most py.test functionality is locked behind a private _pytest package not intended to be externally imported. Attempting to do so without knowing what you’re doing typically results in the public pytest package raising obscure exceptions at runtime. Thanks alot, py.test. Really robust architecture you got there.
  • Even when you do figure out how to monkey-patch the private _pytest API in a safe manner, you have to do so before running the public pytest package run by the external py.test command. You cannot do this in a plugin (e.g., a top-level conftest module in your test suite). By the time py.test lazily gets around to dynamically importing your plugin, any py.test class you wanted to monkey-patch has long since been instantiated – and you do not have access to that instance. This implies that, if you want your monkey-patch to be meaningfully applied, you can no longer safely run the external py.test command. Instead, you have to wrap the running of that command with a custom setuptools test command that (in order):
    1. Monkey-patches the private _pytest API.
    2. Calls the public pytest.main() function to run the py.test command.

This answer monkey-patches py.test’s -s and --capture=no options to capture stderr but not stdout. By default, these options capture neither stderr nor stdout. This isn’t quite teeing, of course. But every great journey begins with a tedious prequel everyone forgets in five years.

Why do this? I shall now tell you. My py.test-driven test suite contains slow functional tests. Displaying the stdout of these tests is helpful and reassuring, preventing leycec from reaching for killall -9 py.test when yet another long-running functional test fails to do anything for weeks on end. Displaying the stderr of these tests, however, prevents py.test from reporting exception tracebacks on test failures. Which is completely unhelpful. Hence, we coerce py.test to capture stderr but not stdout.

Before we get to it, this answer assumes you already have a custom setuptools test command invoking py.test. If you don’t, see the Manual Integration subsection of py.test’s well-written Good Practices page.

Do not install pytest-runner, a third-party setuptools plugin providing a custom setuptools test command also invoking py.test. If pytest-runner is already installed, you’ll probably need to uninstall that pip3 package and then adopt the manual approach linked to above.

Assuming you followed the instructions in Manual Integration highlighted above, your codebase should now contain a PyTest.run_tests() method. Modify this method to resemble:

class PyTest(TestCommand):
             .
             .
             .
    def run_tests(self):
        # Import the public "pytest" package *BEFORE* the private "_pytest"
        # package. While importation order is typically ignorable, imports can
        # technically have side effects. Tragicomically, that is the case here.
        # Importing the public "pytest" package establishes runtime
        # configuration required by submodules of the private "_pytest" package.
        # The former *MUST* always be imported before the latter. Failing to do
        # so raises obtuse exceptions at runtime... which is bad.
        import pytest
        from _pytest.capture import CaptureManager, FDCapture, MultiCapture

        # If the private method to be monkey-patched no longer exists, py.test
        # is either broken or unsupported. In either case, raise an exception.
        if not hasattr(CaptureManager, '_getcapture'):
            from distutils.errors import DistutilsClassError
            raise DistutilsClassError(
                'Class "pytest.capture.CaptureManager" method _getcapture() '
                'not found. The current version of py.test is either '
                'broken (unlikely) or unsupported (likely).'
            )

        # Old method to be monkey-patched.
        _getcapture_old = CaptureManager._getcapture

        # New method applying this monkey-patch. Note the use of:
        #
        # * "out=False", *NOT* capturing stdout.
        # * "err=True", capturing stderr.
        def _getcapture_new(self, method):
            if method == "no":
                return MultiCapture(
                    out=False, err=True, in_=False, Capture=FDCapture)
            else:
                return _getcapture_old(self, method)

        # Replace the old with the new method.
        CaptureManager._getcapture = _getcapture_new

        # Run py.test with all passed arguments.
        errno = pytest.main(self.pytest_args)
        sys.exit(errno)

To enable this monkey-patch, run py.test as follows:

python setup.py test -a "-s"

Stderr but not stdout will now be captured. Nifty!

Extending the above monkey-patch to tee stdout and stderr is left as an exercise to the reader with a barrel-full of free time.


回答 2

运行测试时使用 -s 选项。这样运行测试时,exampletest.py 中的所有 print 语句都会打印到控制台。

py.test exampletest.py -s

When running the test use the -s option. All print statements in exampletest.py would get printed on the console when test is run.

py.test exampletest.py -s

回答 3

根据 pytest 文档,pytest 3 版本可以在测试中临时禁用捕获:

def test_disabling_capturing(capsys):
    print('this output is captured')
    with capsys.disabled():
        print('output not captured, going directly to sys.stdout')
    print('this output is also captured')

According to pytest documentation, version 3 of pytest can temporary disable capture in a test:

def test_disabling_capturing(capsys):
    print('this output is captured')
    with capsys.disabled():
        print('output not captured, going directly to sys.stdout')
    print('this output is also captured')

回答 4

pytest 会捕获各个测试的 stdout,只在特定条件下才连同它默认打印的测试摘要一起显示这些输出。

额外的摘要信息可以使用’-r’选项显示:

pytest -rP

显示已通过测试的捕获输出。

pytest -rx

显示捕获的失败测试输出(默认行为)。

-r的输出格式比-s的输出格式更漂亮。

pytest captures the stdout from individual tests and displays them only on certain conditions, along with the summary of the tests it prints by default.

Extra summary info can be shown using the ‘-r’ option:

pytest -rP

shows the captured output of passed tests.

pytest -rx

shows the captured output of failed tests (default behaviour).

The formatting of the output is prettier with -r than with -s.


回答 5


尝试 pytest -s -v test_login.py,以便在控制台中获得更多信息。

-v 是 --verbose 的缩写

-s 表示“禁用所有捕获”




Try pytest -s -v test_login.py for more info in console.

-v it’s a short --verbose

-s means ‘disable all capturing’




回答 6

如果您使用的是PyCharm IDE,则可以使用“运行”工具栏运行该单个测试或所有测试。“运行”工具窗口显示由应用程序生成的输出,您可以在其中看到所有打印语句,作为测试输出的一部分。

If you are using PyCharm IDE, then you can run that individual test or all tests using Run toolbar. The Run tool window displays output generated by your application and you can see all the print statements in there as part of test output.


回答 7

pytest --capture=tee-sys是最近添加的。您可以捕获并查看stdout / err上的输出。

pytest --capture=tee-sys was recently added. You can capture as well as see the output on stdout/err.
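
举个例子(test_example.py 为假设的文件名;据我所知该选项在 pytest 5.4 及以上版本可用),下面的命令既会把输出实时显示在终端,也会照常捕获供失败报告使用:

pytest --capture=tee-sys test_example.py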


回答 8

其他答案不起作用。查看捕获的输出的唯一方法是使用以下标志:

pytest --show-capture all

The other answers don’t work. The only way to see the captured output is using the following flag:

pytest --show-capture all


查找Python解释器的完整路径?

问题:查找Python解释器的完整路径?

如何从当前执行的Python脚本中找到当前运行的Python解释器的完整路径?

How do I find the full path of the currently running Python interpreter from within the currently executing Python script?


回答 0

sys.executable 包含当前运行的Python解释器的完整路径。

import sys

print(sys.executable)

现在记录在这里

sys.executable contains full path of the currently running Python interpreter.

import sys

print(sys.executable)

which is now documented here
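
一个常见的用法示意(纯属假设的例子):用 sys.executable 确保子进程使用与当前脚本相同的解释器,这在虚拟环境中尤其有用:

import subprocess
import sys

# 用当前解释器启动子进程,而不是依赖 PATH 里恰好叫 "python" 的那个
subprocess.run([sys.executable, "-c", "import sys; print(sys.executable)"], check=True)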


回答 1

顺带记录一种实用性存疑的另类方法,即使用 os.environ:

import os
python_executable_path = os.environ['_']

例如

$ python -c "import os; print(os.environ['_'])"
/usr/bin/python

Just noting a different way of questionable usefulness, using os.environ:

import os
python_executable_path = os.environ['_']

e.g.

$ python -c "import os; print(os.environ['_'])"
/usr/bin/python

回答 2

在 Linux 中还有几种其他方法可以找出当前使用的 python:1) which python 命令;2) command -v python 命令;3) type python 命令。

同样,在Windows上使用Cygwin也会得到相同的结果。

kuvivek@HOSTNAME ~
$ which python
/usr/bin/python

kuvivek@HOSTNAME ~
$ whereis python
python: /usr/bin/python /usr/bin/python3.4 /usr/lib/python2.7 /usr/lib/python3.4        /usr/include/python2.7 /usr/include/python3.4m /usr/share/man/man1/python.1.gz

kuvivek@HOSTNAME ~
$ which python3
/usr/bin/python3

kuvivek@HOSTNAME ~
$ command -v python
/usr/bin/python

kuvivek@HOSTNAME ~
$ type python
python is hashed (/usr/bin/python)

如果您已经在 python shell 中,可以试试下面任意一种。注意:这只是另一种方式,并不是最符合 Python 风格的做法。

>>>
>>> import os
>>> os.popen('which python').read()
'/usr/bin/python\n'
>>>
>>> os.popen('type python').read()
'python is /usr/bin/python\n'
>>>
>>> os.popen('command -v python').read()
'/usr/bin/python\n'
>>>
>>>

There are a few alternate ways to figure out the currently used python in Linux is: 1) which python command. 2) command -v python command 3) type python command

Similarly On Windows with Cygwin will also result the same.

kuvivek@HOSTNAME ~
$ which python
/usr/bin/python

kuvivek@HOSTNAME ~
$ whereis python
python: /usr/bin/python /usr/bin/python3.4 /usr/lib/python2.7 /usr/lib/python3.4        /usr/include/python2.7 /usr/include/python3.4m /usr/share/man/man1/python.1.gz

kuvivek@HOSTNAME ~
$ which python3
/usr/bin/python3

kuvivek@HOSTNAME ~
$ command -v python
/usr/bin/python

kuvivek@HOSTNAME ~
$ type python
python is hashed (/usr/bin/python)

If you are already in the python shell. Try anyone of these. Note: This is an alternate way. Not the best pythonic way.

>>>
>>> import os
>>> os.popen('which python').read()
'/usr/bin/python\n'
>>>
>>> os.popen('type python').read()
'python is /usr/bin/python\n'
>>>
>>> os.popen('command -v python').read()
'/usr/bin/python\n'
>>>
>>>
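
另外补充一个不依赖 shell 的写法(仅作示意):Python 3.3+ 标准库中的 shutil.which() 也可以在 PATH 中查找可执行文件:

import shutil

print(shutil.which('python'))   # 例如 /usr/bin/python,找不到时返回 None
print(shutil.which('python3'))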

使用请求在python中下载大文件

问题:使用请求在python中下载大文件

Requests 是一个非常不错的库。我想用它来下载大文件(>1GB)。问题是不可能把整个文件都放在内存中,我需要分块读取。下面的代码就存在这个问题

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

出于某种原因,它并没有按这种方式工作:响应仍然先被加载到内存中,然后才保存到文件。

更新

如果您需要一个能从 FTP 下载大文件的小型客户端(Python 2.x/3.x),可以在此处找到它。它支持多线程和断线重连(它会监控连接),并且会为下载任务调整 socket 参数。

Requests is a really nice library. I’d like to use it for download big files (>1GB). The problem is it’s not possible to keep whole file in memory I need to read it in chunks. And this is a problem with the following code

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

By some reason it doesn’t work this way. It still loads response into memory before save it to a file.

UPDATE

If you need a small client (Python 2.x /3.x) which can download big files from FTP, you can find it here. It supports multithreading & reconnects (it does monitor connections) also it tunes socket params for the download task.


回答 0

使用下面的流式代码,无论下载的文件有多大,Python 的内存占用都是受限的:

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk: 
                f.write(chunk)
    return local_filename

请注意,iter_content 返回的字节数并不完全等于 chunk_size;它往往是一个大得多的随机数,并且预计每次迭代都会不同。

进一步参考请见 https://requests.readthedocs.io/en/latest/user/advanced/#body-content-workflow 和 https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content。

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk: 
                f.write(chunk)
    return local_filename

Note that the number of bytes returned using iter_content is not exactly the chunk_size; it’s expected to be a random number that is often far bigger, and is expected to be different in every iteration.

See https://requests.readthedocs.io/en/latest/user/advanced/#body-content-workflow and https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content for further reference.
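
在上面函数的基础上,再给出一个带进度显示的示意版本(仅为草稿,假设服务器返回了 Content-Length 头;没有该头时无法计算百分比):

import requests

def download_file_with_progress(url, local_filename):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get('Content-Length', 0))
        done = 0
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                done += len(chunk)
                if total:
                    print('\r已下载 %.1f%%' % (done * 100.0 / total), end='')
        print()
    return local_filename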


回答 1

如果使用 Response.raw 和 shutil.copyfileobj(),会简单得多:

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

这样就无需占用过多内存就可以将文件流式传输到磁盘,并且代码很简单。

It’s much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
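
一个需要留意的小细节(以下只是我所了解的常见处理方式,仅供参考):r.raw 默认不会解码 gzip/deflate 等 Content-Encoding,如果服务器压缩了响应,写入磁盘的将是压缩后的字节。可以在复制之前打开解码:

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        r.raw.decode_content = True  # 让底层 urllib3 在读取时解码 Content-Encoding
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename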


回答 2

这并不完全是 OP 问的,但是……用 urllib 做这件事简单得离谱:

from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)

或者,如果您想把它保存到临时文件,可以这样:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)

我监视了这个进程:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

我看到文件在不断增长,但内存使用量一直保持在 17 MB。我是不是漏掉了什么?

Not exactly what OP was asking, but… it’s ridiculously easy to do that with urllib:

from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?


回答 3

您的块大小可能太大了,您是否尝试过把它调小,比如一次 1024 个字节?(另外,您可以用 with 来让语法更整洁)

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return 

顺便说一句,您如何推断响应已加载到内存中?

这听起来像是 Python 没有把数据刷新到文件。参考其他 SO 问题,您可以尝试 f.flush() 和 os.fsync() 来强制写入文件并释放内存;

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())

Your chunk size could be too large, have you tried dropping that – maybe 1024 bytes at a time? (also, you could use with to tidy up the syntax)

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return 

Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if python isn’t flushing the data to file, from other SO questions you could try f.flush() and os.fsync() to force the file write and free memory;

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())