Tag archive: coding-style

pythonic way to do something N times without an index variable?

Question: pythonic way to do something N times without an index variable?

Every day I love python more and more.

Today, I was writing some code like:

for i in xrange(N):
    do_something()

I had to do something N times. But each time didn’t depend on the value of i (index variable). I realized that I was creating a variable I never used (i), and I thought “There surely is a more pythonic way of doing this without the need for that useless index variable.”

So… the question is: do you know how to do this simple task in a more (pythonic) beautiful way?


Answer 0

A slightly faster approach than looping on xrange(N) is:

import itertools

for _ in itertools.repeat(None, N):
    do_something()

Answer 1

Use the _ variable, as I learned when I asked this question, for example:

# A long way to do integer exponentiation
num = 2
power = 3
product = 1
for _ in xrange(power):
    product *= num
print product

Answer 2

I just use for _ in range(n); it's straight to the point. In Python 2 it generates the entire list, which matters for huge numbers, but if you're using Python 3 it's not a problem.
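As a rough illustration of the Python 3 point, assuming CPython 3: range is a lazy sequence object, so its memory footprint does not grow with n. The helper name run_n_times is made up for this sketch.

```python
import sys


def run_n_times(func, n):
    """Call func() n times; the loop variable is deliberately ignored."""
    for _ in range(n):
        func()


calls = []
run_n_times(lambda: calls.append("tick"), 3)
print(len(calls))

# In Python 3, range(n) is a small lazy object, not a list of n integers,
# so both sizes printed here are the same handful of bytes.
print(sys.getsizeof(range(10)), sys.getsizeof(range(10**9)))
```

In Python 2 the equivalent lazy object was xrange, which is why the question uses it.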


Answer 3

Since functions are first-class citizens, you can write a small wrapper (based on Alex's answer):

import itertools

def repeat(f, N):
    for _ in itertools.repeat(None, N):
        f()

Then you can pass the function as an argument.


Answer 4

The _ is the same thing as x: an ordinary identifier. However, it's a Python idiom that's used to indicate an identifier that you don't intend to use. In Python, these identifiers don't take memory or allocate space like variables do in other languages. It's easy to forget that. They're just names that point to objects, in this case an integer on each iteration.
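A small sketch of that point: _ is bound like any other name, so after the loop it still refers to the last object it was pointed at. The function name is hypothetical and exists only for this illustration.

```python
def last_value_of_underscore(n):
    """Return whatever object the name _ refers to after looping n times."""
    for _ in range(n):
        pass
    # _ is an ordinary local name; it is still bound after the loop ends.
    # (With n == 0 the loop body never runs and _ would be unbound.)
    return _


print(last_value_of_underscore(3))
```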


Answer 5

I found the various answers really elegant (especially Alex Martelli’s) but I wanted to quantify performance first hand, so I cooked up the following script:

from itertools import repeat
N = 10000000

def payload(a):
    pass

def standard(N):
    for x in range(N):
        payload(None)

def underscore(N):
    for _ in range(N):
        payload(None)

def loopiter(N):
    for _ in repeat(None, N):
        payload(None)

def loopiter2(N):
    for _ in map(payload, repeat(None, N)):
        pass

if __name__ == '__main__':
    import timeit
    print("standard: ",timeit.timeit("standard({})".format(N),
        setup="from __main__ import standard", number=1))
    print("underscore: ",timeit.timeit("underscore({})".format(N),
        setup="from __main__ import underscore", number=1))
    print("loopiter: ",timeit.timeit("loopiter({})".format(N),
        setup="from __main__ import loopiter", number=1))
    print("loopiter2: ",timeit.timeit("loopiter2({})".format(N),
        setup="from __main__ import loopiter2", number=1))

I also came up with an alternative solution that builds on Martelli’s one and uses map() to call the payload function. OK, I cheated a bit in that I took the freedom of making the payload accept a parameter that gets discarded: I don’t know if there is a way around this. Nevertheless, here are the results:

standard:  0.8398549720004667
underscore:  0.8413165839992871
loopiter:  0.7110594899968419
loopiter2:  0.5891903560004721

So using map yields an improvement of approximately 30% over the standard for loop and about 17% over Martelli's (0.589 s versus 0.711 s).
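One possible way around the "cheat" of giving the payload a dummy parameter, offered as a sketch rather than a measured result: itertools.starmap unpacks each item as the argument tuple, so repeating an empty tuple calls the function with no arguments. The name call_n_times is an assumption of this example, and its speed relative to the variants above has not been benchmarked here.

```python
import collections
import itertools


def call_n_times(func, n):
    """Call func() n times without a Python-level loop variable."""
    # starmap(func, [(), (), ...]) evaluates func(*()) == func() for each item.
    calls = itertools.starmap(func, itertools.repeat((), n))
    # A zero-length deque exhausts the iterator and discards every result.
    collections.deque(calls, maxlen=0)


counter = []
call_n_times(lambda: counter.append(None), 5)
print(len(counter))
```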


Answer 6

Assume that you’ve defined do_something as a function, and you’d like to perform it N times. Maybe you can try the following:

todos = [do_something] * N  
for doit in todos:  
    doit()

Answer 7

What about a simple while loop?

while times > 0:
    do_something()
    times -= 1

You already have the variable; why not use it?


If function A is required only by function B, should A be defined inside B? [closed]

Question: If function A is required only by function B, should A be defined inside B? [closed]

Let’s say that a function A is required only by function B, should A be defined inside B?

Simple example. Two methods, one called from another:

def method_a(arg):
    some_data = method_b(arg)

def method_b(arg):
    return some_data

In Python we can declare a def inside another def. So, if method_b is required for and called only from method_a, should I declare method_b inside method_a? Like this:

def method_a(arg):
    
    def method_b(arg):
        return some_data

    some_data = method_b(arg)

Or should I avoid doing this?


Answer 0

>>> def sum(x, y):
...     def do_it():
...             return x + y
...     return do_it
... 
>>> a = sum(1, 3)
>>> a
<function do_it at 0xb772b304>
>>> a()
4

Is this what you were looking for? It’s called a closure.
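To make the closure visible, here is a minimal sketch (the names make_adder and add3 are invented for the example) that inspects what the inner function has captured:

```python
def make_adder(x):
    """Return a function that remembers x from the enclosing call."""
    def add(y):
        return x + y
    return add


add3 = make_adder(3)
print(add3(4))

# The captured variable lives in a cell attached to the inner function.
print(add3.__code__.co_freevars)
print(add3.__closure__[0].cell_contents)
```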


Answer 1

You don’t really gain much by doing this; in fact it slows method_a down because the def statement runs again, creating a new function object, every time method_a is called (the function body itself is compiled only once). Given that, it would probably be better to just prefix the function name with an underscore to indicate it’s a private method, i.e. _method_b.

I suppose you might want to do this if the nested function’s definition varied each time for some reason, but that may indicate a flaw in your design. That said, there is a valid reason to do this: it allows the nested function to use arguments that were passed to the outer function but not explicitly passed on to it, which sometimes occurs when writing function decorators, for example. It’s what is being shown in the accepted answer, although a decorator is not being defined or used.

Update:

Here’s proof that nesting them is slower (using Python 3.6.1), although admittedly not by much in this trivial case:

setup = """
class Test(object):
    def separate(self, arg):
        some_data = self._method_b(arg)

    def _method_b(self, arg):
        return arg+1

    def nested(self, arg):

        def method_b2(self, arg):
            return arg+1

        some_data = method_b2(self, arg)

obj = Test()
"""
from timeit import Timer
print(min(Timer(stmt='obj.separate(42)', setup=setup).repeat()))  # -> 0.24479823284461724
print(min(Timer(stmt='obj.nested(42)', setup=setup).repeat()))    # -> 0.26553459700452575

Note I added some self arguments to your sample functions to make them more like real methods (although method_b2 still isn’t technically a method of the Test class). Also the nested function is actually called in that version, unlike yours.
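Where the extra cost comes from can be checked directly. In this sketch (outer and inner are placeholder names), each call to the outer function builds a fresh function object, while the compiled code object is shared between them:

```python
def outer():
    def inner():
        return 42
    return inner


first = outer()
second = outer()

# A new function object is created on every call to outer() ...
print(first is second)
# ... but the compiled body (the code object) is reused, not recompiled.
print(first.__code__ is second.__code__)
```

So the overhead measured above is the per-call cost of building the function object, which is small but not zero.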


Answer 2

A function inside of a function is commonly used for closures.

(There is a lot of contention over what exactly makes a closure a closure.)

Here’s an example using the built-in sum(). It defines start once and uses it from then on:

def sum_partial(start):
    def sum_start(iterable):
        return sum(iterable, start)
    return sum_start

In use:

>>> sum_with_1 = sum_partial(1)
>>> sum_with_3 = sum_partial(3)
>>> 
>>> sum_with_1
<function sum_start at 0x7f3726e70b90>
>>> sum_with_3
<function sum_start at 0x7f3726e70c08>
>>> sum_with_1((1,2,3))
7
>>> sum_with_3((1,2,3))
9

Built-in python closure

functools.partial is an example of a closure.

From the python docs, it’s roughly equivalent to:

def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc

(Kudos to @user225312 below for the answer. I find this example easier to figure out, and hopefully will help answer @mango’s comment.)
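For comparison, a short usage sketch of the real functools.partial, using the basetwo example from the Python documentation; the attributes func, args and keywords are the ones set in the rough equivalent above:

```python
from functools import partial

# Freeze the keyword argument base=2 of the built-in int().
basetwo = partial(int, base=2)
print(basetwo("10010"))

# The partial object exposes what it has captured.
print(basetwo.func is int, basetwo.args, basetwo.keywords)
```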


Answer 3

Generally, no, do not define functions inside functions.

Unless you have a really good reason. Which you don’t.

Why not?

What is a really good reason to define functions inside functions?

When what you actually want is a dingdang closure.


Answer 4

It’s actually fine to declare one function inside another one. This is especially useful when creating decorators.

However, as a rule of thumb, if the function is complex (more than 10 lines) it might be a better idea to declare it on the module level.
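As an illustration of the decorator case, here is a minimal sketch; count_calls and greet are hypothetical names, and the inner wrapper is the nested function that needs access to func from the enclosing scope:

```python
import functools


def count_calls(func):
    """Decorator that counts how often the wrapped function is called."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return "hello " + name


greet("a")
greet("b")
print(greet.calls)
```

Here the nested definition runs once per decorated function, not once per call, so the per-call cost discussed in the other answers does not apply.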


Answer 5

I found this question because I wanted to pose a question why there is a performance impact if one uses nested functions. I ran tests for the following functions using Python 3.2.5 on a Windows Notebook with a Quad Core 2.5 GHz Intel i5-2530M processor

def square0(x):
    return x*x

def square1(x):
    def dummy(y):
        return y*y
    return x*x

def square2(x):
    def dummy1(y):
        return y*y
    def dummy2(y):
        return y*y
    return x*x

def square5(x):
    def dummy1(y):
        return y*y
    def dummy2(y):
        return y*y
    def dummy3(y):
        return y*y
    def dummy4(y):
        return y*y
    def dummy5(y):
        return y*y
    return x*x

I measured the following 20 times, also for square1, square2, and square5:

s=0
for i in range(10**6):
    s+=square0(i)

and got the following results

>>> 
m = mean, s = standard deviation, m0 = mean of first testcase
[m-3s,m+3s] is a 0.997 confidence interval if normal distributed

square? m     s       m/m0  [m-3s ,m+3s ]
square0 0.387 0.01515 1.000 [0.342,0.433]
square1 0.460 0.01422 1.188 [0.417,0.503]
square2 0.552 0.01803 1.425 [0.498,0.606]
square5 0.766 0.01654 1.979 [0.717,0.816]
>>> 

square0 has no nested function, square1 has one nested function, square2 has two nested functions and square5 has five nested functions. The nested functions are only declared but not called.

So if you define 5 nested functions inside a function and never call them, the execution time of the outer function is about twice that of the function without nested functions. I think one should be cautious when using nested functions.

The Python file for the whole test that generates this output can be found at ideone.


Answer 6

It’s just a principle about exposing APIs.

In Python, it’s a good idea to avoid exposing an API in the outer scope (module or class); a function is a good place for encapsulation.

It could be a good idea when you ensure that:

  1. the inner function is ONLY used by the outer function.
  2. the inner function has a good name that explains its purpose, because the code talks.
  3. the code cannot be understood directly by your colleagues (or other code readers).

Even so, abusing this technique may cause problems and imply a design flaw.

This is just from my experience; maybe I misunderstood your question.


Answer 7

So in the end it is largely a question of how smart the Python implementation is or is not, particularly when the inner function is not a closure but simply a helper needed only inside that function.

In clean understandable design having functions only where they are needed and not exposed elsewhere is good design whether they be embedded in a module, a class as a method, or inside another function or method. When done well they really improve the clarity of the code.

And when the inner function is a closure that can also help with clarity quite a bit even if that function is not returned out of the containing function for use elsewhere.

So I would say generally do use them, but be aware of the possible performance hit when you actually are concerned about performance, and only remove them if actual profiling shows they are best removed.

Do not do premature optimization of just using “inner functions BAD” throughout all python code you write. Please.


Answer 8

It’s perfectly OK doing it that way, but unless you need to use a closure or return the function, I’d probably put it at the module level. I imagine in the second code example you mean:

...
some_data = method_b() # not some_data = method_b

otherwise, some_data will be the function.

Having it at the module level will allow other functions to use method_b() and if you’re using something like Sphinx (and autodoc) for documentation, it will allow you to document method_b as well.

You also may want to consider just putting the functionality in two methods in a class if you’re doing something that can be representable by an object. This contains logic well too if that’s all you’re looking for.


Answer 9

Do something like:

def some_function():
    return some_other_function()
def some_other_function():
    return 42 

If you were to run some_function(), it would then run some_other_function() and return 42.

EDIT: I originally stated that you shouldn’t define a function inside of another but it was pointed out that it is practical to do this sometimes.


Answer 10

You can use it to avoid defining global variables. This gives you an alternative to other designs. Here are three designs presenting a solution to a problem.

A) Using functions without globals

def calculate_salary(employee, list_with_all_employees):
    x = _calculate_tax(list_with_all_employees)

    # some other calculations done to x
    pass

    y = x  # placeholder: something derived from x

    return y

def _calculate_tax(list_with_all_employees):
    return 1.23456 # return something

B) Using functions with globals

_list_with_all_employees = None

def calculate_salary(employee, list_with_all_employees):

    global _list_with_all_employees
    _list_with_all_employees = list_with_all_employees

    x = _calculate_tax()

    # some other calculations done to x
    pass

    y = x  # placeholder: something derived from x

    return y

def _calculate_tax():
    return 1.23456 # return something based on the _list_with_all_employees var

C) Using functions inside another function

def calculate_salary(employee, list_with_all_employees):

    def _calculate_tax():
        return 1.23456 # return something based on the list_with_all_employees var

    x = _calculate_tax()

    # some other calculations done to x
    pass
    y = x  # placeholder: something derived from x

    return y

Solution C) allows the inner function to use variables from the scope of the outer function without having to declare them in the inner function. This might be useful in some situations.
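A runnable sketch of design C, under invented assumptions: the tax rule, the dictionary layout and the parameter name all_employees are made up purely to show the inner helper reading a variable from the enclosing scope.

```python
def calculate_salary(employee, all_employees):
    """Net salary under a hypothetical flat tax that depends on head count."""
    def _tax_rate():
        # Reads all_employees from the enclosing scope:
        # no parameter to pass and no global variable needed.
        return 0.25 if len(all_employees) > 2 else 0.20

    return employee["gross"] * (1 - _tax_rate())


staff = [
    {"name": "Ann", "gross": 1000.0},
    {"name": "Bob", "gross": 2000.0},
]
print(calculate_salary(staff[0], staff))
```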


Answer 11

Function In function python

def Greater(a,b):
    if a>b:
        return a
    return b

def Greater_new(a,b,c,d):
    return Greater(Greater(a,b),Greater(c,d))

print("Greater Number is :-",Greater_new(212,33,11,999))

Should I use `import os.path` or `import os`?

Question: Should I use `import os.path` or `import os`?

According to the official documentation, os.path is a module. Thus, what is the preferred way of importing it?

# Should I always import it explicitly?
import os.path

Or…

# Is importing os enough?
import os

Please DON’T answer “importing os works for me”. I know, it works for me too right now (as of Python 2.6). What I want to know is any official recommendation about this issue. So, if you answer this question, please post your references.


Answer 0

os.path works in a funny way. It looks like os should be a package with a submodule path, but in reality os is a normal module that does magic with sys.modules to inject os.path. Here’s what happens:

  • When Python starts up, it loads a bunch of modules into sys.modules. They aren’t bound to any names in your script, but you can access the already-created modules when you import them in some way.

    • sys.modules is a dict in which modules are cached. When you import a module, if it already has been imported somewhere, it gets the instance stored in sys.modules.
  • os is among the modules that are loaded when Python starts up. It assigns its path attribute to an os-specific path module.

  • It injects sys.modules['os.path'] = path so that you’re able to do “import os.path” as though it was a submodule.

I tend to think of os.path as a module I want to use rather than a thing in the os module, so even though it’s not really a submodule of a package called os, I import it sort of like it is one and I always do import os.path. This is consistent with how os.path is documented.


Incidentally, I think this sort of structure leads to a lot of Python programmers’ early confusion about modules, packages, and code organization. This is really for two reasons:

  1. If you think of os as a package and know that you can do import os and have access to the submodule os.path, you may be surprised later when you can’t do import twisted and automatically access twisted.spread without importing it.

  2. It is confusing that os.name is a normal thing, a string, and os.path is a module. I always structure my packages with empty __init__.py files so that at the same level I always have one type of thing: a module/package or other stuff. Several big Python projects take this approach, which tends to make more structured code.
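The mechanism described in this answer can be observed from a script. This is only a quick check, assuming CPython's standard library layout:

```python
import os
import sys

# os registers its platform-specific path module under the name "os.path".
print(sys.modules["os.path"] is os.path)

# The real module behind the alias depends on the platform,
# e.g. posixpath on POSIX systems or ntpath on Windows.
print(os.path.__name__)

# Packages carry a __path__ attribute; os is a plain module and has none.
print(hasattr(os, "__path__"))
```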


Answer 1

As per PEP-20 by Tim Peters, “Explicit is better than implicit” and “Readability counts”. If all you need from the os module is under os.path, import os.path would be more explicit and let others know what you really care about.

Likewise, PEP-20 also says “Simple is better than complex”, so if you also need stuff that resides under the more-general os umbrella, import os would be preferred.


Answer 2

Definitive answer: import os and use os.path. Do not import os.path directly.

From the documentation of the module itself:

>>> import os
>>> help(os.path)
...
Instead of importing this module directly, import os and refer to
this module as os.path.  The "os.path" name is an alias for this
module on Posix systems; on other systems (e.g. Mac, Windows),
os.path provides the same operations in a manner specific to that
platform, and is an alias to another module (e.g. macpath, ntpath).
...

Answer 3

Interestingly enough, importing os.path will import all of os. Try the following in the interactive prompt:

import os.path
dir(os)

The result will be the same as if you just imported os. This is because os.path will refer to a different module based on which operating system you have, so python will import os to determine which module to load for path.

reference

With some modules, saying import foo will not expose foo.bar, so I guess it really depends on the design of the specific module.


In general, just importing the explicit modules you need should be marginally faster. On my machine:

import os.path: 7.54285810068e-06 seconds

import os: 9.21904878972e-06 seconds

These times are close enough to be fairly negligible. Your program may need to use other modules from os either now or at a later time, so usually it makes sense just to sacrifice the two microseconds and use import os to avoid this error at a later time. I usually side with just importing os as a whole, but can see why some would prefer import os.path to technically be more efficient and convey to readers of the code that that is the only part of the os module that will need to be used. It essentially boils down to a style question in my mind.


Answer 4

Common sense works here: os is a module, and os.path is a module, too. So just import the module you want to use:

  • If you want to use functionalities in the os module, then import os.

  • If you want to use functionalities in the os.path module, then import os.path.

  • If you want to use functionalities in both modules, then import both modules:

    import os
    import os.path
    

For reference:


Answer 5

I couldn’t find any definitive reference, but I see that the example code for os.walk uses os.path but only imports os.


Dictionaries and default values

Question: Dictionaries and default values

Assuming connectionDetails is a Python dictionary, what’s the best, most elegant, most “pythonic” way of refactoring code like this?

if "host" in connectionDetails:
    host = connectionDetails["host"]
else:
    host = someDefaultValue

Answer 0

Like this:

host = connectionDetails.get('host', someDefaultValue)
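One detail worth keeping in mind, sketched here with made-up keys and values: get() falls back to the default only when the key is absent, not when the key is present with a value of None, and it never modifies the dictionary.

```python
connection_details = {"port": 8080, "user": None}

# Key is missing: the default is returned.
host = connection_details.get("host", "localhost")

# Key exists with value None: the stored None is returned, not the default.
user = connection_details.get("user", "guest")

print(host, user)
print("host" in connection_details)  # get() did not insert the key
```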

Answer 1

You can also use the defaultdict like so:

from collections import defaultdict
a = defaultdict(lambda: "default", key="some_value")
a["blabla"]  # => "default"
a["key"]     # => "some_value"

You can pass any ordinary function instead of lambda:

from collections import defaultdict
def a():
  return 4

b = defaultdict(a, key="some_value")
b['absent']  # => 4
b['key']     # => "some_value"
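A side effect to be aware of with defaultdict, shown as a small sketch with illustrative key names: looking up a missing key with square brackets stores the default in the dictionary, whereas get() does not call the factory at all.

```python
from collections import defaultdict

settings = defaultdict(lambda: "default", key="some_value")
before = len(settings)

value = settings["missing"]   # calls the factory and inserts the key
after = len(settings)

looked_up = settings.get("other")  # plain dict.get: no factory, no insert

print(before, value, after, looked_up)
```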

Answer 2

While .get() is a nice idiom, it’s slower than if/else (and slower than try/except if presence of the key in the dictionary can be expected most of the time):

>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="try:\n a=d[1]\nexcept KeyError:\n a=10")
0.07691968797894333
>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="try:\n a=d[2]\nexcept KeyError:\n a=10")
0.4583777282275605
>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="a=d.get(1, 10)")
0.17784020746671558
>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="a=d.get(2, 10)")
0.17952161730158878
>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="if 1 in d:\n a=d[1]\nelse:\n a=10")
0.10071221458065338
>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}", 
... stmt="if 2 in d:\n a=d[2]\nelse:\n a=10")
0.06966537335119938

Answer 3

For multiple different defaults try this:

connectionDetails = { "host": "www.example.com" }
defaults = { "host": "127.0.0.1", "port": 8080 }

completeDetails = {}
completeDetails.update(defaults)
completeDetails.update(connectionDetails)
completeDetails["host"]  # ==> "www.example.com"
completeDetails["port"]  # ==> 8080
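As a possible alternative on Python 3, the same layering of defaults can be sketched with dict unpacking or with collections.ChainMap. The variable names merged and layered are illustrative assumptions, not part of the original answer.

```python
from collections import ChainMap

connection_details = {"host": "www.example.com"}
defaults = {"host": "127.0.0.1", "port": 8080}

# Option 1: build a new merged dict; entries unpacked later win.
merged = {**defaults, **connection_details}

# Option 2: a live view that consults connection_details first, then defaults.
layered = ChainMap(connection_details, defaults)

print(merged["host"], merged["port"])
print(layered["host"], layered["port"])
```

The merged dict is a snapshot, while the ChainMap reflects later changes to either underlying dict.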

Answer 4

There is a method in python dictionaries to do this: dict.setdefault

connectionDetails.setdefault('host',someDefaultValue)
host = connectionDetails['host']

However this method sets the value of connectionDetails['host'] to someDefaultValue if key host is not already defined, unlike what the question asked.
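The difference between the two methods can be seen in a small sketch. The dictionary contents and variable names here are made up for illustration.

```python
connection_details = {"port": 8080}

# get() returns the fallback but leaves the dict untouched.
host_via_get = connection_details.get("host", "localhost")
keys_after_get = sorted(connection_details)

# setdefault() returns the fallback AND stores it under the key.
host_via_setdefault = connection_details.setdefault("host", "localhost")
keys_after_setdefault = sorted(connection_details)

print(keys_after_get)
print(keys_after_setdefault)
```

So setdefault is the better fit when the default should persist in the dictionary, and get when the dictionary must stay unchanged.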


Answer 5

(this is a late answer)

An alternative is to subclass the dict class and implement the __missing__() method, like this:

class ConnectionDetails(dict):
    def __missing__(self, key):
        if key == 'host':
            return "localhost"
        raise KeyError(key)

Examples:

>>> connection_details = ConnectionDetails(port=80)

>>> connection_details['host']
'localhost'

>>> connection_details['port']
80

>>> connection_details['password']
Traceback (most recent call last):
  File "python", line 1, in <module>
  File "python", line 6, in __missing__
KeyError: 'password'
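The same idea can be generalized to several keys by keeping the defaults in a separate mapping. This is only a sketch: the class name DefaultingDict and its defaults keyword are hypothetical, not something the answer or the standard library provides.

```python
class DefaultingDict(dict):
    """dict that falls back to a per-key table of defaults (hypothetical helper)."""

    def __init__(self, *args, defaults=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._defaults = dict(defaults or {})

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        try:
            return self._defaults[key]
        except KeyError:
            raise KeyError(key) from None


details = DefaultingDict({"port": 80}, defaults={"host": "localhost"})
print(details["host"])
```

Note that __missing__ only affects square-bracket lookups: "host" in details is still False, and details.get("host") still returns None.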

Answer 6

Testing @Tim Pietzcker’s suspicion about the situation in PyPy (5.2.0-alpha0) for Python 3.3.5, I find that .get() and the if/else way do indeed perform similarly. In fact, in the if/else case there seems to be only a single lookup when the condition and the assignment involve the same key (compare with the last case, where there are two lookups).

>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="try:\n a=d[1]\nexcept KeyError:\n a=10")
0.011889292989508249
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="try:\n a=d[2]\nexcept KeyError:\n a=10")
0.07310474599944428
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="a=d.get(1, 10)")
0.010391917996457778
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="a=d.get(2, 10)")
0.009348208011942916
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="if 1 in d:\n a=d[1]\nelse:\n a=10")
0.011475925013655797
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="if 2 in d:\n a=d[2]\nelse:\n a=10")
0.009605801998986863
>>>> timeit.timeit(setup="d={1:2, 3:4, 5:6, 7:8, 9:0}",
.... stmt="if 2 in d:\n a=d[2]\nelse:\n a=d[1]")
0.017342638995614834

Answer 7

You can use a lambda function for this as a one-liner. Make a new object connectionDetails2 which is accessed like a function…

connectionDetails2 = lambda k: connectionDetails[k] if k in connectionDetails.keys() else "DEFAULT"

Now use

connectionDetails2(k)

instead of

connectionDetails[k]

which returns the dictionary value if k is in the keys, otherwise it returns "DEFAULT"


How bad is shadowing names defined in outer scopes?

Question: How bad is shadowing names defined in outer scopes?

I just switched to Pycharm and I am very happy about all the warnings and hints it provides me to improve my code. Except for this one which I don’t understand:

This inspection detects shadowing names defined in outer scopes.

I know it is bad practice to access variables from the outer scope, but what is the problem with shadowing the outer scope?

Here is one example, where Pycharm gives me the warning message:

data = [4, 5, 6]

def print_data(data): # <-- Warning: "Shadows 'data' from outer scope"
    print data

print_data(data)

Answer 0

No big deal in your above snippet, but imagine a function with a few more arguments and quite a few more lines of code. Then you decide to rename your data argument as yadda but miss one of the places it is used in the function’s body… Now data refers to the global, and you start having weird behaviour – where you would have a much more obvious NameError if you didn’t have a global name data.

Also remember that in Python everything is an object (including modules, classes and functions), so there are no distinct namespaces for functions, modules or classes. Another scenario is that you import a function foo at the top of your module and use it somewhere in your function body. Then you add a new argument to your function and name it (bad luck) foo.

Finally, built-in functions and types also live in the same namespace and can be shadowed the same way.

None of this is much of a problem if you have short functions, good naming and a decent unittest coverage, but well, sometimes you have to maintain less than perfect code and being warned about such possible issues might help.
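The renaming scenario can be sketched as follows. The function names and the parameter name values are made up for illustration; the point is only that the missed reference silently resolves to the module-level name instead of raising NameError.

```python
data = [4, 5, 6]  # module-level name


def total_before_rename(data):
    result = 0
    for item in data:  # 'data' is the parameter here
        result += item
    return result


def total_after_rename(values):
    result = 0
    for item in data:  # missed during the rename: now reads the global
        result += item
    return result


print(total_before_rename([1, 2]))  # sums the argument
print(total_after_rename([1, 2]))   # sums the global list instead
```

Without the module-level data, the second function would fail immediately with a NameError, which is much easier to notice.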


Answer 1

The currently most up-voted and accepted answer and most answers here miss the point.

It doesn’t matter how long your function is, or how you name your variable descriptively (to hopefully minimize the chance of potential name collision).

The fact that your function’s local variable or its parameter happens to share a name with something in the global scope is completely irrelevant. In fact, no matter how carefully you choose your local variable name, your function can never foresee “will my cool name yadda also be used as a global variable in the future?”. The solution? Simply don’t worry about that! The correct mindset is to design your function to consume input from, and only from, the parameters in its signature. That way you don’t need to care what is (or will be) in the global scope, and shadowing stops being an issue at all.

In other words, shadowing only matters when your function needs to use both a local variable and a global variable of the same name. But you should avoid such a design in the first place. The OP’s code does NOT really have such a design problem. It is just that PyCharm is not smart enough, so it gives out a warning just in case. So, just to make PyCharm happy, and also to make our code clean, see this solution, quoted from silyevsk’s answer, which removes the global variable completely.

def print_data(data):
    print data

def main():
    data = [4, 5, 6]
    print_data(data)

main()

This is the proper way to “solve” this problem, by fixing/removing your global thing, not adjusting your current local function.


Answer 2

A good workaround in some cases may be to move the vars + code to another function:

def print_data(data):
    print data

def main():
    data = [4, 5, 6]
    print_data(data)

main()

回答 3

这取决于功能的持续时间。功能越长,将来有人对其进行修改的机会就越多,data以为它意味着全局。实际上,这意味着本地,但是由于功能太长了,因此对于他们来说并不明显存在具有该名称的本地。

对于您的示例函数,我认为遮盖全局一点也不差。

It depends how long the function is. The longer the function, the more chance that someone modifying it in future will write data thinking that it means the global. In fact it means the local but because the function is so long it’s not obvious to them that there exists a local with that name.

For your example function, I think that shadowing the global is not bad at all.


回答 4

做这个:

data = [4, 5, 6]

def print_data():
    global data
    print(data)

print_data()

Do this:

data = [4, 5, 6]

def print_data():
    global data
    print(data)

print_data()

Answer 5

data = [4, 5, 6]  # your global variable

def print_data(data): # <-- Pass in a parameter called "data"
    print data  # <-- Note: You can access global variable inside your function, BUT for now, which is which? the parameter or the global variable? Confused, huh?

print_data(data)

Answer 6

I like to see a green tick in the top right corner in pycharm. I append the variable names with an underscore just to clear this warning so I can focus on the important warnings.

data = [4, 5, 6]

def print_data(data_): 
    print(data_)

print_data(data)

Answer 7

This looks like a 100% pytest code pattern.

see:

https://docs.pytest.org/en/latest/fixture.html#conftest-py-sharing-fixture-functions

I had the same problem, which is why I found this post ;)

# ./tests/test_twitter1.py
import os
import pytest

from mylib import db
# ...

@pytest.fixture
def twitter():
    twitter_ = db.Twitter()
    twitter_._debug = True
    return twitter_

@pytest.mark.parametrize("query,expected", [
    ("BANCO PROVINCIAL", 8),
    ("name", 6),
    ("castlabs", 42),
])
def test_search(twitter: db.Twitter, query: str, expected: int):

    # parametrize already supplies one query per test case, so no loop is needed
    res = twitter.search(query)
    print(res)
    assert res

And it will warn with This inspection detects shadowing names defined in outer scopes.

To fix that just move your twitter fixture into ./tests/conftest.py

# ./tests/conftest.py
import pytest

from mylib import db


@pytest.fixture
def twitter():
    twitter_ = db.Twitter()
    twitter_._debug = True
    return twitter_

And remove the twitter fixture from the test module, as in ./tests/test_twitter2.py

# ./tests/test_twitter2.py
import os
import pytest

from mylib import db
# ...

@pytest.mark.parametrize("query,expected", [
    ("BANCO PROVINCIAL", 8),
    ("name", 6),
    ("castlabs", 42),
])
def test_search(twitter: db.Twitter, query: str, expected: int):

    # parametrize already supplies one query per test case, so no loop is needed
    res = twitter.search(query)
    print(res)
    assert res

This will make QA, PyCharm and everyone else happy.


Creating an empty list in Python

Question: Creating an empty list in Python

What is the best way to create a new empty list in Python?

l = [] 

or

l = list()

I am asking this because of two reasons:

  1. Technical reasons, as to which is faster. (creating a class causes overhead?)
  2. Code readability – which one is the standard convention.

Answer 0

Here is how you can test which piece of code is faster:

% python -mtimeit  "l=[]"
10000000 loops, best of 3: 0.0711 usec per loop

% python -mtimeit  "l=list()"
1000000 loops, best of 3: 0.297 usec per loop

However, in practice, this initialization is most likely an extremely small part of your program, so worrying about this is probably wrong-headed.

Readability is very subjective. I prefer [], but some very knowledgeable people, like Alex Martelli, prefer list() because it is pronounceable.


Answer 1

list() is inherently slower than [], because

  1. there is symbol lookup (no way for python to know in advance if you did not just redefine list to be something else!),

  2. there is function invocation,

  3. then it has to check whether an iterable argument was passed (so it can create a list with elements from it). None is passed in our case, but the “if” check still runs.

In most cases the speed difference won’t make any practical difference though.
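The first point (the name lookup) can be made visible with a small sketch: if the name list is rebound in a namespace, list() follows the new binding while the [] literal does not. The use of exec with a custom namespace is just a way to keep the rebinding contained, and the function names are illustrative.

```python
source = """
def with_call():
    return list()   # 'list' is looked up by name every time this runs

def with_literal():
    return []       # the literal always builds a real list
"""

# In this namespace only, the name 'list' refers to tuple.
namespace = {"list": tuple}
exec(source, namespace)

print(type(namespace["with_call"]()))
print(type(namespace["with_literal"]()))
```

Here with_call() produces an empty tuple and with_literal() an empty list, which is why the interpreter cannot skip the lookup for list().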


Answer 2

I use [].

  1. It’s faster because the list notation is a short circuit.
  2. Creating a list with items should look about the same as creating a list without, why should there be a difference?

Answer 3

I do not really know about it, but it seems to me, by experience, that jpcgt is actually right. Following example: If I use following code

t = [] # implicit instantiation
t = t.append(1)

in the interpreter, then calling t gives me just “t” without any list, and if I append something else, e.g.

t = t.append(2)

I get the error “‘NoneType’ object has no attribute ‘append'”. If, however, I create the list by

t = list() # explicit instantiation

then it works fine.


Answer 4

Just to highlight @Darkonaut’s answer, because I think it should be more visible.

new_list = [] and new_list = list() are both fine (ignoring performance), but append() returns None, so you can’t do new_list = new_list.append(something).

I find this return-type decision very confusing.
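A short sketch of that behaviour, with illustrative variable names: append mutates the list in place and returns None regardless of how the list was created, while concatenation produces a new list that can be assigned.

```python
new_list = []
result = new_list.append("something")  # append mutates in place

# To get a list back from an expression, concatenate instead.
extended = new_list + ["another"]

print(result)
print(new_list)
print(extended)
```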


Should import statements always be at the top of a module?

Question: Should import statements always be at the top of a module?

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?

Isn’t this:

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

more efficient than this?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

Answer 0

Module importing is quite fast, but not instant. This means that:

  • Putting the imports at the top of the module is fine, because it’s a trivial cost that’s only paid once.
  • Putting the imports within a function will cause calls to that function to take longer.

So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)


The best reasons I’ve seen to perform lazy imports are:

  • Optional library support. If your code has multiple paths that use different libraries, don’t break if an optional library is not installed.
  • In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib’s lazy-loading framework.
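The optional-library case might look like the sketch below. It assumes ujson as the optional third-party package purely as an example; the alias json_impl and the wrapper functions are illustrative.

```python
try:
    import ujson as json_impl  # optional third-party speed-up, may be absent
except ImportError:
    import json as json_impl   # standard-library fallback, always available


def dumps(obj):
    """Serialize obj with whichever JSON implementation was found."""
    return json_impl.dumps(obj)


def loads(text):
    """Parse text with whichever JSON implementation was found."""
    return json_impl.loads(text)


print(loads(dumps({"a": 1})))
```

The rest of the module only uses dumps and loads, so it keeps working whether or not the optional package is installed.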

Answer 1

Putting the import statement inside of a function can prevent circular dependencies. For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules, causing an infinite loop. If you move the import statement in one of the modules into a function, then it won’t try to import the other module until the function is called, and by then that module will already be imported, so there is no infinite loop. Read here for more: effbot.org/zone/import-confusion.htm


Answer 2

I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.

The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module’s imports complete and minimal. A refactoring IDE might make this irrelevant.

There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.

It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I’m installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I’m going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.

I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
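A minimal sketch of that layout, assuming a main() that simply formats its arguments (the body of main is made up for illustration):

```python
def main(argv):
    """Entry point that receives its arguments explicitly."""
    if not argv:
        return "no args"
    return "args: " + ", ".join(argv)


if __name__ == "__main__":
    import sys  # only needed when the file is run as a script

    print(main(sys.argv[1:]))
```

Because main takes its arguments as a parameter, a test or another module can call main(["a", "b"]) without touching sys at all.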


Answer 3

Most of the time this would be useful for clarity and sensible to do but it’s not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.

Firstly, you could have a module with a unit test of the form:

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

Secondly, you might have a requirement to conditionally import some different module at runtime.

if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]

There are probably other situations where you might place imports in other parts in the code.


Answer 4

The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the “import every call” approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a “lazy import”.

But there are reasons other than efficiency why you might prefer one over the other. Importing at the top makes the module’s dependencies much clearer to someone reading the code. The two approaches also have very different failure characteristics: if there’s no “datetime” module, the top-level import fails at load time, while the in-function import won’t fail until the method is called.

Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it’s being imported.
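The link in the answer is not preserved here, so the following is not necessarily the technique it pointed to. As one possible sketch on Python 3, the standard library's importlib.util.LazyLoader can defer executing a module until an attribute is first accessed. The helper name lazy_import is an assumption, and colorsys is used only as a small example module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs only on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already loaded, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the deferral; does not run the module yet
    return module


colorsys = lazy_import("colorsys")
print(colorsys.rgb_to_hsv(1, 0, 0))  # first attribute access triggers the real load
```

A trade-off of this approach is that a broken module fails at first use rather than at the import line, which is the same failure-timing concern discussed above.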


Answer 5

Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.

Normally I don’t worry about the efficiency of loading modules, since it’s (a) pretty fast, and (b) mostly only happens at startup.

If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
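A hedged sketch of such dynamic loading, using importlib.import_module (the documented, friendlier wrapper around __import__). The helper name load_optional is illustrative.

```python
import importlib


def load_optional(name):
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # The caller decides how to degrade when the module is unavailable.
        return None


math_module = load_optional("math")
missing = load_optional("no_such_module_for_this_example_xyz")
print(math_module is not None, missing is None)
```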


Answer 6

I wouldn’t worry about the efficiency of loading the module up front too much. The memory taken up by the module won’t be very big (assuming it’s modular enough) and the startup cost will be negligible.

In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.

One good reason to import a module elsewhere in the code is if it’s used in a debugging statement.

For example:

do_something_with_x(x)

I could debug this with:

from pprint import pprint
pprint(x)
do_something_with_x(x)

Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don’t have any choice.



Answer 7

It’s a tradeoff, that only the programmer can decide to make.

Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import ‘only when called’ also means doing it ‘every time when called’, so each call after the first one is still incurring the additional overhead of doing the import.

Case 2 saves some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.

Besides efficiency, it’s easier to see module dependencies up front if the import statements are … up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.

Personally I generally follow the PEP except for things like unit tests and such that I don’t want always loaded because I know they aren’t going to be used except for test code.


Answer 8

Here’s an example where all the imports are at the very top (this is the only time I’ve needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.

import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function

(On review: what John Millikin said.)


Answer 9

This is like many other optimizations – you sacrifice some readability for speed. As John mentioned, if you’ve done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It’d probably be good to put a note up with all the other imports:

from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below

Answer 10

Module initialization only occurs once, on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then, since the module initialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.

Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
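The caching behaviour can be checked with a small sketch. The function name import_twice is illustrative; sys.modules is the interpreter's cache of already-initialized modules.

```python
import sys


def import_twice():
    import datetime as first   # runs the module's code only if it is not cached yet
    import datetime as second  # finds the cached module object in sys.modules
    return first, second


first, second = import_twice()
print(first is second)
print(sys.modules["datetime"] is first)
```

Both names refer to the same module object, so a repeated import costs a cache lookup and a name binding, not a re-initialization.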


Answer 11

Just to complete Moe’s answer and the original question:

When we have to deal with circular dependencies we can do some “tricks”. Assume we’re working with modules a.py and b.py that contain x() and y(), respectively. Then:

  1. We can move one of the from imports to the bottom of the module.
  2. We can move one of the from imports inside the function or method that actually requires the import (this isn’t always possible, as you may use it from several places).
  3. We can change one of the two from imports to be an import that looks like: import a

So, to conclude: if you aren’t dealing with circular dependencies and doing some kind of trick to avoid them, then it’s better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when doing these “tricks”, include a comment; it’s always welcome! :)


Answer 12

In addition to the excellent answers already given, it’s worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.

This issue often comes up in Apache Spark’s Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It’s best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.


回答 13

我很惊讶地没有看到已经发布的重复负载检查的实际成本数字,尽管对预期的结果有很多很好的解释。

如果您在顶部导入,则无论如何都会承受重击。这个数字很小,但是通常以毫秒为单位,而不是纳秒。

如果导入功能(S)之内,那么你只需要命中的加载,如果首次调用这些功能之一。正如许多人指出的那样,如果根本不发生这种情况,则可以节省加载时间。但是,如果函数被调用很多,您将遭受一次重复的打击,尽管命中率要小得多(用于检查它是否已加载;不是实际重新加载)。另一方面,正如@aaronasterling指出的那样,您还可以节省一点,因为在函数中进行导入使该函数可以使用稍快的局部变量查找来稍后标识名称(http://stackoverflow.com/questions/477096/python- import-coding-style / 4789963#4789963)。

这是一个简单测试的结果,该测试从函数内部导入了一些东西。报告的时间(在2.3 GHz Intel Core i7上的Python 2.7.14中)显示如下(第二次调用比以后的调用更多,这似乎是一致的,尽管我不知道为什么)。

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

编码:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1

I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.

If you import at the top, you take the load hit no matter what. That’s pretty small, but commonly in the milliseconds, not nanoseconds.

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn’t happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don’t know why).

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

The code:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
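For anyone who wants to repeat the measurement on Python 3, here is a rough sketch using timeit instead of manual time() bookkeeping. The function names are made up, math merely stands in for any already-loaded module, and the numbers will differ from machine to machine:

```python
import math  # module-level import, paid once at startup
import timeit


def sqrt_with_local_import(value=2.0):
    import math as local_math  # re-executed on every call, hits the cache
    return local_math.sqrt(value)


def sqrt_with_top_import(value=2.0):
    return math.sqrt(value)


calls = 200_000
inside = timeit.timeit(sqrt_with_local_import, number=calls)
top = timeit.timeit(sqrt_with_top_import, number=calls)
print("import inside function: %.1f ns per call" % (inside / calls * 1e9))
print("import at module top:   %.1f ns per call" % (top / calls * 1e9))
```

This only measures the repeated load-check, not the one-time cost of loading a module for the first time.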

回答 14

我不希望提供完整的答案,因为其他人已经做得很好。当我发现在功能内部导入模块特别有用时,我只想提及一个用例。我的应用程序使用存储在特定位置的python软件包和模块作为插件。在应用程序启动期间,应用程序遍历该位置的所有模块并将其导入,然后在模块内部查找,如果找到了插件的安装点(在我的情况下,它是具有唯一标识的某些基类的子类ID)将其注册。插件的数量很大(现在有几十个,但将来可能有数百个),每个插件很少使用。在应用程序启动过程中,在我的插件模块顶部添加了第三方库,这会带来一些损失。尤其是某些第三方库的导入非常繁重(例如,密谋导入甚至尝试连接到Internet并下载一些内容,这些内容在启动时增加了大约一秒钟的时间)。通过优化插件中的导入(仅在使用它们的函数中调用它们),我设法将启动时间从10秒缩短到大约2秒。对于我的用户而言,这是一个很大的差异。

所以我的答案是不,不要总是将导入放在模块的顶部。

I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During startup, the application walks through all the modules in that location and imports them, then looks inside each module and, if it finds mounting points for plugins (in my case, a subclass of a certain base class with a unique ID), registers them. The number of plugins is large (dozens now, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules was a noticeable penalty during application startup. Some third-party libraries are especially heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which was adding about one second to startup). By optimizing the imports in the plugins (performing them only in the functions where they are used), I managed to shrink the startup time from 10 seconds to about 2 seconds. That is a big difference for my users.

So my answer is no, do not always put the imports at the top of your modules.
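A stripped-down sketch of that plugin layout, for illustration only: ReportPlugin, plugin_id and registry are invented names, and json stands in for a heavy third-party library such as the one mentioned above.

```python
class ReportPlugin:
    """Hypothetical plugin; json stands in for a heavy dependency."""

    plugin_id = "report"

    def run(self, data):
        # Imported on first use, so merely registering the plugin at
        # startup does not pay for loading the dependency.
        import json
        return json.dumps(data, sort_keys=True)


# Startup: registering the plugin is cheap because nothing heavy is imported.
registry = {ReportPlugin.plugin_id: ReportPlugin()}

# Later, only when the plugin is actually used:
output = registry["report"].run({"b": 2, "a": 1})
print(output)
```

The trade-off is that an import error in a plugin now shows up on first use rather than at startup.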


回答 15

有趣的是,到目前为止,没有一个答案提到了并行处理,当序列化的函数代码被推到其他内核时,例如ipyparallel的情况,可能需要在函数中引入导入。

It’s interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.


回答 16

通过将变量/局部作用域导入函数内部,可以提高性能。这取决于函数中导入事物的用法。如果要循环很多次并访问模块全局对象,则将其作为本地导入可以有所帮助。

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  # bind the module globals to local names once, outside the loop
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

在Linux上使用一段时间显示收益很小

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

真正的是壁钟。用户是程序中的时间。sys是时候进行系统调用了。

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names

There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  # bind the module globals to local names once, outside the loop
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

Timing both scripts with time on Linux shows a small gain:

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

real is wall clock. user is time in program. sys is time for system calls.

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
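Where the difference comes from can be seen in the bytecode. This is a small sketch that assumes CPython (opcode names are an implementation detail and may change between versions); count_global_loads is a helper invented for this example:

```python
import dis

X = 10


def use_global():
    return X + X  # X is looked up in the module globals twice


def use_local():
    x = X         # one global lookup, then bound to a local name
    return x + x


def count_global_loads(func):
    # Count the LOAD_GLOBAL instructions in the compiled function.
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL")


print(count_global_loads(use_global))
print(count_global_loads(use_local))
```

Inside a hot loop, each avoided global lookup is a dictionary lookup saved, which is where the small gain above comes from.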


回答 17

可读性

除了启动性能外,还有一个可读性参数可用于本地化import语句。例如,在我当前的第一个python项目中,使用python行号1283到1296:

listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET

xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])

如果该import语句位于文件的顶部,则必须向上滚动很长一段距离,或者按Home,以查找内容ET。然后,我将不得不导航回到第1283行以继续阅读代码。

确实,即使 import语句位于函数(或类)的顶部(如许多语句所放置的那样),也需要向上和向下分页。

显示Gnome版本号的操作很少,因此import文件顶部会引入不必要的启动延迟。

Readability

In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:

listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET

xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])

If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.

Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.

Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.


回答 18

我想提一下我的一个用例,与@John Millikin和@VK提到的用例非常相似:

可选进口

我使用Jupyter Notebook进行数据分析,并且使用相同的IPython Notebook作为所有分析的模板。在某些情况下,我需要导入Tensorflow来进行一些快速的模型运行,但有时我会在未设置tensorflow或导入缓慢的地方工作。在这些情况下,我将依赖Tensorflow的操作封装在一个辅助函数中,将tensorflow导入该函数内部,并将其绑定到按钮。

这样,我可以“重新启动并运行所有程序”,而不必等待导入,也不必在导入失败时恢复其余的单元格。

I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @V.K.:

Optional Imports

I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn’t set up or is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.

This way, I could do “restart-and-run-all” without having to wait for the import, or having to resume the rest of the cells when it fails.
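Roughly, the helper could look like the sketch below. The names optional_import and quick_model_run are invented for this example, and the TensorFlow calls assume TensorFlow 2.x with eager execution; the button binding is left out.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def quick_model_run(values):
    # The import happens only when this helper is called (e.g. from a
    # notebook button), not when the notebook is re-run from the top.
    tf = optional_import("tensorflow")
    if tf is None:
        return None  # TensorFlow is not set up here; skip the model run
    # Assumes TensorFlow 2.x eager execution.
    return float(tf.reduce_sum(tf.constant(values, dtype=tf.float32)))


print(optional_import("json") is not None)
print(optional_import("no_such_module_for_this_demo"))
```

Returning None instead of raising keeps the rest of the notebook running when the optional dependency is missing.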


回答 19

这是一个有趣的讨论。像许多其他人一样,我什至从未考虑过这个话题。由于想要在我的一个库中使用Django ORM,我不得不在函数中具有导入功能。我不得不打电话django.setup()在导入模型类之前,我,因为这是文件的顶部,由于IoC注入器的构造,它被拖到了完全非Django的库代码中。

我有点四处乱窜,最后将django.setup()in放在单例构造函数中,并将相关的导入放在每个类方法的顶部。现在,这种方法工作正常,但是由于进口商品不在顶部而使我感到不安,而且我也开始担心进口商品的额外时间。然后我来到这里,以极大的兴趣阅读了大家对此的看法。

我有很长的C ++背景,现在使用Python / Cython。我对此的看法是,为什么不将导入内容放入函数中,除非它导致概要分析的瓶颈。这就像在需要变量之前为变量声明空间。麻烦的是,我有数千行代码,所有导入都在顶部!所以我想从现在开始,当我经过并有时间时,在这里和那里更改奇数文件。

This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.

I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren’t at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody’s take on this.

I have a long C++ background and now use Python/Cython. My take on this is: why not put the imports in the function, unless profiling shows that it causes a bottleneck? It’s just like declaring variables right before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on, and change the odd file here and there when I’m passing through and have the time.


标准的Python文档字符串格式是什么?[关闭]

问题:标准的Python文档字符串格式是什么?[关闭]

我已经看到了几种用Python编写文档字符串的样式,是否有正式或“同意的”样式?

I have seen a few different styles of writing docstrings in Python, is there an official or “agreed-upon” style?


回答 0

格式

可以按照其他文章所示的几种格式编写Python文档字符串。但是未提及默认的Sphinx文档字符串格式,该格式基于reStructuredText(reST)。您可以在此博客文章中获得有关主要格式的一些信息。

请注意,reST是PEP 287推荐的

以下是文档字符串的主要使用格式。

-Epytext

从历史上看,像Javadoc这样的样式很普遍,因此它被当作Epydoc(具有称为Epytext格式)生成文档的基础。

例:

"""
This is a javadoc style.

@param param1: this is a first param
@param param2: this is a second param
@return: this is a description of what is returned
@raise KeyError: raises an exception
"""

-reST

如今,可能更流行的格式是Sphinx用于生成文档的reStructuredText(reST)格式。注意:默认在JetBrains PyCharm中使用它(在定义方法后键入三引号,然后按Enter键)。默认情况下,它也用作Pyment中的输出格式。

例:

"""
This is a reST style.

:param param1: this is a first param
:param param2: this is a second param
:returns: this is a description of what is returned
:raises KeyError: raises an exception
"""

– 谷歌

Google有自己常用的格式。Sphinx也可以解释它(即使用Napoleon插件)。

例:

"""
This is an example of Google style.

Args:
    param1: This is the first param.
    param2: This is a second param.

Returns:
    This is a description of what is returned.

Raises:
    KeyError: Raises an exception.
"""

甚至更多的例子

-Numpydoc

请注意,Numpy建议根据Google格式使用自己的numpydoc,并且Sphinx可以使用。

"""
My numpydoc description of a kind
of very exhaustive numpydoc format docstring.

Parameters
----------
first : array_like
    the 1st param name `first`
second :
    the 2nd param
third : {'value', 'other'}, optional
    the 3rd param, by default 'value'

Returns
-------
string
    a value in a string

Raises
------
KeyError
    when a key error
OtherError
    when an other error
"""

转换/生成

可以使用Pyment之类的工具自动为尚未记录的Python项目生成文档字符串,或者将现有文档字符串(可以混合多种格式)从一种格式转换为另一种格式。

注意:这些示例摘自Pyment文档

Formats

Python docstrings can be written in several formats, as the other posts showed. However, the default Sphinx docstring format, which is based on reStructuredText (reST), was not mentioned. You can get some information about the main formats in this blog post.

Note that reST is recommended by PEP 287.

The main formats used for docstrings are as follows.

– Epytext

Historically, a Javadoc-like style was prevalent, so it was taken as the base for Epydoc (whose format is called Epytext) to generate documentation.

Example:

"""
This is a javadoc style.

@param param1: this is a first param
@param param2: this is a second param
@return: this is a description of what is returned
@raise KeyError: raises an exception
"""

– reST

Nowadays, probably the most prevalent format is reStructuredText (reST), which Sphinx uses to generate documentation. Note: it is used by default in JetBrains PyCharm (type triple quotes after defining a method and hit Enter). It is also the default output format in Pyment.

Example:

"""
This is a reST style.

:param param1: this is a first param
:param param2: this is a second param
:returns: this is a description of what is returned
:raises KeyError: raises an exception
"""

– Google

Google has its own format, which is often used. It can also be interpreted by Sphinx (i.e. using the Napoleon plugin).

Example:

"""
This is an example of Google style.

Args:
    param1: This is the first param.
    param2: This is a second param.

Returns:
    This is a description of what is returned.

Raises:
    KeyError: Raises an exception.
"""

Even more examples

– Numpydoc

Note that NumPy recommends following its own numpydoc format, which is based on the Google format and usable by Sphinx.

"""
My numpydoc description of a kind
of very exhaustive numpydoc format docstring.

Parameters
----------
first : array_like
    the 1st param name `first`
second :
    the 2nd param
third : {'value', 'other'}, optional
    the 3rd param, by default 'value'

Returns
-------
string
    a value in a string

Raises
------
KeyError
    when a key error
OtherError
    when an other error
"""

Converting/Generating

It is possible to use a tool like Pyment to automatically generate docstrings for a Python project that is not yet documented, or to convert existing docstrings (possibly mixing several formats) from one format to another.

Note: The examples are taken from the Pyment documentation
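Whichever format is chosen, the docstring is just a string attached to the object, and tools read it from there. Here is a small sketch with a reST-style docstring (the function scale is invented for illustration) and the standard inspect.getdoc helper:

```python
import inspect


def scale(value, factor=2):
    """
    Multiply a value by a factor.

    :param value: the number to scale
    :param factor: the multiplier, by default 2
    :returns: the scaled number
    :raises TypeError: if the operands cannot be multiplied
    """
    return value * factor


doc = inspect.getdoc(scale)  # docstring with the indentation cleaned up
print(doc.splitlines()[0])
print(scale(21))
```

Documentation generators and converters work from this same cleaned-up text, so the format only matters to whichever tool parses it.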


回答 1

谷歌的风格指南中包含一个优秀的Python风格指南。它包括可读文档字符串语法的约定,约定比PEP-257提供更好的指导。例如:

def square_root(n):
    """Calculate the square root of a number.

    Args:
        n: the number to get the square root of.
    Returns:
        the square root of n.
    Raises:
        TypeError: if n is not a number.
        ValueError: if n is negative.

    """
    pass

我想将此扩展为在参数中也包含类型信息,如本Sphinx文档教程中所述。例如:

def add_value(self, value):
    """Add a new value.

       Args:
           value (str): the value to add.
    """
    pass

The Google style guide contains an excellent Python style guide. It includes conventions for readable docstring syntax that offer better guidance than PEP-257. For example:

def square_root(n):
    """Calculate the square root of a number.

    Args:
        n: the number to get the square root of.
    Returns:
        the square root of n.
    Raises:
        TypeError: if n is not a number.
        ValueError: if n is negative.

    """
    pass

I like to extend this to also include type information in the arguments, as described in this Sphinx documentation tutorial. For example:

def add_value(self, value):
    """Add a new value.

       Args:
           value (str): the value to add.
    """
    pass
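To make the contract in such a docstring concrete, here is one possible way to fill in the square_root stub so that it actually behaves as documented. It is only a sketch: the type check with numbers.Real and the error messages are my own choices, not something the style guide prescribes.

```python
import math
from numbers import Real


def square_root(n):
    """Calculate the square root of a number.

    Args:
        n (float): the number to get the square root of.

    Returns:
        float: the square root of n.

    Raises:
        TypeError: if n is not a number.
        ValueError: if n is negative.
    """
    if not isinstance(n, Real):
        raise TypeError("n must be a real number")
    if n < 0:
        raise ValueError("n must not be negative")
    return math.sqrt(n)


print(square_root(9))
```

Each entry under Raises corresponds to one explicit check in the body, which makes the docstring easy to keep in sync with the code.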

回答 2

PEP-257中的文档字符串约定比PEP-8更为详细。

但是,文档字符串似乎比其他代码区域更具个性。不同的项目将有自己的标准。

我倾向于总是包含docstrings,因为它们倾向于演示如何使用该函数以及该函数的执行速度非常快。

无论字符串的长度如何,我都希望保持一致。我喜欢缩进和间距一致时的代码外观。这意味着,我使用:

def sq(n):
    """
    Return the square of n. 
    """
    return n * n

过度:

def sq(n):
    """Returns the square of n."""
    return n * n

并倾向于在较长的文档字符串中省略第一行的注释:

def sq(n):
    """
    Return the square of n, accepting all numeric types:

    >>> sq(10)
    100

    >>> sq(10.434)
    108.86835599999999

    Raises a TypeError when input is invalid:

    >>> sq(4*'435')
    Traceback (most recent call last):
      ...
    TypeError: can't multiply sequence by non-int of type 'str'

    """
    return n*n

意思是我发现像这样开始的文档字符串很乱。

def sq(n):
    """Return the squared result. 
    ...

Docstring conventions are in PEP-257 with much more detail than PEP-8.

However, docstrings seem to be far more personal than other areas of code. Different projects will have their own standard.

I tend to always include docstrings, because they tend to demonstrate how to use the function and what it does very quickly.

I prefer to keep things consistent, regardless of the length of the string. I like how the code looks when indentation and spacing are consistent. That means, I use:

def sq(n):
    """
    Return the square of n. 
    """
    return n * n

Over:

def sq(n):
    """Returns the square of n."""
    return n * n

And tend to leave off commenting on the first line in longer docstrings:

def sq(n):
    """
    Return the square of n, accepting all numeric types:

    >>> sq(10)
    100

    >>> sq(10.434)
    108.86835599999999

    Raises a TypeError when input is invalid:

    >>> sq(4*'435')
    Traceback (most recent call last):
      ...
    TypeError: can't multiply sequence by non-int of type 'str'

    """
    return n*n

Meaning I find docstrings that start like this to be messy.

def sq(n):
    """Return the squared result. 
    ...
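Interactive examples like the ones in the longer docstring above are not only documentation: the standard doctest module can run them. Below is a minimal sketch with a shortened docstring (integer examples only, so the expected output does not depend on float formatting):

```python
import doctest


def sq(n):
    """
    Return the square of n.

    >>> sq(10)
    100
    >>> sq(-3)
    9
    """
    return n * n


# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(sq, "sq", module=False, globs={"sq": sq}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

In a normal module you would more likely call doctest.testmod() or run python -m doctest on the file; the explicit finder and runner are used here only to keep the sketch self-contained.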

回答 3

显然没有人提到它:您还可以使用Numpy Docstring Standard。它在科学界被广泛使用。

用于解析Google样式文档字符串的Napolean狮身人面像扩展名(在@Nathan的答案中建议)也支持Numpy样式文档字符串,并对两者进行简短的比较

最后一个基本示例给出了它的外观:

def func(arg1, arg2):
    """Summary line.

    Extended description of function.

    Parameters
    ----------
    arg1 : int
        Description of arg1
    arg2 : str
        Description of arg2

    Returns
    -------
    bool
        Description of return value

    See Also
    --------
    otherfunc : some related other function

    Examples
    --------
    These are written in doctest format, and should illustrate how to
    use the function.

    >>> a=[1,2,3]
    >>> print([x + 3 for x in a])
    [4, 5, 6]
    """
    return True

As apparently no one has mentioned it: you can also use the Numpy Docstring Standard. It is widely used in the scientific community.

The Napoleon Sphinx extension for parsing Google-style docstrings (recommended in the answer of @Nathan) also supports Numpy-style docstrings, and gives a short comparison of both.

And lastly, a basic example to give an idea of what it looks like:

def func(arg1, arg2):
    """Summary line.

    Extended description of function.

    Parameters
    ----------
    arg1 : int
        Description of arg1
    arg2 : str
        Description of arg2

    Returns
    -------
    bool
        Description of return value

    See Also
    --------
    otherfunc : some related other function

    Examples
    --------
    These are written in doctest format, and should illustrate how to
    use the function.

    >>> a=[1,2,3]
    >>> print([x + 3 for x in a])
    [4, 5, 6]
    """
    return True

回答 4

PEP-8是官方的python编码标准。它包含有关文档字符串的部分,该部分引用了PEP- 257-文档字符串的完整规范。

PEP-8 is the official python coding standard. It contains a section on docstrings, which refers to PEP-257 — a complete specification for docstrings.


回答 5

是Python;一切顺利。考虑如何发布您的文档。除了您的源代码读者以外,文档字符串是不可见的。

人们真的很喜欢浏览和搜索网络上的文档。为此,请使用文档工具Sphinx。这是记录Python项目的实际标准。该产品非常漂亮-请访问https://python-guide.readthedocs.org/en/latest/。“ 阅读文档 ”网站将免费托管您的文档。

It’s Python; anything goes. Consider how to publish your documentation. Docstrings are invisible except to readers of your source code.

People really like to browse and search documentation on the web. To achieve that, use the documentation tool Sphinx. It’s the de-facto standard for documenting Python projects. The product is beautiful – take a look at https://python-guide.readthedocs.org/en/latest/ . The website Read the Docs will host your docs for free.


回答 6

我建议使用Vladimir Keleshev的pep257 Python程序根据PEP-257Numpy Docstring Standard检查您的文档字符串,以描述参数,返回值等。

pep257将报告您与标准的差异,称为pylint和pep8。

I suggest using Vladimir Keleshev’s pep257 Python program to check your docstrings against PEP-257 and the Numpy Docstring Standard for describing parameters, returns, etc.

pep257 will report any divergence from the standard and is invoked like pylint and pep8.


Python的`如果x不是None`或`如果x不是None`?

问题:Python的`如果x不是None`或`如果x不是None`?

我一直认为该if not x is None版本会更清晰,但是Google的样式指南PEP-8都使用if x is not None。是否存在任何微小的性能差异(我假设不是),并且在任何情况下确实不适合(使另一方成为我的会议的明显获胜者)吗?*

*我指的是任何单身人士,而不仅仅是None

…比较单例,如“无”。使用是或不是。

I’ve always thought of the if not x is None version to be more clear, but Google’s style guide and PEP-8 both use if x is not None. Is there any minor performance difference (I’m assuming not), and is there any case where one really doesn’t fit (making the other a clear winner for my convention)?*

*I’m referring to any singleton, rather than just None.

…to compare singletons like None. Use is or is not.


回答 0

没有性能差异,因为它们可以编译为相同的字节码:

Python 2.6.2 (r262:71600, Apr 15 2009, 07:20:39)
>>> import dis
>>> def f(x):
...    return x is not None
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> def g(x):
...   return not x is None
...
>>> dis.dis(g)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

从风格上讲,我尽量避免not x is y。尽管编译器总是将其视为not (x is y)。读者可能会误解为(not x) is y。如果我写的x is not y话就没有歧义。

There’s no performance difference, as they compile to the same bytecode:

Python 2.6.2 (r262:71600, Apr 15 2009, 07:20:39)
>>> import dis
>>> def f(x):
...    return x is not None
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> def g(x):
...   return not x is None
...
>>> dis.dis(g)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

Stylistically, I try to avoid not x is y. Although the compiler will always treat it as not (x is y), a human reader might misunderstand the construct as (not x) is y. If I write x is not y then there is no ambiguity.
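The same comparison can be repeated on Python 3, where the disassembly looks different from the 2.6 output above. This sketch checks that the two spellings behave identically and prints their opcode names; on the CPython versions I am aware of the two lists match, but that is an implementation detail rather than a language guarantee.

```python
import dis


def with_is_not(x):
    return x is not None


def with_not_is(x):
    return not x is None


def opnames(func):
    # Names of the bytecode instructions the function compiles to.
    return [ins.opname for ins in dis.get_instructions(func)]


# Behaviour is identical for every value, whatever the bytecode looks like.
for value in (None, 0, "", [], "text"):
    assert with_is_not(value) == with_not_is(value)

print(opnames(with_is_not))
print(opnames(with_not_is))
```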


回答 1

Google和Python的样式指南都是最佳做法:

if x is not None:
    # Do something about x

使用not x会导致不良结果。

见下文:

>>> x = 1
>>> not x
False
>>> x = [1]
>>> not x
False
>>> x = 0
>>> not x
True
>>> x = [0]         # You don't want to fall in this one.
>>> not x
False

您可能有兴趣了解对Python TrueFalse在Python 中评估了哪些文字:


编辑以下评论:

我只是做了一些测试。先not x is None不取反x,然后与相比较None。实际上,is使用这种方式时,似乎运算符具有更高的优先级:

>>> x
[0]
>>> not x is None
True
>>> not (x is None)
True
>>> (not x) is None
False

因此,not x is None以我的诚实观点,最好避免。


更多编辑:

我只是做了更多测试,可以确认bukzor的评论正确。(至少,我无法证明这一点。)

这意味着if x is not None结果与相同if not x is None。我站得住了。谢谢布克佐。

但是,我的答案仍然是:使用常规if x is not None:]

Both Google’s and Python’s style guides recommend the best practice:

if x is not None:
    # Do something about x

Using not x can cause unwanted results.

See below:

>>> x = 1
>>> not x
False
>>> x = [1]
>>> not x
False
>>> x = 0
>>> not x
True
>>> x = [0]         # You don't want to fall in this one.
>>> not x
False

You may be interested to see which literals evaluate to True or False in Python.


Edit for comment below:

I just did some more testing. not x is None doesn’t negate x first and then compare the result to None. In fact, it seems the is operator has a higher precedence when used that way:

>>> x
[0]
>>> not x is None
True
>>> not (x is None)
True
>>> (not x) is None
False

Therefore, not x is None is just, in my honest opinion, best avoided.


More edit:

I just did more testing and can confirm that bukzor’s comment is correct. (At least, I wasn’t able to prove it otherwise.)

This means if x is not None has exactly the same result as if not x is None. I stand corrected. Thanks bukzor.

However, my answer still stands: Use the conventional if x is not None. :]
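The practical reason to test against None explicitly, rather than relying on truthiness, shows up with empty containers. A small sketch (both function names are made up):

```python
def first_item(items=None):
    if items is not None:  # only None means "argument not given"
        return items[0] if items else "empty"
    return "missing"


def first_item_truthy(items=None):
    if not items:          # also true for [], 0, "" and other falsy values
        return "missing"
    return items[0]


print(first_item([]), first_item_truthy([]))
print(first_item(None), first_item_truthy(None))
```

The truthiness version cannot tell an empty list from a missing argument, which is the kind of unwanted result shown with not x above.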


回答 2

应该首先编写代码,以便程序员首先可以理解,然后再编译器或解释器理解。“不是”构造比“不是”更像英语。

Code should be written to be understandable to the programmer first, and the compiler or interpreter second. The “is not” construct resembles English more closely than “not is”.


回答 3

Python if x is not None还是if not x is None

TLDR:字节码编译器将它们都解析为x is not None-为了便于阅读,请使用if x is not None

可读性

我们之所以使用Python,是因为我们重视诸如人类可读性,可用性和各种编程范式的正确性之类的东西,而不是性能。

Python针对可读性进行了优化,尤其是在这种情况下。

解析和编译字节码

not 结合更弱is,所以这里没有逻辑的差异。请参阅文档

运算符isis not测试对象标识:x is y当且仅当x和y是同一对象时才为true。x is not y产生反真值。

is not有具体规定,在Python 语法作为语言可读性改善:

comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

因此,它也是语法的一个统一要素。

当然,它的解析方式不同:

>>> import ast
>>> ast.dump(ast.parse('x is not None').body[0].value)
"Compare(left=Name(id='x', ctx=Load()), ops=[IsNot()], comparators=[Name(id='None', ctx=Load())])"
>>> ast.dump(ast.parse('not x is None').body[0].value)
"UnaryOp(op=Not(), operand=Compare(left=Name(id='x', ctx=Load()), ops=[Is()], comparators=[Name(id='None', ctx=Load())]))"

但是字节编译器实际上会将转换not ... isis not

>>> import dis
>>> dis.dis(lambda x, y: x is not y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> dis.dis(lambda x, y: not x is y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

因此,为了便于阅读并按预期使用语言,请使用is not

不使用它不明智的。

Python if x is not None or if not x is None?

TLDR: The bytecode compiler parses them both to x is not None – so for readability’s sake, use if x is not None.

Readability

We use Python because we value things like human readability, usability, and correctness of various paradigms of programming over performance.

Python optimizes for readability, especially in this context.

Parsing and Compiling the Bytecode

The not binds more weakly than is, so there is no logical difference here. See the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

The is not is specifically provided for in the Python grammar as a readability improvement for the language:

comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

And so it is a unitary element of the grammar as well.

Of course, it is not parsed the same:

>>> import ast
>>> ast.dump(ast.parse('x is not None').body[0].value)
"Compare(left=Name(id='x', ctx=Load()), ops=[IsNot()], comparators=[Name(id='None', ctx=Load())])"
>>> ast.dump(ast.parse('not x is None').body[0].value)
"UnaryOp(op=Not(), operand=Compare(left=Name(id='x', ctx=Load()), ops=[Is()], comparators=[Name(id='None', ctx=Load())]))"

But then the byte compiler will actually translate the not ... is to is not:

>>> import dis
>>> dis.dis(lambda x, y: x is not y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> dis.dis(lambda x, y: not x is y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

So for the sake of readability and using the language as it was intended, please use is not.

To not use it is not wise.
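The ast.dump output above is from Python 2 and looks different on Python 3, so here is a version-independent sketch that inspects only the node types. It also confirms that not x is None parses exactly like not (x is None):

```python
import ast

explicit = ast.parse("x is not None", mode="eval").body
negated = ast.parse("not x is None", mode="eval").body
parenthesized = ast.parse("not (x is None)", mode="eval").body

# A single comparison using the dedicated "is not" operator.
print(type(explicit).__name__, type(explicit.ops[0]).__name__)

# A unary "not" wrapped around an "is" comparison.
print(type(negated).__name__, type(negated.operand.ops[0]).__name__)

# The parentheses make no difference to the parse tree.
print(ast.dump(negated) == ast.dump(parenthesized))
```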


回答 4

答案比人们做的要简单。

两种方法都没有技术优势,其他人都使用 “ x不是y” ,这显然是赢家。是否“看起来更像英语”并不重要;每个人都使用它,这意味着Python的每个用户-甚至是中国用户,其语言与Python看起来都不像-都将一目了然地理解它,稍稍不常见的语法将需要花费更多的脑力来解析。

至少在这个领域,不要仅仅为了与众不同而与众不同。

The answer is simpler than people are making it.

There’s no technical advantage either way, and “x is not y” is what everybody else uses, which makes it the clear winner. It doesn’t matter that it “looks more like English” or not; everyone uses it, which means every user of Python–even Chinese users, whose language Python looks nothing like–will understand it at a glance, where the slightly less common syntax will take a couple extra brain cycles to parse.

Don’t be different just for the sake of being different, at least in this field.


回答 5

is not由于is风格上的原因,操作员优先于否定结果。“ if x is not None:”的读法类似于英语,但“ if not x is None:”需要理解操作符的优先级,并且读起来并不像英文。

如果有性能上的差异,我会花钱is not,但这几乎肯定不是决定选择该技术的动机。显然,这将取决于实现。由于这is是不可替代的,因此无论如何都应该很容易优化任何区别。

The is not operator is preferred over negating the result of is for stylistic reasons. “if x is not None:” reads just like English, but “if not x is None:” requires understanding of the operator precedence and does not read like English.

If there is a performance difference my money is on is not, but this almost certainly isn’t the motivation for the decision to prefer that technique. It would obviously be implementation-dependent. Since is isn’t overridable, it should be easy to optimise out any distinction anyhow.


回答 6

我个人使用

if not (x is None):

每个程序员,即使不是Python语法专家的程序员,也都可以毫不歧义地立即理解它。

Personally, I use

if not (x is None):

which is understood immediately without ambiguity by every programmer, even those not expert in the Python syntax.


回答 7

if not x is None与其他编程语言更相似,但if x is not None对我来说绝对听起来更清晰(英语语法更正确)。

话虽如此,这似乎对我来说更偏爱。

if not x is None is more similar to other programming languages, but if x is not None definitely sounds more clear (and is more grammatically correct in English) to me.

That said it seems like it’s more of a preference thing to me.


回答 8

我更喜欢可读性强的形式,而x is not y 不是想如何最终写出运算符的代码处理优先级以产生可读性更高的代码。

I prefer the more readable form x is not y over a form that makes the reader think about operator precedence to work out what the code does.


Single quotes vs. double quotes in Python [closed]

Question: Single quotes vs. double quotes in Python [closed]

According to the documentation, they’re pretty much interchangeable. Is there a stylistic reason to use one over the other?


Answer 0

I like to use double quotes around strings that are used for interpolation or that are natural language messages, and single quotes for small symbol-like strings, but will break the rules if the strings contain quotes, or if I forget. I use triple double quotes for docstrings and raw string literals for regular expressions even if they aren’t needed.

For example:

import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate':  "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    # locals() supplies number_of_lights to the %(name)s placeholder.
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None

Answer 1

Quoting the official docs at https://docs.python.org/2.0/ref/strings.html:

In plain English: String literals can be enclosed in matching single quotes (') or double quotes (").

So there is no difference. Instead, people will tell you to choose whichever style matches the context, and to be consistent. I would agree, adding that it is pointless to try to come up with “conventions” for this sort of thing, because you’ll only end up confusing newcomers.
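A quick way to convince yourself of this: both spellings produce the same str object value, and the quote style is not stored anywhere. The sketch below also shows how repr picks its own quotes when displaying a string; that part describes CPython’s behaviour as I understand it, and the variable names are just for illustration.

```python
single = 'spam'
double = "spam"

# The two literals are indistinguishable once parsed.
same_value = (single == double)
same_type = (type(single) is type(double))

# repr() chooses the quoting itself, regardless of how the literal was written:
# single quotes by default, double quotes if the text contains an apostrophe.
shown_plain = repr("spam")
shown_apostrophe = repr('it\'s')
shown_double = repr("say \"hi\"")
```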


Answer 2

I used to prefer ', especially for '''docstrings''', as I find """this creates some fluff""". Also, ' can be typed without the Shift key on my Swiss German keyboard.

I have since changed to using triple double quotes for """docstrings""", to conform to PEP 257.
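For what it’s worth, the interpreter accepts either style as a docstring; the PEP 257 preference is purely a convention. A small sketch, with two made-up functions:

```python
def double_style():
    """Return 1 (docstring in triple double quotes, as PEP 257 recommends)."""
    return 1

def single_style():
    '''Return 1 (docstring in triple single quotes; legal, but less common).'''
    return 1

# Both end up in __doc__ in exactly the same way.
docs = (double_style.__doc__, single_style.__doc__)
```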


Answer 3

I’m with Will:

  • Double quotes for text
  • Single quotes for anything that behaves like an identifier
  • Double-quoted raw string literals for regexps
  • Tripled double quotes for docstrings

I’ll stick with that even if it means a lot of escaping.

I get the most value out of single-quoted identifiers standing out because of the quotes. The rest of the practices are there just to give those single-quoted identifiers some standing room.


Answer 4

If the string you have contains one, then you should use the other. For example, "You're able to do this", or 'He said "Hi!"'. Other than that, you should simply be as consistent as you can (within a module, within a package, within a project, within an organisation).

If your code is going to be read by people who work with C/C++ (or if you switch between those languages and Python), then using '' for single-character strings and "" for longer strings might help ease the transition. (The same goes for following the conventions of other languages in which the two are not interchangeable.)

The Python code I’ve seen in the wild tends to favour " over ', but only slightly. The one exception is that """these""" are much more common than '''these''', from what I have seen.
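Here is a short sketch of the “use the other one” rule. Each pair below is the same string written with and without escaping; the variable names are only illustrative.

```python
# An apostrophe inside: double quotes avoid the backslash.
clean_apostrophe = "You're able to do this"
escaped_apostrophe = 'You\'re able to do this'

# Double quotes inside: single quotes avoid the backslashes.
clean_quotes = 'He said "Hi!"'
escaped_quotes = "He said \"Hi!\""

# Both kinds inside: a triple-quoted literal needs no escaping at all.
clean_both = '''He said "You're late!"'''
escaped_both = 'He said "You\'re late!"'
```

In each pair the two literals compare equal, so the only difference is how noisy the source looks.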


Answer 5

Triple-quoted comments are an interesting subtopic of this question. PEP 257 specifies triple quotes for docstrings. I did a quick check using Google Code Search and found that triple double quotes in Python are about 10x as popular as triple single quotes (1.3M vs. 131K occurrences in the code Google indexes). So in the multi-line case your code is probably going to be more familiar to people if it uses triple double quotes.


Answer 6

"If you're going to use apostrophes,
       ^

you'll definitely want to use double quotes".
   ^

For that simple reason, I always use double quotes on the outside. Always.

Speaking of fluff, what good is streamlining your string literals with ' if you’re going to have to use escape characters to represent apostrophes? Does it offend coders to read novels? I can’t imagine how painful high school English class was for you!


Answer 7

Python uses quotes something like this:

mystringliteral1 = "this is a string with 'quotes'"
mystringliteral2 = 'this is a string with "quotes"'
mystringliteral3 = """this is a string with "quotes" and more 'quotes'"""
mystringliteral4 = '''this is a string with 'quotes' and more "quotes"'''
mystringliteral5 = 'this is a string with \"quotes\"'
# \042 and \047 are the octal escapes for " and ' respectively.
mystringliteral6 = 'this is a string with \042quotes\042'
mystringliteral7 = 'this is a string with \047quotes\047'

print(mystringliteral1)
print(mystringliteral2)
print(mystringliteral3)
print(mystringliteral4)
print(mystringliteral5)
print(mystringliteral6)
print(mystringliteral7)

Which gives the following output:

this is a string with 'quotes'
this is a string with "quotes"
this is a string with "quotes" and more 'quotes'
this is a string with 'quotes' and more "quotes"
this is a string with "quotes"
this is a string with "quotes"
this is a string with 'quotes'

Answer 8

I use double quotes in general, but not for any specific reason; probably just out of habit from Java.

I guess you’re also more likely to want apostrophes in an inline literal string than you are to want double quotes.


Answer 9

Personally, I stick with one or the other. It doesn’t matter. Assigning your own meaning to either kind of quote will only confuse other people when you collaborate.


Answer 10

It’s probably a stylistic preference more than anything. I just checked PEP 8 and didn’t see any mention of single versus double quotes.

I prefer single quotes because it’s only one keystroke instead of two. That is, I don’t have to mash the Shift key to type a single quote.


Answer 11

In Perl you want to use single quotes when you have a string which doesn’t need to interpolate variables or escaped characters like \n, \t, \r, etc.

PHP makes the same distinction as Perl: content in single quotes will not be interpreted (not even \n will be converted), as opposed to double quotes, which can contain variables to have their value printed out.

Python does not, I’m afraid. Technically speaking, there is no $ token (or the like) to separate a name/text from a variable in Python. Both of these traits make Python more readable and less confusing, after all. Single and double quotes can be used interchangeably in Python.
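To illustrate the contrast with Perl and PHP, here is a small sketch (variable names are mine): in Python, escape sequences and %-formatting behave the same in both quote styles, and it is the r prefix, not the kind of quote, that switches escape processing off.

```python
single = 'tab\there'
double = "tab\there"
raw = r'tab\there'

# \t becomes one tab character in both quote styles.
escapes_match = (single == double)

# In the raw literal the backslash and the t stay as two characters.
lengths = (len(single), len(raw))

# %-interpolation is equally indifferent to the quote style.
formatted_single = 'n=%d' % 3
formatted_double = "n=%d" % 3
```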


Answer 12

I chose to use double quotes because they are easier to see.


Answer 13

I just use whatever strikes my fancy at the time; it’s convenient to be able to switch between the two at a whim!

Of course, when quoting quote characters, switching between the two might not be so whimsical after all…


Answer 14

Your team’s taste or your project’s coding guidelines.

If you are in a multilanguage environment, you might wish to encourage the use of the same type of quotes for strings that the other language uses, for instance. Otherwise, I personally like the look of ' best.


Answer 15

None as far as I know. Although if you look at some code, " " is commonly used for strings of text (I guess ' is more common inside text than "), and ' ' appears in hash keys and things like that.


Answer 16

I aim to minimize both pixels and surprise. I typically prefer ' in order to minimize pixels, but " instead if the string has an apostrophe, again to minimize pixels. For a docstring, however, I prefer """ over ''' because the latter is non-standard, uncommon, and therefore surprising. If I have a bunch of strings where I used " per the above logic, but also one that could get away with a ', I may still use " in it to preserve consistency, again to minimize surprise.

Perhaps it helps to think of the pixel minimization philosophy in the following way. Would you rather that English characters looked like A B C or AA BB CC? The latter choice wastes 50% of the non-empty pixels.


Answer 17

I use double quotes because I have been doing so for years in most languages (C++, Java, VB…) except Bash, because I also use double quotes in normal text, and because I’m using a (modified) non-English keyboard where both characters require the Shift key.


Answer 18

' = "

/ = \ = \\

Example:

f = open('c:\word.txt', 'r')
f = open("c:\word.txt", "r")
f = open("c:/word.txt", "r")
f = open("c:\\\word.txt", "r")

The results are the same.

=>> No, they’re not the same. A single backslash escapes the character that follows it. You just happen to luck out in that example because \k and \w aren’t valid escapes like \t or \n or \\ or \".

If you want to use single backslashes (and have them interpreted as such), then you need to use a “raw” string. You can do this by putting an 'r' in front of the string:

im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'
another_way = 'c:/temp.txt'

As far as paths in Windows are concerned, forward slashes are interpreted the same way. Clearly the string itself is different, though. I wouldn’t guarantee that they’re handled this way on an external device.
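The following sketch shows what goes wrong when the character after the backslash is a valid escape, using the same c:\temp.txt path as above. The variable names plain, doubled and forward are mine; ntpath is the standard-library module that implements Windows path rules and can be imported on any platform, so this does not need Windows to run.

```python
import ntpath  # Windows path handling, importable on any OS

plain = 'c:\temp.txt'       # \t is silently read as a tab character
raw = r'c:\temp.txt'        # raw string: the backslash is kept as-is
doubled = 'c:\\temp.txt'    # escaped backslash: same result as the raw string
forward = 'c:/temp.txt'     # a different string that Windows treats alike

raw_equals_doubled = (raw == doubled)
plain_is_broken = ('\t' in plain and plain != raw)
lengths = (len(plain), len(raw))

# Normalising with Windows rules turns the forward slash into a backslash.
normalised = ntpath.normpath(forward)
```

Note that the broken plain version raises no error at all, which is why the raw or doubled form is the safer habit for Windows paths.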