How can I add new keys to a dictionary?

Question: How can I add new keys to a dictionary?

Is it possible to add a key to a Python dictionary after it has been created?

It doesn’t seem to have an .add() method.


Answer 0

d = {'key': 'value'}
print(d)
# {'key': 'value'}
d['mynewkey'] = 'mynewvalue'
print(d)
# {'key': 'value', 'mynewkey': 'mynewvalue'}

Answer 1

To add multiple keys simultaneously, use dict.update():

>>> x = {1:2}
>>> print(x)
{1: 2}

>>> d = {3:4, 5:6, 7:8}
>>> x.update(d)
>>> print(x)
{1: 2, 3: 4, 5: 6, 7: 8}

For adding a single key, the accepted answer has less computational overhead.


Answer 2

I feel like consolidating info about Python dictionaries:

Creating an empty dictionary

data = {}
# OR
data = dict()

Creating a dictionary with initial values

data = {'a': 1, 'b': 2, 'c': 3}
# OR
data = dict(a=1, b=2, c=3)
# OR
data = {k: v for k, v in (('a', 1), ('b',2), ('c',3))}

Inserting/Updating a single value

data['a'] = 1  # Updates if 'a' exists, else adds 'a'
# OR
data.update({'a': 1})
# OR
data.update(dict(a=1))
# OR
data.update(a=1)

Inserting/Updating multiple values

data.update({'c':3,'d':4})  # Updates 'c' and adds 'd'

Creating a merged dictionary without modifying originals

data3 = {}
data3.update(data)  # Modifies data3, not data
data3.update(data2)  # Modifies data3, not data2

Deleting items in dictionary

del data[key]  # Removes specific element in a dictionary
data.pop(key)  # Removes the key & returns the value
data.clear()  # Clears entire dictionary

Check if a key is already in dictionary

key in data

Iterate through pairs in a dictionary

for key in data: # Iterates just through the keys, ignoring the values
for key, value in data.items(): # Iterates through the pairs
for key in data.keys(): # Iterates just through the keys, ignoring the values
for value in data.values(): # Iterates just through the values, ignoring the keys

Create a dictionary from two lists

data = dict(zip(list_with_keys, list_with_values))
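
For instance, with illustrative lists (the names here are my own, not from the answer):

keys = ['a', 'b', 'c']
values = [1, 2, 3]
data = dict(zip(keys, values))  # {'a': 1, 'b': 2, 'c': 3}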

New to Python 3.5

Creating a merged dictionary without modifying originals:

This uses a new feature called dictionary unpacking.

data = {**data1, **data2, **data3}

New to Python 3.9

Update or add values for an existing dictionary

The update operator |= now works for dictionaries:

data |= {'c':3,'d':4}

Creating a merged dictionary without modifying originals

The merge operator | now works for dictionaries:

data = data1 | {'c':3,'d':4}

Feel free to add more!


Answer 3

“Is it possible to add a key to a Python dictionary after it has been created? It doesn’t seem to have an .add() method.”

Yes it is possible, and it does have a method that implements this, but you don’t want to use it directly.

To demonstrate how and how not to use it, let’s create an empty dict with the dict literal, {}:

my_dict = {}

Best Practice 1: Subscript notation

To update this dict with a single new key and value, you can use the subscript notation (see Mappings here) that provides for item assignment:

my_dict['new key'] = 'new value'

my_dict is now:

{'new key': 'new value'}

Best Practice 2: The update method – 2 ways

We can also update the dict with multiple values efficiently as well using the update method. We may be unnecessarily creating an extra dict here, so we hope our dict has already been created and came from or was used for another purpose:

my_dict.update({'key 2': 'value 2', 'key 3': 'value 3'})

my_dict is now:

{'key 2': 'value 2', 'key 3': 'value 3', 'new key': 'new value'}

Another efficient way of doing this with the update method is with keyword arguments, but since they have to be legitimate python words, you can’t have spaces or special symbols or start the name with a number, but many consider this a more readable way to create keys for a dict, and here we certainly avoid creating an extra unnecessary dict:

my_dict.update(foo='bar', foo2='baz')

and my_dict is now:

{'key 2': 'value 2', 'key 3': 'value 3', 'new key': 'new value', 
 'foo': 'bar', 'foo2': 'baz'}

So now we have covered three Pythonic ways of updating a dict.


Magic method, __setitem__, and why it should be avoided

There’s another way of updating a dict that you shouldn’t use, which uses the __setitem__ method. Here’s an example of how one might use the __setitem__ method to add a key-value pair to a dict, and a demonstration of the poor performance of using it (note that the timing code below is Python 2; in Python 3, replace xrange with range):

>>> d = {}
>>> d.__setitem__('foo', 'bar')
>>> d
{'foo': 'bar'}


>>> def f():
...     d = {}
...     for i in xrange(100):
...         d['foo'] = i
... 
>>> def g():
...     d = {}
...     for i in xrange(100):
...         d.__setitem__('foo', i)
... 
>>> import timeit
>>> number = 100
>>> min(timeit.repeat(f, number=number))
0.0020880699157714844
>>> min(timeit.repeat(g, number=number))
0.005071878433227539

So we see that using the subscript notation is actually much faster than using __setitem__. Doing the Pythonic thing, that is, using the language in the way it was intended to be used, usually is both more readable and computationally efficient.


Answer 4

dictionary[key] = value
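
A minimal usage sketch (my illustration; the names are arbitrary):

d = {'a': 1}
d['b'] = 2   # adds the new key 'b'
d['a'] = 10  # overwrites the existing key 'a'
print(d)     # {'a': 10, 'b': 2}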

Answer 5

If you want to add a dictionary within a dictionary you can do it this way.

Example: Add a new entry to your dictionary & sub dictionary

dictionary = {}
dictionary["new key"] = "some new entry" # add new dictionary entry
dictionary["dictionary_within_a_dictionary"] = {} # this is required by python
dictionary["dictionary_within_a_dictionary"]["sub_dict"] = {"other" : "dictionary"}
print (dictionary)

Output:

{'new key': 'some new entry', 'dictionary_within_a_dictionary': {'sub_dict': {'other': 'dictionary'}}}

NOTE: Python requires that you first create the sub-dictionary

dictionary["dictionary_within_a_dictionary"] = {}

before adding entries.


Answer 6

The orthodox syntax is d[key] = value, but if your keyboard is missing the square bracket keys you could do:

d.__setitem__(key, value)

In fact, defining __getitem__ and __setitem__ methods is how you can make your own class support the square bracket syntax. See https://python.developpez.com/cours/DiveIntoPython/php/endiveintopython/object_oriented_framework/special_class_methods.php
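
As a rough sketch of that idea (my own example, not from the answer), a class only needs these two methods to support bracket syntax:

class SquareBrackets:
    """Minimal class whose instances support obj[key] syntax."""

    def __init__(self):
        self._store = {}  # internal dict holding the data

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

obj = SquareBrackets()
obj['answer'] = 42    # calls obj.__setitem__('answer', 42)
print(obj['answer'])  # calls obj.__getitem__('answer'), prints 42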


Answer 7

You can create one:

class myDict(dict):

    def __init__(self):
        # note: rebinding the local name `self` has no effect; dict's
        # normal initialization already applies, so this __init__
        # could simply be omitted
        self = dict()

    def add(self, key, value):
        self[key] = value

## example

myd = myDict()
myd.add('apples',6)
myd.add('bananas',3)
print(myd)

Gives:

>>> 
{'apples': 6, 'bananas': 3}

Answer 8

This popular question addresses functional methods of merging dictionaries a and b.

Here are some of the more straightforward methods (tested in Python 3)…

c = dict( a, **b ) ## see also https://stackoverflow.com/q/2255878
c = dict( list(a.items()) + list(b.items()) )
c = dict( i for d in [a,b] for i in d.items() )

Note: The first method above only works if the keys in b are strings.

To add or modify a single element, the b dictionary would contain only that one element…

c = dict( a, **{'d':'dog'} ) ## returns a dictionary based on 'a'

This is equivalent to…

def functional_dict_add( dictionary, key, value ):
    temp = dictionary.copy()
    temp[key] = value
    return temp

c = functional_dict_add( a, 'd', 'dog' )

Answer 9

Let’s pretend you want to live in the immutable world and do NOT want to modify the original but want to create a new dict that is the result of adding a new key to the original.

In Python 3.5+ you can do:

params = {'a': 1, 'b': 2}
new_params = {**params, **{'c': 3}}

The Python 2 equivalent is:

params = {'a': 1, 'b': 2}
new_params = dict(params, **{'c': 3})

After either of these:

params is still equal to {'a': 1, 'b': 2}

and

new_params is equal to {'a': 1, 'b': 2, 'c': 3}

There will be times when you don’t want to modify the original (you only want the result of adding to the original). I find this a refreshing alternative to the following:

params = {'a': 1, 'b': 2}
new_params = params.copy()
new_params['c'] = 3

or

params = {'a': 1, 'b': 2}
new_params = params.copy()
new_params.update({'c': 3})

Reference: https://stackoverflow.com/a/2255892/514866


Answer 10

So many answers and still everybody forgot about the strangely named, oddly behaved, and yet still handy dict.setdefault()

This

value = my_dict.setdefault(key, default)

basically just does this:

try:
    value = my_dict[key]
except KeyError: # key not found
    value = my_dict[key] = default

e.g.

>>> mydict = {'a':1, 'b':2, 'c':3}
>>> mydict.setdefault('d', 4)
4 # returns new value at mydict['d']
>>> print(mydict)
{'a':1, 'b':2, 'c':3, 'd':4} # a new key/value pair was indeed added
# but see what happens when trying it on an existing key...
>>> mydict.setdefault('a', 111)
1 # old value was returned
>>> print(mydict)
{'a':1, 'b':2, 'c':3, 'd':4} # existing key was ignored

Answer 11

If you’re not joining two dictionaries, but adding new key-value pairs to a dictionary, then using the subscript notation seems like the best way.

import timeit

timeit.timeit('dictionary = {"karga": 1, "darga": 2}; dictionary.update({"aaa": 123123, "asd": 233})')
>> 0.49582505226135254

timeit.timeit('dictionary = {"karga": 1, "darga": 2}; dictionary["aaa"] = 123123; dictionary["asd"] = 233;')
>> 0.20782899856567383

However, if you’d like to add, for example, thousands of new key-value pairs, you should consider using the update() method.
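
A sketch of how one might check that claim at a larger scale (my illustration, not from the answer; absolute timings will vary by machine):

import timeit

pairs = {'key%d' % i: i for i in range(10000)}  # illustrative bulk data

def with_subscript():
    d = {}
    for k, v in pairs.items():
        d[k] = v

def with_update():
    d = {}
    d.update(pairs)  # one bulk call instead of 10000 assignments

print(timeit.timeit(with_subscript, number=100))
print(timeit.timeit(with_update, number=100))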


Answer 12

I think it would also be useful to point out Python’s collections module that consists of many useful dictionary subclasses and wrappers that simplify the addition and modification of data types in a dictionary, specifically defaultdict:

dict subclass that calls a factory function to supply missing values

This is particularly useful if you are working with dictionaries that always consist of the same data types or structures, for example a dictionary of lists.

>>> from collections import defaultdict
>>> example = defaultdict(int)
>>> example['key'] += 1
>>> example
defaultdict(<class 'int'>, {'key': 1})

If the key does not yet exist, defaultdict assigns the value produced by the given factory (in our case int(), i.e. 0) as the initial value in the dictionary (this is often used inside loops). This operation therefore does two things: it adds a new key to the dictionary (as per the question) and assigns the value if the key doesn’t yet exist. With a standard dictionary, this would have raised an error, as the += operation tries to access a value that doesn’t yet exist:

>>> example = dict()
>>> example['key'] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'key'

Without the use of defaultdict, the amount of code to add a new element would be much greater and perhaps looks something like:

# This type of code would often be inside a loop
if 'key' not in example:
    example['key'] = 0  # add key and initial value to dict; could also be a list
example['key'] += 1  # this is implementing a counter

defaultdict can also be used with complex data types such as list and set:

>>> example = defaultdict(list)
>>> example['key'].append(1)
>>> example
defaultdict(<class 'list'>, {'key': [1]})

Adding an element automatically initialises the list.
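
A parallel sketch with a set (my addition, for completeness):

>>> example = defaultdict(set)
>>> example['key'].add(1)  # the missing key is initialised to an empty set
>>> example
defaultdict(<class 'set'>, {'key': {1}})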


Answer 13

Here’s another way that I didn’t see here:

>>> foo = dict(a=1,b=2)
>>> foo
{'a': 1, 'b': 2}
>>> goo = dict(c=3,**foo)
>>> goo
{'c': 3, 'a': 1, 'b': 2}

You can use the dictionary constructor and implicit expansion to reconstruct a dictionary. Moreover, interestingly, this method can be used to control the positional order during dictionary construction (post Python 3.6). In fact, insertion order is guaranteed for Python 3.7 and above!

>>> foo = dict(a=1,b=2,c=3,d=4)
>>> new_dict = {k: v for k, v in list(foo.items())[:2]}
>>> new_dict
{'a': 1, 'b': 2}
>>> new_dict.update(newvalue=99)
>>> new_dict
{'a': 1, 'b': 2, 'newvalue': 99}
>>> new_dict.update({k: v for k, v in list(foo.items())[2:]})
>>> new_dict
{'a': 1, 'b': 2, 'newvalue': 99, 'c': 3, 'd': 4}
>>> 

The examples above use dictionary comprehensions.


Answer 14

First, check whether the key already exists:

>>> a = {1: 2, 3: 4}
>>> a.get(1)
2
>>> print(a.get(5))
None

Then you can add the new key and value.
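
A short sketch completing that flow (my addition, following the answer's description):

>>> a = {1: 2, 3: 4}
>>> if a.get(5) is None:
...     a[5] = 6  # key 5 was absent, so add it
...
>>> a
{1: 2, 3: 4, 5: 6}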


Answer 15

A dict subclass with an add method that reports keys that are already in use:

class myDict(dict):

    def __init__(self):
        self = dict()

    def add(self, key, value):
        #self[key] = value # add new key and value, overwriting any existing key
        if self.get(key) is not None:
            print('key', key, 'already used') # report if key already used
        self.setdefault(key, value) # if the key already exists, do nothing


## example

myd = myDict()
name = "fred"

myd.add('apples',6)
print('\n', myd)
myd.add('bananas',3)
print('\n', myd)
myd.add('jack', 7)
print('\n', myd)
myd.add(name, myd)
print('\n', myd)
myd.add('apples', 23)
print('\n', myd)
myd.add(name, 2)
print(myd)

“Least Astonishment” and the Mutable Default Argument

Question: “Least Astonishment” and the Mutable Default Argument

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()

A manager of mine once had his first encounter with this feature, and called it “a dramatic design flaw” of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don’t understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

Edit:

Baczek made an interesting example. Together with most of your comments and Utaal’s in particular, I elaborated further:

>>> def a():
...     print("a executed")
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print(x)
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or “together” with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be “hybrid” in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.

The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.


Answer 0

Actually, this is not a design flaw, and it is not because of internals, or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

As soon as you start to think of it this way, it completely makes sense: a function is an object being evaluated on its definition; default parameters are a kind of “member data” and therefore their state may change from one call to the other – exactly as in any other object.

In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.


Answer 1

Suppose you have the following code

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple ("apples", "bananas", "loganberries")

However, supposed later on in the code, I do something like

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

then if default parameters were bound at function execution rather than function declaration then I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.

The real problem lies with mutable variables, and all languages have this problem to some extent. Here’s a question: suppose in Java I have the following code:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can’t seem to retrieve their object even though the key they’re using is literally the same object that was used to put it into the map (this is actually why Python doesn’t allow its mutable built-in data types to be used as dictionary keys).

Your example is a good one of a case where Python newcomers will be surprised and bitten. But I’d argue that if we “fixed” this, then that would only create a different situation where they’d be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they’re writing.

I personally like Python’s current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.


Answer 2

The relevant part of the documentation:

Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:

def whats_on_the_telly(penguin=None):
    if penguin is None:
        penguin = []
    penguin.append("property of the zoo")
    return penguin

Answer 3

I know nothing about the Python interpreter's inner workings (and I'm not an expert in compilers and interpreters either), so don't blame me if I propose anything nonsensical or impossible.

Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:

a = []

you expect to get a new list referenced by a.

Why should the a=[] in

def x(a=[]):

instantiate a new list on function definition and not on invocation? It’s just like you’re asking “if the user doesn’t provide the argument then instantiate a new list and use it as if it was produced by the caller”. I think this is ambiguous instead:

def x(a=datetime.datetime.now()):

user, do you want a to default to the datetime corresponding to when you’re defining or executing x? In this case, as in the previous one, I’ll keep the same behaviour as if the default argument “assignment” was the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:

b = datetime.datetime.now()
def x(a=b):

I know, I know: that’s a closure. Alternatively Python might provide a keyword to force definition-time binding:

def x(static a=b):

Answer 4

Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well… when the function is defined.

Compare this:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it’s added to all instances of that class. The reason is exactly the same.
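
A quick demonstration of that sharing (my addition, not part of the original answer):

bunch1 = BananaBunch()
bunch2 = BananaBunch()
bunch1.addBanana("cavendish")
print(bunch2.bananas)  # ['cavendish'] - both instances share the class attribute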

It’s just “How It Works”, and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.

Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it’s a good teaching aid, and once you understand why this happens, you’ll grok python much better.

That said, it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.


Answer 5

Why don’t you introspect?

I’m really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.

Given a simple little function func defined as:

>>> def func(a = []):
...    a.append(5)

When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.

So, let’s do some introspection, a before and after to examine how the list gets expanded inside the function object. I’m using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).

Function Before Execution:

>>> def func(a = []):
...     a.append(5)
...     

After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):

>>> func.__defaults__
([],)

O.k, so an empty list as the single entry in __defaults__, just as expected.

Function After Execution:

Let’s now execute this function:

>>> func()

Now, let’s see those __defaults__ again:

>>> func.__defaults__
([5],)

Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:

>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)

So, there you have it, the reason why this ‘flaw’ happens, is because default arguments are part of the function object. There’s nothing weird going on here, it’s all just a bit surprising.

The common solution to combat this is to use None as the default and then initialize in the function body:

def func(a = None):
    # or: a = [] if a is None else a
    if a is None:
        a = []

Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.


To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed referring to the same list instance:

>>> def func(a = []): 
...     a.append(5)
...     return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True

All with the power of introspection!


* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:

def bar(a=input('Did you just see me without calling the function?')): 
    pass  # use raw_input in Py2

As you’ll notice, input() is called before the process of building the function and binding it to the name bar is complete.


Answer 6

I used to think that creating the objects at runtime would be the better approach. I’m less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:

1. Performance

def foo(arg=something_expensive_to_compute()):
    ...

If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You’d either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.

2. Forcing bound parameters

A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:

funcs = [ lambda i=i: i for i in range(10)]

This returns a list of functions that return 0,1,2,3… respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.

The only way to implement this otherwise would be to create a further closure with the i bound, ie:

def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]

3. Introspection

Consider the code:

def foo(a='test', b=100, c=[]):
   print a,b,c

We can get information about the arguments and defaults using the inspect module:

>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))

This information is very useful for things like document generation, metaprogramming, decorators etc.

Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:

_undefined = object()  # sentinel value

def foo(a=_undefined, b=_undefined, c=_undefined):
    if a is _undefined: a='test'
    if b is _undefined: b=100
    if c is _undefined: c=[]

However, we’ve lost the ability to introspect, and see what the default arguments are. Because the objects haven’t been constructed, we can’t ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.


Answer 7

5 points in defense of Python

  1. Simplicity: The behavior is simple in the following sense: Most people fall into this trap only once, not several times.

  2. Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.

  3. Usefulness: As Frederik Lundh points out in his explanation of “Default Parameter Values in Python”, the current behavior can be quite useful for advanced programming. (Use sparingly; see the sketch after this list.)

  4. Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an “Important warning” in the first subsection of Section “More on Defining Functions”. The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.

  5. Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently better understand the point “Consistency” above and that will teach you a great deal about Python.
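
As an illustration of point 3 (my own sketch of a commonly cited pattern, not taken from the answer), a mutable default can serve as a cheap per-function cache:

def fib(n, _cache={}):
    # _cache is created once, at definition time, and shared across
    # calls, so it memoizes previously computed results
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(100))  # fast, thanks to the shared default dict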


Answer 8

This behavior is easily explained by:

  1. function (class etc.) declaration is executed only once, creating all default value objects
  2. everything is passed by reference

So:

def x(a=0, b=[], c=[], d=0):
    a = a + 1
    b = b + [1]
    c.append(1)
    print a, b, c  # Python 2 print statement; use print(a, b, c) in Python 3
  1. a doesn’t change – every assignment creates a new int object – and the new object is printed
  2. b doesn’t change – a new list is built from the default value and printed
  3. c changes – the operation is performed on the same object – and it is printed

Answer 9

What you’re asking is why this:

def func(a=[], b = 2):
    pass

isn’t internally equivalent to this:

def func(a=None, b = None):
    a_default = lambda: []
    b_default = lambda: 2
    def actual_func(a=None, b=None):
        if a is None: a = a_default()
        if b is None: b = b_default()
    return actual_func
func = func()

except for the case of explicitly calling func(None, None), which we’ll ignore.

In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?

One answer is probably right there–it would effectively turn every function with default parameters into a closure. Even if it’s all hidden away in the interpreter and not a full-blown closure, the data’s got to be stored somewhere. It’d be slower and use more memory.


Answer 10

1) The so-called problem of the “Mutable Default Argument” is in general a special example demonstrating that:
“All functions with this problem also suffer from a similar side-effect problem on the actual parameter.”
This is against the rules of functional programming, is usually undesirable, and the two should be fixed together.

Example:

def foo(a=[]):                 # the same problematic function
    a.append(5)
    return a

>>> somevar = [1, 2]           # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5]                      # usually expected [1, 2]

Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.

def foo(a=[]):
    a = a[:]     # a copy
    a.append(5)
    return a     # or everything safe by one line: "return a + [5]"

Many builtin mutable types have a copy method, like some_dict.copy() or some_set.copy(), or can be copied easily, like somelist[:] or list(some_list). Every object can also be copied by copy.copy(any_object), or more thoroughly by copy.deepcopy() (the latter is useful if the mutable object is composed of mutable objects). Some objects are fundamentally based on side effects, like the “file” object, and cannot be meaningfully reproduced by copying.

An example problem, from a similar SO question:

class Test(object):            # the original problematic class
  def __init__(self, var1=[]):
    self._var1 = var1

somevar = [1, 2]               # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar                  # [1, 2, [1]] but usually expected [1, 2]
print t2._var1                 # [1, 2, [1]] but usually expected [1, 2]

Neither should it be saved in any public attribute of an instance returned by this function. (Assuming that, by convention, private attributes of an instance should not be modified from outside this class or its subclasses; i.e. _var1 is a private attribute.)

Conclusion:
Input parameter objects shouldn’t be modified in place (mutated), nor should they be bound into an object returned by the function. (That is, if we prefer programming without side effects, which is strongly recommended; see the Wikipedia article on “side effect” – the first two paragraphs are relevant in this context.)

2)
If the side effect on the actual parameter is required but is unwanted on the default parameter, the useful solution is def ...(var1=None): if var1 is None: var1 = []

3) In some cases, the mutable behavior of default parameters is useful.


Answer 11

This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

>>> def foo(a):
    a.append(5)
    print a

>>> a  = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]

No default values in sight in this code, but you get exactly the same problem.

The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn’t expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn’t return the list (since the caller already has a reference to that list; the one it just passed in).

Your original foo, with a default argument, shouldn’t be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we’re in Python or not and whether there are default arguments involved or not.

If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.


Answer 12

This is already a busy topic, but from what I read here, the following helped me realize how it works internally:

def bar(a=[]):
     print id(a)
     a = a + [1]
     print id(a)
     return a

>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never changes; this is the 'class property' of the function
4484523720 # Always a new object
[1]
>>> id(bar.func_defaults[0])  # Python 2 attribute; use bar.__defaults__ in Python 3
4484370232

Answer 13

It’s a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?

def print_tuple(some_tuple=(1,2,3)):
    print some_tuple

print_tuple()        #1
print_tuple((1,2,3)) #2

I’ll give you a hint. Here’s the disassembly (see http://docs.python.org/library/dis.html):

#1

0 LOAD_GLOBAL              0 (print_tuple)
3 CALL_FUNCTION            0
6 POP_TOP
7 LOAD_CONST               0 (None)
10 RETURN_VALUE

#2

 0 LOAD_GLOBAL              0 (print_tuple)
 3 LOAD_CONST               4 ((1, 2, 3))
 6 CALL_FUNCTION            1
 9 POP_TOP
10 LOAD_CONST               0 (None)
13 RETURN_VALUE

I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)

As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it’s a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn’t C. In C you have constants that are pretty much free. In Python you don’t have this benefit.


Answer 14

Python: The Mutable Default Argument

Default arguments get evaluated at the time the function is compiled into a function object. Every time the function uses them, they are and remain the same object.

When they are mutable and get mutated (for example, by adding an element), they remain mutated on consecutive calls.

They stay mutated because they are the same object each time.

Equivalent code:

Since the list is bound to the function when the function object is compiled and instantiated, this:

def foo(mutable_default_argument=[]): # make a list the default argument
    """function that uses a list"""

is almost exactly equivalent to this:

_a_list = [] # create a list in the globals

def foo(mutable_default_argument=_a_list): # make it the default argument
    """function that uses a list"""

del _a_list # remove globals name binding

Demonstration

Here’s a demonstration – you can verify that they are the same object each time they are referenced by

  • seeing that the list is created before the function has finished compiling to a function object,
  • observing that the id is the same each time the list is referenced,
  • observing that the list stays changed when the function that uses it is called a second time,
  • observing the order in which the output is printed from the source (which I conveniently numbered for you):

example.py

print('1. Global scope being evaluated')

def create_list():
    '''noisily create a list for usage as a kwarg'''
    l = []
    print('3. list being created and returned, id: ' + str(id(l)))
    return l

print('2. example_function about to be compiled to an object')

def example_function(default_kwarg1=create_list()):
    print('appending "a" in default default_kwarg1')
    default_kwarg1.append("a")
    print('list with id: ' + str(id(default_kwarg1)) + 
          ' - is now: ' + repr(default_kwarg1))

print('4. example_function compiled: ' + repr(example_function))


if __name__ == '__main__':
    print('5. calling example_function twice!:')
    example_function()
    example_function()

and running it with python example.py:

1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']

Does this violate the principle of “Least Astonishment”?

This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.

The usual instruction to new Python users:

But this is why the usual instruction to new users is to create their default arguments like this instead:

def example_function_2(default_kwarg=None):
    if default_kwarg is None:
        default_kwarg = []

This uses the None singleton as a sentinel object to tell the function whether or not we’ve gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.

As the tutorial section on control flow says:

If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:

def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Answer 15

The shortest answer would probably be “definition is execution”, therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:

def a(): return []

def b(x=a()):
    print(x)

Hopefully this is enough to show that deferring evaluation of the default argument expressions past the execution of the def statement would be awkward, would make little sense, or both.

I agree it’s a gotcha when you try to use default constructors, though.
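
To make the “definition is execution” point concrete, here is a runnable (Python 3) variant of the snippet above, with a print added so you can see when a() runs; the printed text is illustrative only:

def a():
    print('a() called')   # the side effect makes the evaluation visible
    return []

def b(x=a()):             # 'a() called' is printed here, when def executes
    print(x)

b()   # prints [] -- a() is not called again
b()   # prints [] -- the same default list is reused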


Answer 16

A simple workaround using None

>>> def bar(b, data=None):
...     data = data or []
...     data.append(b)
...     return data
... 
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
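
One caveat worth knowing about this workaround: data = data or [] replaces any falsy argument, not just a missing one, so a caller-supplied empty list is silently swapped for a new list. A small sketch (the function names here are made up for illustration):

def bar_or(b, data=None):
    data = data or []        # replaces [] as well as None
    data.append(b)
    return data

def bar_is_none(b, data=None):
    if data is None:         # replaces only the missing-argument case
        data = []
    data.append(b)
    return data

shared = []
bar_or(1, shared)
print(shared)                # [] -- the caller's list was not touched
bar_is_none(2, shared)
print(shared)                # [2] -- the caller's list was appended to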

Answer 17

This behavior is not surprising if you take the following into consideration:

  1. The behavior of read-only class attributes upon assignment attempts, and that
  2. Functions are objects (explained well in the accepted answer).

The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not “intuitive” when coming from other languages.

(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:

…all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).

Look back to the original example and consider the above points:

def foo(a=[]):
    a.append(5)
    return a

Here foo is an object and a is an attribute of foo (available at foo.__defaults__[0]; on Python 2, foo.func_defaults[0]). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default’s value from foo.__defaults__. In this case, foo.__defaults__[0] is used for a within the function object’s code scope. Changes to a change foo.__defaults__[0], which is part of the foo object and persists between executions of the code in foo.

Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:

def foo(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:

  • When the foo function object is instantiated, foo.__defaults__[0] is set to None, an immutable object.
  • When the function is executed with defaults (with no parameter specified for L in the function call), foo.__defaults__[0] (None) is available in the local scope as L.
  • Upon L = [], the assignment cannot succeed at foo.__defaults__[0], because that attribute is read-only.
  • Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.__defaults__[0] thus remains unchanged for future invocations of foo (the sketch below verifies this directly).
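
A quick way to verify the mechanism this answer describes is to inspect the function object’s __defaults__ tuple directly (a minimal sketch; func_defaults is the Python 2 spelling):

def foo(a=[]):
    a.append(5)
    return a

print(foo.__defaults__)              # ([],)
foo()
print(foo.__defaults__)              # ([5],) -- the stored default was mutated
print(foo() is foo.__defaults__[0])  # True -- every call gets the same object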

Answer 18

I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).

As others have extensively commented, the list parameter is bound to the function when it is defined, as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list, which may have been altered by any other calls to the function. Worse yet, two callers may use this function’s shared default at the same time, each oblivious to the changes made by the other.

Wrong Method (probably…):

def foo(list_arg=[5]):
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]  

# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()             
7

You can verify that they are one and the same object by using id:

>>> id(a)
5347866528

>>> id(b)
5347866528

Per Brett Slatkin’s “Effective Python: 59 Specific Ways to Write Better Python”, Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.

This implementation ensures that each call to the function either receives the default list or else the list passed to the function.

Preferred Method:

def foo(list_arg=None):
   """
   :param list_arg:  A list of input values.
                     If none provided, uses a list with a default value of 5.
   """
   if list_arg is None:
       list_arg = [5]
   return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
>>> b
[5, 7]

c = foo([10])
c.append(11)
>>> c
[10, 11]

There may be legitimate use cases for the ‘Wrong Method’ whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.


Answer 19

The solutions here are:

  1. Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
  2. Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).

The second option is nice because users of the function can pass in a callable, which may already exist (such as a type).
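
A simplified sketch of the second option (without the try block the author mentions; the names factory and append_one are made up for illustration): the default is a zero-argument callable that is invoked inside the function, so each call builds a fresh value, and callers can substitute any callable they like:

def append_one(item, factory=list):
    container = factory()            # a fresh object on every call
    container.append(item)
    return container

print(append_one(1))                        # [1]
print(append_one(2))                        # [2] -- no sharing between calls
print(append_one(3, factory=lambda: [0]))   # [0, 3]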


Answer 20

When we do this:

def foo(a=[]):
    ...

… we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let’s temporarily give the unnamed list a name. How about pavlo?

def foo(a=pavlo):
   ...

At any time, if the caller doesn’t tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable) and foo ends up modifying it, we notice the effect the next time foo is called without specifying a.

So this is what you see (Remember, pavlo is initialized to []):

 >>> foo()
 [5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

>>> foo()
[5, 5, 5]
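
For reference, the whole walk-through above is runnable if we assume foo has the body from the original question (append 5 and return the list):

def foo(a=[]):
    a.append(5)
    return a

print(foo())         # [5]
print(foo())         # [5, 5]
ivan = [1, 2, 3, 4]
print(foo(a=ivan))   # [1, 2, 3, 4, 5]
print(ivan)          # [1, 2, 3, 4, 5]
print(foo())         # [5, 5, 5]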

Answer 21

I sometimes exploit this behavior as an alternative to the following pattern:

singleton = None

def use_singleton():
    global singleton

    if singleton is None:
        singleton = _make_singleton()

    return singleton.use_me()

If singleton is only used by use_singleton, I like the following pattern as a replacement:

# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
    return singleton.use_me()

I’ve used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.

Since I don’t think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
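
As a sketch of the memoization use mentioned above (fib and _cache are illustrative names), the mutable default is created once at def time and deliberately shared across calls:

def fib(n, _cache={}):       # _cache is intentionally shared between calls
    if n < 2:
        return n
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(50))               # 12586269025, without exponential recursion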


Answer 22

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.


Answer 23

It may be true that:

  1. Someone is using every language/library feature, and
  2. Switching the behavior here would be ill-advised, but

it is entirely consistent to hold to both of the features above and still make another point:

  3. It is a confusing feature and it is unfortunate in Python.

The other answers, or at least some of them, either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.

It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano’s opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,

The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working – not because of a design flaw – I mean, hidden logic puzzle – that cuts against the intuitions of programmers who are drawn to Python because it Just Works.


Answer 24

This is not a design flaw. Anyone who trips over this is doing something wrong.

There are 3 cases I see where you might run into this problem:

  1. You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you’re abusing the argument list to have function attributes, e.g. cache={}, and you wouldn’t be expected to call the function with an actual argument at all.
  2. You intend to leave the argument unmodified, but you accidentally did modify it. That’s a bug, fix it.
  3. You intend to modify the argument for use inside the function, but didn’t expect the modification to be visible outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language, so it doesn’t make the copy for you; you need to be explicit about it.

The example in the question could fall into category 1 or 3. It’s odd that it both modifies the passed list and returns it; you should pick one or the other.
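
For case 3, a minimal sketch (the function and variable names are made up for illustration) of making the copy explicit so that modifications stay local:

def sorted_preview(items=()):
    work = list(items)       # explicit copy -- safe for the default too
    work.sort()
    return work[:3]

data = [3, 1, 2]
print(sorted_preview(data))  # [1, 2, 3]
print(data)                  # [3, 1, 2] -- the caller's list is unchanged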


Answer 25

This “bug” gave me a lot of overtime work hours! But I’m beginning to see a potential use for it (though I would still have preferred it to happen at execution time).

I’ll give you what I see as a useful example.

def example(errors=[]):
    # statements
    # Something went wrong
    mistake = True
    if mistake:
        tryToFixIt(errors)
        # Didn't work.. let's try again
        tryToFixItAnotherway(errors)
        # This time it worked
    return errors

def tryToFixIt(err):
    err.append('Attempt to fix it')

def tryToFixItAnotherway(err):
    err.append('Attempt to fix it by another way')

def main():
    for item in range(2):
        errors = example()
    print('\n'.join(errors))

main()

prints the following

Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way

Answer 26

Just change the function to be:

def notastonishinganymore(a = []): 
    '''The name is just a joke :)'''
    a = a[:]
    a.append(5)
    return a

Answer 27

I think the answer to this question lies in how Python passes data to parameters (by value or by reference), not in mutability or in how Python handles the def statement.

A brief introduction. First, there are two types of data in Python: one is simple elementary data types, like numbers, and the other is objects. Second, when passing data to parameters, Python passes elementary data types by value, i.e., it makes a local copy of the value for the local variable, but it passes objects by reference, i.e., as pointers to the object.

Admitting the above two points, let’s explain what happened in the Python code. It’s only because of passing objects by reference; it has nothing to do with mutable/immutable, or, arguably, with the fact that the def statement is executed only once when it is defined.

[] is an object, so Python passes the reference of [] to a, i.e., a is only a pointer to the [] which lies in memory as an object. There is only one copy of [], however, with many references to it. For the first foo(), the list [] is changed to [1] by the append method. Note that there is only one copy of the list object, and this object now becomes [1]. When running the second foo(), what the effbot webpage says (that items is not evaluated any more) is wrong. a is evaluated to be the list object, although now the content of the object is [1]. This is the effect of passing by reference! The result of foo(3) can easily be derived in the same way.

To further validate my answer, let’s take a look at two additional codes.

====== No. 2 ========

def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items

foo(1)  #return [1]
foo(2)  #return [2]
foo(3)  #return [3]

[] is an object, and so is None (the former is mutable while the latter is immutable, but the mutability has nothing to do with the question). None is somewhere in memory, but we know it’s there, and there is only one copy of it. So every time foo is invoked, items is evaluated (as opposed to some answers which say it is evaluated only once) to be None; to be clear, to the reference (or the address) of None. Then, inside foo, items is changed to [], i.e., it points to another object which has a different address.

====== No. 3 =======

def foo(x, items=[]):
    items.append(x)
    return items

foo(1)    # returns [1]
foo(2,[]) # returns [2]
foo(3)    # returns [1,3]

The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is changed to [1] inside the foo function, but the address is not changed: it is still 11111111. Then foo(2,[]) comes along. Although the [] in foo(2,[]) has the same content as the default parameter [] had when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [], say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What’s the default value? It was set when defining the foo function: the list object located at 11111111. So items is evaluated to be the list at address 11111111, which holds the element 1. The list located at 2222222 also contains one element (2), but it is not pointed to by items any more. Consequently, appending 3 makes items [1, 3].

From the above explanations, we can see that the effbot webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:

for i in range(10):
    def callback():
        print "clicked button", i
    UI.Button("button %s" % i, callback)

Each button can hold a distinct callback function which will display a different value of i, provided each callback captures the current value of i at definition time (for example, via a default argument) rather than closing over the loop variable. Here is an example to show this:

x = []
for i in range(10):
    def callback(i=i):  # i=i binds the current value of i at definition time
        print(i)
    x.append(callback)

If we execute x[7]() we’ll get 7 as expected, and x[9]() gives 9, another value of i. (Without the i=i default, every callback would close over the same variable i and print 9 once the loop has finished.)


Answer 28

TLDR: Define-time defaults are consistent and strictly more expressive.


Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs:

...                           # defining scope
def name(parameter=default):  # ???
    ...                       # execution scope

The def name part must evaluate in the defining scope – we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.

Since parameter is a constant name, we can “evaluate” it at the same time as def name. This also has the advantage that it produces the function with a known signature, name(parameter=...):, instead of a bare name(...):.

Now, when to evaluate default?

Consistency already says “at definition”: everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.

The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing “at definition” allows expressing both cases, while choosing “at execution” can express only one:

def name(parameter=defined):  # set default at definition time
    ...

def name(parameter=default):     # delay default until execution time
    parameter = default if parameter is None else parameter
    ...
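
A concrete sketch of the expressiveness point (adders and the n=n idiom are illustrative): a define-time default can capture a value at definition, something a hypothetical execute-time default could not express:

adders = []
for n in range(3):
    def add(x, n=n):         # freeze the current value of n into the default
        return x + n
    adders.append(add)

print([f(10) for f in adders])   # [10, 11, 12]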

Answer 29

Every other answer explains why this is actually a nice and desired behavior, or why you shouldn’t be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.

We will “fix” this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.

import inspect
from copy import copy

def sanify(function):
    def wrapper(*a, **kw):
        # look up the default values on each call
        defaults = inspect.getfullargspec(function).defaults  # use inspect.getargspec on Python 2
        # construct a new argument list
        new_args = []
        for i, arg in enumerate(defaults):
            # allow passing positional arguments
            if i < len(a):
                new_args.append(a[i])
            else:
                # copy the default value so calls don't share it
                new_args.append(copy(arg))
        return function(*new_args, **kw)
    return wrapper

Now let’s redefine our function using this decorator:

@sanify
def foo(a=[]):
    a.append(5)
    return a

foo() # '[5]'
foo() # '[5]' -- as desired

This is particularly neat for functions that take multiple arguments. Compare:

# the 'correct' approach
def bar(a=None, b=None, c=None):
    if a is None:
        a = []
    if b is None:
        b = []
    if c is None:
        c = []
    # finally do the actual work

with

# the nasty decorator hack
@sanify
def bar(a=[], b=[], c=[]):
    # wow, works right out of the box!

It’s important to note that the above solution breaks if you try to use keyword args, like so:

foo(a=[4])

The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)


How do I clone or copy a list?

Question: How do I clone or copy a list?

What are the options to clone or copy a list in Python?

While using new_list = my_list, any modifications to new_list change my_list every time. Why is this?


Answer 0

With new_list = my_list, you don’t actually have two lists. The assignment just copies the reference to the list, not the actual list, so both new_list and my_list refer to the same list after the assignment.

To actually copy the list, you have various possibilities:

  • You can use the builtin list.copy() method (available since Python 3.3):

    new_list = old_list.copy()
    
  • You can slice it:

    new_list = old_list[:]
    

    Alex Martelli’s opinion (at least back in 2007) about this is that it is a weird syntax and it does not make sense to use it ever. ;) (In his opinion, the next one is more readable).

  • You can use the built in list() function:

    new_list = list(old_list)
    
  • You can use generic copy.copy():

    import copy
    new_list = copy.copy(old_list)
    

    This is a little slower than list() because it has to find out the datatype of old_list first.

  • If the list contains objects and you want to copy them as well, use generic copy.deepcopy():

    import copy
    new_list = copy.deepcopy(old_list)
    

    Obviously the slowest and most memory-needing method, but sometimes unavoidable.

Example:

import copy

class Foo(object):
    def __init__(self, val):
         self.val = val

    def __repr__(self):
        return 'Foo({!r})'.format(self.val)

foo = Foo(1)

a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)

# edit original list and instance
a.append('baz')
foo.val = 5

print('original: %r\nlist.copy(): %r\nslice: %r\nlist(): %r\ncopy: %r\ndeepcopy: %r'
      % (a, b, c, d, e, f))

Result:

original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]

Answer 1

Felix already provided an excellent answer, but I thought I’d do a speed comparison of the various methods:

  1. 10.59 sec (105.9us/itn) – copy.deepcopy(old_list)
  2. 10.16 sec (101.6us/itn) – pure python Copy() method copying classes with deepcopy
  3. 1.488 sec (14.88us/itn) – pure python Copy() method not copying classes (only dicts/lists/tuples)
  4. 0.325 sec (3.25us/itn) – for item in old_list: new_list.append(item)
  5. 0.217 sec (2.17us/itn) – [i for i in old_list] (a list comprehension)
  6. 0.186 sec (1.86us/itn) – copy.copy(old_list)
  7. 0.075 sec (0.75us/itn) – list(old_list)
  8. 0.053 sec (0.53us/itn) – new_list = []; new_list.extend(old_list)
  9. 0.039 sec (0.39us/itn) – old_list[:] (list slicing)

So the fastest is list slicing. But be aware that, unlike copy.deepcopy() and the pure-Python Copy() above, copy.copy(), list[:] and list(list) don’t copy any lists, dictionaries or class instances nested in the list, so if the originals change, they will change in the copied list too, and vice versa.
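
A quick illustration of that caveat (the variable names are illustrative): a shallow copy shares any nested mutable objects with the original:

old_list = [[1, 2], 'x']
new_list = old_list[:]       # shallow copy: new outer list, shared contents
new_list[0].append(3)        # mutate the shared inner list
print(old_list)              # [[1, 2, 3], 'x'] -- the original changed too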

(Here’s the script if anyone’s interested or wants to raise any issues; note that it is Python 2 code:)

from copy import deepcopy

class old_class:
    def __init__(self):
        self.blah = 'blah'

class new_class(object):
    def __init__(self):
        self.blah = 'blah'

dignore = {str: None, unicode: None, int: None, type(None): None}

def Copy(obj, use_deepcopy=True):
    t = type(obj)

    if t in (list, tuple):
        if t == tuple:
            # Convert to a list if a tuple to 
            # allow assigning to when copying
            is_tuple = True
            obj = list(obj)
        else: 
            # Otherwise just do a quick slice copy
            obj = obj[:]
            is_tuple = False

        # Copy each item recursively
        for x in xrange(len(obj)):
            if type(obj[x]) in dignore:
                continue
            obj[x] = Copy(obj[x], use_deepcopy)

        if is_tuple: 
            # Convert back into a tuple again
            obj = tuple(obj)

    elif t == dict: 
        # Use the fast shallow dict copy() method and copy any 
        # values which aren't immutable (like lists, dicts etc)
        obj = obj.copy()
        for k in obj:
            if type(obj[k]) in dignore:
                continue
            obj[k] = Copy(obj[k], use_deepcopy)

    elif t in dignore: 
        # Numeric or string/unicode? 
        # It's immutable, so ignore it!
        pass 

    elif use_deepcopy: 
        obj = deepcopy(obj)
    return obj

if __name__ == '__main__':
    import copy
    from time import time

    num_times = 100000
    L = [None, 'blah', 1, 543.4532, 
         ['foo'], ('bar',), {'blah': 'blah'},
         old_class(), new_class()]

    t = time()
    for i in xrange(num_times):
        Copy(L)
    print 'Custom Copy:', time()-t

    t = time()
    for i in xrange(num_times):
        Copy(L, use_deepcopy=False)
    print 'Custom Copy Only Copying Lists/Tuples/Dicts (no classes):', time()-t

    t = time()
    for i in xrange(num_times):
        copy.copy(L)
    print 'copy.copy:', time()-t

    t = time()
    for i in xrange(num_times):
        copy.deepcopy(L)
    print 'copy.deepcopy:', time()-t

    t = time()
    for i in xrange(num_times):
        L[:]
    print 'list slicing [:]:', time()-t

    t = time()
    for i in xrange(num_times):
        list(L)
    print 'list(L):', time()-t

    t = time()
    for i in xrange(num_times):
        [i for i in L]
    print 'list expression(L):', time()-t

    t = time()
    for i in xrange(num_times):
        a = []
        a.extend(L)
    print 'list extend:', time()-t

    t = time()
    for i in xrange(num_times):
        a = []
        for y in L:
            a.append(y)
    print 'list append:', time()-t

    t = time()
    for i in xrange(num_times):
        a = []
        a.extend(i for i in L)
    print 'generator expression extend:', time()-t

Answer 2

I’ve been told that Python 3.3+ adds the list.copy() method, which should be as fast as slicing:

newlist = old_list.copy()


Answer 3

What are the options to clone or copy a list in Python?

In Python 3, a shallow copy can be made with:

a_copy = a_list.copy()

In Python 2 and 3, you can get a shallow copy with a full slice of the original:

a_copy = a_list[:]

Explanation

There are two semantic ways to copy a list. A shallow copy creates a new list of the same objects; a deep copy creates a new list containing new equivalent objects.

Shallow list copy

A shallow copy only copies the list itself, which is a container of references to the objects in the list. If the objects contained themselves are mutable and one is changed, the change will be reflected in both lists.

There are different ways to do this in Python 2 and 3. The Python 2 ways will also work in Python 3.

Python 2

In Python 2, the idiomatic way of making a shallow copy of a list is with a complete slice of the original:

a_copy = a_list[:]

You can also accomplish the same thing by passing the list through the list constructor,

a_copy = list(a_list)

but using the constructor is less efficient:

>>> import timeit
>>> l = range(20)
>>> min(timeit.repeat(lambda: l[:]))
0.30504298210144043
>>> min(timeit.repeat(lambda: list(l)))
0.40698814392089844

Python 3

In Python 3, lists get the list.copy method:

a_copy = a_list.copy()

In Python 3.5:

>>> import timeit
>>> l = list(range(20))
>>> min(timeit.repeat(lambda: l[:]))
0.38448613602668047
>>> min(timeit.repeat(lambda: list(l)))
0.6309100328944623
>>> min(timeit.repeat(lambda: l.copy()))
0.38122922903858125

Making another pointer does not make a copy

Using new_list = my_list then modifies new_list every time my_list changes. Why is this?

my_list is just a name that points to the actual list in memory. When you say new_list = my_list you’re not making a copy, you’re just adding another name that points at that original list in memory. We can have similar issues when we make copies of lists.

>>> l = [[], [], []]
>>> l_copy = l[:]
>>> l_copy
[[], [], []]
>>> l_copy[0].append('foo')
>>> l_copy
[['foo'], [], []]
>>> l
[['foo'], [], []]

The list is just an array of pointers to the contents, so a shallow copy just copies the pointers, and so you have two different lists, but they have the same contents. To make copies of the contents, you need a deep copy.

Deep copies

To make a deep copy of a list, in Python 2 or 3, use deepcopy in the copy module:

import copy
a_deep_copy = copy.deepcopy(a_list)

To demonstrate how this allows us to make new sub-lists:

>>> import copy
>>> l
[['foo'], [], []]
>>> l_deep_copy = copy.deepcopy(l)
>>> l_deep_copy[0].pop()
'foo'
>>> l_deep_copy
[[], [], []]
>>> l
[['foo'], [], []]

And so we see that the deep copied list is an entirely different list from the original. You could roll your own function – but don’t: by using the standard library’s deepcopy function, you avoid bugs you would otherwise be likely to create.

Don’t use eval

You may see this used as a way to deepcopy, but don’t do it:

problematic_deep_copy = eval(repr(a_list))
  1. It’s dangerous, particularly if you’re evaluating something from a source you don’t trust.
  2. It’s not reliable if a subelement you’re copying doesn’t have a representation that can be eval’d to reproduce an equivalent element.
  3. It’s also less performant.

In 64 bit Python 2.7:

>>> import timeit
>>> import copy
>>> l = range(10)
>>> min(timeit.repeat(lambda: copy.deepcopy(l)))
27.55826997756958
>>> min(timeit.repeat(lambda: eval(repr(l))))
29.04534101486206

on 64 bit Python 3.5:

>>> import timeit
>>> import copy
>>> l = list(range(10))
>>> min(timeit.repeat(lambda: copy.deepcopy(l)))
16.84255409205798
>>> min(timeit.repeat(lambda: eval(repr(l))))
34.813894678023644

Answer 4

There are many answers already that tell you how to make a proper copy, but none of them say why your original ‘copy’ failed.

Python doesn’t store values in variables; it binds names to objects. Your original assignment took the object referred to by my_list and bound it to new_list as well. No matter which name you use, there is still only one list, so changes made when referring to it as my_list will persist when referring to it as new_list. Each of the other answers to this question gives you a different way of creating a new object to bind to new_list.

Each element of a list acts like a name, in that each element binds non-exclusively to an object. A shallow copy creates a new list whose elements bind to the same objects as before.

new_list = list(my_list)  # or my_list[:], but I prefer this syntax
# is simply a shorter way of:
new_list = [element for element in my_list]

To take your list copy one step further, copy each object that your list refers to, and bind those element copies to a new list.

import copy  
# each element must have __copy__ defined for this...
new_list = [copy.copy(element) for element in my_list]

This is not yet a deep copy, because each element of a list may refer to other objects, just like the list is bound to its elements. To recursively copy every element in the list, and then each other object referred to by each element, and so on: perform a deep copy.

import copy
# each element must have __deepcopy__ defined for this...
new_list = copy.deepcopy(my_list)

See the documentation for more information about corner cases in copying.


Answer 5

Use thing[:]

>>> a = [1,2]
>>> b = a[:]
>>> a += [3]
>>> a
[1, 2, 3]
>>> b
[1, 2]
>>> 

Answer 6

Let’s start from the beginning and explore this question.

So let’s suppose you have two lists:

list_1=['01','98']
list_2=[['01','98']]

And we have to copy both lists, now starting from the first list:

So first let’s try by setting the variable copy to our original list, list_1:

copy=list_1

Now if you are thinking that copy copied list_1, then you are wrong. The id function can show us whether two variables point to the same object. Let’s try this:

print(id(copy))
print(id(list_1))

The output is:

4329485320
4329485320

Both variables refer to the exact same object. Are you surprised?

As we know, Python doesn’t store anything in a variable; variables just reference objects, and the objects store the values. Here the object is a list, but we created two references to that same object via two different variable names. This means that both variables point to the same object, just with different names.

When you do copy=list_1, it is actually doing:

[diagram: copy and list_1 are two names referencing the same list object]

In the diagram, list_1 and copy are two variable names, but the object is the same for both variables: the list.

So if you try to modify the copied list, it will modify the original list too, because there is only one list: you modify that same list whether you go through the copied name or the original one:

copy[0]="modify"

print(copy)
print(list_1)

output:

['modify', '98']
['modify', '98']

So it modified the original list.

Now let’s move on to a pythonic method for copying lists.

copy_1=list_1[:]

This method fixes the first issue we had:

print(id(copy_1))
print(id(list_1))

4338792136
4338791432

As we can see, both lists have different IDs, which means both variables point to different objects. So what’s actually going on here is:

[diagram: copy_1 and list_1 now point to two separate list objects]

Now let’s try to modify the list and let’s see if we still face the previous problem:

copy_1[0]="modify"

print(list_1)
print(copy_1)

The output is:

['01', '98']
['modify', '98']

As you can see, it only modified the copied list. That means it worked.

Do you think we’re done? No. Let’s try to copy our nested list.

copy_2=list_2[:]

copy_2 should reference a different object that is a copy of list_2. Let’s check:

print(id((list_2)),id(copy_2))

We get the output:

4330403592 4330403528

Now we can assume both lists point to different objects, so let’s try to modify the copy and see whether it gives us what we want:

copy_2[0][1]="modify"

print(list_2,copy_2)

This gives us the output:

[['01', 'modify']] [['01', 'modify']]

This may seem a little bit confusing, because the same method we previously used worked. Let’s try to understand this.

When you do:

copy_2=list_2[:]

You’re only copying the outer list, not the inner list. We can use the id function once again to check this.

print(id(copy_2[0]))
print(id(list_2[0]))

The output is:

4329485832
4329485832

When we do copy_2=list_2[:], this happens:

[diagram: copy_2 and list_2 are separate outer lists that share the same nested list]

It creates a copy of the list, but only of the outer list, not of the nested list. The nested list is the same for both variables, so if you try to modify the nested list, it modifies the original list too, because the nested list object is shared by both lists.

What is the solution? The solution is the deepcopy function.

from copy import deepcopy
deep=deepcopy(list_2)

Let’s check this:

print(id((list_2)),id(deep))

4322146056 4322148040

Both outer lists have different IDs; let’s try this on the inner nested lists.

print(id(deep[0]))
print(id(list_2[0]))

The output is:

4322145992
4322145800

As you can see, both IDs are different, meaning we can assume that both nested lists now point to different objects.

This is what actually happens when you do deep=deepcopy(list_2):

[diagram: after deepcopy, the outer lists and their nested lists are all independent objects]

Both variables now point to different objects, each with its own separate copy of the nested list.

Now let’s try to modify the nested list and see if it solved the previous issue or not:

deep[0][1]="modify"
print(list_2,deep)

It outputs:

[['01', '98']] [['01', 'modify']]

As you can see, it didn’t modify the original nested list, it only modified the copied list.


Answer 7

Python’s idiom for doing this is newList = oldList[:]


Answer 8

Python 3.6 Timings

Here are the timing results using Python 3.6.8. Keep in mind these times are relative to one another, not absolute.

I stuck to only doing shallow copies, and also added some new methods that weren’t possible in Python2, such as list.copy() (the Python3 slice equivalent) and two forms of list unpacking (*new_list, = list and new_list = [*list]):

METHOD                  TIME TAKEN
b = [*a]                2.75180600000021
b = a * 1               3.50215399999990
b = a[:]                3.78278899999986  # Python2 winner (see above)
b = a.copy()            4.20556500000020  # Python3 "slice equivalent" (see above)
b = []; b.extend(a)     4.68069800000012
b = a[0:len(a)]         6.84498999999959
*b, = a                 7.54031799999984
b = list(a)             7.75815899999997
b = [i for i in a]      18.4886440000000
b = copy.copy(a)        18.8254879999999
b = []
for item in a:
  b.append(item)        35.4729199999997

We can see the Python2 winner still does well, but doesn’t edge out Python3 list.copy() by much, especially considering the superior readability of the latter.

The dark horse is the unpacking and repacking method (b = [*a]), which is ~25% faster than raw slicing, and more than twice as fast as the other unpacking method (*b, = a).

b = a * 1 also does surprisingly well.

Note that these methods do not output equivalent results for any input other than lists. They all work for sliceable objects, a few work for any iterable, but only copy.copy() works for more general Python objects.
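As a quick illustration of that caveat, a minimal sketch (the tuple here is just an example input):

import copy

t = (1, 2, 3)
print(type(t[:]))            # <class 'tuple'> -- slicing preserves the input type
print([*t])                  # [1, 2, 3]       -- unpacking always builds a list
print(copy.copy(t) is t)     # True            -- copying an immutable tuple returns it unchanged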


Here is the testing code for interested parties (Template from here):

import timeit

COUNT = 50000000
print("Array duplicating. Tests run", COUNT, "times")
setup = 'a = [0,1,2,3,4,5,6,7,8,9]; import copy'

print("b = list(a)\t\t", timeit.timeit(stmt='b = list(a)', setup=setup, number=COUNT))
print("b = copy.copy(a)\t", timeit.timeit(stmt='b = copy.copy(a)', setup=setup, number=COUNT))
print("b = a.copy()\t\t", timeit.timeit(stmt='b = a.copy()', setup=setup, number=COUNT))
print("b = a[:]\t\t", timeit.timeit(stmt='b = a[:]', setup=setup, number=COUNT))
print("b = a[0:len(a)]\t\t", timeit.timeit(stmt='b = a[0:len(a)]', setup=setup, number=COUNT))
print("*b, = a\t\t\t", timeit.timeit(stmt='*b, = a', setup=setup, number=COUNT))
print("b = []; b.extend(a)\t", timeit.timeit(stmt='b = []; b.extend(a)', setup=setup, number=COUNT))
print("b = []; for item in a: b.append(item)\t", timeit.timeit(stmt='b = []\nfor item in a:  b.append(item)', setup=setup, number=COUNT))
print("b = [i for i in a]\t", timeit.timeit(stmt='b = [i for i in a]', setup=setup, number=COUNT))
print("b = [*a]\t\t", timeit.timeit(stmt='b = [*a]', setup=setup, number=COUNT))
print("b = a * 1\t\t", timeit.timeit(stmt='b = a * 1', setup=setup, number=COUNT))

回答 9

所有其他贡献者都给出了很好的答案,这些方法在处理单维(单层)列表时都适用。但在目前提到的方法中,只有copy.deepcopy()能在处理多维嵌套列表(列表的列表)时真正克隆/复制列表,而不会让副本仍指向嵌套的list对象。虽然Felix Kling在他的回答中提到了这一点,但这个问题还有更多细节,而且或许存在一种使用内置方法的变通办法,可能比deepcopy更快。

虽然new_list = old_list[:]、copy.copy(old_list)以及(对于Py3k)old_list.copy()适用于单层列表,但它们仍然指向嵌套在old_list和new_list中的同一批list对象,对其中一个list对象的更改会同时出现在另一个中。

编辑:有新信息浮出水面

正如Aaron Hall和PM 2Ring所指出的那样,使用eval()不仅是一个坏主意,而且比copy.deepcopy()慢得多。

这意味着对于多维列表,唯一的选择是copy.deepcopy()。话虽如此,当您尝试在中等大小的多维数组上使用它时,性能会严重下降,因此它实际上也算不上一个选择。我尝试对一个42×42的数组(这对生物信息学应用来说并不罕见,甚至算不上大)运行timeit,结果等不到它跑完,只好放弃并开始撰写这篇文章的编辑部分。

这样看来,唯一真正的选择是初始化多个列表并独立处理它们。如果有人对如何处理多维列表复制有任何其他建议,将不胜感激。

如其他人所述,对多维列表使用copy模块和copy.deepcopy存在严重的性能问题。

All of the other contributors gave great answers, which work when you have a single dimension (leveled) list, however of the methods mentioned so far, only copy.deepcopy() works to clone/copy a list and not have it point to the nested list objects when you are working with multidimensional, nested lists (list of lists). While Felix Kling refers to it in his answer, there is a little bit more to the issue and possibly a workaround using built-ins that might prove a faster alternative to deepcopy.

While new_list = old_list[:], copy.copy(old_list) and, for Py3k, old_list.copy() work for single-leveled lists, they revert to pointing at the list objects nested within the old_list and the new_list, and changes to one of the list objects are perpetuated in the other.

Edit: New information brought to light

As was pointed out by both Aaron Hall and PM 2Ring using eval() is not only a bad idea, it is also much slower than copy.deepcopy().

This means that for multidimensional lists, the only option is copy.deepcopy(). With that being said, it really isn’t an option, as the performance goes way south when you try to use it on a moderately sized multidimensional array. I tried to run timeit using a 42×42 array, not unheard of or even that large for bioinformatics applications, and I gave up on waiting for a response and just started typing my edit to this post.

It would seem that the only real option then is to initialize multiple lists and work on them independently. If anyone has any other suggestions, for how to handle multidimensional list copying, it would be appreciated.

As others have stated, there are significant performance issues using the copy module and copy.deepcopy for multidimensional lists.
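One commonly suggested workaround for the two-level case, much cheaper than copy.deepcopy(), is a nested shallow copy via a comprehension. A minimal sketch (only valid when nesting is exactly two levels deep and the innermost items are immutable):

grid = [[0] * 42 for _ in range(42)]

copied = [row[:] for row in grid]    # new outer list AND new row lists

copied[0][0] = 99
print(grid[0][0])                    # 0 -- the original is untouched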


回答 10

令我惊讶的是这一方法尚未被提及,因此出于完整性考虑……

您可以使用“splat运算符”*进行列表解包,这同样会复制列表中的元素。

old_list = [1, 2, 3]

new_list = [*old_list]

new_list.append(4)
old_list == [1, 2, 3]
new_list == [1, 2, 3, 4]

该方法的明显缺点是仅在Python 3.5+中可用。

不过就耗时而言,它的表现似乎优于其他常用方法。

import random

x = [random.random() for _ in range(1000)]

%timeit a = list(x)
%timeit a = x.copy()
%timeit a = x[:]

%timeit a = [*x]

#: 2.47 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.47 µs ± 54.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.39 µs ± 58.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

#: 2.22 µs ± 43.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

It surprises me that this hasn’t been mentioned yet, so for the sake of completeness…

You can perform list unpacking with the “splat operator”: *, which will also copy elements of your list.

old_list = [1, 2, 3]

new_list = [*old_list]

new_list.append(4)
old_list == [1, 2, 3]
new_list == [1, 2, 3, 4]

The obvious downside to this method is that it is only available in Python 3.5+.

Timing wise though, this appears to perform better than other common methods.

import random

x = [random.random() for _ in range(1000)]

%timeit a = list(x)
%timeit a = x.copy()
%timeit a = x[:]

%timeit a = [*x]

#: 2.47 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.47 µs ± 54.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.39 µs ± 58.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

#: 2.22 µs ± 43.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

回答 11

已有答案中缺少一种与Python版本无关的非常简单的方法,您在大多数情况下都可以使用它(至少我是这样做的):

new_list = my_list * 1       #Solution 1 when you are not using nested lists

但是,如果my_list包含其他容器(例如嵌套列表),则必须如上面答案中所建议的那样,使用copy库中的deepcopy。例如:

import copy
new_list = copy.deepcopy(my_list)   #Solution 2 when you are using nested lists

奖励:如果您不需要复制元素本身(也就是浅拷贝),请使用:

new_list = my_list[:]

让我们了解解决方案1和解决方案2之间的区别

>>> a = range(5)
>>> b = a*1
>>> a,b
([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
>>> a[2] = 55 
>>> a,b
([0, 1, 55, 3, 4], [0, 1, 2, 3, 4])

如您所见,当我们不使用嵌套列表时,解决方案1可以完美地工作。让我们检查一下将解决方案1应用于嵌套列表时会发生什么。

>>> from copy import deepcopy
>>> a = [range(i,i+4) for i in range(3)]
>>> a
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> b = a*1
>>> c = deepcopy(a)
>>> for i in (a, b, c): print i   
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> a[2].append('99')
>>> for i in (a, b, c): print i   
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]]   #Solution#1 didn't work in nested list
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]       #Solution #2 - DeepCopy worked in nested list

A very simple approach, independent of the Python version, was missing from the answers already given; you can use it most of the time (at least I do):

new_list = my_list * 1       #Solution 1 when you are not using nested lists

However, if my_list contains other containers (e.g. nested lists) you must use deepcopy, as others suggested in the answers above, from the copy library. For example:

import copy
new_list = copy.deepcopy(my_list)   #Solution 2 when you are using nested lists

Bonus: if you don’t want to copy the elements themselves (aka a shallow copy), use:

new_list = my_list[:]

Let’s understand the difference between Solution #1 and Solution #2

>>> a = range(5)
>>> b = a*1
>>> a,b
([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
>>> a[2] = 55 
>>> a,b
([0, 1, 55, 3, 4], [0, 1, 2, 3, 4])

As you can see Solution #1 worked perfectly when we were not using the nested lists. Let’s check what will happen when we apply solution #1 to nested lists.

>>> from copy import deepcopy
>>> a = [range(i,i+4) for i in range(3)]
>>> a
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> b = a*1
>>> c = deepcopy(a)
>>> for i in (a, b, c): print i   
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> a[2].append('99')
>>> for i in (a, b, c): print i   
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]]   #Solution#1 didn't work in nested list
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]       #Solution #2 - DeepCopy worked in nested list

回答 12

请注意,在某些情况下,如果您定义了自己的自定义类并且想要保留属性,则应使用copy.copy()或copy.deepcopy(),而不是其他替代方法,例如在Python 3中:

import copy

class MyList(list):
    pass

lst = MyList([1,2,3])

lst.name = 'custom list'

d = {
'original': lst,
'slicecopy' : lst[:],
'lstcopy' : lst.copy(),
'copycopy': copy.copy(lst),
'deepcopy': copy.deepcopy(lst)
}


for k,v in d.items():
    print('lst: {}'.format(k), end=', ')
    try:
        name = v.name
    except AttributeError:
        name = 'NA'
    print('name: {}'.format(name))

输出:

lst: original, name: custom list
lst: slicecopy, name: NA
lst: lstcopy, name: NA
lst: copycopy, name: custom list
lst: deepcopy, name: custom list

Note that there are some cases where if you have defined your own custom class and you want to keep the attributes then you should use copy.copy() or copy.deepcopy() rather than the alternatives, for example in Python 3:

import copy

class MyList(list):
    pass

lst = MyList([1,2,3])

lst.name = 'custom list'

d = {
'original': lst,
'slicecopy' : lst[:],
'lstcopy' : lst.copy(),
'copycopy': copy.copy(lst),
'deepcopy': copy.deepcopy(lst)
}


for k,v in d.items():
    print('lst: {}'.format(k), end=', ')
    try:
        name = v.name
    except AttributeError:
        name = 'NA'
    print('name: {}'.format(name))

Outputs:

lst: original, name: custom list
lst: slicecopy, name: NA
lst: lstcopy, name: NA
lst: copycopy, name: custom list
lst: deepcopy, name: custom list

回答 13

new_list = my_list[:]

new_list = my_list 试着这样理解:假设my_list位于堆内存中的位置X,即my_list指向X。现在,通过赋值new_list = my_list,您让new_list也指向X。这只是对同一列表的引用(别名),根本没有产生副本。

现在,如果您执行new_list = my_list[:],则会把my_list中每个元素的引用复制到一个新的外层列表中。这称为浅拷贝(嵌套的可变对象仍然是共享的)。

其他可行的方法还有:

  • new_list = list(old_list)
  • import copy new_list = copy.deepcopy(old_list)
new_list = my_list[:]

new_list = my_list Try to understand this. Let’s say that my_list is in the heap memory at location X, i.e. my_list is pointing to X. Now by assigning new_list = my_list you’re letting new_list point to X as well. This is just a reference (an alias) to the same list, not a copy at all.

Now if you assign new_list = my_list[:], you’re copying each element reference of my_list into a new outer list. This is known as a shallow copy (nested mutable objects are still shared); see the sketch after the list below.

Other ways you can do this are:

  • new_list = list(old_list)
  • import copy new_list = copy.deepcopy(old_list)
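A minimal sketch illustrating the distinction above (an alias versus a shallow copy):

my_list = [1, 2, 3]

alias = my_list        # no copy at all: a second name for the same list
shallow = my_list[:]   # shallow copy: a new outer list

my_list.append(4)
print(alias)           # [1, 2, 3, 4] -- the alias follows every change
print(shallow)         # [1, 2, 3]    -- the copy is independent at the top level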

回答 14

我想发布一些与其他答案有些不同的东西。即使这很可能不是最容易理解或最快的选择,但它提供了一些深入了解深度复制工作原理的内部视图,并且是深度复制的另一种替代选择。我的函数是否有错误并不重要,因为这样做的目的是显示一种复制对象(如问题答案)的方法,也可以以此为手段来解释深度复制在其核心中的工作方式。

任何深拷贝功能的核心都是进行浅拷贝的方法。怎么做?很简单。任何深拷贝函数只需要复制容器;容器里的不可变对象可以共享。对嵌套列表进行深拷贝时,您只是在复制外层的各个列表,而不是列表内部的不可变对象;您只是在复制容器。类也一样:对类进行深拷贝时,会深拷贝它的所有可变属性。那么,为什么只需要复制列表、字典、元组、迭代器、类和类实例这样的容器呢?

这很简单。不可变对象其实不需要被复制:它永远不会改变,所以它只是一个值。这意味着您永远不必复制字符串、数字、布尔值等。但如何复制容器呢?很简单:用所有(复制后的)值初始化一个新容器即可。Deepcopy依赖递归:它复制所有容器,包括内部还嵌套着容器的容器,直到没有容器剩下为止。与其中的不可变内容不同,容器必须被重新创建。

知道这一点后,无需任何引用即可完全复制对象非常容易。这是一个用于深度复制基本数据类型的函数(不适用于自定义类,但您可以随时添加它)

def deepcopy(x):
  immutables = (str, int, bool, float)
  mutables = (list, dict, tuple)
  if isinstance(x, immutables):
    return x
  elif isinstance(x, mutables):
    if isinstance(x, tuple):
      return tuple(deepcopy(list(x)))
    elif isinstance(x, list):
      return [deepcopy(y) for y in x]
    elif isinstance(x, dict):
      values = [deepcopy(y) for y in list(x.values())]
      keys = list(x.keys())
      return dict(zip(keys, values))

Python自己的内置Deepcopy是基于该示例的。唯一的区别是,它支持其他类型,并且通过将属性复制到新的重复类中来支持用户类,并且还可以通过使用备忘录列表或字典对已经看到的对象的引用来阻止无限递归。制作深拷贝确实就是这样。从本质上讲,深层复制只是浅层复制。我希望这个答案可以为问题增添一些内容。

例子

假设您有这样一个列表:[1, 2, 3]。不可变的数字无需复制,但外面那一层可以复制。您可以使用列表推导式来复制它:[x for x in [1, 2, 3]]

现在,假设您有以下列表:[[1,2],[3,4],[5,6]]。这次,您想创建一个函数,该函数使用递归来深度复制列表的所有层。代替先前的列表理解:

[x for x in _list]

它使用一个新的列表:

[deepcopy_list(x) for x in _list]

而且deepcopy_list看起来像这样:

def deepcopy_list(x):
  if isinstance(x, (str, bool, float, int)):
    return x
  else:
    return [deepcopy_list(y) for y in x]

这样,您现在就有了一个函数,它可以利用递归深拷贝任何由str、bool、float、int甚至嵌套列表组成的列表,层数不限。这就是深拷贝。

TLDR:Deepcopy使用递归来复制对象,并且直接返回原来的不可变对象,因为不可变对象无需复制。它会从最外层到最内层,逐层复制对象的每一个可变(容器)层。

I wanted to post something a bit different then some of the other answers. Even though this is most likely not the most understandable, or fastest option, it provides a bit of an inside view of how deep copy works, as well as being another alternative option for deep copying. It doesn’t really matter if my function has bugs, since the point of this is to show a way to copy objects like the question answers, but also to use this as a point to explain how deepcopy works at its core.

At the core of any deep copy function is the way to make a shallow copy. How? Simple. A deep copy function only needs to duplicate the containers; the immutable objects inside can be shared. When you deepcopy a nested list, you are only duplicating the outer lists, not the immutable objects inside of the lists. You are only duplicating the containers. The same works for classes, too. When you deepcopy a class, you deepcopy all of its mutable attributes. So, how come you only have to copy the containers, like lists, dicts, tuples, iters, classes, and class instances?

It’s simple. An immutable object doesn’t really need to be duplicated. It can never be changed, so it is only a single value. That means you never have to duplicate strings, numbers, bools, or any of those. But how would you duplicate the containers? Simple. You just initialize a new container with all of the values. Deepcopy relies on recursion. It duplicates all the containers, even ones with containers inside of them, until no containers are left. A container, unlike its immutable contents, must be re-created.

Once you know that, completely duplicating an object without any references is pretty easy. Here’s a function for deepcopying basic data-types (wouldn’t work for custom classes but you could always add that)

def deepcopy(x):
  immutables = (str, int, bool, float)
  mutables = (list, dict, tuple)
  if isinstance(x, immutables):
    return x
  elif isinstance(x, mutables):
    if isinstance(x, tuple):
      return tuple(deepcopy(list(x)))
    elif isinstance(x, list):
      return [deepcopy(y) for y in x]
    elif isinstance(x, dict):
      values = [deepcopy(y) for y in list(x.values())]
      keys = list(x.keys())
      return dict(zip(keys, values))

Python’s own built-in deepcopy is based around that example. The only difference is it supports other types, and also supports user-classes by duplicating the attributes into a new duplicate class, and also blocks infinite-recursion with a reference to an object it’s already seen using a memo list or dictionary. And that’s really it for making deep copies. At its core, making a deep copy is just making shallow copies. I hope this answer adds something to the question.

EXAMPLES

Say you have this list: [1, 2, 3]. The immutable numbers need not be duplicated, but the outer layer can be. You can duplicate it using a list comprehension: [x for x in [1, 2, 3]]

Now, imagine you have this list: [[1, 2], [3, 4], [5, 6]]. This time, you want to make a function, which uses recursion to deep copy all layers of the list. Instead of the previous list comprehension:

[x for x in _list]

It uses a new one for lists:

[deepcopy_list(x) for x in _list]

And deepcopy_list looks like this:

def deepcopy_list(x):
  if isinstance(x, (str, bool, float, int)):
    return x
  else:
    return [deepcopy_list(y) for y in x]

Then you now have a function which can deepcopy any list of strs, bools, floats, ints and even lists, to infinitely many layers, using recursion. And there you have it: deepcopying.

TLDR: Deepcopy uses recursion to duplicate objects, and merely returns the same immutable objects as before, as immutable objects do not need to be duplicated. It duplicates every mutable (container) layer of an object, recursing from the outermost layer to the innermost.
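As a quick check of the sketch above (this calls the deepcopy function defined in this answer, not copy.deepcopy):

data = {'rows': [[1, 2], [3, 4]], 'name': 'grid'}
dup = deepcopy(data)                 # the function defined above

dup['rows'][0][0] = 99
print(data['rows'][0][0])            # 1 -- the nested list was duplicated too
print(data['name'] is dup['name'])   # True -- the immutable string is shared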


回答 15

下面从id和gc的角度,对内存做一点实用的观察。

>>> b = a = ['hell', 'word']
>>> c = ['hell', 'word']

>>> id(a), id(b), id(c)
(4424020872, 4424020872, 4423979272) 
     |           |
      -----------

>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # all referring to same 'hell'
     |           |           |
      -----------------------

>>> id(a[0][0]), id(b[0][0]), id(c[0][0])
(4422785208, 4422785208, 4422785208) # all referring to same 'h'
     |           |           |
      -----------------------

>>> a[0] += 'o'
>>> a,b,c
(['hello', 'word'], ['hello', 'word'], ['hell', 'word'])  # b changed too
>>> id(a[0]), id(b[0]), id(c[0])
(4424018384, 4424018384, 4424018328) # augmented assignment changed a[0],b[0]
     |           |
      -----------

>>> b = a = ['hell', 'word']
>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # the same hell
     |           |           |
      -----------------------

>>> import gc
>>> gc.get_referrers(a[0]) 
[['hell', 'word'], ['hell', 'word']]  # one copy belong to a,b, the another for c
>>> gc.get_referrers(('hell'))
[['hell', 'word'], ['hell', 'word'], ('hell', None)] # ('hello', None) 

A slightly practical perspective: looking into memory through id and gc.

>>> b = a = ['hell', 'word']
>>> c = ['hell', 'word']

>>> id(a), id(b), id(c)
(4424020872, 4424020872, 4423979272) 
     |           |
      -----------

>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # all referring to same 'hell'
     |           |           |
      -----------------------

>>> id(a[0][0]), id(b[0][0]), id(c[0][0])
(4422785208, 4422785208, 4422785208) # all referring to same 'h'
     |           |           |
      -----------------------

>>> a[0] += 'o'
>>> a,b,c
(['hello', 'word'], ['hello', 'word'], ['hell', 'word'])  # b changed too
>>> id(a[0]), id(b[0]), id(c[0])
(4424018384, 4424018384, 4424018328) # augmented assignment changed a[0],b[0]
     |           |
      -----------

>>> b = a = ['hell', 'word']
>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # the same hell
     |           |           |
      -----------------------

>>> import gc
>>> gc.get_referrers(a[0]) 
[['hell', 'word'], ['hell', 'word']]  # one copy belong to a,b, the another for c
>>> gc.get_referrers(('hell'))
[['hell', 'word'], ['hell', 'word'], ('hell', None)] # ('hello', None) 

回答 16

在执行以下操作时,请记住在Python中:

    list1 = ['apples','bananas','pineapples']
    list2 = list1

List2存储的不是实际的列表,而是对list1所指向的同一列表的引用。因此,当您对list1做任何操作时,list2也会跟着变化。可以使用标准库中的copy模块(无需通过pip下载)来制作列表的真正副本(简单列表用copy.copy(),嵌套列表用copy.deepcopy())。这样得到的副本不会随第一个列表而改变。

Remember that in Python when you do:

    list1 = ['apples','bananas','pineapples']
    list2 = list1

List2 isn’t storing the actual list, but a reference to the same list as list1. So when you do anything to list1, list2 changes as well. Use the copy module (part of the standard library, no download needed) to make a real copy of the list (copy.copy() for simple lists, copy.deepcopy() for nested ones). This makes a copy that doesn’t change with the first list.
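A minimal sketch with the standard-library module (no installation step required):

import copy   # part of the standard library

list1 = ['apples', 'bananas', 'pineapples']
list2 = copy.copy(list1)       # an independent shallow copy

list1.append('mangoes')
print(list2)                   # ['apples', 'bananas', 'pineapples']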


回答 17

deepcopy选项是唯一适用于我的方法:

from copy import deepcopy

a = [   [ list(range(1, 3)) for i in range(3) ]   ]
b = deepcopy(a)
b[0][1]=[3]
print('Deep:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]   ]
b = a*1
b[0][1]=[3]
print('*1:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ] ]
b = a[:]
b[0][1]=[3]
print('Vector copy:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = list(a)
b[0][1]=[3]
print('List copy:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = a.copy()
b[0][1]=[3]
print('.copy():')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = a
b[0][1]=[3]
print('Shallow:')
print(a)
print(b)
print('-----------------------------')

输出结果为:

Deep:
[[[1, 2], [1, 2], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
*1:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Vector copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
List copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
.copy():
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Shallow:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------

The deepcopy option is the only method that works for me:

from copy import deepcopy

a = [   [ list(range(1, 3)) for i in range(3) ]   ]
b = deepcopy(a)
b[0][1]=[3]
print('Deep:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]   ]
b = a*1
b[0][1]=[3]
print('*1:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ] ]
b = a[:]
b[0][1]=[3]
print('Vector copy:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = list(a)
b[0][1]=[3]
print('List copy:')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = a.copy()
b[0][1]=[3]
print('.copy():')
print(a)
print(b)
print('-----------------------------')
a = [   [ list(range(1, 3)) for i in range(3) ]  ]
b = a
b[0][1]=[3]
print('Shallow:')
print(a)
print(b)
print('-----------------------------')

leads to output of:

Deep:
[[[1, 2], [1, 2], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
*1:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Vector copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
List copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
.copy():
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Shallow:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------

如何在Windows上安装pip?

问题:如何在Windows上安装pip?

pip是easy_install的替代品。但是我应该在Windows上使用easy_install来安装pip吗?有没有更好的办法?

pip is a replacement for easy_install. But should I install pip using easy_install on Windows? Is there a better way?


回答 0

Python 2.7.9+和3.4+

好消息!Pip随附于Python 3.4(2014年3月发布)和Python 2.7.9(2014年12月发布)。这是所有Python版本中最好的功能。它使每个人都能使用社区丰富的库。新手不再因安装门槛过高而无法使用社区库。随包管理器一同发布后,Python加入了Ruby、Node.js、Haskell、Perl、Go,以及几乎所有拥有主流开源社区的当代语言的行列。谢谢,Python。

如果在使用Python 3.4+或Python 2.7.9+时确实发现pip不可用,只需执行例如:

py -3 -m ensurepip

当然,这并不意味着Python打包问题已经解决,体验仍然令人沮丧。我在Stack Overflow问题“Does Python have a package/module management system?”中对此进行了讨论。

而对于使用Python 2.7.8或更早版本的每个人(社区中相当大的一部分)来说,很遗憾:没有计划为您提供Pip。手动安装说明如下。

Python 2≤2.7.8和Python 3≤3.3

与其“自带电池”(batteries included)的座右铭相悖,Python发行时并未附带软件包管理器。更糟糕的是,直到最近,Pip一直出奇地难以安装。

官方说明

根据https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip:

下载get-pip.py时,请注意将其保存为.py文件而不是.txt文件。然后,在命令提示符下运行它:

python get-pip.py

您可能需要管理员命令提示符才能执行此操作。请按照《以管理员身份启动命令提示符》(Microsoft TechNet)中的说明操作。

这将安装pip软件包,该软件包(在Windows中)包含 …\Scripts\pip.exe;该路径必须位于PATH环境变量中,才能从命令行使用pip(关于如何将其添加到PATH,请参阅“替代说明”的第二部分)。

替代说明

官方文档告诉用户从源代码安装Pip及其每个依赖项。对于有经验的人来说,这很乏味;对于新手来说,则困难重重。

为了大家方便,Christoph Gohlke为流行的Python软件包准备了Windows安装程序(.msi)。他为所有Python版本(32位和64位)都构建了安装程序。您需要:

  1. 安装setuptools
  2. 安装pip

对我来说,pip被安装到了C:\Python27\Scripts\pip.exe。在您的计算机上找到pip.exe,然后将其所在文件夹(例如C:\Python27\Scripts)添加到您的路径(开始/编辑环境变量)。现在,您应该可以从命令行运行pip了。尝试安装一个软件包:

pip install httpie

大功告成(希望如此)!常见问题的解决方案如下:

代理问题

如果您在办公室工作,则可能位于HTTP代理后面。如果是这样,请设置环境变量http_proxy和https_proxy。大多数Python应用程序(和其他自由软件)都会遵循这些变量。语法示例:

http://proxy_url:port
http://username:password@proxy_url:port

如果您实在不走运,您的代理可能是Microsoft NTLM代理,自由软件无法应对。唯一的解决方案是安装一个对自由软件友好的代理,由它转发到那个讨厌的代理:http://cntlm.sourceforge.net/

找不到vcvarsall.bat

Python模块可能部分用C或C++编写。Pip会尝试从源代码进行编译。如果没有安装并配置C/C++编译器,您就会看到这条令人费解的错误消息。

错误:找不到vcvarsall.bat

您可以通过安装C++编译器(例如MinGW或Visual C++)来解决此问题。微软实际上提供了一个专门配合Python使用的编译器,可尝试Microsoft Visual C++ Compiler for Python 2.7。

不过,通常更简单的做法是先在Christoph的网站上查找您需要的软件包。

Python 2.7.9+ and 3.4+

Good news! Python 3.4 (released March 2014) and Python 2.7.9 (released December 2014) ship with Pip. This is the best feature of any Python release. It makes the community’s wealth of libraries accessible to everyone. Newbies are no longer excluded from using community libraries by the prohibitive difficulty of setup. In shipping with a package manager, Python joins Ruby, Node.js, Haskell, Perl, Go—almost every other contemporary language with a majority open-source community. Thank you, Python.

If you do find that pip is not available when using Python 3.4+ or Python 2.7.9+, simply execute e.g.:

py -3 -m ensurepip
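If you would rather check from Python itself before bootstrapping, a minimal sketch (Python 3.4+; the only assumption is the module name pip):

import importlib.util
import sys

# Report whether pip is importable in this interpreter.
if importlib.util.find_spec('pip') is None:
    print('pip missing; bootstrap it with:', sys.executable, '-m ensurepip')
else:
    print('pip is available')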

Of course, that doesn’t mean Python packaging is problem solved. The experience remains frustrating. I discuss this in the Stack Overflow question Does Python have a package/module management system?.

And, alas for everyone using Python 2.7.8 or earlier (a sizable portion of the community). There’s no plan to ship Pip to you. Manual instructions follow.

Python 2 ≤ 2.7.8 and Python 3 ≤ 3.3

Flying in the face of its ‘batteries included’ motto, Python ships without a package manager. To make matters worse, Pip was—until recently—ironically difficult to install.

Official instructions

Per https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip:

Download get-pip.py, being careful to save it as a .py file rather than .txt. Then, run it from the command prompt:

python get-pip.py

You possibly need an administrator command prompt to do this. Follow Start a Command Prompt as an Administrator (Microsoft TechNet).

This installs the pip package, which (in Windows) contains …\Scripts\pip.exe. That path must be in your PATH environment variable to use pip from the command line (see the second part of ‘Alternative instructions’ for adding it to your PATH).

Alternative instructions

The official documentation tells users to install Pip and each of its dependencies from source. That’s tedious for the experienced and prohibitively difficult for newbies.

For our sake, Christoph Gohlke prepares Windows installers (.msi) for popular Python packages. He builds installers for all Python versions, both 32 and 64 bit. You need to:

  1. Install setuptools
  2. Install pip

For me, this installed Pip at C:\Python27\Scripts\pip.exe. Find pip.exe on your computer, then add its folder (for example, C:\Python27\Scripts) to your path (Start / Edit environment variables). Now you should be able to run pip from the command line. Try installing a package:

pip install httpie

There you go (hopefully)! Solutions for common problems are given below:

Proxy problems

If you work in an office, you might be behind an HTTP proxy. If so, set the environment variables http_proxy and https_proxy. Most Python applications (and other free software) respect these. Example syntax:

http://proxy_url:port
http://username:password@proxy_url:port
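For example, on a Windows command prompt the variables can be set for the current session like this (proxy_url, port, username and password are placeholders for your proxy’s details):

set http_proxy=http://proxy_url:port
set https_proxy=http://username:password@proxy_url:port
pip install httpie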

If you’re really unlucky, your proxy might be a Microsoft NTLM proxy. Free software can’t cope. The only solution is to install a free software friendly proxy that forwards to the nasty proxy. http://cntlm.sourceforge.net/

Unable to find vcvarsall.bat

Python modules can be partly written in C or C++. Pip tries to compile from source. If you don’t have a C/C++ compiler installed and configured, you’ll see this cryptic error message.

Error: Unable to find vcvarsall.bat

You can fix that by installing a C++ compiler such as MinGW or Visual C++. Microsoft actually ships one specifically for use with Python. Or try Microsoft Visual C++ Compiler for Python 2.7.

Often though it’s easier to check Christoph’s site for your package.


回答 1

已过时:请使用distribute,而不是此处所述的setuptools。
已过时#2:请改用setuptools,因为distribute已被弃用。

如前所述,pip不包含独立的安装程序,但是您可以使用其前身easy_install进行安装。

所以:

  1. 从此处下载最新的pip版本:http://pypi.python.org/pypi/pip#downloads
  2. 解压缩
  3. 下载适用于Windows的最新easy installer(在http://pypi.python.org/pypi/setuptools页面底部下载.exe)。安装它。
  4. 将解压后的pip文件夹的内容复制到C:\Python2x\文件夹中(不要复制整个文件夹,只复制其内容),因为python命令在C:\Python2x文件夹外不起作用,然后运行: python setup.py install
  5. 将您的Python脚本目录C:\Python2x\Scripts添加到路径

大功告成

现在,您可以像在Linux中那样,使用pip install package轻松地安装软件包了 :)

Outdated — use distribute, not setuptools as described here. —
Outdated #2 — use setuptools as distribute is deprecated.

As you mentioned pip doesn’t include an independent installer, but you can install it with its predecessor easy_install.

So:

  1. Download the last pip version from here: http://pypi.python.org/pypi/pip#downloads
  2. Uncompress it
  3. Download the last easy installer for Windows: (download the .exe at the bottom of http://pypi.python.org/pypi/setuptools ). Install it.
  4. copy the uncompressed pip folder content into C:\Python2x\ folder (don’t copy the whole folder into it, just the content), because python command doesn’t work outside C:\Python2x folder and then run: python setup.py install
  5. Add your python C:\Python2x\Scripts to the path

You are done.

Now you can use pip install package to easily install packages as in Linux :)


回答 2

2014年更新:

1)如果您安装了Python 3.4或更高版本,则pip包含在Python中,并且应该已经在您的系统上运行。

2)如果您运行的版本低于Python 3.4,或者由于某些原因Python 3.4中未安装pip,则可以使用pip的官方安装脚本get-pip.py。pip安装程序现在会为您获取setuptools,并且不论体系结构(32位或64位)均可运行。

这里详细说明了安装说明,其中包括:

要安装或升级pip,请安全下载get-pip.py

然后运行以下命令(可能需要管理员访问权限):

python get-pip.py

要升级现有的setuptools(或distribute),请运行 pip install -U setuptools

为了后代,我将在下面保留两组旧说明。

旧答案:

对于64位版本的Windows:由于ez_setup,64位Windows+Python以前需要单独的安装方法,但我已经在运行32位Python和64位Python的64位Windows上测试了新的distribute方法,现在您可以对所有版本的Windows/Python 2.7X使用相同的方法:

使用distribute的旧方法2:

  1. 下载distribute。我把它放在了C:\Python27\Scripts(如果Scripts目录不存在,可随意创建)。
  2. 打开命令提示符(在Windows上,如果不使用PowerShell,建议试试conemu2),然后切换(cd)到您下载distribute_setup.py的目录。
  3. 运行distribute_setup:python distribute_setup.py(如果您的Python安装目录未添加到路径中,此步骤将不起作用,请在此处获取帮助)
  4. 将当前目录切换到Python安装的Scripts目录(C:\Python27\Scripts),或者将该目录以及Python基础安装目录添加到您的%PATH%环境变量。
  5. 使用新安装的setuptools安装pip: easy_install pip

除非您位于easy_install.exe所在的目录中(Python 2.7的默认位置是C:\Python27\Scripts),或者已将该目录添加到路径中,否则最后一步将不起作用。

使用ez_setup的旧方法1:

从setuptools页面

下载ez_setup.py并运行它;它将下载适当的.egg文件并为您安装。(当前,由于distutils安装程序的兼容性问题,提供的.exe安装程序不支持64位版本的Windows Python。)

之后,您可以继续:

  1. 添加c:\Python2x\Scripts到Windows路径(将Python2x中的x替换为您已安装的实际版本号)
  2. 打开一个新的(!)DOS提示符。从那里运行easy_install pip

2014 UPDATE:

1) If you have installed Python 3.4 or later, pip is included with Python and should already be working on your system.

2) If you are running a version below Python 3.4 or if pip was not installed with Python 3.4 for some reason, then you’d probably use pip’s official installation script get-pip.py. The pip installer now grabs setuptools for you, and works regardless of architecture (32-bit or 64-bit).

The installation instructions are detailed here and involve:

To install or upgrade pip, securely download get-pip.py.

Then run the following (which may require administrator access):

python get-pip.py

To upgrade an existing setuptools (or distribute), run pip install -U setuptools

I’ll leave the two sets of old instructions below for posterity.

OLD Answers:

For Windows editions of the 64 bit variety – 64-bit Windows + Python used to require a separate installation method due to ez_setup, but I’ve tested the new distribute method on 64-bit Windows running 32-bit Python and 64-bit Python, and you can now use the same method for all versions of Windows/Python 2.7X:

OLD Method 2 using distribute:

  1. Download distribute – I threw mine in C:\Python27\Scripts (feel free to create a Scripts directory if it doesn’t exist).
  2. Open up a command prompt (on Windows you should check out conemu2 if you don’t use PowerShell) and change (cd) to the directory you’ve downloaded distribute_setup.py to.
  3. Run distribute_setup: python distribute_setup.py (This will not work if your python installation directory is not added to your path – go here for help)
  4. Change the current directory to the Scripts directory for your Python installation (C:\Python27\Scripts) or add that directory, as well as the Python base installation directory to your %PATH% environment variable.
  5. Install pip using the newly installed setuptools: easy_install pip

The last step will not work unless you’re either in the directory easy_install.exe is located in (C:\Python27\Scripts would be the default for Python 2.7), or you have that directory added to your path.

OLD Method 1 using ez_setup:

from the setuptools page

Download ez_setup.py and run it; it will download the appropriate .egg file and install it for you. (Currently, the provided .exe installer does not support 64-bit versions of Python for Windows, due to a distutils installer compatibility issue.)

After this, you may continue with:

  1. Add c:\Python2x\Scripts to the Windows path (replace the x in Python2x with the actual version number you have installed)
  2. Open a new (!) DOS prompt. From there run easy_install pip

回答 3

2016年更新:

这些答案要么已经过时,要么冗长难懂。

如果您拥有Python 3.4+或2.7.9+,它将默认安装在Windows上。否则,简而言之:

  1. 下载pip安装程序:https://bootstrap.pypa.io/get-pip.py
  2. 如果偏执,请检查文件以确认它不是恶意的(必须b64解码)。
  3. 以Admin身份在下载文件夹中打开一个控制台,然后运行 get-pip.py。或者,在资源管理器中右键单击其图标,然后选择“以管理员身份运行…”。

新的二进制文件pip.exe(和已弃用的easy_install.exe)将在"%ProgramFiles%\PythonXX\Scripts"文件夹(或类似位置)中找到,该文件夹通常不在您的PATH变量中。我建议添加它。

2016+ Update:

These answers are outdated or otherwise wordy and difficult.

If you’ve got Python 3.4+ or 2.7.9+, it will be installed by default on Windows. Otherwise, in short:

  1. Download the pip installer: https://bootstrap.pypa.io/get-pip.py
  2. If paranoid, inspect file to confirm it isn’t malicious (must b64 decode).
  3. Open a console in the download folder as Admin and run get-pip.py. Alternatively, right-click its icon in Explorer and choose the “run as Admin…”.

The new binaries pip.exe (and the deprecated easy_install.exe) will be found in the "%ProgramFiles%\PythonXX\Scripts" folder (or similar), which is often not in your PATH variable. I recommend adding it.
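One way to add it from a command prompt is sketched below. The Python36 path is an assumption for a typical install; setx writes the user-level variable, only takes effect in newly opened consoles, and truncates values longer than 1024 characters, so editing PATH via System Properties is safer for long paths:

setx PATH "%PATH%;%ProgramFiles%\Python36\Scripts"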


回答 4

2014年3月发布的Python 3.4已随附pip:
http://docs.python.org/3.4/whatsnew/3.4.html
因此,自Python 3.4发布以来,在Windows上安装pip的最新方法就是直接安装Python。

推荐的使用方式是以模块方式调用它,尤其是在安装了多个Python发行版或版本的情况下,以确保软件包安装到正确的位置:
python -m pip install --upgrade packageXYZ

https://docs.python.org/3/installing/#work-with-multiple-versions-of-python-installed-in-parallel

Python 3.4, which was released in March 2014, comes with pip included:
http://docs.python.org/3.4/whatsnew/3.4.html
So, since the release of Python 3.4, the up-to-date way to install pip on Windows is to just install Python.

The recommended way to use it is to call it as a module, especially with multiple python distributions or versions installed, to guarantee packages go to the correct place:
python -m pip install --upgrade packageXYZ

https://docs.python.org/3/installing/#work-with-multiple-versions-of-python-installed-in-parallel


回答 5

当我必须使用Windows时,我使用ActivePython,它会自动将所有内容添加到您的PATH中,并包含一个名为PyPM的软件包管理器,它提供二进制软件包管理,使安装软件包更快、更简单。

pip和easy_install并不完全相同,因此有些东西只能通过pip获得而easy_install不行,反之亦然。

我的建议是安装ActivePython社区版,不用再为在Windows上为Python配置一切而烦恼。然后,您可以直接使用pypm。

如果要使用pip,则必须勾选ActiveState安装程序中的PyPM选项。安装后,您只需注销并重新登录,pip即可在命令行中使用,因为它包含在ActiveState安装程序的PyPM选项中,并且安装程序已经为您设置好了路径。PyPM也可以使用,但您不是必须使用它。

When I have to use Windows, I use ActivePython, which automatically adds everything to your PATH and includes a package manager called PyPM which provides binary package management making it faster and simpler to install packages.

pip and easy_install aren’t exactly the same thing, so there are some things you can get through pip but not easy_install and vice versa.

My recommendation is that you get ActivePython Community Edition and don’t worry about the huge hassle of getting everything set up for Python on Windows. Then, you can just use pypm.

In case you want to use pip you have to check the PyPM option in the ActiveState installer. After installation you only need to logoff and log on again, and pip will be available on the commandline, because it is contained in the ActiveState installer PyPM option and the paths have been set by the installer for you already. PyPM will also be available, but you do not have to use it.


回答 6

最新的方法是使用Windows的软件包管理器Chocolatey

安装完成后,您所要做的就是打开命令提示符并运行下面的三个命令,这将安装Python 2.7、easy_install和pip。它会自动检测您使用的是x64还是x86版本的Windows。

cinst python
cinst easy.install
cinst pip

Chocolatey Gallery上的所有其他Python软件包都可以在这里找到。

The up-to-date way is to use Windows’ package manager Chocolatey.

Once this is installed, all you have to do is open a command prompt and run the following the three commands below, which will install Python 2.7, easy_install and pip. It will automatically detect whether you’re on x64 or x86 Windows.

cinst python
cinst easy.install
cinst pip

All of the other Python packages on the Chocolatey Gallery can be found here.


回答 7

2015年3月更新

Python 2.7.9和更高版本(在Python 2系列上)以及Python 3.4和更高版本默认情况下都包含pip,因此您可能已经拥有pip。

如果不这样做,请在提示符下运行以下一行命令(可能需要管理员访问权限):

python -c "exec('try: from urllib2 import urlopen \nexcept: from urllib.request import urlopen');f=urlopen('https://bootstrap.pypa.io/get-pip.py').read();exec(f)"

它将安装pip。如果尚未安装Setuptoolsget-pip.py也将为您安装它。

如评论中所述,以上命令将从GitHub上的Pip源代码存储库下载代码,并在您的环境中动态运行它。因此请注意,这是下载,检查和运行步骤的快捷方式,所有这些操作都使用Python本身通过一个命令完成。如果您信任Pip,请毫无疑问地继续。

确保Windows环境变量PATH包含Python的文件夹(对于Python 2.7.x,默认安装:C:\Python27C:\Python27\Scripts,对于Python 3.3x:C:\Python33C:\Python33\Scripts,依此类推)。

Update March 2015

Python 2.7.9 and later (on the Python 2 series), and Python 3.4 and later include pip by default, so you may have pip already.

If you don’t, run this one line command on your prompt (which may require administrator access):

python -c "exec('try: from urllib2 import urlopen \nexcept: from urllib.request import urlopen');f=urlopen('https://bootstrap.pypa.io/get-pip.py').read();exec(f)"

It will install pip. If Setuptools is not already installed, get-pip.py will install it for you too.

As mentioned in comments, the above command will download code from the Pip source code repository at GitHub, and dynamically run it at your environment. So be noticed that this is a shortcut of the steps download, inspect and run, all with a single command using Python itself. If you trust Pip, proceed without doubt.

Be sure that your Windows environment variable PATH includes Python’s folders (for Python 2.7.x default install: C:\Python27 and C:\Python27\Scripts, for Python 3.3x: C:\Python33 and C:\Python33\Scripts, and so on).


回答 8

安装程序

我在这里为distribute和pip都构建了Windows安装程序(目标是无需用easy_install引导、也无需保存并运行Python脚本即可使用pip):

在Windows上,只需从上面的链接先下载并安装distribute,再安装pip即可。上面的distribute链接确实包含存根.exe安装程序,目前仅有32位版本。我尚未在64位Windows上测试其效果。

在Windows上构建

将其重做为新版本的过程并不困难,我将其包含在此处以供参考。

构建distribute

为了获得存根.exe文件,您需要有Visual C++编译器(显然用MinGW也可以编译)

hg clone https://bitbucket.org/tarek/distribute
cd distribute
hg checkout 0.6.27
rem optionally, comment out tag_build and tag_svn_revision in setup.cfg
msvc-build-launcher.cmd
python setup.py bdist_win32
cd ..
echo build is in distribute\dist

构建pip

git clone https://github.com/pypa/pip.git
cd pip
git checkout 1.1
python setup.py bdist_win32
cd ..
echo build is in pip\dist

Installers

I’ve built Windows installers for both distribute and pip here (the goal being to use pip without having to either bootstrap with easy_install or save and run Python scripts):

On Windows, simply download and install first distribute, then pip from the above links. The distribute link above does contain stub .exe installers, and these are currently 32-bit only. I haven’t tested the effect on 64-bit Windows.

Building on Windows

The process to redo this for new versions is not difficult, and I’ve included it here for reference.

Building distribute

In order to get the stub .exe files, you need to have a Visual C++ compiler (it is apparently compilable with MinGW as well)

hg clone https://bitbucket.org/tarek/distribute
cd distribute
hg checkout 0.6.27
rem optionally, comment out tag_build and tag_svn_revision in setup.cfg
msvc-build-launcher.cmd
python setup.py bdist_win32
cd ..
echo build is in distribute\dist

Building pip

git clone https://github.com/pypa/pip.git
cd pip
git checkout 1.1
python setup.py bdist_win32
cd ..
echo build is in pip\dist

回答 9

以下内容适用于Python 2.7。保存此脚本并启动它:

https://raw.github.com/pypa/pip/master/contrib/get-pip.py

Pip安装完成后,将路径添加到您的环境:

C:\Python27\Scripts

最后

pip install virtualenv

另外,您还需要Microsoft Visual C++ 2008 Express以获得合适的编译器,避免在安装软件包时出现此类消息:

error: Unable to find vcvarsall.bat

如果您使用的是64位版本的Windows 7,可以阅读《64-bit Python installation issues on 64-bit Windows 7》,以成功安装Python可执行安装包(注册表项问题)。

The following works for Python 2.7. Save this script and launch it:

https://raw.github.com/pypa/pip/master/contrib/get-pip.py

Pip is installed, then add the path to your environment :

C:\Python27\Scripts

Finally

pip install virtualenv

Also you need Microsoft Visual C++ 2008 Express to get the good compiler and avoid these kind of messages when installing packages:

error: Unable to find vcvarsall.bat

If you have a 64-bit version of Windows 7, you may read 64-bit Python installation issues on 64-bit Windows 7 to successfully install the Python executable package (issue with registry entries).


回答 10

对于最新的Python下载:我在Windows上安装了Python 3.6。您需要的一切都已经包含在内,不必疑惑;深呼吸,我来向您展示怎么做。

  1. 确认Python的安装位置;对我来说,它在以下目录中。

现在,如果您使用Windows,让我们将python和pip添加到环境变量路径设置中,这样在任何位置键入pip或python,都能从其安装位置调用python或pip。

如上面屏幕所示,PIP位于“Scripts”文件夹下。让我们把Python和PIP添加到环境变量路径中。

快完成了。让我们用CMD测试一下,使用pip安装google软件包。

pip install google


再见!

For the latest Python download – I have Python 3.6 on Windows. You don’t have to wonder; everything you need is there. Take a breath, and I will show you how to do it.

  1. Make sure you know where you installed Python; for me it was in the following directory.

Now, let’s add python and pip into the environment variable path settings if you are on Windows, so that typing pip or python anywhere calls python or pip from where they are installed.

So, PIP is found under the “Scripts” folder shown in the screen above. Let’s add Python and PIP to the environment variable path.

Almost done. Let’s test with CMD by installing the google package using pip.

pip install google


BYE BYE!


回答 11

要在Python 2.x上全局安装pip,如Adrián所述,easy_install似乎是最好的解决方案。

但是pip 的安装说明建议使用virtualenv,因为每个virtualenv都会自动安装pip。这不需要root用户访问权限或修改系统Python安装。

尽管安装virtualenv仍然需要easy_install。

2018更新:

Python 3.3+现在包含venv模块,可轻松创建虚拟环境,如下所示:

python3 -m venv /path/to/new/virtual/environment

请参阅有关在创建后激活环境的不同平台方法的文档,但通常为以下之一:

$ source <venv>/bin/activate 

C:\> <venv>\Scripts\activate.bat

To install pip globally on Python 2.x, easy_install appears to be the best solution as Adrián states.

However the installation instructions for pip recommend using virtualenv since every virtualenv has pip installed in it automatically. This does not require root access or modify your system Python installation.

Installing virtualenv still requires easy_install though.

2018 update:

Python 3.3+ now includes the venv module for easily creating virtual environments like so:

python3 -m venv /path/to/new/virtual/environment

See documentation for different platform methods of activating the environment after creation, but typically one of:

$ source <venv>/bin/activate 

C:\> <venv>\Scripts\activate.bat
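A minimal Windows session sketch tying these steps together (myenv and requests are illustrative names):

C:\> python -m venv myenv
C:\> myenv\Scripts\activate.bat
(myenv) C:\> pip install requests
(myenv) C:\> deactivate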

回答 12

要使用pip,并不一定需要直接在系统中安装pip。您可以通过virtualenv来使用它。您可以按照以下步骤操作:

我们通常需要为一个特定项目安装Python软件包。因此,现在创建一个项目文件夹,比方说myproject。

  • 从virtualenv的解压文件夹中复制virtualenv.py文件,并将其粘贴到myproject文件夹中

现在,在myproject文件夹中创建一个虚拟环境,比方说myvirtualenv,如下所示:

python virtualenv.py myvirtualenv

它会向您显示:

New python executable in myvirtualenv\Scripts\python.exe
Installing setuptools....................................done.
Installing pip.........................done.

现在,您的虚拟环境myvirtualenv已在项目文件夹中创建。您可能会注意到,pip现在已安装在您的虚拟环境中。您需要做的就是使用以下命令激活虚拟环境:

myvirtualenv\Scripts\activate

您将在命令提示符下看到以下内容:

(myvirtualenv) PATH\TO\YOUR\PROJECT\FOLDER>pip install package_name

现在,您可以开始使用pip,但是请确保已激活虚拟环境,方法是查看提示左侧的内容。

这是安装pip最简单的方法之一(即在虚拟环境中安装),但您需要随身携带virtualenv.py文件。

有关安装pip / virtualenv / virtualenvwrapper的更多方法,请访问thegauraw.tumblr.com

To use pip, it is not mandatory that you need to install pip in the system directly. You can use it through virtualenv. What you can do is follow these steps:

We normally need to install Python packages for one particular project. So, now create a project folder, let’s say myproject.

  • Copy the virtualenv.py file from the decompressed folder of virtualenv, and paste inside the myproject folder

Now create a virtual environment, let’s say myvirtualenv as follows, inside the myproject folder:

python virtualenv.py myvirtualenv

It will show you:

New python executable in myvirtualenv\Scripts\python.exe
Installing setuptools....................................done.
Installing pip.........................done.

Now your virtual environment, myvirtualenv, is created inside your project folder. You might notice, pip is now installed inside you virtual environment. All you need to do is activate the virtual environment with the following command.

myvirtualenv\Scripts\activate

You will see the following at the command prompt:

(myvirtualenv) PATH\TO\YOUR\PROJECT\FOLDER>pip install package_name

Now you can start using pip, but make sure you have activated the virtualenv looking at the left of your prompt.

This is one of the easiest ways to install pip, i.e. inside a virtual environment, but you need to have the virtualenv.py file with you.

For more ways to install pip/virtualenv/virtualenvwrapper, you can refer to thegauraw.tumblr.com.


回答 13

我只是想为那些无法在64位Windows上安装setuptools的用户再补充一个解决方案。在python.org上的此错误中讨论了该问题,截至本评论发布之日仍未解决。其中提到了一个简单的解决方法,它可以完美地工作。一次注册表更改对我来说就解决了问题。

链接:http://bugs.python.org/issue6792#

适用于我的解决方案…:

为2.6+版本的Python添加此注册表设置:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.6\InstallPath]
 @="C:\\Python26\\"

这很可能是您在Python 2.6+中已经拥有的注册表设置:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath]
 @="C:\\Python26\\"

显然,您将需要用正在运行的Python版本替换2.6版本。

I just wanted to add one more solution for those having issues installing setuptools from Windows 64-bit. The issue is discussed in this bug on python.org and is still unresolved as of the date of this comment. A simple workaround is mentioned and it works flawlessly. One registry change did the trick for me.

Link: http://bugs.python.org/issue6792#

Solution that worked for me…:

Add this registry setting for 2.6+ versions of Python:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.6\InstallPath]
 @="C:\\Python26\\"

This is most likely the registry setting you will already have for Python 2.6+:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath]
 @="C:\\Python26\\"

Clearly, you will need to replace the 2.6 version with whatever version of Python you are running.


回答 14

更新于2016年:Pip应该已经包含在Python 2.7.9+或3.4+中,但如果由于某种原因它不存在,您可以使用下面的一行命令。

PS:

  1. 在大多数情况下,这一点已经满足;但如有必要,请确保环境变量PATH包含Python的文件夹(例如,Windows上Python 2.7.x默认安装为:C:\Python27和C:\Python27\Scripts;Python 3.3x为:C:\Python33和C:\Python33\Scripts,等等)

  2. 我遇到了同样的问题,后来在官方网站上找到了这种或许最简单的方法(一行命令!):http://www.pip-installer.org/en/latest/installing.html

不敢相信竟有这么多冗长(也许已经过时?)的答案。感谢他们,但也请为这个简短的答案投票,以帮助更多新手!

Updated at 2016 : Pip should already be included in Python 2.7.9+ or 3.4+, but if for whatever reason it is not there, you can use the following one-liner.

PS:

  1. This should already be satisfied in most cases but, if necessary, be sure that your environment variable PATH includes Python’s folders (for example, Python 2.7.x on Windows default install: C:\Python27 and C:\Python27\Scripts, for Python 3.3x: C:\Python33 and C:\Python33\Scripts, etc)

  2. I encounter same problem and then found such perhaps easiest way (one liner!) mentioned on official website here: http://www.pip-installer.org/en/latest/installing.html

Can’t believe there are so many lengthy (perhaps outdated?) answers out there. Feeling thankful to them but, please up-vote this short answer to help more new comers!


回答 15

到目前为止,我发现的最好方法就是两行代码:

curl http://python-distribute.org/distribute_setup.py | python
curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python

它已在Windows 8上使用PowerShell,Cmd和Git Bash(MinGW)进行了测试。

您可能还想把该路径添加到环境变量中,大致是C:\Python33\Scripts这样的位置。

The best way I found so far, is just two lines of code:

curl http://python-distribute.org/distribute_setup.py | python
curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python

It was tested on Windows 8 with PowerShell, Cmd, and Git Bash (MinGW).

And you probably want to add the path to your environment. It’s somewhere like C:\Python33\Scripts.


回答 16

这里介绍一种安装pip的简单方法。

  1. 将这些内容复制并粘贴到一个名为get-pip.py的文件中
  2. 将get-pip.py复制并粘贴到Python文件夹C:\Python27中
  3. 双击get-pip.py文件,它会把pip安装到您的计算机上。
  4. 现在,您必须将C:\Python27\Scripts路径添加到环境变量中,因为它包含pip.exe文件。
  5. 现在您可以使用pip了。打开cmd并键入:
    pip install package_name

Here is how to install pip the easy way.

  1. Copy and paste this content into a file named get-pip.py.
  2. Copy and paste get-pip.py into your Python folder, C:\Python27.
  3. Double-click the get-pip.py file. It will install pip on your computer.
  4. Now you have to add the C:\Python27\Scripts path to your environment variable, because it includes the pip.exe file.
  5. Now you are ready to use pip. Open cmd and type:
    pip install package_name

回答 17

PythonXY自带pip等诸多工具。

PythonXY comes with pip included, among others.


回答 18

在Windows上,我使用来自continuum.io的跨平台Anaconda软件包管理器,它很可靠。它具有虚拟环境管理功能,以及带有常用工具(例如conda、pip)的功能齐全的shell。

> conda install <package>               # access distributed binaries

> pip install <package>                 # access PyPI packages 

conda还附带了具有非Python依赖项(例如pandasnumpy等)的库的二进制文件。这在Windows上特别有用,因为可能很难正确编译C依赖项。

I use the cross-platform Anaconda package manager from continuum.io on Windows and it is reliable. It has virtual environment management and a fully featured shell with common utilities (e.g. conda, pip).

> conda install <package>               # access distributed binaries

> pip install <package>                 # access PyPI packages 

conda also comes with binaries for libraries with non-Python dependencies, e.g. pandas, numpy, etc. This proves useful particularly on Windows as it can be hard to correctly compile C dependencies.
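For instance, a sketch of creating an isolated environment with binary packages (myenv and the package list are illustrative; newer conda releases use conda activate instead of the plain activate script):

> conda create -n myenv numpy pandas
> activate myenv
(myenv) > pip install httpie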


回答 19

按照此处的说明进行安装时,我遇到了一些问题。我认为要在每个Windows环境中用同一种方式安装是非常棘手的。就我而言,我需要在同一台计算机上为不同目的使用Python 2.6、2.7和3.3,所以我想这也是问题更多的原因。不过以下说明对我来说非常有效,因此根据您的环境,不妨试试这个:

http://docs.python-guide.org/zh-CN/latest/starting/install/win/

另外,由于环境各不相同,我发现虚拟环境非常有用:我的各个网站使用不同的库,最好把它们各自封装到单独的文件夹中。查看说明;简而言之,如果已安装PIP,只需安装VirtualEnv:

pip install virtualenv

在包含您所有文件的文件夹中运行

virtualenv venv

几秒钟后,您就会在venv文件夹中拥有一个包含所有内容的虚拟环境。要启用它,请运行venv/Scripts/activate.bat(停用环境也很容易,使用deactivate.bat)。您安装的每个库都会进入venv\Lib\site-packages,并且很容易把整个环境移动到别处。

我发现的唯一缺点是,某些代码编辑器无法识别这种环境,由于找不到导入的库,您会在代码中看到警告。当然有一些变通办法,但如果编辑器能记住虚拟环境如今已非常普遍,那就更好了。

希望能帮助到你。

I had some issues installing in different ways when I followed instructions here. I think it’s very tricky to install in every Windows environment in the same way. In my case I need Python 2.6, 2.7 and 3.3 in the same machine for different purposes so that’s why I think there’re more problems. But the following instructions worked perfectly for me, so might be depending on your environment you should try this one:

http://docs.python-guide.org/en/latest/starting/install/win/

Also, due to the different environments, I found it incredibly useful to use virtual environments. I had websites that use different libraries, and it’s much better to encapsulate them into a single folder each. Check out the instructions; briefly, if PIP is installed you just install VirtualEnv:

pip install virtualenv

In the folder that has all your files, run

virtualenv venv

And seconds later you have a virtual environment with everything in the venv folder. To activate it, run venv/Scripts/activate.bat (deactivating the environment is easy: use deactivate.bat). Every library you install will end up in venv\Lib\site-packages, and it’s easy to move your whole environment somewhere else.

The only downside I found is some code editors can’t recognize this kind of environments, and you will see warnings in your code because imported libraries are not found. Of course there’re tricky ways to do it but it would be nice editors keep in mind Virtual Environments are very normal nowadays.

Hope it helps.


回答 20

  1. 下载脚本:https://raw.github.com/pypa/pip/master/contrib/get-pip.py
  2. 将其保存在驱动器上,例如C:\pip-script\get-pip.py
  3. 从命令提示符导航到该路径,然后运行“python get-pip.py”

指南链接:http://www.pip-installer.org/en/latest/installing.html#install-pip

注意:请确保也把这样的脚本路径(C:\Python27\Scripts)添加到%PATH%环境变量中。

  1. Download script: https://raw.github.com/pypa/pip/master/contrib/get-pip.py
  2. Save it on drive somewhere like C:\pip-script\get-pip.py
  3. Navigate to that path from the command prompt and run “python get-pip.py”

Guide link: http://www.pip-installer.org/en/latest/installing.html#install-pip

Note: Make sure a scripts path like this (C:\Python27\Scripts) is added into the %PATH% environment variable as well.


回答 21

很简单:

Step 1: wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
Step 2: wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
Step 3: python ez_setup.py
Step 4: python get-pip.py

(确保您的Python和Python脚本目录(例如C:\Python27C:\Python27\Scripts)在PATH中。)

It’s very simple:

Step 1: wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
Step 2: wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
Step 3: python ez_setup.py
Step 4: python get-pip.py

(Make sure your Python and Python script directory (for example, C:\Python27 and C:\Python27\Scripts) are in the PATH.)


回答 22

从2014年2月4日开始工作:):

如果您按照@Colonel Panic的建议尝试通过Windows安装程序文件从http://www.lfd.uci.edu/~gohlke/pythonlibs/#pip安装pip ,则可能已成功安装了pip软件包管理器,但是您可能无法使用pip安装任何软件包。如果您查看pip.log文件,您可能还会遇到与尝试安装Beautiful Soup 4时遇到的SSL错误相同的错误:

Downloading/unpacking beautifulsoup4
  Getting page https://pypi.python.org/simple/beautifulsoup4/
  Could not fetch URL https://pypi.python.org/simple/beautifulsoup4/: **connection error: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed**
  Will skip URL https://pypi.python.org/simple/beautifulsoup4/ when looking for download links for beautifulsoup4

问题是OpenSSL的旧版本与pip 1.3.1及更高版本不兼容。目前,最简单的解决方法是安装不需要SSL的 pip 1.2.1

在Windows上安装Pip:

  1. https://pypi.python.org/packages/source/p/pip/pip-1.2.1.tar.gz下载pip 1.2.1
  2. 提取pip-1.2.1.tar.gz文件
  3. 将目录更改为提取的文件夹: cd <path to extracted folder>/pip-1.2.1
  4. python setup.py install
  5. 现在确保C:\Python27\Scripts位于PATH中,因为pip安装在C:\Python27\Scripts目录中,而不是通常安装Python软件包的C:\Python27\Lib\site-packages

现在尝试使用pip安装任何软件包。

例如,要使用pip安装requests软件包,请从cmd运行此命令:

pip install requests

瞧!requests将成功安装,您会收到一条成功消息。

Working as of Feb 04 2014 :):

If you have tried installing pip through the Windows installer file from http://www.lfd.uci.edu/~gohlke/pythonlibs/#pip as suggested by @Colonel Panic, you might have installed the pip package manager successfully, but you might be unable to install any packages with pip. You might also have got the same SSL error as I got when I tried to install Beautiful Soup 4 if you look in the pip.log file:

Downloading/unpacking beautifulsoup4
  Getting page https://pypi.python.org/simple/beautifulsoup4/
  Could not fetch URL https://pypi.python.org/simple/beautifulsoup4/: **connection error: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed**
  Will skip URL https://pypi.python.org/simple/beautifulsoup4/ when looking for download links for beautifulsoup4

The problem is an issue with an old version of OpenSSL being incompatible with pip 1.3.1 and above versions. The easy workaround for now, is to install pip 1.2.1, which does not require SSL:

Installing Pip on Windows:

  1. Download pip 1.2.1 from https://pypi.python.org/packages/source/p/pip/pip-1.2.1.tar.gz
  2. Extract the pip-1.2.1.tar.gz file
  3. Change directory to the extracted folder: cd <path to extracted folder>/pip-1.2.1
  4. Run python setup.py install
  5. Now make sure C:\Python27\Scripts is in PATH because pip is installed in the C:\Python27\Scripts directory unlike C:\Python27\Lib\site-packages where Python packages are normally installed

Now try to install any package using pip.

For example, to install the requests package using pip, run this from cmd:

pip install requests

Voila! requests will be successfully installed and you will get a success message.


回答 23

如果您使用的是从python.org下载的Python 2 >= 2.7.9或Python 3 >= 3.4的二进制文件,则pip已经安装,但您需要升级pip。

在Windows上可以轻松完成升级

打开命令行,运行下面的Python命令

python -m pip install -U pip

使用get-pip.py安装

get-pip.py下载到同一文件夹或您选择的任何其他文件夹中。我假设您将从python.exe文件将其下载到同一文件夹中,然后运行此命令

python get-pip.py

Pip的安装指南非常干净和简单。

使用此工具,您应该可以在两分钟内开始使用Pip。

pip is already installed if you’re using Python 2 >=2.7.9 or Python 3 >=3.4 binaries downloaded from python.org, but you’ll need to upgrade pip.

On Windows upgrade can be done easily

Go to the command line and run the Python command below

python -m pip install -U pip

Installing with get-pip.py

Download get-pip.py to the folder that contains your python.exe file, or any other folder of your choice. I am assuming you will download it to the same folder as python.exe; then run this command

python get-pip.py

Pip’s installation guide is pretty clean and simple.

Using this you should be able to get started with Pip in under two minutes.


回答 24

如果您的pip版本还有其他问题,可以试试这个

pip install --trusted-host pypi.python.org --upgrade pip

If you have other problems with your pip version, you can try this

pip install --trusted-host pypi.python.org --upgrade pip

回答 25

简单的CMD方法

使用CURL下载get-pip.py

curl --http1.1 https://bootstrap.pypa.io/get-pip.py --output get-pip.py

执行下载的python文件

python get-pip.py

然后将C:\Python37\Scripts路径添加到您的环境变量中。这里假设您的C盘中有一个Python37文件夹;该文件夹的名称会因安装的Python版本而异

现在,您可以通过运行来安装python软件包

pip install awesome_package_name

Simple CMD way

Use CURL to download get-pip.py

curl --http1.1 https://bootstrap.pypa.io/get-pip.py --output get-pip.py

Execute downloaded python file

python get-pip.py

Then add the C:\Python37\Scripts path to your environment variable. This assumes that there is a Python37 folder in your C drive; the folder name may vary according to the installed Python version

Now you can install python packages by running

pip install awesome_package_name

回答 26

为Python2和Python3安装Pip

  1. 将get-pip.py下载到计算机上的某个文件夹。
  2. 打开命令提示符,然后导航到包含get-pip.py的文件夹。
  3. 运行以下命令:python get-pip.py、python3 get-pip.py或python3.6 get-pip.py,具体取决于要为哪个Python版本安装pip。
  4. 现在pip应该已经安装好了!

旧答案(仍然有效)

你有没有尝试过?

python -m ensurepip

这可能是在任何系统上安装pip的最简单方法。

Installing Pip for Python2 and Python3

  1. Download get-pip.py to a folder on your computer.
  2. Open a command prompt and navigate to the folder containing get-pip.py.
  3. Run the following command: python get-pip.py, python3 get-pip.py or python3.6 get-pip.py, depending on which version of Python you want to install pip for
  4. Pip should be now installed!

Old answer (still valid)

Have you tried ?

python -m ensurepip

It’s probably the easiest way to install pip on any system.
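
As a quick sanity check (these are standard commands; the exact version output will vary by installation):

python -m ensurepip --upgrade
python -m pip --version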


回答 27

只需从此处https://pypi.python.org/pypi/setuptools#windows-simplified下载setuptools-15.2.zip(md5),然后运行ez_setup.py。

Just download setuptools-15.2.zip (md5), from here https://pypi.python.org/pypi/setuptools#windows-simplified , and run ez_setup.py.


回答 28

或者,您可以获取pip-Win,它是Windows上pip和virtualenv的多合一安装程序,带有GUI,可以:

  • 从一个Python解释器(即版本)切换到另一个(包括py和pypy)
  • 查看所有已安装的软件包,以及它们是否为最新
  • 安装或升级软件包,或升级pip本身
  • 创建和删除虚拟环境,并在它们之间切换
  • 使用选定的解释器运行IDLE或其他Python脚本

Alternatively, you can get pip-Win, which is an all-in-one installer for pip and virtualenv on Windows, with a GUI that lets you:

  • Switch from one Python interpreter (i.e. version) to another (including py and pypy)
  • See all installed packages, and whether they are up-to-date
  • Install or upgrade a package, or upgrade pip itself
  • Create and delete virtual environments, and switch between them
  • Run the IDLE or another Python script, with the selected interpreter

回答 29

现在,它与Python捆绑在一起。您不需要安装它。

pip -V

这样可以检查pip是否已安装。在极少数未安装的情况下,请下载get-pip.py文件并用python运行它:

python get-pip.py

Now, it is bundled with Python. You don’t need to install it.

pip -V

This is how you can check whether pip is installed. In the rare case that it is not installed, download the get-pip.py file and run it with python:

python get-pip.py

“yield”关键字有什么作用?

问题:“yield”关键字有什么作用?

yield关键字在Python中的用途是什么?

例如,我试图理解这段代码1

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

这是调用方法:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

调用_get_child_candidates方法时会发生什么?是返回一个列表?还是单个元素?它会被再次调用吗?后续调用何时停止?


1. 这段代码由Jochen Schulz(jrschulz)编写,他创建了一个很棒的用于度量空间的Python库。这是完整源代码的链接:Module mspace。

What is the use of the yield keyword in Python, and what does it do?

For example, I’m trying to understand this code1:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild  

And this is the caller:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When will subsequent calls stop?


1. This piece of code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.


回答 0

要了解yield的作用,您必须了解什么是生成器。而在了解生成器之前,您必须先了解可迭代对象(iterables)。

可迭代

创建列表时,可以一一阅读它的项目。逐一读取其项称为迭代:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist是一个可迭代对象。当您使用列表推导式时,您创建的是一个列表,因此也是可迭代对象:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

所有可以用“for... in...”遍历的东西都是可迭代对象:列表、字符串、文件……

这些可迭代对象很方便,因为您可以随意反复读取它们,但所有的值都存储在内存中,当值很多时这并不总是理想的。

生成器

生成器是迭代器,是一种只能迭代一次的可迭代对象。生成器不会将所有值存储在内存中,而是即时生成值:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

它与列表推导式完全一样,只是用()代替了[]。但是,由于生成器只能使用一次,您无法第二次执行for i in mygenerator:生成器计算出0后就将其丢弃,然后计算1,最后逐个计算完4就结束了。

Yield

yield是一个用法与return类似的关键字,不同之处在于该函数将返回一个生成器。

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

这是一个无用的示例,但是当您知道函数将返回大量的值(只需要读取一次)时,它就很方便。

要掌握yield,您必须了解在调用函数时,在函数主体中编写的代码不会运行。该函数仅返回生成器对象,这有点棘手:-)

然后,您的代码将在每次for使用生成器时从中断处继续。

现在最困难的部分是:

第一次for调用从您的函数创建的生成器对象时,它会从头开始运行函数中的代码,直到命中yield,然后返回循环的第一个值。之后,每次后续调用都会运行您在函数中编写的循环的下一次迭代,并返回下一个值。这会一直持续到生成器被认为为空为止,也就是函数运行时不再命中yield的时候。这可能是因为循环已经结束,或者是因为不再满足“if/else”条件。


您的代码说明

生成器:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

调用方法:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

该代码包含几个智能部分:

  • 循环在一个列表上迭代,但列表在迭代过程中不断扩展 :-) 这是遍历所有这些嵌套数据的一种简洁方法,尽管有点危险,因为可能会陷入无限循环。在这种情况下,candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))会耗尽生成器的所有值,但while会不断创建新的生成器对象,由于它们不是应用在同一个节点上,因此会产生与之前不同的值。

  • extend()是列表对象的方法,它接受一个可迭代对象并将其值添加到列表中。

通常我们将一个列表传递给它:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

但是在您的代码中,它得到了一个生成器,这很好,因为:

  1. 您无需两次读取值。
  2. 您可能有很多孩子,并且您不希望所有孩子都存储在内存中。

它之所以有效,是因为Python并不在乎方法的参数是不是列表。Python期望的是可迭代对象,因此字符串、列表、元组和生成器都适用!这就是所谓的鸭子类型,也是Python如此酷的原因之一。但这是另一个故事,属于另一个问题……

您可以在这里停止,或者阅读一点以了解生成器的高级用法:

控制生成器耗尽

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

注意:对于Python 3,请使用print(corner_street_atm.__next__())print(next(corner_street_atm))

对于诸如控制对资源的访问之类的各种事情,它可能很有用。

Itertools,您最好的朋友

itertools模块包含用于操纵可迭代对象的特殊函数。想过复制一个生成器吗?串联两个生成器?用一行代码对嵌套列表中的值进行分组?在不创建另一个列表的情况下进行Map/Zip?

然后就import itertools

一个例子?让我们看一下四马比赛的可能到达顺序:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

了解迭代的内部机制

迭代是一个涉及可迭代对象(实现__iter__()方法)和迭代器(实现__next__()方法)的过程。可迭代对象是可以从中获取迭代器的任何对象。迭代器是让您可以在可迭代对象上进行迭代的对象。

这篇关于for循环如何工作的文章中提供了更多相关信息。

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

Everything you can use “for... in...” on is an iterable; lists, strings, files…

These iterables are handy because you can read them as much as you wish, but you store all the values in memory and this is not always what you want when you have a lot of values.

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it’s a useless example, but it’s handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)

Then, your code will continue from where it left off each time for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it’ll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".


Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated :-) It’s a concise way to go through all these nested data even if it’s a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it’s not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don’t need to read the values twice.
  2. You may have a lot of children and you don’t want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question…

You can stop here, or read a little bit to see an advanced use of a generator:

Controlling a generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm))

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let’s see the possible orders of arrival for a four-horse race:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.

There is more about it in this article about how for loops work.
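
A small illustrative sketch (not from the original answer) showing that the function body really does not run at call time:

def noisy():
    print("body starts running")    # printed only once iteration begins
    yield 1
    print("body resumed")
    yield 2

g = noisy()        # nothing is printed here: the body has not started yet
print(next(g))     # prints "body starts running", then 1
print(next(g))     # prints "body resumed", then 2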


回答 1

理解yield的捷径

当您看到带有yield语句的函数时,请应用以下简单技巧,以了解将发生的情况:

  1. 在函数的开头插入一行result = []。
  2. 将每个yield expr替换为result.append(expr)。
  3. 在函数底部插入一行return result。
  4. 耶——不再有yield语句了!通读代码并弄清它的逻辑。
  5. 将该函数与原始定义进行比较。

这个技巧可以让您大致了解yield函数背后的逻辑,但yield实际发生的事情与基于列表的方法明显不同。在许多情况下,yield方法的内存效率更高,速度也更快。在其他情况下,即使原始函数运行正常,这个技巧也会让您陷入无限循环。请继续阅读以了解更多信息……

不要混淆您的Iterable,Iterators和Generators

首先,迭代器协议 -当您编写时

for x in mylist:
    ...loop body...

Python执行以下两个步骤:

  1. 获取一个迭代器 mylist

调用iter(mylist) -> 这会返回一个带有next()方法(Python 3中为__next__())的对象。

    [这是大多数人忘记告诉您的步骤]

  2. 使用迭代器遍历项目:

不断在步骤1返回的迭代器上调用next()方法。next()的返回值被赋给x,然后执行循环体。如果next()内部引发了StopIteration异常,则意味着迭代器中没有更多的值,循环随之退出。

事实是,Python在想要遍历对象内容的任何时候都会执行上述两个步骤——所以它可能是for循环,但也可能是类似otherlist.extend(mylist)这样的代码(其中otherlist是Python列表)。

这里mylist是一个可迭代对象,因为它实现了迭代器协议。在用户定义的类中,您可以实现__iter__()方法使类的实例可迭代。该方法应返回一个迭代器。迭代器是带有next()方法的对象。可以在同一个类上同时实现__iter__()和next(),并让__iter__()返回self。这适用于简单的情况,但当您希望两个迭代器同时在同一个对象上循环时就不行了。

这就是迭代器协议,许多对象都实现了该协议:

  1. 内置列表,字典,元组,集合,文件。
  2. 实现的用户定义的类__iter__()
  3. 生成器。

请注意,for循环并不知道它处理的是哪种对象——它只是遵循迭代器协议,乐于在一次次调用next()时逐项获取元素。内置列表逐个返回它们的项,字典逐个返回键,文件逐个返回行,依此类推。而生成器返回的……这就是yield的用武之地:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

如果f123()中没有yield语句,而是有三个return语句,那么只有第一个会被执行,然后函数就会退出。但f123()不是普通函数。调用f123()时,它不会返回yield语句中的任何值!它返回的是一个生成器对象。而且,该函数并没有真正退出——它进入了挂起状态。当for循环尝试遍历生成器对象时,函数会从上次返回的yield的下一行恢复挂起状态,执行下一行代码(在本例中是一个yield语句),并将其作为下一个项目返回。这会一直持续到函数退出,此时生成器引发StopIteration,循环退出。

因此,生成器对象有点像适配器——在一端,它通过公开__iter__()和next()方法来展示迭代器协议,让for循环满意。而在另一端,它只把函数运行到恰好取出下一个值的程度,然后再将其置回挂起模式。

为什么使用生成器?

通常,您可以编写不使用生成器但实现相同逻辑的代码。一种选择是使用我之前提到的临时列表“技巧”。这并非在所有情况下都可行,例如在有无限循环时;而且当列表非常长时,它可能会低效地使用内存。另一种方法是实现一个新的可迭代类SomethingIter,将状态保存在实例成员中,并在其next()(Python 3中为__next__())方法中执行下一个逻辑步骤。取决于具体逻辑,next()方法中的代码可能最终看起来非常复杂并且容易出错。在这里,生成器提供了一种干净而简单的解决方案。

Shortcut to understanding yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function.
  4. Yay – no more yield statements! Read and figure out code.
  5. Compare function to the original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different than what happens in the list based approach. In many cases, the yield approach will be a lot more memory efficient and faster too. In other cases, this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more…

Don’t confuse your Iterables, Iterators, and Generators

First, the iterator protocol – when you write

for x in mylist:
    ...loop body...

Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over items:

    Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is Python performs the above two steps anytime it wants to loop over the contents of an object – so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).

Here mylist is an iterable because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.

So that’s the iterator protocol, many objects implement this protocol:

  1. Built-in lists, dictionaries, tuples, sets, files.
  2. User-defined classes that implement __iter__().
  3. Generators.

Note that a for loop doesn’t know what kind of object it’s dealing with – it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return… well that’s where yield comes in:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

Instead of yield statements, if you had three return statements in f123() only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit – it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the very next line after the yield it previously returned from, executes the next line of code, in this case, a yield statement, and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.

So the generator object is sort of like an adapter – at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.

Why Use Generators?

Usually, you can write code that doesn’t use generators but implements the same logic. One option is to use the temporary list ‘trick’ I mentioned before. That will not work in all cases, e.g., if you have infinite loops, and it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps the state in instance members and performs the next logical step in its next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.
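
To make the first section concrete, here is the temporary-list ‘trick’ from the top of this answer applied to f123 (a sketch; f123_list is a hypothetical name):

# f123 rewritten with the temporary-list trick described above
def f123_list():
    result = []
    result.append(1)   # was: yield 1
    result.append(2)   # was: yield 2
    result.append(3)   # was: yield 3
    return result

for item in f123_list():
    print(item)        # prints 1, 2, 3 -- same output as the generator version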


回答 2

这样想:

迭代器只是对带有next()方法的对象的一种花哨称呼。因此,带有yield的函数最终会变成这样:

原始版本:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

这基本上是Python解释器使用上面的代码执行的操作:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

为了更深入地了解幕后发生的事情,可以将for循环重写为:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

这是否更有意义,还是会让您更加困惑?:)

需要指出的是,为了便于说明,这里做了过度的简化。:)

Think of it this way:

An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:

Original version:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

This is basically what the Python interpreter does with the above code:

class it:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

For more insight as to what’s happening behind the scenes, the for loop can be rewritten to this:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

Does that make more sense or just confuse you more? :)

I should note that this is an oversimplification for illustrative purposes. :)
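
For reference, a Python 3 version of the same rewrite would use __next__ and the next() builtin; a sketch under that assumption:

class It:
    def __init__(self):
        self.count = -1

    def __iter__(self):
        return self

    def __next__(self):           # 'next' became '__next__' in Python 3
        self.count += 1
        if self.count < 4:
            return self.count
        raise StopIteration

iterator = It()
try:
    while True:
        print(next(iterator))     # the next() builtin dispatches to __next__
except StopIteration:
    pass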


回答 3

yield关键字可以归结为两个简单的事实:

  1. 如果编译器在函数内部的任何位置检测到yield关键字,则该函数不再通过return语句返回。相反,它会立即返回一个称为生成器的惰性“待处理列表”对象
  2. 生成器是可迭代的。什么是可迭代对象?就像list、set、range或dict-view一样,它带有按特定顺序访问每个元素的内置协议

简而言之:生成器是一个惰性的、增量生成的列表,而yield语句允许您用函数写法来定义生成器应逐个吐出的列表值

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

示例

让我们定义一个类似于Python range的函数makeRange。调用makeRange(n)会返回一个生成器:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

要强制生成器立即返回其所有待处理的值,可以将其传给list()(就像对任何可迭代对象一样):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

将示例与“仅返回列表”进行比较

可以将上面的示例视为仅创建一个列表,并将其附加并返回:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

但是,有一个主要区别。请参阅最后一节。


您如何使用生成器

可迭代对象是列表推导式的最后一部分,并且所有生成器都是可迭代的,因此它们经常这样使用:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

为了更好地体验生成器,您可以使用itertools模块(需要时务必用chain.from_iterable而不是chain)。例如,您甚至可以用生成器实现无限长的惰性列表,例如itertools.count()。您可以实现自己的def enumerate(iterable): return zip(count(), iterable),也可以在while循环中用yield关键字来实现。

请注意:生成器实际上可以用于更多事情,例如实现协程或不确定性编程或其他优雅的事情。但是,我在这里提出的“惰性列表”观点是您会发现的最常见用法。


幕后花絮

这就是“Python迭代协议”的工作方式,也就是当您执行list(makeRange(5))时发生的事情。这正是我之前所说的“惰性增量列表”。

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

内置函数next()只是调用对象的.next()方法,该方法是“迭代协议”的一部分,在所有迭代器上都能找到。您可以手动使用next()函数(以及迭代协议的其他部分)来实现一些花哨的东西,但这通常以牺牲可读性为代价,所以请尽量避免这样做。


细节

通常,大多数人不会关心以下区别,并且可能想在这里停止阅读。

用Python的话说,可迭代对象是任何“理解for循环概念”的对象,例如列表[1,2,3];而迭代器是所请求的for循环的一个具体实例,例如[1,2,3].__iter__()。生成器与任何迭代器完全相同,区别只在于它的编写方式(使用函数语法)。

当您从列表中请求迭代器时,它将创建一个新的迭代器。但是,当您从迭代器请求迭代器时(很少这样做),它只会为您提供自身的副本。

因此,在极少数情况下,您可能无法执行此类操作…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

……那么请记住,生成器是迭代器,也就是说是一次性的。如果想重用它,应该再次调用myRange(...)。如果需要使用结果两次,请将结果转换为列表并存入变量:x = list(myRange(5))。那些绝对需要克隆生成器的人(例如,正在进行骇人的元编程的人)可以在绝对必要时使用itertools.tee,因为可复制迭代器的Python PEP标准提案已被推迟。

The yield keyword is reduced to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy “pending list” object called a generator
  2. A generator is iterable. What is an iterable? It’s anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.

In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]

Example

Let’s define a function makeRange that’s just like Python’s range. Calling makeRange(n) RETURNS A GENERATOR:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):

>>> list(makeRange(5))
[0, 1, 2, 3, 4]

Comparing example to “just returning a list”

The above example can be thought of as merely creating a list which you append to and return:

# list-version                   #  # generator-version
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
    TO_RETURN = []               #>
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #~         yield i
        i += 1                   #          i += 1  ## indented
    return TO_RETURN             #>

>>> makeRange(5)
[0, 1, 2, 3, 4]

There is one major difference, though; see the last section.


How you might use generators

An iterable is the last part of a list comprehension, and all generators are iterable, so they’re often used like so:

#                   _ITERABLE_
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]

To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely-long lazy lists like itertools.count(). You could implement your own def enumerate(iterable): return zip(count(), iterable), or alternatively do so with the yield keyword in a while-loop.

Please note: generators can actually be used for many more things, such as implementing coroutines or non-deterministic programming or other elegant things. However, the “lazy lists” viewpoint I present here is the most common use you will find.


Behind the scenes

This is how the “Python iteration protocol” works. That is, what is going on when you do list(makeRange(5)). This is what I describe earlier as a “lazy, incremental list”.

>>> x=iter(range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The built-in function next() just calls the object’s .next() function, which is a part of the “iteration protocol” and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that…


Minutiae

Normally, most people would not care about the following distinctions and probably want to stop reading here.

In Python-speak, an iterable is any object which “understands the concept of a for-loop” like a list [1,2,3], and an iterator is a specific instance of the requested for-loop like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).

When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just gives you a copy of itself.

Thus, in the unlikely event that you are failing to do something like this…

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

… then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable x = list(myRange(5)). Those who absolutely need to clone a generator (for example, who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.
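
For the rare cloning case mentioned above, itertools.tee looks roughly like this (a sketch reusing the myRange generator):

import itertools

def myRange(n):
    i = 0
    while i < n:
        yield i
        i += 1

a, b = itertools.tee(myRange(5))   # two independent iterators from one generator
print(list(a))                     # [0, 1, 2, 3, 4]
print(list(b))                     # [0, 1, 2, 3, 4] -- b is unaffected by draining a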


回答 4

yield关键字在Python中有什么作用?

答案大纲/摘要

  • 包含yield的函数在被调用时会返回一个生成器(Generator)。
  • 生成器是迭代器,因为它们实现了迭代器协议,因此您可以对其进行迭代。
  • 也可以向生成器发送信息,使其在概念上成为协程
  • 在Python 3中,您可以使用yield from在两个方向上将一个生成器委托给另一个生成器。
  • (附录对几个答案进行了评论,包括最上面的一个,并讨论了return在生成器中的用法。)

生成器:

yield仅在函数定义内部合法,而在函数定义中包含yield会使该函数返回一个生成器。

生成器的想法来自具有不同实现方式的其他语言(请参见脚注1)。在Python的生成器中,代码的执行会在yield处冻结。当调用生成器时(方法将在下面讨论),执行会恢复,然后在下一个yield处再次冻结。

yield提供了一种实现迭代器协议的简便方法,该协议由以下两个方法定义:__iter__和next(Python 2)或__next__(Python 3)。这两个方法使对象成为迭代器,您可以用collections模块中的Iterator抽象基类对其进行类型检查。

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

生成器类型是迭代器的子类型:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

并且如有必要,我们可以像这样进行类型检查:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

Iterator的一个特性是,一旦耗尽,就无法重用或重置它:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

如果要再次使用其功能,则必须另做一个(请参见脚注2):

>>> list(func())
['I am', 'a generator!']

您可以通过编程方式yield数据,例如:

def func(an_iterable):
    for item in an_iterable:
        yield item

上面的简单生成器也等效于下面的生成器-从Python 3.3开始(在Python 2中不可用),您可以使用yield from

def func(an_iterable):
    yield from an_iterable

但是,yield from还允许委派给子生成器,这将在以下有关使用子协程进行合作委派的部分中进行解释。

协程:

yield 形成一个表达式,该表达式允许将数据发送到生成器中(请参见脚注3)

这是一个示例,请注意该received变量,该变量将指向发送到生成器的数据:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

首先,我们必须用内置函数next来启动生成器。它会根据您使用的Python版本调用相应的next或__next__方法:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

现在我们可以将数据发送到生成器中。(发送None与调用next相同。):

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

使用yield from协作委托给子协程

现在,回想一下yield from在Python 3中可用。它使我们可以将协程委托给子协程:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

现在我们可以将功能委派给子生成器,并且生成器可以像上面一样使用它:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

您可以在PEP 380中阅读有关yield from精确语义的更多信息。

其他方法:关闭并抛出

close方法会在函数执行被冻结的位置引发GeneratorExit。它也会被__del__调用,因此您可以把清理代码放在处理GeneratorExit的地方:

>>> my_account.close()

您还可以引发异常,该异常可以在生成器中处理或传播回用户:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError

结论

我相信我已经涵盖了以下问题的各个方面:

yield关键字在Python中有什么作用?

事实证明,yield能做很多事情。我相信我还可以为此添加更详尽的示例。如果您想要更多内容或有建设性的批评,请在下面评论告诉我。


附录:

对最佳/可接受答案的评论

  • 它在“什么使对象可迭代”这一点上很混乱,只是拿列表当例子。请参阅我上面的参考资料,但总而言之:可迭代对象具有返回迭代器的__iter__方法。迭代器提供.next(Python 2)或.__next__(Python 3)方法,该方法由for循环隐式调用,直到引发StopIteration为止,而且一旦引发,就会一直引发下去。
  • 然后它用生成器表达式来描述什么是生成器。由于生成器只是创建迭代器的一种简便方法,这只会让问题更混乱,而且我们仍然没有讲到yield这部分。
  • 在“控制生成器耗尽”中,他调用了.next方法,而本应使用内置函数next。这才是恰当的间接层,而且他的代码在Python 3中无法运行。
  • Itertools?这与yield的作用根本无关。
  • 没有讨论yield提供的方法,也没有讨论Python 3中的新功能yield from。最佳/可接受的答案是一个非常不完整的答案。

yield生成器表达或理解中提出的答案的评论。

该语法当前允许列表理解中的任何表达式。

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

由于yield是一个表达式,尽管没有特别好的用例,仍有人鼓吹在推导式或生成器表达式中使用它。

CPython核心开发人员正在讨论废除对这种用法的许可。以下是邮件列表中的相关帖子:

2017年1月30日19:05,布雷特·坎农写道:

2017年1月29日星期日,克雷格·罗德里格斯(Craig Rodrigues)写道:

两种方法我都可以接受。恕我直言,让Python 3保持现状是不行的。

我的投票是SyntaxError,因为这种语法得到的并不是您期望的结果。

我同意这对我们来说是一个明智的归宿,因为任何依赖当前行为的代码实在太“聪明”,难以维护。

为了达到这个目标,我们可能需要:

  • 3.7中的语法警告或弃用警告
  • 2.7.x中的Py3k警告
  • 3.8中的SyntaxError

干杯,尼克。

-Nick Coghlan | gmail.com上的ncoghlan | 澳大利亚布里斯班

此外,还有一个未决的issue(10544),似乎表明这从来都不是一个好主意(PyPy,一个用Python编写的Python实现,已经在发出语法警告)。

总之,在CPython的开发人员另行通知之前:不要在生成器表达式或推导式中使用yield。

return生成器中的语句

Python 2中

在生成器函数中,return语句不允许包含expression_list。在这种情况下,裸return表示生成器已完成,并将导致StopIteration被引发。

expression_list基本上是由逗号分隔的任意数量的表达式——本质上,在Python 2中,您可以用return停止生成器,但不能返回值。

Python 3中

在生成器函数中,return语句表示生成器已完成,并将导致StopIteration被引发。返回的值(如果有)将用作构造StopIteration的参数,并成为StopIteration.value属性。

脚注

  1. 提案中引用了CLU,Sather和Icon语言,以将生成器的概念引入Python。总体思路是,一个函数可以维护内部状态并根据用户的需要产生中间数据点。这有望在性能上优于其他方法,包括Python线程,该方法甚至在某些系统上不可用。

  2. 例如,这意味着xrange对象(Python 3中的range)不是迭代器,即使它们是可迭代的,因为它们可以被重用。与列表一样,它们的__iter__方法返回迭代器对象。

  3. yield最初是作为语句引入的,这意味着它只能出现在代码块中一行的开头。现在,yield创建的是一个yield表达式:https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt 。提出此更改是为了允许用户向生成器发送数据,就像从中接收数据一样。要发送数据,必须能够将其赋值给某个东西,而语句做不到这一点。

What does the yield keyword do in Python?

Answer Outline/Summary

  • A function with yield, when called, returns a Generator.
  • Generators are iterators because they implement the iterator protocol, so you can iterate over them.
  • A generator can also be sent information, making it conceptually a coroutine.
  • In Python 3, you can delegate from one generator to another in both directions with yield from.
  • (Appendix critiques a couple of answers, including the top one, and discusses the use of return in a generator.)

Generators:

yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator.

The idea for generators comes from other languages (see footnote 1) with varying implementations. In Python’s Generators, the execution of the code is frozen at the point of the yield. When the generator is called (methods are discussed below) execution resumes and then freezes at the next yield.

yield provides an easy way of implementing the iterator protocol, defined by the following two methods: __iter__ and next (Python 2) or __next__ (Python 3). Both of those methods make an object an iterator that you could type-check with the Iterator Abstract Base Class from the collections module.

>>> def func():
...     yield 'I am'
...     yield 'a generator!'
... 
>>> type(func)                 # A function with yield is still a function
<type 'function'>
>>> gen = func()
>>> type(gen)                  # but it returns a generator
<type 'generator'>
>>> hasattr(gen, '__iter__')   # that's an iterable
True
>>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
True                           # implements the iterator protocol.

The generator type is a sub-type of iterator:

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

And if necessary, we can type-check like this:

>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(gen, collections.Iterator)
True

A feature of an Iterator is that once exhausted, you can’t reuse or reset it:

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

You’ll have to make another if you want to use its functionality again (see footnote 2):

>>> list(func())
['I am', 'a generator!']

One can yield data programmatically, for example:

def func(an_iterable):
    for item in an_iterable:
        yield item

The above simple generator is also equivalent to the below – as of Python 3.3 (and not available in Python 2), you can use yield from:

def func(an_iterable):
    yield from an_iterable

However, yield from also allows for delegation to subgenerators, which will be explained in the following section on cooperative delegation with sub-coroutines.

Coroutines:

yield forms an expression that allows data to be sent into the generator (see footnote 3)

Here is an example, take note of the received variable, which will point to the data that is sent to the generator:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received


>>> my_account = bank_account(1000, .05)

First, we must queue up the generator with the builtin function, next. It will call the appropriate next or __next__ method, depending on the version of Python you are using:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

And now we can send data into the generator. (Sending None is the same as calling next.) :

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5

Cooperative Delegation to Sub-Coroutine with yield from

Now, recall that yield from is available in Python 3. This allows us to delegate coroutines to a subcoroutine:

def money_manager(expected_rate):
    under_management = yield     # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management 
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            '''TODO: write function to send unclaimed funds to state'''
        finally:
            '''TODO: write function to mail tax info to client'''


def investment_account(deposited, manager):
    '''very simple model of an investment account that delegates to a manager'''
    next(manager) # must queue up manager
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()

And now we can delegate functionality to a sub-generator and it can be used by a generator just as above:

>>> my_manager = money_manager(.06)
>>> my_account = investment_account(1000, my_manager)
>>> first_year_return = next(my_account)
>>> first_year_return
60.0
>>> next_year_return = my_account.send(first_year_return + 1000)
>>> next_year_return
123.6

You can read more about the precise semantics of yield from in PEP 380.

Other Methods: close and throw

The close method raises GeneratorExit at the point the function execution was frozen. This will also be called by __del__ so you can put any cleanup code where you handle the GeneratorExit:

>>> my_account.close()

You can also throw an exception which can be handled in the generator or propagated back to the user:

>>> import sys
>>> try:
...     raise ValueError
... except:
...     my_manager.throw(*sys.exc_info())
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
ValueError

Conclusion

I believe I have covered all aspects of the following question:

What does the yield keyword do in Python?

It turns out that yield does a lot. I’m sure I could add even more thorough examples to this. If you want more or have some constructive criticism, let me know by commenting below.


Appendix:

Critique of the Top/Accepted Answer

  • It is confused on what makes an iterable, just using a list as an example. See my references above, but in summary: an iterable has an __iter__ method returning an iterator. An iterator provides a .next (Python 2) or .__next__ (Python 3) method, which is implicitly called by for loops until it raises StopIteration, and once it does, it will continue to do so.
  • It then uses a generator expression to describe what a generator is. Since a generator is simply a convenient way to create an iterator, it only confuses the matter, and we still have not yet gotten to the yield part.
  • In Controlling a generator exhaustion he calls the .next method, when instead he should use the builtin function, next. It would be an appropriate layer of indirection, because his code does not work in Python 3.
  • Itertools? This was not relevant to what yield does at all.
  • No discussion of the methods that yield provides along with the new functionality yield from in Python 3. The top/accepted answer is a very incomplete answer.

Critique of answer suggesting yield in a generator expression or comprehension.

The grammar currently allows any expression in a list comprehension.

expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
...
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist

Since yield is an expression, it has been touted by some as interesting to use it in comprehensions or generator expression – in spite of citing no particularly good use-case.

The CPython core developers are discussing deprecating its allowance. Here’s a relevant post from the mailing list:

On 30 January 2017 at 19:05, Brett Cannon wrote:

On Sun, 29 Jan 2017 at 16:39 Craig Rodrigues wrote:

I’m OK with either approach. Leaving things the way they are in Python 3 is no good, IMHO.

My vote is it be a SyntaxError since you’re not getting what you expect from the syntax.

I’d agree that’s a sensible place for us to end up, as any code relying on the current behaviour is really too clever to be maintainable.

In terms of getting there, we’ll likely want:

  • SyntaxWarning or DeprecationWarning in 3.7
  • Py3k warning in 2.7.x
  • SyntaxError in 3.8

Cheers, Nick.

— Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Further, there is an outstanding issue (10544) which seems to be pointing in the direction of this never being a good idea (PyPy, a Python implementation written in Python, is already raising syntax warnings.)

Bottom line, until the developers of CPython tell us otherwise: Don’t put yield in a generator expression or comprehension.

The return statement in a generator

In Python 2:

In a generator function, the return statement is not allowed to include an expression_list. In that context, a bare return indicates that the generator is done and will cause StopIteration to be raised.

An expression_list is basically any number of expressions separated by commas – essentially, in Python 2, you can stop the generator with return, but you can’t return a value.

In Python 3:

In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes the StopIteration.value attribute.
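
A minimal Python 3 sketch of that behavior (illustrative, not from the original answer):

def gen():
    yield 1
    return 'done'          # in Python 3 this becomes StopIteration.value

g = gen()
print(next(g))             # 1
try:
    next(g)
except StopIteration as e:
    print(e.value)         # 'done'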

Footnotes

  1. The languages CLU, Sather, and Icon were referenced in the proposal to introduce the concept of generators to Python. The general idea is that a function can maintain internal state and yield intermediate data points on demand by the user. This promised to be superior in performance to other approaches, including Python threading, which isn’t even available on some systems.

  2. This means, for example, that xrange objects (range in Python 3) aren’t Iterators, even though they are iterable, because they can be reused. Like lists, their __iter__ methods return iterator objects.

  3. yield was originally introduced as a statement, meaning that it could only appear at the beginning of a line in a code block. Now yield creates a yield expression. https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt This change was proposed to allow a user to send data into the generator just as one might receive it. To send data, one must be able to assign it to something, and for that, a statement just won’t work.


回答 5

yield就像return——它返回您指定的内容(以生成器的形式)。不同之处在于,下次调用生成器时,执行会从上一次yield语句处继续。与return不同,发生yield时栈帧不会被清理,但控制权会交还给调用方,因此下次调用该函数时其状态会恢复。

就您的代码而言,该函数get_child_candidates的作用就像一个迭代器,以便在扩展列表时,它一次将一个元素添加到新列表中。

list.extend调用迭代器,直到耗尽为止。在您发布的代码示例的情况下,只返回一个元组并将其添加到列表中会更加清楚。

yield is just like return – it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.

In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.

list.extend calls an iterator until it’s exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.
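
A tiny sketch of list.extend draining a generator, in the spirit of this answer (illustrative values):

result = []
result.extend(x * x for x in range(3))   # the generator is consumed here, one item at a time
print(result)                             # [0, 1, 4]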


回答 6

还有一点要提:带有yield的函数实际上不必终止。我曾写过这样的代码:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

然后我可以在其他代码中使用它:

for f in fib():
    if some_condition: break
    coolfuncs(f);

它确实有助于简化某些问题,并使某些事情更易于使用。

There’s one extra thing to mention: a function that yields doesn’t actually have to terminate. I’ve written code like this:

def fib():
    last, cur = 0, 1
    while True: 
        yield cur
        last, cur = cur, last + cur

Then I can use it in other code like this:

for f in fib():
    if some_condition: break
    coolfuncs(f);

It really helps simplify some problems, and makes some things easier to work with.
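
If you want a finite prefix of such an endless generator without a manual break, itertools.islice is one option (a sketch using the fib() above):

import itertools

def fib():
    last, cur = 0, 1
    while True:
        yield cur
        last, cur = cur, last + cur

print(list(itertools.islice(fib(), 8)))   # [1, 1, 2, 3, 5, 8, 13, 21]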


回答 7

对于那些偏爱最小可用示例的人,请细细品味这个交互式Python会话:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

For those who prefer a minimal working example, meditate on this interactive Python session:

>>> def f():
...   yield 1
...   yield 2
...   yield 3
... 
>>> g = f()
>>> for i in g:
...   print(i)
... 
1
2
3
>>> for i in g:
...   print(i)
... 
>>> # Note that this time nothing was printed

回答 8

TL; DR

代替这个:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

做这个:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

每当您发现自己从头开始构建列表时,请改为逐个yield每一项。

这是我对yield的第一次“顿悟”时刻。


yield是一种语法糖式的说法,意思是:

构建一系列的东西

相同的行为:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

不同的行为:

收益是单次通过:您只能迭代一次。当一个函数包含一个yield时,我们称其为Generator函数。还有一个迭代器就是它返回的内容。这些术语在揭示。我们失去了容器的便利性,但获得了按需计算且任意长的序列的功效。

Yield懒惰,推迟了计算。当您调用函数时,其中包含yield的函数实际上根本不会执行。它返回一个迭代器对象,该对象记住它从何处中断。每次您调用next()迭代器(这在for循环中发生)时,执行都会向前推进到下一个收益。return引发StopIteration并结束序列(这是for循环的自然结束)。

yield 用途广泛。数据不必一次性全部存好,而可以一次提供一个。序列还可以是无限的。

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

如果您需要多次遍历,而序列又不太长,只需对它调用 list():

>>> list(square_yield(4))
[0, 1, 4, 9]

yield 这个词选得极好,因为它的两种含义都适用:

Yield —生产或提供(如在农业中)

…提供系列中的下一个数据。

屈服 —让步或放弃(如在政治权力中一样)

…放弃CPU执行,直到迭代器前进。

TL;DR

Instead of this:

def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines

do this:

def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.

Whenever you find yourself building a list from scratch, yield each piece instead.

This was my first “aha” moment with yield.


yield is a sugary way to say

build a series of stuff

Same behavior:

>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9

Different behavior:

Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that’s computed as needed, and arbitrarily long.

Yield is lazy, it puts off computation. A function with a yield in it doesn’t actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. return raises StopIteration and ends the series (this is the natural end of a for-loop).

Yield is versatile. Data doesn’t have to be stored all together, it can be made available one at a time. It can be infinite.

>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9

If you need multiple passes and the series isn’t too long, just call list() on it:

>>> list(square_yield(4))
[0, 1, 4, 9]

Brilliant choice of the word yield because both meanings apply:

yield — produce or provide (as in agriculture)

…provide the next data in the series.

yield — give way or relinquish (as in political power)

…relinquish CPU execution until the iterator advances.


回答 9

Yield可以为您提供生成器。

def get_odd_numbers(i):
    return list(range(1, i, 2))   # 用 list() 立即构建整个列表(Python 3 中 range 本身是惰性的)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
next(bar)   # Python 2 中写作 bar.next()
1
next(bar)
3
next(bar)
5

如您所见,在第一种情况下,foo 会把整个列表一次性放进内存。对于 5 个元素的列表这不算什么,但如果您想要一个 500 万个元素的列表呢?这不仅极其耗费内存,而且在调用该函数时还要花大量时间来构建。

在第二种情况下,bar 只是给您一个生成器。生成器是可迭代对象——这意味着您可以在 for 循环等处使用它,但每个值只能被访问一次,而且所有值不会同时存放在内存中。生成器对象会“记住”上次调用时循环进行到的位置——这样,如果您要用一个可迭代对象(比如)数到 500 亿,就不必一口气数完并同时存下 500 亿个数字。

再次,这是一个非常人为的示例,如果您真的想计数到500亿,则可能会使用itertools。:)

这是生成器最简单的用例。如您所说,它可以用来编写有效的排列,使用yield可以将内容推入调用堆栈,而不是使用某种堆栈变量。生成器还可以用于特殊的树遍历以及所有其他方式。

Yield gives you a generator.

def get_odd_numbers(i):
    return list(range(1, i, 2))   # list() builds the whole list eagerly (range is lazy in Python 3)
def yield_odd_numbers(i):
    for x in range(1, i, 2):
       yield x
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
foo
[1, 3, 5, 7, 9]
bar
<generator object yield_odd_numbers at 0x1029c6f50>
next(bar)   # bar.next() in Python 2
1
next(bar)
3
next(bar)
5

As you can see, in the first case foo holds the entire list in memory at once. It’s not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called.

In the second case, bar just gives you a generator. A generator is an iterable–which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object “remembers” where it was in the looping the last time you called it–this way, if you’re using an iterable to (say) count to 50 billion, you don’t have to count to 50 billion all at once and store the 50 billion numbers to count through.

Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)

This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
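For the “counting to 50 billion” aside, the itertools route the answer alludes to might look like this minimal sketch:

from itertools import count, islice

# count(1, 2) lazily yields 1, 3, 5, ... forever; islice bounds it
# without ever materializing the full sequence in memory.
first_five_odds = list(islice(count(1, 2), 5))
print(first_five_odds)  # [1, 3, 5, 7, 9]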


回答 10

它返回的是一个生成器。我对 Python 并不特别熟悉,但如果您了解 C# 的迭代器块,我相信这是同一类东西。

关键思想是:编译器/解释器(或别的什么)做了一些技巧,使得对调用方而言,它可以不断调用 next() 并不断得到返回值——就好像生成器方法被“暂停”了一样。显然方法不可能真正被“暂停”,所以编译器替您构建了一个状态机,用来记住当前执行到的位置以及各局部变量的状态。这比自己手写迭代器容易得多。

It’s returning a generator. I’m not particularly familiar with Python, but I believe it’s the same kind of thing as C#’s iterator blocks if you’re familiar with those.

The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values – as if the generator method was paused. Now obviously you can’t really “pause” a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.
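To make the “compiler builds a state machine” point concrete, here is a hedged sketch of what you would write by hand without yield -- a simplified equivalent, not what CPython actually generates:

class CountUpTo:
    """Hand-rolled iterator equivalent to:

       def count_up_to(n):
           i = 0
           while i < n:
               yield i
               i += 1
    """
    def __init__(self, n):
        self.n = n
        self.i = 0              # the "local variable" we must preserve ourselves

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:    # nothing left: signal the end of iteration
            raise StopIteration
        value = self.i          # remember where we are between calls
        self.i += 1
        return value

print(list(CountUpTo(3)))  # [0, 1, 2]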


回答 11

在描述如何使用生成器的许多很棒的答案中,我还没有给出一种答案。这是编程语言理论的答案:

Python 中的 yield 语句返回一个生成器。Python 的生成器是一个返回延续(continuation)的函数(确切地说是一种协程,但延续是理解这里所发生之事的更通用机制)。

编程语言理论中的延续是一种更为基础的计算,但它们并不常用,因为极难推理,也很难实现。不过“延续是什么”这个概念本身很简单:它就是一个尚未完成的计算的状态。在这个状态里,保存着变量的当前值、尚未执行的操作等等。之后在程序的某个时刻,可以调用该延续,使程序的变量重置为那个状态,并执行所保存的操作。

这种更一般形式的延续有两种实现方式。在 call/cc 方式中,程序的调用栈被原样保存;当延续被调用时,再把栈恢复出来。

在延续传递风格(CPS)中,延续只是普通函数(当然只存在于函数是一等公民的语言中),由程序员显式管理并传递给子例程。在这种风格下,程序状态由闭包(以及恰好编码在其中的变量)表示,而不是存放在栈上某处的变量。管理控制流的函数接受延续作为参数(在 CPS 的某些变体中,一个函数可以接受多个延续),并通过直接调用它们、随后返回的方式来操纵控制流。下面是延续传递风格的一个非常简单的示例:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

在这个(非常简化的)示例中,程序员把实际写文件的操作(可能是需要处理许多细节的复杂操作)保存到一个延续中,然后把该延续(即一个一等闭包)传给另一个做更多处理的运算符,由后者在必要时调用它。(我在实际的 GUI 编程中经常使用这种设计模式,要么因为它能省代码,要么更重要的,用来管理 GUI 事件触发后的控制流。)

在不失一般性的前提下,本文的其余部分将连续性概念化为CPS,因为它很容易理解和阅读。


现在让我们谈谈 Python 中的生成器。生成器是延续的一个特定子类型。延续一般能够保存任意计算的状态(即程序的调用栈),而生成器只能保存对一个迭代器进行迭代的状态。不过,对生成器的某些用例来说,这个定义略有误导。例如:

def f():
  while True:
    yield 4

显然,这是一个合理的可迭代对象,其行为定义良好——生成器每被迭代一次,就返回一个 4(并且永远如此)。但在想到迭代器(即 for x in collection: do_something(x))时,它大概不是人们脑海中浮现的那种典型的可迭代对象。这个例子正说明了生成器的能力:只要某个东西是迭代器,生成器就能保存它的迭代状态。

重申一下:延续可以保存程序栈的状态,而生成器可以保存迭代的状态。这意味着延续比生成器强大得多,但生成器也简单得多:它们对语言设计者来说更容易实现,对程序员来说也更容易使用(如果您有空闲时间,不妨读一读并理解这篇关于延续和 call/cc 的文章)。

但是您可以轻松地将生成器实现(并概念化)为连续传递样式的一种简单的特定情况:

每当 yield 被调用时,它都告诉函数返回一个延续。当函数再次被调用时,就从上次停下的地方继续。因此,用伪伪代码(即,既不是伪代码,也不算真正的代码)写出来,生成器的 next 方法大致如下:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

其中 yield 关键字实际上是真实生成器函数的语法糖,后者大致是这样的:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

请记住,这只是伪代码,Python中生成器的实际实现更为复杂。但是,作为练习以了解发生了什么,请尝试使用连续传递样式来实现生成器对象,而不使用yield关键字。

There is one type of answer that I don’t feel has been given yet, among the many great answers that describe how to use generators. Here is the programming language theory answer:

The yield statement in Python returns a generator. A generator in Python is a function that returns continuations (and specifically a type of coroutine, but continuations represent the more general mechanism to understand what is going on).

Continuations in programming languages theory are a much more fundamental kind of computation, but they are not often used, because they are extremely hard to reason about and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished. In this state, the current values of variables, the operations that have yet to be performed, and so on, are saved. Then at some point later in the program the continuation can be invoked, such that the program’s variables are reset to that state and the operations that were saved are carried out.

Continuations, in this more general form, can be implemented in two ways. In the call/cc way, the program’s stack is literally saved and then when the continuation is invoked, the stack is restored.

In continuation passing style (CPS), continuations are just normal functions (only in languages where functions are first class) which the programmer explicitly manages and passes around to subroutines. In this style, program state is represented by closures (and the variables that happen to be encoded in them) rather than variables that reside somewhere on the stack. Functions that manage control flow accept continuation as arguments (in some variations of CPS, functions may accept multiple continuations) and manipulate control flow by invoking them by simply calling them and returning afterwards. A very simple example of continuation passing style is as follows:

def save_file(filename):
  def write_file_continuation():
    write_stuff_to_file(filename)

  check_if_file_exists_and_user_wants_to_overwrite(write_file_continuation)

In this (very simplistic) example, the programmer saves the operation of actually writing the file into a continuation (which can potentially be a very complex operation with many details to write out), and then passes that continuation (i.e, as a first-class closure) to another operator which does some more processing, and then calls it if necessary. (I use this design pattern a lot in actual GUI programming, either because it saves me lines of code or, more importantly, to manage control flow after GUI events trigger.)

The rest of this post will, without loss of generality, conceptualize continuations as CPS, because it is a hell of a lot easier to understand and read.


Now let’s talk about generators in Python. Generators are a specific subtype of continuation. Whereas continuations are able in general to save the state of a computation (i.e., the program’s call stack), generators are only able to save the state of iteration over an iterator. Although, this definition is slightly misleading for certain use cases of generators. For instance:

def f():
  while True:
    yield 4

This is clearly a reasonable iterable whose behavior is well defined — each time the generator iterates over it, it returns 4 (and does so forever). But it isn’t probably the prototypical type of iterable that comes to mind when thinking of iterators (i.e., for x in collection: do_something(x)). This example illustrates the power of generators: if anything is an iterator, a generator can save the state of its iteration.

To reiterate: continuations can save the state of a program’s stack, and generators can save the state of iteration. This means that continuations are a lot more powerful than generators, but also that generators are a lot, lot easier. They are easier for the language designer to implement, and they are easier for the programmer to use (if you have some time to burn, try to read and understand this page about continuations and call/cc).

But you could easily implement (and conceptualize) generators as a simple, specific case of continuation passing style:

Whenever yield is called, it tells the function to return a continuation. When the function is called again, it starts from wherever it left off. So, in pseudo-pseudocode (i.e., not pseudocode, but not code) the generator’s next method is basically as follows:

class Generator():
  def __init__(self,iterable,generatorfun):
    self.next_continuation = lambda:generatorfun(iterable)

  def next(self):
    value, next_continuation = self.next_continuation()
    self.next_continuation = next_continuation
    return value

where the yield keyword is actually syntactic sugar for the real generator function, basically something like:

def generatorfun(iterable):
  if len(iterable) == 0:
    raise StopIteration
  else:
    return (iterable[0], lambda:generatorfun(iterable[1:]))

Remember that this is just pseudocode and the actual implementation of generators in Python is more complex. But as an exercise to understand what is going on, try to use continuation passing style to implement generator objects without use of the yield keyword.
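One hedged take on that exercise: a generator-like object over a list, built purely from continuations (closures), with no yield anywhere:

class ListGen:
    """Iterates a list in continuation-passing style."""
    def __init__(self, items):
        def make(rest):
            # A continuation: a thunk that either raises StopIteration
            # or returns (value, continuation-for-the-rest).
            def cont():
                if not rest:
                    raise StopIteration
                return rest[0], make(rest[1:])
            return cont
        self._cont = make(list(items))

    def __iter__(self):
        return self

    def __next__(self):
        value, self._cont = self._cont()
        return value

print(list(ListGen([1, 2, 3])))  # [1, 2, 3]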


回答 12

这是简单语言的示例。我将提供高级人类概念与低级Python概念之间的对应关系。

我想对数字序列进行运算,但是我不想为创建该序列而烦恼自己,我只想着重于自己想做的运算。因此,我执行以下操作:

  • 我打电话给你,告诉你我想要一个按特定方式产生的数字序列,并告诉你算法是什么。
    此步骤对应于定义生成器函数,即包含 yield 的函数。
  • 稍后,我告诉你:“好,准备把这个数字序列告诉我吧”。
    此步骤对应于调用生成器函数,它会返回一个生成器对象。注意,这时你还没有告诉我任何数字;你只是拿起了纸和铅笔。
  • 我问你:“告诉我下一个数字”,你便告诉我第一个数字;然后你等我再向你要下一个数字。记住进行到哪里、已经说过哪些数字、下一个数字是什么,都是你的工作,我不关心细节。
    此步骤对应于在生成器对象上调用 .next()(Python 3 中为 next())。
  • …重复上一步,直到…
  • 最终,你可能走到了尽头。你不再告诉我数字,而是喊道:“且慢!我说完了!没有数字了!”
    此步骤对应于生成器对象结束工作并引发 StopIteration 异常。生成器函数不需要自己引发这个异常;当函数结束或执行 return 时,它会被自动引发。

这就是生成器(包含 yield 的函数)所做的事:它开始执行,每到 yield 处暂停,被索要下一个值(.next())时再从上次停下的地方继续。它在设计上与 Python 的迭代器协议完美契合,该协议描述了如何按顺序请求值。

迭代器协议最著名的使用者是 Python 中的 for 语句。因此,每当您这样写:

for item in sequence:

不管sequence是列表,字符串,字典还是如上所述的生成器对象,都没有关系;结果是相同的:您从一个序列中逐个读取项目。

注意,定义一个包含 yield 关键字的函数并不是创建生成器的唯一方法,只是最简单的方法。

有关更准确的信息,请阅读Python文档中有关迭代器类型yield语句生成器的信息。

Here is an example in plain language. I will provide a correspondence between high-level human concepts to low-level Python concepts.

I want to operate on a sequence of numbers, but I don’t want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:

  • I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.
    This step corresponds to defining the generator function, i.e. the function containing a yield.
  • Sometime later, I tell you, “OK, get ready to tell me the sequence of numbers”.
    This step corresponds to calling the generator function which returns a generator object. Note that you don’t tell me any numbers yet; you just grab your paper and pencil.
  • I ask you, “tell me the next number”, and you tell me the first number; after that, you wait for me to ask you for the next number. It’s your job to remember where you were, what numbers you have already said, and what is the next number. I don’t care about the details.
    This step corresponds to calling .next() on the generator object.
  • … repeat previous step, until…
  • eventually, you might come to an end. You don’t tell me a number; you just shout, “hold your horses! I’m done! No more numbers!”
    This step corresponds to the generator object ending its job, and raising a StopIteration exception The generator function does not need to raise the exception. It’s raised automatically when the function ends or issues a return.

This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of Python, which describes how to sequentially request values.

The most famous user of the iterator protocol is the for command in Python. So, whenever you do a:

for item in sequence:

it doesn’t matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.

Note that defining a function which contains a yield keyword is not the only way to create a generator; it’s just the easiest way to create one.

For more accurate information, read about iterator types, the yield statement and generators in the Python documentation.
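For instance, one example of that “other way”, in case it helps: a generator expression also produces a generator object, with no def and no yield:

squares = (x * x for x in range(4))  # generator expression
print(next(squares))                 # 0
print(list(squares))                 # [1, 4, 9] -- only the values not yet consumed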


回答 13

尽管许多答案都说明了为什么要用 yield 来创建生成器,yield 其实还有更多用途。用它可以很容易地创建协程,从而在两段代码之间传递信息。关于用 yield 创建生成器的优秀示例,这里就不再重复了。

为了帮助理解下面代码中 yield 的作用,您可以用手指沿着任何带 yield 的代码走一遍循环。每当手指碰到 yield,就必须等待外部调用 next 或 send。调用 next 时,您沿代码一直走到 yield——yield 右侧的表达式被求值并返回给调用方……然后等待。再次调用 next 时,代码又走一圈。不过您会注意到,在协程中 yield 还可以与 send 搭配使用——send 会把一个值从调用方送进正在 yield 的函数。如果调用了 send,yield 就接收送入的值,并把它“吐”到自己的左侧……然后继续沿代码前进,直到再次碰到 yield(像调用 next 一样,在那里返回值)。

例如:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> next(sequence)  # Python 2 中为 sequence.next()
0
>>> next(sequence)
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()

While a lot of answers show why you’d use a yield to create a generator, there are more uses for yield. It’s quite easy to make a coroutine, which enables the passing of information between two blocks of code. I won’t repeat any of the fine examples that have already been given about using yield to create a generator.

To help understand what a yield does in the following code, you can use your finger to trace the cycle through any code that has a yield. Every time your finger hits the yield, you have to wait for a next or a send to be entered. When a next is called, you trace through the code until you hit the yield… the code on the right of the yield is evaluated and returned to the caller… then you wait. When next is called again, you perform another loop through the code. However, you’ll note that in a coroutine, yield can also be used with a send… which will send a value from the caller into the yielding function. If a send is given, then yield receives the value sent, and spits it out the left hand side… then the trace through the code progresses until you hit the yield again (returning the value at the end, as if next was called).

For example:

>>> def coroutine():
...     i = -1
...     while True:
...         i += 1
...         val = (yield i)
...         print("Received %s" % val)
...
>>> sequence = coroutine()
>>> next(sequence)  # sequence.next() in Python 2
0
>>> next(sequence)
Received None
1
>>> sequence.send('hello')
Received hello
2
>>> sequence.close()

回答 14

还有另一个yield用途和含义(自Python 3.3起):

yield from <expr>

PEP 380-委托给子生成器的语法

提出了一种语法,供生成器将其部分操作委托给另一生成器。这允许包含“ yield”的一段代码被分解出来并放置在另一个生成器中。此外,允许子生成器返回一个值,并且该值可用于委派生成器。

当一个生成器重新产生由另一个生成器生成的值时,新语法还为优化提供了一些机会。

此外,(自 Python 3.5 起)还引入了:

async def new_coroutine(data):
   ...
   await blocking_action()

为了避免将协程与常规生成器混淆(今天yield在两者中都使用)。

There is another yield use and meaning (since Python 3.3):

yield from <expr>

From PEP 380 — Syntax for Delegating to a Subgenerator:

A syntax is proposed for a generator to delegate part of its operations to another generator. This allows a section of code containing ‘yield’ to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.

The new syntax also opens up some opportunities for optimisation when one generator re-yields values produced by another.

Moreover this will introduce (since Python 3.5):

async def new_coroutine(data):
   ...
   await blocking_action()

to avoid coroutines being confused with a regular generator (today yield is used in both).
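A minimal sketch of the delegation the PEP describes, including a subgenerator returning a value to the delegating generator (names are illustrative; requires Python 3.3+):

def subgen():
    yield 1
    yield 2
    return "sub done"       # this value travels back to the delegating generator

def delegator():
    result = yield from subgen()  # re-yields 1 and 2, then captures "sub done"
    yield result

print(list(delegator()))  # [1, 2, 'sub done']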


回答 15

所有好的答案,但是对于新手来说有点困难。

我认为您已经了解了该return声明。

作为比喻,return 和 yield 是一对双胞胎:return 表示“返回并停止”,而 yield 表示“返回,但继续”。

  1. 先试着用 return 获取 num_list:
def num_list(n):
    for i in range(n):
        return i

运行:

In [5]: num_list(3)
Out[5]: 0

看,您只得到一个数字,而不是一个列表。return 只执行一次就退出了,永远不会让您如愿拿到全部。

  2. 接下来轮到 yield

将 return 替换为 yield:

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

现在,您就得到了所有数字。

与只运行一次就停止的 return 相比,yield 会按您的计划运行多次。可以把 return 理解为 return one of them,把 yield 理解为 return all of them。这就是所谓的可迭代(iterable)。

  3. 更进一步,我们还可以用 return 改写上面的 yield 语句:
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

这就是 yield 的核心。

列表式 return 输出与对象式 yield 输出的区别在于:

从列表对象您总能得到 [0, 1, 2],但从“yield 输出的对象”那里只能取出一次。因此它有了一个新名字:generator 对象,正如 Out[11]: <generator object num_list at 0x10327c990> 所显示的。

总之,用比喻来领会:

  • return 和 yield 是双胞胎
  • list 和 generator 是双胞胎

All great answers, however a bit difficult for newbies.

I assume you have learned the return statement.

As an analogy, return and yield are twins: return means ‘return and stop’ whereas yield means ‘return, but continue’.

  1. Try to get a num_list with return.
def num_list(n):
    for i in range(n):
        return i

Run it:

In [5]: num_list(3)
Out[5]: 0

See, you get only a single number rather than a list of them. return runs only once and then quits; it never lets you have the whole sequence.

  2. Enter yield

Replace return with yield:

In [10]: def num_list(n):
    ...:     for i in range(n):
    ...:         yield i
    ...:

In [11]: num_list(3)
Out[11]: <generator object num_list at 0x10327c990>

In [12]: list(num_list(3))
Out[12]: [0, 1, 2]

Now you get all the numbers.

Compared to return, which runs once and stops, yield runs as many times as you planned. You can interpret return as return one of them, and yield as return all of them. This is what makes the result iterable.

  3. One more step: we can rewrite the yield version with return:
In [15]: def num_list(n):
    ...:     result = []
    ...:     for i in range(n):
    ...:         result.append(i)
    ...:     return result

In [16]: num_list(3)
Out[16]: [0, 1, 2]

That’s the core of yield.

The difference between the list that return outputs and the object that yield outputs is:

You will always get [0, 1, 2] from the list object, but you can only retrieve the values from ‘the object yield outputs’ once. That’s why it has a new name, generator object, as displayed in Out[11]: <generator object num_list at 0x10327c990>.

In conclusion, as a metaphor to grok it:

  • return and yield are twins
  • list and generator are twins

回答 16

以下是一些Python示例,这些示例说明如何实际实现生成器,就像Python没有为其提供语法糖一样:

作为Python生成器:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

使用词法闭包而不是生成器

def ftake(fnext, last):
    return [fnext() for _ in range(last)]  # range 同时适用于 Python 2/3(原文为 xrange)

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

使用对象闭包而不是生成器(因为ClosuresAndObjectsAreEquivalent

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)

Here are some Python examples of how to actually implement generators as if Python did not provide syntactic sugar for them:

As a Python generator:

from itertools import islice

def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))

Using lexical closures instead of generators

def ftake(fnext, last):
    return [fnext() for _ in range(last)]  # range works on both Python 2 and 3 (was xrange)

def fib_gen2():
    #funky scope due to python2.x workaround
    #for python 3.x use nonlocal
    def _():
        _.a, _.b = _.b, _.a + _.b
        return _.a
    _.a, _.b = 0, 1
    return _

assert [1,1,2,3,5] == ftake(fib_gen2(), 5)

Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent)

class fib_gen3:
    def __init__(self):
        self.a, self.b = 1, 1

    def __call__(self):
        r = self.a
        self.a, self.b = self.b, self.a + self.b
        return r

assert [1,1,2,3,5] == ftake(fib_gen3(), 5)

回答 17

我打算发布“阅读Beazley的“ Python:基本参考”的第19页,以快速了解生成器”,但是已经有许多其他人发布了不错的描述。

另外请注意,yield 在协程中的用法与它在生成器函数中的用法互为对偶。虽然与您代码片段中的用法不同,(yield) 可以在函数中用作表达式。当调用方用 send() 方法向函数发送一个值时,协程就会继续执行,直到遇到下一条 (yield) 语句。

生成器和协程是设置数据流类型应用程序的一种很酷的方法。我认为有必要了解该yield语句在函数中的其他用法。

I was going to post “read page 19 of Beazley’s ‘Python: Essential Reference’ for a quick description of generators”, but so many others have posted good descriptions already.

Also, note that yield can be used in coroutines as the dual of their use in generator functions. Although it isn’t the same use as your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the method using the send() method, then the coroutine will execute until the next (yield) statement is encountered.

Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.


回答 18

从编程的角度来看,迭代器被实现为thunk

为了把迭代器、生成器以及用于并发执行的线程池等实现为 thunk(也称匿名函数),可以把消息发送给一个带有调度器的闭包对象,由调度器来响应这些“消息”。

http://en.wikipedia.org/wiki/Message_passing

“next”是发送给闭包的一条消息,而这个闭包是由“iter”调用创建的。

有很多方法可以实现此计算。我使用了变异,但是通过返回当前值和下一个生成器,很容易做到无变异。

这是一个使用R6RS结构的演示,但是其语义与Python完全相同。它是相同的计算模型,只需要更改语法就可以用Python重写它。

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->

From a programming viewpoint, the iterators are implemented as thunks.

To implement iterators, generators, and thread pools for concurrent execution, etc. as thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to “messages”.

http://en.wikipedia.org/wiki/Message_passing

“next” is a message sent to a closure, created by the “iter” call.

There are lots of ways to implement this computation. I used mutation, but it is easy to do it without mutation, by returning the current value and the next yielder.

Here is a demonstration which uses the structure of R6RS, but the semantics is absolutely identical to Python’s. It’s the same model of computation, and only a change in syntax is required to rewrite it in Python.

Welcome to Racket v6.5.0.3.

-> (define gen
     (lambda (l)
       (define yield
         (lambda ()
           (if (null? l)
               'END
               (let ((v (car l)))
                 (set! l (cdr l))
                 v))))
       (lambda(m)
         (case m
           ('yield (yield))
           ('init  (lambda (data)
                     (set! l data)
                     'OK))))))
-> (define stream (gen '(1 2 3)))
-> (stream 'yield)
1
-> (stream 'yield)
2
-> (stream 'yield)
3
-> (stream 'yield)
'END
-> ((stream 'init) '(a b))
'OK
-> (stream 'yield)
'a
-> (stream 'yield)
'b
-> (stream 'yield)
'END
-> (stream 'yield)
'END
->
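Taking the answer at its word that only the syntax needs to change, here is one hedged Python rendition of the same closure-plus-dispatcher model:

def gen(l):
    state = {"l": list(l)}            # mutable state captured by the closures

    def do_yield():
        if not state["l"]:
            return "END"
        v = state["l"][0]
        state["l"] = state["l"][1:]   # mutation, exactly as in the Racket version
        return v

    def dispatch(m):
        if m == "yield":
            return do_yield()
        if m == "init":
            def init(data):
                state["l"] = list(data)
                return "OK"
            return init

    return dispatch

stream = gen([1, 2, 3])
print(stream("yield"), stream("yield"), stream("yield"), stream("yield"))  # 1 2 3 END
print(stream("init")(["a", "b"]))                                          # OK
print(stream("yield"), stream("yield"), stream("yield"))                   # a b END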

回答 19

这是一个简单的示例:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "wiriting result {}".format(n)

输出:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
writing result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
writing result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
writing result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

我不是 Python 开发者,但在我看来,yield 保存了程序流程中的位置,下一次循环就从这个“yield”位置继续。它似乎就在那个位置等着;在此之前,它先向外部返回一个值,下次再接着工作。

这似乎是一种有趣而又不错的能力:D

Here is a simple example:

def isPrimeNumber(n):
    print "isPrimeNumber({}) call".format(n)
    if n==1:
        return False
    for x in range(2,n):
        if n % x == 0:
            return False
    return True

def primes (n=1):
    while(True):
        print "loop step ---------------- {}".format(n)
        if isPrimeNumber(n): yield n
        n += 1

for n in primes():
    if n> 10:break
    print "wiriting result {}".format(n)

Output:

loop step ---------------- 1
isPrimeNumber(1) call
loop step ---------------- 2
isPrimeNumber(2) call
loop step ---------------- 3
isPrimeNumber(3) call
writing result 3
loop step ---------------- 4
isPrimeNumber(4) call
loop step ---------------- 5
isPrimeNumber(5) call
writing result 5
loop step ---------------- 6
isPrimeNumber(6) call
loop step ---------------- 7
isPrimeNumber(7) call
writing result 7
loop step ---------------- 8
isPrimeNumber(8) call
loop step ---------------- 9
isPrimeNumber(9) call
loop step ---------------- 10
isPrimeNumber(10) call
loop step ---------------- 11
isPrimeNumber(11) call

I am not a Python developer, but it looks to me like yield holds the position in the program flow, and the next loop starts from the “yield” position. It seems like it is waiting at that position, and just before that, it returns a value to the outside, and next time it continues to work.

It seems to be an interesting and nice ability :D


回答 20

下面是 yield 所做之事的一幅心智图景。

我喜欢将线程视为具有堆栈(即使未以这种方式实现)。

调用普通函数时,它将其局部变量放在堆栈上,进行一些计算,然后清除堆栈并返回。再也看不到其局部变量的值。

对于包含 yield 的函数,当它的代码开始运行时(也就是说,调用该函数返回一个生成器对象之后,再在其上调用 next() 方法),它同样把局部变量放到栈上并计算一会儿。但当它遇到 yield 语句时,在清理自己那部分栈并返回之前,它会给局部变量拍一张快照,存进生成器对象里,同时记下自己当前在代码中的位置(即那条特定的 yield 语句)。

所以,生成器挂着的是一种“冻结”了的函数。

随后调用 next() 时,它把函数的“家当”取回到栈上,让它复活。函数从上次停下的地方继续计算,浑然不觉自己刚刚在冷藏库里度过了一个永恒。

比较以下示例:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

当我们调用第二个函数时,它的行为与第一个函数非常不同。该yield语句可能无法到达,但是如果它存在于任何地方,它将改变我们正在处理的内容的性质。

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

调用 yielderFunction() 并不会运行它的代码,而是用这段代码生成一个生成器。(为了可读性,用 yielder 这样的前缀给这类函数命名也许是个好主意。)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

gi_codegi_frame字段是冻结状态的存储位置。用探索它们dir(..),我们可以确认我们上面的心理模型是可信的。

Here is a mental image of what yield does.

I like to think of a thread as having a stack (even when it’s not implemented that way).

When a normal function is called, it puts its local variables on the stack, does some computation, then clears the stack and returns. The values of its local variables are never seen again.

With a yield function, when its code begins to run (i.e. after the function is called, returning a generator object, whose next() method is then invoked), it similarly puts its local variables onto the stack and computes for a while. But then, when it hits the yield statement, before clearing its part of the stack and returning, it takes a snapshot of its local variables and stores them in the generator object. It also writes down the place where it’s currently up to in its code (i.e. the particular yield statement).

So it’s a kind of a frozen function that the generator is hanging onto.

When next() is called subsequently, it retrieves the function’s belongings onto the stack and re-animates it. The function continues to compute from where it left off, oblivious to the fact that it had just spent an eternity in cold storage.

Compare the following examples:

def normalFunction():
    return
    if False:
        pass

def yielderFunction():
    return
    if False:
        yield 12

When we call the second function, it behaves very differently to the first. The yield statement might be unreachable, but if it’s present anywhere, it changes the nature of what we’re dealing with.

>>> yielderFunction()
<generator object yielderFunction at 0x07742D28>

Calling yielderFunction() doesn’t run its code, but makes a generator out of the code. (Maybe it’s a good idea to name such things with the yielder prefix for readability.)

>>> gen = yielderFunction()
>>> dir(gen)
['__class__',
 ...
 '__iter__',    #Returns gen itself, to make it work uniformly with containers
 ...            #when given to a for loop. (Containers return an iterator instead.)
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',        #The method that runs the function's body.
 'send',
 'throw']

The gi_code and gi_frame fields are where the frozen state is stored. Exploring them with dir(..), we can confirm that our mental model above is credible.


回答 21

正如各个答案所说,yield 用于创建序列生成器,按需动态地产生一个序列。例如,在网络上逐行读取文件时,可以像下面这样使用 yield:

def getNextLines():
   while con.isOpen():
       yield con.read()

您可以在代码中使用它,如下所示:

for line in getNextLines():
    doSomeThing(line)

执行控制权转移的陷阱

当 yield 执行时,执行控制权会从 getNextLines() 转移到 for 循环。因此,此后每次调用 getNextLines(),都会从上次暂停的位置继续执行。

因此,简而言之,具有以下代码的函数

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

将打印

"first time"
"second time"
"third time"
"Now some useful value 12"

Like every answer suggests, yield is used for creating a sequence generator. It’s used for generating some sequence dynamically. For example, while reading a file line by line on a network, you can use the yield function as follows:

def getNextLines():
   while con.isOpen():
       yield con.read()

You can use it in your code as follows:

for line in getNextLines():
    doSomeThing(line)

Execution Control Transfer gotcha

The execution control will be transferred from getNextLines() to the for loop when yield is executed. Thus, every time getNextLines() is invoked, execution begins from the point where it was paused last time.

Thus in short, a function with the following code

def simpleYield():
    yield "first time"
    yield "second time"
    yield "third time"
    yield "Now some useful value {}".format(12)

for i in simpleYield():
    print i

will print

"first time"
"second time"
"third time"
"Now some useful value 12"

回答 22

一个帮助理解 yield 是什么的简单例子:

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

输出为(每行一个值):

1 2 1 2 1 2 1 2

An easy example to understand what it is: yield

def f123():
    for _ in range(4):
        yield 1
        yield 2


for i in f123():
    print (i)

The output (printed one value per line) is:

1 2 1 2 1 2 1 2

回答 23

(我下面的回答仅从使用Python生成器的角度讲,而不是生成器机制基础实现,它涉及堆栈和堆操作的一些技巧。)

当 Python 函数中用 yield 代替 return 时,该函数就变成了一种特殊的函数,叫作生成器函数(generator function),它返回一个 generator 类型的对象。yield 关键字是一个标志,告诉 Python 编译器要对这个函数特殊处理。普通函数在返回某个值后就终止了;而在编译器的帮助下,生成器函数可以被看作是可恢复的:执行上下文会被还原,并从上次运行处继续执行,直到您显式执行 return(这会引发 StopIteration 异常,它也是迭代器协议的一部分)或到达函数末尾。关于 generator 的资料我看过很多,但从函数式编程视角出发的这一篇最容易消化。

(接下来我想根据自己的理解,谈谈 generator 背后的原理以及 iterator 的基础。希望这能帮助您把握迭代器和生成器的基本动机。这类概念在其他语言中也有,比如 C#。)

据我理解,当我们要处理一批数据时,通常会先把数据存在某处,再逐个处理。但这种朴素做法有问题:如果数据量巨大,事先整体存储的代价很高。所以,与其直接存储数据(data)本身,为什么不间接地存储某种元数据(metadata),即“数据是如何计算出来的”这一逻辑呢?

有两种包装此类元数据的方法。

  1. 面向对象的方法:把元数据包装成一个类。这就是所谓的迭代器(iterator),它实现迭代器协议(即 __next__() 和 __iter__() 方法)。这也是常见的迭代器设计模式。
  2. 函数式的方法:把元数据包装成一个函数。这就是所谓的生成器函数。但在底层,返回的 generator 对象仍然“是一个”迭代器,因为它同样实现了迭代器协议。

无论哪种方式,都会创建一个迭代器,即某个可以为您提供所需数据的对象。OO方法可能有点复杂。无论如何,要使用哪一个取决于您。

(My below answer only speaks from the perspective of using Python generator, not the underlying implementation of generator mechanism, which involves some tricks of stack and heap manipulation.)

When yield is used instead of a return in a Python function, that function is turned into something special called a generator function. That function will return an object of generator type. The yield keyword is a flag to notify the Python compiler to treat such a function specially. Normal functions will terminate once some value is returned from them. But with the help of the compiler, the generator function can be thought of as resumable: the execution context will be restored, and execution will continue from the last run, until you explicitly call return, which raises a StopIteration exception (also part of the iterator protocol), or reach the end of the function. I found a lot of references about generators, but this one from the functional programming perspective is the most digestible.

(Now I want to talk about the rationale behind generator, and the iterator based on my own understanding. I hope this can help you grasp the essential motivation of iterator and generator. Such concept shows up in other languages as well such as C#.)

As I understand, when we want to process a bunch of data, we usually first store the data somewhere and then process it one by one. But this naive approach is problematic. If the data volume is huge, it’s expensive to store them as a whole beforehand. So instead of storing the data itself directly, why not store some kind of metadata indirectly, i.e. the logic how the data is computed.

There are 2 approaches to wrap such metadata.

  1. The OO approach, we wrap the metadata as a class. This is the so-called iterator who implements the iterator protocol (i.e. the __next__(), and __iter__() methods). This is also the commonly seen iterator design pattern.
  2. The functional approach, we wrap the metadata as a function. This is the so-called generator function. But under the hood, the returned generator object still IS-A iterator because it also implements the iterator protocol.

Either way, an iterator is created, i.e. some object that can give you the data you want. The OO approach may be a bit complex. Anyway, which one to use is up to you.
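A side-by-side sketch of the two ways of wrapping that “metadata” (the countdown logic is chosen purely for illustration):

# 1. The OO approach: a class implementing the iterator protocol.
class Countdown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

# 2. The functional approach: a generator function.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]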


回答 24

总之,yield 语句把您的函数变成一个工厂,它产生一个叫作生成器(generator)的特殊对象,包裹住原函数的函数体。当生成器被迭代时,它执行您的函数,直到遇到下一个 yield,随后挂起执行,并把传给 yield 的值作为本次求值的结果。每次迭代都重复这一过程,直到执行路径退出函数为止。例如,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

简单地输出

one
two
three

它的威力在于把生成器和计算序列的循环结合起来:生成器每执行一轮循环就停下来,“产出”本轮计算的下一个结果。这样列表是即时算出来的,好处是节省内存,对特别庞大的计算尤其如此。

假设您想创建自己的range函数来产生可迭代的数字范围,则可以这样做,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

像这样使用

for i in myRangeNaive(10):
    print i

但这是低效的,因为

  • 您创建只使用一次的数组(这会浪费内存)
  • 这段代码实际上在该数组上循环了两次!:(

幸运的是,Guido和他的团队足够慷慨地开发生成器,因此我们可以做到这一点。

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

现在,每次迭代时,生成器上名为 next() 的函数都会执行函数体,直到遇到 yield 语句(在那里停下并“产出”值)或到达函数末尾。在这个例子里,第一次调用时,next() 执行到 yield 语句并产出 n;下一次调用时,它执行递增语句,跳回 while,对条件求值,如果为真就再次停下并产出 n;它会一直这样继续,直到 while 条件为假,生成器执行到函数末尾为止。

In summary, the yield statement transforms your function into a factory that produces a special object called a generator which wraps around the body of your original function. When the generator is iterated, it executes your function until it reaches the next yield then suspends execution and evaluates to the value passed to yield. It repeats this process on each iteration until the path of execution exits the function. For instance,

def simple_generator():
    yield 'one'
    yield 'two'
    yield 'three'

for i in simple_generator():
    print i

simply outputs

one
two
three

The power comes from using the generator with a loop that calculates a sequence: the generator executes the loop, stopping each time to ‘yield’ the next result of the calculation. In this way it calculates a list on the fly, the benefit being the memory saved for especially large calculations.

Say you wanted to create your own range function that produces an iterable range of numbers. You could do it like so,

def myRangeNaive(i):
    n = 0
    range = []
    while n < i:
        range.append(n)
        n = n + 1
    return range

and use it like this;

for i in myRangeNaive(10):
    print i

But this is inefficient because

  • You create an array that you only use once (this wastes memory)
  • This code actually loops over that array twice! :(

Luckily Guido and his team were generous enough to develop generators so we could just do this;

def myRangeSmart(i):
    n = 0
    while n < i:
       yield n
       n = n + 1
    return

for i in myRangeSmart(10):
    print i

Now upon each iteration, a function on the generator called next() executes the function body until it either reaches a ‘yield’ statement, where it stops and ‘yields’ the value, or reaches the end of the function. In this case, on the first call, next() executes up to the yield statement and yields ‘n’. On the next call it executes the increment statement, jumps back to the ‘while’, evaluates it, and if true, stops and yields ‘n’ again. It continues that way until the while condition returns false and the generator jumps to the end of the function.


回答 25

yield 给你的是一个对象

函数中的 return 只会返回单个值。

如果您希望函数返回一大组值,请使用 yield。

更重要的是,yield 是一道屏障。

就像 CUDA 语言中的 barrier 一样,在它完成之前不会转移控制权。

也就是说,它会从头运行函数中的代码,直到遇到 yield,然后返回循环的第一个值。

然后,其他所有调用将再次运行您在函数中编写的循环,返回下一个值,直到没有任何值可返回为止。

yield gives you an object

A return in a function will return a single value.

If you want a function to return a huge set of values, use yield.

More importantly, yield is a barrier.

Like a barrier in the CUDA language, it will not transfer control until it gets completed.

That is, it will run the code in your function from the beginning until it hits yield. Then, it’ll return the first value of the loop.

Then, every other call will run the loop you have written in the function one more time, returning the next value until there isn’t any value to return.


回答 26

许多人使用return而不是yield,但是在某些情况下yield可以更高效,更轻松地工作。

这是yield绝对适合的示例:

返回(函数中)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

Yield(以功能计)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

通话功能

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

这两个函数执行相同的操作,但是yield使用三行而不是五行,并且少担心一个变量。

这是代码的结果:

输出

如您所见,两个函数做的事情相同;唯一的区别是 return_dates() 给出一个列表,而 yield_dates() 给出一个生成器。

现实生活中的例子可能是像逐行读取文件,或者只是想生成一个生成器。

Many people use return rather than yield, but in some cases yield can be more efficient and easier to work with.

Here is an example which yield is definitely best for:

return (in function)

import random

def return_dates():
    dates = [] # With 'return' you need to create a list then return it
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        dates.append(date)
    return dates

yield (in function)

def yield_dates():
    for i in range(5):
        date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
        yield date # 'yield' makes a generator automatically which works
                   # in a similar way. This is much more efficient.

Calling functions

dates_list = return_dates()
print(dates_list)
for i in dates_list:
    print(i)

dates_generator = yield_dates()
print(dates_generator)
for i in dates_generator:
    print(i)

Both functions do the same thing, but yield uses three lines instead of five and has one less variable to worry about.

This is the result from the code:

Output

As you can see both functions do the same thing. The only difference is return_dates() gives a list and yield_dates() gives a generator.

A real life example would be something like reading a file line by line or if you just want to make a generator.


回答 27

yield 就像函数里的 return 一样,不同之处在于:yield 会把函数变成一个生成器。生成器在“产出”某个值之前,行为和普通函数一样;产出之后,它会停住,直到下一次被调用,再从停下的那个点继续执行。通过调用 list(generator()),您可以一次性得到所有被“产出”的值组成的序列。

yield is like a return element for a function. The difference is that the yield element turns the function into a generator. A generator behaves just like a function until something is ‘yielded’; it then pauses until it is called again, and resumes from exactly the point where it stopped. You can get a sequence of all the ‘yielded’ values in one go by calling list(generator()).


回答 28

yield 关键字让函数逐个交出一系列结果。可以把 yield 想成一个会累积的 return:每次请求都再返回下一个值,而不是就此结束函数。

The yield keyword lets a function hand back a whole series of results, one per request. Think of yield as an accumulating return: each call produces the next value instead of ending the function.


回答 29

下面是一个基于 yield 计算斐波那契数列的简单方法,并附有解释:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

当您将其输入到REPL中并尝试调用它时,您将得到一个神秘的结果:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

这是因为 yield 的存在向 Python 发出了信号:您想创建一个生成器,即一个按需产生值的对象。

那么,如何让它产生这些值呢?可以直接使用内置函数 next,也可以把它交给某个会消耗值的结构,从而间接完成。

使用内置的 next() 函数,实际上就是直接调用 .next/__next__,强制生成器产出一个值:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

间接的方式是:把 fib 交给 for 循环、list 初始化器、tuple 初始化器,或任何期望一个能生成/产出值的对象的东西,这会不断“消耗”生成器,直到它再也产不出值(并返回)为止:

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

同样,使用tuple初始化程序:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

生成器与函数的不同之处在于它是惰性的。它通过保持自己的局部状态、允许您随时恢复执行来做到这一点。

首次调用fib时:

f = fib()

Python编译函数,遇到yield关键字,然后简单地将生成器对象返回给您。看起来不是很有帮助。

然后,当您(直接或间接地)请求它产生第一个值时,它会执行所遇到的所有语句,直到碰到一个 yield,随即把您提供给 yield 的值返回出去并暂停。为了更好地演示这一点,我们加几个 print 调用(在 Python 2 上请改用 print "text"):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

现在,输入REPL:

>>> gen = yielder("Hello, yield!")

您现在有了一个生成器对象,等待一个命令来生成一个值。使用next并查看打印出的内容:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

未加引号的几行是打印出来的内容;加引号的结果则是 yield 返回的值。现在再调用一次 next:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

生成器记得自己暂停在 yield value 处,并从那里继续:先打印下一条消息,然后(由于 while 循环)再次执行,寻找下一个可供暂停的 yield 语句。

Here’s a simple yield based approach, to compute the fibonacci series, explained:

def fib(limit=50):
    a, b = 0, 1
    for i in range(limit):
       yield b
       a, b = b, a+b

When you enter this into your REPL and then try and call it, you’ll get a mystifying result:

>>> fib()
<generator object fib at 0x7fa38394e3b8>

This is because the presence of yield signaled to Python that you want to create a generator, that is, an object that generates values on demand.

So, how do you generate these values? This can either be done directly by using the built-in function next, or, indirectly by feeding it to a construct that consumes values.

Using the built-in next() function, you directly invoke .next/__next__, forcing the generator to produce a value:

>>> g = fib()
>>> next(g)
1
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
5

Indirectly, if you provide fib to a for loop, a list initializer, a tuple initializer, or anything else that expects an object that generates/produces values, you’ll “consume” the generator until no more values can be produced by it (and it returns):

results = []
for i in fib(30):       # consumes fib
    results.append(i) 
# can also be accomplished with
results = list(fib(30)) # consumes fib

Similarly, with a tuple initializer:

>>> tuple(fib(5))       # consumes fib
(1, 1, 2, 3, 5)

A generator differs from a function in the sense that it is lazy. It accomplishes this by maintaining its local state and allowing you to resume whenever you need to.

When you first invoke fib by calling it:

f = fib()

Python compiles the function, encounters the yield keyword and simply returns a generator object back at you. Not very helpful it seems.

When you then request that it generate the first value, directly or indirectly, it executes all statements that it finds until it encounters a yield; it then yields back the value you supplied to yield and pauses. For an example that better demonstrates this, let’s use some print calls (replace with print "text" if on Python 2):

def yielder(value):
    """ This is an infinite generator. Only use next on it """ 
    while 1:
        print("I'm going to generate the value for you")
        print("Then I'll pause for a while")
        yield value
        print("Let's go through it again.")

Now, enter in the REPL:

>>> gen = yielder("Hello, yield!")

you have a generator object now waiting for a command to generate a value. Use next and see what gets printed:

>>> next(gen) # runs until it finds a yield
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The unquoted results are what’s printed. The quoted result is what is returned from yield. Call next again now:

>>> next(gen) # continues from yield and runs again
Let's go through it again.
I'm going to generate the value for you
Then I'll pause for a while
'Hello, yield!'

The generator remembers that it was paused at yield value and resumes from there. The next message is printed, and the search for the yield statement to pause at is performed again (due to the while loop).


if __name__ == "__main__" 是什么意思?

问题:if __name__ == "__main__" 是什么意思?

给定以下代码,if __name__ == "__main__": 起什么作用?

# Threading example
import time, thread

def myfunction(string, sleeptime, lock, *args):
    while True:
        lock.acquire()
        time.sleep(sleeptime)
        lock.release()
        time.sleep(sleeptime)

if __name__ == "__main__":
    lock = thread.allocate_lock()
    thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
    thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

Given the following code, what does the if __name__ == "__main__": do?

# Threading example
import time, thread

def myfunction(string, sleeptime, lock, *args):
    while True:
        lock.acquire()
        time.sleep(sleeptime)
        lock.release()
        time.sleep(sleeptime)

if __name__ == "__main__":
    lock = thread.allocate_lock()
    thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
    thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

回答 0

每当Python解释器读取源文件时,它都会做两件事:

  • 它设置了一些特殊变量,例如__name__,然后

  • 它执行文件中找到的所有代码。

让我们看看它是如何工作的,以及它与我们在 Python 脚本中经常看到的 __name__ 检查有什么关系。

代码样例

让我们使用稍微不同的代码示例来探索导入和脚本的工作方式。假设以下文件位于foo.py

# Suppose this is foo.py.

print("before import")
import math

print("before functionA")
def functionA():
    print("Function A")

print("before functionB")
def functionB():
    print("Function B {}".format(math.sqrt(100)))

print("before __name__ guard")
if __name__ == '__main__':
    functionA()
    functionB()
print("after __name__ guard")

特殊变量

当 Python 解释器读取源文件时,它首先定义一些特殊变量。就这里而言,我们关心的是 __name__ 变量。

当您的模块是主程序时

如果您将模块(源文件)作为主程序运行,例如

python foo.py

解释器会把硬编码的字符串 "__main__" 赋给 __name__ 变量,即

# It's as if the interpreter inserts this at the top
# of your module when run as the main program.
__name__ = "__main__" 

当您的模块由另一个导入时

另一方面,假设其他模块是主程序,并且它将导入您的模块。这意味着在主程序中或主程序导入的某些其他模块中有这样的语句:

# Suppose this is in some other main program.
import foo

解释器会搜索您的 foo.py 文件(以及其他一些变体),并且在执行该模块之前,把 import 语句中的名字 "foo" 赋给 __name__ 变量,即

# It's as if the interpreter inserts this at the top
# of your module when it's imported from another module.
__name__ = "foo"

执行模块的代码

设置特殊变量后,解释器一次执行一次语句,执行模块中的所有代码。您可能想要在代码示例侧面打开另一个窗口,以便您可以按照以下说明进行操作。

总是

  1. 它打印字符串"before import"(不带引号)。

  2. 它加载 math 模块,并把它赋给名为 math 的变量。这等效于把 import math 替换为以下内容(注意,__import__ 是 Python 中的低级函数,它接受一个字符串并触发实际的导入):

# Find and load a module given its string name, "math",
# then assign it to a local variable called math.
math = __import__("math")
  3. 它打印字符串 "before functionA"。

  4. 它执行该 def 块,创建一个函数对象,然后把该函数对象赋给名为 functionA 的变量。

  5. 它打印字符串 "before functionB"。

  6. 它执行第二个 def 块,创建另一个函数对象,然后把它赋给名为 functionB 的变量。

  7. 它打印字符串 "before __name__ guard"。

仅当您的模块是主程序时

  8. 如果您的模块是主程序,它会看到 __name__ 确实被设置成了 "__main__",于是调用那两个函数,打印字符串 "Function A" 和 "Function B 10.0"。

仅当您的模块由另一个导入时

  8.(相反)如果您的模块不是主程序,而是被另一个模块导入,那么 __name__ 将是 "foo" 而不是 "__main__",于是会跳过 if 语句的语句体。

总是

  1. "after __name__ guard"在两种情况下都将打印字符串。

摘要

总而言之,这是两种情况下的打印内容:

# What gets printed if foo is the main program
before import
before functionA
before functionB
before __name__ guard
Function A
Function B 10.0
after __name__ guard
# What gets printed if foo is imported as a regular module
before import
before functionA
before functionB
before __name__ guard
after __name__ guard

为什么这样工作?

您自然会想知道为什么有人会想要这个。好吧,有时您想编写一个.py文件,该文件既可以被其他程序和/或模块用作模块,也可以作为主程序本身运行。例子:

  • 您的模块是一个库,但是您希望有一个脚本模式,在其中运行一些单元测试或演示。

  • 您的模块仅用作主程序,但是它具有一些单元测试,并且测试框架通过导入.py文件(如脚本)并运行特殊的测试功能来工作。您不希望它只是因为正在导入模块而尝试运行脚本。

  • 您的模块主要用作主程序,但它也为高级用户提供了程序员友好的API。

除了这些例子之外,还有一点很优雅:在 Python 中运行脚本,不过是设置几个魔法变量并导入该脚本而已。“运行”脚本只是导入脚本模块的副作用。

思考题

  • 问题:我可以有多个__name__检查块吗?答:这样做很奇怪,但是这种语言不会阻止您。

  • 假设以下内容位于 foo2.py 中。如果在命令行上执行 python foo2.py,会发生什么?为什么?

# Suppose this is foo2.py.

def functionA():
    print("a1")
    from foo2 import functionB
    print("a2")
    functionB()
    print("a3")

def functionB():
    print("b")

print("t1")
if __name__ == "__main__":
    print("m1")
    functionA()
    print("m2")
print("t2")
  • 现在,想想如果删除 foo3.py 中的 __name__ 检查会发生什么:
# Suppose this is foo3.py.

def functionA():
    print("a1")
    from foo3 import functionB
    print("a2")
    functionB()
    print("a3")

def functionB():
    print("b")

print("t1")
print("m1")
functionA()
print("m2")
print("t2")
  • 当用作脚本时,它将做什么?当作为模块导入时?
# Suppose this is in foo4.py
__name__ = "__main__"

def bar():
    print("bar")

print("before __name__ guard")
if __name__ == "__main__":
    bar()
print("after __name__ guard")

Whenever the Python interpreter reads a source file, it does two things:

  • it sets a few special variables like __name__, and then

  • it executes all of the code found in the file.

Let’s see how this works and how it relates to your question about the __name__ checks we always see in Python scripts.

Code Sample

Let’s use a slightly different code sample to explore how imports and scripts work. Suppose the following is in a file called foo.py.

# Suppose this is foo.py.

print("before import")
import math

print("before functionA")
def functionA():
    print("Function A")

print("before functionB")
def functionB():
    print("Function B {}".format(math.sqrt(100)))

print("before __name__ guard")
if __name__ == '__main__':
    functionA()
    functionB()
print("after __name__ guard")

Special Variables

When the Python interpreter reads a source file, it first defines a few special variables. In this case, we care about the __name__ variable.

When Your Module Is the Main Program

If you are running your module (the source file) as the main program, e.g.

python foo.py

the interpreter will assign the hard-coded string "__main__" to the __name__ variable, i.e.

# It's as if the interpreter inserts this at the top
# of your module when run as the main program.
__name__ = "__main__" 

When Your Module Is Imported By Another

On the other hand, suppose some other module is the main program and it imports your module. This means there’s a statement like this in the main program, or in some other module the main program imports:

# Suppose this is in some other main program.
import foo

The interpreter will search for your foo.py file (along with searching for a few other variants), and prior to executing that module, it will assign the name "foo" from the import statement to the __name__ variable, i.e.

# It's as if the interpreter inserts this at the top
# of your module when it's imported from another module.
__name__ = "foo"

Executing the Module’s Code

After the special variables are set up, the interpreter executes all the code in the module, one statement at a time. You may want to open another window on the side with the code sample so you can follow along with this explanation.

Always

  1. It prints the string "before import" (without quotes).

  2. It loads the math module and assigns it to a variable called math. This is equivalent to replacing import math with the following (note that __import__ is a low-level function in Python that takes a string and triggers the actual import):

# Find and load a module given its string name, "math",
# then assign it to a local variable called math.
math = __import__("math")
  3. It prints the string "before functionA".

  4. It executes the def block, creating a function object, then assigning that function object to a variable called functionA.

  5. It prints the string "before functionB".

  6. It executes the second def block, creating another function object, then assigning it to a variable called functionB.

  7. It prints the string "before __name__ guard".

Only When Your Module Is the Main Program

  8. If your module is the main program, then it will see that __name__ was indeed set to "__main__" and it calls the two functions, printing the strings "Function A" and "Function B 10.0".

Only When Your Module Is Imported by Another

  9. (instead) If your module is not the main program but was imported by another one, then __name__ will be "foo", not "__main__", and it’ll skip the body of the if statement.

Always

  10. It will print the string "after __name__ guard" in both situations.

Summary

In summary, here’s what’d be printed in the two cases:

# What gets printed if foo is the main program
before import
before functionA
before functionB
before __name__ guard
Function A
Function B 10.0
after __name__ guard
# What gets printed if foo is imported as a regular module
before import
before functionA
before functionB
before __name__ guard
after __name__ guard
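
To reproduce both cases yourself (my suggestion, not part of the original answer), run these from the directory containing foo.py:

$ python foo.py            # the "main program" case
$ python -c "import foo"   # the "imported as a regular module" case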

Why Does It Work This Way?

You might naturally wonder why anybody would want this. Well, sometimes you want to write a .py file that can be both used by other programs and/or modules as a module, and can also be run as the main program itself. Examples:

  • Your module is a library, but you want to have a script mode where it runs some unit tests or a demo.

  • Your module is only used as a main program, but it has some unit tests, and the testing framework works by importing .py files like your script and running special test functions. You don’t want it to try running the script just because it’s importing the module.

  • Your module is mostly used as a main program, but it also provides a programmer-friendly API for advanced users.

Beyond those examples, it’s elegant that running a script in Python is just setting up a few magic variables and importing the script. “Running” the script is a side effect of importing the script’s module.

Food for Thought

  • Question: Can I have multiple __name__ checking blocks? Answer: it’s strange to do so, but the language won’t stop you.

  • Suppose the following is in foo2.py. What happens if you say python foo2.py on the command-line? Why?

# Suppose this is foo2.py.

def functionA():
    print("a1")
    from foo2 import functionB
    print("a2")
    functionB()
    print("a3")

def functionB():
    print("b")

print("t1")
if __name__ == "__main__":
    print("m1")
    functionA()
    print("m2")
print("t2")
  • Now, figure out what will happen if you remove the __name__ check in foo3.py:
# Suppose this is foo3.py.

def functionA():
    print("a1")
    from foo3 import functionB
    print("a2")
    functionB()
    print("a3")

def functionB():
    print("b")

print("t1")
print("m1")
functionA()
print("m2")
print("t2")
  • What will this do when used as a script? When imported as a module?
# Suppose this is in foo4.py
__name__ = "__main__"

def bar():
    print("bar")

print("before __name__ guard")
if __name__ == "__main__":
    bar()
print("after __name__ guard")

回答 1

通过将脚本作为命令传递给Python解释器来运行脚本时,

python myscript.py

缩进级别为0的所有代码都会被执行。已定义的函数和类确实被定义了,但它们内部的代码都不会运行。与其他语言不同,Python没有会自动运行的main()函数——main()函数隐式地就是顶层的所有代码。

在这种情况下,顶级代码是一个if块。__name__是一个内置变量,其值为当前模块的名称。但是,如果模块被直接运行(如上面的myscript.py那样),__name__就会被设置为字符串"__main__"。因此,您可以通过如下测试来判断您的脚本是被直接运行还是被其他模块导入:

if __name__ == "__main__":
    ...

如果将脚本导入另一个模块,则其各种函数和类定义会被导入,其顶层代码也会被执行,但上述if子句的then分支中的代码不会运行,因为条件不满足。作为一个基本示例,请考虑以下两个脚本:

# file one.py
def func():
    print("func() in one.py")

print("top-level in one.py")

if __name__ == "__main__":
    print("one.py is being run directly")
else:
    print("one.py is being imported into another module")
# file two.py
import one

print("top-level in two.py")
one.func()

if __name__ == "__main__":
    print("two.py is being run directly")
else:
    print("two.py is being imported into another module")

现在,如果您将解释器调用为

python one.py

输出将是

top-level in one.py
one.py is being run directly

如果two.py改为运行:

python two.py

你得到

top-level in one.py
one.py is being imported into another module
top-level in two.py
func() in one.py
two.py is being run directly

因此,当模块one加载时,其__name__等于"one"而不是"__main__"

When your script is run by passing it as a command to the Python interpreter,

python myscript.py

all of the code that is at indentation level 0 gets executed. Functions and classes that are defined are, well, defined, but none of their code gets run. Unlike other languages, there’s no main() function that gets run automatically – the main() function is implicitly all the code at the top level.

In this case, the top-level code is an if block. __name__ is a built-in variable which evaluates to the name of the current module. However, if a module is being run directly (as in myscript.py above), then __name__ instead is set to the string "__main__". Thus, you can test whether your script is being run directly or being imported by something else by testing

if __name__ == "__main__":
    ...

If your script is being imported into another module, its various function and class definitions will be imported and its top-level code will be executed, but the code in the then-body of the if clause above won’t get run as the condition is not met. As a basic example, consider the following two scripts:

# file one.py
def func():
    print("func() in one.py")

print("top-level in one.py")

if __name__ == "__main__":
    print("one.py is being run directly")
else:
    print("one.py is being imported into another module")
# file two.py
import one

print("top-level in two.py")
one.func()

if __name__ == "__main__":
    print("two.py is being run directly")
else:
    print("two.py is being imported into another module")

Now, if you invoke the interpreter as

python one.py

The output will be

top-level in one.py
one.py is being run directly

If you run two.py instead:

python two.py

You get

top-level in one.py
one.py is being imported into another module
top-level in two.py
func() in one.py
two.py is being run directly

Thus, when module one gets loaded, its __name__ equals "one" instead of "__main__".
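
You can also verify this interactively; here is a small sketch of mine (not part of the answer above), assuming one.py from the example is in the current directory:

>>> import one            # runs one.py's top-level code once
top-level in one.py
one.py is being imported into another module
>>> one.__name__
'one'
>>> __name__              # the interactive session itself
'__main__'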


回答 2

__name__变量(imho)的最简单解释如下:

创建以下文件。

# a.py
import b

# b.py
print "Hello World from %s!" % __name__

if __name__ == '__main__':
    print "Hello World again from %s!" % __name__

运行它们将为您提供以下输出:

$ python a.py
Hello World from b!

如您所见,导入模块时,Python会把该模块中的globals()['__name__']设置为模块的名称。同样,在导入时,模块中的所有代码都会运行。由于if语句的条件求值为False,这一部分没有被执行。

$ python b.py
Hello World from __main__!
Hello World again from __main__!

如您所见,执行文件时,Python会把该文件中的globals()['__name__']设置为"__main__"。这次,该if语句的条件求值为True,其代码块被执行。

The simplest explanation for the __name__ variable (imho) is the following:

Create the following files.

# a.py
import b

and

# b.py
print "Hello World from %s!" % __name__

if __name__ == '__main__':
    print "Hello World again from %s!" % __name__

Running them will get you this output:

$ python a.py
Hello World from b!

As you can see, when a module is imported, Python sets globals()['__name__'] in this module to the module’s name. Also, upon import all the code in the module is run. As the if condition evaluates to False, this part is not executed.

$ python b.py
Hello World from __main__!
Hello World again from __main__!

As you can see, when a file is executed, Python sets globals()['__name__'] in this file to "__main__". This time, the if condition evaluates to True and its block is run.


回答 3

if __name__ == "__main__": 是做什么的?

概述基础知识:

  • 在作为程序入口点的模块中,全局变量__name__的值为'__main__'。否则,它就是您导入该模块时所用的名称。

  • 因此,仅当该模块是程序的入口点时,if块下的代码才会运行。

  • 它允许模块中的代码可由其他模块导入,而无需在导入时执行下面的代码块。


我们为什么需要这个?

开发和测试您的代码

假设您正在编写旨在用作模块的Python脚本:

def do_important():
    """This function does something very important"""

可以通过在底部添加此函数调用测试模块:

do_important()

并使用以下命令运行它(在命令提示符下):

~$ python important.py

问题

但是,如果要将模块导入到另一个脚本:

import important

在导入时,do_important函数会被调用,因此您可能会把底部的do_important()函数调用注释掉。

# do_important() # I must remember to uncomment to execute this!

然后,您必须记住是否已注释掉测试函数调用。这种额外的复杂性将意味着您可能会忘记,从而使您的开发过程更加麻烦。

更好的方法

__name__变量指向当前Python解释器所在的命名空间。

在导入的模块中,它是该模块的名称。

但是在主模块(或交互式Python会话,即解释器的读取-求值-打印循环,REPL)中,__name__的值是"__main__"。

因此,如果您在执行之前进行检查:

if __name__ == "__main__":
    do_important()

有了以上内容,您的代码将仅在以主模块运行(或从另一个脚本有意调用)时执行。

更好的方法

不过,有一种Python方式可以对此进行改进。

如果我们想从模块外部运行该业务流程怎么办?

如果我们把开发和测试时想要运行的代码放进这样一个函数,然后紧接着对'__main__'进行检查:

def main():
    """business logic for when running this module as the primary one!"""
    setup()
    foo = do_important()
    bar = do_even_more_important(foo)
    for baz in bar:
        do_super_important(baz)
    teardown()

# Here's our payoff idiom!
if __name__ == '__main__':
    main()

现在,我们在模块末尾具有最终功能,如果我们将模块作为主要模块运行,则该功能将运行。

这样既允许在不运行main函数的情况下,把该模块及其函数和类导入到其他脚本中;也允许在从另一个作为'__main__'运行的模块中调用它(及其函数和类),即

import important
important.main()

这个习语也可以在Python文档中的__main__模块说明中找到。该文本指出:

此模块表示解释程序的主程序在其中执行的(否则为匿名)范围-从标准输入,脚本文件或交互式提示中读取的命令。在这种环境中,惯用的“条件脚本”节使脚本运行:

if __name__ == '__main__':
    main()

What does the if __name__ == "__main__": do?

To outline the basics:

  • The global variable, __name__, in the module that is the entry point to your program, is '__main__'. Otherwise, it’s the name you import the module by.

  • So, code under the if block will only run if the module is the entry point to your program.

  • It allows the code in the module to be importable by other modules, without executing the code block beneath on import.


Why do we need this?

Developing and Testing Your Code

Say you’re writing a Python script designed to be used as a module:

def do_important():
    """This function does something very important"""

You could test the module by adding this call of the function to the bottom:

do_important()

and running it (on a command prompt) with something like:

~$ python important.py

The Problem

However, if you want to import the module to another script:

import important

On import, the do_important function would be called, so you’d probably comment out your function call, do_important(), at the bottom.

# do_important() # I must remember to uncomment to execute this!

And then you’ll have to remember whether or not you’ve commented out your test function call. And this extra complexity would mean you’re likely to forget, making your development process more troublesome.

A Better Way

The __name__ variable points to the namespace wherever the Python interpreter happens to be at the moment.

Inside an imported module, it’s the name of that module.

But inside the primary module (or an interactive Python session, i.e. the interpreter’s Read, Eval, Print Loop, or REPL) you are running everything from its "__main__".

So if you check before executing:

if __name__ == "__main__":
    do_important()

With the above, your code will only execute when you’re running it as the primary module (or intentionally call it from another script).

An Even Better Way

There’s a Pythonic way to improve on this, though.

What if we want to run this business process from outside the module?

If we put the code we want to exercise as we develop and test in a function like this and then do our check for '__main__' immediately after:

def main():
    """business logic for when running this module as the primary one!"""
    setup()
    foo = do_important()
    bar = do_even_more_important(foo)
    for baz in bar:
        do_super_important(baz)
    teardown()

# Here's our payoff idiom!
if __name__ == '__main__':
    main()

We now have a final function for the end of our module that will run if we run the module as the primary module.

It will allow the module and its functions and classes to be imported into other scripts without running the main function, and will also allow the module (and its functions and classes) to be called when running from a different '__main__' module, i.e.

import important
important.main()

This idiom can also be found in the Python documentation in an explanation of the __main__ module. That text states:

This module represents the (otherwise anonymous) scope in which the interpreter’s main program executes — commands read either from standard input, from a script file, or from an interactive prompt. It is this environment in which the idiomatic “conditional script” stanza causes a script to run:

if __name__ == '__main__':
    main()
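
A common refinement of this idiom (my addition, not from the answer or the quoted documentation) is to let main() accept the argument list and return an exit status, which makes the module equally usable from the command line and from tests:

import sys

def main(argv=None):
    """Hypothetical entry point; argv defaults to the real command line."""
    argv = sys.argv[1:] if argv is None else argv
    print("got {} argument(s): {}".format(len(argv), argv))
    return 0  # becomes the process exit status below

if __name__ == '__main__':
    sys.exit(main())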

回答 4

if __name__ == "__main__"是使用(例如)命令从(例如)命令行运行脚本时运行的部分python myscript.py

if __name__ == "__main__" is the part that runs when the script is run from (say) the command line using a command like python myscript.py.


回答 5

if __name__ == "__main__": 是做什么的?

__name__是一个存在于所有命名空间中的全局变量(在Python中,"全局"实际上是指模块级别)。它通常是模块的名称(str类型)。

但是有一个唯一的特殊情况:当您用Python进程直接运行某个文件时,例如:

python mycode.py

这个原本匿名的全局命名空间的__name__会被赋值为'__main__'。

因此,包括最后几行

if __name__ == '__main__':
    main()
  • 在mycode.py脚本的末尾,
  • 当它是由Python进程运行的主要入口点模块时,

就会使脚本中唯一定义的main函数运行。

使用这种结构的另一个好处是:您还可以把这段代码作为模块导入另一个脚本,然后在程序需要的时候再运行main函数:

import mycode
# ... any amount of other code
mycode.main()

What does if __name__ == "__main__": do?

__name__ is a global variable (in Python, global actually means on the module level) that exists in all namespaces. It is typically the module’s name (as a str type).

As the only special case, however, in whatever Python process you run, as in mycode.py:

python mycode.py

the otherwise anonymous global namespace is assigned the value of '__main__' to its __name__.

Thus, including the final lines

if __name__ == '__main__':
    main()
  • at the end of your mycode.py script,
  • when it is the primary, entry-point module that is run by a Python process,

will cause your script’s uniquely defined main function to run.

Another benefit of using this construct: you can also import your code as a module in another script and then run the main function if and when your program decides:

import mycode
# ... any amount of other code
mycode.main()

回答 6

这里有许多从"如何"的角度解释这段代码机制的说法,但对我而言,在理解"为什么"之前,这些都没有意义。这应该对新程序员特别有帮助。

以文件"ab.py"为例:

def a():
    print('A function in ab file')
a()

还有第二个文件“ xy.py”:

import ab
def main():
    print('main function: this is where the action is')
def x():
    print ('peripheral task: might be useful in other projects')
x()
if __name__ == "__main__":
    main()

这段代码实际上在做什么?

当你执行xy.py时,其中有import ab。import语句在导入时立即运行该模块,因此ab的顶层操作会先于xy的其余部分执行。处理完ab之后,再继续执行xy。

解释器通过__name__跟踪正在运行的是哪个脚本。当你运行一个脚本时(无论文件名是什么),解释器都把它称为"__main__",使其成为运行完外部脚本后返回的主("home")脚本。

从这个"__main__"脚本调用的任何其他脚本,其__name__都被设置为该模块的名字(例如__name__ == "ab")。因此,if __name__ == "__main__":这一行是解释器的一个测试,用来判断它正在解释/解析的是最初执行的"home"脚本,还是只是临时查看另一个(外部)脚本。这使程序员可以灵活地让脚本在直接执行与被外部调用时有不同的行为。

让我们逐步看一下上面的代码,以了解发生了什么,首先关注未缩进的行及其在脚本中出现的顺序。请记住,函数-或def-块在被调用之前不会自行执行任何操作。如果自言自语,口译员可能会说:

  • 打开xy.py作为"home"文件;在__name__变量中把它称为"__main__"。
  • 导入并打开__name__ == "ab"的那个文件。
  • 哦,一个函数。我会记住它。
  • 好的,函数a();我刚学会它。打印'A function in ab file'。
  • 文件结束;回到"__main__"!
  • 哦,一个函数。我会记住它。
  • 又一个。
  • 函数x();好的,打印'peripheral task: might be useful in other projects'。
  • 这是什么?一个if语句。好了,条件已满足(变量__name__已被设置为"__main__"),所以我将进入main()函数并打印'main function: this is where the action is'。

最下面的两行的意思是:"如果这是"__main__"即'home'脚本,则执行名为main()的函数"。这就是为什么您会在顶部看到一个def main():代码块,其中包含脚本功能的主要流程。

为什么要实施呢?

还记得我之前说的关于import语句的内容吗?导入一个模块时,解释器不只是"认识"它然后等待进一步的指令——它实际上会运行该脚本中包含的所有可执行操作。因此,把脚本的主体放进main()函数,可以有效地把它隔离开,使它在被另一个脚本导入时不会立即运行。

同样,也会有例外,但通常的做法是:main()一般不会被外部调用。那么您可能还想知道一件事:如果我们不调用main(),为什么还要调用这个脚本呢?这是因为许多人用独立的函数来组织脚本,这些函数被设计成可以独立于文件中其余代码运行,之后再在脚本主体的其他地方被调用。这就引出了下面这一点:

但是代码没有它就可以工作

嗯,没错。这些独立的函数可以从一个未被包进main()函数的内联脚本中调用。如果您习惯了(就像我在编程学习的早期阶段那样)构建恰好满足当前需要的内联脚本,并且打算以后再需要该操作时重新摸索一遍……那么,您会不习惯代码的这种内部结构,因为它构建起来更复杂,读起来也不那么直观。

但那样的脚本可能无法从外部调用其函数,因为一旦执行,它就会立即开始计算和给变量赋值。而且,如果您想重用某个函数,新脚本与旧脚本的关联很可能高到出现变量冲突。

通过拆分出独立的函数,您可以把它们调用到另一个脚本中,从而重用以前的工作。例如,"example.py"可以导入"xy.py"并调用x(),从而利用"xy.py"中的x函数。(也许是把给定文本字符串的第三个单词大写;从数字列表创建一个NumPy数组并对其求平方;或者对3D表面去趋势。可能性是无限的。)

(顺便说一句,这个问题包含@kindall的一个答案,它最终帮助我理解了——是"为什么",而不是"如何"。不幸的是,它被标记为本问题的重复,我认为这是个错误。)

There are lots of different takes here on the mechanics of the code in question, the “How”, but for me none of it made sense until I understood the “Why”. This should be especially helpful for new programmers.

Take file “ab.py”:

def a():
    print('A function in ab file')
a()

And a second file “xy.py”:

import ab
def main():
    print('main function: this is where the action is')
def x():
    print ('peripheral task: might be useful in other projects')
x()
if __name__ == "__main__":
    main()

What is this code actually doing?

When you execute xy.py, you import ab. The import statement runs the module immediately on import, so ab‘s operations get executed before the remainder of xy‘s. Once finished with ab, it continues with xy.

The interpreter keeps track of which scripts are running with __name__. When you run a script – no matter what you’ve named it – the interpreter calls it "__main__", making it the master or ‘home’ script that gets returned to after running an external script.

Any other script that’s called from this "__main__" script is assigned its module name as its __name__ (e.g., __name__ == "ab"). Hence, the line if __name__ == "__main__": is the interpreter’s test to determine if it’s interpreting/parsing the ‘home’ script that was initially executed, or if it’s temporarily peeking into another (external) script. This gives the programmer flexibility to have the script behave differently if it’s executed directly vs. called externally.

Let’s step through the above code to understand what’s happening, focusing first on the unindented lines and the order they appear in the scripts. Remember that function – or def – blocks don’t do anything by themselves until they’re called. What the interpreter might say if mumbled to itself:

  • Open xy.py as the ‘home’ file; call it "__main__" in the __name__ variable.
  • Import and open the file with __name__ == "ab".
  • Oh, a function. I’ll remember that.
  • Ok, function a(); I just learned that. Printing ‘A function in ab file‘.
  • End of file; back to "__main__"!
  • Oh, a function. I’ll remember that.
  • Another one.
  • Function x(); ok, printing ‘peripheral task: might be useful in other projects‘.
  • What’s this? An if statement. Well, the condition has been met (the variable __name__ has been set to "__main__"), so I’ll enter the main() function and print ‘main function: this is where the action is‘.

The bottom two lines mean: “If this is the "__main__" or ‘home’ script, execute the function called main()“. That’s why you’ll see a def main(): block up top, which contains the main flow of the script’s functionality.

Why implement this?

Remember what I said earlier about import statements? When you import a module it doesn’t just ‘recognize’ it and wait for further instructions – it actually runs all the executable operations contained within the script. So, putting the meat of your script into the main() function effectively quarantines it, putting it in isolation so that it won’t immediately run when imported by another script.

Again, there will be exceptions, but common practice is that main() doesn’t usually get called externally. So you may be wondering one more thing: if we’re not calling main(), why are we calling the script at all? It’s because many people structure their scripts with standalone functions that are built to be run independent of the rest of the code in the file. They’re then later called somewhere else in the body of the script. Which brings me to this:

But the code works without it

Yes, that’s right. These separate functions can be called from an in-line script that’s not contained inside a main() function. If you’re accustomed (as I am, in my early learning stages of programming) to building in-line scripts that do exactly what you need, and you’ll try to figure it out again if you ever need that operation again … well, you’re not used to this kind of internal structure to your code, because it’s more complicated to build and it’s not as intuitive to read.

But that’s a script that probably can’t have its functions called externally, because if it did it would immediately start calculating and assigning variables. And chances are if you’re trying to re-use a function, your new script is related closely enough to the old one that there will be conflicting variables.

In splitting out independent functions, you gain the ability to re-use your previous work by calling them into another script. For example, “example.py” might import “xy.py” and call x(), making use of the ‘x’ function from “xy.py”. (Maybe it’s capitalizing the third word of a given text string; creating a NumPy array from a list of numbers and squaring them; or detrending a 3D surface. The possibilities are limitless.)

(As an aside, this question contains an answer by @kindall that finally helped me to understand – the why, not the how. Unfortunately it’s been marked as a duplicate of this one, which I think is a mistake.)
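
As a concrete illustration of that kind of reuse, here is a sketch of mine (not part of the answer), assuming ab.py and xy.py from above sit in the same directory:

# example.py -- reusing xy.py's standalone function
import xy     # runs ab's and xy's top-level code once,
              # but NOT xy.main(), thanks to the __name__ guard

xy.x()        # calls the 'peripheral task' function directly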


回答 7

当我们的模块(M.py)中有某些语句时,我们希望在将其作为main(而不是导入)运行时执行该语句,我们可以将这些语句(测试用例,打印语句)放在此if块下。

默认情况下(当模块作为主程序运行而不是被导入时),__name__变量被设置为"__main__";而被导入时,__name__变量会获得一个不同的值,通常是模块的名称('M')。这有助于把模块的不同变体放在一起运行,分离各自特定的输入和输出语句,以及可能存在的测试用例。

简而言之,使用此’ if __name__ == "main"‘块可防止在导入模块时运行(某些)代码。

When there are certain statements in our module (M.py) we want to be executed when it’ll be running as main (not imported), we can place those statements (test-cases, print statements) under this if block.

By default (when the module is run as main, not imported) the __name__ variable is set to "__main__"; when it is imported, the __name__ variable gets a different value, most probably the name of the module ('M'). This is helpful for running different variants of a module together, separating their specific input & output statements, and also for any test cases.

In short, use this if __name__ == "__main__" block to prevent (certain) code from being run when the module is imported.


回答 8

简而言之,__name__是为每个脚本定义的变量,用于定义脚本是作为主模块运行还是作为导入模块运行。

因此,如果我们有两个脚本;

#script1.py
print("Script 1's name: {}".format(__name__))

#script2.py
import script1
print("Script 2's name: {}".format(__name__))

执行script1的输出是

Script 1's name: __main__

执行script2的输出是:

Script 1's name: script1
Script 2's name: __main__

如你看到的, __name__告诉我们哪个代码是“主”模块。这很棒,因为您可以编写代码,而不必担心C / C ++中的结构性问题,在这种情况下,如果文件未实现“ main”功能,则无法将其编译为可执行文件,如果可以,然后它不能用作库。

假设您编写的Python脚本功能出色,并实现了许多对其他用途有用的功能。如果要使用它们,我可以导入您的脚本并使用它们而无需执行您的程序(假设您的代码仅在if __name__ == "__main__":上下文中执行 )。而在C / C ++中,您将必须将这些部分分成一个单独的模块,然后再包含文件。如下图所示;

用C导入复杂

箭头是导入链接。对于三个试图包含先前模块代码的模块,有六个文件(九个,计算实现文件)和五个链接。除非将其专门编译为库,否则很难将其他代码包含到C项目中。现在将其描述为适用于Python:

用Python优雅地导入

您编写了一个模块,如果有人想使用您的代码,他们只需将其导入即可,并且__name__变量可以帮助将程序的可执行部分与库部分分开。

Put simply, __name__ is a variable defined for each script that defines whether the script is being run as the main module or it is being run as an imported module.

So if we have two scripts;

#script1.py
print("Script 1's name: {}".format(__name__))

and

#script2.py
import script1
print("Script 2's name: {}".format(__name__))

The output from executing script1 is

Script 1's name: __main__

And the output from executing script2 is:

Script 1's name: script1
Script 2's name: __main__

As you can see, __name__ tells us which code is the ‘main’ module. This is great, because you can just write code and not have to worry about structural issues like in C/C++, where, if a file does not implement a ‘main’ function then it cannot be compiled as an executable and if it does, it cannot then be used as a library.

Say you write a Python script that does something great and you implement a boatload of functions that are useful for other purposes. If I want to use them I can just import your script and use them without executing your program (given that your code only executes within the if __name__ == "__main__": context). Whereas in C/C++ you would have to portion out those pieces into a separate module that then includes the file. Picture the situation below;

Complicated importing in C

The arrows are import links. For three modules each trying to include the previous modules code there are six files (nine, counting the implementation files) and five links. This makes it difficult to include other code into a C project unless it is compiled specifically as a library. Now picture it for Python:

Elegant importing in Python

You write a module, and if someone wants to use your code they just import it and the __name__ variable can help to separate the executable portion of the program from the library part.


回答 9

让我们以更抽象的方式看一下答案:

假设我们在以下代码中x.py

...
<Block A>
if __name__ == '__main__':
    <Block B>
...

块A和B,当我们正在运行的运行x.py

但是,y.py例如,当我们运行另一个模块时,仅运行块A(而不运行B),例如,在其中x.py导入了代码并从那里运行代码(例如,当x.py从中调用in函数时y.py)。

Let’s look at the answer in a more abstract way:

Suppose we have this code in x.py:

...
<Block A>
if __name__ == '__main__':
    <Block B>
...

Blocks A and B are run when we are running x.py.

But just block A (and not B) is run when we are running another module, y.py for example, in which x.py is imported and the code is run from there (like when a function in x.py is called from y.py).


回答 10

交互式运行Python时,本地__name__变量会被赋值为__main__。同样,当您从命令行执行Python模块而不是将其导入另一个模块时,其__name__属性也会被赋值为__main__,而不是模块的实际名称。通过这种方式,模块可以查看自己的__name__值来自行确定它们正被如何使用:是作为另一个程序的支撑,还是作为从命令行执行的主应用程序。因此,以下习语在Python模块中非常常见:

if __name__ == '__main__':
    # Do something appropriate here, like calling a
    # main() function defined elsewhere in this module.
    main()
else:
    # Do nothing. This module has been imported by another
    # module that wants to make use of the functions,
    # classes and other useful bits it has defined.

When you run Python interactively the local __name__ variable is assigned a value of __main__. Likewise, when you execute a Python module from the command line, rather than importing it into another module, its __name__ attribute is assigned a value of __main__, rather than the actual name of the module. In this way, modules can look at their own __name__ value to determine for themselves how they are being used, whether as support for another program or as the main application executed from the command line. Thus, the following idiom is quite common in Python modules:

if __name__ == '__main__':
    # Do something appropriate here, like calling a
    # main() function defined elsewhere in this module.
    main()
else:
    # Do nothing. This module has been imported by another
    # module that wants to make use of the functions,
    # classes and other useful bits it has defined.

回答 11

考虑:

if __name__ == "__main__":
    main()

它检查__name__Python脚本的属性是否为"__main__"。换句话说,如果程序本身已执行,则属性将为__main__,因此程序将被执行(在这种情况下,main()函数)。

但是,如果您的Python脚本被另一个模块使用,则if语句之外的任何代码仍会被执行;所以 if __name__ == "__main__" 仅用于检查程序是否被用作模块,并据此决定是否运行那段代码。

Consider:

if __name__ == "__main__":
    main()

It checks if the __name__ attribute of the Python script is "__main__". In other words, if the program itself is executed, the attribute will be __main__, so the program will be executed (in this case the main() function).

However, if your Python script is used by another module, any code outside of the if statement will still be executed, so if __name__ == "__main__" is used just to check whether the program is used as a module or not, and thereby decide whether to run the code.


回答 12

在解释if __name__ == '__main__'之前,重要的是先了解__name__是什么以及它做什么。

__name__是什么?

__name__是一个DunderAlias——可以把它看作一个全局变量(可以从模块中访问),其作用方式与global类似。

它是一个字符串(如上所述是全局的),type(__name__)的结果<class 'str'>可以证明这一点;它在Python 3和Python 2版本中都是内置的标准。

哪里:

它不仅可以在脚本中使用,而且可以在解释器和模块/包中找到。

解释器:

>>> print(__name__)
__main__
>>>

脚本:

test_file.py

print(__name__)

结果为 __main__

模块或包:

somefile.py:

def somefunction():
    print(__name__)

test_file.py:

import somefile
somefile.somefunction()

结果为 somefile

请注意,在包或模块中使用时,__name__取的是模块的名字。它并不给出实际模块或包的路径,但另有一个专门的dunder属性__file__可以做到这一点。

您应该看到:在主文件(或程序)中,__name__总是返回__main__;而如果它是模块/包,或任何由其他Python脚本运行起来的东西,则返回它所来自的模块的名字。

实践:

作为变量,意味着它的值可以被覆盖("可以"并不意味着"应该")。覆盖__name__的值会损害可读性,所以无论出于什么原因都不要这样做。如果您需要一个变量,请定义一个新变量。

人们总是假定__name__的值是__main__或模块的名字。再次强调,更改这个默认值造成的混乱多于好处,并会进一步引发问题。

例:

>>> __name__ = 'Horrify' # Change default from __main__
>>> if __name__ == 'Horrify': print(__name__)
...
>>> else: print('Not Horrify')
...
Horrify
>>>

通常认为,在脚本中包含if __name__ == '__main__'是一种好习惯。

现在回答if __name__ == '__main__'

现在我们知道__name__事物的行为变得更加清晰:

if是一个流程控制语句,当给定的值为真时,它会执行所包含的代码块。我们已经看到,__name__的取值要么是__main__,要么是它被导入时的模块名。

这意味着,如果__name__等于__main__,那么该文件必定是主文件,并且确实正在被运行(或者是解释器本身),而不是被导入到脚本里的模块或包。

如果__name__确实取值为__main__,那么该代码块中的所有内容都会执行。

这告诉我们:如果正在运行的文件是主文件(或者您直接从解释器运行),则该条件下的代码必然执行;如果它是被导入的包/模块,则不会执行,其值也不会是__main__。

模块:

__name__ 也可以在模块中使用以定义模块名称

变体:

也可以使用进行其他一些不太常见但有用的事情__name__,我将在这里展示一些:

仅当文件是模块或软件包时才执行:

if __name__ != '__main__':
    # Do some useful things 

如果文件是主文件,则运行一个条件,如果文件不是主文件,则运行另一个条件:

if __name__ == '__main__':
    # Execute something
else:
    # Do some useful things

您也可以使用它在软件包和模块上提供可运行的帮助功能/实用程序,而无需精心使用库。

它还允许模块作为主脚本从命令行运行,这也非常有用。

Before explaining anything about if __name__ == '__main__' it is important to understand what __name__ is and what it does.

What is __name__?

__name__ is a DunderAlias – can be thought of as a global variable (accessible from modules) and works in a similar way to global.

It is a string (global, as mentioned above), as indicated by type(__name__) (yielding <class 'str'>), and is an inbuilt standard in both Python 3 and Python 2.

Where:

It can not only be used in scripts but can also be found in both the interpreter and modules/packages.

Interpreter:

>>> print(__name__)
__main__
>>>

Script:

test_file.py:

print(__name__)

Resulting in __main__

Module or package:

somefile.py:

def somefunction():
    print(__name__)

test_file.py:

import somefile
somefile.somefunction()

Resulting in somefile

Notice that when used in a package or module, __name__ takes the module’s name. The path of the actual module or package is not given by it, but the separate dunder __file__ allows for this.

You should see that where __name__ is the main file (or program) it will always return __main__, and if it is a module/package, or anything that is running off some other Python script, it will return the name of the module it originated from.

Practice:

Being a variable means that its value can be overwritten (“can” does not mean “should”). Overwriting the value of __name__ will result in a lack of readability, so do not do it, for any reason. If you need a variable, define a new variable.

It is always assumed that the value of __name__ will be __main__ or the name of the module. Once again, changing this default value will cause more confusion than it will do good, causing problems further down the line.

Example:

>>> __name__ = 'Horrify' # Change default from __main__
>>> if __name__ == 'Horrify': print(__name__)
...
>>> else: print('Not Horrify')
...
Horrify
>>>

It is considered good practice in general to include the if __name__ == '__main__' in scripts.

Now to answer if __name__ == '__main__':

Now we know the behaviour of __name__ things become clearer:

An if is a flow control statement that will execute the block of code it contains if the value given is true. We have seen that __name__ can take either __main__ or the name of the module it was imported as.

This means that if __name__ is equal to __main__ then the file must be the main file and must actually be running (or it is the interpreter), not a module or package imported into the script.

If indeed __name__ does take the value of __main__ then whatever is in that block of code will execute.

This tells us that if the file running is the main file (or you are running it from the interpreter directly) then that condition must execute. If it is an imported package or module then it should not, and the value will not be __main__.

Modules:

__name__ can also be used in modules to define the name of a module

Variants:

It is also possible to do other, less common but useful things with __name__, some I will show here:

Executing only if the file is a module or package:

if __name__ != '__main__':
    # Do some useful things 

Running one condition if the file is the main one and another if it is not:

if __name__ == '__main__':
    # Execute something
else:
    # Do some useful things

You can also use it to provide runnable help functions/utilities on packages and modules without the elaborate use of libraries.

It also allows modules to be run from the command line as main scripts, which can also be very useful.
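
To make that last point concrete, here is a small sketch of mine (the module name mytool is made up). A guarded module like this can be imported for its functions, run directly, or run with python -m mytool, which also sets __name__ to "__main__":

# mytool.py
def add(a, b):
    """A reusable helper that other modules can import."""
    return a + b

if __name__ == '__main__':
    # Reached via "python mytool.py" or "python -m mytool",
    # never on a plain "import mytool".
    print("demo:", add(2, 3))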


回答 13

我认为最好是深入浅出的答案:

__name__:Python中的每个模块都有一个称为的特殊属性__name__。它是一个内置变量,返回模块的名称。

__main__:与其他编程语言一样,Python也具有执行入口点,即main。'__main__' 是执行顶级代码的作用域的名称。基本上,您有两种使用Python模块的方式:直接将其作为脚本运行,或将其导入。当模块作为脚本运行时,其__name__设置为__main__

因此,当模块作为主程序运行时,该__name__属性的值将设置为__main__。否则,的值将__name__ 设置为包含模块的名称。

I think it’s best to break the answer in depth and in simple words:

__name__: Every module in Python has a special attribute called __name__. It is a built-in variable that returns the name of the module.

__main__: Like other programming languages, Python too has an execution entry point, i.e., main. '__main__' is the name of the scope in which top-level code executes. Basically you have two ways of using a Python module: Run it directly as a script, or import it. When a module is run as a script, its __name__ is set to __main__.

Thus, the value of the __name__ attribute is set to __main__ when the module is run as the main program. Otherwise the value of __name__ is set to contain the name of the module.


回答 14

这是针对Python文件从命令行被调用这一情况的特殊写法。通常用于调用"main()"函数或执行其他合适的启动代码,例如处理命令行参数。

它可以用几种方式编写。另一个是:

def some_function_for_instance_main():
    dosomething()


__name__ == '__main__' and some_function_for_instance_main()

我并不是说您应该在生产代码中使用它,但是它可以说明没有什么“魔术” if __name__ == '__main__'。在Python文件中调用主函数是一个很好的约定。

It is a special case for when a Python file is called from the command line. This is typically used to call a “main()” function or execute other appropriate startup code, like command-line argument handling, for instance.

It could be written in several ways. Another is:

def some_function_for_instance_main():
    dosomething()


__name__ == '__main__' and some_function_for_instance_main()

I am not saying you should use this in production code, but it serves to illustrate that there is nothing “magical” about if __name__ == '__main__'. It is a good convention for invoking a main function in Python files.


回答 15

系统(Python解释器)为源文件(模块)提供了许多变量。您可以随时获取它们的值,因此,让我们关注__name__变量/属性:

当Python加载源代码文件时,它将执行在其中找到的所有代码。(请注意,它不会调用文件中定义的所有方法和函数,但会定义它们。)

但是,在解释器执行源代码文件之前,它会为该文件定义一些特殊的变量。__name__就是Python为每个源代码文件自动定义的那些特殊变量之一。

如果Python正在将此源代码文件作为主程序加载(即,您运行的文件),那么它将为此文件设置特殊的__name__变量,使其具有值“ __main__”

如果是从另一个模块导入的,则将__name__设置为该模块的名称。

因此,在部分示例中:

if __name__ == "__main__":
   lock = thread.allocate_lock()
   thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
   thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

表示代码块:

lock = thread.allocate_lock()
thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

仅当您直接运行模块时才会执行;如果另一个模块正在调用/导入它,该代码块就不会执行,因为在那种情况下__name__的值不等于"__main__"。

希望这会有所帮助。

There are a number of variables that the system (Python interpreter) provides for source files (modules). You can get their values anytime you want, so, let us focus on the __name__ variable/attribute:

When Python loads a source code file, it executes all of the code found in it. (Note that it doesn’t call all of the methods and functions defined in the file, but it does define them.)

Before the interpreter executes the source code file though, it defines a few special variables for that file; __name__ is one of those special variables that Python automatically defines for each source code file.

If Python is loading this source code file as the main program (i.e. the file you run), then it sets the special __name__ variable for this file to have a value “__main__”.

If this is being imported from another module, __name__ will be set to that module’s name.

So, in your example in part:

if __name__ == "__main__":
   lock = thread.allocate_lock()
   thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
   thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

means that the code block:

lock = thread.allocate_lock()
thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))

will be executed only when you run the module directly; the code block will not execute if another module is calling/importing it, because the value of __name__ will not equal "__main__" in that particular instance.

Hope this helps out.


回答 16

if __name__ == "__main__": 基本上是顶级脚本环境,它指定了解释器(“我首先执行的优先级最高”)。

'__main__'是执行顶级代码的作用域的名称。从标准输入,脚本或交互式提示中读取时,模块的__name__设置等于'__main__'

if __name__ == "__main__":
    # Execute only if run as a script
    main()

if __name__ == "__main__": is basically the top-level script environment, and it specifies the interpreter that (‘I have the highest priority to be executed first’).

'__main__' is the name of the scope in which top-level code executes. A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt.

if __name__ == "__main__":
    # Execute only if run as a script
    main()

回答 17

在本页的所有答案中,我都读了很多东西。我想说的是,如果您知道这件事,那么您肯定会理解这些答案,否则,您仍然会感到困惑。

简而言之,您需要了解以下几点:

  1. import a 操作实际上会运行所有可以在“ a”中运行的内容

  2. 由于第1点,导入时可能不希望所有内容都在“ a”中运行

  3. 为了解决第2点的问题,python允许您进行条件检查

  4. __name__是所有.py模块中的隐式变量;当a.py被导入时,a.py模块的__name__的值被设置为它的模块名"a";当a.py通过"python a.py"直接运行时(这意味着a.py是入口点),a.py模块的__name__的值被设置为字符串__main__

  5. 基于Python为每个模块设置__name__变量的这一机制,您知道如何实现第3点了吗?答案很简单,对吧?放一个if条件:if __name__ == "__main__": ...;您甚至可以根据自己的功能需求写成if __name__ == "a"

Python真正特殊的地方在于第4点!其余只是基本逻辑。

I’ve been reading so much throughout the answers on this page. I would say, if you know the thing, for sure you will understand those answers, otherwise, you are still confused.

To be short, you need to know several points:

  1. The import a action actually runs everything that can be run in “a”

  2. Because of point 1, you may not want everything to be run in “a” when importing it

  3. To solve the problem in point 2, python allows you to put a condition check

  4. __name__ is an implicit variable in all .py modules; when a.py is imported, the value of __name__ of the a.py module is set to its module name “a”; when a.py is run directly using “python a.py”, which means a.py is the entry point, the value of __name__ of the a.py module is set to the string __main__

  5. Based on the mechanism of how Python sets the variable __name__ for each module, do you know how to achieve point 3? The answer is fairly easy, right? Put in an if condition: if __name__ == "__main__": ...; you can even put if __name__ == "a" depending on your functional need

The thing Python is really special at is point 4! The rest is just basic logic.


回答 18

考虑:

print(__name__)

上面的输出是__main__。

if __name__ == "__main__":
    print("direct method")

上面的语句为真,会打印"direct method"。假如在另一个文件中导入了该模块,则不会打印"direct method",因为在导入时__name__会被设置为该模块自己的名字。

Consider:

print(__name__)

The output for the above is __main__.

if __name__ == "__main__":
    print("direct method")

The above statement is true and prints “direct method”. Suppose this module is imported into another one; then it doesn’t print “direct method” because, while importing, __name__ is set to the module’s own name instead.


回答 19

您可以使该文件可用作脚本以及可导入模块

fibo.py(名为的模块fibo

# Other modules can IMPORT this MODULE to use the function fib
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a+b
    print()

# This allows the file to be used as a SCRIPT
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))

参考:https://docs.python.org/3.5/tutorial/modules.html

You can make the file usable as a script as well as an importable module.

fibo.py (a module named fibo)

# Other modules can IMPORT this MODULE to use the function fib
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a+b
    print()

# This allows the file to be used as a SCRIPT
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))

Reference: https://docs.python.org/3.5/tutorial/modules.html
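
Both modes of use then look like this (the outputs follow from the fib function above):

$ python fibo.py 50
1 1 2 3 5 8 13 21 34

>>> import fibo    # no output: the guard skips the script part
>>> fibo.fib(50)
1 1 2 3 5 8 13 21 34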


回答 20

的原因

if __name__ == "__main__":
    main()

主要是为了避免代码被直接导入时可能引起的导入锁问题。如果您的文件被直接调用(即__name__ == "__main__"的情况),您希望main()运行;但如果您的代码被导入,则导入方必须从真正的主模块进入您的代码,以避免导入锁问题。

一个副作用是,您自动采用了一种支持多个入口点的方法。您可以把main()用作入口点来运行程序,但并非必须如此。虽然setup.py期望的是main(),但其他工具会使用别的入口点。例如,要把文件作为gunicorn进程运行,您要定义的是app()函数而不是main()。和setup.py一样,gunicorn会导入您的代码,因此您不希望它在被导入时做任何事情(因为导入锁问题)。

The reason for

if __name__ == "__main__":
    main()

is primarily to avoid the import lock problems that would arise from having code directly imported. You want main() to run if your file was directly invoked (that’s the __name__ == "__main__" case), but if your code was imported then the importer has to enter your code from the true main module to avoid import lock problems.

A side-effect is that you automatically sign on to a methodology that supports multiple entry points. You can run your program using main() as the entry point, but you don’t have to. While setup.py expects main(), other tools use alternate entry points. For example, to run your file as a gunicorn process, you define an app() function instead of a main(). Just as with setup.py, gunicorn imports your code so you don’t want it to do anything while it’s being imported (because of the import lock issue).
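
For the gunicorn case mentioned above, a minimal sketch might look like this (my example; the file name wsgi_app.py is made up, and you would serve it with something like gunicorn wsgi_app:app):

# wsgi_app.py -- a minimal WSGI callable for a server that imports this module
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']

if __name__ == '__main__':
    # Smoke-test without gunicorn, using the stdlib reference server.
    from wsgiref.simple_server import make_server
    make_server('127.0.0.1', 8000, app).serve_forever()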


回答 21

该答案适用于学习Python的Java程序员。每个Java文件通常包含一个公共类。您可以通过两种方式使用该类:

  1. 从其他文件调用类。您只需要将其导入调用程序中即可。

  2. 出于测试目的,单独运行类。

对于后一种情况,该类应包含一个public static void main()方法。在Python中,这一目的由全局定义的'__main__'标签来实现。

This answer is for Java programmers learning Python. Every Java file typically contains one public class. You can use that class in two ways:

  1. Call the class from other files. You just have to import it in the calling program.

  2. Run the class stand alone, for testing purposes.

For the latter case, the class should contain a public static void main() method. In Python this purpose is served by the globally defined label '__main__'.


回答 22

if __name__ == '__main__': 仅当模块作为脚本调用时,才会执行以下代码。

例如,考虑以下模块my_test_module.py

# my_test_module.py

print('This is going to be printed out, no matter what')

if __name__ == '__main__':
    print('This is going to be printed out, only if user invokes the module as a script')

第一种可能性:在另一个模块中导入my_test_module.py

# main.py

import my_test_module

if __name__ == '__main__':
    print('Hello from main.py')

现在,如果您调用main.py

python main.py 

>> 'This is going to be printed out, no matter what'
>> 'Hello from main.py'

请注意,只有my_test_module中顶级的print()语句会被执行。


第二种可能性:将my_test_module.py作为脚本调用

现在,如果您将my_test_module.py作为Python脚本运行,则两个print()语句都会执行:

python my_test_module.py

>>> 'This is going to be printed out, no matter what'
>>> 'This is going to be printed out, only if user invokes the module as a script'

The code under if __name__ == '__main__': will only be executed if the module is invoked as a script.

As an example consider the following module my_test_module.py:

# my_test_module.py

print('This is going to be printed out, no matter what')

if __name__ == '__main__':
    print('This is going to be printed out, only if user invokes the module as a script')

1st possibility: Import my_test_module.py in another module

# main.py

import my_test_module

if __name__ == '__main__':
    print('Hello from main.py')

Now if you invoke main.py:

python main.py 

>> 'This is going to be printed out, no matter what'
>> 'Hello from main.py'

Note that only the top-level print() statement in my_test_module is executed.


2nd possibility: Invoke my_test_module.py as a script

Now if you run my_test_module.py as a Python script, both print() statements will be executed:

python my_test_module.py

>>> 'This is going to be printed out, no matter what'
>>> 'This is going to be printed out, only if user invokes the module as a script'

回答 23

python中的每个模块都有一个名为__name__的属性。当模块被直接运行时(例如python my_module.py),__name__属性的值是__main__。否则(例如当您import my_module时),__name__的值是该模块的名称。

简短说明一下小例子。

#Script test.py

apple = 42

def hello_world():
    print("I am inside hello_world")

if __name__ == "__main__":
    print("Value of __name__ is: ", __name__)
    print("Going to call hello_world")
    hello_world()

我们可以直接执行为

python test.py  

输出量

Value of __name__ is: __main__
Going to call hello_world
I am inside hello_world

现在假设我们从其他脚本中调用上述脚本

#script external_calling.py

import test
print(test.apple)
test.hello_world()

print(test.__name__)

当您执行此

python external_calling.py

输出量

42
I am inside hello_world
test

所以,上面的例子不言自明:当你从其他脚本调用test时,test.py中的if __name__ == "__main__"块不会执行。

Every module in Python has an attribute called __name__. The value of the __name__ attribute is __main__ when the module is run directly, like python my_module.py. Otherwise (like when you say import my_module) the value of __name__ is the name of the module.

Small example to explain in short.

#Script test.py

apple = 42

def hello_world():
    print("I am inside hello_world")

if __name__ == "__main__":
    print("Value of __name__ is: ", __name__)
    print("Going to call hello_world")
    hello_world()

We can execute this directly as

python test.py  

Output

Value of __name__ is: __main__
Going to call hello_world
I am inside hello_world

Now suppose we call above script from other script

#script external_calling.py

import test
print(test.apple)
test.hello_world()

print(test.__name__)

When you execute this

python external_calling.py

Output

42
I am inside hello_world
test

So the above is self-explanatory: when you call test from another script, the if __name__ == "__main__" block in test.py will not execute.


回答 24

如果此.py文件被其他.py文件导入,则"if语句"下的代码不会执行。

如果此.py文件在shell下通过python this_py.py运行,或在Windows中被双击,则"if语句"下的代码会被执行。

通常是为了测试而这样写的。

If this .py file is imported by other .py files, the code under “the if statement” will not be executed.

If this .py file is run by python this_py.py under a shell, or double-clicked in Windows, the code under “the if statement” will be executed.

It is usually written for testing.


回答 25

如果python解释器直接运行某个特定模块,则__name__全局变量的值为"__main__":

def a():
    print("a")

def b():
    print("b")

if __name__ == "__main__":
    print("you can see me")
    a()
else:
    print("You can't see me")
    b()

直接运行此脚本时,会打印 you can see me

a

如果您把此文件(称为A)导入到文件B中并执行文件B,则文件A中的if __name__ == "__main__"为假,因此它会打印 You can't see me

b

If the Python interpreter is running a particular module directly, then the __name__ global variable will have the value "__main__":

def a():
    print("a")

def b():
    print("b")

if __name__ == "__main__":
    print("you can see me")
    a()
else:
    print("You can't see me")
    b()

When you run this script directly, it prints you can see me

a

If you import this file (call it A) into a file B and execute file B, then if __name__ == "__main__" in file A becomes false, so it prints You can’t see me

b


回答 26

所有答案都对功能进行了解释。但是,我将提供其用法的一个示例,这可能有助于进一步澄清该概念。

假设您有两个Python文件a.py和b.py,并且a.py导入b.py。当我们运行a.py文件时,import b这行代码首先被执行。在a.py其余代码运行之前,文件b.py中的代码必须完整地运行完毕。

在b.py代码中,有一些代码是该文件b.py独有的,我们不希望导入b.py文件的任何其他文件(b.py文件除外)来运行它。

这就是这行代码检查的内容:只有当直接运行代码的主文件是b.py本身时(本例中不是,被运行的主文件是a.py),被保护的那段代码才会执行。

All the answers have pretty much explained the functionality. But I will provide one example of its usage which might help clearing out the concept further.

Assume that you have two Python files, a.py and b.py, and that a.py imports b.py. When we run a.py, the import b line is executed first; the code in the file b.py must run completely before the rest of the a.py code runs.

In the b.py code there is some code that is exclusive to that file b.py and we don’t want any other file (other than b.py file), that has imported the b.py file, to run it.

So that is what this line of code checks: the guarded code gets executed only if b.py is itself the main file running the code, which in this case it is not (a.py is the main file running).


回答 27

创建一个文件a.py

print(__name__) # It will print out __main__

当该文件被直接运行时,__name__总是等于__main__,表明它是主文件。

在同一目录中创建另一个文件b.py

import a  # Prints a

运行它。它将打印a,即被导入模块的名称。

因此,为了显示同一文件的两种不同行为,这是一个常用的技巧:

# Code to be run when imported into another python file

if __name__ == '__main__':
    # Code to be run only when run directly

Create a file, a.py:

print(__name__) # It will print out __main__

__name__ is always equal to __main__ whenever that file is run directly, showing that this is the main file.

Create another file, b.py, in the same directory:

import a  # Prints a

Run it. It will print a, i.e., the name of the module that was imported.

So, to show two different behaviors of the same file, this is a commonly used trick:

# Code to be run when imported into another python file

if __name__ == '__main__':
    # Code to be run only when run directly
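
A concrete version of that template might look like this (my sketch; the names are illustrative):

# mymodule.py
def greet():
    print("hello")    # importable, reusable code

greet()               # top-level code: runs on import AND on direct run

if __name__ == '__main__':
    print("running directly")    # runs only on "python mymodule.py"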

回答 28

if __name__ == '__main__':

我们经常看到 if __name__ == '__main__':。

它检查是否正在导入模块。

换句话说,if仅当代码直接运行时,才会执行该块中的代码。这里的directly意思是not imported

让我们看一下使用打印模块名称的简单代码的作用:

# test.py
def test():
   print('test module name=%s' %(__name__))

if __name__ == '__main__':
   print('call test()')
   test()

如果我们直接通过python test.py运行代码,则模块名称为__main__:

call test()
test module name=__main__

if __name__ == '__main__':

We see if __name__ == '__main__': quite often.

It checks if a module is being imported or not.

In other words, the code within the if block will be executed only when the code runs directly. Here directly means not imported.

Let’s see what it does using a simple code that prints the name of the module:

# test.py
def test():
   print('test module name=%s' %(__name__))

if __name__ == '__main__':
   print('call test()')
   test()

If we run the code directly via python test.py, the module name is __main__:

call test()
test module name=__main__
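
For the import case (my addition, reusing the same test.py from the same directory), the guard is skipped and the module name is test:

>>> import test     # nothing printed: the if block is skipped
>>> test.test()
test module name=test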

回答 29

简而言之,就像C编程语言中的main函数一样,它是运行文件的入口。

Simply, it is the entry point to run the file, like the main function in the C programming language.


Python是否具有三元条件运算符?

问题:Python是否具有三元条件运算符?

如果Python没有三元条件运算符,是否可以使用其他语言构造来模拟一个?

If Python does not have a ternary conditional operator, is it possible to simulate one using other language constructs?


回答 0

是的,它是在2.5版中添加的。表达式语法为:

a if condition else b

首先对condition求值,然后根据condition的布尔值,恰好对a或b中的一个求值并返回。如果condition求值为True,则对a求值并返回,b被忽略;否则对b求值并返回,a被忽略。

这允许短路求值:当condition为真时,只对a求值,b完全不求值;当condition为假时,只对b求值,a完全不求值。

例如:

>>> 'true' if True else 'false'
'true'
>>> 'true' if False else 'false'
'false'

请注意,条件式是表达式,而不是语句。这意味着您不能在条件表达式中使用赋值语句、pass或其他语句:

>>> pass if False else x = 3
  File "<stdin>", line 1
    pass if False else x = 3
          ^
SyntaxError: invalid syntax

但是,您可以使用条件表达式来分配变量,如下所示:

x = a if True else b

将条件表达式视为在两个值之间切换。当您处于"二者取一"的情形时,它非常有用,但它做不了更多别的事情。

如果需要使用语句,则必须使用普通if 语句而不是条件表达式


请记住,由于某些原因,某些Pythonista对此并不满意:

  • 参数的顺序与许多其他语言(例如C、C++、Go、Perl、Ruby、Java、Javascript等)中经典的condition ? a : b三元运算符不同,当不熟悉Python这种"令人惊讶"行为的人使用它时,可能会导致bug(他们可能会颠倒参数顺序)。
  • 有些人认为它“笨拙”,因为它与正常的思维流程相反(先思考条件,然后思考效果)。
  • 风格上的原因。(尽管“内联if”可能确实有用,并且可以使脚本更简洁,但确实会使代码复杂化)

如果您在记住顺序时遇到麻烦,请记住当大声朗读时,您(几乎)说出了您的意思。例如,x = 4 if b > 8 else 9将朗读为x will be 4 if b is greater than 8 otherwise 9

官方文档:参见Python语言参考中的"Conditional expressions"(条件表达式)一节:https://docs.python.org/3/reference/expressions.html#conditional-expressions

Yes, it was added in version 2.5. The expression syntax is:

a if condition else b

First condition is evaluated, then exactly one of either a or b is evaluated and returned based on the Boolean value of condition. If condition evaluates to True, then a is evaluated and returned but b is ignored; otherwise b is evaluated and returned but a is ignored.

This allows short-circuiting because when condition is true only a is evaluated and b is not evaluated at all, but when condition is false only b is evaluated and a is not evaluated at all.

For example:

>>> 'true' if True else 'false'
'true'
>>> 'true' if False else 'false'
'false'

Note that conditionals are an expression, not a statement. This means you can’t use assignment statements or pass or other statements within a conditional expression:

>>> pass if False else x = 3
  File "<stdin>", line 1
    pass if False else x = 3
          ^
SyntaxError: invalid syntax

You can, however, use conditional expressions to assign a variable like so:

x = a if True else b

Think of the conditional expression as switching between two values. It is very useful when you’re in a ‘one value or another’ situation, but it doesn’t do much else.

If you need to use statements, you have to use a normal if statement instead of a conditional expression.


Keep in mind that it’s frowned upon by some Pythonistas for several reasons:

  • The order of the arguments is different from those of the classic condition ? a : b ternary operator from many other languages (such as C, C++, Go, Perl, Ruby, Java, Javascript, etc.), which may lead to bugs when people unfamiliar with Python’s “surprising” behaviour use it (they may reverse the argument order).
  • Some find it “unwieldy”, since it goes contrary to the normal flow of thought (thinking of the condition first and then the effects).
  • Stylistic reasons. (Although the ‘inline if‘ can be really useful, and make your script more concise, it really does complicate your code)

If you’re having trouble remembering the order, then remember that when read aloud, you (almost) say what you mean. For example, x = 4 if b > 8 else 9 is read aloud as x will be 4 if b is greater than 8 otherwise 9.

Official documentation: see “Conditional expressions” in the Python language reference: https://docs.python.org/3/reference/expressions.html#conditional-expressions
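
To see the short-circuiting described above in action, here is a tiny demo of mine:

def left():
    print("evaluating left")
    return "L"

def right():
    print("evaluating right")
    return "R"

print(left() if True else right())    # prints "evaluating left", then "L"
print(left() if False else right())   # prints "evaluating right", then "R"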


回答 1

您可以索引到一个元组:

(falseValue, trueValue)[test]

test需要返回True或False。
为了更安全,可以始终将其实现为:

(falseValue, trueValue)[test == True]

或者您可以使用内置函数bool()来确保布尔值:

(falseValue, trueValue)[bool(<expression>)]

You can index into a tuple:

(falseValue, trueValue)[test]

test needs to return True or False.
It might be safer to always implement it as:

(falseValue, trueValue)[test == True]

or you can use the built-in bool() to assure a Boolean value:

(falseValue, trueValue)[bool(<expression>)]

回答 2

对于2.5之前的版本,有个窍门:

[expression] and [on_true] or [on_false]

当on_true的布尔值为假时,它可能给出错误的结果。1
不过它确实有一个好处:表达式从左到右求值,在我看来这样更清晰。

1. 是否有与C的"?:"三元运算符等价的写法?

For versions prior to 2.5, there’s the trick:

[expression] and [on_true] or [on_false]

It can give wrong results when on_true has a false boolean value.1
Although it does have the benefit of evaluating expressions left to right, which is clearer in my opinion.

1. Is there an equivalent of C’s ”?:” ternary operator?


回答 3

<expression 1> if <condition> else <expression 2>

a = 1
b = 2

1 if a > b else -1 
# Output is -1

1 if a > b else -1 if a < b else 0
# Output is -1

<expression 1> if <condition> else <expression 2>

a = 1
b = 2

1 if a > b else -1 
# Output is -1

1 if a > b else -1 if a < b else 0
# Output is -1

回答 4

文档中

条件表达式(有时称为“三元运算符”)在所有Python操作中具有最低的优先级。

表达式x if C else y首先计算条件Cnot x);如果C为true,则对x求值并返回其值;否则,将评估y并返回其值。

有关条件表达式的更多详细信息,请参见PEP 308

从2.5版开始新增。

From the documentation:

Conditional expressions (sometimes called a “ternary operator”) have the lowest priority of all Python operations.

The expression x if C else y first evaluates the condition, C (not x); if C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.

See PEP 308 for more details about conditional expressions.

New since version 2.5.


回答 5

作为Python增强建议308的一部分,2006年添加了Python条件表达式的运算符。它的形式与普通?:运算符不同,它是:

<expression1> if <condition> else <expression2>

等效于:

if <condition>: <expression1> else: <expression2>

这是一个例子:

result = x if a > b else y

可以使用的另一种语法(与2.5之前的版本兼容):

result = (lambda:y, lambda:x)[a > b]()

其中操作数是惰性求值的。

另一种方法是通过索引元组(与大多数其他语言的条件运算符不一致):

result = (y, x)[a > b]

或显式构造的字典:

result = {True: x, False: y}[a > b]

另一种(不太可靠)但更简单的方法是使用andor运算符:

result = (a > b) and x or y

但是,如果x为假值(例如False),这种写法就不起作用。

一种可能的解决方法是像下面这样把x和y做成列表或元组:

result = ((a > b) and [x] or [y])[0]

要么:

result = ((a > b) and (x,) or (y,))[0]

如果您使用的是字典,则可以不用三元条件,而利用get(key, default),例如:

shell = os.environ.get('SHELL', "/bin/sh")

资料来源:?:维基百科中的Python

An operator for a conditional expression in Python was added in 2006 as part of Python Enhancement Proposal 308. Its form differs from the common ?: operator, and it is:

<expression1> if <condition> else <expression2>

which is equivalent to:

if <condition>: <expression1> else: <expression2>

Here is an example:

result = x if a > b else y

Another syntax which can be used (compatible with versions before 2.5):

result = (lambda:y, lambda:x)[a > b]()

where operands are lazily evaluated.

Another way is by indexing a tuple (which isn’t consistent with the conditional operator of most other languages):

result = (y, x)[a > b]

or explicitly constructed dictionary:

result = {True: x, False: y}[a > b]

Another (less reliable), but simpler method is to use and and or operators:

result = (a > b) and x or y

however, this won’t work if x is falsy (e.g. False or 0).

A possible workaround is to make x and y lists or tuples as in the following:

result = ((a > b) and [x] or [y])[0]

or:

result = ((a > b) and (x,) or (y,))[0]

If you’re working with dictionaries, instead of using a ternary conditional, you can take advantage of get(key, default), for example:

shell = os.environ.get('SHELL', "/bin/sh")

Source: ?: in Python at Wikipedia


回答 6

不幸的是,

(falseValue, trueValue)[test]

这个方案没有短路行为;因此无论条件如何,falseValue和trueValue都会被求值。这可能不够理想,甚至可能出错(例如trueValue和falseValue都可能是带副作用的方法调用)。

一种解决方案是

(lambda: falseValue, lambda: trueValue)[test]()

(执行被推迟到胜出者确定之后;),但它在可调用对象和不可调用对象之间引入了不一致。此外,它也解决不了使用属性(property)的情形。

故事就是这样——在上述3个方案之间做选择,是在具备短路特性、至少要求Python 2.5(恕我直言,这已不再是问题)与不易出现“trueValue求值为假”错误之间进行权衡。

Unfortunately, the

(falseValue, trueValue)[test]

solution doesn’t have short-circuit behaviour; thus both falseValue and trueValue are evaluated regardless of the condition. This could be suboptimal or even buggy (e.g. both trueValue and falseValue could be method calls with side-effects).

One solution to this would be

(lambda: falseValue, lambda: trueValue)[test]()

(execution delayed until the winner is known ;)), but it introduces inconsistency between callable and non-callable objects. In addition, it doesn’t solve the case when using properties.

And so the story goes – choosing between the 3 mentioned solutions is a trade-off between having the short-circuit feature, requiring at least Python 2.5 (IMHO not a problem anymore) and not being prone to “trueValue-evaluates-to-false” errors.
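To make the trade-off concrete, a small sketch (function names invented) contrasting the eager tuple form with the lambda workaround:

calls = []

def false_value():
    calls.append('false_value')
    return 0

def true_value():
    calls.append('true_value')
    return 1

test = True
(false_value(), true_value())[test]                    # building the tuple runs both
print(calls)                                           # ['false_value', 'true_value']

calls.clear()
(lambda: false_value(), lambda: true_value())[test]()  # only the winner runs
print(calls)                                           # ['true_value']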


回答 7

不同编程语言的三元运算符

在这里,我只是试图展示几种编程语言在三元运算符上的一些重要区别。

Javascript中的三元运算符

var a = true ? 1 : 0;
# 1
var b = false ? 1 : 0;
# 0

Ruby中的三元运算符

a = true ? 1 : 0
# 1
b = false ? 1 : 0
# 0

Scala中的三元运算符

val a = true ? 1 | 0
# 1 (这种 ?| 写法需要Scalaz之类的库;Scala原生用if/else表达式:val a = if (true) 1 else 0)
val b = false ? 1 | 0
# 0

R编程中的三元运算符

a <- if (TRUE) 1 else 0
# 1
b <- if (FALSE) 1 else 0
# 0

Python中的三元运算符

a = 1 if True else 0
# 1
b = 1 if False else 0
# 0

Ternary Operator in different programming Languages

Here I just try to show some important differences in the ternary operator between a couple of programming languages.

Ternary Operator in Javascript

var a = true ? 1 : 0;
# 1
var b = false ? 1 : 0;
# 0

Ternary Operator in Ruby

a = true ? 1 : 0
# 1
b = false ? 1 : 0
# 0

Ternary operator in Scala

val a = true ? 1 | 0
# 1 (this ?| form needs a library such as Scalaz; natively, if/else is itself an expression: val a = if (true) 1 else 0)
val b = false ? 1 | 0
# 0

Ternary operator in R programming

a <- if (TRUE) 1 else 0
# 1
b <- if (FALSE) 1 else 0
# 0

Ternary operator in Python

a = 1 if True else 0
# 1
b = 1 if False else 0
# 0

回答 8

对于Python 2.5及更高版本,有一种特定的语法:

[on_true] if [cond] else [on_false]

在较旧的Python中,未实现三元运算符,但可以对其进行仿真。

cond and on_true or on_false

不过,这里有一个潜在的问题:如果cond求值为True而on_true求值为False,那么返回的将是on_false而不是on_true。如果您想要这种行为,该方法没问题;否则请使用以下方法:

{True: on_true, False: on_false}[cond is True] # is True, not == True

可以用以下方法包装:

def q(cond, on_true, on_false):
    return {True: on_true, False: on_false}[cond is True]

并以这种方式使用:

q(cond, on_true, on_false)

它与所有Python版本兼容。

For Python 2.5 and newer there is a specific syntax:

[on_true] if [cond] else [on_false]

In older Pythons a ternary operator is not implemented but it’s possible to simulate it.

cond and on_true or on_false

However, there is a potential problem: if cond evaluates to True and on_true evaluates to False, then on_false is returned instead of on_true. If you want this behavior the method is OK; otherwise, use this:

{True: on_true, False: on_false}[cond is True] # is True, not == True

which can be wrapped by:

def q(cond, on_true, on_false):
    return {True: on_true, False: on_false}[cond is True]

and used this way:

q(cond, on_true, on_false)

It is compatible with all Python versions.


回答 9

您可能经常会发现

cond and on_true or on_false

但这会在on_true == 0时导致问题

>>> x = 0
>>> print x == 0 and 0 or 1 
1
>>> x = 1
>>> print x == 0 and 0 or 1 
1

您期望普通三元运算符得到的结果

>>> x = 0
>>> print 0 if x == 0 else 1 
0
>>> x = 1
>>> print 0 if x == 0 else 1 
1

You might often find

cond and on_true or on_false

but this leads to problems when on_true == 0:

>>> x = 0
>>> print x == 0 and 0 or 1 
1
>>> x = 1
>>> print x == 0 and 0 or 1 
1

where you would expect for a normal ternary operator this result

>>> x = 0
>>> print 0 if x == 0 else 1 
0
>>> x = 1
>>> print 0 if x == 0 else 1 
1

回答 10

Python是否具有三元条件运算符?

是。从语法文件

test: or_test ['if' or_test 'else' test] | lambdef

感兴趣的部分是:

or_test ['if' or_test 'else' test]

因此,三元条件运算的形式为:

expression1 if expression2 else expression3

expression3将被惰性求值(即仅当expression2在布尔上下文中为假时才求值)。而且由于定义是递归的,您可以无限地把它们串联起来(尽管这可能被认为是不好的风格)。

expression1 if expression2 else expression3 if expression4 else expression5 # and so on

使用注意事项:

请注意,每个if之后都必须跟一个else。学习列表推导式和生成器表达式的人可能会发现这是很难掌握的一课——以下写法不起作用,因为Python期望有第三个表达式作为else分支:

[expression1 if expression2 for element in iterable]
#                          ^-- need an else here

这引发了一个SyntaxError: invalid syntax。因此,以上内容要么是一个不完整的逻辑(也许用户期望在错误条件下不进行操作),要么是打算将expression2用作过滤器-请注意,以下内容是合法的Python:

[expression1 for element in iterable if expression2]

expression2用作列表理解的过滤器,而不是三元条件运算符。

较窄情况的替代语法:

您可能会发现编写以下内容有些痛苦:

expression1 if expression1 else expression2

对于上述用法,expression1将被求值两次。如果它只是一个局部变量,这种冗余还算有限。但是,针对此用例,常见且高效的Python惯用法是利用or的短路行为:

expression1 or expression2

这在语义上是等效的。请注意,某些样式指南可能出于清楚的原因而限制了此用法-它确实将很多含义包含在很少的语法中。

Does Python have a ternary conditional operator?

Yes. From the grammar file:

test: or_test ['if' or_test 'else' test] | lambdef

The part of interest is:

or_test ['if' or_test 'else' test]

So, a ternary conditional operation is of the form:

expression1 if expression2 else expression3

expression3 will be lazily evaluated (that is, evaluated only if expression2 is false in a boolean context). And because of the recursive definition, you can chain them indefinitely (though it may be considered bad style.)

expression1 if expression2 else expression3 if expression4 else expression5 # and so on

A note on usage:

Note that every if must be followed with an else. People learning list comprehensions and generator expressions may find this to be a difficult lesson to learn – the following will not work, as Python expects a third expression for an else:

[expression1 if expression2 for element in iterable]
#                          ^-- need an else here

which raises a SyntaxError: invalid syntax. So the above is either an incomplete piece of logic (perhaps the user expects a no-op in the false condition) or what may be intended is to use expression2 as a filter – note that the following is legal Python:

[expression1 for element in iterable if expression2]

expression2 works as a filter for the list comprehension, and is not a ternary conditional operator.

Alternative syntax for a more narrow case:

You may find it somewhat painful to write the following:

expression1 if expression1 else expression2

With the above usage, expression1 has to be evaluated twice. The redundancy is limited if it is simply a local variable. However, a common and performant Pythonic idiom for this use-case is to use or’s short-circuiting behavior:

expression1 or expression2

which is equivalent in semantics. Note that some style-guides may limit this usage on the grounds of clarity – it does pack a lot of meaning into very little syntax.
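As a quick illustration of that recursive grammar, here is a common sign-classification chain (a toy example, not from the answer above):

def sign(n):
    # Parsed as: 'negative' if n < 0 else ('zero' if n == 0 else 'positive')
    return 'negative' if n < 0 else 'zero' if n == 0 else 'positive'

print([sign(n) for n in (-2, 0, 3)])  # ['negative', 'zero', 'positive']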


回答 11

模拟python三元运算符。

例如

a, b, x, y = 1, 2, 'a greater than b', 'b greater than a'
result = (lambda:y, lambda:x)[a > b]()

输出:

'b greater than a'

Simulating the python ternary operator.

For example

a, b, x, y = 1, 2, 'a greater than b', 'b greater than a'
result = (lambda:y, lambda:x)[a > b]()

output:

'b greater than a'

回答 12

三元条件运算符允许在单行中测试条件,以替代多行的if-else,使代码更紧凑。

句法 :

如果[表达式],则为[on_true],否则为[on_false]

1-使用三元运算符的简单方法:

# Program to demonstrate conditional operator
a, b = 10, 20
# Copy value of a in min if a < b else copy b
min = a if a < b else b
print(min)  # Output: 10

2-使用元组,字典和lambda的直接方法:

# Python program to demonstrate ternary operator
a, b = 10, 20
# Use tuple for selecting an item
print( (b, a) [a < b] )
# Use Dictionary for selecting an item
print({True: a, False: b} [a < b])
# lambda is preferable to the two methods above,
# because with lambda we are assured that
# only one expression will be evaluated, unlike with
# the tuple and the dictionary
print((lambda: b, lambda: a)[a < b]()) # in the output you should see 10 three times

3-三元运算符可以写为嵌套if-else:

# Python program to demonstrate nested ternary operator
a, b = 10, 20
print ("Both a and b are equal" if a == b else "a is greater than b"
        if a > b else "b is greater than a")

上面的方法可以写成:

# Python program to demonstrate nested ternary operator
a, b = 10, 20
if a != b:
    if a > b:
        print("a is greater than b")
    else:
        print("b is greater than a")
else:
    print("Both a and b are equal") 
# Output: b is greater than a

The ternary conditional operator allows testing a condition in a single line, replacing a multiline if-else and making the code compact.

Syntax :

[on_true] if [expression] else [on_false]

1- Simple Method to use ternary operator:

# Program to demonstrate conditional operator
a, b = 10, 20
# Copy value of a in min if a < b else copy b
min = a if a < b else b
print(min)  # Output: 10

2- Direct Method of using tuples, Dictionary, and lambda:

# Python program to demonstrate ternary operator
a, b = 10, 20
# Use tuple for selecting an item
print( (b, a) [a < b] )
# Use Dictionary for selecting an item
print({True: a, False: b} [a < b])
# lambda is preferable to the two methods above,
# because with lambda we are assured that
# only one expression will be evaluated, unlike with
# the tuple and the dictionary
print((lambda: b, lambda: a)[a < b]()) # in the output you should see 10 three times

3- Ternary operator can be written as nested if-else:

# Python program to demonstrate nested ternary operator
a, b = 10, 20
print ("Both a and b are equal" if a == b else "a is greater than b"
        if a > b else "b is greater than a")

Above approach can be written as:

# Python program to demonstrate nested ternary operator
a, b = 10, 20
if a != b:
    if a > b:
        print("a is greater than b")
    else:
        print("b is greater than a")
else:
    print("Both a and b are equal") 
# Output: b is greater than a

回答 13

你可以这样做:

[condition] and [expression_1] or [expression_2] ;

例如:

print(number%2 and "odd" or "even")

如果数字为奇数,则打印“odd”;如果为偶数,则打印“even”。


结果:如果条件为真,则求值exp_1;否则求值exp_2。

注意:0、None、False、空列表、空字符串都被视为False;其余一切都被视为True。

运作方式如下:

如果条件[condition]为True,则会求值expression_1而不是expression_2。如果我们用0(零)去“and”某个东西,结果永远是假值。因此,在下面的语句中,

0 and exp

表达式exp完全不会被求值,因为和0做“and”的结果永远是零,没有必要再求值。大多数语言的短路求值都是如此。

1 or exp

表达式exp根本不会被求值,因为和1做“or”的结果永远是1;既然结果反正是1,就没有必要计算exp(短路求值)。

但是在

True and exp1 or exp2

第二个表达式exp2不会被求值,因为当exp1不为假时,True and exp1的结果就是exp1(为真值),or随即短路。

同样在

False and exp1 or exp2

由于False相当于0,而和0做“and”的结果仍为0,因此表达式exp1不会被求值;但因为后面跟着“or”,所以会继续求值“or”之后的表达式exp2。


注意:只有当expression_1的真值不为False(即不是0、None、空列表[]或空字符串'')时,才能使用这种借助“or”和“and”的分支方式;因为如果expression_1为假,由于exp_1和exp_2之间的“or”,expression_2就会被求值。

如果您仍想要一种无论exp_1和exp_2的真值如何都不会落入expression_2的写法,可以这样做(但注意:当expression_1为假时,它返回的是1而不是expression_1):

[condition] and ([expression_1] or 1) or [expression_2] ;

You can do this:

[condition] and [expression_1] or [expression_2] ;

Example:

print(number%2 and "odd" or "even")

This would print “odd” if the number is odd or “even” if the number is even.


The result: if the condition is true, exp_1 is evaluated; otherwise exp_2 is evaluated.

Note: 0, None, False, an empty list and an empty string all evaluate as False; anything else evaluates as True.

Here’s how it works:

If the condition [condition] is “True”, then expression_1 will be evaluated, but not expression_2. If we “and” something with 0 (zero), the result will always be falsy. So in the statement below,

0 and exp

the expression exp won’t be evaluated at all, since “and” with 0 always evaluates to zero, so there is no need to evaluate it. Most languages short-circuit this way.

In

1 or exp

the expression exp won’t be evaluated at all, since “or” with 1 will always be 1; there is no point evaluating exp when the result will be 1 anyway (short-circuit evaluation).

But in case of

True and exp1 or exp2

The second expression exp2 won’t be evaluated, since True and exp1 yields exp1, which is truthy when exp1 isn’t false, so the or short-circuits.

Similarly in

False and exp1 or exp2

The expression exp1 won’t be evaluated, since False is equivalent to 0 and “and” with 0 yields 0 itself; but because “or” follows, the expression exp2 after the “or” will then be evaluated.


Note: this kind of branching using “or” and “and” can only be used when expression_1 doesn’t have a truth value of False (i.e. it isn’t 0, None, an empty list [] or an empty string ''), since if expression_1 is falsy, expression_2 will be evaluated because of the “or” between exp_1 and exp_2.

In case you still want something that never falls through to expression_2 regardless of the truth values of exp_1 and exp_2, you can do the following, though note it yields 1 (not expression_1) whenever expression_1 is falsy:

[condition] and ([expression_1] or 1) or [expression_2] ;
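A short sketch making the notes above concrete (values invented): the plain and/or form misfires for a falsy expression_1, and the or 1 variant substitutes 1 rather than the falsy value; the list-wrapping idiom from earlier answers is the fully reliable fix:

number = 3
print(number % 2 and "odd" or "even")     # odd  (3 % 2 == 1 is truthy)
number = 4
print(number % 2 and "odd" or "even")     # even (4 % 2 == 0 is falsy)

cond, exp_1, exp_2 = True, 0, 'fallback'
print(cond and exp_1 or exp_2)            # fallback -- wrong, we wanted 0
print(cond and (exp_1 or 1) or exp_2)     # 1        -- still not 0
print((cond and [exp_1] or [exp_2])[0])   # 0        -- correct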


回答 14

与其说是回答,不如说是提示(不需要把显而易见的内容重复第一百遍),但我有时会在这样的结构中把它用作单行捷径:

if conditionX:
    print('yes')
else:
    print('nah')

,变为:

print('yes') if conditionX else print('nah')

有些人(很多人:)可能会嫌它不够Pythonic(甚至有点Ruby味:),但我个人觉得它更自然——即更接近您平常的表达方式,而且在大段代码中更具视觉吸引力。

More a tip than an answer (no need to repeat the obvious for the hundredth time), but I sometimes use it as a one-liner shortcut in such constructs:

if conditionX:
    print('yes')
else:
    print('nah')

, becomes:

print('yes') if conditionX else print('nah')

Some (many :) may frown upon it as unpythonic (even ruby-ish :), but I personally find it more natural – i.e. how you’d express it normally – plus a bit more visually appealing in large blocks of code.


回答 15

a if condition else b

如果您难以记住,只需记住这座金字塔:

     condition
  if           else
a                   b 
a if condition else b

Just memorize this pyramid if you have trouble remembering:

     condition
  if           else
a                   b 

回答 16

Python 条件表达式的替代方法之一

"yes" if boolean else "no"

是以下内容:

{True:"yes", False:"no"}[boolean]

具有以下很好的扩展:

{True:"yes", False:"no", None:"maybe"}[boolean_or_none]

最短的选择仍然是:

("no", "yes")[boolean]

但是别无选择

yes() if boolean else no()

如果要避免对yes() 求值no(),因为

(no(), yes())[boolean]  # bad

no()yes()评估。

One of the alternatives to Python’s conditional expression

"yes" if boolean else "no"

is the following:

{True:"yes", False:"no"}[boolean]

which has the following nice extension:

{True:"yes", False:"no", None:"maybe"}[boolean_or_none]

The shortest alternative remains:

("no", "yes")[boolean]

but there is no alternative to

yes() if boolean else no()

if you want to avoid the evaluation of yes() and no(), because in

(no(), yes())[boolean]  # bad

both no() and yes() are evaluated.
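If you want to keep the dictionary dispatch but still avoid evaluating the losing branch, one option is to store the callables themselves and invoke only the chosen one; a minimal sketch with invented yes/no functions:

def yes():
    print('yes() ran')
    return 'yes'

def no():
    print('no() ran')
    return 'no'

boolean = False
result = {True: yes, False: no}[boolean]()  # only no() is actually called
print(result)  # no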


回答 17

许多派生自C的编程语言通常具有如下的三元条件运算符语法:

<condition> ? <expression1> : <expression2>

起初,Python的终身仁慈独裁者(Benevolent Dictator For Life,当然是指Guido van Rossum)拒绝了它(认为不符合Python风格),因为对不习惯C语言的人来说很难理解。另外,冒号:在Python中已经有很多用途。在PEP 308获批之后,Python终于有了自己的条件表达式简写(也就是我们现在用的):

<expression1> if <condition> else <expression2>

因此,它首先求值条件。如果条件为True,则求值expression1作为结果;否则求值expression2。得益于惰性求值机制,只会执行其中一个表达式。

以下是一些示例(条件将从左到右评估):

pressure = 10
print('High' if pressure < 20 else 'Critical')

# Result is 'High'

三元运算符可以串联在一起:

pressure = 5
print('Normal' if pressure < 10 else 'High' if pressure < 20 else 'Critical')

# Result is 'Normal'

下一个与上一个相同:

pressure = 5

if pressure < 20:
    if pressure < 10:
        print('Normal')
    else:
        print('High')
else:
    print('Critical')

# Result is 'Normal'

希望这可以帮助。

Many programming languages derived from C usually have the following syntax of ternary conditional operator:

<condition> ? <expression1> : <expression2>

At first, the Python Benevolent Dictator For Life (I mean Guido van Rossum, of course) rejected it (as non-Pythonic style), since it’s quite hard to understand for people not used to the C language. Also, the colon sign : already has many uses in Python. After PEP 308 was approved, Python finally received its own shortcut conditional expression (what we use now):

<expression1> if <condition> else <expression2>

So, firstly it evaluates the condition. If it returns True, expression1 will be evaluated to give the result, otherwise expression2 will be evaluated. Due to Lazy Evaluation mechanics – only one expression will be executed.

Here are some examples (conditions will be evaluated from left to right):

pressure = 10
print('High' if pressure < 20 else 'Critical')

# Result is 'High'

Ternary operators can be chained in series:

pressure = 5
print('Normal' if pressure < 10 else 'High' if pressure < 20 else 'Critical')

# Result is 'Normal'

The following one is the same as previous one:

pressure = 5

if pressure < 20:
    if pressure < 10:
        print('Normal')
    else:
        print('High')
else:
    print('Critical')

# Result is 'Normal'

Hope this helps.


回答 18

正如已经回答的那样,是的,在python中有一个三元运算符:

<expression 1> if <condition> else <expression 2>

附加信息:

如果<expression 1>同时也是条件,则可以使用短路求值:

a = True
b = False

# Instead of this:
x = a if a else b

# You could use short-circuit evaluation:
x = a or b

PS:当然,短路求值并不是三元运算符,但在短路就足够的场合,人们经常使用三元运算符。

As already answered, yes there is a ternary operator in python:

<expression 1> if <condition> else <expression 2>

Additional information:

If <expression 1> is also the condition, you can use short-circuit evaluation:

a = True
b = False

# Instead of this:
x = a if a else b

# You could use short-circuit evaluation:
x = a or b

PS: Of course, short-circuit evaluation is not a ternary operator, but the ternary is often used in cases where the short circuit would be enough.


回答 19

是的,Python有三元运算符,下面是语法和演示它的示例代码:)

#[On true] if [expression] else[On false]
# if the expression evaluates to true then it will pass On true otherwise On false


a= input("Enter the First Number ")
b= input("Enter the Second Number ")

print("A is Bigger") if a>b else print("B is Bigger")

YES, Python has a ternary operator; here is the syntax and some example code to demonstrate it :)

#[On true] if [expression] else[On false]
# if the expression evaluates to true then it will pass On true otherwise On false


a= input("Enter the First Number ")
b= input("Enter the Second Number ")

print("A is Bigger") if a>b else print("B is Bigger")

回答 20

Python具有用于赋值的三元形式;不过,还有一种人们应该了解的更短形式。

通常需要根据条件将一个值或另一个值赋给变量。

>>> li1 = None
>>> li2 = [1, 2, 3]
>>> 
>>> if li1:
...     a = li1
... else:
...     a = li2
...     
>>> a
[1, 2, 3]

^这是进行此类分配的长格式。

以下是三元形式。但这不是最简洁的方法-请参阅最后一个示例。

>>> a = li1 if li1 else li2
>>> 
>>> a
[1, 2, 3]
>>> 

使用Python,您可以简单地or用于其他分配。

>>> a = li1 or li2
>>> 
>>> a
[1, 2, 3]
>>> 

上面的代码之所以有效,是因为li1是None,而解释器在逻辑表达式中把它当作False。解释器随后继续求值第二个表达式,它既不是None也不是空列表——于是被赋值给a。

这也适用于空列表。例如,如果您想分配a包含项目的列表。

>>> li1 = []
>>> li2 = [1, 2, 3]
>>> 
>>> a = li1 or li2
>>> 
>>> a
[1, 2, 3]
>>> 

知道了这一点,遇到这类赋值时就可以直接照此简化。这同样适用于字符串和其他可迭代对象:您可以把不为空的那个字符串赋给a。

>>> s1 = ''
>>> s2 = 'hello world'
>>> 
>>> a = s1 or s2
>>> 
>>> a
'hello world'
>>> 

我一直很喜欢C的三元语法,但Python更进一步!

我知道有人可能会说这不是一个好的风格选择,因为它依赖的机制并非所有开发人员都能立刻理解。我个人不同意这种观点。Python是一种语法丰富的语言,有许多对初学者来说并不显而易见的惯用技巧。但您越是了解和理解底层机制,就越会欣赏它。

Python has a ternary form for assignments; however, there is an even shorter form that people should be aware of.

It’s very common to need to assign to a variable one value or another depending on a condition.

>>> li1 = None
>>> li2 = [1, 2, 3]
>>> 
>>> if li1:
...     a = li1
... else:
...     a = li2
...     
>>> a
[1, 2, 3]

^ This is the long form for doing such assignments.

Below is the ternary form. But this isn’t the most succinct way – see the last example.

>>> a = li1 if li1 else li2
>>> 
>>> a
[1, 2, 3]
>>> 

With Python, you can simply use or for alternative assignments.

>>> a = li1 or li2
>>> 
>>> a
[1, 2, 3]
>>> 

The above works since li1 is None and the interpreter treats that as False in logic expressions. The interpreter then moves on and evaluates the second expression, which is not None and not an empty list – so it gets assigned to a.

This also works with empty lists. For instance, if you want to assign a whichever list has items.

>>> li1 = []
>>> li2 = [1, 2, 3]
>>> 
>>> a = li1 or li2
>>> 
>>> a
[1, 2, 3]
>>> 

Knowing this, you can simplify such assignments whenever you encounter them. This also works with strings and other iterables: you could assign a whichever string isn’t empty.

>>> s1 = ''
>>> s2 = 'hello world'
>>> 
>>> a = s1 or s2
>>> 
>>> a
'hello world'
>>> 

I always liked the C ternary syntax, but Python takes it a step further!

I understand that some may say this isn’t a good stylistic choice because it relies on mechanics that aren’t immediately apparent to all developers. I personally disagree with that viewpoint. Python is a syntax-rich language with lots of idiomatic tricks that aren’t immediately apparent to the dabbler. But the more you learn and understand the mechanics of the underlying system, the more you appreciate it.


回答 21

其他答案已经正确讨论了Python的三元运算符。我想补充一个经常用到三元运算符、但其实有更好惯用法的场景:使用默认值。

假设我们要使用option_value未设置的默认值:

run_algorithm(option_value if option_value is not None else 10)

或简单地

run_algorithm(option_value if option_value else 10)

但是,一个更好的解决方案是简单地编写

run_algorithm(option_value or 10)

Other answers correctly talk about the Python ternary operator. I would like to complement by mentioning a scenario for which the ternary operator is often used but for which there is a better idiom. This is the scenario of using a default value.

Suppose we want to use option_value with a default value if it is not set:

run_algorithm(option_value if option_value is not None else 10)

or simply

run_algorithm(option_value if option_value else 10)

However, an even better solution is simply to write

run_algorithm(option_value or 10)
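One caveat worth a small sketch (values invented): or also replaces legitimate falsy values such as 0, which the is not None test preserves, so the shortest form is only safe when falsy values count as “unset”:

option_value = 0  # a deliberate, valid setting

print(option_value or 10)                                # 10 -- the valid 0 was discarded
print(option_value if option_value is not None else 10)  # 0  -- preserved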

回答 22

如果变量已定义,而您想检查它是否有值,可以直接使用a or b:

def test(myvar=None):
    # shorter than: print(myvar if myvar else "no Input")
    print(myvar or "no Input")

test()
test([])
test(False)
test('hello')
test(['Hello'])
test(True)

将输出

no Input
no Input
no Input
hello
['Hello']
True

If a variable is defined and you want to check whether it has a value, you can just use a or b:

def test(myvar=None):
    # shorter than: print(myvar if myvar else "no Input")
    print(myvar or "no Input")

test()
test([])
test(False)
test('hello')
test(['Hello'])
test(True)

will output

no Input
no Input
no Input
hello
['Hello']
True

回答 23

链接多个运算符的一种巧妙方法:

f = lambda x,y: 'greater' if x > y else 'less' if y > x else 'equal'

array = [(0,0),(0,1),(1,0),(1,1)]

for a in array:
  x, y = a[0], a[1]
  print(f(x,y))

# Output is:
#   equal,
#   less,
#   greater,
#   equal

A neat way to chain multiple operators:

f = lambda x,y: 'greater' if x > y else 'less' if y > x else 'equal'

array = [(0,0),(0,1),(1,0),(1,1)]

for a in array:
  x, y = a[0], a[1]
  print(f(x,y))

# Output is:
#   equal,
#   less,
#   greater,
#   equal


回答 24

我觉得默认的Python语法val = a if cond else b有些繁琐,所以有时我会这样做:

iif = lambda cond, a, b: a if cond else b
# so I can then use it like:
val = iif(cond, a, b)

当然,它的缺点是总会对双方(a和b)求值,但这种语法对我来说更清晰。

I find cumbersome the default python syntax val = a if cond else b, so sometimes I do this:

iif = lambda cond, a, b: a if cond else b
# so I can then use it like:
val = iif(cond, a, b)

Of course, it has the downside of always evaluating both sides (a and b), but the syntax is way clearer to me.


回答 25

is_special = True if gender == "Female" else (True if age >= 65 else False)

它可以根据需要嵌套。祝你好运

is_special = True if gender == "Female" else (True if age >= 65 else False)

It can be nested as you need. Best of luck!


Python中的元类是什么?

问题:Python中的元类是什么?

在Python中,什么是元类?我们将它们用于什么?

In Python, what are metaclasses and what do we use them for?


回答 0

元类是类的类。类定义类的实例(即对象)的行为,而元类定义类的行为。类是元类的实例。

虽然在Python中可以把任意可调用对象用作元类(如Jerub所演示的),但更好的做法是让元类本身就是一个真正的类。type是Python中最常见的元类:type本身是一个类,而且是它自己的类型。您无法纯用Python重新创建像type这样的东西,Python在这里稍微作了点弊。要在Python中创建自己的元类,实际上只需继承type。

元类最常用作类工厂。就像通过调用类来创建对象(实例)一样,Python通过调用元类来创建新类(在执行class语句时)。因此,结合普通的__init__和__new__方法,元类允许您在创建类时做“额外的事情”,例如把新类注册到某个注册表,或者干脆把它替换成别的东西。

执行class语句时,Python首先把class语句的主体作为普通代码块执行。生成的命名空间(一个dict)保存着未来类的属性。元类的确定方式是:查看待创建类的基类(元类会被继承)、待创建类的__metaclass__属性(如果有)或__metaclass__全局变量。然后用类的名称、基类和属性去调用该元类,将其实例化。

但是,元类实际上定义的是类的类型,而不仅仅是类的工厂,因此您可以用它们做更多事情。例如,您可以在元类上定义普通方法。这些元类方法有点像classmethod,因为它们可以在没有实例的情况下在类上调用;但又不像classmethod,因为它们不能在类的实例上调用。type.__subclasses__()就是type元类上方法的一个例子。您还可以定义普通的“魔术”方法,如__add__、__iter__和__getattr__,来实现或改变类的行为。

这是一些汇总示例:

def make_hook(f):
    """Decorator to turn 'foo' method into '__foo__'"""
    f.is_hook = 1
    return f

class MyType(type):
    def __new__(mcls, name, bases, attrs):

        if name.startswith('None'):
            return None

        # Go over attributes and see if they should be renamed.
        newattrs = {}
        for attrname, attrvalue in attrs.iteritems():
            if getattr(attrvalue, 'is_hook', 0):
                newattrs['__%s__' % attrname] = attrvalue
            else:
                newattrs[attrname] = attrvalue

        return super(MyType, mcls).__new__(mcls, name, bases, newattrs)

    def __init__(self, name, bases, attrs):
        super(MyType, self).__init__(name, bases, attrs)

        # classregistry.register(self, self.interfaces)
        print "Would register class %s now." % self

    def __add__(self, other):
        class AutoClass(self, other):
            pass
        return AutoClass
        # Alternatively, to autogenerate the classname as well as the class:
        # return type(self.__name__ + other.__name__, (self, other), {})

    def unregister(self):
        # classregistry.unregister(self)
        print "Would unregister class %s now." % self

class MyObject:
    __metaclass__ = MyType


class NoneSample(MyObject):
    pass

# Will print "NoneType None"
print type(NoneSample), repr(NoneSample)

class Example(MyObject):
    def __init__(self, value):
        self.value = value
    @make_hook
    def add(self, other):
        return self.__class__(self.value + other.value)

# Will unregister the class
Example.unregister()

inst = Example(10)
# Will fail with an AttributeError
#inst.unregister()

print inst + inst
class Sibling(MyObject):
    pass

ExampleSibling = Example + Sibling
# ExampleSibling is now a subclass of both Example and Sibling (with no
# content of its own) although it will believe it's called 'AutoClass'
print ExampleSibling
print ExampleSibling.__mro__

A metaclass is the class of a class. A class defines how an instance of the class (i.e. an object) behaves while a metaclass defines how a class behaves. A class is an instance of a metaclass.

While in Python you can use arbitrary callables for metaclasses (like Jerub shows), the better approach is to make it an actual class itself. type is the usual metaclass in Python. type is itself a class, and it is its own type. You won’t be able to recreate something like type purely in Python, but Python cheats a little. To create your own metaclass in Python you really just want to subclass type.

A metaclass is most commonly used as a class-factory. When you create an object by calling the class, Python creates a new class (when it executes the ‘class’ statement) by calling the metaclass. Combined with the normal __init__ and __new__ methods, metaclasses therefore allow you to do ‘extra things’ when creating a class, like registering the new class with some registry or replace the class with something else entirely.

When the class statement is executed, Python first executes the body of the class statement as a normal block of code. The resulting namespace (a dict) holds the attributes of the class-to-be. The metaclass is determined by looking at the baseclasses of the class-to-be (metaclasses are inherited), at the __metaclass__ attribute of the class-to-be (if any) or the __metaclass__ global variable. The metaclass is then called with the name, bases and attributes of the class to instantiate it.

However, metaclasses actually define the type of a class, not just a factory for it, so you can do much more with them. You can, for instance, define normal methods on the metaclass. These metaclass-methods are like classmethods in that they can be called on the class without an instance, but they are also not like classmethods in that they cannot be called on an instance of the class. type.__subclasses__() is an example of a method on the type metaclass. You can also define the normal ‘magic’ methods, like __add__, __iter__ and __getattr__, to implement or change how the class behaves.

Here’s an aggregated example of the bits and pieces:

def make_hook(f):
    """Decorator to turn 'foo' method into '__foo__'"""
    f.is_hook = 1
    return f

class MyType(type):
    def __new__(mcls, name, bases, attrs):

        if name.startswith('None'):
            return None

        # Go over attributes and see if they should be renamed.
        newattrs = {}
        for attrname, attrvalue in attrs.iteritems():
            if getattr(attrvalue, 'is_hook', 0):
                newattrs['__%s__' % attrname] = attrvalue
            else:
                newattrs[attrname] = attrvalue

        return super(MyType, mcls).__new__(mcls, name, bases, newattrs)

    def __init__(self, name, bases, attrs):
        super(MyType, self).__init__(name, bases, attrs)

        # classregistry.register(self, self.interfaces)
        print "Would register class %s now." % self

    def __add__(self, other):
        class AutoClass(self, other):
            pass
        return AutoClass
        # Alternatively, to autogenerate the classname as well as the class:
        # return type(self.__name__ + other.__name__, (self, other), {})

    def unregister(self):
        # classregistry.unregister(self)
        print "Would unregister class %s now." % self

class MyObject:
    __metaclass__ = MyType


class NoneSample(MyObject):
    pass

# Will print "NoneType None"
print type(NoneSample), repr(NoneSample)

class Example(MyObject):
    def __init__(self, value):
        self.value = value
    @make_hook
    def add(self, other):
        return self.__class__(self.value + other.value)

# Will unregister the class
Example.unregister()

inst = Example(10)
# Will fail with an AttributeError
#inst.unregister()

print inst + inst
class Sibling(MyObject):
    pass

ExampleSibling = Example + Sibling
# ExampleSibling is now a subclass of both Example and Sibling (with no
# content of its own) although it will believe it's called 'AutoClass'
print ExampleSibling
print ExampleSibling.__mro__
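The code above is Python 2 (the __metaclass__ attribute, print statements, iteritems). A minimal Python 3 sketch of the same hook-renaming idea, using the metaclass= keyword and nothing beyond the standard library, might look like this:

def make_hook(f):
    """Decorator to mark 'foo' for renaming to '__foo__'."""
    f.is_hook = True
    return f

class MyType(type):
    def __new__(mcls, name, bases, attrs):
        # Rename hook-marked attributes, e.g. 'add' -> '__add__'.
        newattrs = {
            ('__%s__' % k if getattr(v, 'is_hook', False) else k): v
            for k, v in attrs.items()
        }
        return super().__new__(mcls, name, bases, newattrs)

class Example(metaclass=MyType):  # Python 3 spelling of the metaclass hook
    def __init__(self, value):
        self.value = value

    @make_hook
    def add(self, other):
        return self.__class__(self.value + other.value)

print((Example(10) + Example(5)).value)  # 15 -- 'add' became '__add__'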

回答 1

类作为对象

在理解元类之前,您需要掌握Python的类。Python从Smalltalk语言中借用了一个非常特殊的类概念。

在大多数语言中,类只是描述如何产生对象的代码段。在Python中也是如此:

>>> class ObjectCreator(object):
...       pass
...

>>> my_object = ObjectCreator()
>>> print(my_object)
<__main__.ObjectCreator object at 0x8974f2c>

但是类比Python中更多。类也是对象。

是的,对象。

一旦使用关键字class,Python就会执行它并创建一个对象。指令

>>> class ObjectCreator(object):
...       pass
...

在内存中创建一个名称为“ ObjectCreator”的对象。

这个对象(类)本身具有创建对象(实例)的能力,这就是为什么它是一个类

但是,它仍然是一个对象,因此:

  • 您可以将其分配给变量
  • 你可以复制它
  • 您可以为其添加属性
  • 您可以将其作为函数参数传递

例如:

>>> print(ObjectCreator) # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
...       print(o)
...
>>> echo(ObjectCreator) # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> print(hasattr(ObjectCreator, 'new_attribute'))
False
>>> ObjectCreator.new_attribute = 'foo' # you can add attributes to a class
>>> print(hasattr(ObjectCreator, 'new_attribute'))
True
>>> print(ObjectCreator.new_attribute)
foo
>>> ObjectCreatorMirror = ObjectCreator # you can assign a class to a variable
>>> print(ObjectCreatorMirror.new_attribute)
foo
>>> print(ObjectCreatorMirror())
<__main__.ObjectCreator object at 0x8997b4c>

动态创建类

由于类是对象,因此您可以像创建任何对象一样即时创建它们。

首先,您可以使用class以下方法在函数中创建一个类:

>>> def choose_class(name):
...     if name == 'foo':
...         class Foo(object):
...             pass
...         return Foo # return the class, not an instance
...     else:
...         class Bar(object):
...             pass
...         return Bar
...
>>> MyClass = choose_class('foo')
>>> print(MyClass) # the function returns a class, not an instance
<class '__main__.Foo'>
>>> print(MyClass()) # you can create an object from this class
<__main__.Foo object at 0x89c6d4c>

但这并不是那么动态,因为您仍然必须自己编写整个类。

由于类是对象,因此它们必须由某种东西生成。

使用class关键字时,Python会自动创建此对象。但是,与Python中的大多数事情一样,它为您提供了一种手动进行操作的方法。

还记得功能type吗?好的旧函数可以让您知道对象的类型:

>>> print(type(1))
<type 'int'>
>>> print(type("1"))
<type 'str'>
>>> print(type(ObjectCreator))
<type 'type'>
>>> print(type(ObjectCreator()))
<class '__main__.ObjectCreator'>

嗯,type具有完全不同的功能,它也可以动态创建类。type可以将类的描述作为参数,并返回一个类。

(我知道,根据传递给它的参数,同一个函数可以有两种完全不同的用法是很愚蠢的。由于在Python中向后兼容,这是一个问题)

type 这样工作:

type(name, bases, attrs)

哪里:

  • name:类的名称
  • bases:父类的元组(对于继承,可以为空)
  • attrs:包含属性名称和值的字典

例如:

>>> class MyShinyClass(object):
...       pass

可以通过以下方式手动创建:

>>> MyShinyClass = type('MyShinyClass', (), {}) # returns a class object
>>> print(MyShinyClass)
<class '__main__.MyShinyClass'>
>>> print(MyShinyClass()) # create an instance with the class
<__main__.MyShinyClass object at 0x8997cec>

您会注意到,我们用“MyShinyClass”同时作为类名和保存类引用的变量名。它们可以不同,但没有理由把事情复杂化。

type接受字典来定义类的属性。所以:

>>> class Foo(object):
...       bar = True

可以翻译为:

>>> Foo = type('Foo', (), {'bar':True})

并用作普通类:

>>> print(Foo)
<class '__main__.Foo'>
>>> print(Foo.bar)
True
>>> f = Foo()
>>> print(f)
<__main__.Foo object at 0x8a9b84c>
>>> print(f.bar)
True

当然,您可以从中继承,因此:

>>>   class FooChild(Foo):
...         pass

将会:

>>> FooChild = type('FooChild', (Foo,), {})
>>> print(FooChild)
<class '__main__.FooChild'>
>>> print(FooChild.bar) # bar is inherited from Foo
True

最终,您需要向类中添加方法。只需定义具有适当签名的函数并将其分配为属性即可。

>>> def echo_bar(self):
...       print(self.bar)
...
>>> FooChild = type('FooChild', (Foo,), {'echo_bar': echo_bar})
>>> hasattr(Foo, 'echo_bar')
False
>>> hasattr(FooChild, 'echo_bar')
True
>>> my_foo = FooChild()
>>> my_foo.echo_bar()
True

在动态创建类之后,您可以添加更多方法,就像将方法添加到正常创建的类对象中一样。

>>> def echo_bar_more(self):
...       print('yet another method')
...
>>> FooChild.echo_bar_more = echo_bar_more
>>> hasattr(FooChild, 'echo_bar_more')
True

您会看到我们要去的方向:在Python中,类是对象,您可以动态动态地创建一个类。

这是Python在使用关键字class时所做的,并且是通过使用元类来完成的。

什么是元类(最终)

元类是创建类的“东西”。

您定义类是为了创建对象,对吗?

但是我们了解到Python类是对象。

好吧,元类是创建这些对象的原因。它们是类的类,您可以通过以下方式描绘它们:

MyClass = MetaClass()
my_object = MyClass()

您已经看到,type您可以执行以下操作:

MyClass = type('MyClass', (), {})

这是因为该函数type实际上是一个元类。type是Python用于在幕后创建所有类的元类。

现在您可能想知道,为什么它写成小写的type而不是Type?

好吧,我想这是为了与str(创建字符串对象的类)和int(创建整数对象的类)保持一致。type只是创建类对象的类。

您可以通过检查__class__属性来看到。

一切,我的意思是,一切都是Python中的对象。其中包括整数,字符串,函数和类。它们都是对象。所有这些都是从一个类创建的:

>>> age = 35
>>> age.__class__
<type 'int'>
>>> name = 'bob'
>>> name.__class__
<type 'str'>
>>> def foo(): pass
>>> foo.__class__
<type 'function'>
>>> class Bar(object): pass
>>> b = Bar()
>>> b.__class__
<class '__main__.Bar'>

那么,任何__class__的__class__又是什么呢?

>>> age.__class__.__class__
<type 'type'>
>>> name.__class__.__class__
<type 'type'>
>>> foo.__class__.__class__
<type 'type'>
>>> b.__class__.__class__
<type 'type'>

因此,元类只是创建类对象的东西。

如果愿意,可以把它称为“类工厂”。

type 是Python使用的内置元类,但是您当然可以创建自己的元类。

__metaclass__属性

在Python 2中,您可以__metaclass__在编写类时添加属性(有关Python 3语法,请参见下一部分):

class Foo(object):
    __metaclass__ = something...
    [...]

如果这样做,Python将使用元类来创建class Foo

小心点,这很棘手。

您先写下class Foo(object),但此时类对象Foo还没有在内存中创建。

Python会在类定义中查找__metaclass__。如果找到了,就用它来创建类对象Foo;如果没有找到,就用type来创建这个类。

读几次。

当您这样做时:

class Foo(Bar):
    pass

Python执行以下操作:

Foo中有__metaclass__属性吗?

如果有,Python会使用__metaclass__中的内容,在内存中创建一个名为Foo的类对象(我说的是类对象,请跟紧我的思路)。

如果Python找不到__metaclass__,它会在模块(MODULE)级别查找__metaclass__,并尝试做同样的事情(但这仅适用于不继承任何东西的类,基本上就是旧式类)。

然后,如果还是找不到任何__metaclass__,它将使用Bar(第一个父类)自己的元类(可能是默认的type)来创建类对象。

请注意,__metaclass__属性本身不会被继承,被继承的是父类的元类(Bar.__class__)。如果Bar的__metaclass__属性是用type()(而不是type.__new__())创建Bar的,子类将不会继承该行为。

现在最大的问题是,您可以输入__metaclass__什么?

答案是:可以创建类的东西。

什么可以创建一个类?type,或任何继承或使用它的内容。

Python 3中的元类

设置元类的语法在Python 3中已更改:

class Foo(object, metaclass=something):
    ...

__metaclass__不再使用该属性,而在基类列表中使用关键字参数。

但是,元类的行为基本保持不变

在python 3中添加到元类的一件事是,您还可以将属性作为关键字参数传递给元类,如下所示:

class Foo(object, metaclass=something, kwarg1=value1, kwarg2=value2):
    ...

阅读以下部分,了解python如何处理此问题。

自定义元类

元类的主要目的是在创建类时自动更改它。

通常,您是在编写API时这样做:您希望创建与当前上下文相匹配的类。

想象一个愚蠢的示例:您决定模块中所有类的属性都应以大写形式书写。有多种方法可以做到,其中一种是在模块级别设置__metaclass__。

这样,将使用此元类创建该模块的所有类,而我们只需要告诉元类将所有属性都转换为大写即可。

幸运的是,__metaclass__实际上可以是任何可调用的,它不需要是正式的类(我知道,名称中带有“ class”的东西不必是类,请弄清楚……但这很有用)。

因此,我们将从使用函数的简单示例开始。

# the metaclass will automatically get passed the same argument
# that you usually pass to `type`
def upper_attr(future_class_name, future_class_parents, future_class_attrs):
    """
      Return a class object, with the list of its attribute turned
      into uppercase.
    """
    # pick up any attribute that doesn't start with '__' and uppercase it
    uppercase_attrs = {
        attr if attr.startswith("__") else attr.upper(): v
        for attr, v in future_class_attrs.items()
    }

    # let `type` do the class creation
    return type(future_class_name, future_class_parents, uppercase_attrs)

__metaclass__ = upper_attr # this will affect all classes in the module

class Foo(): # global __metaclass__ won't work with "object" though
    # but we can define __metaclass__ here instead to affect only this class
    # and this will work with "object" children
    bar = 'bip'

让我们检查:

>>> hasattr(Foo, 'bar')
False
>>> hasattr(Foo, 'BAR')
True
>>> Foo.BAR
'bip'

现在,让我们做完全一样的操作,但是对元类使用真实的类:

# remember that `type` is actually a class like `str` and `int`
# so you can inherit from it
class UpperAttrMetaclass(type):
    # __new__ is the method called before __init__
    # it's the method that creates the object and returns it
    # while __init__ just initializes the object passed as parameter
    # you rarely use __new__, except when you want to control how the object
    # is created.
    # here the created object is the class, and we want to customize it
    # so we override __new__
    # you can do some stuff in __init__ too if you wish
    # some advanced use involves overriding __call__ as well, but we won't
    # see this
    def __new__(upperattr_metaclass, future_class_name,
                future_class_parents, future_class_attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in future_class_attrs.items()
        }
        return type(future_class_name, future_class_parents, uppercase_attrs)

让我们重写上面的内容,但是现在有了更短,更实际的变量名,我们知道它们的含义了:

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type(clsname, bases, uppercase_attrs)

您可能已经注意到多出来的参数cls。它没有什么特别的:__new__总是把它所定义于的类作为第一个参数接收。就像普通方法用self接收实例作为第一个参数、类方法接收定义它的类一样。

但这还不是规范的OOP。我们在直接调用type,而没有覆盖或调用父类的__new__。让我们改成这样:

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type.__new__(cls, clsname, bases, uppercase_attrs)

使用super可以让它更整洁,也会简化继承(因为,是的,您可以有元类,继承自元类,再继承自type):

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return super(UpperAttrMetaclass, cls).__new__(
            cls, clsname, bases, uppercase_attrs)

哦,在python 3中,如果您使用关键字参数进行此调用,例如:

class Foo(object, metaclass=MyMetaclass, kwarg1=value1):
    ...

在元类中,它会被转换成这样来使用:

class MyMetaclass(type):
    def __new__(cls, clsname, bases, dct, kwarg1=default):
        ...

就是这样。关于元类,真的没有更多内容了。

使用元类的代码之所以复杂,原因并不在元类本身,而在于您通常会用元类去做一些“扭曲”的事情:依赖自省、操纵继承、修改__dict__之类的变量等等。

实际上,元类对于进行黑魔法特别有用,因此也很复杂。但就其本身而言,它们很简单:

  • 拦截类创建
  • 修改该类
  • 返回修改后的类

为什么要使用元类类而不是函数?

既然__metaclass__可以接受任何可调用对象,那么为什么要使用一个类,因为它显然更复杂?

这样做有几个原因:

  • 意图很明确。读到UpperAttrMetaclass(type)时,您就知道接下来会发生什么
  • 您可以使用OOP。元类可以继承元类、重写父类方法,甚至可以使用元类。
  • 如果您指定的是元类类(而不是元类函数),该类的子类也将是其元类的实例。
  • 您可以更好地组织代码。绝不会把元类用在像上面示例那样琐碎的事情上,它通常用于复杂的场景。能把多个方法组织在一个类里,对提高代码可读性非常有帮助。
  • 您可以挂接__new__、__init__和__call__,从而做不同的事情。即使通常全部都能在__new__中完成,有些人还是更习惯用__init__。
  • 这些东西叫“元类”,该死!它一定意味着什么!

为什么要使用元类?

现在是个大问题。为什么要使用一些晦涩的易错功能?

好吧,通常您不会:

元类是更深层的魔术,99%的用户永远不必担心。如果您想知道是否需要它们,则不需要(实际上需要它们的人肯定会知道他们需要它们,并且不需要解释原因)。

Python大师Tim Peters

元类的主要用例是创建一个API。一个典型的例子是Django ORM。它允许您定义如下内容:

class Person(models.Model):
    name = models.CharField(max_length=30)
    age = models.IntegerField()

但是,如果您这样做:

person = Person(name='bob', age='35')
print(person.age)

它不会返回IntegerField对象。它将返回一个int,甚至可以直接从数据库中获取它。

这之所以可能,是因为models.Model定义了__metaclass__,并使用了一些魔法,把您用简单语句定义的Person变成与数据库字段复杂挂钩的对象。

Django通过公开一个简单的API并使用元类,从该API重新创建代码来完成幕后的实际工作,使看起来复杂的事情变得简单。

最后

首先,您知道类是可以创建实例的对象。

实际上,类本身就是实例。元类的。

>>> class Foo(object): pass
>>> id(Foo)
142630324

一切都是Python中的对象,它们都是类的实例或元类的实例。

除了type

type实际上是它自己的元类。这不是您可以在纯Python中复制的东西,而是通过在实现级别上作弊来完成的。

其次,元类很复杂。您可能不希望将它们用于非常简单的类修改。修改类可以使用两种不同的技术:

  • 猴子补丁(monkey patching)
  • 类装饰器(class decorators)

在99%需要修改类的场合,使用这两种技术更好。

但在98%的场合,您根本不需要修改类。

Classes as objects

Before understanding metaclasses, you need to master classes in Python. And Python has a very peculiar idea of what classes are, borrowed from the Smalltalk language.

In most languages, classes are just pieces of code that describe how to produce an object. That’s kinda true in Python too:

>>> class ObjectCreator(object):
...       pass
...

>>> my_object = ObjectCreator()
>>> print(my_object)
<__main__.ObjectCreator object at 0x8974f2c>

But classes are more than that in Python. Classes are objects too.

Yes, objects.

As soon as you use the keyword class, Python executes it and creates an OBJECT. The instruction

>>> class ObjectCreator(object):
...       pass
...

creates in memory an object with the name “ObjectCreator”.

This object (the class) is itself capable of creating objects (the instances), and this is why it’s a class.

But still, it’s an object, and therefore:

  • you can assign it to a variable
  • you can copy it
  • you can add attributes to it
  • you can pass it as a function parameter

e.g.:

>>> print(ObjectCreator) # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
...       print(o)
...
>>> echo(ObjectCreator) # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> print(hasattr(ObjectCreator, 'new_attribute'))
False
>>> ObjectCreator.new_attribute = 'foo' # you can add attributes to a class
>>> print(hasattr(ObjectCreator, 'new_attribute'))
True
>>> print(ObjectCreator.new_attribute)
foo
>>> ObjectCreatorMirror = ObjectCreator # you can assign a class to a variable
>>> print(ObjectCreatorMirror.new_attribute)
foo
>>> print(ObjectCreatorMirror())
<__main__.ObjectCreator object at 0x8997b4c>

Creating classes dynamically

Since classes are objects, you can create them on the fly, like any object.

First, you can create a class in a function using class:

>>> def choose_class(name):
...     if name == 'foo':
...         class Foo(object):
...             pass
...         return Foo # return the class, not an instance
...     else:
...         class Bar(object):
...             pass
...         return Bar
...
>>> MyClass = choose_class('foo')
>>> print(MyClass) # the function returns a class, not an instance
<class '__main__.Foo'>
>>> print(MyClass()) # you can create an object from this class
<__main__.Foo object at 0x89c6d4c>

But it’s not so dynamic, since you still have to write the whole class yourself.

Since classes are objects, they must be generated by something.

When you use the class keyword, Python creates this object automatically. But as with most things in Python, it gives you a way to do it manually.

Remember the function type? The good old function that lets you know what type an object is:

>>> print(type(1))
<type 'int'>
>>> print(type("1"))
<type 'str'>
>>> print(type(ObjectCreator))
<type 'type'>
>>> print(type(ObjectCreator()))
<class '__main__.ObjectCreator'>

Well, type has a completely different ability, it can also create classes on the fly. type can take the description of a class as parameters, and return a class.

(I know, it’s silly that the same function can have two completely different uses according to the parameters you pass to it. It’s an issue due to backwards compatibility in Python)

type works this way:

type(name, bases, attrs)

Where:

  • name: name of the class
  • bases: tuple of the parent class (for inheritance, can be empty)
  • attrs: dictionary containing attributes names and values

e.g.:

>>> class MyShinyClass(object):
...       pass

can be created manually this way:

>>> MyShinyClass = type('MyShinyClass', (), {}) # returns a class object
>>> print(MyShinyClass)
<class '__main__.MyShinyClass'>
>>> print(MyShinyClass()) # create an instance with the class
<__main__.MyShinyClass object at 0x8997cec>

You’ll notice that we use “MyShinyClass” as the name of the class and as the variable to hold the class reference. They can be different, but there is no reason to complicate things.

type accepts a dictionary to define the attributes of the class. So:

>>> class Foo(object):
...       bar = True

Can be translated to:

>>> Foo = type('Foo', (), {'bar':True})

And used as a normal class:

>>> print(Foo)
<class '__main__.Foo'>
>>> print(Foo.bar)
True
>>> f = Foo()
>>> print(f)
<__main__.Foo object at 0x8a9b84c>
>>> print(f.bar)
True

And of course, you can inherit from it, so:

>>>   class FooChild(Foo):
...         pass

would be:

>>> FooChild = type('FooChild', (Foo,), {})
>>> print(FooChild)
<class '__main__.FooChild'>
>>> print(FooChild.bar) # bar is inherited from Foo
True

Eventually you’ll want to add methods to your class. Just define a function with the proper signature and assign it as an attribute.

>>> def echo_bar(self):
...       print(self.bar)
...
>>> FooChild = type('FooChild', (Foo,), {'echo_bar': echo_bar})
>>> hasattr(Foo, 'echo_bar')
False
>>> hasattr(FooChild, 'echo_bar')
True
>>> my_foo = FooChild()
>>> my_foo.echo_bar()
True

And you can add even more methods after you dynamically create the class, just like adding methods to a normally created class object.

>>> def echo_bar_more(self):
...       print('yet another method')
...
>>> FooChild.echo_bar_more = echo_bar_more
>>> hasattr(FooChild, 'echo_bar_more')
True

You see where we are going: in Python, classes are objects, and you can create a class on the fly, dynamically.

This is what Python does when you use the keyword class, and it does so by using a metaclass.

What are metaclasses (finally)

Metaclasses are the ‘stuff’ that creates classes.

You define classes in order to create objects, right?

But we learned that Python classes are objects.

Well, metaclasses are what create these objects. They are the classes’ classes, you can picture them this way:

MyClass = MetaClass()
my_object = MyClass()

You’ve seen that type lets you do something like this:

MyClass = type('MyClass', (), {})

It’s because the function type is in fact a metaclass. type is the metaclass Python uses to create all classes behind the scenes.

Now you wonder why the heck is it written in lowercase, and not Type?

Well, I guess it’s a matter of consistency with str, the class that creates strings objects, and int the class that creates integer objects. type is just the class that creates class objects.

You see that by checking the __class__ attribute.

Everything, and I mean everything, is an object in Python. That includes ints, strings, functions and classes. All of them are objects. And all of them have been created from a class:

>>> age = 35
>>> age.__class__
<type 'int'>
>>> name = 'bob'
>>> name.__class__
<type 'str'>
>>> def foo(): pass
>>> foo.__class__
<type 'function'>
>>> class Bar(object): pass
>>> b = Bar()
>>> b.__class__
<class '__main__.Bar'>

Now, what is the __class__ of any __class__ ?

>>> age.__class__.__class__
<type 'type'>
>>> name.__class__.__class__
<type 'type'>
>>> foo.__class__.__class__
<type 'type'>
>>> b.__class__.__class__
<type 'type'>

So, a metaclass is just the stuff that creates class objects.

You can call it a ‘class factory’ if you wish.

type is the built-in metaclass Python uses, but of course, you can create your own metaclass.

The __metaclass__ attribute

In Python 2, you can add a __metaclass__ attribute when you write a class (see next section for the Python 3 syntax):

class Foo(object):
    __metaclass__ = something...
    [...]

If you do so, Python will use the metaclass to create the class Foo.

Careful, it’s tricky.

You write class Foo(object) first, but the class object Foo is not created in memory yet.

Python will look for __metaclass__ in the class definition. If it finds it, it will use it to create the object class Foo. If it doesn’t, it will use type to create the class.

Read that several times.

When you do:

class Foo(Bar):
    pass

Python does the following:

Is there a __metaclass__ attribute in Foo?

If yes, create in memory a class object (I said a class object, stay with me here), with the name Foo by using what is in __metaclass__.

If Python can’t find __metaclass__, it will look for a __metaclass__ at the MODULE level, and try to do the same (but only for classes that don’t inherit anything, basically old-style classes).

Then if it can’t find any __metaclass__ at all, it will use the Bar‘s (the first parent) own metaclass (which might be the default type) to create the class object.

Be careful here that the __metaclass__ attribute will not be inherited, the metaclass of the parent (Bar.__class__) will be. If Bar used a __metaclass__ attribute that created Bar with type() (and not type.__new__()), the subclasses will not inherit that behavior.

Now the big question is, what can you put in __metaclass__ ?

The answer is: something that can create a class.

And what can create a class? type, or anything that subclasses or uses it.

Metaclasses in Python 3

The syntax to set the metaclass has been changed in Python 3:

class Foo(object, metaclass=something):
    ...

i.e. the __metaclass__ attribute is no longer used, in favor of a keyword argument in the list of base classes.

The behaviour of metaclasses however stays largely the same.

One thing added to metaclasses in python 3 is that you can also pass attributes as keyword-arguments into a metaclass, like so:

class Foo(object, metaclass=something, kwarg1=value1, kwarg2=value2):
    ...

Read the section below for how python handles this.

Custom metaclasses

The main purpose of a metaclass is to change the class automatically, when it’s created.

You usually do this for APIs, where you want to create classes matching the current context.

Imagine a stupid example, where you decide that all classes in your module should have their attributes written in uppercase. There are several ways to do this, but one way is to set __metaclass__ at the module level.

This way, all classes of this module will be created using this metaclass, and we just have to tell the metaclass to turn all attributes to uppercase.

Luckily, __metaclass__ can actually be any callable, it doesn’t need to be a formal class (I know, something with ‘class’ in its name doesn’t need to be a class, go figure… but it’s helpful).

So we will start with a simple example, by using a function.

# the metaclass will automatically get passed the same argument
# that you usually pass to `type`
def upper_attr(future_class_name, future_class_parents, future_class_attrs):
    """
      Return a class object, with the list of its attribute turned
      into uppercase.
    """
    # pick up any attribute that doesn't start with '__' and uppercase it
    uppercase_attrs = {
        attr if attr.startswith("__") else attr.upper(): v
        for attr, v in future_class_attrs.items()
    }

    # let `type` do the class creation
    return type(future_class_name, future_class_parents, uppercase_attrs)

__metaclass__ = upper_attr # this will affect all classes in the module

class Foo(): # global __metaclass__ won't work with "object" though
    # but we can define __metaclass__ here instead to affect only this class
    # and this will work with "object" children
    bar = 'bip'

Let’s check:

>>> hasattr(Foo, 'bar')
False
>>> hasattr(Foo, 'BAR')
True
>>> Foo.BAR
'bip'

Now, let’s do exactly the same, but using a real class for a metaclass:

# remember that `type` is actually a class like `str` and `int`
# so you can inherit from it
class UpperAttrMetaclass(type):
    # __new__ is the method called before __init__
    # it's the method that creates the object and returns it
    # while __init__ just initializes the object passed as parameter
    # you rarely use __new__, except when you want to control how the object
    # is created.
    # here the created object is the class, and we want to customize it
    # so we override __new__
    # you can do some stuff in __init__ too if you wish
    # some advanced use involves overriding __call__ as well, but we won't
    # see this
    def __new__(upperattr_metaclass, future_class_name,
                future_class_parents, future_class_attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in future_class_attrs.items()
        }
        return type(future_class_name, future_class_parents, uppercase_attrs)

Let’s rewrite the above, but with shorter and more realistic variable names now that we know what they mean:

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type(clsname, bases, uppercase_attrs)

You may have noticed the extra argument cls. There is nothing special about it: __new__ always receives the class it’s defined in, as first parameter. Just like you have self for ordinary methods which receive the instance as first parameter, or the defining class for class methods.

But this is not proper OOP. We are calling type directly and we aren’t overriding or calling the parent’s __new__. Let’s do that instead:

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type.__new__(cls, clsname, bases, uppercase_attrs)

We can make it even cleaner by using super, which will ease inheritance (because yes, you can have metaclasses, inheriting from metaclasses, inheriting from type):

class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return super(UpperAttrMetaclass, cls).__new__(
            cls, clsname, bases, uppercase_attrs)

Oh, and in python 3 if you do this call with keyword arguments, like this:

class Foo(object, metaclass=MyMetaclass, kwarg1=value1):
    ...

It translates to this in the metaclass to use it:

class MyMetaclass(type):
    def __new__(cls, clsname, bases, dct, kwarg1=default):
        ...

That’s it. There is really nothing more about metaclasses.

The reason behind the complexity of the code using metaclasses is not because of metaclasses, it’s because you usually use metaclasses to do twisted stuff relying on introspection, manipulating inheritance, vars such as __dict__, etc.

Indeed, metaclasses are especially useful to do black magic, and therefore complicated stuff. But by themselves, they are simple:

  • intercept a class creation
  • modify the class
  • return the modified class

Why would you use metaclass classes instead of functions?

Since __metaclass__ can accept any callable, why would you use a class, which is obviously more complicated?

There are several reasons to do so:

  • The intention is clear. When you read UpperAttrMetaclass(type), you know what’s going to follow
  • You can use OOP. A metaclass can inherit from a metaclass and override parent methods. Metaclasses can even use metaclasses.
  • Subclasses of a class will be instances of its metaclass if you specified a metaclass-class, but not with a metaclass-function.
  • You can structure your code better. You never use metaclasses for something as trivial as the above example. It’s usually for something complicated. Having the ability to make several methods and group them in one class is very useful to make the code easier to read.
  • You can hook on __new__, __init__ and __call__, which will allow you to do different stuff, even if usually you can do it all in __new__; some people are just more comfortable using __init__.
  • These are called metaclasses, damn it! It must mean something!

Why would you use metaclasses?

Now the big question: why would you use some obscure, error-prone feature?

Well, usually you don’t:

Metaclasses are deeper magic that 99% of users should never worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).

Python Guru Tim Peters

The main use case for a metaclass is creating an API. A typical example of this is the Django ORM. It allows you to define something like this:

class Person(models.Model):
    name = models.CharField(max_length=30)
    age = models.IntegerField()

But if you do this:

person = Person(name='bob', age='35')
print(person.age)

It won’t return an IntegerField object. It will return an int, and can even take it directly from the database.

This is possible because models.Model defines __metaclass__ and it uses some magic that will turn the Person you just defined with simple statements into a complex hook to a database field.

Django makes something complex look simple by exposing a simple API and using metaclasses, recreating code from this API to do the real job behind the scenes.
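
To make the idea tangible, here is a toy sketch (my simplification, not Django’s actual code) of how such a metaclass can swap field declarations for plain converted values:

class Field:
    def __init__(self, to_python):
        self.to_python = to_python   # converter, e.g. int or str

class ModelMeta(type):
    def __new__(cls, name, bases, attrs):
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        for k in fields:
            attrs.pop(k)             # strip the Field objects from the class body
        klass = super().__new__(cls, name, bases, attrs)
        klass._fields = fields
        return klass

class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            # convert raw input using the declared field, e.g. '35' -> 35
            setattr(self, k, self._fields[k].to_python(v))

class Person(Model):
    name = Field(str)
    age = Field(int)

person = Person(name='bob', age='35')
print(person.age)   # 35 -- an int, not a Field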

The last word

First, you know that classes are objects that can create instances.

Well in fact, classes are themselves instances. Of metaclasses.

>>> class Foo(object): pass
>>> id(Foo)
142630324

Everything is an object in Python, and they are all either instances of classes or instances of metaclasses.

Except for type.

type is actually its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.

Secondly, metaclasses are complicated. You may not want to use them for very simple class alterations. You can change classes by using two different techniques:

  • monkey patching
  • class decorators

99% of the time you need class alteration, you are better off using these.

But 98% of the time, you don’t need class alteration at all.
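
For illustration (my addition, not part of the original answer), here is the class-decorator technique achieving the same uppercase transformation as the earlier metaclass:

def upper_attrs(cls):
    # uppercase every non-dunder attribute, in place
    for name, value in list(vars(cls).items()):
        if not name.startswith("__"):
            delattr(cls, name)
            setattr(cls, name.upper(), value)
    return cls

@upper_attrs
class Foo(object):
    bar = 'bip'

print(hasattr(Foo, 'BAR'))  # True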


Answer 2

Note, this answer is for Python 2.x as it was written in 2008, metaclasses are slightly different in 3.x.

Metaclasses are the secret sauce that make ‘class’ work. The default metaclass for a new style object is called ‘type’.

class type(object)
  |  type(object) -> the object's type
  |  type(name, bases, dict) -> a new type

Metaclasses take 3 args: ‘name‘, ‘bases‘ and ‘dict‘.

Here is where the secret starts. Look for where name, bases and the dict come from in this example class definition.

class ThisIsTheName(Bases, Are, Here):
    All_the_code_here
    def doesIs(create, a):
        dict

Let’s define a metaclass that will demonstrate how the class statement calls it.

def test_metaclass(name, bases, dict):
    print 'The Class Name is', name
    print 'The Class Bases are', bases
    print 'The dict has', len(dict), 'elems, the keys are', dict.keys()

    return "yellow"

class TestName(object, None, int, 1):
    __metaclass__ = test_metaclass
    foo = 1
    def baz(self, arr):
        pass

print 'TestName = ', repr(TestName)

# output => 
The Class Name is TestName
The Class Bases are (<type 'object'>, None, <type 'int'>, 1)
The dict has 4 elems, the keys are ['baz', '__module__', 'foo', '__metaclass__']
TestName =  'yellow'

And now, an example that actually means something: this will automatically set the variables listed in “attributes” on the class, initialized to None.

def init_attributes(name, bases, dict):
    if 'attributes' in dict:
        for attr in dict['attributes']:
            dict[attr] = None

    return type(name, bases, dict)

class Initialised(object):
    __metaclass__ = init_attributes
    attributes = ['foo', 'bar', 'baz']

print 'foo =>', Initialised.foo
# output=>
foo => None

Note that the magic behaviour that Initialised gains by having the metaclass init_attributes is not passed onto a subclass of Initialised.

Here is an even more concrete example, showing how you can subclass ‘type’ to make a metaclass that performs an action when the class is created. This is quite tricky:

class MetaSingleton(type):
    instance = None
    def __call__(cls, *args, **kw):
        if cls.instance is None:
            cls.instance = super(MetaSingleton, cls).__call__(*args, **kw)
        return cls.instance

class Foo(object):
    __metaclass__ = MetaSingleton

a = Foo()
b = Foo()
assert a is b
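
Since this answer targets Python 2.x, here is the same singleton in Python 3 syntax (my restatement; the metaclass is passed as a keyword argument instead of __metaclass__):

class MetaSingleton(type):
    instance = None
    def __call__(cls, *args, **kw):
        if cls.instance is None:
            cls.instance = super().__call__(*args, **kw)
        return cls.instance

class Foo(metaclass=MetaSingleton):
    pass

a = Foo()
b = Foo()
assert a is b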

Answer 3

Others have explained how metaclasses work and how they fit into the Python type system. Here’s an example of what they can be used for. In a testing framework I wrote, I wanted to keep track of the order in which classes were defined, so that I could later instantiate them in this order. I found it easiest to do this using a metaclass.

class MyMeta(type):

    counter = 0

    def __init__(cls, name, bases, dic):
        type.__init__(cls, name, bases, dic)
        cls._order = MyMeta.counter
        MyMeta.counter += 1

class MyType(object):              # Python 2
    __metaclass__ = MyMeta

class MyType(metaclass=MyMeta):    # Python 3
    pass

Anything that’s a subclass of MyType then gets a class attribute _order that records the order in which the classes were defined.


Answer 4

One use for metaclasses is adding new properties and methods to an instance automatically.

For example, if you look at Django models, their definition looks a bit confusing. It looks as if you are only defining class properties:

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

However, at runtime the Person objects are filled with all sorts of useful methods. See the source for some amazing metaclassery.


Answer 5

I think the ONLamp introduction to metaclass programming is well written and gives a really good introduction to the topic despite being several years old already.

http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html (archived at https://web.archive.org/web/20080206005253/http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html)

In short: A class is a blueprint for the creation of an instance, and a metaclass is a blueprint for the creation of a class. It is easy to see that, to enable this behavior, classes in Python need to be first-class objects too.

I’ve never written one myself, but I think one of the nicest uses of metaclasses can be seen in the Django framework. The model classes use a metaclass approach to enable a declarative style of writing new models or form classes. While the metaclass is creating the class, all members get the possibility to customize the class itself.

The thing that’s left to say is: If you don’t know what metaclasses are, the probability that you will not need them is 99%.


Answer 6

What are metaclasses? What do you use them for?

TLDR: A metaclass instantiates and defines behavior for a class just like a class instantiates and defines behavior for an instance.

Pseudocode:

>>> Class(...)
instance

The above should look familiar. Well, where does Class come from? It’s an instance of a metaclass (also pseudocode):

>>> Metaclass(...)
Class

In real code, we can pass the default metaclass, type, everything we need to instantiate a class and we get a class:

>>> type('Foo', (object,), {}) # requires a name, bases, and a namespace
<class '__main__.Foo'>

Putting it differently

  • A class is to an instance as a metaclass is to a class.

    When we instantiate an object, we get an instance:

    >>> object()                          # instantiation of class
    <object object at 0x7f9069b4e0b0>     # instance
    

    Likewise, when we define a class explicitly with the default metaclass, type, we instantiate it:

    >>> type('Object', (object,), {})     # instantiation of metaclass
    <class '__main__.Object'>             # instance
    
  • Put another way, a class is an instance of a metaclass:

    >>> isinstance(object, type)
    True
    
  • Put a third way, a metaclass is a class’s class.

    >>> type(object) == type
    True
    >>> object.__class__
    <class 'type'>
    

When you write a class definition and Python executes it, it uses a metaclass to instantiate the class object (which will, in turn, be used to instantiate instances of that class).

Just as we can use class definitions to change how custom object instances behave, we can use a metaclass class definition to change the way a class object behaves.

What can they be used for? From the docs:

The potential uses for metaclasses are boundless. Some ideas that have been explored include logging, interface checking, automatic delegation, automatic property creation, proxies, frameworks, and automatic resource locking/synchronization.

Nevertheless, it is usually encouraged for users to avoid using metaclasses unless absolutely necessary.

You use a metaclass every time you create a class:

When you write a class definition, for example, like this,

class Foo(object): 
    'demo'

You instantiate a class object.

>>> Foo
<class '__main__.Foo'>
>>> isinstance(Foo, type), isinstance(Foo, object)
(True, True)

It is the same as functionally calling type with the appropriate arguments and assigning the result to a variable of that name:

name = 'Foo'
bases = (object,)
namespace = {'__doc__': 'demo'}
Foo = type(name, bases, namespace)

Note, some things automatically get added to the __dict__, i.e., the namespace:

>>> Foo.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'Foo' objects>, 
'__module__': '__main__', '__weakref__': <attribute '__weakref__' 
of 'Foo' objects>, '__doc__': 'demo'})

The metaclass of the object we created, in both cases, is type.

(A side-note on the contents of the class __dict__: __module__ is there because classes must know where they are defined, and __dict__ and __weakref__ are there because we don’t define __slots__ – if we define __slots__ we’ll save a bit of space in the instances, as we can disallow __dict__ and __weakref__ by excluding them. For example:

>>> Baz = type('Bar', (object,), {'__doc__': 'demo', '__slots__': ()})
>>> Baz.__dict__
mappingproxy({'__doc__': 'demo', '__slots__': (), '__module__': '__main__'})

… but I digress.)

We can extend type just like any other class definition:

Here’s the default __repr__ of classes:

>>> Foo
<class '__main__.Foo'>

One of the most valuable things we can do by default in writing a Python object is to provide it with a good __repr__. When we call help(repr) we learn that there’s a good test for a __repr__ that also requires a test for equality – obj == eval(repr(obj)). The following simple implementation of __repr__ and __eq__ for class instances of our type class provides us with a demonstration that may improve on the default __repr__ of classes:

class Type(type):
    def __repr__(cls):
        """
        >>> Baz
        Type('Baz', (Foo, Bar,), {'__module__': '__main__', '__doc__': None})
        >>> eval(repr(Baz))
        Type('Baz', (Foo, Bar,), {'__module__': '__main__', '__doc__': None})
        """
        metaname = type(cls).__name__
        name = cls.__name__
        parents = ', '.join(b.__name__ for b in cls.__bases__)
        if parents:
            parents += ','
        namespace = ', '.join(': '.join(
          (repr(k), repr(v) if not isinstance(v, type) else v.__name__))
               for k, v in cls.__dict__.items())
        return '{0}(\'{1}\', ({2}), {{{3}}})'.format(metaname, name, parents, namespace)
    def __eq__(cls, other):
        """
        >>> Baz == eval(repr(Baz))
        True            
        """
        return (cls.__name__, cls.__bases__, cls.__dict__) == (
                other.__name__, other.__bases__, other.__dict__)

So now when we create an object with this metaclass, the __repr__ echoed on the command line provides a much less ugly sight than the default:

>>> class Bar(object): pass
>>> Baz = Type('Baz', (Foo, Bar,), {'__module__': '__main__', '__doc__': None})
>>> Baz
Type('Baz', (Foo, Bar,), {'__module__': '__main__', '__doc__': None})

With a nice __repr__ defined for the class instance, we have a stronger ability to debug our code. However, much further checking with eval(repr(Class)) is unlikely (as functions would be rather impossible to eval from their default __repr__‘s).

An expected usage: __prepare__ a namespace

If, for example, we want to know in what order a class’s methods are created, we could provide an ordered dict as the namespace of the class. In Python 3, we would do this with __prepare__, which returns the namespace mapping for the class:

from collections import OrderedDict

class OrderedType(Type):
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        return OrderedDict()
    def __new__(cls, name, bases, namespace, **kwargs):
        result = Type.__new__(cls, name, bases, dict(namespace))
        result.members = tuple(namespace)
        return result

And usage:

class OrderedMethodsObject(object, metaclass=OrderedType):
    def method1(self): pass
    def method2(self): pass
    def method3(self): pass
    def method4(self): pass

And now we have a record of the order in which these methods (and other class attributes) were created:

>>> OrderedMethodsObject.members
('__module__', '__qualname__', 'method1', 'method2', 'method3', 'method4')

Note, this example was adapted from the documentation – the new enum in the standard library does this.

So what we did was instantiate a metaclass by creating a class. We can also treat the metaclass as we would any other class. It has a method resolution order:

>>> inspect.getmro(OrderedType)
(<class '__main__.OrderedType'>, <class '__main__.Type'>, <class 'type'>, <class 'object'>)

And it has approximately the correct repr (which we can no longer eval unless we can find a way to represent our functions):

>>> OrderedMethodsObject
OrderedType('OrderedMethodsObject', (object,), {'method1': <function OrderedMethodsObject.method1 at 0x0000000002DB01E0>, 'members': ('__module__', '__qualname__', 'method1', 'method2', 'method3', 'method4'), 'method3': <function OrderedMet
hodsObject.method3 at 0x0000000002DB02F0>, 'method2': <function OrderedMethodsObject.method2 at 0x0000000002DB0268>, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'OrderedMethodsObject' objects>, '__doc__': None, '__d
ict__': <attribute '__dict__' of 'OrderedMethodsObject' objects>, 'method4': <function OrderedMethodsObject.method4 at 0x0000000002DB0378>})
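
A side note beyond the original answer: since Python 3.6 (PEP 520), the default class namespace already preserves definition order, so a similar record can be kept without a custom __prepare__; a minimal sketch:

class OrderedByDefault(type):
    def __new__(cls, name, bases, namespace):
        result = super().__new__(cls, name, bases, namespace)
        result.members = tuple(namespace)  # plain dicts preserve insertion order here
        return result

class Demo(metaclass=OrderedByDefault):
    def method1(self): pass
    def method2(self): pass

print(Demo.members)  # ('__module__', '__qualname__', 'method1', 'method2')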

Answer 7

Python 3 update

There are (at this point) two key methods in a metaclass:

  • __prepare__, and
  • __new__

__prepare__ lets you supply a custom mapping (such as an OrderedDict) to be used as the namespace while the class is being created. You must return an instance of whatever namespace you choose. If you don’t implement __prepare__, a normal dict is used.

__new__ is responsible for the actual creation/modification of the final class.

A bare-bones, do-nothing-extra metaclass would look like:

class Meta(type):

    @classmethod
    def __prepare__(metacls, cls, bases):
        # called first; must return the mapping used as the class namespace
        return dict()

    def __new__(metacls, cls, bases, clsdict):
        return super().__new__(metacls, cls, bases, clsdict)

A simple example:

Say you want some simple validation code to run on your attributes — like it must always be an int or a str. Without a metaclass, your class would look something like:

class Person:
    weight = ValidateType('weight', int)
    age = ValidateType('age', int)
    name = ValidateType('name', str)

As you can see, you have to write the name of each attribute twice, which invites typos as well as irritating bugs.

A simple metaclass can address that problem:

class Person(metaclass=Validator):
    weight = ValidateType(int)
    age = ValidateType(int)
    name = ValidateType(str)

This is what the metaclass would look like (not using __prepare__ since it is not needed):

class Validator(type):
    def __new__(metacls, cls, bases, clsdict):
        # search clsdict looking for ValidateType descriptors
        for name, attr in clsdict.items():
            if isinstance(attr, ValidateType):
                attr.name = name
                attr.attr = '_' + name
        # create final class and return it
        return super().__new__(metacls, cls, bases, clsdict)

A sample run of:

p = Person()
p.weight = 9
print(p.weight)
p.weight = '9'

produces:

9
Traceback (most recent call last):
  File "simple_meta.py", line 36, in <module>
    p.weight = '9'
  File "simple_meta.py", line 24, in __set__
    (self.name, self.type, value))
TypeError: weight must be of type(s) <class 'int'> (got '9')

Note: This example is simple enough it could have also been accomplished with a class decorator, but presumably an actual metaclass would be doing much more.

The ‘ValidateType’ class for reference:

class ValidateType:
    def __init__(self, type):
        self.name = None  # will be set by metaclass
        self.attr = None  # will be set by metaclass
        self.type = type
    def __get__(self, inst, cls):
        if inst is None:
            return self
        else:
            return inst.__dict__[self.attr]
    def __set__(self, inst, value):
        if not isinstance(value, self.type):
            raise TypeError('%s must be of type(s) %s (got %r)' %
                    (self.name, self.type, value))
        else:
            inst.__dict__[self.attr] = value

Answer 8

Role of a metaclass’ __call__() method when creating a class instance

If you’ve done Python programming for more than a few months you’ll eventually stumble upon code that looks like this:

# define a class
class SomeClass(object):
    # ...
    # some definition here ...
    # ...

# create an instance of it
instance = SomeClass()

# then call the object as if it's a function
result = instance('foo', 'bar')

The latter is possible when you implement the __call__() magic method on the class.

class SomeClass(object):
    # ...
    # some definition here ...
    # ...

    def __call__(self, foo, bar):
        return bar + foo

The __call__() method is invoked when an instance of a class is used as a callable. But as we’ve seen from previous answers, a class itself is an instance of a metaclass, so when we use the class as a callable (i.e. when we create an instance of it) we’re actually calling its metaclass’ __call__() method. At this point most Python programmers are a bit confused, because they’ve been told that when creating an instance like instance = SomeClass() you’re calling its __init__() method. Some who’ve dug a bit deeper know that before __init__() there’s __new__(). Well, today another layer of truth is being revealed: before __new__() there’s the metaclass’ __call__().

Let’s study the method call chain from specifically the perspective of creating an instance of a class.

This is a metaclass that logs exactly the moment before an instance is created and the moment it’s about to return it.

class Meta_1(type):
    def __call__(cls):
        print "Meta_1.__call__() before creating an instance of ", cls
        instance = super(Meta_1, cls).__call__()
        print "Meta_1.__call__() about to return instance."
        return instance

This is a class that uses that metaclass

class Class_1(object):

    __metaclass__ = Meta_1

    def __new__(cls):
        print "Class_1.__new__() before creating an instance."
        instance = super(Class_1, cls).__new__(cls)
        print "Class_1.__new__() about to return instance."
        return instance

    def __init__(self):
        print "entering Class_1.__init__() for instance initialization."
        super(Class_1,self).__init__()
        print "exiting Class_1.__init__()."

And now let’s create an instance of Class_1

instance = Class_1()
# Meta_1.__call__() before creating an instance of <class '__main__.Class_1'>.
# Class_1.__new__() before creating an instance.
# Class_1.__new__() about to return instance.
# entering Class_1.__init__() for instance initialization.
# exiting Class_1.__init__().
# Meta_1.__call__() about to return instance.

Observe that the code above doesn’t actually do anything more than logging the tasks. Each method delegates the actual work to its parent’s implementation, thus keeping the default behavior. Since type is Meta_1‘s parent class (type being the default parent metaclass) and considering the ordering sequence of the output above, we now have a clue as to what would be the pseudo implementation of type.__call__():

class type:
    def __call__(cls, *args, **kwargs):

        # ... maybe a few things done to cls here

        # then we call __new__() on the class to create an instance
        instance = cls.__new__(cls, *args, **kwargs)

        # ... maybe a few things done to the instance here

        # then we initialize the instance with its __init__() method
        instance.__init__(*args, **kwargs)

        # ... maybe a few more things done to instance here

        # then we return it
        return instance

We can see that the metaclass’ __call__() method is the one that’s called first. It then delegates creation of the instance to the class’s __new__() method and initialization to the instance’s __init__(). It’s also the one that ultimately returns the instance.

From the above, it follows that the metaclass’ __call__() is also given the opportunity to decide whether or not a call to Class_1.__new__() or Class_1.__init__() will eventually be made. Over the course of its execution, it could actually return an object that hasn’t been touched by either of these methods. Take, for example, this approach to the singleton pattern:

class Meta_2(type):
    singletons = {}

    def __call__(cls, *args, **kwargs):
        if cls in Meta_2.singletons:
            # we return the only instance and skip a call to __new__()
            # and __init__()
            print ("{} singleton returning from Meta_2.__call__(), "
                   "skipping creation of new instance.".format(cls))
            return Meta_2.singletons[cls]

        # else if the singleton isn't present we proceed as usual
        print "Meta_2.__call__() before creating an instance."
        instance = super(Meta_2, cls).__call__(*args, **kwargs)
        Meta_2.singletons[cls] = instance
        print "Meta_2.__call__() returning new instance."
        return instance

class Class_2(object):

    __metaclass__ = Meta_2

    def __new__(cls, *args, **kwargs):
        print "Class_2.__new__() before creating instance."
        instance = super(Class_2, cls).__new__(cls)
        print "Class_2.__new__() returning instance."
        return instance

    def __init__(self, *args, **kwargs):
        print "entering Class_2.__init__() for initialization."
        super(Class_2, self).__init__()
        print "exiting Class_2.__init__()."

Let’s observe what happens when repeatedly trying to create an object of type Class_2

a = Class_2()
# Meta_2.__call__() before creating an instance.
# Class_2.__new__() before creating instance.
# Class_2.__new__() returning instance.
# entering Class_2.__init__() for initialization.
# exiting Class_2.__init__().
# Meta_2.__call__() returning new instance.

b = Class_2()
# <class '__main__.Class_2'> singleton returning from Meta_2.__call__(), skipping creation of new instance.

c = Class_2()
# <class '__main__.Class_2'> singleton returning from Meta_2.__call__(), skipping creation of new instance.

a is b is c # True

Answer 9

A metaclass is a class that tells how (some) other classes should be created.

This is a case where I saw a metaclass as the solution to my problem: I had a really complicated problem that probably could have been solved differently, but I chose to solve it using a metaclass. Because of the complexity, it is one of the few modules I have written where the comments in the module surpass the amount of code. Here it is…

#!/usr/bin/env python

# Copyright (C) 2013-2014 Craig Phillips.  All rights reserved.

# This requires some explaining.  The point of this metaclass exercise is to
# create a static abstract class that is in one way or another, dormant until
# queried.  I experimented with creating a singleton on import, but that did
# not quite behave how I wanted it to.  See now here, we are creating a class
# called GsyncOptions, that on import, will do nothing except state that its
# class creator is GsyncOptionsType.  This means, docopt doesn't parse any
# of the help document, nor does it start processing command line options.
# So importing this module becomes really efficient.  The complicated bit
# comes from requiring the GsyncOptions class to be static.  By that, I mean
# any property on it, may or may not exist, since they are not statically
# defined; so I can't simply just define the class with a whole bunch of
# properties that are @property @staticmethods.
#
# So here's how it works:
#
# Executing 'from libgsync.options import GsyncOptions' does nothing more
# than load up this module, define the Type and the Class and import them
# into the caller's namespace.  Simple.
#
# Invoking 'GsyncOptions.debug' for the first time, or any other property
# causes the __metaclass__ __getattr__ method to be called, since the class
# is not instantiated as a class instance yet.  The __getattr__ method on
# the type then initialises the class (GsyncOptions) via the __initialiseClass
# method.  This is the first and only time the class will actually have its
# dictionary statically populated.  The docopt module is invoked to parse the
# usage document and generate command line options from it.  These are then
# paired with their defaults and what's in sys.argv.  After all that, we
# setup some dynamic properties that could not be defined by their name in
# the usage, before everything is then transplanted onto the actual class
# object (or static class GsyncOptions).
#
# Another piece of magic, is to allow command line options to be set in
# in their native form and be translated into argparse style properties.
#
# Finally, the GsyncListOptions class is actually where the options are
# stored.  This only acts as a mechanism for storing options as lists, to
# allow aggregation of duplicate options or options that can be specified
# multiple times.  The __getattr__ call hides this by default, returning the
# last item in a property's list.  However, if the entire list is required,
# calling the 'list()' method on the GsyncOptions class, returns a reference
# to the GsyncListOptions class, which contains all of the same properties
# but as lists and without the duplication of having them as both lists and
# static singlton values.
#
# So this actually means that GsyncOptions is actually a static proxy class...
#
# ...And all this is neatly hidden within a closure for safe keeping.
def GetGsyncOptionsType():
    class GsyncListOptions(object):
        __initialised = False

    class GsyncOptionsType(type):
        def __initialiseClass(cls):
            if GsyncListOptions._GsyncListOptions__initialised: return

            from docopt import docopt
            from libgsync.options import doc
            from libgsync import __version__

            options = docopt(
                doc.__doc__ % __version__,
                version = __version__,
                options_first = True
            )

            paths = options.pop('<path>', None)
            setattr(cls, "destination_path", paths.pop() if paths else None)
            setattr(cls, "source_paths", paths)
            setattr(cls, "options", options)

            for k, v in options.iteritems():
                setattr(cls, k, v)

            GsyncListOptions._GsyncListOptions__initialised = True

        def list(cls):
            return GsyncListOptions

        def __getattr__(cls, name):
            cls.__initialiseClass()
            return getattr(GsyncListOptions, name)[-1]

        def __setattr__(cls, name, value):
            # Substitute option names: --an-option-name becomes an_option_name
            import re
            name = re.sub(r'^__', "", re.sub(r'-', "_", name))
            listvalue = []

            # Ensure value is converted to a list type for GsyncListOptions
            if isinstance(value, list):
                if value:
                    listvalue = [] + value
                else:
                    listvalue = [ None ]
            else:
                listvalue = [ value ]

            type.__setattr__(GsyncListOptions, name, listvalue)

    # Cleanup this module to prevent tinkering.
    import sys
    module = sys.modules[__name__]
    del module.__dict__['GetGsyncOptionsType']

    return GsyncOptionsType

# Our singleton abstract proxy class.
class GsyncOptions(object):
    __metaclass__ = GetGsyncOptionsType()

Answer 10

The tl;dr version

The type(obj) function gets you the type of an object.

The type() of a class is its metaclass.

To use a metaclass:

class Foo(object):
    __metaclass__ = MyMetaClass

type is its own metaclass. The class of a class is a metaclass – the body of a class becomes the arguments passed to the metaclass that is used to construct the class.

Here you can read about how to use metaclasses to customize class construction.


Answer 11

type is actually a metaclass – a class that creates other classes. Most metaclasses are subclasses of type. The metaclass receives the new class as its first argument and provides access to the class object, with details as shown below:

>>> class MetaClass(type):
...     def __init__(cls, name, bases, attrs):
...         print ('class name: %s' %name )
...         print ('Defining class %s' %cls)
...         print('Bases %s: ' %bases)
...         print('Attributes')
...         for (name, value) in attrs.items():
...             print ('%s :%r' %(name, value))
... 

>>> class NewClass(object, metaclass=MetaClass):
...    get_choch='dairy'
... 
class name: NewClass
Defining class <class 'NewClass'>
Bases (<class 'object'>,): 
Attributes
get_choch :'dairy'
__module__ :'builtins'
__qualname__ :'NewClass'

Note:

Notice that the class was not instantiated at any time; the simple act of creating the class triggered execution of the metaclass.


Answer 12

Python classes are themselves objects – instances of their metaclass.

The default metaclass is applied when you define a class as:

class foo:
    ...

Metaclasses are used to apply some rule to an entire set of classes. For example, suppose you’re building an ORM to access a database, and you want records from each table to be of a class mapped to that table (based on fields, business rules, etc.). A possible use of a metaclass is, for instance, connection-pool logic, which is shared by all record classes from all tables. Another use is logic to support foreign keys, which involves multiple record classes.

When you define a metaclass, you subclass type and can override the following magic methods to insert your logic.

class somemeta(type):
    def __new__(mcs, name, bases, clsdict):
        """
        mcs:     the metaclass itself (a subclass of type).
        name:    name of the new class, as provided by the user.
        bases:   tuple of base classes.
        clsdict: a dictionary containing all methods and attributes
                 defined on the class.

        You must return a class object, e.g. by invoking the __new__
        constructor on the base metaclass:

            return type.__new__(mcs, name, bases, clsdict)

        In the following case:

            class foo(baseclass):
                __metaclass__ = somemeta

                an_attr = 12

                def bar(self):
                    ...

                @classmethod
                def foo(cls):
                    ...

        the arguments would be:

            (somemeta, "foo", (baseclass, ..., object),
             {"an_attr": 12, "bar": <function>, "foo": <classmethod>})

        You can modify any of these values before passing them on to type.
        """
        return type.__new__(mcs, name, bases, clsdict)

    def __init__(cls, name, bases, clsdict):
        """
        Called after the type has been created. Unlike in standard classes,
        __init__ cannot replace the instance (cls) - it should be used for
        class validation.
        """
        pass

    @classmethod
    def __prepare__(mcs, name, bases):
        """
        Returns a dict (or any mapping) to be used as the namespace in
        which the class body is executed; type then attaches the methods
        and attributes from the class definition to it.

        Call order:

            somemeta.__prepare__ -> somemeta.__new__ -> somemeta.__init__
        """
        return dict()

    def mymethod(cls):
        """
        Works like a classmethod, but for class objects. Also, mymethod
        will not be visible to instances of cls.
        """
        pass

Anyhow, those are the most commonly used hooks. Metaclassing is powerful, and the above is nowhere near an exhaustive list of its uses.
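
A tiny usage sketch (my addition) of the mymethod hook above: it is reachable from the class, but not from its instances:

class Record(metaclass=somemeta):
    pass

Record.mymethod()     # fine: resolved on the metaclass
# Record().mymethod   # AttributeError: instances don't see metaclass attributes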


Answer 13

The type() function can return the type of an object or create a new type,

for example, we can create a Hi class with the type() function, without having to define it with class Hi(object):

def func(self, name='mike'):
    print('Hi, %s.' % name)

Hi = type('Hi', (object,), dict(hi=func))
h = Hi()
h.hi()
Hi, mike.

type(Hi)
type

type(h)
__main__.Hi

In addition to using type() to create classes dynamically, you can control the creation behavior of a class by using a metaclass.

According to the Python object model, a class is an object, so a class must be an instance of another class. By default, a Python class is an instance of the type class. That is, type is the metaclass of most built-in classes and of user-defined classes.

class ListMetaclass(type):
    def __new__(cls, name, bases, attrs):
        attrs['add'] = lambda self, value: self.append(value)
        return type.__new__(cls, name, bases, attrs)

class CustomList(list, metaclass=ListMetaclass):
    pass

lst = CustomList()
lst.add('custom_list_1')
lst.add('custom_list_2')

lst
['custom_list_1', 'custom_list_2']

The magic takes effect when we pass the metaclass keyword argument: it instructs the Python interpreter to create CustomList through ListMetaclass.__new__(). At this point we can modify the class definition, for example by adding a new method, and then return the revised definition.


回答 14

除了已发布的答案,我还想说:metaclass定义了一个类的行为,因此您可以显式设置自己的元类。每当Python遇到class关键字,它就会开始搜索metaclass;如果没有找到,就使用默认元类type来创建该类的对象。使用__metaclass__属性(Python 2的写法),您可以为自己的类设置metaclass:

class MyClass:
   __metaclass__ = type
   # write here other method
   # write here one more method

print(MyClass.__metaclass__)

它将产生如下输出:

<class 'type'>

当然,您也可以创建自己的metaclass,来定义所有用它创建的类的行为。

为此,您的metaclass必须继承默认的type类,因为它是主元类:

class MyMetaClass(type):
   __metaclass__ = type
   # you can write here any behaviour you want

class MyTestClass:
   __metaclass__ = MyMetaClass

Obj = MyTestClass()
print(Obj.__metaclass__)
print(MyMetaClass.__metaclass__)

输出将是:

<class '__main__.MyMetaClass'>
<class 'type'>

In addition to the published answers, I can say that a metaclass defines the behaviour of a class, so you can explicitly set your metaclass. Whenever Python gets the keyword class, it starts searching for the metaclass. If it’s not found, the default metaclass type is used to create the class’s object. Using the __metaclass__ attribute (Python 2 syntax), you can set the metaclass of your class:

class MyClass:
   __metaclass__ = type
   # write here other method
   # write here one more method

print(MyClass.__metaclass__)

It’ll produce output like this:

<class 'type'>

And, of course, you can create your own metaclass to define the behaviour of any class that is created using it.

For doing that, your metaclass must inherit from the default type class, as this is the main metaclass:

class MyMetaClass(type):
   __metaclass__ = type
   # you can write here any behaviour you want

class MyTestClass:
   __metaclass__ = MyMetaClass

Obj = MyTestClass()
print(Obj.__metaclass__)
print(MyMetaClass.__metaclass__)

The output will be:

<class '__main__.MyMetaClass'>
<class 'type'>
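
Note that the __metaclass__ attribute used above is Python 2 syntax; Python 3 ignores it and instead takes a metaclass keyword argument in the class header. A minimal sketch of the Python 3 spelling (reusing the class names above):

class MyMetaClass(type):
    # you can write here any behaviour you want
    pass

class MyTestClass(metaclass=MyMetaClass):
    pass

print(type(MyTestClass))  # <class '__main__.MyMetaClass'>
print(type(MyMetaClass))  # <class 'type'>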

回答 15

在面向对象的编程中,元类是一个类,其实例是类。就像普通类定义某些对象的行为一样,元类定义某些类及其实例的行为。术语“元类”仅表示用于创建类的内容。换句话说,它是一个类的类。元类用于创建类,因此就像对象是类的实例一样,类是元类的实例。在python中,类也被视为对象。

In object-oriented programming, a metaclass is a class whose instances are classes. Just as an ordinary class defines the behavior of certain objects, a metaclass defines the behavior of certain classes and their instances. The term metaclass simply means something used to create classes. In other words, it is the class of a class. A metaclass is used to create a class, so just as an object is an instance of a class, a class is an instance of a metaclass. In Python, classes are also considered objects.


回答 16

这是其用途的另一个示例:

  • 您可以使用metaclass更改其实例(类)的功能。
class MetaMemberControl(type):
    __slots__ = ()

    @classmethod
    def __prepare__(mcs, f_cls_name, f_cls_parents,  # f_cls means: future class
                    meta_args=None, meta_options=None):  # meta_args and meta_options is not necessarily needed, just so you know.
        f_cls_attr = dict()
        if not "do something or if you want to define your cool stuff of dict...":
            return dict(make_your_special_dict=None)
        else:
            return f_cls_attr

    def __new__(mcs, f_cls_name, f_cls_parents, f_cls_attr,
                meta_args=None, meta_options=None):

        original_getattr = f_cls_attr.get('__getattribute__')
        original_setattr = f_cls_attr.get('__setattr__')

        def init_getattr(self, item):
            if not item.startswith('_'):  # you can set break points at here
                alias_name = '_' + item
                if alias_name in f_cls_attr['__slots__']:
                    item = alias_name
            if original_getattr is not None:
                return original_getattr(self, item)
            else:
                return super(eval(f_cls_name), self).__getattribute__(item)

        def init_setattr(self, key, value):
            if not key.startswith('_') and ('_' + key) in f_cls_attr['__slots__']:
                raise AttributeError(f"you can't modify private members:_{key}")
            if original_setattr is not None:
                original_setattr(self, key, value)
            else:
                super(eval(f_cls_name), self).__setattr__(key, value)

        f_cls_attr['__getattribute__'] = init_getattr
        f_cls_attr['__setattr__'] = init_setattr

        cls = super().__new__(mcs, f_cls_name, f_cls_parents, f_cls_attr)
        return cls


class Human(metaclass=MetaMemberControl):
    __slots__ = ('_age', '_name')

    def __init__(self, name, age):
        self._name = name
        self._age = age

    def __getattribute__(self, item):
        """
        just for IDE recognition.
        """
        return super().__getattribute__(item)

    """ with MetaMemberControl then you don't have to write as following
    @property
    def name(self):
        return self._name

    @property
    def age(self):
        return self._age
    """


def test_demo():
    human = Human('Carson', 27)
    # human.age = 18  # you can't modify private members:_age  <-- this is defined by yourself.
    # human.k = 18  # 'Human' object has no attribute 'k'  <-- system error.
    age1 = human._age  # It's OK, although the IDE will show some warnings. (Access to a protected member _age of a class)

    age2 = human.age  # It's OK! see below:
    """
    if you do not define `__getattribute__` at the class of Human,
    the IDE will show you: Unresolved attribute reference 'age' for class 'Human'
    but it's ok on running since the MetaMemberControl will help you.
    """


if __name__ == '__main__':
    test_demo()

metaclass非常强大,您可以用它做很多事情(例如各种Monkey补丁魔术),但要小心:这些技巧可能只有您自己才看得懂。

Here’s another example of what it can be used for:

  • You can use the metaclass to change the function of its instance (the class).
class MetaMemberControl(type):
    __slots__ = ()

    @classmethod
    def __prepare__(mcs, f_cls_name, f_cls_parents,  # f_cls means: future class
                    meta_args=None, meta_options=None):  # meta_args and meta_options is not necessarily needed, just so you know.
        f_cls_attr = dict()
        if not "do something or if you want to define your cool stuff of dict...":
            return dict(make_your_special_dict=None)
        else:
            return f_cls_attr

    def __new__(mcs, f_cls_name, f_cls_parents, f_cls_attr,
                meta_args=None, meta_options=None):

        original_getattr = f_cls_attr.get('__getattribute__')
        original_setattr = f_cls_attr.get('__setattr__')

        def init_getattr(self, item):
            if not item.startswith('_'):  # you can set break points at here
                alias_name = '_' + item
                if alias_name in f_cls_attr['__slots__']:
                    item = alias_name
            if original_getattr is not None:
                return original_getattr(self, item)
            else:
                return super(eval(f_cls_name), self).__getattribute__(item)

        def init_setattr(self, key, value):
            if not key.startswith('_') and ('_' + key) in f_cls_attr['__slots__']:
                raise AttributeError(f"you can't modify private members:_{key}")
            if original_setattr is not None:
                original_setattr(self, key, value)
            else:
                super(eval(f_cls_name), self).__setattr__(key, value)

        f_cls_attr['__getattribute__'] = init_getattr
        f_cls_attr['__setattr__'] = init_setattr

        cls = super().__new__(mcs, f_cls_name, f_cls_parents, f_cls_attr)
        return cls


class Human(metaclass=MetaMemberControl):
    __slots__ = ('_age', '_name')

    def __init__(self, name, age):
        self._name = name
        self._age = age

    def __getattribute__(self, item):
        """
        just for IDE recognition.
        """
        return super().__getattribute__(item)

    """ with MetaMemberControl then you don't have to write as following
    @property
    def name(self):
        return self._name

    @property
    def age(self):
        return self._age
    """


def test_demo():
    human = Human('Carson', 27)
    # human.age = 18  # you can't modify private members:_age  <-- this is defined by yourself.
    # human.k = 18  # 'Human' object has no attribute 'k'  <-- system error.
    age1 = human._age  # It's OK, although the IDE will show some warnings. (Access to a protected member _age of a class)

    age2 = human.age  # It's OK! see below:
    """
    if you do not define `__getattribute__` at the class of Human,
    the IDE will show you: Unresolved attribute reference 'age' for class 'Human'
    but it's ok on running since the MetaMemberControl will help you.
    """


if __name__ == '__main__':
    test_demo()

The metaclass is powerful; there are many things (such as monkey-patching magic) you can do with it, but be careful: these tricks may be understood only by you.


回答 17

在Python中,一个类是一个对象,就像其他任何对象一样,它是“某物”的实例。这种“东西”就是所谓的元类。这个元类是一种特殊的类,它创建其他类的对象。因此,元类负责创建新类。这使程序员可以自定义类的生成方式。

要创建元类,通常要重写__new__()和__init__()方法:重写__new__()可以更改对象的创建方式,重写__init__()可以更改对象的初始化方式。类可以通过多种方式动态创建,其中一种是使用type()函数:当以3个参数调用时,type()会创建一个新类(它是type的实例)。参数是:

  1. 类的名称
  2. 具有由类继承的基类的元组
  3. 具有所有类方法和类变量的字典

另一种方式涉及metaclass关键字:将元类定义为一个普通类,然后在类定义的参数列表中传入metaclass=metaclass_name

元类可以在以下情况下专门使用:

  1. 当必须将特殊效果应用于所有子类时
  2. 需要自动更改Class(创建时)
  3. 由API开发人员

A class, in Python, is an object, and just like any other object, it is an instance of “something”. This “something” is what is termed a metaclass. This metaclass is a special type of class that creates other classes’ objects. Hence, the metaclass is responsible for making new classes. This allows the programmer to customize the way classes are generated.

To create a metaclass, you usually override the __new__() and __init__() methods: __new__() can be overridden to change the way objects are created, while __init__() can be overridden to change the way an object is initialized. Classes can also be created dynamically in a number of ways. One of them is the type() function: when called with 3 parameters, type() creates a new class (an instance of type). The parameters are:

  1. Class Name
  2. Tuple having base classes inherited by class
  3. A dictionary having all class methods and class variables

Another way involves the ‘metaclass’ keyword: define the metaclass as a simple class, and in the class definition’s argument list, pass metaclass=metaclass_name (a sketch of both approaches follows the list below).

Metaclasses can be put to use specifically in the following situations:

  1. when a particular effect has to be applied to all the subclasses
  2. Automatic change of class (on creation) is required
  3. By API developers
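
As promised above, a minimal sketch of both approaches (the names Point, UpperAttrsMeta and Config are illustrative):

# Way 1: create a class dynamically with the three-argument form of type()
Point = type('Point', (object,), {'dims': 2})
p = Point()
print(p.dims)  # 2

# Way 2: define the metaclass as a class deriving from type,
# then pass metaclass=... in the class definition
class UpperAttrsMeta(type):
    def __new__(mcs, name, bases, clsdict):
        # illustrative rule applied to every class created by this metaclass:
        # uppercase every non-dunder attribute name
        upped = {k if k.startswith('__') else k.upper(): v
                 for k, v in clsdict.items()}
        return super().__new__(mcs, name, bases, upped)

class Config(metaclass=UpperAttrsMeta):
    debug = True

print(Config.DEBUG)  # True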

回答 18

请注意,Python 3.6引入了新的dunder方法__init_subclass__(cls, **kwargs),用以取代元类的许多常见用例。当创建定义了该方法的类的子类时,它就会被调用。参见python docs。

Note that in Python 3.6 a new dunder method, __init_subclass__(cls, **kwargs), was introduced to replace a lot of common use cases for metaclasses. It is called when a subclass of the defining class is created. See the Python docs.
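
A minimal sketch of this hook (the names PluginBase and CsvPlugin are illustrative):

class PluginBase:
    registry = []

    def __init_subclass__(cls, **kwargs):
        # runs once for every subclass of PluginBase as it is defined
        super().__init_subclass__(**kwargs)
        PluginBase.registry.append(cls)

class CsvPlugin(PluginBase):
    pass

print(PluginBase.registry)  # [<class '__main__.CsvPlugin'>]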


回答 19

元类是一种类,它定义类的行为方式,或者我们可以说类本身是元类的实例。

A metaclass is a kind of class that defines how a class will behave; or, put another way, a class is itself an instance of a metaclass.


如何在不使用异常的情况下检查文件是否存在?

问题:如何在不使用异常的情况下检查文件是否存在?

如何在不使用try语句的情况下检查文件是否存在?

How do I check if a file exists or not, without using the try statement?


回答 0

如果您检查文件是否存在的原因是为了执行类似if file_exists: open_it()的操作,那么直接在尝试打开时使用try会更安全:先检查再打开,文件可能在您检查之后、打开之前被删除、移动或发生其他变化。

如果您不打算立即打开文件,则可以使用 os.path.isfile

如果path是一个已存在的常规文件,则返回True。该函数会跟随符号链接,因此对于同一路径,islink()和isfile()都可能为true。

import os.path
os.path.isfile(fname) 

如果您需要确保它是一个文件。

从Python 3.4开始,该pathlib模块提供了一种面向对象的方法(pathlib2在2.7中向后移植):

from pathlib import Path

my_file = Path("/path/to/file")
if my_file.is_file():
    # file exists

要检查目录,请执行以下操作:

if my_file.is_dir():
    # directory exists

要检查路径是否存在,而不管它是文件还是目录,请使用exists():

if my_file.exists():
    # path exists

您也可以在try块中使用resolve(strict=True):

try:
    my_abs_path = my_file.resolve(strict=True)
except FileNotFoundError:
    # doesn't exist
else:
    # exists

If the reason you’re checking is so you can do something like if file_exists: open_it(), it’s safer to use a try around the attempt to open it. Checking and then opening risks the file being deleted or moved or something between when you check and when you try to open it.

If you’re not planning to open the file immediately, you can use os.path.isfile

Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.

import os.path
os.path.isfile(fname) 

if you need to be sure it’s a file.

Starting with Python 3.4, the pathlib module offers an object-oriented approach (backported to pathlib2 in Python 2.7):

from pathlib import Path

my_file = Path("/path/to/file")
if my_file.is_file():
    # file exists

To check a directory, do:

if my_file.is_dir():
    # directory exists

To check whether a Path object exists independently of whether it is a file or directory, use exists():

if my_file.exists():
    # path exists

You can also use resolve(strict=True) in a try block:

try:
    my_abs_path = my_file.resolve(strict=True)
except FileNotFoundError:
    # doesn't exist
else:
    # exists
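
For completeness, a minimal sketch of the EAFP approach recommended at the top (the file name is illustrative):

try:
    with open('config.txt') as f:  # attempt the open directly
        data = f.read()
except FileNotFoundError:
    data = ''  # fallback behavior when the file is missing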

回答 1

您具有以下os.path.exists功能:

import os.path
os.path.exists(file_path)

对文件和目录,这都会返回True,但您可以改用

os.path.isfile(file_path)

来专门测试它是否是一个文件。它会跟随符号链接。

You have the os.path.exists function:

import os.path
os.path.exists(file_path)

This returns True for both files and directories but you can instead use

os.path.isfile(file_path)

to test if it’s a file specifically. It follows symlinks.


回答 2

与isfile()不同,exists()对目录也会返回True。因此,取决于您只想要普通文件还是也接受目录,可以选用isfile()或exists()。以下是一些简单的REPL输出:

>>> os.path.isfile("/etc/password.txt")
True
>>> os.path.isfile("/etc")
False
>>> os.path.isfile("/does/not/exist")
False
>>> os.path.exists("/etc/password.txt")
True
>>> os.path.exists("/etc")
True
>>> os.path.exists("/does/not/exist")
False

Unlike isfile(), exists() will return True for directories. So depending on if you want only plain files or also directories, you’ll use isfile() or exists(). Here is some simple REPL output:

>>> os.path.isfile("/etc/password.txt")
True
>>> os.path.isfile("/etc")
False
>>> os.path.isfile("/does/not/exist")
False
>>> os.path.exists("/etc/password.txt")
True
>>> os.path.exists("/etc")
True
>>> os.path.exists("/does/not/exist")
False

回答 3

import os.path

if os.path.isfile(filepath):
    ...  # the file exists; act on it here
import os.path

if os.path.isfile(filepath):
    ...  # the file exists; act on it here

回答 4

使用os.path.isfile()os.access()

import os

PATH = './file.txt'
if os.path.isfile(PATH) and os.access(PATH, os.R_OK):
    print("File exists and is readable")
else:
    print("Either the file is missing or not readable")

Use os.path.isfile() with os.access():

import os

PATH = './file.txt'
if os.path.isfile(PATH) and os.access(PATH, os.R_OK):
    print("File exists and is readable")
else:
    print("Either the file is missing or not readable")

回答 5

import os
os.path.exists(path) # Returns whether the path (directory or file) exists or not
os.path.isfile(path) # Returns whether the file exists or not
import os
os.path.exists(path) # Returns whether the path (directory or file) exists or not
os.path.isfile(path) # Returns whether the file exists or not

回答 6

尽管在(至少一个)现有答案中已经列出了几乎所有可能的方法(例如,添加了Python 3.4特定的内容),但我将尝试将所有内容组合在一起。

注意:我要发布的每个Python标准库代码都属于3.5.3版。

问题陈述

  1. 检查文件(可商榷:也包括文件夹(“特殊”文件)吗?)是否存在
  2. 不要使用try / except / else / finally块

可能的解决方案

  1. [Python 3]: os.path.exists(path)(还可以查看同族的其他函数,如os.path.isfile、os.path.isdir、os.path.lexists,它们的行为略有不同)

    os.path.exists(path)

    如果path指向一个已存在的路径或打开的文件描述符,则返回True。对于损坏的符号链接返回False。在某些平台上,即使路径确实存在,如果没有权限对请求的文件执行os.stat(),此函数也可能返回False。

    一切都很好,但是如果沿着导入树往下看:

    • os.path - posixpath.py(ntpath.py)

      • genericpath.py,第#20行附近

        def exists(path):
            """Test whether a path exists.  Returns False for broken symbolic links"""
            try:
                st = os.stat(path)
            except os.error:
                return False
            return True

    它只是围绕[Python 3]: os.stat(path, *, dir_fd=None, follow_symlinks=True)的一个try / except块。因此,您的代码本身不含try / except,但调用栈的更深处至少有一个这样的块。这同样适用于其他函数(包括os.path.isfile)。

    1.1. [Python 3]: Path.is_file()

    • 这是一种更优雅(也更Python化)的路径处理方式,但是
    • 在底层,它做的事情完全一样(pathlib.py,第#1330行附近):

      def is_file(self):
          """
          Whether this path is a regular file (also True for symlinks pointing
          to regular files).
          """
          try:
              return S_ISREG(self.stat().st_mode)
          except OSError as e:
              if e.errno not in (ENOENT, ENOTDIR):
                  raise
              # Path doesn't exist or is a broken symlink
              # (see https://bitbucket.org/pitrou/pathlib/issue/12/)
              return False
  2. [Python 3]: with语句上下文管理器。您可以:

    • 创建一个:

      class Swallow:  # Dummy example
          swallowed_exceptions = (FileNotFoundError,)
      
          def __enter__(self):
              print("Entering...")
      
          def __exit__(self, exc_type, exc_value, exc_traceback):
              print("Exiting:", exc_type, exc_value, exc_traceback)
              return exc_type in Swallow.swallowed_exceptions  # only swallow FileNotFoundError (not e.g. TypeError - if the user passes a wrong argument like None or float or ...)
      • 下面是它的用法——我将复刻os.path.isfile的行为(请注意,这只是为了演示,不要在生产代码中这样写):

        import os
        import stat
        
        
        def isfile_seaman(path):  # Dummy func
            result = False
            with Swallow():
                result = stat.S_ISREG(os.stat(path).st_mode)
            return result
    • 使用[Python 3]: contextlib.suppress(*exceptions)——它专为有选择地抑制异常而设计


    但是,它们似乎只是try / except / else / finally块的包装,正如[Python 3]: with语句所述:

    这使得常见的try … except … finally使用模式得以封装,方便重复使用。

  3. 文件系统遍历功能(并在结果中搜索匹配项)


    由于这些函数会遍历文件夹,(在大多数情况下)它们对我们的问题来说效率不高(也有例外,比如@ShadowRanger指出的不带通配符的glob),所以我不打算在它们上面多费笔墨。更不用说某些情况下可能还需要对文件名做处理。

  4. [Python 3]: os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True),其行为接近os.path.exists(实际上范围更宽,主要是因为第二个参数)

    • 用户权限可能会限制文件的“可见性”,正如文档所述:

      …测试调用用户是否对path具有指定的访问权限。mode应为F_OK以测试路径是否存在…

    os.access("/tmp", os.F_OK)

    由于我也用C开发,我也会使用这个方法,因为它在底层调用了原生API(同样是通过“${PYTHON_SRC_DIR}/Modules/posixmodule.c”);但它也为潜在的用户错误打开了大门,而且不像其他变体那样Python化。因此,正如@AaronHall正确指出的那样,除非您知道自己在做什么,否则不要使用它:

    注意:也可以通过[Python 3]: ctypes(Python的外部函数库)调用原生API,但大多数情况下这更复杂。

    (Win特定):由于vcruntime*(msvcr*).dll还导出了[MS.Docs]: _access、_waccess函数家族,下面是一个示例:

    Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, ctypes
    >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe", os.F_OK)
    0
    >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe.notexist", os.F_OK)
    -1

    注意事项:

    • 尽管这不是一个好习惯,但我在调用中使用了os.F_OK,这只是为了清晰(其值为0)
    • 我使用_waccess,以便同一份代码能在Python3和Python2上运行(尽管两者之间存在与Unicode相关的差异)
    • 尽管这针对的是一个非常特定的领域,但之前的任何答案都没有提到它


    Lnx(Ubtu (16 x64))上的对应示例:

    Python 3.5.2 (default, Nov 17 2016, 17:05:23)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, ctypes
    >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp", os.F_OK)
    0
    >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp.notexist", os.F_OK)
    -1

    注意事项:

    • 与其硬编码libc的路径(“/lib/x86_64-linux-gnu/libc.so.6”)——它在不同系统间可能(而且很可能)不同——不如把None(或空字符串)传给CDLL构造函数(ctypes.CDLL(None).access(b"/tmp", os.F_OK))。根据[man7]: DLOPEN(3):

      如果filename为NULL,则返回的句柄用于主程序。当传给dlsym()时,此句柄会先在主程序中搜索符号,然后在程序启动时加载的所有共享对象中搜索,最后在由dlopen()以RTLD_GLOBAL标志加载的所有共享对象中搜索。

      • 主(当前)程序(python)与libc链接,因此其符号(包括access)会被加载
      • 必须小心处理,因为像main、Py_Main等(所有)其他函数也都可用;调用它们可能会(对当前程序)造成灾难性的影响
      • 这一点并不适用于Win(但这没什么大不了的,因为msvcrt.dll位于“%SystemRoot%\System32”中,而它默认就在%PATH%里)。我本想更进一步,在Win上复刻这一行为(并提交补丁),但事实证明,[MS.Docs]: GetProcAddress函数只能“看到”导出的符号,因此除非有人把主可执行文件中的函数声明为__declspec(dllexport)(正常人为什么要这么做?),否则主程序虽可加载,却几乎无法使用
  5. 安装一些具有文件系统功能的第三方模块

    最有可能地,它会依赖上述方法之一(也许再加上一些自定义)。
    一个示例是(再次,Win特定的)[GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions,它是WINAPI的Python包装器。

    但是,由于这更像是一种解决方法,所以我在这里停止。

  6. 另一个(蹩脚的)变通方法是(我喜欢这样称呼的)sysadmin方法:把Python当作包装器来执行shell命令

    • Win:

      (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe\" > nul 2>&1'))"
      0
      
      (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe.notexist\" > nul 2>&1'))"
      1
    • Nix(Lnx(Ubtu)):

      [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp\" > /dev/null 2>&1'))"
      0
      [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp.notexist\" > /dev/null 2>&1'))"
      512

底线

  • 务必使用try / except / else / finally块,因为它们可以防止您遇到一连串麻烦的问题。我能想到的一个反面考量是性能:这类块的开销不小,所以尽量不要把它们放在每秒要执行数十万次的代码中(但由于(在大多数情况下)它涉及磁盘访问,实际上不会是这种情况)。

最后说明

  • 我会尽量让它保持最新;欢迎任何建议,我会把出现的所有有用内容吸收进这个答案

Although almost every possible way has been listed in (at least one of) the existing answers (e.g. Python 3.4 specific stuff was added), I’ll try to group everything together.

Note: every piece of Python standard library code that I’m going to post, belongs to version 3.5.3.

Problem statement:

  1. Check file (arguable: also folder (“special” file) ?) existence
  2. Don’t use try / except / else / finally blocks

Possible solutions:

  1. [Python 3]: os.path.exists(path) (also check other function family members like os.path.isfile, os.path.isdir, os.path.lexists for slightly different behaviors)

    os.path.exists(path)
    

    Return True if path refers to an existing path or an open file descriptor. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.

    All good, but if following the import tree:

    • os.pathposixpath.py (ntpath.py)

      • genericpath.py, line ~#20+

        def exists(path):
            """Test whether a path exists.  Returns False for broken symbolic links"""
            try:
                st = os.stat(path)
            except os.error:
                return False
            return True
        

    it’s just a try / except block around [Python 3]: os.stat(path, *, dir_fd=None, follow_symlinks=True). So, your code is try / except free, but lower in the framestack there’s (at least) one such block. This also applies to other funcs (including os.path.isfile).

    1.1. [Python 3]: Path.is_file()

    • It’s a fancier (and more pythonic) way of handling paths, but
    • Under the hood, it does exactly the same thing (pathlib.py, line ~#1330):

      def is_file(self):
          """
          Whether this path is a regular file (also True for symlinks pointing
          to regular files).
          """
          try:
              return S_ISREG(self.stat().st_mode)
          except OSError as e:
              if e.errno not in (ENOENT, ENOTDIR):
                  raise
              # Path doesn't exist or is a broken symlink
              # (see https://bitbucket.org/pitrou/pathlib/issue/12/)
              return False
      
  2. [Python 3]: With Statement Context Managers. Either:

    • Create one:

      class Swallow:  # Dummy example
          swallowed_exceptions = (FileNotFoundError,)
      
          def __enter__(self):
              print("Entering...")
      
          def __exit__(self, exc_type, exc_value, exc_traceback):
              print("Exiting:", exc_type, exc_value, exc_traceback)
              return exc_type in Swallow.swallowed_exceptions  # only swallow FileNotFoundError (not e.g. TypeError - if the user passes a wrong argument like None or float or ...)
      
      • And its usage – I’ll replicate the os.path.isfile behavior (note that this is just for demonstrating purposes, do not attempt to write such code for production):

        import os
        import stat
        
        
        def isfile_seaman(path):  # Dummy func
            result = False
            with Swallow():
                result = stat.S_ISREG(os.stat(path).st_mode)
            return result
        
    • Use [Python 3]: contextlib.suppress(*exceptions) – which was specifically designed for selectively suppressing exceptions


    But, they seem to be wrappers over try / except / else / finally blocks, as [Python 3]: The with statement states:

    This allows common tryexceptfinally usage patterns to be encapsulated for convenient reuse.

  3. Filesystem traversal functions (and search the results for matching item(s))


    Since these iterate over folders, (in most of the cases) they are inefficient for our problem (there are exceptions, like non wildcarded globbing – as @ShadowRanger pointed out), so I’m not going to insist on them. Not to mention that in some cases, filename processing might be required.

  4. [Python 3]: os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True) whose behavior is close to os.path.exists (actually it’s wider, mainly because of the 2nd argument)

    • user permissions might restrict the file “visibility” as the doc states:

      …test if the invoking user has the specified access to path. mode should be F_OK to test the existence of path…

    os.access("/tmp", os.F_OK)

    Since I also work in C, I use this method as well because under the hood, it calls native APIs (again, via “${PYTHON_SRC_DIR}/Modules/posixmodule.c”), but it also opens a gate for possible user errors, and it’s not as Pythonic as other variants. So, as @AaronHall rightly pointed out, don’t use it unless you know what you’re doing:

    Note: calling native APIs is also possible via [Python 3]: ctypes – A foreign function library for Python, but in most cases it’s more complicated.

    (Win specific): Since vcruntime* (msvcr*) .dll exports a [MS.Docs]: _access, _waccess function family as well, here’s an example:

    Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, ctypes
    >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe", os.F_OK)
    0
    >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe.notexist", os.F_OK)
    -1
    

    Notes:

    • Although it’s not a good practice, I’m using os.F_OK in the call, but that’s just for clarity (its value is 0)
    • I’m using _waccess so that the same code works on Python3 and Python2 (in spite of unicode related differences between them)
    • Although this targets a very specific area, it was not mentioned in any of the previous answers


    The Lnx (Ubtu (16 x64)) counterpart as well:

    Python 3.5.2 (default, Nov 17 2016, 17:05:23)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, ctypes
    >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp", os.F_OK)
    0
    >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp.notexist", os.F_OK)
    -1
    

    Notes:

    • Instead hardcoding libc‘s path (“/lib/x86_64-linux-gnu/libc.so.6”) which may (and most likely, will) vary across systems, None (or the empty string) can be passed to CDLL constructor (ctypes.CDLL(None).access(b"/tmp", os.F_OK)). According to [man7]: DLOPEN(3):

      If filename is NULL, then the returned handle is for the main program. When given to dlsym(), this handle causes a search for a symbol in the main program, followed by all shared objects loaded at program startup, and then all shared objects loaded by dlopen() with the flag RTLD_GLOBAL.

      • Main (current) program (python) is linked against libc, so its symbols (including access) will be loaded
      • This has to be handled with care, since functions like main, Py_Main and (all the) others are available; calling them could have disastrous effects (on the current program)
      • This doesn’t also apply to Win (but that’s not such a big deal, since msvcrt.dll is located in “%SystemRoot%\System32” which is in %PATH% by default). I wanted to take things further and replicate this behavior on Win (and submit a patch), but as it turns out, [MS.Docs]: GetProcAddress function only “sees” exported symbols, so unless someone declares the functions in the main executable as __declspec(dllexport) (why on Earth the regular person would do that?), the main program is loadable but pretty much unusable
  5. Install some third-party module with filesystem capabilities

    Most likely, will rely on one of the ways above (maybe with slight customizations).
    One example would be (again, Win specific) [GitHub]: mhammond/pywin32 – Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs.

    But, since this is more like a workaround, I’m stopping here.

  6. Another (lame) workaround (gainarie) is (as I like to call it,) the sysadmin approach: use Python as a wrapper to execute shell commands

    • Win:

      (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe\" > nul 2>&1'))"
      0
      
      (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe.notexist\" > nul 2>&1'))"
      1
      
    • Nix (Lnx (Ubtu)):

      [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp\" > /dev/null 2>&1'))"
      0
      [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp.notexist\" > /dev/null 2>&1'))"
      512
      

Bottom line:

  • Do use try / except / else / finally blocks, because they can prevent you running into a series of nasty problems. A counter-example that I can think of is performance: such blocks are costly, so try not to place them in code that is supposed to run hundreds of thousands of times per second (but since (in most cases) it involves disk access, it won’t be the case).

Final note(s):

  • I will try to keep it up to date, any suggestions are welcome, I will incorporate anything useful that will come up into the answer

回答 7

这是检查文件是否存在的最简单方法。仅仅因为文件在您检查时存在并不保证在您需要打开文件时该文件就会存在。

import os
fname = "foo.txt"
if os.path.isfile(fname):
    print("file does exist at this time")
else:
    print("no such file exists at this time")

This is the simplest way to check if a file exists. Just because the file existed when you checked doesn’t guarantee that it will be there when you need to open it.

import os
fname = "foo.txt"
if os.path.isfile(fname):
    print("file does exist at this time")
else:
    print("no such file exists at this time")

回答 8

Python 3.4+具有一个面向对象的路径模块:pathlib。使用这个新模块,您可以检查文件是否存在,如下所示:

import pathlib
p = pathlib.Path('path/to/file')
if p.is_file():  # or p.is_dir() to see if it is a directory
    # do stuff

在打开文件时,您仍然可以(而且通常应该)使用try/except块:

try:
    with p.open() as f:
        # do awesome stuff
except OSError:
    print('Well darn.')

pathlib模块中包含很多很棒的东西:方便的globing,检查文件的所有者,更容易的路径连接等。值得一试。如果您使用的是旧版Python(2.6版或更高版本),则仍可以使用pip安装pathlib:

# installs pathlib2 on older Python versions
# the original third-party module, pathlib, is no longer maintained.
pip install pathlib2

然后按如下所示导入它:

# Older Python versions
import pathlib2 as pathlib

Python 3.4+ has an object-oriented path module: pathlib. Using this new module, you can check whether a file exists like this:

import pathlib
p = pathlib.Path('path/to/file')
if p.is_file():  # or p.is_dir() to see if it is a directory
    # do stuff

You can (and usually should) still use a try/except block when opening files:

try:
    with p.open() as f:
        # do awesome stuff
except OSError:
    print('Well darn.')

The pathlib module has lots of cool stuff in it: convenient globbing, checking file’s owner, easier path joining, etc. It’s worth checking out. If you’re on an older Python (version 2.6 or later), you can still install pathlib with pip:

# installs pathlib2 on older Python versions
# the original third-party module, pathlib, is no longer maintained.
pip install pathlib2

Then import it as follows:

# Older Python versions
import pathlib2 as pathlib
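
For illustration, a small sketch of a few of the pathlib conveniences mentioned above (the paths are illustrative):

from pathlib import Path

base = Path('/tmp')
print(base / 'logs' / 'app.log')  # easy path joining with the / operator
print(list(base.glob('*.txt')))   # convenient globbing
# base.owner() returns the owning user's name (POSIX only)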

回答 9

首选try语句。它被认为是更好的风格,并且能避免竞争条件。

不要只听我说,这个观点有大量支持。这里有几个:

Prefer the try statement. It’s considered better style and avoids race conditions.

Don’t take my word for it. There’s plenty of support for this theory. Here’s a couple:


回答 10

如何在不使用try语句的情况下使用Python检查文件是否存在?

从Python 3.4开始,可以导入并用文件名实例化一个Path对象,然后检查is_file方法(请注意,对于指向常规文件的符号链接,此方法也会返回True):

>>> from pathlib import Path
>>> Path('/').is_file()
False
>>> Path('/initrd.img').is_file()
True
>>> Path('/doesnotexist').is_file()
False

如果您使用的是Python 2,可以从pypi安装pathlib模块的向后移植版pathlib2,或者改用os.path模块中的isfile进行检查:

>>> import os
>>> os.path.isfile('/')
False
>>> os.path.isfile('/initrd.img')
True
>>> os.path.isfile('/doesnotexist')
False

现在,上面可能是这里最好的务实直接答案,但仍有出现竞争条件的可能(取决于您要完成的任务),而且底层实现使用了try——不过Python在其实现中到处都在使用try。

因为Python到处都在使用try,所以实际上没有理由避免一个使用了try的实现。

但是此答案的其余部分试图考虑这些警告。

更长、更学究式的答案

自Python 3.4起,可以使用pathlib中新的Path对象。请注意,.exists并不完全符合要求,因为目录不是文件(虽然按Unix的说法,一切皆文件)。

>>> from pathlib import Path
>>> root = Path('/')
>>> root.exists()
True

所以我们需要使用is_file

>>> root.is_file()
False

这是is_file的帮助信息:

is_file(self)
    Whether this path is a regular file (also True for symlinks pointing
    to regular files).

因此,让我们获得一个我们知道是文件的文件:

>>> import tempfile
>>> file = tempfile.NamedTemporaryFile()
>>> filepathobj = Path(file.name)
>>> filepathobj.is_file()
True
>>> filepathobj.exists()
True

默认情况下,NamedTemporaryFile会在文件关闭时将其删除(当不再有任何引用时,它会自动关闭)。

>>> del file
>>> filepathobj.exists()
False
>>> filepathobj.is_file()
False

但是,如果您深入研究实现,会看到is_file使用了try:

def is_file(self):
    """
    Whether this path is a regular file (also True for symlinks pointing
    to regular files).
    """
    try:
        return S_ISREG(self.stat().st_mode)
    except OSError as e:
        if e.errno not in (ENOENT, ENOTDIR):
            raise
        # Path doesn't exist or is a broken symlink
        # (see https://bitbucket.org/pitrou/pathlib/issue/12/)
        return False

竞争条件:为什么我们喜欢try

我们喜欢try,因为它避免了竞争条件。使用try,您只需尝试读取文件,并预期它存在;如果不存在,就捕获异常并执行任何有意义的后备行为。

如果您想在尝试读取文件之前先检查它是否存在,而该文件又可能被删除——比如您在使用多个线程或进程,或者另一个程序知道该文件并可能删除它——那么就存在竞争条件的风险:在您确认它存在之后、打开它之前,它的状态(是否存在)可能已经改变。

竞争条件很难调试,因为能让它导致程序失败的时间窗口非常小。

但如果这是您的动机,您可以借助suppress上下文管理器来获得try语句的效果。

在不使用try语句的情况下避免竞争条件:suppress

Python 3.4为我们提供了suppress上下文管理器(之前叫ignore上下文管理器),它用更少的行数实现了语义上完全相同的事情,同时也(至少在表面上)满足了避免使用try语句的原始要求:

from contextlib import suppress
from pathlib import Path

用法:

>>> with suppress(OSError), Path('doesnotexist').open() as f:
...     for line in f:
...         print(line)
... 
>>>
>>> with suppress(OSError):
...     Path('doesnotexist').unlink()
... 
>>> 

对于更早的Python,您可以自己实现suppress,但不用try的版本会比用try的更冗长。我相信这实际上是唯一一个在任何层面都不使用try、并且能用于Python 3.4之前版本的答案,因为它用上下文管理器代替了try:

class suppress(object):
    def __init__(self, *exceptions):
        self.exceptions = exceptions
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            return issubclass(exc_type, self.exceptions)

用try来写可能更简单:

from contextlib import contextmanager

@contextmanager
def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

其他不满足“不用try”这一要求的选项:

isfile

import os
os.path.isfile(path)

来自文档:

os.path.isfile(path)

如果path是一个已存在的常规文件,则返回True。该函数会跟随符号链接,因此对同一路径,islink()和isfile()可能同时为真。

但是,如果您检查此函数的来源,您会发现它确实使用了try语句:

# This follows symbolic links, so both islink() and isdir() can be true
# for the same path on systems that support symlinks
def isfile(path):
    """Test whether a path is a regular file"""
    try:
        st = os.stat(path)
    except os.error:
        return False
    return stat.S_ISREG(st.st_mode)
>>> OSError is os.error
True

它所做的只是用给定的路径尝试获取stat信息,捕获OSError;如果没有引发异常,再检查它是否是一个文件。

如果您打算对文件进行某些操作,建议您使用try-except直接尝试它,以避免出现竞争情况:

try:
    with open(path) as f:
        f.read()
except OSError:
    pass

os.access

os.access在Unix和Windows上都可用,但使用时必须传入标志,而且它不区分文件和目录。它更多用于测试真实的调用用户在提权环境中是否具有访问权限:

import os
os.access(path, os.F_OK)

它也存在与isfile相同的竞争条件问题。来自文档:

注意:在实际用open()打开文件之前,先用access()检查用户是否有权打开它,会造成安全漏洞,因为用户可能利用检查与打开文件之间的短暂时间间隔来操纵它。最好使用EAFP技术。例如:

if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()
return "some default data"

最好写成:

try:
    fp = open("myfile")
except IOError as e:
    if e.errno == errno.EACCES:
        return "some default data"
    # Not a permission error.
    raise
else:
    with fp:
        return fp.read()

避免使用os.access。与上面讨论的更高级别的对象和函数相比,它是一个低级函数,给用户犯错留下了更多机会。

批评另一个答案:

另一个答案这样评价os.access:

就我个人而言,我更喜欢这个,因为它在底层调用了原生API(通过“${PYTHON_SRC_DIR}/Modules/posixmodule.c”),但它也为潜在的用户错误打开了大门,而且不像其他变体那样Python化:

这个答案说它毫无理由地偏爱一种非Pythonic且容易出错的方法,似乎在鼓励用户使用他们并不了解的低级API。

它还创建了一个上下文管理器,通过无条件返回True,它允许所有Exceptions(包括KeyboardInterruptSystemExit!)以静默方式传递,这是隐藏bug的好方法。

这似乎鼓励用户采用不良做法。

How do I check whether a file exists, using Python, without using a try statement?

Now available since Python 3.4, import and instantiate a Path object with the file name, and check the is_file method (note that this returns True for symlinks pointing to regular files as well):

>>> from pathlib import Path
>>> Path('/').is_file()
False
>>> Path('/initrd.img').is_file()
True
>>> Path('/doesnotexist').is_file()
False

If you’re on Python 2, you can backport the pathlib module from pypi, pathlib2, or otherwise check isfile from the os.path module:

>>> import os
>>> os.path.isfile('/')
False
>>> os.path.isfile('/initrd.img')
True
>>> os.path.isfile('/doesnotexist')
False

Now the above is probably the best pragmatic direct answer here, but there’s the possibility of a race condition (depending on what you’re trying to accomplish), and the fact that the underlying implementation uses a try, but Python uses try everywhere in its implementation.

Because Python uses try everywhere, there’s really no reason to avoid an implementation that uses it.

But the rest of this answer attempts to consider these caveats.

Longer, much more pedantic answer

Available since Python 3.4, use the new Path object in pathlib. Note that .exists is not quite right, because directories are not files (except in the unix sense that everything is a file).

>>> from pathlib import Path
>>> root = Path('/')
>>> root.exists()
True

So we need to use is_file:

>>> root.is_file()
False

Here’s the help on is_file:

is_file(self)
    Whether this path is a regular file (also True for symlinks pointing
    to regular files).

So let’s get a file that we know is a file:

>>> import tempfile
>>> file = tempfile.NamedTemporaryFile()
>>> filepathobj = Path(file.name)
>>> filepathobj.is_file()
True
>>> filepathobj.exists()
True

By default, NamedTemporaryFile deletes the file when closed (and will automatically close when no more references exist to it).

>>> del file
>>> filepathobj.exists()
False
>>> filepathobj.is_file()
False

If you dig into the implementation, though, you’ll see that is_file uses try:

def is_file(self):
    """
    Whether this path is a regular file (also True for symlinks pointing
    to regular files).
    """
    try:
        return S_ISREG(self.stat().st_mode)
    except OSError as e:
        if e.errno not in (ENOENT, ENOTDIR):
            raise
        # Path doesn't exist or is a broken symlink
        # (see https://bitbucket.org/pitrou/pathlib/issue/12/)
        return False

Race Conditions: Why we like try

We like try because it avoids race conditions. With try, you simply attempt to read your file, expecting it to be there, and if not, you catch the exception and perform whatever fallback behavior makes sense.

If you want to check that a file exists before you attempt to read it, and you might be deleting it and then you might be using multiple threads or processes, or another program knows about that file and could delete it – you risk the chance of a race condition if you check it exists, because you are then racing to open it before its condition (its existence) changes.

Race conditions are very hard to debug because there’s a very small window in which they can cause your program to fail.

But if this is your motivation, you can get the value of a try statement by using the suppress context manager.

Avoiding race conditions without a try statement: suppress

Python 3.4 gives us the suppress context manager (previously the ignore context manager), which does semantically exactly the same thing in fewer lines, while also (at least superficially) meeting the original ask to avoid a try statement:

from contextlib import suppress
from pathlib import Path

Usage:

>>> with suppress(OSError), Path('doesnotexist').open() as f:
...     for line in f:
...         print(line)
... 
>>>
>>> with suppress(OSError):
...     Path('doesnotexist').unlink()
... 
>>> 

For earlier Pythons, you could roll your own suppress, but a version without a try will be more verbose than one with it. I do believe this actually is the only answer that doesn’t use try at any level and that can be applied prior to Python 3.4, because it uses a context manager instead:

class suppress(object):
    def __init__(self, *exceptions):
        self.exceptions = exceptions
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            return issubclass(exc_type, self.exceptions)

Perhaps easier with a try:

from contextlib import contextmanager

@contextmanager
def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

Other options that don’t meet the ask for “without try”:

isfile

import os
os.path.isfile(path)

from the docs:

os.path.isfile(path)

Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.

But if you examine the source of this function, you’ll see it actually does use a try statement:

# This follows symbolic links, so both islink() and isdir() can be true
# for the same path on systems that support symlinks
def isfile(path):
    """Test whether a path is a regular file"""
    try:
        st = os.stat(path)
    except os.error:
        return False
    return stat.S_ISREG(st.st_mode)
>>> OSError is os.error
True

All it’s doing is using the given path to see if it can get stats on it, catching OSError and then checking if it’s a file if it didn’t raise the exception.

If you intend to do something with the file, I would suggest directly attempting it with a try-except to avoid a race condition:

try:
    with open(path) as f:
        f.read()
except OSError:
    pass

os.access

Available for Unix and Windows is os.access, but to use you must pass flags, and it does not differentiate between files and directories. This is more used to test if the real invoking user has access in an elevated privilege environment:

import os
os.access(path, os.F_OK)

It also suffers from the same race condition problems as isfile. From the docs:

Note: Using access() to check if a user is authorized to e.g. open a file before actually doing so using open() creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. It’s preferable to use EAFP techniques. For example:

if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()
return "some default data"

is better written as:

try:
    fp = open("myfile")
except IOError as e:
    if e.errno == errno.EACCES:
        return "some default data"
    # Not a permission error.
    raise
else:
    with fp:
        return fp.read()

Avoid using os.access. It is a low level function that has more opportunities for user error than the higher level objects and functions discussed above.

Criticism of another answer:

Another answer says this about os.access:

Personally, I prefer this one because under the hood, it calls native APIs (via “${PYTHON_SRC_DIR}/Modules/posixmodule.c”), but it also opens a gate for possible user errors, and it’s not as Pythonic as other variants:

This answer says it prefers a non-Pythonic, error-prone method, with no justification. It seems to encourage users to use low-level APIs without understanding them.

It also creates a context manager which, by unconditionally returning True, allows all Exceptions (including KeyboardInterrupt and SystemExit!) to pass silently, which is a good way to hide bugs.

This seems to encourage users to adopt poor practices.


回答 11

import os
#Your path here e.g. "C:\Program Files\text.txt"
#For access purposes: "C:\\Program Files\\text.txt"
if os.path.exists("C:\..."):   
    print "File found!"
else:
    print "File not found!"

导入os使您可以更轻松地在操作系统中导航和执行标准操作。

供参考,请参阅如何使用Python检查文件是否存在?

如果需要高级操作,请使用shutil

import os
#Your path here e.g. "C:\Program Files\text.txt"
#For access purposes: "C:\\Program Files\\text.txt"
if os.path.exists("C:\..."):   
    print "File found!"
else:
    print "File not found!"

Importing os makes it easier to navigate and perform standard actions with your operating system.

For reference also see How to check whether a file exists using Python?

If you need high-level operations, use shutil.
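
For example, a couple of the high-level operations shutil provides (the file names are illustrative):

import shutil

shutil.copy('source.txt', 'backup.txt')        # copy a file (data and permission bits)
shutil.move('backup.txt', '/tmp/backup.txt')   # move/rename, even across filesystems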


回答 12

用os.path.isfile()、os.path.isdir()和os.path.exists()测试文件和文件夹

假定“路径”是有效路径,此表显示了每个函数对文件和文件夹返回的内容:

                     文件      文件夹
os.path.isfile()     True      False
os.path.isdir()      False     True
os.path.exists()     True      True

您还可以用os.path.splitext()获取扩展名(如果您还不知道的话),来测试文件是否属于某种特定类型:

>>> import os
>>> path = "path to a word document"
>>> os.path.isfile(path)
True
>>> os.path.splitext(path)[1] == ".docx" # test if the extension is .docx
True

Testing for files and folders with os.path.isfile(), os.path.isdir() and os.path.exists()

Assuming that the “path” is a valid path, this table shows what is returned by each function for files and folders:

                     file      folder
os.path.isfile()     True      False
os.path.isdir()      False     True
os.path.exists()     True      True

You can also test if a file is a certain type of file using os.path.splitext() to get the extension (if you don’t already know it)

>>> import os
>>> path = "path to a word document"
>>> os.path.isfile(path)
True
>>> os.path.splitext(path)[1] == ".docx" # test if the extension is .docx
True

回答 13

在2016年,最好的方法仍然是使用os.path.isfile

>>> os.path.isfile('/path/to/some/file.txt')

或者在Python 3中,您可以使用pathlib

import pathlib
path = pathlib.Path('/path/to/some/file.txt')
if path.is_file():
    ...

In 2016 the best way is still using os.path.isfile:

>>> os.path.isfile('/path/to/some/file.txt')

Or in Python 3 you can use pathlib:

import pathlib
path = pathlib.Path('/path/to/some/file.txt')
if path.is_file():
    ...

回答 14

try / except与isfile()之间似乎没有实质性的功能差异,因此您应该选用在具体情境下更说得通的那个。

如果要读取文件(如果存在),请执行

try:
    f = open(filepath)
except IOError:
    print 'Oh dear.'

但是,如果您只是想重命名文件(如果存在),因此不需要打开它,请执行

if os.path.isfile(filepath):
    os.rename(filepath, filepath + '.old')

如果想在文件不存在时再写入文件,请执行

# python 2
if not os.path.isfile(filepath):
    f = open(filepath, 'w')

# python 3: 'x' opens for exclusive creation, failing if the file already exists
try:
    f = open(filepath, 'x')
except FileExistsError:
    print('file already exists')

如果您需要文件锁定,那是另一回事。

It doesn’t seem like there’s a meaningful functional difference between try/except and isfile(), so you should use which one makes sense.

If you want to read a file, if it exists, do

try:
    f = open(filepath)
except IOError:
    print 'Oh dear.'

But if you just wanted to rename a file if it exists, and therefore don’t need to open it, do

if os.path.isfile(filepath):
    os.rename(filepath, filepath + '.old')

If you want to write to a file, if it doesn’t exist, do

# python 2
if not os.path.isfile(filepath):
    f = open(filepath, 'w')

# python 3: 'x' opens for exclusive creation, failing if the file already exists
try:
    f = open(filepath, 'x')
except FileExistsError:
    print('file already exists')

If you need file locking, that’s a different matter.


回答 15

您可以尝试这样做(更安全):

try:
    # http://effbot.org/zone/python-with-statement.htm
    # 'with' is safer to open a file
    with open('whatever.txt') as fh:
        pass  # Do something with 'fh'
except IOError as e:
    print("({})".format(e))

输出为:

([Errno 2]没有这样的文件或目录:’whatever.txt’)

然后,根据结果,您的程序可以仅从那里继续运行,也可以编写代码以停止它。

You could try this (safer):

try:
    # http://effbot.org/zone/python-with-statement.htm
    # 'with' is safer to open a file
    with open('whatever.txt') as fh:
        pass  # Do something with 'fh'
except IOError as e:
    print("({})".format(e))

The output would be:

([Errno 2] No such file or directory: ‘whatever.txt’)

Then, depending on the result, your program can just keep running from there or you can code to stop it if you want.


回答 16

尽管我总是建议使用tryexcept语句,但是这里有几种可能(我个人最喜欢使用os.access):

  1. 尝试打开文件:

    打开文件将始终验证文件是否存在。您可以像下面这样创建一个函数:

    def File_Existence(filepath):
        f = open(filepath)
        return True

    如果文件不存在,它将以未处理的IOError(在较新版本的Python中为OSError)停止执行。要捕获该异常,您必须使用try / except子句。当然,您也总是可以像这样使用try / except语句(感谢hsandt让我想到这一点):

    def File_Existence(filepath):
        try:
            f = open(filepath)
        except (IOError, OSError): # Note OSError is for later versions of Python
            return False
    
        return True
  2. 用途os.path.exists(path)

    这将检查您指定的内容是否存在。但是,它会检查文件目录,因此请注意如何使用它们。

    import os.path
    >>> os.path.exists("this/is/a/directory")
    True
    >>> os.path.exists("this/is/a/file.txt")
    True
    >>> os.path.exists("not/a/directory")
    False
  3. 用途os.access(path, mode)

    这将检查您是否有权访问该文件,也就是检查权限。根据os文档,传入os.F_OK时,它会检查路径是否存在。但是,这样做会造成安全漏洞,因为攻击者可以利用检查权限与打开文件之间的时间间隔来攻击您的文件。您应该直接去打开文件,而不是先检查它的权限(EAFP与LBYL)。如果您之后并不打算打开该文件,只是检查它是否存在,那么可以使用它。

    无论如何,在这里:

    >>> import os
    >>> os.access("/is/a/file.txt", os.F_OK)
    True

我还应该提到,有两种情况会让您无法验证文件的存在:问题要么是permission denied(权限被拒绝),要么是no such file or directory(没有该文件或目录)。如果您捕获到IOError,请将它设为IOError as e(像我的第一个选项那样),然后用print(e.args)来帮助定位问题。希望对您有所帮助!:)

Although I always recommend using try and except statements, here are a few possibilities for you (my personal favourite is using os.access):

  1. Try opening the file:

    Opening the file will always verify the existence of the file. You can make a function just like so:

    def File_Existence(filepath):
        f = open(filepath)
        return True
    

    If the file doesn’t exist, it will stop execution with an unhandled IOError (or OSError in later versions of Python). To catch the exception, you have to use a try/except clause. Of course, you can always use a try/except statement like so (thanks to hsandt for making me think):

    def File_Existence(filepath):
        try:
            f = open(filepath)
        except (IOError, OSError): # Note OSError is for later versions of Python
            return False
    
        return True
    
  2. Use os.path.exists(path):

    This will check the existence of what you specify. However, it checks for files and directories so beware about how you use it.

    import os.path
    >>> os.path.exists("this/is/a/directory")
    True
    >>> os.path.exists("this/is/a/file.txt")
    True
    >>> os.path.exists("not/a/directory")
    False
    
  3. Use os.access(path, mode):

    This will check whether you have access to the file; that is, it checks permissions. Based on the os documentation, passing in os.F_OK will check the existence of the path. However, using this creates a security hole, as someone can attack your file using the time between checking the permissions and opening the file. You should instead go directly to opening the file rather than checking its permissions first (EAFP vs LBYL). If you’re not going to open the file afterwards, and are only checking its existence, then you can use this.

    Anyway, here:

    >>> import os
    >>> os.access("/is/a/file.txt", os.F_OK)
    True
    

I should also mention that there are two ways that you will not be able to verify the existence of a file. Either the issue will be permission denied or no such file or directory. If you catch an IOError, set the IOError as e (like my first option), and then type in print(e.args) so that you can hopefully determine your issue. I hope it helps! :)


回答 17

日期:2017-12-04

每种可能的解决方案都已在其他答案中列出。

一种直观(尽管可以商榷)的检查文件是否存在的方法如下:

import os
os.path.isfile('~/file.md')  # Returns True if exists, else False
# additionally check a dir
os.path.isdir('~/folder')  # Returns True if the folder exists, else False
# check either a dir or a file
os.path.exists('~/file')

我做了详尽的备忘单供您参考:

#os.path methods in exhaustive cheatsheet
{'definition': ['dirname',
               'basename',
               'abspath',
               'relpath',
               'commonpath',
               'normpath',
               'realpath'],
'operation': ['split', 'splitdrive', 'splitext',
               'join', 'normcase'],
'compare': ['samefile', 'sameopenfile', 'samestat'],
'condition': ['isdir',
              'isfile',
              'exists',
              'lexists',
              'islink',
              'isabs',
              'ismount',],
 'expand': ['expanduser',
            'expandvars'],
 'stat': ['getatime', 'getctime', 'getmtime',
          'getsize']}

Date:2017-12-04

Every possible solution has been listed in other answers.

An intuitive and arguable way to check if a file exists is the following:

import os
os.path.isfile('~/file.md')  # Returns True if exists, else False
# additionally check a dir
os.path.isdir('~/folder')  # Returns True if the folder exists, else False
# check either a dir or a file
os.path.exists('~/file')

I made an exhaustive cheatsheet for your reference:

#os.path methods in exhaustive cheatsheet
{'definition': ['dirname',
               'basename',
               'abspath',
               'relpath',
               'commonpath',
               'normpath',
               'realpath'],
'operation': ['split', 'splitdrive', 'splitext',
               'join', 'normcase'],
'compare': ['samefile', 'sameopenfile', 'samestat'],
'condition': ['isdir',
              'isfile',
              'exists',
              'lexists',
              'islink',
              'isabs',
              'ismount',],
 'expand': ['expanduser',
            'expandvars'],
 'stat': ['getatime', 'getctime', 'getmtime',
          'getsize']}
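
One usage note for the snippets above: none of these functions expand '~' automatically, so a home-relative check needs os.path.expanduser first. A minimal sketch:

import os

path = os.path.expanduser('~/file.md')  # expand '~' to the home directory
print(os.path.isfile(path))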

回答 18

如果检查文件是为了打开它,您可以使用以下技术之一:

with open('somefile', 'xt') as f: #Using the x-flag, Python3.3 and above
    f.write('Hello\n')

if not os.path.exists('somefile'): 
    with open('somefile', 'wt') as f:
        f.write("Hello\n")
else:
    print('File already exists!')

更新

为了避免混淆,并基于我收到的回复:当前答案会找到具有给定名称的文件或目录。

If the file is to be opened, you could use one of the following techniques:

with open('somefile', 'xt') as f: #Using the x-flag, Python3.3 and above
    f.write('Hello\n')

if not os.path.exists('somefile'): 
    with open('somefile', 'wt') as f:
        f.write("Hello\n")
else:
    print('File already exists!')

UPDATE

Just to avoid confusion, and based on the answers I got: the current answer finds either a file or a directory with the given name.


回答 19

另外,os.access()

if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()

其中R_OK、W_OK和X_OK是用于测试权限的标志(文档)。

Additionally, os.access():

if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()

R_OK, W_OK, and X_OK being the flags to test for permissions (doc).


回答 20

if os.path.isfile(path_to_file):
    try:
        open(path_to_file).close()
    except IOError as e:
        print("Unable to open file:", e)

引发异常被认为是程序中流控制的可接受且Pythonic的方法。考虑使用IOErrors处理丢失的文件。在这种情况下,如果文件存在但用户没有读取权限,则将引发IOError异常。

来源：http://www.pfinn.net/python-check-if-file-exists.html

if os.path.isfile(path_to_file):
    try:
        open(path_to_file).close()
    except IOError as e:
        print("Unable to open file:", e)

Raising exceptions is considered to be an acceptable, and Pythonic, approach for flow control in your program. Consider handling missing files with IOErrors. In this situation, an IOError exception will be raised if the file exists but the user does not have read permissions.

SRC: http://www.pfinn.net/python-check-if-file-exists.html


回答 21

如果已经为其他用途导入了NumPy，就不必再导入pathlib、os、paths等其他库。

import numpy as np
np.DataSource().exists("path/to/your/file")

这将根据其存在返回true或false。

If you imported NumPy already for other purposes then there is no need to import other libraries like pathlib, os, paths, etc.

import numpy as np
np.DataSource().exists("path/to/your/file")

This will return true or false based on its existence.


回答 22

您可以不显式地写try块来实现Brian的建议：

from contextlib import suppress

with suppress(IOError), open('filename'):
    process()

suppress是Python 3.4的一部分。在较早的版本中，您可以快速编写自己的suppress：

from contextlib import contextmanager

@contextmanager
def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

You can write Brian’s suggestion without the try:.

from contextlib import suppress

with suppress(IOError), open('filename'):
    process()

suppress is part of Python 3.4. In older releases you can quickly write your own suppress:

from contextlib import contextmanager

@contextmanager
def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

回答 23

我是一个已存在约十年的软件包的作者，它有一个可以直接解决这个问题的函数。基本上，在非Windows系统上，它使用Popen来访问find；在Windows上，它用一个高效的文件系统遍历器来模拟find。

代码本身不使用try块……除了在确定操作系统、从而决定采用“Unix”风格的find还是手工构建的find时。计时测试表明，用try判断操作系统更快，所以我确实在那里用了它（但其他地方没有）。

>>> import pox
>>> pox.find('*python*', type='file', root=pox.homedir(), recurse=False)
['/Users/mmckerns/.python']

还有文档……

>>> print pox.find.__doc__
find(patterns[,root,recurse,type]); Get path to a file or directory

    patterns: name or partial name string of items to search for
    root: path string of top-level directory to search
    recurse: if True, recurse down from root directory
    type: item filter; one of {None, file, dir, link, socket, block, char}
    verbose: if True, be a little verbose about the search

    On some OS, recursion can be specified by recursion depth (an integer).
    patterns can be specified with basic pattern matching. Additionally,
    multiple patterns can be specified by splitting patterns with a ';'
    For example:
        >>> find('pox*', root='..')
        ['/Users/foo/pox/pox', '/Users/foo/pox/scripts/pox_launcher.py']

        >>> find('*shutils*;*init*')
        ['/Users/foo/pox/pox/shutils.py', '/Users/foo/pox/pox/__init__.py']

>>>

如果您愿意看一下的话，可以在这里找到实现：https://github.com/uqfoundation/pox/blob/89f90fb308f285ca7a62eabe2c38acb87e89dad9/pox/shutils.py#L190

I’m the author of a package that’s been around for about 10 years, and it has a function that addresses this question directly. Basically, if you are on a non-Windows system, it uses Popen to access find. However, if you are on Windows, it replicates find with an efficient filesystem walker.

The code itself does not use a try block… except in determining the operating system and thus steering you to the “Unix”-style find or the hand-built find. Timing tests showed that the try was faster in determining the OS, so I did use one there (but nowhere else).

>>> import pox
>>> pox.find('*python*', type='file', root=pox.homedir(), recurse=False)
['/Users/mmckerns/.python']

And the doc…

>>> print pox.find.__doc__
find(patterns[,root,recurse,type]); Get path to a file or directory

    patterns: name or partial name string of items to search for
    root: path string of top-level directory to search
    recurse: if True, recurse down from root directory
    type: item filter; one of {None, file, dir, link, socket, block, char}
    verbose: if True, be a little verbose about the search

    On some OS, recursion can be specified by recursion depth (an integer).
    patterns can be specified with basic pattern matching. Additionally,
    multiple patterns can be specified by splitting patterns with a ';'
    For example:
        >>> find('pox*', root='..')
        ['/Users/foo/pox/pox', '/Users/foo/pox/scripts/pox_launcher.py']

        >>> find('*shutils*;*init*')
        ['/Users/foo/pox/pox/shutils.py', '/Users/foo/pox/pox/__init__.py']

>>>

The implementation, if you care to look, is here: https://github.com/uqfoundation/pox/blob/89f90fb308f285ca7a62eabe2c38acb87e89dad9/pox/shutils.py#L190


回答 24

检查文件或目录是否存在

您可以遵循以下三种方式:

注意1:os.path.isfile仅用于文件

import os.path
os.path.isfile(filename) # True if file exists
os.path.isfile(dirname) # False if directory exists

注意2:os.path.exists用于文件和目录

import os.path
os.path.exists(filename) # True if file exists
os.path.exists(dirname) #True if directory exists

pathlib.Path方法(包含在Python 3+中,可通过pip安装在Python 2中)

from pathlib import Path
Path(filename).exists()

Check file or directory exists

You can follow these three ways:

Note1: The os.path.isfile used only for files

import os.path
os.path.isfile(filename) # True if file exists
os.path.isfile(dirname) # False if directory exists

Note2: The os.path.exists used for both files and directories

import os.path
os.path.exists(filename) # True if file exists
os.path.exists(dirname) #True if directory exists

The pathlib.Path method (included in Python 3+, installable with pip for Python 2)

from pathlib import Path
Path(filename).exists()
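
If you need the same file/directory distinction with pathlib, Path also provides is_file() and is_dir():

from pathlib import Path
Path(filename).is_file()  # True only for an existing regular file
Path(dirname).is_dir()    # True only for an existing directory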

回答 25

再补充一个其他答案未完全涵盖的细微变体。

这可以处理file_path为None或空字符串的情况。

def file_exists(file_path):
    if not file_path:
        return False
    elif not os.path.isfile(file_path):
        return False
    else:
        return True

根据Shahbaz的建议添加变体

def file_exists(file_path):
    if not file_path:
        return False
    else:
        return os.path.isfile(file_path)

根据Peter Wood的建议添加变体

def file_exists(file_path):
    return file_path and os.path.isfile(file_path)

Adding one more slight variation which isn’t exactly reflected in the other answers.

This will handle the case of the file_path being None or empty string.

def file_exists(file_path):
    if not file_path:
        return False
    elif not os.path.isfile(file_path):
        return False
    else:
        return True

Adding a variant based on suggestion from Shahbaz

def file_exists(file_path):
    if not file_path:
        return False
    else:
        return os.path.isfile(file_path)

Adding a variant based on suggestion from Peter Wood

def file_exists(file_path):
    return file_path and os.path.isfile(file_path)

回答 26

这是用于Linux命令行环境的1行Python命令。我觉得这个非常好,因为我不是一个很酷的Bash家伙。

python -c "import os.path; print(os.path.isfile('/path_to/file.xxx'))"

我希望这是有帮助的。

Here’s a 1 line Python command for the Linux command line environment. I find this VERY HANDY since I’m not such a hot Bash guy.

python -c "import os.path; print(os.path.isfile('/path_to/file.xxx'))"

I hope this is helpful.


回答 27

您可以使用Python的“ OS”库:

>>> import os
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.txt") 
True
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.tx")
False

You can use the “OS” library of Python:

>>> import os
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.txt") 
True
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.tx")
False

回答 28

如何在不使用try语句的情况下检查文件是否存在?

在2016年,可以说这仍然是检查文件是否存在和是否是文件的最简单方法:

import os
os.path.isfile('./file.txt')    # Returns True if exists, else False

isfile实际上只是一个辅助方法，内部使用了os.stat和stat.S_ISREG(mode)。os.stat是一个更底层的方法，可以为您提供有关文件、目录、套接字、缓冲区等的详细信息。更多关于os.stat的信息见文档。

注意:但是,这种方法不会以任何方式锁定文件,因此您的代码可能容易受到“ 检查时间到使用时间 ”(TOCTTOU)错误的攻击

因此,引发异常被认为是程序中流控制的可接受且Pythonic的方法。而且,应该考虑使用IOErrors处理丢失的文件,而不是使用if语句(只是建议)。

How do I check whether a file exists, without using the try statement?

In 2016, this is still arguably the easiest way to check if both a file exists and if it is a file:

import os
os.path.isfile('./file.txt')    # Returns True if exists, else False

isfile is actually just a helper method that uses os.stat and stat.S_ISREG(mode) underneath. os.stat is a lower-level method that will provide you with detailed information about files, directories, sockets, buffers, and more. More about os.stat here

Note: However, this approach will not lock the file in any way and therefore your code can become vulnerable to “time of check to time of use” (TOCTTOU) bugs.

So raising exceptions is considered to be an acceptable, and Pythonic, approach for flow control in your program. And one should consider handling missing files with IOErrors rather than if statements (just a suggestion); a sketch of that style follows.
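
As a minimal sketch of that EAFP style (the filename is illustrative):

try:
    with open('./file.txt') as f:
        contents = f.read()
except IOError as e:
    # Covers both "no such file" and "permission denied",
    # with no TOCTTOU window between check and use
    print('Could not read file:', e)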


回答 29

import os.path

def isReadableFile(file_path, file_name):
    full_path = os.path.join(file_path, file_name)
    try:
        if not os.path.exists(file_path):
            print("File path is invalid.")
            return False
        elif not os.path.isfile(full_path):
            print("File does not exist.")
            return False
        elif not os.access(full_path, os.R_OK):
            print("File cannot be read.")
            return False
        else:
            print("File can be read.")
            return True
    except OSError as ex:
        print("I/O error({0}): {1}".format(ex.errno, ex.strerror))
    return False
#------------------------------------------------------

path = "/usr/khaled/documents/puzzles"
fileName = "puzzle_1.txt"

isReadableFile(path, fileName)
import os.path

def isReadableFile(file_path, file_name):
    full_path = os.path.join(file_path, file_name)
    try:
        if not os.path.exists(file_path):
            print("File path is invalid.")
            return False
        elif not os.path.isfile(full_path):
            print("File does not exist.")
            return False
        elif not os.access(full_path, os.R_OK):
            print("File cannot be read.")
            return False
        else:
            print("File can be read.")
            return True
    except OSError as ex:
        print("I/O error({0}): {1}".format(ex.errno, ex.strerror))
    return False
#------------------------------------------------------

path = "/usr/khaled/documents/puzzles"
fileName = "puzzle_1.txt"

isReadableFile(path, fileName)

从Python调用外部命令

问题:从Python调用外部命令

您如何在Python脚本中调用外部命令(就像我在Unix Shell或Windows命令提示符下键入的一样)?

How do you call an external command (as if I’d typed it at the Unix shell or Windows command prompt) from within a Python script?


回答 0

查看标准库中的子流程模块:

import subprocess
subprocess.run(["ls", "-l"])

相比system，subprocess的优势在于它更灵活（您可以获得stdout、stderr、“真实的”状态码、更好的错误处理等等）。

官方文档建议使用subprocess模块来替代os.system()：

subprocess模块提供了更强大的功能来生成新进程并检索其结果。使用该模块优于使用此函数[os.system()]。

subprocess文档中的“用subprocess模块替换旧函数”一节可能包含一些有用的示例。

对于3.5之前的Python版本,请使用call

import subprocess
subprocess.call(["ls", "-l"])

Look at the subprocess module in the standard library:

import subprocess
subprocess.run(["ls", "-l"])

The advantage of subprocess vs. system is that it is more flexible (you can get the stdout, stderr, the “real” status code, better error handling, etc…).

The official documentation recommends the subprocess module over the alternative os.system():

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function [os.system()].

The Replacing Older Functions with the subprocess Module section in the subprocess documentation may have some helpful recipes.

For versions of Python before 3.5, use call:

import subprocess
subprocess.call(["ls", "-l"])

回答 1

下面总结了调用外部程序的方法以及每种方法的优缺点:

  1. os.system("some_command with args")将命令和参数传递给系统的外壳。这很好，因为您实际上可以用这种方式一次运行多个命令，并设置管道和输入/输出重定向。例如：

    os.system("some_command < input_file | another_command > output_file")  

但是，尽管这样很方便，您必须手动处理空格等外壳字符的转义。另一方面，这也让您可以运行那些只是外壳命令而非真正外部程序的命令。请参阅文档。

  2. stream = os.popen("some_command with args")的作用与os.system相同，不同之处在于它会给您一个类似文件的对象，您可以用它访问该进程的标准输入/输出。popen还有其他3种变体，它们对I/O的处理略有不同。如果您把所有内容作为一个字符串传递，命令会交给外壳执行；如果作为列表传递，则无需担心任何转义。请参阅文档。

  3. subprocess模块的Popen类。它旨在替代os.popen，但缺点是功能过于全面因而稍显复杂。例如，您要写：

    print subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE).stdout.read()

    代替:

    print os.popen("echo Hello World").read()

    但是把所有选项放在一个统一的类中，而不是4个不同的popen函数里，是件好事。请参阅文档。

  4. subprocess模块的call函数。它基本上与Popen类相同，接受所有相同的参数，但它只是等待命令完成并给您返回码。例如：

    return_code = subprocess.call("echo Hello World", shell=True)  

    请参阅文档

  5. 如果您使用的是Python 3.5或更高版本，则可以使用新的subprocess.run函数，它与上面的非常相似，但更加灵活，并在命令执行完成后返回一个CompletedProcess对象。

  6. os模块还具有您在C程序中会用到的所有fork/exec/spawn函数，但我不建议直接使用它们。

subprocess模块应该是您的首选。

最后请注意：对于所有把最终要由外壳执行的命令作为字符串传递的方法，您都要负责对其进行转义。如果传递的字符串中有任何部分不能被完全信任（例如用户输入了字符串的某些部分），就会有严重的安全隐患。如果不确定，请仅将这些方法与常量一起使用。为了让您体会其中的含义，请考虑以下代码：

print subprocess.Popen("echo %s " % user_input, stdout=PIPE).stdout.read()

并想象用户输入了 "my mama didnt love me && rm -rf /"，这可能会擦除整个文件系统。

Here’s a summary of the ways to call external programs and the advantages and disadvantages of each:

  1. os.system("some_command with args") passes the command and arguments to your system’s shell. This is nice because you can actually run multiple commands at once in this manner and set up pipes and input/output redirection. For example:

    os.system("some_command < input_file | another_command > output_file")  
    

However, while this is convenient, you have to manually handle the escaping of shell characters such as spaces, etc. On the other hand, this also lets you run commands which are simply shell commands and not actually external programs. See the documentation.

  2. stream = os.popen("some_command with args") will do the same thing as os.system except that it gives you a file-like object that you can use to access standard input/output for that process. There are 3 other variants of popen that all handle the i/o slightly differently. If you pass everything as a string, then your command is passed to the shell; if you pass them as a list then you don’t need to worry about escaping anything. See the documentation.

  3. The Popen class of the subprocess module. This is intended as a replacement for os.popen but has the downside of being slightly more complicated by virtue of being so comprehensive. For example, you’d say:

    print subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE).stdout.read()
    

    instead of:

    print os.popen("echo Hello World").read()
    

    but it is nice to have all of the options there in one unified class instead of 4 different popen functions. See the documentation.

  4. The call function from the subprocess module. This is basically just like the Popen class and takes all of the same arguments, but it simply waits until the command completes and gives you the return code. For example:

    return_code = subprocess.call("echo Hello World", shell=True)  
    

    See the documentation.

  5. If you’re on Python 3.5 or later, you can use the new subprocess.run function, which is a lot like the above but even more flexible and returns a CompletedProcess object when the command finishes executing.

  6. The os module also has all of the fork/exec/spawn functions that you’d have in a C program, but I don’t recommend using them directly.

The subprocess module should probably be what you use.

Finally, please be aware that for all methods where you pass the final command to be executed by the shell as a string, you are responsible for escaping it. There are serious security implications if any part of the string that you pass cannot be fully trusted, for example, if a user is entering some or any part of the string. If you are unsure, only use these methods with constants. To give you a hint of the implications, consider this code (a safer pattern is sketched after the example):

print subprocess.Popen("echo %s " % user_input, stdout=PIPE).stdout.read()

and imagine that the user enters something like “my mama didnt love me && rm -rf /”, which could erase the whole filesystem.
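
As a hedged sketch of the safer pattern: pass the arguments as a list (so no shell is involved), or escape untrusted input with shlex.quote if a shell is unavoidable. The hostile user_input is taken from the example above:

import shlex
import subprocess

user_input = "my mama didnt love me && rm -rf /"  # hostile input

# Safe: no shell involved; the whole string is a single argument to echo
subprocess.call(['echo', user_input])

# If shell features are genuinely needed, quote untrusted parts first
subprocess.call('echo %s' % shlex.quote(user_input), shell=True)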


回答 2

典型的实现:

import subprocess

p = subprocess.Popen('ls', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in p.stdout.readlines():
    print(line.decode(), end='')
retval = p.wait()

您可以随意使用stdout管道中的数据进行所需的操作。实际上,您可以简单地省略这些参数(stdout=stderr=),其行为类似于os.system()

Typical implementation:

import subprocess

p = subprocess.Popen('ls', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in p.stdout.readlines():
    print(line.decode(), end='')
retval = p.wait()

You are free to do what you want with the stdout data in the pipe. In fact, you can simply omit those parameters (stdout= and stderr=) and it’ll behave like os.system().


回答 3

关于从调用者中分离子进程的一些提示(在后台启动子进程)。

假设您要从CGI脚本开始一个长任务。也就是说,子进程的生存期应比CGI脚本执行进程的生存期长。

subprocess模块文档中的经典示例是：

import subprocess
import sys

# Some code here

pid = subprocess.Popen([sys.executable, "longtask.py"]) # Call subprocess

# Some more code here

这里的想法是，您不想停在“调用子进程”那一行等待longtask.py完成。但示例中“这里还有更多代码”那一行之后会发生什么并不清楚。

我的目标平台是FreeBSD,但是开发是在Windows上进行的,因此我首先在Windows上遇到了问题。

在Windows(Windows XP)上,直到longtask.py完成工作后,父进程才会完成。这不是CGI脚本中想要的。这个问题不是特定于Python的。在PHP社区中,问题是相同的。

解决方案是将DETACHED_PROCESS进程创建标志传递给Windows API中底层的CreateProcess函数。如果碰巧安装了pywin32，则可以从win32process模块中导入该标志，否则您应该自己定义它：

DETACHED_PROCESS = 0x00000008

pid = subprocess.Popen([sys.executable, "longtask.py"],
                       creationflags=DETACHED_PROCESS).pid

/* 更新 2015.10.27：@eryksun在下面的评论中指出，语义上正确的标志是CREATE_NEW_CONSOLE (0x00000010) */

在FreeBSD上,我们还有另一个问题:父进程完成后,它也会完成子进程。那也不是您在CGI脚本中想要的。一些实验表明,问题似乎出在共享sys.stdout。可行的解决方案如下:

pid = subprocess.Popen([sys.executable, "longtask.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)

我没有在其他平台上检查过这段代码，也不知道FreeBSD上这种行为的原因。如果有人知道，请分享您的想法。搜索“在Python中启动后台进程”目前还没有带来任何线索。

Some hints on detaching the child process from the calling one (starting the child process in background).

Suppose you want to start a long task from a CGI script. That is, the child process should live longer than the CGI script execution process.

The classical example from the subprocess module documentation is:

import subprocess
import sys

# Some code here

pid = subprocess.Popen([sys.executable, "longtask.py"]) # Call subprocess

# Some more code here

The idea here is that you do not want to wait in the line ‘call subprocess’ until the longtask.py is finished. But it is not clear what happens after the line ‘some more code here’ from the example.

My target platform was FreeBSD, but the development was on Windows, so I faced the problem on Windows first.

On Windows (Windows XP), the parent process will not finish until the longtask.py has finished its work. It is not what you want in a CGI script. The problem is not specific to Python; in the PHP community the problems are the same.

The solution is to pass DETACHED_PROCESS Process Creation Flag to the underlying CreateProcess function in Windows API. If you happen to have installed pywin32, you can import the flag from the win32process module, otherwise you should define it yourself:

DETACHED_PROCESS = 0x00000008

pid = subprocess.Popen([sys.executable, "longtask.py"],
                       creationflags=DETACHED_PROCESS).pid

/* UPD 2015.10.27: @eryksun in a comment below notes that the semantically correct flag is CREATE_NEW_CONSOLE (0x00000010) */

On FreeBSD we have another problem: when the parent process is finished, it finishes the child processes as well. And that is not what you want in a CGI script either. Some experiments showed that the problem seemed to be in sharing sys.stdout. And the working solution was the following:

pid = subprocess.Popen([sys.executable, "longtask.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)

I have not checked the code on other platforms and do not know the reasons of the behaviour on FreeBSD. If anyone knows, please share your ideas. Googling on starting background processes in Python does not shed any light yet.


回答 4

import os
os.system("your command")

请注意，这很危险，因为命令没有经过任何清理。‘os’和‘sys’模块的相关文档就留给您自己去搜索了。有很多函数（exec*和spawn*）可以做类似的事情。

import os
os.system("your command")

Note that this is dangerous, since the command isn’t cleaned. I leave it up to you to google for the relevant documentation on the ‘os’ and ‘sys’ modules. There are a bunch of functions (exec* and spawn*) that will do similar things.


回答 5

我建议使用subprocess模块而不是os.system，因为它会为您处理shell转义，因此更加安全。

subprocess.call(['ping', 'localhost'])

I’d recommend using the subprocess module instead of os.system because it does shell escaping for you and is therefore much safer.

subprocess.call(['ping', 'localhost'])

回答 6

import os
cmd = 'ls -al'
os.system(cmd)

如果要获取命令的结果，可以使用os.popen。但是，自2.6版起已不推荐使用它，而应使用subprocess模块，其他答案已经很好地介绍了它。

import os
cmd = 'ls -al'
os.system(cmd)

If you want to return the results of the command, you can use os.popen. However, this is deprecated since version 2.6 in favor of the subprocess module, which other answers have covered well.
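
As a sketch of the subprocess equivalent for capturing output (same command as above):

import subprocess

# check_output runs the command and returns its stdout as bytes
output = subprocess.check_output(['ls', '-al'])
print(output.decode())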


回答 7

有很多不同的库可以让您在Python中调用外部命令。对于每个库，我都给出了描述并展示了调用外部命令的示例。我用作示例的命令是ls -l（列出所有文件）。如果您想进一步了解任何一个库，我已为每个库列出并链接了文档。

希望这可以帮助您决定使用哪个库:)

subprocess

subprocess允许您调用外部命令，并连接到它们的输入/输出/错误管道（stdin、stdout和stderr）。subprocess是运行命令的默认选择，但有时其他模块更好。

subprocess.run(["ls", "-l"]) # Run command
subprocess.run(["ls", "-l"], stdout=subprocess.PIPE) # This will run the command and return any output
subprocess.run(shlex.split("ls -l")) # You can also use the shlex library to split the command

os

os用于“依赖于操作系统的功能”。它也可以通过os.system和os.popen来调用外部命令（注意：还有一个subprocess.popen）。os总是会运行外壳，对于不需要或不知道如何使用subprocess.run的人来说，它是一个简单的选择。

os.system("ls -l") # run command
os.popen("ls -l").read() # This will run the command and return any output

sh

sh是一个subprocess接口，让您可以像调用函数一样调用程序。如果要多次运行某个命令，这很有用。

sh.ls("-l") # Run command normally
ls_cmd = sh.Command("ls") # Save command as a variable
ls_cmd() # Run command as if it were a function

plumbum

plumbum是用于“类脚本”Python程序的库。您可以像在sh中那样把程序当作函数来调用。如果您想在没有外壳的情况下运行管道，plumbum很有用。

ls_cmd = plumbum.local("ls -l") # get command
ls_cmd() # run command

pexpect

pexpect让您可以生成子应用程序、控制它们并在其输出中查找模式。对于在Unix上需要tty的命令，它是比subprocess更好的选择。

pexpect.run("ls -l") # Run command as normal
child = pexpect.spawn('scp foo user@example.com:.') # Spawns child application
child.expect('Password:') # When this is the output
child.sendline('mypassword')

fabric

fabric是一个支持Python 2.5和2.7的库。它允许您执行本地和远程shell命令。fabric是在安全外壳（SSH）中运行命令的简单替代方案。

fabric.operations.local('ls -l') # Run command as normal
fabric.operations.local('ls -l', capture = True) # Run command and receive output

envoy

envoy被称为“为人类设计的subprocess”。它是subprocess模块的便利包装器。

r = envoy.run("ls -l") # Run command
r.std_out # get output

commands

commands包含os.popen的包装函数，但由于subprocess是更好的选择，它已从Python 3中移除。

该编辑基于JF Sebastian的评论。

There are lots of different libraries which allow you to call external commands with Python. For each library I’ve given a description and shown an example of calling an external command. The command I used as the example is ls -l (list all files). If you want to find out more about any of the libraries I’ve listed and linked the documentation for each of them.

Hopefully this will help you make a decision on which library to use :)

subprocess

Subprocess allows you to call external commands and connect them to their input/output/error pipes (stdin, stdout, and stderr). Subprocess is the default choice for running commands, but sometimes other modules are better.

subprocess.run(["ls", "-l"]) # Run command
subprocess.run(["ls", "-l"], stdout=subprocess.PIPE) # This will run the command and return any output
subprocess.run(shlex.split("ls -l")) # You can also use the shlex library to split the command

os

os is used for “operating system dependent functionality”. It can also be used to call external commands with os.system and os.popen (Note: There is also a subprocess.popen). os will always run the shell and is a simple alternative for people who don’t need to, or don’t know how to use subprocess.run.

os.system("ls -l") # run command
os.popen("ls -l").read() # This will run the command and return any output

sh

sh is a subprocess interface which lets you call programs as if they were functions. This is useful if you want to run a command multiple times.

sh.ls("-l") # Run command normally
ls_cmd = sh.Command("ls") # Save command as a variable
ls_cmd() # Run command as if it were a function

plumbum

plumbum is a library for “script-like” Python programs. You can call programs like functions as in sh. Plumbum is useful if you want to run a pipeline without the shell.

ls_cmd = plumbum.local("ls -l") # get command
ls_cmd() # run command

pexpect

pexpect lets you spawn child applications, control them and find patterns in their output. This is a better alternative to subprocess for commands that expect a tty on Unix.

pexpect.run("ls -l") # Run command as normal
child = pexpect.spawn('scp foo user@example.com:.') # Spawns child application
child.expect('Password:') # When this is the output
child.sendline('mypassword')

fabric

fabric is a Python 2.5 and 2.7 library. It allows you to execute local and remote shell commands. Fabric is a simple alternative for running commands in a secure shell (SSH).

fabric.operations.local('ls -l') # Run command as normal
fabric.operations.local('ls -l', capture = True) # Run command and receive output

envoy

envoy is known as “subprocess for humans”. It is used as a convenience wrapper around the subprocess module.

r = envoy.run("ls -l") # Run command
r.std_out # get output

commands

commands contains wrapper functions for os.popen, but it has been removed from Python 3 since subprocess is a better alternative.

The edit was based on J.F. Sebastian’s comment.


回答 8

我总是用fabric来做这类事情：

from fabric.operations import local
result = local('ls', capture=True)
print "Content:/n%s" % (result, )

但这似乎是一个很好的工具:sh(Python子进程接口)

看一个例子:

from sh import vgdisplay
print vgdisplay()
print vgdisplay('-v')
print vgdisplay(v=True)

I always use fabric for this things like:

from fabric.operations import local
result = local('ls', capture=True)
print "Content:/n%s" % (result, )

But this seem to be a good tool: sh (Python subprocess interface).

Look at an example:

from sh import vgdisplay
print vgdisplay()
print vgdisplay('-v')
print vgdisplay(v=True)

回答 9

还可以看看“pexpect” Python库。

它允许交互式地控制外部程序/命令，甚至包括ssh、ftp、telnet等。您只需输入如下内容：

child = pexpect.spawn('ftp 192.168.0.24')

child.expect('(?i)name .*: ')

child.sendline('anonymous')

child.expect('(?i)password')

Check the “pexpect” Python library, too.

It allows for interactive controlling of external programs/commands, even ssh, ftp, telnet, etc. You can just type something like:

child = pexpect.spawn('ftp 192.168.0.24')

child.expect('(?i)name .*: ')

child.sendline('anonymous')

child.expect('(?i)password')

回答 10

使用标准库

使用子流程模块(Python 3):

import subprocess
subprocess.run(['ls', '-l'])

这是推荐的标准方式。但是，更复杂的任务（管道、输出、输入等）构造和编写起来可能很繁琐。

关于Python版本的注意事项:如果您仍在使用Python 2,subprocess.call的工作方式与此类似。

专业提示：如果您不想（或者不能！）以列表形式提供命令，shlex.split可以帮助您为run、call和其他subprocess函数解析命令：

import shlex
import subprocess
subprocess.run(shlex.split('ls -l'))

使用外部依赖

如果您不介意外部依赖性,请使用plumbum

from plumbum.cmd import ifconfig
print(ifconfig['wlan0']())

这是最好的subprocess包装器。它是跨平台的，即在Windows和类Unix系统上都能工作。通过pip install plumbum安装。

另一个受欢迎的库是sh：

from sh import ifconfig
print(ifconfig('wlan0'))

但是，sh已放弃Windows支持，因此它不如以前那么出色了。通过pip install sh安装。

With the standard library

Use the subprocess module (Python 3):

import subprocess
subprocess.run(['ls', '-l'])

It is the recommended standard way. However, more complicated tasks (pipes, output, input, etc.) can be tedious to construct and write.

Note on Python version: If you are still using Python 2, subprocess.call works in a similar way.

ProTip: shlex.split can help you to parse the command for run, call, and other subprocess functions in case you don’t want (or you can’t!) provide them in form of lists:

import shlex
import subprocess
subprocess.run(shlex.split('ls -l'))

With external dependencies

If you do not mind external dependencies, use plumbum:

from plumbum.cmd import ifconfig
print(ifconfig['wlan0']())

It is the best subprocess wrapper. It’s cross-platform, i.e. it works on both Windows and Unix-like systems. Install by pip install plumbum.

Another popular library is sh:

from sh import ifconfig
print(ifconfig('wlan0'))

However, sh dropped Windows support, so it’s not as awesome as it used to be. Install by pip install sh.


回答 11

如果您需要所调用命令的输出，可以使用subprocess.check_output（Python 2.7+）。

>>> subprocess.check_output(["ls", "-l", "/dev/null"])
'crw-rw-rw- 1 root root 1, 3 Oct 18  2007 /dev/null\n'

还要注意shell参数。

如果shell为True，指定的命令将通过外壳执行。如果您主要使用Python来获得它相对于大多数系统外壳的增强控制流，同时仍希望方便地使用其他外壳功能（例如外壳管道、文件名通配符、环境变量扩展，以及把〜扩展为用户主目录），这会很有用。但请注意，Python本身提供了许多类似外壳功能的实现（尤其是glob、fnmatch、os.walk()、os.path.expandvars()、os.path.expanduser()和shutil）。

If you need the output from the command you are calling, then you can use subprocess.check_output (Python 2.7+).

>>> subprocess.check_output(["ls", "-l", "/dev/null"])
'crw-rw-rw- 1 root root 1, 3 Oct 18  2007 /dev/null\n'

Also note the shell parameter.

If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory. However, note that Python itself offers implementations of many shell-like features (in particular, glob, fnmatch, os.walk(), os.path.expandvars(), os.path.expanduser(), and shutil).
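
For example, a hedged sketch of replacing a shell wildcard with glob, so that shell=True is not needed at all:

import glob
import subprocess

# Instead of: subprocess.run('ls -l *.py', shell=True)
subprocess.run(['ls', '-l'] + glob.glob('*.py'))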


回答 12

这就是我运行命令的方式。这段代码几乎包含了您需要的一切：

from subprocess import Popen, PIPE

cmd = "ls -l ~/"
p = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()
print("Return code:", p.returncode)
print(out.decode().rstrip(), err.decode().rstrip())

This is how I run my commands. This code has everything you need pretty much

from subprocess import Popen, PIPE

cmd = "ls -l ~/"
p = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()
print("Return code:", p.returncode)
print(out.decode().rstrip(), err.decode().rstrip())

回答 13

更新:

如果您的代码不需要保持与早期Python版本的兼容性，那么从Python 3.5开始，建议使用subprocess.run。它更加一致，并提供与Envoy类似的易用性。（不过管道操作没有那么直接。如何实现请参见这个问题。）

以下是文档中的一些示例。

运行一个过程:

>>> subprocess.run(["ls", "-l"])  # Doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)

在运行失败时引发异常：

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

捕获输出:

>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')

原始答案:

我建议尝试Envoy。它是subprocess的包装器，而subprocess又旨在替换更旧的模块和函数。Envoy是为人类设计的subprocess。

自述文件中的示例用法:

>>> r = envoy.run('git config', data='data to pipe in', timeout=2)

>>> r.status_code
129
>>> r.std_out
'usage: git config [options]'
>>> r.std_err
''

也可以进行管道操作：

>>> r = envoy.run('uptime | pbcopy')

>>> r.command
'pbcopy'
>>> r.status_code
0

>>> r.history
[<Response 'uptime'>]

Update:

subprocess.run is the recommended approach as of Python 3.5 if your code does not need to maintain compatibility with earlier Python versions. It’s more consistent and offers similar ease-of-use as Envoy. (Piping isn’t as straightforward though. See this question for how.)
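
As one hedged sketch of emulating a pipe with subprocess.run, feeding the first command's output to the second via input=:

import subprocess

# Roughly equivalent to the shell pipeline: ls -l | grep py
ls = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE)
grep = subprocess.run(['grep', 'py'], input=ls.stdout, stdout=subprocess.PIPE)
print(grep.stdout.decode())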

Here’s some examples from the documentation.

Run a process:

>>> subprocess.run(["ls", "-l"])  # Doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)

Raise on failed run:

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

Capture output:

>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')

Original answer:

I recommend trying Envoy. It’s a wrapper for subprocess, which in turn aims to replace the older modules and functions. Envoy is subprocess for humans.

Example usage from the README:

>>> r = envoy.run('git config', data='data to pipe in', timeout=2)

>>> r.status_code
129
>>> r.std_out
'usage: git config [options]'
>>> r.std_err
''

Pipe stuff around too:

>>> r = envoy.run('uptime | pbcopy')

>>> r.command
'pbcopy'
>>> r.status_code
0

>>> r.history
[<Response 'uptime'>]

回答 14

使用子过程

…或一个非常简单的命令:

import os
os.system('cat testfile')

Use subprocess.

…or for a very simple command:

import os
os.system('cat testfile')

回答 15

在Python中调用外部命令

很简单：使用subprocess.run，它返回一个CompletedProcess对象：

>>> import subprocess
>>> completed_process = subprocess.run('python --version')
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
>>> completed_process
CompletedProcess(args='python --version', returncode=0)

为什么?

从Python 3.5开始,文档建议使用subprocess.run

推荐的调用子进程的方法是对其能处理的所有用例使用run()函数。更高级的用例可以直接使用底层的Popen接口。

这是最简单的用法示例——它完全按照问题要求来做：

>>> import subprocess
>>> completed_process = subprocess.run('python --version')
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
>>> completed_process
CompletedProcess(args='python --version', returncode=0)

run等待命令成功完成，然后返回一个CompletedProcess对象。它也可能引发TimeoutExpired（如果您给它timeout=参数）或CalledProcessError（如果命令失败且您传递了check=True）。

从上面的示例可以推断,默认情况下,stdout和stderr都通过管道传递到您自己的stdout和stderr。

我们可以检查返回的对象,并查看给出的命令和返回码:

>>> completed_process.args
'python --version'
>>> completed_process.returncode
0

捕获输出

如果要捕获输出，可以将subprocess.PIPE传给相应的stderr或stdout参数：

>>> cp = subprocess.run('python --version', 
                        stderr=subprocess.PIPE, 
                        stdout=subprocess.PIPE)
>>> cp.stderr
b'Python 3.6.1 :: Anaconda 4.4.0 (64-bit)\r\n'
>>> cp.stdout
b''

(我发现将版本信息放入stderr而不是stdout很有意思,并且有点违反直觉。)

传递命令列表

人们很容易从手动提供命令字符串（如问题所示）转向提供以编程方式构建的字符串。不要以编程方式构建字符串。这是一个潜在的安全问题。最好假设您不信任输入。

>>> import textwrap
>>> args = ['python', textwrap.__file__]
>>> cp = subprocess.run(args, stdout=subprocess.PIPE)
>>> cp.stdout
b'Hello there.\r\n  This is indented.\r\n'

注意，只有args应该按位置传递。

完整签名

这是源代码中的实际签名，也是help(run)所显示的：

def run(*popenargs, input=None, timeout=None, check=False, **kwargs):

popenargskwargs被给予Popen的构造。input可以是universal_newlines=True将通过管道传输到子进程的stdin 的字节字符串(或unicode,如果指定encoding或)。

文档对timeout=和check=True的描述比我能写的更好：

超时参数传递给Popen.communicate()。如果超时到期,子进程将被终止并等待。子进程终止后,将重新引发TimeoutExpired异常。

如果check为true,并且进程以非零退出代码退出,则将引发CalledProcessError异常。该异常的属性包含参数,退出代码以及stdout和stderr(如果已捕获)。

而这个check=True的示例也比我能想到的更好：

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

扩展签名

这是文档中提供的扩展签名:

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, 
shell=False, cwd=None, timeout=None, check=False, encoding=None, 
errors=None)

请注意,这表明只应在位置传递args列表。因此,将其余参数作为关键字参数传递。

Popen

何时改用Popen？仅从参数来看，我很难找到用例。但是，直接使用Popen可以访问它的方法，包括poll、send_signal、terminate和wait。

这是源代码中给出的Popen签名。我认为这是对这些信息最精确的概括（相对于help(Popen)而言）：

def __init__(self, args, bufsize=-1, executable=None,
             stdin=None, stdout=None, stderr=None,
             preexec_fn=None, close_fds=_PLATFORM_DEFAULT_CLOSE_FDS,
             shell=False, cwd=None, env=None, universal_newlines=False,
             startupinfo=None, creationflags=0,
             restore_signals=True, start_new_session=False,
             pass_fds=(), *, encoding=None, errors=None):

不过Popen文档提供了更多信息：

subprocess.Popen(args, bufsize=-1, executable=None, stdin=None,
                 stdout=None, stderr=None, preexec_fn=None, close_fds=True,
                 shell=False, cwd=None, env=None, universal_newlines=False,
                 startupinfo=None, creationflags=0, restore_signals=True,
                 start_new_session=False, pass_fds=(), *, encoding=None, errors=None)

在新进程中执行子程序。在POSIX上,该类使用类似于os.execvp()的行为来执行子程序。在Windows上,该类使用Windows CreateProcess()函数。Popen的参数如下。

Popen文档的其余内容留给读者作为练习。

Calling an external command in Python

Simple, use subprocess.run, which returns a CompletedProcess object:

>>> import subprocess
>>> completed_process = subprocess.run('python --version')
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
>>> completed_process
CompletedProcess(args='python --version', returncode=0)

Why?

As of Python 3.5, the documentation recommends subprocess.run:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Here’s an example of the simplest possible usage – and it does exactly as asked:

>>> import subprocess
>>> completed_process = subprocess.run('python --version')
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
>>> completed_process
CompletedProcess(args='python --version', returncode=0)

run waits for the command to successfully finish, then returns a CompletedProcess object. It may instead raise TimeoutExpired (if you give it a timeout= argument) or CalledProcessError (if it fails and you pass check=True).

As you might infer from the above example, stdout and stderr both get piped to your own stdout and stderr by default.

We can inspect the returned object and see the command that was given and the returncode:

>>> completed_process.args
'python --version'
>>> completed_process.returncode
0

Capturing output

If you want to capture the output, you can pass subprocess.PIPE to the appropriate stderr or stdout:

>>> cp = subprocess.run('python --version', 
                        stderr=subprocess.PIPE, 
                        stdout=subprocess.PIPE)
>>> cp.stderr
b'Python 3.6.1 :: Anaconda 4.4.0 (64-bit)\r\n'
>>> cp.stdout
b''

(I find it interesting and slightly counterintuitive that the version info gets put to stderr instead of stdout.)

Pass a command list

One might easily move from manually providing a command string (like the question suggests) to providing a string built programmatically. Don’t build strings programmatically. This is a potential security issue. It’s better to assume you don’t trust the input.

>>> import textwrap
>>> args = ['python', textwrap.__file__]
>>> cp = subprocess.run(args, stdout=subprocess.PIPE)
>>> cp.stdout
b'Hello there.\r\n  This is indented.\r\n'

Note, only args should be passed positionally.

Full Signature

Here’s the actual signature in the source and as shown by help(run):

def run(*popenargs, input=None, timeout=None, check=False, **kwargs):

The popenargs and kwargs are given to the Popen constructor. input can be a string of bytes (or unicode, if specify encoding or universal_newlines=True) that will be piped to the subprocess’s stdin.

The documentation describes timeout= and check=True better than I could:

The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.

If check is true, and the process exits with a non-zero exit code, a CalledProcessError exception will be raised. Attributes of that exception hold the arguments, the exit code, and stdout and stderr if they were captured.

and this example for check=True is better than one I could come up with:

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1
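
To illustrate the input= parameter described above, here's a minimal sketch (assuming grep is available) that pipes bytes to the child's stdin:

>>> cp = subprocess.run(['grep', 'py'], input=b'python\nruby\n',
...                     stdout=subprocess.PIPE)
>>> cp.stdout
b'python\n'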

Expanded Signature

Here’s an expanded signature, as given in the documentation:

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, 
shell=False, cwd=None, timeout=None, check=False, encoding=None, 
errors=None)

Note that this indicates that only the args list should be passed positionally. So pass the remaining arguments as keyword arguments.

Popen

When use Popen instead? I would struggle to find use-case based on the arguments alone. Direct usage of Popen would, however, give you access to its methods, including poll, ‘send_signal’, ‘terminate’, and ‘wait’.

Here’s the Popen signature as given in the source. I think this is the most precise encapsulation of the information (as opposed to help(Popen)):

def __init__(self, args, bufsize=-1, executable=None,
             stdin=None, stdout=None, stderr=None,
             preexec_fn=None, close_fds=_PLATFORM_DEFAULT_CLOSE_FDS,
             shell=False, cwd=None, env=None, universal_newlines=False,
             startupinfo=None, creationflags=0,
             restore_signals=True, start_new_session=False,
             pass_fds=(), *, encoding=None, errors=None):

But more informative is the Popen documentation:

subprocess.Popen(args, bufsize=-1, executable=None, stdin=None,
                 stdout=None, stderr=None, preexec_fn=None, close_fds=True,
                 shell=False, cwd=None, env=None, universal_newlines=False,
                 startupinfo=None, creationflags=0, restore_signals=True,
                 start_new_session=False, pass_fds=(), *, encoding=None, errors=None)

Execute a child program in a new process. On POSIX, the class uses os.execvp()-like behavior to execute the child program. On Windows, the class uses the Windows CreateProcess() function. The arguments to Popen are as follows.

Understanding the remaining documentation on Popen will be left as an exercise for the reader.


回答 16

os.system可以用，但有点过时了，而且也不太安全。请改用subprocess。subprocess不会直接调用sh，因此比os.system更安全。

在此处获取更多信息。

os.system is OK, but kind of dated. It’s also not very secure. Instead, try subprocess. subprocess does not call sh directly and is therefore more secure than os.system.

Get more information here.


回答 17

还有Plumbum：

>>> from plumbum import local
>>> ls = local["ls"]
>>> ls
LocalCommand(<LocalPath /bin/ls>)
>>> ls()
u'build.py\ndist\ndocs\nLICENSE\nplumbum\nREADME.rst\nsetup.py\ntests\ntodo.txt\n'
>>> notepad = local["c:\\windows\\notepad.exe"]
>>> notepad()                                   # Notepad window pops up
u''                                             # Notepad window is closed by user, command returns

There is also Plumbum

>>> from plumbum import local
>>> ls = local["ls"]
>>> ls
LocalCommand(<LocalPath /bin/ls>)
>>> ls()
u'build.py\ndist\ndocs\nLICENSE\nplumbum\nREADME.rst\nsetup.py\ntests\ntodo.txt\n'
>>> notepad = local["c:\\windows\\notepad.exe"]
>>> notepad()                                   # Notepad window pops up
u''                                             # Notepad window is closed by user, command returns

回答 18

使用：

import os

cmd = 'ls -al'

os.system(cmd)

os——此模块提供了使用与操作系统相关的功能的可移植方式。

有关更多os函数，请参阅这里的文档。

Use:

import os

cmd = 'ls -al'

os.system(cmd)

os – This module provides a portable way of using operating system-dependent functionality.

For the more os functions, here is the documentation.


回答 19

它可以很简单：

import os
cmd = "your command"
os.system(cmd)

It can be this simple:

import os
cmd = "your command"
os.system(cmd)

回答 20

我非常喜欢shell_command的简单性。它构建在subprocess模块之上。

这是文档中的示例:

>>> from shell_command import shell_call
>>> shell_call("ls *.py")
setup.py  shell_command.py  test_shell_command.py
0
>>> shell_call("ls -l *.py")
-rw-r--r-- 1 ncoghlan ncoghlan  391 2011-12-11 12:07 setup.py
-rw-r--r-- 1 ncoghlan ncoghlan 7855 2011-12-11 16:16 shell_command.py
-rwxr-xr-x 1 ncoghlan ncoghlan 8463 2011-12-11 16:17 test_shell_command.py
0

I quite like shell_command for its simplicity. It’s built on top of the subprocess module.

Here’s an example from the documentation:

>>> from shell_command import shell_call
>>> shell_call("ls *.py")
setup.py  shell_command.py  test_shell_command.py
0
>>> shell_call("ls -l *.py")
-rw-r--r-- 1 ncoghlan ncoghlan  391 2011-12-11 12:07 setup.py
-rw-r--r-- 1 ncoghlan ncoghlan 7855 2011-12-11 16:16 shell_command.py
-rwxr-xr-x 1 ncoghlan ncoghlan 8463 2011-12-11 16:17 test_shell_command.py
0

回答 21

这里还有另一个以前没有提到的区别。

subprocess.Popen将<command>作为子进程执行。就我而言,我需要执行文件<a>,该文件需要与另一个程序<b>通信。

我尝试了subprocess，执行成功。但是<b>无法与<a>通信。而当我在终端中运行两者时，一切正常。

还有一个:(注意:kwrite的行为与其他应用程序不同。如果在Firefox上尝试以下操作,结果将有所不同。)

如果尝试os.system("kwrite")，程序流程会冻结，直到用户关闭kwrite。为了解决这个问题，我改为尝试os.system("konsole -e kwrite")。这次程序继续运行，但kwrite变成了konsole的子进程。

有没有人知道如何运行kwrite，使它不是一个子进程（即在系统监视器中，它必须显示在进程树的最左侧）？

There is another difference here which is not mentioned previously.

subprocess.Popen executes the <command> as a subprocess. In my case, I need to execute file <a> which needs to communicate with another program, <b>.

I tried subprocess, and execution was successful. However <b> could not communicate with <a>. Everything is normal when I run both from the terminal.

One more: (NOTE: kwrite behaves different from other applications. If you try the below with Firefox, the results will not be the same.)

If you try os.system("kwrite"), program flow freezes until the user closes kwrite. To overcome that I tried instead os.system("konsole -e kwrite"). This time the program continued to flow, but kwrite became a subprocess of the konsole.

Does anyone know how to run kwrite so that it is not a subprocess (i.e., in the system monitor it must appear at the leftmost edge of the tree)?


回答 22

os.system不会把命令的输出返回给您，因此如果您想把结果存进列表或其他结构，可以使用subprocess模块中的函数，例如subprocess.check_output。

os.system does not let you capture results, so if you want to store results in some list or something, a subprocess function such as subprocess.check_output works.
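
For example, a sketch (assuming a Unix-like system) that stores command output in a list:

import subprocess

results = []
for cmd in (['hostname'], ['whoami']):
    # check_output returns the command's stdout as bytes
    results.append(subprocess.check_output(cmd).decode().strip())
print(results)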


回答 23

如果您不想检查返回值，subprocess.check_call非常方便。出现任何错误时它都会引发异常。

subprocess.check_call is convenient if you don’t want to test return values. It throws an exception on any error.
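
A minimal sketch (on a Unix-like system):

import subprocess

subprocess.check_call(['ls', '-l'])  # Returns 0 on success
subprocess.check_call(['false'])     # Raises CalledProcessError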


回答 24

我倾向于将subprocess与shlex一起使用（以处理带引号字符串的转义）：

>>> import subprocess, shlex
>>> command = 'ls -l "/your/path/with spaces/"'
>>> call_params = shlex.split(command)
>>> print call_params
["ls", "-l", "/your/path/with spaces/"]
>>> subprocess.call(call_params)

I tend to use subprocess together with shlex (to handle escaping of quoted strings):

>>> import subprocess, shlex
>>> command = 'ls -l "/your/path/with spaces/"'
>>> call_params = shlex.split(command)
>>> print call_params
["ls", "-l", "/your/path/with spaces/"]
>>> subprocess.call(call_params)

回答 25

无耻地自荐一下：我为此写了一个库 :P https://github.com/houqp/shell.py

目前,它基本上是popen和shlex的包装。它还支持管道命令,因此您可以在Python中更轻松地链接命令。因此,您可以执行以下操作:

ex('echo hello shell.py') | "awk '{print $2}'"

Shameless plug, I wrote a library for this :P https://github.com/houqp/shell.py

It’s basically a wrapper for popen and shlex for now. It also supports piping commands so you can chain commands easier in Python. So you can do things like:

ex('echo hello shell.py') | "awk '{print $2}'"

回答 26

在Windows中，您可以导入subprocess模块，通过调用subprocess.Popen()、subprocess.Popen().communicate()和subprocess.Popen().wait()来运行外部命令，如下所示：

# Python script to run a command line
import subprocess

def execute(cmd):
    """
        Purpose  : To execute a command and return exit status
        Argument : cmd - command to execute
        Return   : exit_code
    """
    process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (result, error) = process.communicate()

    rc = process.wait()

    if rc != 0:
        print("Error: failed to execute command:", cmd)
        print(error)
    return result
# def

command = "tasklist | grep python"
print "This process detail: \n", execute(command)

输出:

This process detail:
python.exe                     604 RDP-Tcp#0                  4      5,660 K

In Windows you can just import the subprocess module and run external commands by calling subprocess.Popen(), subprocess.Popen().communicate() and subprocess.Popen().wait() as below:

# Python script to run a command line
import subprocess

def execute(cmd):
    """
        Purpose  : To execute a command and return exit status
        Argument : cmd - command to execute
        Return   : exit_code
    """
    process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (result, error) = process.communicate()

    rc = process.wait()

    if rc != 0:
        print("Error: failed to execute command:", cmd)
        print(error)
    return result
# def

command = "tasklist | grep python"
print "This process detail: \n", execute(command)

Output:

This process detail:
python.exe                     604 RDP-Tcp#0                  4      5,660 K

回答 27

在Linux下，如果您想调用一个独立执行的外部命令（在python脚本终止后继续运行），可以使用一个简单的任务队列，例如task spooler或at命令。

任务假脱机程序的示例:

import os
os.system('ts <your-command>')

关于task spooler（ts）的注意事项：

  1. 您可以使用以下命令设置要运行的并发进程数(“插槽”):

    ts -S <number-of-slots>

  2. 安装ts不需要管理员权限。您可以下载源代码并用一个简单的make编译，把它加入PATH即可完成。

Under Linux, in case you would like to call an external command that will execute independently (will keep running after the python script terminates), you can use a simple queue as task spooler or the at command

An example with task spooler:

import os
os.system('ts <your-command>')

Notes about task spooler (ts):

  1. You could set the number of concurrent processes to be run (“slots”) with:

    ts -S <number-of-slots>

  2. Installing ts doesn’t require admin privileges. You can download and compile it from source with a simple make, add it to your path and you’re done.


回答 28

您可以使用Popen，然后检查进程的状态：

from subprocess import Popen

proc = Popen(['ls', '-l'])
if proc.poll() is None:
    proc.kill()

请参阅subprocess.Popen。

You can use Popen, and then you can check the procedure’s status:

from subprocess import Popen

proc = Popen(['ls', '-l'])
if proc.poll() is None:
    proc.kill()

Check out subprocess.Popen.


回答 29

要从OpenStack Neutron获取网络ID :

#!/usr/bin/python
import os
netid = "nova net-list | awk '/ External / { print $2 }'"
temp = os.popen(netid).read()  # Here temp also contains a newline (\n)
networkId = temp.rstrip()
print(networkId)

nova net-list的输出

+--------------------------------------+------------+------+
| ID                                   | Label      | CIDR |
+--------------------------------------+------------+------+
| 431c9014-5b5d-4b51-a357-66020ffbb123 | test1      | None |
| 27a74fcd-37c0-4789-9414-9531b7e3f126 | External   | None |
| 5a2712e9-70dc-4b0e-9281-17e02f4684c9 | management | None |
| 7aa697f5-0e60-4c15-b4cc-9cb659698512 | Internal   | None |
+--------------------------------------+------------+------+

print(networkId)的输出

27a74fcd-37c0-4789-9414-9531b7e3f126

To fetch the network id from the OpenStack Neutron:

#!/usr/bin/python
import os
netid = "nova net-list | awk '/ External / { print $2 }'"
temp = os.popen(netid).read()  # Here temp also contains a newline (\n)
networkId = temp.rstrip()
print(networkId)

Output of nova net-list

+--------------------------------------+------------+------+
| ID                                   | Label      | CIDR |
+--------------------------------------+------------+------+
| 431c9014-5b5d-4b51-a357-66020ffbb123 | test1      | None |
| 27a74fcd-37c0-4789-9414-9531b7e3f126 | External   | None |
| 5a2712e9-70dc-4b0e-9281-17e02f4684c9 | management | None |
| 7aa697f5-0e60-4c15-b4cc-9cb659698512 | Internal   | None |
+--------------------------------------+------------+------+

Output of print(networkId)

27a74fcd-37c0-4789-9414-9531b7e3f126