How do I merge two dictionaries in a single expression in Python?

Question: How do I merge two dictionaries in a single expression in Python?

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged. The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{'a': 1, 'b': 10, 'c': 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I’m looking for as well.)


Answer 0

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly merged dictionary with values from y replacing those from x.

  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2 (or 3.4 or lower), write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with x's keys and values
        z.update(y)    # modifies z with y's keys and values & returns None
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    
  • In Python 3.9.0a4 or greater (final release October 2020), PEP 584 was implemented to further simplify this:

    z = x | y          # NOTE: 3.9+ ONLY
    

Explanation

Say you have two dicts and you want to merge them into a new dict without altering the original dicts:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dict’s values overwriting those from the first.

>>> z
{'a': 1, 'b': 3, 'c': 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, 'foo': 1, 'bar': 2, **y}

and now:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

This feature is listed as implemented in the release schedule for Python 3.5 (PEP 478), and it is documented in What’s New in Python 3.5.

However, since many organizations are still on Python 2, you may wish to do this in a backwards compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x’s values, thus 'b' will point to 3 in our final result.
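As a minimal sketch of what "shallow" means for both approaches (the 'tags' key and its list value are illustrative additions, not part of the original example): each approach builds a new dict and leaves x and y untouched, but mutable values are shared with the source dicts rather than copied.

```python
x = {'a': 1, 'b': 2, 'tags': ['old']}   # 'tags' is an illustrative extra key
y = {'b': 3, 'c': 4}

# Approach 1: dict unpacking (Python 3.5+)
z1 = {**x, **y}

# Approach 2: copy, then update (works on older versions too)
z2 = x.copy()
z2.update(y)

print(z1 == z2)            # both approaches give the same mapping
print(x['b'], z1['b'])     # x keeps its own value for 'b'; z1 has y's

# Shallow merge: the list object is shared, not copied.
z1['tags'].append('new')
print(x['tags'])           # the change is visible through x as well
```

If independent copies of nested values are needed, copy.deepcopy on the result is one option, at the cost of extra work.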

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5, or need to write backward-compatible code, and you want this in a single expression, the most performant correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an undefined number of dicts, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dicts. e.g. given dicts a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key value pairs in g will take precedence over dicts a to f, and so on.
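A short usage sketch of the precedence rule, with the function repeated so the snippet runs on its own. The dict names (defaults, config_file, overrides) and their contents are hypothetical and chosen only for illustration.

```python
def merge_dicts(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

# Hypothetical layered settings: each later dict overrides the earlier ones.
defaults = {'host': 'localhost', 'port': 8000, 'debug': False}
config_file = {'port': 8080}
overrides = {'debug': True}

settings = merge_dicts(defaults, config_file, overrides)
print(settings)

# Calling it with no arguments simply gives an empty dict.
print(merge_dicts())
```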

Critiques of Other Answers

Don’t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory (one for each dict), create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you’re adding two dict_items objects together, not two lists:

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don’t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here’s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor, and is very fast and memory efficient (even slightly more-so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it’s difficult to read, it’s not the intended usage, and so it is not Pythonic.

Here’s an example of the usage being remediated in django.

Dicts are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
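The failure can be reproduced with a small Python 3 sketch (the dict contents are illustrative). It also shows that the unpacking syntax {**a, **b} is not subject to the string-key restriction, because no keyword arguments are involved.

```python
a = {'x': 1}
b = {('p', 'q'): 2, 3: 'three'}   # non-string keys

try:
    merged = dict(a, **b)         # b is passed as keyword arguments
except TypeError as exc:
    merged = None
    print('dict(a, **b) failed:', exc)

# Dict unpacking accepts any hashable keys.
merged_ok = {**a, **b}
print(merged_ok)
```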

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as “cool hack” for “call x.update(y) and return x”. Personally I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dicts for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{'a': 1, 'b': 10, 'c': 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which by the way works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a shortcoming of dict. Nor is using the ** operator in this place an abuse of the mechanism; in fact, ** was designed precisely to pass dicts as keywords.

Again, it doesn’t work for 3 when keys are non-strings. The implicit calling contract is that namespaces take ordinary dicts, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

This inconsistency was bad given other implementations of Python (Pypy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we’re actually concerned about readability. And dict(x.items() + y.items()) is not forward compatible: it fails in Python 3, and Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged […] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word “merging” these answers describe “updating one dict with another”, and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first’s values being overwritten by the second’s – in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dicts from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a “Dictionaries of dictionaries merge”.

Less Performant But Correct Ad-hocs

These approaches provide correct behavior, but they are much less performant than copy and update or the new unpacking, because they iterate through each key-value pair at a higher level of abstraction. They do respect the order of precedence (latter dicts have precedence).

You can also chain the dicts manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4, when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items())

itertools.chain will chain the iterators over the key-value pairs in the correct order (Python 2 shown here; in Python 3, use items() instead of iteritems()):

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))
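For Python 3, the same idea might be sketched as follows. The chain.from_iterable variant for an arbitrary list of dicts is an extension not taken from the original answer, and the third dict is illustrative.

```python
from itertools import chain

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# Two dicts: later pairs overwrite earlier ones while dict() consumes the chain.
z = dict(chain(x.items(), y.items()))
print(z)

# Any number of dicts: chain their item views lazily.
dicts = [x, y, {'c': 5, 'd': 6}]
merged = dict(chain.from_iterable(d.items() for d in dicts))
print(merged)
```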

Performance Analysis

I’m only going to do the performance analysis of the usages known to behave correctly.

import timeit

The following is done on Ubuntu 14.04

In Python 2.7 (system Python):

>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.5726828575134277
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.163769006729126
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.iteritems(), y.iteritems()))))
1.1614501476287842
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
2.2345519065856934

In Python 3.5 (deadsnakes PPA):

>>> min(timeit.repeat(lambda: {**x, **y}))
0.4094954460160807
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.7881555100320838
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.4525277839857154
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.items(), y.items()))))
2.3143140770262107
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
3.2069112799945287


Answer 1

In your case, what you can do is:

z = dict(x.items() + y.items())

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict’s value:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(x.items() + y.items())
>>> z
{'a': 1, 'c': 11, 'b': 10}

If you use Python 3, it is only a little more complicated. To create z:

>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{'a': 1, 'c': 11, 'b': 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = x | y
print(z)

Output: {'a': 1, 'b': 10, 'c': 11}

Answer 2

An alternative:

z = x.copy()
z.update(y)

Answer 3

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can’t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.


Answer 4

This probably won’t be a popular answer, but you almost certainly do not want to do this. If you want a copy that’s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable – more Pythonic – than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you’re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()\ntemp.update(y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation were only added in Python 2.3, whereas copy() and update() will work in older versions.


Answer 5

In a follow-up answer, you asked about the relative performance of these two alternatives:

z1 = dict(x.items() + y.items())
z2 = dict(x, **y)

On my machine, at least (a fairly ordinary x86_64 running Python 2.5.2), alternative z2 is not only shorter and simpler but also significantly faster. You can verify this for yourself using the timeit module that comes with Python.

Example 1: identical dictionaries mapping 20 consecutive integers to themselves:

% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z1=dict(x.items() + y.items())'
100000 loops, best of 3: 5.67 usec per loop
% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z2=dict(x, **y)' 
100000 loops, best of 3: 1.53 usec per loop

z2 wins by a factor of 3.5 or so. Different dictionaries seem to yield quite different results, but z2 always seems to come out ahead. (If you get inconsistent results for the same test, try passing in -r with a number larger than the default 3.)

Example 2: non-overlapping dictionaries mapping 252 short strings to integers and vice versa:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z1=dict(x.items() + y.items())'
1000 loops, best of 3: 260 usec per loop
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z2=dict(x, **y)'               
10000 loops, best of 3: 26.9 usec per loop

z2 wins by about a factor of 10. That’s a pretty big win in my book!

After comparing those two, I wondered if z1’s poor performance could be attributed to the overhead of constructing the two item lists, which in turn led me to wonder if this variation might work better:

from itertools import chain
z3 = dict(chain(x.iteritems(), y.iteritems()))

A few quick tests, e.g.

% python -m timeit -s 'from itertools import chain; from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z3=dict(chain(x.iteritems(), y.iteritems()))'
10000 loops, best of 3: 66 usec per loop

lead me to conclude that z3 is somewhat faster than z1, but not nearly as fast as z2. Definitely not worth all the extra typing.

This discussion is still missing something important, which is a performance comparison of these alternatives with the “obvious” way of merging two dicts: using the update method. To try to keep things on an equal footing with the expressions, none of which modify x or y, I’m going to make a copy of x instead of modifying it in-place, as follows:

z0 = dict(x)
z0.update(y)

A typical result:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z0=dict(x); z0.update(y)'
10000 loops, best of 3: 26.9 usec per loop

In other words, z0 and z2 seem to have essentially identical performance. Do you think this might be a coincidence? I don’t….

In fact, I’d go so far as to claim that it’s impossible for pure Python code to do any better than this. And if you can do significantly better in a C extension module, I imagine the Python folks might well be interested in incorporating your code (or a variation on your approach) into the Python core. Python uses dict in lots of places; optimizing its operations is a big deal.

You could also write this as

z0 = x.copy()
z0.update(y)

as Tony does, but (not surprisingly) the difference in notation turns out not to have any measurable effect on performance. Use whichever looks right to you. Of course, he’s absolutely correct to point out that the two-statement version is much easier to understand.
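The measurements above were taken on Python 2.5, where the module is called htmlentitydefs. A rough way to repeat the comparison on Python 3 is sketched below, assuming the same two dictionaries imported from html.entities; the labels, the repeat and number settings, and the added {**x, **y} candidate are choices made for this sketch, and no timings are claimed here.

```python
import timeit
from itertools import chain
from html.entities import codepoint2name as x, name2codepoint as y

# Statements to compare; z1 is omitted because x.items() + y.items()
# raises TypeError on Python 3.
candidates = {
    'z0: dict(x) then update(y)': 'z0 = dict(x); z0.update(y)',
    'z2: dict(x, **y)': 'z2 = dict(x, **y)',
    'z3: chain of items()': 'z3 = dict(chain(x.items(), y.items()))',
    'unpacking: {**x, **y}': 'z4 = {**x, **y}',
}

namespace = {'x': x, 'y': y, 'chain': chain}
results = {}
for label, stmt in candidates.items():
    # Best of 3 runs of 1000 loops each, in seconds.
    results[label] = min(timeit.repeat(stmt, globals=namespace,
                                       repeat=3, number=1000))

for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print('%-30s %.6f s per 1000 loops' % (label, seconds))
```

dict(x, **y) works here only because every key in y (name2codepoint) is a string; results will vary by machine and Python version.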


Answer 6

In Python 3.3 and later, you can use collections.ChainMap, which groups multiple dicts or other mappings together to create a single, updateable view:

>>> from collections import ChainMap
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(ChainMap({}, y, x))
>>> for k, v in z.items():
...     print(k, '-->', v)
...
a --> 1
b --> 10
c --> 11

Update for Python 3.5 and later: You can use PEP 448 extended dictionary packing and unpacking. This is fast and easy:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> {**x, **y}
{'a': 1, 'b': 10, 'c': 11}
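To illustrate how the ChainMap view differs from the dict built from it, here is a small sketch (the names view and snapshot are illustrative). Lookups search the mappings left to right, the view tracks later changes to the source dicts, and writes land in the first mapping only.

```python
from collections import ChainMap

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

view = ChainMap({}, y, x)   # lookups search {}, then y, then x
print(view['b'])            # y is searched before x, so y's value wins

snapshot = dict(view)       # an independent merged dict
x['a'] = 100                # change a source dict afterwards
print(view['a'])            # the view reflects the change
print(snapshot['a'])        # the snapshot keeps the old value

view['d'] = 4               # writes go to the first mapping (the empty dict)
print(x, y)                 # neither source dict gains 'd'
```

The leading empty dict is what keeps x and y from being modified through the view.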

Answer 7

I wanted something similar, but with the ability to specify how the values on duplicate keys were merged, so I hacked this out (but did not heavily test it). Obviously this is not a single expression, but it is a single function call.

def merge(d1, d2, merge_fn=lambda x,y:y):
    """
    Merges two dictionaries, non-destructively, combining 
    values on duplicate keys as defined by the optional merge
    function.  The default behavior replaces the values in d1
    with corresponding values in d2.  (There is no other generally
    applicable merge strategy, but often you'll have homogeneous 
    types in your dicts, so specifying a merge technique can be 
    valuable.)

    Examples:

    >>> d1
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1)
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1, lambda x,y: x+y)
    {'a': 2, 'c': 6, 'b': 4}

    """
    result = dict(d1)
    for k, v in d2.items():  # items() works on both Python 2 and 3
        if k in result:
            result[k] = merge_fn(result[k], v)
        else:
            result[k] = v
    return result
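A usage sketch on Python 3, with a compact variant of the function repeated so the snippet is self-contained. The sample dicts and the choice of operator.add and max as merge functions are illustrative.

```python
import operator

def merge(d1, d2, merge_fn=lambda old, new: new):
    """Python 3 variant of the function above (items() instead of iteritems())."""
    result = dict(d1)
    for k, v in d2.items():
        result[k] = merge_fn(result[k], v) if k in result else v
    return result

monday = {'apples': 3, 'pears': 1}
tuesday = {'apples': 2, 'plums': 5}

print(merge(monday, tuesday))                # default: value from d2 wins
print(merge(monday, tuesday, operator.add))  # sum values on duplicate keys
print(merge(monday, tuesday, max))           # keep the larger value
```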

Answer 8

Recursively/deep update a dict

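def deepupdate(original, update):
    """
    Recursively update a dict.
    Subdicts won't be overwritten but also updated.
    Note: `update` is modified in place and returned.
    """
    for key, value in original.items():  # iteritems() on Python 2
        if key not in update:
            update[key] = value
        # Recurse only when both sides are dicts.
        elif isinstance(value, dict) and isinstance(update[key], dict):
            deepupdate(value, update[key])
    return update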

Demonstration:

pluto_original = {
    'name': 'Pluto',
    'details': {
        'tail': True,
        'color': 'orange'
    }
}

pluto_update = {
    'name': 'Pluutoo',
    'details': {
        'color': 'blue'
    }
}

print(deepupdate(pluto_original, pluto_update))

Outputs:

{
    'name': 'Pluutoo',
    'details': {
        'color': 'blue',
        'tail': True
    }
}

Thanks rednaw for edits.
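One thing to keep in mind is that deepupdate fills in and returns its second argument, so the update dict is changed by the call. A possible way around that is sketched below for Python 3. The wrapper name deepmerged is hypothetical, and the extra isinstance check on update[key] is an assumption added here for the case where only one side is a dict.

```python
from copy import deepcopy

def deepupdate(original, update):
    """Python 3 rendering of the function above; modifies and returns `update`."""
    for key, value in original.items():
        if key not in update:
            update[key] = value
        elif isinstance(value, dict) and isinstance(update[key], dict):
            deepupdate(value, update[key])
    return update

def deepmerged(original, update):
    """Hypothetical wrapper: merge copies so neither argument is touched."""
    return deepupdate(deepcopy(original), deepcopy(update))

pluto_original = {'name': 'Pluto', 'details': {'tail': True, 'color': 'orange'}}
pluto_update = {'name': 'Pluutoo', 'details': {'color': 'blue'}}

merged = deepmerged(pluto_original, pluto_update)
print(merged)
print(pluto_update)   # still exactly as defined above
```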


Answer 9

The best version I could think of without using copy would be:

from itertools import chain
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
dict(chain(x.iteritems(), y.iteritems()))

It’s faster than dict(x.items() + y.items()) but not as fast as n = copy(a); n.update(b), at least on CPython. This version also works in Python 3 if you change iteritems() to items(), which is automatically done by the 2to3 tool.

Personally I like this version best because it describes fairly well what I want in a single functional syntax. The only minor problem is that it doesn’t make it completely obvious that values from y take precedence over values from x, but I don’t believe it’s difficult to figure that out.
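
For Python 3, where iteritems() no longer exists, the same idea can be sketched with items(). The names x, y and z simply mirror the example above. Because dict() lets later pairs overwrite earlier ones as it consumes the chain, the values from y win.

```python
from itertools import chain

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

# chain() yields x's (key, value) pairs first, then y's
z = dict(chain(x.items(), y.items()))
print(z)
```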


Answer 10

Python 3.5 (PEP 448) allows a nicer syntax option:

x = {'a': 1, 'b': 1}
y = {'a': 2, 'c': 2}
final = {**x, **y} 
final
# {'a': 2, 'b': 1, 'c': 2}

Or even

final = {'a': 1, 'b': 1, **x, **y}

In Python 3.9 you can also use | and |=, as in the example below from PEP 584:

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
d | e
# {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
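
The |= form is mentioned above but not shown. A minimal sketch, assuming Python 3.9 or later (the names merged and d_before are only for illustration): | builds a new dict, while |= updates the left operand in place.

```python
d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

merged = d | e      # new dict; d and e are left untouched
d_before = dict(d)  # keep a copy so the in-place change is visible

d |= e              # updates d in place, like d.update(e)
print(d == merged)
```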

Answer 11

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = dict(x.items() + y.items())
print z

For items with keys in both dictionaries (‘b’), you can control which one ends up in the output by putting that one last.
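
The snippet above is Python 2 code. On Python 3, items() returns view objects that do not support +, so the concatenation raises TypeError. A small sketch of a version that runs on either interpreter, assuming you are happy to build temporary lists:

```python
x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

try:
    z = dict(x.items() + y.items())   # fine on Python 2, fails on Python 3
except TypeError:
    # Python 3: items() returns views, so build lists before concatenating
    z = dict(list(x.items()) + list(y.items()))

print(z)
```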


Answer 12

While the question has already been answered several times, this simple solution to the problem has not been listed yet.

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z4 = {}
z4.update(x)
z4.update(y)

It is as fast as z0 and the evil z2 mentioned above, but easy to understand and change.


Answer 13

def dict_merge(a, b):
  c = a.copy()
  c.update(b)
  return c

new = dict_merge(old, extras)

Among such shady and dubious answers, this shining example is the one and only good way to merge dicts in Python, endorsed by dictator for life Guido van Rossum himself! Someone else suggested half of this, but did not put it in a function.

print dict_merge(
      {'color':'red', 'model':'Mini'},
      {'model':'Ferrari', 'owner':'Carl'})

gives:

{'color': 'red', 'owner': 'Carl', 'model': 'Ferrari'}

Answer 14

If you think lambdas are evil then read no further. As requested, you can write the fast and memory-efficient solution with one expression:

x = {'a':1, 'b':2}
y = {'b':10, 'c':11}
z = (lambda a, b: (lambda a_copy: a_copy.update(b) or a_copy)(a.copy()))(x, y)
print z
{'a': 1, 'c': 11, 'b': 10}
print x
{'a': 1, 'b': 2}

As suggested above, using two lines or writing a function is probably a better way to go.


Answer 15

Be pythonic. Use a comprehension:

z={i:d[i] for d in [x,y] for i in d}

>>> print z
{'a': 1, 'c': 11, 'b': 10}

Answer 16

In python3, the items method no longer returns a list, but rather a view, which acts like a set. In this case you’ll need to take the set union since concatenating with + won’t work:

dict(x.items() | y.items())

For python3-like behavior in version 2.7, the viewitems method should work in place of items:

dict(x.viewitems() | y.viewitems())

I prefer this notation anyways since it seems more natural to think of it as a set union operation rather than concatenation (as the title shows).
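
One caveat is worth checking before relying on the union. When both dictionaries hold the same key with different values, the set keeps both pairs, so which value survives dict() depends on set iteration order rather than on y coming last. The union also requires every value to be hashable. A small sketch (the variable names are illustrative):

```python
x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

pairs = x.items() | y.items()
# Both ('b', 2) and ('b', 10) are in the set; dict() keeps whichever comes last
conflicting = sorted(p for p in pairs if p[0] == 'b')
print(conflicting)

# Values must be hashable for the set union to work at all
try:
    {'k': [1, 2]}.items() | {'j': [3]}.items()
    union_ok = True
except TypeError:
    union_ok = False
print(union_ok)
```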

Edit:

A couple more points for python 3. First, note that the dict(x, **y) trick won’t work in python 3 unless the keys in y are strings.

Also, Raymond Hettinger’s ChainMap answer is pretty elegant, since it can take an arbitrary number of dicts as arguments, but from the docs it looks like it sequentially looks through a list of all the dicts for each lookup:

Lookups search the underlying mappings successively until a key is found.

This can slow you down if you have a lot of lookups in your application:

In [1]: from collections import ChainMap
In [2]: from string import ascii_uppercase as up, ascii_lowercase as lo; x = dict(zip(lo, up)); y = dict(zip(up, lo))
In [3]: chainmap_dict = ChainMap(y, x)
In [4]: union_dict = dict(x.items() | y.items())
In [5]: timeit for k in union_dict: union_dict[k]
100000 loops, best of 3: 2.15 µs per loop
In [6]: timeit for k in chainmap_dict: chainmap_dict[k]
10000 loops, best of 3: 27.1 µs per loop

So lookups are about an order of magnitude slower. I’m a fan of ChainMap, but it looks less practical where there may be many lookups.
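
If the layered lookup cost matters, one option is to flatten the ChainMap into a plain dict once. A minimal sketch (view and z are illustrative names); note that the first mapping passed to ChainMap is searched first, so y is listed before x to make its values win:

```python
from collections import ChainMap

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

view = ChainMap(y, x)   # y is searched first, so its values take precedence
print(view['b'])

z = dict(view)          # one-off copy; later lookups hit a single dict
print(z == {'a': 1, 'b': 10, 'c': 11})
```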


Answer 17

Abuse leading to a one-expression solution for Matthew’s answer:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (lambda f=x.copy(): (f.update(y), f)[1])()
>>> z
{'a': 1, 'c': 11, 'b': 10}

You said you wanted one expression, so I abused lambda to bind a name, and tuples to override lambda’s one-expression limit. Feel free to cringe.

You could also do this of course if you don’t care about copying it:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (x.update(y), x)[1]
>>> z
{'a': 1, 'b': 10, 'c': 11}

Answer 18

Simple solution using itertools that preserves order (latter dicts have precedence)

import itertools as it
merge = lambda *args: dict(it.chain.from_iterable(it.imap(dict.iteritems, args)))

And its usage:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> merge(x, y)
{'a': 1, 'b': 10, 'c': 11}

>>> z = {'c': 3, 'd': 4}
>>> merge(x, y, z)
{'a': 1, 'b': 10, 'c': 3, 'd': 4}

Answer 19

Two dictionaries

def union2(dict1, dict2):
    return dict(list(dict1.items()) + list(dict2.items()))

n dictionaries

import itertools

def union(*dicts):
    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))

sum has bad performance. See https://mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python/


Answer 20

Even though the answers were good for this shallow dictionary, none of the methods defined here actually do a deep dictionary merge.

Examples follow:

a = { 'one': { 'depth_2': True }, 'two': True }
b = { 'one': { 'extra': False } }
print dict(a.items() + b.items())

One would expect a result of something like this:

{ 'one': { 'extra': False, 'depth_2': True }, 'two': True }

Instead, we get this:

{'two': True, 'one': {'extra': False}}

The ‘one’ entry should have had ‘depth_2’ and ‘extra’ as items inside its dictionary if it truly was a merge.

Using chain does not work either:

from itertools import chain
print dict(chain(a.iteritems(), b.iteritems()))

Results in:

{'two': True, 'one': {'extra': False}}

The deep merge that rcwesick gave also creates the same result.

Yes, it will work to merge the sample dictionaries, but none of them are a generic mechanism to merge. I’ll update this later once I write a method that does a true merge.
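
As an illustration of what such a "true merge" could look like, here is a minimal Python 3 sketch. The function name deep_merge is hypothetical and not taken from any answer here. It assumes that only nested dicts are merged and that any other value from the right-hand side wins; nested values that are not merged are shared with the inputs rather than copied.

```python
def deep_merge(left, right):
    """Return a new dict; nested dicts are merged, other values from right win."""
    result = dict(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

a = {'one': {'depth_2': True}, 'two': True}
b = {'one': {'extra': False}}
merged = deep_merge(a, b)
print(merged)
```

Neither a nor b is modified, which keeps the last-one-wins behaviour of the shallow merges while descending into nested dictionaries.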


Answer 21

(For Python2.7* only; there are simpler solutions for Python3*.)

If you’re not averse to importing a standard library module, you can do

from functools import reduce

def merge_dicts(*dicts):
    return reduce(lambda a, d: a.update(d) or a, dicts, {})

(The or a bit in the lambda is necessary because dict.update always returns None on success.)
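
A short usage sketch of the function above. Since reduce starts from a fresh {}, none of the input dictionaries should be modified, and the same code also runs on Python 3, where reduce lives in functools:

```python
from functools import reduce

def merge_dicts(*dicts):
    # Start from a fresh {} so none of the inputs is modified
    return reduce(lambda a, d: a.update(d) or a, dicts, {})

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}
z = merge_dicts(x, y, {'d': 4})
print(z)
```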


Answer 22

If you don’t mind mutating x,

x.update(y) or x

Simple, readable, performant. You know update() always returns None, which is a false value. So the above expression will always evaluate to x, after updating it.

Mutating methods in the standard library (like .update()) return None by convention, so this pattern will work on those too. If you’re using a method that doesn’t follow this convention, then or may not work. But, you can use a tuple display and index to make it a single expression, instead. This works regardless of what the first element evaluates to.

(x.update(y), x)[-1]
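
To see why the tuple form is the more general of the two, here is a small sketch using dict.setdefault(), a mutating method that returns a value instead of None (the variable names are illustrative):

```python
d = {'a': 1}

# setdefault() mutates d but returns the stored value, which is truthy here
with_or = d.setdefault('b', 2) or d
with_tuple = (d.setdefault('c', 3), d)[-1]

print(with_or)      # the value 2, not the dict
print(with_tuple is d)
```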

If you don’t have x in a variable yet, you can use lambda to make a local without using an assignment statement. This amounts to using lambda as a let expression, which is a common technique in functional languages, but maybe unpythonic.

(lambda x: x.update(y) or x)({'a': 1, 'b': 2})

Although it’s not that different from the following use of the new walrus operator (Python 3.8+ only):

(x := {'a': 1, 'b': 2}).update(y) or x

If you do want a copy, PEP 448 style is easiest {**x, **y}. But if that’s not available in your (older) Python version, the let pattern works here too.

(lambda z: z.update(y) or z)(x.copy())

(That is, of course, equivalent to (z := x.copy()).update(y) or z, but if your Python version is new enough for that, then the PEP 448 style will be available.)


Answer 23

Drawing on ideas here and elsewhere I’ve comprehended a function:

def merge(*dicts, **kv): 
      return { k:v for d in list(dicts) + [kv] for k,v in d.items() }

Usage (tested in python 3):

assert (merge({1:11,'a':'aaa'},{1:99, 'b':'bbb'},foo='bar')==\
    {1: 99, 'foo': 'bar', 'b': 'bbb', 'a': 'aaa'})

assert (merge(foo='bar')=={'foo': 'bar'})

assert (merge({1:11},{1:99},foo='bar',baz='quux')==\
    {1: 99, 'foo': 'bar', 'baz':'quux'})

assert (merge({1:11},{1:99})=={1: 99})

You could use a lambda instead.


Answer 24

The problem I have with solutions listed to date is that, in the merged dictionary, the value for key “b” is 10 but, to my way of thinking, it should be 12. In that light, I present the following:

import timeit

n=100000
su = """
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
"""

def timeMerge(f,su,niter):
    print "{:4f} sec for: {:30s}".format(timeit.Timer(f,setup=su).timeit(niter),f)

timeMerge("dict(x, **y)",su,n)
timeMerge("x.update(y)",su,n)
timeMerge("dict(x.items() + y.items())",su,n)
timeMerge("for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k] ",su,n)

#confirm for loop adds b entries together
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]
print "confirm b elements are added:",x

Results:

0.049465 sec for: dict(x, **y)
0.033729 sec for: x.update(y)                   
0.150380 sec for: dict(x.items() + y.items())   
0.083120 sec for: for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]

confirm b elements are added: {'a': 1, 'c': 11, 'b': 12}
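
The same additive merge can be written as a single expression that leaves x untouched. A minimal Python 3 sketch, assuming all values are numbers so that a missing key can count as 0 (summed is an illustrative name):

```python
x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

# Union of the key views; a missing key contributes 0
summed = {k: x.get(k, 0) + y.get(k, 0) for k in x.keys() | y.keys()}
print(summed)
```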

Answer 25

It’s so silly that .update returns nothing.
I just use a simple helper function to solve the problem:

def merge(dict1,*dicts):
    for dict2 in dicts:
        dict1.update(dict2)
    return dict1

Examples:

merge(dict1,dict2)
merge(dict1,dict2,dict3)
merge(dict1,dict2,dict3,dict4)
merge({},dict1,dict2)  # this one returns a new copy

Answer 26

from collections import Counter
dict1 = {'a':1, 'b': 2}
dict2 = {'b':10, 'c': 11}
result = dict(Counter(dict1) + Counter(dict2))

This should solve your problem if you want the values of shared keys added together rather than overwritten: here 'b' becomes 12.
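
A quick check of what Counter addition actually does may help before adopting it. It sums the values of shared keys, and it also drops any entry whose total is zero or negative, so it suits count-like data only. A small sketch (dropped is an illustrative name):

```python
from collections import Counter

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 10, 'c': 11}
result = dict(Counter(dict1) + Counter(dict2))
print(result)

# Entries whose summed count is zero or negative are dropped
dropped = dict(Counter({'a': 1, 'b': 2}) + Counter({'a': -1}))
print(dropped)
```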


Answer 27

This can be done with a single dict comprehension:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> { key: y[key] if key in y else x[key]
      for key in set(x) | set(y)
    }

In my view the best answer for the ‘single expression’ part as no extra functions are needed, and it is short.


Answer 28

Python 3.8 (released in October 2019) adds a new option, thanks to PEP 572: Assignment Expressions. The new assignment expression operator := allows you to assign the result of the copy and still use it to call update, leaving the combined code a single expression, rather than two statements, changing:

newdict = dict1.copy()
newdict.update(dict2)

to:

(newdict := dict1.copy()).update(dict2)

while behaving identically in every way. If you must also return the resulting dict (you asked for an expression returning the dict; the above creates and assigns to newdict, but doesn’t return it, so you couldn’t use it to pass an argument to a function as is, a la myfunc((newdict := dict1.copy()).update(dict2))), then just add or newdict to the end (since update returns None, which is falsy, it will then evaluate and return newdict as the result of the expression):

(newdict := dict1.copy()).update(dict2) or newdict

Important caveat: In general, I’d discourage this approach in favor of:

newdict = {**dict1, **dict2}

The unpacking approach is clearer (to anyone who knows about generalized unpacking in the first place, which you should), doesn’t require a name for the result at all (so it’s much more concise when constructing a temporary that is immediately passed to a function or included in a list/tuple literal or the like), and is almost certainly faster as well, being (on CPython) roughly equivalent to:

newdict = {}
newdict.update(dict1)
newdict.update(dict2)

but done at the C layer, using the concrete dict API, so no dynamic method lookup/binding or function call dispatch overhead is involved (whereas (newdict := dict1.copy()).update(dict2) is unavoidably identical to the original two-liner in behavior, performing the work in discrete steps, with dynamic lookup/binding/invocation of methods).

It’s also more extensible, as merging three dicts is obvious:

 newdict = {**dict1, **dict2, **dict3}

where using assignment expressions won’t scale like that; the closest you could get would be:

 (newdict := dict1.copy()).update(dict2), newdict.update(dict3)

or without the temporary tuple of Nones, but with truthiness testing of each None result:

 (newdict := dict1.copy()).update(dict2) or newdict.update(dict3)

either of which is obviously much uglier, and includes further inefficiencies (either a wasted temporary tuple of Nones for comma separation, or pointless truthiness testing of each update's None return for or separation).

The only real advantage to the assignment expression approach occurs if:

  1. You have generic code that needs to handle both sets and dicts (both of them support copy and update, so the code works roughly as you’d expect it to)
  2. You expect to receive arbitrary dict-like objects, not just dict itself, and must preserve the type and semantics of the left hand side (rather than ending up with a plain dict). While myspecialdict({**speciala, **specialb}) might work, it would involve an extra temporary dict, and if myspecialdict has features plain dict can’t preserve (e.g. regular dicts now preserve order based on the first appearance of a key, and value based on the last appearance of a key; you might want one that preserves order based on the last appearance of a key so updating a value also moves it to the end), then the semantics would be wrong. Since the assignment expression version uses the named methods (which are presumably overloaded to behave appropriately), it never creates a dict at all (unless dict1 was already a dict), preserving the original type (and original type’s semantics), all while avoiding any temporaries.
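
The second point can be illustrated with collections.OrderedDict standing in for an arbitrary dict-like type. This is a sketch assuming Python 3.8+ for the := operator, and the names od1, od2, unpacked and kept are illustrative. Unpacking always produces a plain dict, while the copy-and-update form keeps the type of the left-hand operand:

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 10), ('c', 11)])

unpacked = {**od1, **od2}                      # always a plain dict
kept = (new := od1.copy()).update(od2) or new  # copy() keeps the OrderedDict type

print(type(unpacked).__name__, type(kept).__name__)
```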

Answer 29

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> x, z = dict(x), x.update(y) or x
>>> x
{'a': 1, 'b': 2}
>>> y
{'c': 11, 'b': 10}
>>> z
{'a': 1, 'c': 11, 'b': 10}

How can I safely create a nested directory?

Question: How can I safely create a nested directory?

What is the most elegant way to check if the directory a file is going to be written to exists, and if not, create the directory using Python? Here is what I tried:

import os

file_path = "/my/directory/filename.txt"
directory = os.path.dirname(file_path)

try:
    os.stat(directory)
except:
    os.mkdir(directory)       

f = file(filename)

Somehow, I missed os.path.exists (thanks kanja, Blair, and Douglas). This is what I have now:

def ensure_dir(file_path):
    directory = os.path.dirname(file_path)
    if not os.path.exists(directory):
        os.makedirs(directory)

Is there a flag for “open”, that makes this happen automatically?


Answer 0

On Python ≥ 3.5, use pathlib.Path.mkdir:

from pathlib import Path
Path("/my/directory").mkdir(parents=True, exist_ok=True)

For older versions of Python, I see two answers with good qualities, each with a small flaw, so I will give my take on it:

Try os.path.exists, and consider os.makedirs for the creation.

import os
if not os.path.exists(directory):
    os.makedirs(directory)

As noted in comments and elsewhere, there’s a race condition – if the directory is created between the os.path.exists and the os.makedirs calls, the os.makedirs will fail with an OSError. Unfortunately, blanket-catching OSError and continuing is not foolproof, as it will ignore a failure to create the directory due to other factors, such as insufficient permissions, full disk, etc.

One option would be to trap the OSError and examine the embedded error code (see Is there a cross-platform way of getting information from Python’s OSError):

import os, errno

try:
    os.makedirs(directory)
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

Alternatively, there could be a second os.path.exists, but suppose another created the directory after the first check, then removed it before the second one – we could still be fooled.

Depending on the application, the danger of concurrent operations may be more or less than the danger posed by other factors such as file permissions. The developer would have to know more about the particular application being developed and its expected environment before choosing an implementation.

Modern versions of Python improve this code quite a bit, both by exposing FileExistsError (in 3.3+)…

try:
    os.makedirs("path/to/directory")
except FileExistsError:
    # directory already exists
    pass

…and by allowing a keyword argument to os.makedirs called exist_ok (in 3.2+).

os.makedirs("path/to/directory", exist_ok=True)  # succeeds even if directory exists.
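
A small self-contained sketch of how exist_ok behaves, using a temporary directory so it can be run safely (the paths and variable names are illustrative). Repeated calls are harmless, but exist_ok=True does not hide the case where a regular file already occupies the path:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "a", "b", "c")
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)   # second call is a no-op
    created = os.path.isdir(target)

    # exist_ok does not hide a regular file sitting at the path
    file_path = os.path.join(base, "plain.txt")
    open(file_path, "w").close()
    try:
        os.makedirs(file_path, exist_ok=True)
        clash_detected = False
    except FileExistsError:
        clash_detected = True

print(created, clash_detected)
```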

Answer 1

Python 3.5+:

import pathlib
pathlib.Path('/my/directory').mkdir(parents=True, exist_ok=True) 

pathlib.Path.mkdir as used above recursively creates the directory and does not raise an exception if the directory already exists. If you don’t need or want the parents to be created, skip the parents argument.

Python 3.2+:

Using pathlib:

If you can, install the current pathlib backport named pathlib2. Do not install the older unmaintained backport named pathlib. Next, refer to the Python 3.5+ section above and use it the same way.

If using Python 3.4, even though it comes with pathlib, it is missing the useful exist_ok option. The backport is intended to offer a newer and superior implementation of mkdir which includes this missing option.

Using os:

import os
os.makedirs(path, exist_ok=True)

os.makedirs as used above recursively creates the directory and does not raise an exception if the directory already exists. It has the optional exist_ok argument only if using Python 3.2+, with a default value of False. This argument does not exist in Python 2.x up to 2.7. As such, there is no need for manual exception handling as with Python 2.7.

Python 2.7+:

Using pathlib:

If you can, install the current pathlib backport named pathlib2. Do not install the older unmaintained backport named pathlib. Next, refer to the Python 3.5+ section above and use it the same way.

Using os:

import os
try: 
    os.makedirs(path)
except OSError:
    if not os.path.isdir(path):
        raise

While a naive solution may first use os.path.isdir followed by os.makedirs, the solution above reverses the order of the two operations. In doing so, it prevents a common race condition having to do with a duplicated attempt at creating the directory, and also disambiguates files from directories.

Note that capturing the exception and using errno is of limited usefulness because OSError: [Errno 17] File exists, i.e. errno.EEXIST, is raised for both files and directories. It is more reliable simply to check if the directory exists.

Alternative:

mkpath creates the nested directory, and does nothing if the directory already exists. This works in Python 2 and in Python 3 up to 3.11; distutils was deprecated in Python 3.10 and removed in Python 3.12.

import distutils.dir_util
distutils.dir_util.mkpath(path)

Per Bug 10948, a severe limitation of this alternative is that it works only once per python process for a given path. In other words, if you use it to create a directory, then delete the directory from inside or outside Python, then use mkpath again to recreate the same directory, mkpath will simply silently use its invalid cached info of having previously created the directory, and will not actually make the directory again. In contrast, os.makedirs doesn’t rely on any such cache. This limitation may be okay for some applications.


With regard to the directory’s mode, please refer to the documentation if you care about it.


Answer 2

Using try except and the right error code from errno module gets rid of the race condition and is cross-platform:

import os
import errno

def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise

In other words, we try to create the directories, but if they already exist we ignore the error. On the other hand, any other error gets reported. For example, if you create dir ‘a’ beforehand and remove all permissions from it, you will get an OSError raised with errno.EACCES (Permission denied, error 13).


Answer 3

I would personally recommend that you use os.path.isdir() to test instead of os.path.exists().

>>> os.path.exists('/tmp/dirname')
True
>>> os.path.exists('/tmp/dirname/filename.etc')
True
>>> os.path.isdir('/tmp/dirname/filename.etc')
False
>>> os.path.isdir('/tmp/fakedirname')
False

If you have:

>>> dir = raw_input(":: ")

And a foolish user input:

:: /tmp/dirname/filename.etc

… You’re going to end up with a directory named filename.etc when you pass that argument to os.makedirs() if you test with os.path.exists().
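The difference between the two checks can be reproduced without touching real paths. This is a small sketch that builds a throwaway directory with tempfile; the names filename.etc and fakedirname simply mirror the session above.

```python
import os
import tempfile

base = tempfile.mkdtemp()                       # an existing directory
file_path = os.path.join(base, "filename.etc")
with open(file_path, "w"):                      # create an ordinary file
    pass
missing = os.path.join(base, "fakedirname")     # never created

# Each entry is (os.path.exists, os.path.isdir) for one kind of path.
checks = {
    "dir": (os.path.exists(base), os.path.isdir(base)),
    "file": (os.path.exists(file_path), os.path.isdir(file_path)),
    "missing": (os.path.exists(missing), os.path.isdir(missing)),
}
print(checks)
```

Only the existing directory passes both checks; the ordinary file passes exists() but fails isdir().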


Answer 4

Check os.makedirs: (It makes sure the complete path exists.)
To handle the fact the directory might exist, catch OSError. (If exist_ok is False (the default), an OSError is raised if the target directory already exists.)

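import os
try:
    os.makedirs('./path/to/somewhere')
except OSError:
    # Note: this swallows every OSError (e.g. permission errors),
    # not only the "directory already exists" case.
    pass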

Answer 5

Starting from Python 3.5, pathlib.Path.mkdir has an exist_ok flag:

from pathlib import Path
path = Path('/my/directory/filename.txt')
path.parent.mkdir(parents=True, exist_ok=True) 
# path.parent ~ os.path.dirname(path)

This recursively creates the directory and does not raise an exception if the directory already exists.

(Similarly, os.makedirs gained an exist_ok flag in Python 3.2, e.g. os.makedirs(path, exist_ok=True).)
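One detail worth seeing in action: exist_ok=True only tolerates an existing directory. If an ordinary file occupies the path, FileExistsError is still raised. A short sketch, assuming Python 3.6 or later (so that os.makedirs accepts a Path) and using a temporary directory; the names nested and blocker are illustrative.

```python
import os
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
nested = base / "my" / "directory"

nested.mkdir(parents=True, exist_ok=True)   # creates my/ and my/directory/
nested.mkdir(parents=True, exist_ok=True)   # second call does nothing
os.makedirs(nested, exist_ok=True)          # os equivalent; also does nothing

blocker = base / "blocker"
blocker.write_text("I am a file")
try:
    blocker.mkdir(exist_ok=True)            # a file is in the way
    file_blocked = False
except FileExistsError:
    file_blocked = True
```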


Answer 6

Insights on the specifics of this situation

You give a particular file at a certain path and you pull the directory from the file path. Then after making sure you have the directory, you attempt to open a file for reading. To comment on this code:

filename = "/my/directory/filename.txt"
dir = os.path.dirname(filename)

We want to avoid overwriting the builtin function, dir. Also, filepath or perhaps fullfilepath is probably a better semantic name than filename so this would be better written:

import os
filepath = '/my/directory/filename.txt'
directory = os.path.dirname(filepath)

You initially state that your end goal is to open this file for writing, but your code approaches it like this, which opens the file for reading:

if not os.path.exists(directory):
    os.makedirs(directory)
f = file(filename)

Assuming opening for reading

Why would you make a directory for a file that you expect to be there and be able to read?

Just attempt to open the file.

with open(filepath) as my_file:
    do_stuff(my_file)

If the directory or file isn’t there, you’ll get an IOError with an associated error number: errno.ENOENT will point to the correct error number regardless of your platform. You can catch it if you want, for example:

import errno
try:
    with open(filepath) as my_file:
        do_stuff(my_file)
except IOError as error:
    if error.errno == errno.ENOENT:
        # print() as a function works on both Python 2 and Python 3
        print('ignoring error because directory or file is not there')
    else:
        raise

Assuming we’re opening for writing

This is probably what you’re wanting.

In this case, we probably aren’t facing any race conditions. So just do as you were, but note that for writing, you need to open with the w mode (or a to append). It’s also a Python best practice to use the context manager for opening files.

import os
if not os.path.exists(directory):
    os.makedirs(directory)
with open(filepath, 'w') as my_file:
    do_stuff(my_file)

However, say we have several Python processes that attempt to put all their data into the same directory. Then we may have contention over creation of the directory. In that case it’s best to wrap the makedirs call in a try-except block.

import os
import errno
if not os.path.exists(directory):
    try:
        os.makedirs(directory)
    except OSError as error:
        if error.errno != errno.EEXIST:
            raise
with open(filepath, 'w') as my_file:
    do_stuff(my_file)
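On Python 3.2 and later, the create-then-open pattern above can be condensed with exist_ok=True, which also covers the case where another process creates the directory first. This is a sketch only; open_for_writing is a hypothetical helper name, not something from the question.

```python
import os
import tempfile

def open_for_writing(filepath, mode="w"):
    """Hypothetical helper: create the parent directory, then open the file."""
    directory = os.path.dirname(filepath)
    if directory:                        # dirname is '' for a bare filename
        os.makedirs(directory, exist_ok=True)
    return open(filepath, mode)

# Usage with a throwaway location:
filepath = os.path.join(tempfile.mkdtemp(), "my", "directory", "filename.txt")
with open_for_writing(filepath) as my_file:
    my_file.write("hello\n")
```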

Answer 7

Try the os.path.exists function:

import os

# os.mkdir creates only the last path component; the parent must already exist
if not os.path.exists(dir):
    os.mkdir(dir)

Answer 8

I have put the following down. It’s not totally foolproof though.

import os

dirname = 'create/me'

try:
    os.makedirs(dirname)
except OSError:
    if os.path.exists(dirname):
        # We are nearly safe
        pass
    else:
        # There was an error on creation, so make sure we know about it
        raise

Now as I say, this is not really foolproof, because we have the possibility of failing to create the directory, and another process creating it during that period.


Answer 9

Check if a directory exists and create it if necessary?

The direct answer to this is, assuming a simple situation where you don’t expect other users or processes to be messing with your directory:

if not os.path.exists(d):
    os.makedirs(d)

or if making the directory is subject to race conditions (i.e. if after checking the path exists, something else may have already made it) do this:

import errno
try:
    os.makedirs(d)
except OSError as exception:
    if exception.errno != errno.EEXIST:
        raise

But perhaps an even better approach is to sidestep the resource contention issue, by using temporary directories via tempfile:

import tempfile

d = tempfile.mkdtemp()

Here are the essentials from the online doc:

mkdtemp(suffix='', prefix='tmp', dir=None)
    User-callable function to create and return a unique temporary
    directory.  The return value is the pathname of the directory.

    The directory is readable, writable, and searchable only by the
    creating user.

    Caller is responsible for deleting the directory when done with it.
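If the cleanup responsibility mentioned in the docstring is a concern, Python 3.2 and later also offer tempfile.TemporaryDirectory, a context manager that removes the directory on exit. A minimal sketch; the prefix and file name are arbitrary choices for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="sodata-") as d:
    created = os.path.isdir(d)           # the directory exists inside the block
    with open(os.path.join(d, "scratch.txt"), "w") as f:
        f.write("temporary data")

removed = not os.path.exists(d)          # and is gone once the block exits
```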

New in Python 3.5: pathlib.Path with exist_ok

There’s a new Path object (as of 3.4) with lots of methods one would want to use with paths – one of which is mkdir.

(For context, I’m tracking my weekly rep with a script. Here’s the relevant parts of code from the script that allow me to avoid hitting Stack Overflow more than once a day for the same data.)

First the relevant imports:

from pathlib import Path
import tempfile

We don’t have to deal with os.path.join now – just join path parts with a /:

directory = Path(tempfile.gettempdir()) / 'sodata'

Then I idempotently ensure the directory exists – the exist_ok argument shows up in Python 3.5:

directory.mkdir(exist_ok=True)

Here’s the relevant part of the documentation:

If exist_ok is true, FileExistsError exceptions will be ignored (same behavior as the POSIX mkdir -p command), but only if the last path component is not an existing non-directory file.

Here’s a little more of the script – in my case, I’m not subject to a race condition, I only have one process that expects the directory (or contained files) to be there, and I don’t have anything trying to remove the directory.

todays_file = directory / str(datetime.datetime.utcnow().date())
if todays_file.exists():
    logger.info("todays_file exists: " + str(todays_file))
    df = pd.read_json(str(todays_file))

Path objects had to be coerced to str before other APIs that expected str paths could use them; since Python 3.6, functions that support os.PathLike accept Path objects directly.

Perhaps Pandas should be updated to accept instances of the abstract base class, os.PathLike.


Answer 10

In Python 3.4 you can also use the brand new pathlib module:

from pathlib import Path
path = Path("/my/directory/filename.txt")
try:
    if not path.parent.exists():
        path.parent.mkdir(parents=True)
except OSError:
    # handle error; you can also catch specific errors like
    # FileExistsError and so on.
    raise  # placeholder so the block is valid; replace with real handling

Answer 11

The relevant Python documentation suggests the use of the EAFP coding style (Easier to Ask for Forgiveness than Permission). This means that the code

import errno
import os

try:
    os.makedirs(path)
except OSError as exception:
    if exception.errno != errno.EEXIST:
        raise
    else:
        print("\nBE CAREFUL! Directory %s already exists." % path)

is better than the alternative

if not os.path.exists(path):
    os.makedirs(path)
else:
    print("\nBE CAREFUL! Directory %s already exists." % path)

The documentation suggests this precisely because of the race condition discussed in this question. In addition, as others mention here, there is a performance advantage in querying the OS once instead of twice. Finally, the argument sometimes put forward in favour of the second version (that the developer knows the environment the application is running in) can only be defended in the special case where the program has set up a private environment for itself (and other instances of the same program).

Even in that case, this is a bad practice and can lead to long useless debugging. For example, the fact we set the permissions for a directory should not leave us with the impression permissions are set appropriately for our purposes. A parent directory could be mounted with other permissions. In general, a program should always work correctly and the programmer should not expect one specific environment.


Answer 12

In Python 3 (3.2 and later), os.makedirs supports setting exist_ok. The default setting is False, which means an OSError will be raised if the target directory already exists. By setting exist_ok to True, the OSError (directory exists) is suppressed and the existing directory is left as it is.

os.makedirs(path, exist_ok=True)

In Python 2, os.makedirs doesn’t support setting exist_ok. You can use the approach in heikki-toivonen’s answer:

import os
import errno

def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise

Answer 13

For a one-liner solution, you can use IPython.utils.path.ensure_dir_exists():

from IPython.utils.path import ensure_dir_exists
ensure_dir_exists(dir)

From the documentation: Ensure that a directory exists. If it doesn’t exist, try to create it and protect against a race condition if another process is doing the same.


Answer 14

You can use mkpath

# Create a directory and any missing ancestor directories. 
# If the directory already exists, do nothing.

from distutils.dir_util import mkpath
mkpath("test")    

Note that it will create the ancestor directories as well.

It works for Python 2 and for Python 3 up to 3.11; distutils was removed from the standard library in Python 3.12.


Answer 15

I use os.path.exists(). Here is a Python 3 script that can be used to check if a directory exists, create one if it does not exist, and delete it if it does exist (if desired).

It prompts users for input of the directory and can be easily modified.


Answer 16

You can use os.listdir for this:

import os
# os.listdir returns the names of all entries, files as well as directories
if 'dirName' in os.listdir('parentFolderPath'):
    print('Directory Exists')

Answer 17

I found this Q/A and I was initially puzzled by some of the failures and errors I was getting. I am working in Python 3 (v.3.5 in an Anaconda virtual environment on an Arch Linux x86_64 system).

Consider this directory structure:

└── output/         ## dir
   ├── corpus       ## file
   ├── corpus2/     ## dir
   └── subdir/      ## dir

Here are my experiments/notes, which clarify things:

# ----------------------------------------------------------------------------
# [1] https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-if-it-does-not-exist

import pathlib

""" Notes:
        1.  Include a trailing slash at the end of the directory path
            ("Method 1," below).
        2.  If a subdirectory in your intended path matches an existing file
            with same name, you will get the following error:
            "NotADirectoryError: [Errno 20] Not a directory:" ...
"""
# Uncomment and try each of these "out_dir" paths, singly:

# ----------------------------------------------------------------------------
# METHOD 1:
# Re-running does not overwrite existing directories and files; no errors.

# out_dir = 'output/corpus3'                ## no error but no dir created (missing trailing /)
# out_dir = 'output/corpus3/'               ## works
# out_dir = 'output/corpus3/doc1'           ## no error but no dir created (missing trailing /)
# out_dir = 'output/corpus3/doc1/'          ## works
# out_dir = 'output/corpus3/doc1/doc.txt'   ## no error but no file created (os.makedirs creates dir, not files!  ;-)
# out_dir = 'output/corpus2/tfidf/'         ## fails with "Errno 20" (existing file named "corpus2")
# out_dir = 'output/corpus3/tfidf/'         ## works
# out_dir = 'output/corpus3/a/b/c/d/'       ## works

# [2] https://docs.python.org/3/library/os.html#os.makedirs

# Uncomment these to run "Method 1":

#directory = os.path.dirname(out_dir)
#os.makedirs(directory, mode=0o777, exist_ok=True)

# ----------------------------------------------------------------------------
# METHOD 2:
# Re-running does not overwrite existing directories and files; no errors.

# out_dir = 'output/corpus3'                ## works
# out_dir = 'output/corpus3/'               ## works
# out_dir = 'output/corpus3/doc1'           ## works
# out_dir = 'output/corpus3/doc1/'          ## works
# out_dir = 'output/corpus3/doc1/doc.txt'   ## no error but creates a .../doc.txt./ dir
# out_dir = 'output/corpus2/tfidf/'         ## fails with "Errno 20" (existing file named "corpus2")
# out_dir = 'output/corpus3/tfidf/'         ## works
# out_dir = 'output/corpus3/a/b/c/d/'       ## works

# Uncomment these to run "Method 2":

#import os, errno
#try:
#       os.makedirs(out_dir)
#except OSError as e:
#       if e.errno != errno.EEXIST:
#               raise
# ----------------------------------------------------------------------------

Conclusion: in my opinion, “Method 2” is more robust.

[1] How can I create a directory if it does not exist?

[2] https://docs.python.org/3/library/os.html#os.makedirs
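The "Method 1" results follow from how os.path.dirname splits a path: it returns everything before the last separator, so without a trailing slash the final component is dropped before os.makedirs ever sees it. A short sketch of that behaviour; posixpath is used here (it is what os.path resolves to on POSIX systems) so the results do not depend on the platform.

```python
import posixpath  # os.path on POSIX systems

no_slash = posixpath.dirname('output/corpus3')                # last component dropped
with_slash = posixpath.dirname('output/corpus3/')             # whole directory kept
file_path = posixpath.dirname('output/corpus3/doc1/doc.txt')  # file name dropped

print(no_slash, with_slash, file_path)
```

So calling dirname first is appropriate when the path names a file, and should be skipped when the path already names the directory to create.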


Answer 18

I saw Heikki Toivonen's and A-B-B's answers and thought of this variation.

import os
import errno

def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST or not os.path.isdir(path):
            raise

Answer 19

Use this to check for the directory and create it if it is missing:

import os

if not os.path.isdir(test_img_dir):
    os.mkdir(test_img_dir)

Answer 20

Why not use the subprocess module if you are running on a machine that supports the mkdir command with the -p option? This works on Python 2.7 and Python 3.6:

from subprocess import call
call(['mkdir', '-p', 'path1/path2/path3'])

Should do the trick on most systems.

In situations where portability doesn’t matter (e.g. when using Docker), the solution is a clean 2 lines. You also don’t have to add logic to check whether directories exist. Finally, it is safe to re-run without any side effects.

If you need error handling:

from subprocess import check_call, CalledProcessError
try:
    check_call(['mkdir', '-p', 'path1/path2/path3'])
except CalledProcessError:
    # mkdir exited with a non-zero status; add real handling here
    raise

Answer 21

Consider the following:

os.path.isdir('/tmp/dirname')

This returns True only if the path exists AND is a directory. For me, this does what I need: I can make sure the path is a folder (not a file) and that it exists.


Answer 22

Call the function create_dir() at the entry point of your program/project.

import os

def create_dir(directory):
    if not os.path.exists(directory):
        print('Creating Directory '+directory)
        os.makedirs(directory)

create_dir('Project directory')

Answer 23

You have to set the full path before creating the directory:

import os,sys,inspect
import pathlib

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
your_folder = currentdir + "/" + "your_folder"

if not os.path.exists(your_folder):
   pathlib.Path(your_folder).mkdir(parents=True, exist_ok=True)

This works for me, and hopefully it will work for you as well.


Answer 24

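import os
if os.path.isfile(filename):
    print("file exists")
else:
    # "Your code here": for example, create an empty file like the touch command
    open(filename, 'a').close()

The else branch is where your own code goes; the line shown is one way to do what the touch command does.

This checks whether the file is there and creates it if it is not.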


Does Python have a string "contains" substring method?

Question: Does Python have a string "contains" substring method?

I’m looking for a string.contains or string.indexof method in Python.

I want to do:

if not somestring.contains("blah"):
   continue

Answer 0

You can use the in operator:

if "blah" not in somestring: 
    continue

Answer 1

If it’s just a substring search you can use string.find("substring").

You do have to be a little careful with find, index, and in though, as they are substring searches, not whole-word searches. For example, take this:

s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")

It would print Found 'is' in the string. Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
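If a whole-word match is what you actually want, one option (not part of the answer above) is a regular expression with word boundaries. A minimal sketch using the standard re module:

```python
import re

s = "This be a string"

# Plain substring test: matches the "is" inside "This".
substring_hit = "is" in s

# Word-boundary test: "is" must stand alone as a word.
whole_word_hit = re.search(r"\bis\b", s) is not None
whole_word_hit_2 = re.search(r"\bis\b", "This is a string") is not None

print(substring_hit, whole_word_hit, whole_word_hit_2)
```

The first string contains "is" only inside "This", so the word-boundary search finds nothing there.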


Answer 2

Does Python have a string contains substring method?

Yes, but Python has a comparison operator that you should use instead, because the language intends its usage, and other programmers will expect you to use it. That keyword is in, which is used as a comparison operator:

>>> 'foo' in '**foo**'
True

The opposite (complement), which the original question asks for, is not in:

>>> 'foo' not in '**foo**' # returns False
False

This is semantically the same as not 'foo' in '**foo**' but it’s much more readable and explicitly provided for in the language as a readability improvement.

Avoid using __contains__, find, and index

As promised, here’s the contains method:

str.__contains__('**foo**', 'foo')

returns True. You could also call this function from the instance of the superstring:

'**foo**'.__contains__('foo')

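But don’t. Methods that start with underscores are considered semantically non-public, and double-underscore ("dunder") methods such as __contains__ are meant to be invoked through operators and builtins rather than called directly. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):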

class NoisyString(str):
    def __contains__(self, other):
        print('testing if "{0}" in "{1}"'.format(other, self))
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')

and now:

>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True

Also, avoid the following string methods:

>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2

>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')

Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found

Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.

Performance comparisons

We can compare various ways of accomplishing the same goal.

import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True



perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}

And now we see that using in is much faster than the others. Less time to do an equivalent operation is better:

>>> perf_dict
{'in:True': 0.16450627865128808,
 'in:False': 0.1609668098178645,
 '__contains__:True': 0.24355481654697542,
 '__contains__:False': 0.24382793854783813,
 'find:True': 0.3067379407923454,
 'find:False': 0.29860888058124146,
 'index:True': 0.29647137792585454,
 'index:False': 0.5502287584545229}

回答 3

if needle in haystack:正如@Michael所说,这是正常的用法-它依赖于in运算符,比方法调用更具可读性和速度。

如果您确实需要一个方法而不是一个运算符(例如,key=对一个非常特殊的类做一些奇怪的事情??),那就是'haystack'.__contains__。但是由于您的示例是用于的if,我想您并不是真的在说什么;-)。直接使用特殊方法不是很好的形式(既不可读也不高效),而是要通过委托给它们的运算符和内建函数使用它们。

if needle in haystack: is the normal use, as @Michael says — it relies on the in operator, more readable and faster than a method call.

If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort…?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don’t really mean what you say;-). It’s not good form (nor readable, nor efficient) to use special methods directly — they’re meant to be used, instead, through the operators and builtins that delegate to them.
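As a rough illustration of the "need a callable" case mentioned above, here is a small sketch that passes the bound method to filter(). The variable names (haystack, candidates) are made up for this example, and the comprehension shown next to it is usually the more readable choice.

```python
haystack = "a string with a substring inside"
candidates = ["substring", "needle", "inside"]

# haystack.__contains__ is a bound method taking one argument, so it can be
# handed to anything that expects a predicate, such as filter().
found = list(filter(haystack.__contains__, candidates))

# The same result with the in operator; no special method is spelled out.
found_idiomatic = [c for c in candidates if c in haystack]

print(found)
print(found_idiomatic)
```

Both lists hold the candidates that occur in haystack, in their original order.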


回答 4

in Python字符串和列表

下面是一些有用的示例,它们说明了该in方法:

"foo" in "foobar"
True

"foo" in "Foobar"
False

"foo" in "Foobar".lower()
True

"foo".capitalize() in "Foobar"
True

"foo" in ["bar", "foo", "foobar"]
True

"foo" in ["fo", "o", "foobar"]
False

["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]

警告。列表是可迭代的,并且该in方法作用于可迭代的对象,而不仅仅是字符串。

in Python strings and lists

Here are a few useful examples that speak for themselves concerning the in operator:

"foo" in "foobar"
True

"foo" in "Foobar"
False

"foo" in "Foobar".lower()
True

"foo".capitalize() in "Foobar"
True

"foo" in ["bar", "foo", "foobar"]
True

"foo" in ["fo", "o", "foobar"]
False

["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]

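Caveat: lists are iterables, and the in operator works on any container or iterable, not just strings. On a string it tests for a substring; on a list it tests whether some element equals the operand.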
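The examples above mix two different tests, so a short sketch may help keep them apart. The helper contains_ci is a hypothetical name introduced here for a case-insensitive check; it is not part of the original answer.

```python
def contains_ci(haystack, needle):
    """Return True if needle occurs in haystack, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return needle.casefold() in haystack.casefold()

words = ["fo", "o", "foobar"]

substring_hit = "foo" in "foobar"                # substring test on a str
element_hit = "foo" in words                     # equality test per list element
any_substring = any("foo" in w for w in words)   # substring test per element

print(substring_hit, element_hit, any_substring)
print(contains_ci("Foobar", "foo"))
```

The list test is False here even though one element contains "foo", because membership in a list compares whole elements.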


回答 5

如果您满意"blah" in somestring但希望将其用作函数/方法调用,则可以执行此操作

import operator

if not operator.contains(somestring, "blah"):
    continue

在Python 操作符模块中,或多或少可以找到Python中的所有操作符,包括in

If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this

import operator

if not operator.contains(somestring, "blah"):
    continue

All operators in Python can be more or less found in the operator module including in.
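One detail worth spelling out is the argument order: operator.contains(a, b) is the function form of b in a, so the container comes first. The sketch below, with made-up sample data, also shows functools.partial turning it into a one-argument predicate.

```python
import operator
from functools import partial

text = "the quick brown fox"

# Container first, item second: equivalent to "fox" in text.
has_fox = operator.contains(text, "fox")

# Fixing the container leaves a predicate that takes only the item.
in_text = partial(operator.contains, text)
hits = [word for word in ["quick", "slow", "fox"] if in_text(word)]

print(has_fox)
print(hits)
```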


回答 6

因此,显然,矢量方向比较没有类似之处。一个明显的Python方式是:

names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names) 
>> True

any(st in 'mary and jane' for st in names) 
>> False

So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:

names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names) 
>> True

any(st in 'mary and jane' for st in names) 
>> False
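Building on the any() idiom above, a comprehension can report which names matched, and all() covers the case where every name must occur. This is only a sketch using the same sample data; the variable names are illustrative.

```python
names = ['bob', 'john', 'mike']
sentence = 'bob and john'

matched = [name for name in names if name in sentence]   # which names occur
any_match = any(name in sentence for name in names)      # at least one occurs
all_match = all(name in sentence for name in names)      # every name occurs

print(matched, any_match, all_match)
```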

回答 7

您可以使用y.count()

它将返回子字符串出现在字符串中的次数的整数值。

例如:

string.count("bah") >> 0
string.count("Hello") >> 1

You can use y.count(), where y is the string being searched.

It will return the integer value of the number of times a sub string appears in a string.

For example:

string.count("bah") >> 0
string.count("Hello") >> 1
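A small sketch of how count() behaves, using sample strings invented for this example. Note that count() is case-sensitive and counts non-overlapping occurrences; for a plain yes/no answer the in operator says the same thing more directly.

```python
text = "Hello, hello, aaaa"

n_hello = text.count("Hello")     # case-sensitive: only the capitalized one
n_missing = text.count("bah")     # absent substrings give 0, not an error
n_overlap = "aaaa".count("aa")    # non-overlapping matches only

present = text.count("Hello") > 0     # works, but counts every occurrence
present_idiomatic = "Hello" in text   # can stop at the first match

print(n_hello, n_missing, n_overlap, present, present_idiomatic)
```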

回答 8

这是您的答案:

if "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF

检查是否为假:

if not "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF

要么:

if "insert_char_or_string_here" not in "insert_string_to_search_here":
    #DOSTUFF

Here is your answer:

if "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF

For checking if it is false:

if not "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF

OR:

if "insert_char_or_string_here" not in "insert_string_to_search_here":
    #DOSTUFF

回答 9

您可以使用正则表达式获取出现次数:

>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']

You can use regular expressions to get the occurrences:

>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
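If a regular expression is used only to test for a literal substring, the needle should be escaped so that characters like "." are not treated as wildcards. The helper below, contains_literal, is a hypothetical name for this sketch; re.search, re.escape and re.IGNORECASE are standard library.

```python
import re

def contains_literal(haystack, needle, ignore_case=False):
    """Return True if needle occurs literally in haystack."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes regex metacharacters in needle match themselves.
    return re.search(re.escape(needle), haystack, flags) is not None

print(contains_literal("Version 1.5 is out", "1.5"))
print(contains_literal("Version 175 is out", "1.5"))
print(contains_literal("Version 1.5 is out", "version", ignore_case=True))
```

For a plain case-sensitive check the in operator remains simpler; the regex route is mainly useful when flags or real patterns are needed.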

Accessing the index in “for” loops?

Question: Accessing the index in “for” loops?

如何for在如下所示的循环中访问索引?

ints = [8, 23, 45, 12, 78]
for i in ints:
    print('item #{} = {}'.format(???, i))

我想得到以下输出:

item #1 = 8
item #2 = 23
item #3 = 45
item #4 = 12
item #5 = 78

当我使用循环遍历它时for,如何访问循环索引(在这种情况下为1到5)?

How do I access the index in a for loop like the following?

ints = [8, 23, 45, 12, 78]
for i in ints:
    print('item #{} = {}'.format(???, i))

I want to get this output:

item #1 = 8
item #2 = 23
item #3 = 45
item #4 = 12
item #5 = 78

When I loop through it using a for loop, how do I access the loop index, from 1 to 5 in this case?


回答 0

使用其他状态变量,例如索引变量(通常在C或PHP等语言中使用),被认为是非Python的。

更好的选择是使用enumerate()Python 2和3中都提供的内置函数。

for idx, val in enumerate(ints):
    print(idx, val)

进一步了解PEP 279

Using an additional state variable, such as an index variable (which you would normally use in languages such as C or PHP), is considered non-pythonic.

The better option is to use the built-in function enumerate(), available in both Python 2 and 3:

for idx, val in enumerate(ints):
    print(idx, val)

Check out PEP 279 for more.
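To produce exactly the output asked for in the question, the count can start at 1 through the start argument of enumerate(). A minimal sketch, collecting the lines in a list only so they are easy to inspect:

```python
ints = [8, 23, 45, 12, 78]

# enumerate(..., start=1) yields (1, 8), (2, 23), ... instead of starting at 0.
lines = ['item #{} = {}'.format(number, value)
         for number, value in enumerate(ints, start=1)]

for line in lines:
    print(line)
```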


回答 1

使用for循环,在这种情况下如何访问循环索引(从1到5)?

用于enumerate在迭代时获取带有元素的索引:

for index, item in enumerate(items):
    print(index, item)

并请注意,Python的索引从零开始,因此上述值将为0到4。如果要计数1到5,请执行以下操作:

for count, item in enumerate(items, start=1):
    print(count, item)

单项控制流

您所要求的是以下Pythonic等效项,这是大多数低级语言程序员将使用的算法:

index = 0            # Python's indexing starts at zero
for item in items:   # Python's for loops are a "for each" loop 
    print(index, item)
    index += 1

或使用没有for-each循环的语言:

index = 0
while index < len(items):
    print(index, items[index])
    index += 1

或有时在Python中更常见(但唯一):

for index in range(len(items)):
    print(index, items[index])

使用枚举功能

Python的enumerate功能通过隐藏索引的记帐,并将可迭代项封装到另一个可迭代项(一个enumerate对象)中,从而减少了视觉混乱,该可迭代项产生了两个索引元组以及原始可迭代项将提供的项目。看起来像这样:

for index, item in enumerate(items, start=0):   # default is zero
    print(index, item)

此代码示例很好说明了Python特有的代码与非Python特有的代码之间的区别的典范示例。惯用代码是复杂的(但不复杂)Python,以预期使用的方式编写。语言的设计者期望使用惯用代码,这意味着通常该代码不仅更具可读性,而且效率更高。

计数

即使您不需要索引,但是您需要对迭代次数(有时是理想的)1进行计数,而最终的数字将是您的计数。

for count, item in enumerate(items, start=1):   # default is zero
    print(item)

print('there were {0} items printed'.format(count))

当您说想要从1到5时,该计数似乎更多地是您想要的内容(而不是索引)。


分解-逐步说明

为了分解这些示例,假设我们有一个要迭代的项目列表,并带有一个索引:

items = ['a', 'b', 'c', 'd', 'e']

现在,我们通过此可迭代的枚举,创建一个枚举对象:

enumerate_object = enumerate(items) # the enumerate object

我们可以从该迭代中提取第一个项目,以使我们可以使用该next函数进行循环:

iteration = next(enumerate_object) # first iteration from enumerate
print(iteration)

我们看到我们得到了元组0,第一个索引,和'a',第一项:

(0, 'a')

我们可以使用所谓的“ 序列拆包 ”从这两个元组中提取元素:

index, item = iteration
#   0,  'a' = (0, 'a') # essentially this.

当我们检查时index,我们发现它指的是第一个索引0,并且item指的是第一项'a'

>>> print(index)
0
>>> print(item)
a

结论

  • Python索引从零开始
  • 要在迭代过程中从迭代器获取这些索引,请使用枚举函数
  • 以惯用方式使用枚举(以及元组拆包)将创建更易读和可维护的代码:

这样做:

for index, item in enumerate(items, start=0):   # Python indexes start at zero
    print(index, item)

Using a for loop, how do I access the loop index, from 1 to 5 in this case?

Use enumerate to get the index with the element as you iterate:

for index, item in enumerate(items):
    print(index, item)

And note that Python’s indexes start at zero, so you would get 0 to 4 with the above. If you want the count, 1 to 5, do this:

for count, item in enumerate(items, start=1):
    print(count, item)

Unidiomatic control flow

What you are asking for is the Pythonic equivalent of the following, which is the algorithm most programmers of lower-level languages would use:

index = 0            # Python's indexing starts at zero
for item in items:   # Python's for loops are a "for each" loop 
    print(index, item)
    index += 1

Or in languages that do not have a for-each loop:

index = 0
while index < len(items):
    print(index, items[index])
    index += 1

or sometimes more commonly (but unidiomatically) found in Python:

for index in range(len(items)):
    print(index, items[index])

Use the Enumerate Function

Python’s enumerate function reduces the visual clutter by hiding the accounting for the indexes, and encapsulating the iterable into another iterable (an enumerate object) that yields a two-item tuple of the index and the item that the original iterable would provide. That looks like this:

for index, item in enumerate(items, start=0):   # default is zero
    print(index, item)

This code sample is fairly well the canonical example of the difference between code that is idiomatic of Python and code that is not. Idiomatic code is sophisticated (but not complicated) Python, written in the way that it was intended to be used. Idiomatic code is expected by the designers of the language, which means that usually this code is not just more readable, but also more efficient.

Getting a count

Even if you don’t need indexes as you go, but you need a count of the iterations (sometimes desirable) you can start with 1 and the final number will be your count.

for count, item in enumerate(items, start=1):   # start at 1; the default is zero
    print(item)

print('there were {0} items printed'.format(count))

The count seems to be more what you intend to ask for (as opposed to index) when you said you wanted from 1 to 5.


Breaking it down – a step by step explanation

To break these examples down, say we have a list of items that we want to iterate over with an index:

items = ['a', 'b', 'c', 'd', 'e']

Now we pass this iterable to enumerate, creating an enumerate object:

enumerate_object = enumerate(items) # the enumerate object

We can pull the first item out of this iterable that we would get in a loop with the next function:

iteration = next(enumerate_object) # first iteration from enumerate
print(iteration)

And we see we get a tuple of 0, the first index, and 'a', the first item:

(0, 'a')

we can use what is referred to as “sequence unpacking” to extract the elements from this two-tuple:

index, item = iteration
#   0,  'a' = (0, 'a') # essentially this.

and when we inspect index, we find it refers to the first index, 0, and item refers to the first item, 'a'.

>>> print(index)
0
>>> print(item)
a

Conclusion

  • Python indexes start at zero
  • To get these indexes from an iterable as you iterate over it, use the enumerate function
  • Using enumerate in the idiomatic way (along with tuple unpacking) creates code that is more readable and maintainable:

So do this:

for index, item in enumerate(items, start=0):   # Python indexes start at zero
    print(index, item)
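One practical consequence of the above: enumerate accepts any iterable, while the range(len(...)) idiom needs a sequence that supports len() and indexing. The generator squares below is a made-up example to show the difference.

```python
def squares(n):
    """Yield 0, 1, 4, ... lazily; a generator has no len() and no indexing."""
    for k in range(n):
        yield k * k

# enumerate only needs to iterate, so it works on the generator directly.
pairs = list(enumerate(squares(4), start=1))
print(pairs)

# The range(len(...)) idiom fails before the loop even starts.
try:
    len(squares(4))
    len_supported = True
except TypeError:
    len_supported = False
print(len_supported)
```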

回答 2

这是很简单的,从开始它1以外0

for index, item in enumerate(iterable, start=1):
   print index, item

注意

重要提示,尽管index可能会引起误解,但tuple (idx, item)在这里。好去。

It’s pretty simple to start it from 1 rather than 0:

for index, item in enumerate(iterable, start=1):
   print index, item

Note

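Note that enumerate yields (index, item) tuples, and the loop header unpacks each tuple into the two names, so index is the integer count here. With a single loop variable you would get the whole tuple instead.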


回答 3

for i in range(len(ints)):
   print i, ints[i]

回答 4

按照Python的规范,有几种方法可以做到这一点。在所有示例中均假定:lst = [1, 2, 3, 4, 5]

1.使用枚举(被认为是最惯用的

for index, element in enumerate(lst):
    # do the things that need doing here

我认为这也是最安全的选择,因为消除了进行无限递归的机会。项目及其索引都保存在变量中,无需编写任何其他代码即可访问该项目。

2.创建一个变量来保存索引(使用for

for index in range(len(lst)):   # or xrange
    # you will have to write extra code to get the element

3.创建一个变量来保存索引(使用while

index = 0
while index < len(lst):
    # you will have to write extra code to get the element
    index += 1  # escape infinite recursion

4.总有另一种方法

如前所述,还有其他方法尚未在此处进行说明,它们甚至可能在其他情况下更适用。例如使用itertools.chainfor。它比其他示例更好地处理嵌套循环。

As is the norm in Python there are several ways to do this. In all examples assume: lst = [1, 2, 3, 4, 5]

1. Using enumerate (considered most idiomatic)

for index, element in enumerate(lst):
    pass  # do the things that need doing here

This is also the safest option in my opinion because there is no index to update by hand, so the chance of ending up in an infinite loop has been eliminated. Both the item and its index are held in variables and there is no need to write any further code to access the item.

2. Creating a variable to hold the index (using for)

for index in range(len(lst)):   # or xrange
    element = lst[index]  # extra code is needed to get the element

3. Creating a variable to hold the index (using while)

index = 0
while index < len(lst):
    # you will have to write extra code to get the element
    index += 1  # without this increment the loop would never end

4. There is always another way

As explained before, there are other ways to do this that have not been explained here and they may even apply more in other situations. e.g using itertools.chain with for. It handles nested loops better than the other examples.
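The answer does not show what it has in mind for itertools.chain. One possible reading, offered only as a guess, is numbering the elements of nested lists with a single running index by flattening them first:

```python
from itertools import chain

nested = [[1, 2], [3], [4, 5]]

# chain.from_iterable walks the inner lists one after another, so enumerate
# sees one flat stream and hands out a single running index.
flat_indexed = list(enumerate(chain.from_iterable(nested)))

for index, element in flat_indexed:
    print(index, element)
```

With nested for loops the same numbering would need a counter maintained by hand.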


回答 5

老式的方式:

for ix in range(len(ints)):
    print ints[ix]

清单理解:

[ (ix, ints[ix]) for ix in range(len(ints))]

>>> ints
[1, 2, 3, 4, 5]
>>> for ix in range(len(ints)): print ints[ix]
... 
1
2
3
4
5
>>> [ (ix, ints[ix]) for ix in range(len(ints))]
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
>>> lc = [ (ix, ints[ix]) for ix in range(len(ints))]
>>> for tup in lc:
...     print tup
... 
(0, 1)
(1, 2)
(2, 3)
(3, 4)
(4, 5)
>>> 

Old fashioned way:

for ix in range(len(ints)):
    print ints[ix]

List comprehension:

[ (ix, ints[ix]) for ix in range(len(ints))]

>>> ints
[1, 2, 3, 4, 5]
>>> for ix in range(len(ints)): print ints[ix]
... 
1
2
3
4
5
>>> [ (ix, ints[ix]) for ix in range(len(ints))]
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
>>> lc = [ (ix, ints[ix]) for ix in range(len(ints))]
>>> for tup in lc:
...     print tup
... 
(0, 1)
(1, 2)
(2, 3)
(3, 4)
(4, 5)
>>> 

回答 6

Python 2.7中访问循环内列表索引的最快方法是对小列表使用range方法,对中型和大型列表使用枚举方法

请参阅不同的方法,可以在列表和访问索引值被用来遍历和其性能指标(我想是对您有用)下面的代码样本:

from timeit import timeit

# Using range
def range_loop(iterable):
    for i in range(len(iterable)):
        1 + iterable[i]

# Using xrange
def xrange_loop(iterable):
    for i in xrange(len(iterable)):
        1 + iterable[i]

# Using enumerate
def enumerate_loop(iterable):
    for i, val in enumerate(iterable):
        1 + val

# Manual indexing
def manual_indexing_loop(iterable):
    index = 0
    for item in iterable:
        1 + item
        index += 1

请参阅以下每种方法的性能指标:

from timeit import timeit

def measure(l, number=10000):
print "Measure speed for list with %d items" % len(l)
print "xrange: ", timeit(lambda :xrange_loop(l), number=number)
print "range: ", timeit(lambda :range_loop(l), number=number)
print "enumerate: ", timeit(lambda :enumerate_loop(l), number=number)
print "manual_indexing: ", timeit(lambda :manual_indexing_loop(l), number=number)

measure(range(1000))
# Measure speed for list with 1000 items
# xrange:  0.758321046829
# range:  0.701184988022
# enumerate:  0.724966049194
# manual_indexing:  0.894635915756

measure(range(10000))
# Measure speed for list with 100000 items
# xrange:  81.4756360054
# range:  75.0172479153
# enumerate:  74.687623024
# manual_indexing:  91.6308541298

measure(range(10000000), number=100)
# Measure speed for list with 10000000 items
# xrange:  82.267786026
# range:  84.0493988991
# enumerate:  78.0344707966
# manual_indexing:  95.0491430759

结果,使用range方法是列出1000个项目中最快的一种。对于大小大于10000的列表,enumerate则为获胜者。


The fastest way to access indexes of list within loop in Python 2.7 is to use the range method for small lists and enumerate method for medium and huge size lists.

Please see different approaches which can be used to iterate over list and access index value and their performance metrics (which I suppose would be useful for you) in code samples below:

from timeit import timeit

# Using range
def range_loop(iterable):
    for i in range(len(iterable)):
        1 + iterable[i]

# Using xrange
def xrange_loop(iterable):
    for i in xrange(len(iterable)):
        1 + iterable[i]

# Using enumerate
def enumerate_loop(iterable):
    for i, val in enumerate(iterable):
        1 + val

# Manual indexing
def manual_indexing_loop(iterable):
    index = 0
    for item in iterable:
        1 + item
        index += 1

See performance metrics for each method below:

from timeit import timeit

def measure(l, number=10000):
    print "Measure speed for list with %d items" % len(l)
    print "xrange: ", timeit(lambda :xrange_loop(l), number=number)
    print "range: ", timeit(lambda :range_loop(l), number=number)
    print "enumerate: ", timeit(lambda :enumerate_loop(l), number=number)
    print "manual_indexing: ", timeit(lambda :manual_indexing_loop(l), number=number)

measure(range(1000))
# Measure speed for list with 1000 items
# xrange:  0.758321046829
# range:  0.701184988022
# enumerate:  0.724966049194
# manual_indexing:  0.894635915756

measure(range(100000))
# Measure speed for list with 100000 items
# xrange:  81.4756360054
# range:  75.0172479153
# enumerate:  74.687623024
# manual_indexing:  91.6308541298

measure(range(10000000), number=100)
# Measure speed for list with 10000000 items
# xrange:  82.267786026
# range:  84.0493988991
# enumerate:  78.0344707966
# manual_indexing:  95.0491430759

As a result, in these Python 2.7 measurements the range method is the fastest for lists of up to 1000 items. For lists with more than 10 000 items, enumerate is the winner.
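The benchmark above is written for Python 2 (print statements, xrange). For readers on Python 3, here is a reduced sketch of the same idea; xrange is dropped because it no longer exists, and the list size and repeat count are arbitrary small values chosen so the script finishes quickly. Timings vary by machine and interpreter, so no numbers are claimed here.

```python
from timeit import timeit

def range_loop(seq):
    for i in range(len(seq)):
        1 + seq[i]

def enumerate_loop(seq):
    for i, val in enumerate(seq):
        1 + val

def manual_indexing_loop(seq):
    index = 0
    for item in seq:
        1 + item
        index += 1

def measure(seq, number=100):
    """Return a dict mapping each loop function's name to its total time."""
    loops = (range_loop, enumerate_loop, manual_indexing_loop)
    return {loop.__name__: timeit(lambda: loop(seq), number=number)
            for loop in loops}

results = measure(list(range(1000)))
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print('{:<22} {:.6f}'.format(name, seconds))
```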



回答 7

首先,索引将从0到4。编程语言从0开始计数;从0开始计数。不要忘了,否则您将遇到索引超出范围的异常。for循环中需要的只是一个从0到4的变量,如下所示:

for x in range(0, 5):

请记住,我写了0到5,因为循环在最大值之前停了一个数字。:)

要获取索引的值,请使用

list[index]

First of all, the indexes will be from 0 to 4. Python, like most programming languages, starts counting from 0; don’t forget that or you will come across an index out of bounds exception (an IndexError in Python). All you need in the for loop is a variable counting from 0 to 4 like so:

for x in range(0, 5):

Keep in mind that I wrote 0 to 5 because the loop stops one number before the max. :)

To get the value of an index use

list[index]

回答 8

这是for循环访问索引时得到的结果:

for i in enumerate(items): print(i)

items = [8, 23, 45, 12, 78]

for i in enumerate(items):
    print("index/value", i)

结果:

# index/value (0, 8)
# index/value (1, 23)
# index/value (2, 45)
# index/value (3, 12)
# index/value (4, 78)

for i, val in enumerate(items): print(i, val)

items = [8, 23, 45, 12, 78]

for i, val in enumerate(items):
    print("index", i, "for value", val)

结果:

# index 0 for value 8
# index 1 for value 23
# index 2 for value 45
# index 3 for value 12
# index 4 for value 78

for i, val in enumerate(items): print(i)

items = [8, 23, 45, 12, 78]

for i, val in enumerate(items):
    print("index", i)

结果:

# index 0
# index 1
# index 2
# index 3
# index 4

Here’s what you get when you’re accessing index in for loops:

for i in enumerate(items): print(i)

items = [8, 23, 45, 12, 78]

for i in enumerate(items):
    print("index/value", i)

Result:

# index/value (0, 8)
# index/value (1, 23)
# index/value (2, 45)
# index/value (3, 12)
# index/value (4, 78)

for i, val in enumerate(items): print(i, val)

items = [8, 23, 45, 12, 78]

for i, val in enumerate(items):
    print("index", i, "for value", val)

Result:

# index 0 for value 8
# index 1 for value 23
# index 2 for value 45
# index 3 for value 12
# index 4 for value 78

for i, val in enumerate(items): print(i)

items = [8, 23, 45, 12, 78]

for i, val in enumerate(items):
    print("index", i)

Result:

# index 0
# index 1
# index 2
# index 3
# index 4

回答 9

根据此讨论:http : //bytes.com/topic/python/answers/464012-objects-list-index

循环计数器迭代

当前用于遍历索引的惯用法使用内置range函数:

for i in range(len(sequence)):
    # work with index i

可以通过旧习惯用法或使用新的zip内置函数来实现元素和索引的循环:

for i in range(len(sequence)):
    e = sequence[i]
    # work with index i and element e

要么

for i, e in zip(range(len(sequence)), sequence):
    # work with index i and element e

通过http://www.python.org/dev/peps/pep-0212/

According to this discussion: http://bytes.com/topic/python/answers/464012-objects-list-index

Loop counter iteration

The current idiom for looping over the indices makes use of the built-in range function:

for i in range(len(sequence)):
    # work with index i

Looping over both elements and indices can be achieved either by the old idiom or by using the new zip built-in function:

for i in range(len(sequence)):
    e = sequence[i]
    # work with index i and element e

or

for i, e in zip(range(len(sequence)), sequence):
    # work with index i and element e

via http://www.python.org/dev/peps/pep-0212/
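The zip idiom quoted above predates enumerate, which PEP 279 later added as a built-in. As a quick check on a made-up three-item list, both produce the same index/element pairs:

```python
sequence = ['a', 'b', 'c']

via_zip = list(zip(range(len(sequence)), sequence))   # idiom quoted from PEP 212
via_enumerate = list(enumerate(sequence))             # built-in added by PEP 279

print(via_zip)
print(via_zip == via_enumerate)
```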


回答 10

您可以使用以下代码进行操作:

ints = [8, 23, 45, 12, 78]
index = 0

for value in (ints):
    index +=1
    print index, value

如果您需要在循环结束时重置索引值,请使用此代码:

ints = [8, 23, 45, 12, 78]
index = 0

for value in (ints):
    index +=1
    print index, value
    if index >= len(ints)-1:
        index = 0

You can do it with this code:

ints = [8, 23, 45, 12, 78]
index = 0

for value in (ints):
    index +=1
    print index, value

Use this code if you need to reset the index value at the end of the loop:

ints = [8, 23, 45, 12, 78]
index = 0

for value in (ints):
    index +=1
    print index, value
    if index >= len(ints)-1:
        index = 0

回答 11

解决此问题的最佳方法是使用枚举内置python函数。
枚举返回元组
第一个值是索引,
第二个值是该索引处数组的元素

In [1]: ints = [8, 23, 45, 12, 78]

In [2]: for idx, val in enumerate(ints):
   ...:         print(idx, val)
   ...:     
(0, 8)
(1, 23)
(2, 45)
(3, 12)
(4, 78)

The best solution for this problem is to use the built-in enumerate function.
enumerate yields tuples:
the first value is the index,
the second value is the element at that index.

In [1]: ints = [8, 23, 45, 12, 78]

In [2]: for idx, val in enumerate(ints):
   ...:         print(idx, val)
   ...:     
(0, 8)
(1, 23)
(2, 45)
(3, 12)
(4, 78)

回答 12

在您的问题中,您写道:“在这种情况下,我如何从1到5访问循环索引?”

但是,列表的索引从零开始。因此,那么我们需要知道您真正想要的是列表中每个项目的索引和项目,还是您真正想要的是从1开始的数字。幸运的是,在Python中,执行这一项或两项都很容易。

首先,要澄清一下,该enumerate函数迭代地返回列表中每个项目的索引和相应项目。

alist = [1, 2, 3, 4, 5]

for n, a in enumerate(alist):
    print("%d %d" % (n, a))

上面的输出是

0 1
1 2
2 3
3 4
4 5

请注意,索引从0开始。这种索引在包括Python和C在内的现代编程语言中很常见。

如果希望循环跨越列表的一部分,则可以将标准Python语法用于列表的一部分。例如,要从列表中的第二个项目循环到最后一个但不包括最后一个项目,可以使用

for n, a in enumerate(alist[1:-1]):
    print("%d %d" % (n, a))

请再次注意,输出索引从0开始,

0 2
1 3
2 4

这给我们带来了start=n的开关enumerate()。这只是使索引偏移,您可以等效地在循环内向索引简单地添加一个数字。

for n, a in enumerate(alist, start=1):
    print("%d %d" % (n, a))

其输出是

1 1
2 2
3 3
4 4
5 5

In your question, you write “how do I access the loop index, from 1 to 5 in this case?”

However, the index for a list runs from zero. So, then we need to know if what you actually want is the index and item for each item in a list, or whether you really want numbers starting from 1. Fortunately, in Python, it is easy to do either or both.

First, to clarify, the enumerate function iteratively returns the index and corresponding item for each item in a list.

alist = [1, 2, 3, 4, 5]

for n, a in enumerate(alist):
    print("%d %d" % (n, a))

The output for the above is then,

0 1
1 2
2 3
3 4
4 5

Notice that the index runs from 0. This kind of indexing is common among modern programming languages including Python and C.

If you want your loop to span a part of the list, you can use the standard Python syntax for a part of the list. For example, to loop from the second item in a list up to but not including the last item, you could use

for n, a in enumerate(alist[1:-1]):
    print("%d %d" % (n, a))

Note that once again, the output index runs from 0,

0 2
1 3
2 4

That brings us to the start=n argument of enumerate(). It simply offsets the index; equivalently, you could add a number to the index inside the loop.

for n, a in enumerate(alist, start=1):
    print("%d %d" % (n, a))

for which the output is

1 1
2 2
3 3
4 4
5 5
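Combining the two points above: when enumerating a slice, passing the slice's starting position as start keeps the reported numbers equal to the positions in the original list. A minimal sketch, with first as an illustrative variable name:

```python
alist = [1, 2, 3, 4, 5]
first = 1   # position in alist where the slice begins

# Without start=first the indexes would restart at 0 for the slice.
sliced = list(enumerate(alist[first:-1], start=first))

for n, a in sliced:
    print("%d %d" % (n, a))
```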

回答 13

如果我要迭代,nums = [1, 2, 3, 4, 5]我会做

for i, num in enumerate(nums, start=1):
    print(i, num)

或获得长度为 l = len(nums)

for i in range(l):
    print(i+1, nums[i])

If I were to iterate nums = [1, 2, 3, 4, 5] I would do

for i, num in enumerate(nums, start=1):
    print(i, num)

Or get the length as l = len(nums)

for i in range(l):
    print(i+1, nums[i])

回答 14

如果列表中没有重复的值:

for i in ints:
    indx = ints.index(i)
    print(i, indx)

If there is no duplicate value in the list:

for i in ints:
    indx = ints.index(i)
    print(i, indx)
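The "no duplicates" condition matters because list.index returns the position of the first equal element. A short sketch with a deliberately duplicated value shows the difference from enumerate:

```python
ints = [8, 8, 23]

# index() always finds the first 8, so the second 8 is reported at 0 as well.
via_index = [ints.index(value) for value in ints]

# enumerate counts positions directly and is unaffected by duplicates.
via_enumerate = [position for position, _ in enumerate(ints)]

print(via_index)
print(via_enumerate)
```

Each index() call also scans the list from the start, so this approach does more work per item than enumerate as the list grows.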

回答 15

您也可以尝试以下操作:

data = ['itemA.ABC', 'itemB.defg', 'itemC.drug', 'itemD.ashok']
x = []
for (i, item) in enumerate(data):
      a = (i, str(item).split('.'))
      x.append(a)
for index, value in x:
     print(index, value)

输出是

0 ['itemA', 'ABC']
1 ['itemB', 'defg']
2 ['itemC', 'drug']
3 ['itemD', 'ashok']

You can also try this:

data = ['itemA.ABC', 'itemB.defg', 'itemC.drug', 'itemD.ashok']
x = []
for (i, item) in enumerate(data):
      a = (i, str(item).split('.'))
      x.append(a)
for index, value in x:
     print(index, value)

The output is

0 ['itemA', 'ABC']
1 ['itemB', 'defg']
2 ['itemC', 'drug']
3 ['itemD', 'ashok']

回答 16

您可以使用index方法

ints = [8, 23, 45, 12, 78]
inds = [ints.index(i) for i in ints]

编辑 在注释中突出显示,如果中存在重复项ints,则此方法不起作用,下面的方法应适用于以下任何值ints

ints = [8, 8, 8, 23, 45, 12, 78]
inds = [tup[0] for tup in enumerate(ints)]

或者

ints = [8, 8, 8, 23, 45, 12, 78]
inds = [tup for tup in enumerate(ints)]

如果要同时获取索引和值ints作为元组列表。

它使用在enumerate此问题的选定答案中的方法,但具有列表理解功能,因此可以用较少的代码来加快速度。

You can use the index method

ints = [8, 23, 45, 12, 78]
inds = [ints.index(i) for i in ints]

EDIT: As pointed out in the comments, this method doesn’t work if there are duplicates in ints. The method below works for any values in ints:

ints = [8, 8, 8, 23, 45, 12, 78]
inds = [tup[0] for tup in enumerate(ints)]

Or alternatively

ints = [8, 8, 8, 23, 45, 12, 78]
inds = [tup for tup in enumerate(ints)]

if you want to get both the index and the value in ints as a list of tuples.

It uses enumerate, as in the selected answer to this question, but inside a list comprehension, which makes the code shorter.


回答 17

使用While循环的简单答案:

arr = [8, 23, 45, 12, 78]
i = 0
while i<len(arr):
    print("Item ",i+1," = ",arr[i])
    i +=1

输出:

Item  1  =  8
Item  2  =  23
Item  3  =  45
Item  4  =  12
Item  5  =  78

Simple answer using While Loop:

arr = [8, 23, 45, 12, 78]
i = 0
while i<len(arr):
    print("Item ",i+1," = ",arr[i])
    i +=1

Output:

Item  1  =  8
Item  2  =  23
Item  3  =  45
Item  4  =  12
Item  5  =  78


回答 18

要使用for循环在列表理解中打印(索引,值)的元组:

ints = [8, 23, 45, 12, 78]
print [(i,ints[i]) for i in range(len(ints))]

输出:

[(0, 8), (1, 23), (2, 45), (3, 12), (4, 78)]

To print tuple of (index, value) in list comprehension using a for loop:

ints = [8, 23, 45, 12, 78]
print [(i,ints[i]) for i in range(len(ints))]

Output:

[(0, 8), (1, 23), (2, 45), (3, 12), (4, 78)]

回答 19

这足以达到目的:

list1 = [10, 'sumit', 43.21, 'kumar', '43', 'test', 3]
for x in list1:
    print('index:', list1.index(x), 'value:', x)

This serves the purpose well enough:

list1 = [10, 'sumit', 43.21, 'kumar', '43', 'test', 3]
for x in list1:
    print('index:', list1.index(x), 'value:', x)

Difference between static methods and class methods

Question: Difference between static methods and class methods

@staticmethod修饰的功能和用修饰的功能有什么区别@classmethod

What is the difference between a function decorated with @staticmethod and one decorated with @classmethod?


回答 0

也许有点示例代码将有助于:发现其中的差别在调用签名fooclass_foo并且static_foo

class A(object):
    def foo(self, x):
        print "executing foo(%s, %s)" % (self, x)

    @classmethod
    def class_foo(cls, x):
        print "executing class_foo(%s, %s)" % (cls, x)

    @staticmethod
    def static_foo(x):
        print "executing static_foo(%s)" % x    

a = A()

以下是对象实例调用方法的常用方法。对象实例,a作为第一个参数隐式传递。

a.foo(1)
# executing foo(<__main__.A object at 0xb7dbef0c>,1)

使用classmethods时,对象实例的类作为第一个参数而不是隐式传递self

a.class_foo(1)
# executing class_foo(<class '__main__.A'>,1)

您也可以class_foo使用该类进行呼叫。实际上,如果您将某些东西定义为类方法,则可能是因为您打算从类而不是从类实例调用它。A.foo(1)本来会引发TypeError,但A.class_foo(1)效果很好:

A.class_foo(1)
# executing class_foo(<class '__main__.A'>,1)

人们发现类方法的一种用途是创建可继承的替代构造函数


使用staticmethods时self(对象实例)和 cls(类)都不会隐式传递为第一个参数。它们的行为类似于普通函数,不同之处在于您可以从实例或类中调用它们:

a.static_foo(1)
# executing static_foo(1)

A.static_foo('hi')
# executing static_foo(hi)

静态方法用于对与类之间具有某种逻辑联系的函数进行分组。


foo只是一个函数,但是当您调用a.foo它时,不仅得到函数,还会得到函数的“部分应用”版本,其中对象实例a绑定为函数的第一个参数。foo期望有2个参数,而a.foo只期望有1个参数。

a势必到foo。这就是下面的术语“绑定”的含义:

print(a.foo)
# <bound method A.foo of <__main__.A object at 0xb7d52f0c>>

a.class_fooa不绑定class_foo,而是与类A绑定class_foo

print(a.class_foo)
# <bound method type.class_foo of <class '__main__.A'>>

在这里,使用静态方法,即使它是一种方法,也a.static_foo只是返回一个没有绑定参数的良好的’ole函数。static_foo期望有1个参数,也 a.static_foo期望有1个参数。

print(a.static_foo)
# <function static_foo at 0xb7d479cc>

当然,当您static_foo使用类进行调用时,也会发生同样的事情A

print(A.static_foo)
# <function static_foo at 0xb7d479cc>

Maybe a bit of example code will help: Notice the difference in the call signatures of foo, class_foo and static_foo:

class A(object):
    def foo(self, x):
        print "executing foo(%s, %s)" % (self, x)

    @classmethod
    def class_foo(cls, x):
        print "executing class_foo(%s, %s)" % (cls, x)

    @staticmethod
    def static_foo(x):
        print "executing static_foo(%s)" % x    

a = A()

Below is the usual way an object instance calls a method. The object instance, a, is implicitly passed as the first argument.

a.foo(1)
# executing foo(<__main__.A object at 0xb7dbef0c>, 1)

With classmethods, the class of the object instance is implicitly passed as the first argument instead of self.

a.class_foo(1)
# executing class_foo(<class '__main__.A'>, 1)

You can also call class_foo using the class. In fact, if you define something to be a classmethod, it is probably because you intend to call it from the class rather than from a class instance. A.foo(1) would have raised a TypeError, but A.class_foo(1) works just fine:

A.class_foo(1)
# executing class_foo(<class '__main__.A'>, 1)

One use people have found for class methods is to create inheritable alternative constructors.
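A minimal sketch of such an inheritable alternative constructor follows. The classes Point and LabeledPoint and the method from_tuple are hypothetical names invented for this illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, pair):
        # cls is whichever class the method was looked up on, so a subclass
        # calling from_tuple gets an instance of itself, not of Point.
        x, y = pair
        return cls(x, y)

class LabeledPoint(Point):
    label = 'unnamed'

p = Point.from_tuple((1, 2))
q = LabeledPoint.from_tuple((3, 4))

print(type(p).__name__, type(q).__name__)
```

Had from_tuple been a staticmethod that returned Point(x, y), the subclass call would have produced a plain Point.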


With staticmethods, neither self (the object instance) nor cls (the class) is implicitly passed as the first argument. They behave like plain functions except that you can call them from an instance or the class:

a.static_foo(1)
# executing static_foo(1)

A.static_foo('hi')
# executing static_foo(hi)

Staticmethods are used to group functions which have some logical connection with a class to the class.


foo is just a function, but when you call a.foo you don’t just get the function, you get a “partially applied” version of the function with the object instance a bound as the first argument to the function. foo expects 2 arguments, while a.foo only expects 1 argument.

a is bound to foo. That is what is meant by the term “bound” below:

print(a.foo)
# <bound method A.foo of <__main__.A object at 0xb7d52f0c>>

With a.class_foo, a is not bound to class_foo, rather the class A is bound to class_foo.

print(a.class_foo)
# <bound method type.class_foo of <class '__main__.A'>>

Here, with a staticmethod, even though it is a method, a.static_foo just returns a good ol' function with no arguments bound. static_foo expects 1 argument, and a.static_foo expects 1 argument too.

print(a.static_foo)
# <function static_foo at 0xb7d479cc>

And of course the same thing happens when you call static_foo with the class A instead.

print(A.static_foo)
# <function static_foo at 0xb7d479cc>
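The example above uses Python 2 print statements, and the printed reprs differ between versions. As a version-independent way to see what is bound to what, the sketch below (written for Python 3, with methods that return values instead of printing) inspects the __self__ attribute of the bound methods.

```python
import types

class A:
    def foo(self, x):
        return ('foo', self, x)

    @classmethod
    def class_foo(cls, x):
        return ('class_foo', cls, x)

    @staticmethod
    def static_foo(x):
        return ('static_foo', x)

a = A()

bound_to_instance = a.foo.__self__ is a      # instance method binds the instance
bound_to_class = a.class_foo.__self__ is A   # classmethod binds the class
plain_function = isinstance(a.static_foo, types.FunctionType)  # nothing bound

print(bound_to_instance, bound_to_class, plain_function)
print(a.class_foo(1) == A.class_foo(1))
```

All three checks report True, matching the description of a being bound to foo, A being bound to class_foo, and static_foo having nothing bound.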

回答 1

一个静态方法是一无所知,它被称为上类或实例的方法。它只是获取传递的参数,没有隐式的第一个参数。在Python中基本上没有用-您可以使用模块函数代替静态方法。

类方法,在另一方面,是获取传递的类,它被称为上,或该类的实例,它被称为上的,作为第一个参数的方法。当您希望该方法成为类的工厂时,这很有用:由于它获得了作为第一个参数调用的实际类,因此即使涉及子类,也始终可以实例化正确的类。例如dict.fromkeys(),观察在子类上调用时,类方法如何返回子类的实例:

>>> class DictSubclass(dict):
...     def __repr__(self):
...         return "DictSubclass"
... 
>>> dict.fromkeys("abc")
{'a': None, 'c': None, 'b': None}
>>> DictSubclass.fromkeys("abc")
DictSubclass
>>> 

A staticmethod is a method that knows nothing about the class or instance it was called on. It just gets the arguments that were passed, no implicit first argument. It is basically useless in Python — you can just use a module function instead of a staticmethod.

A classmethod, on the other hand, is a method that gets passed the class it was called on, or the class of the instance it was called on, as first argument. This is useful when you want the method to be a factory for the class: since it gets the actual class it was called on as first argument, you can always instantiate the right class, even when subclasses are involved. Observe for instance how dict.fromkeys(), a classmethod, returns an instance of the subclass when called on a subclass:

>>> class DictSubclass(dict):
...     def __repr__(self):
...         return "DictSubclass"
... 
>>> dict.fromkeys("abc")
{'a': None, 'c': None, 'b': None}
>>> DictSubclass.fromkeys("abc")
DictSubclass
>>> 

回答 2

基本上@classmethod使方法的第一个参数是从其调用的类(而不是类实例),@staticmethod它没有任何隐式参数。

Basically @classmethod makes a method whose first argument is the class it’s called from (rather than the class instance), @staticmethod does not have any implicit arguments.
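
A small sketch of what that implicit first argument buys you in practice. The names Temperature, LoggedTemperature and from_celsius are hypothetical and chosen only for illustration.

```python
class Temperature:
    def __init__(self, kelvin):
        self.kelvin = kelvin

    @classmethod
    def from_celsius(cls, celsius):
        # cls is whatever class the method was looked up on,
        # so subclasses get instances of themselves.
        return cls(celsius + 273.15)


class LoggedTemperature(Temperature):
    pass


base = Temperature.from_celsius(0)
derived = LoggedTemperature.from_celsius(0)
print(type(base).__name__, type(derived).__name__)
```

Had from_celsius been a staticmethod, it would have had to name Temperature explicitly and the subclass call would have returned the base type.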


回答 3

官方python文档:

@classmethod

类方法将类作为隐式第一个参数接收,就像实例方法接收实例一样。要声明类方法,请使用以下惯用法:

class C:
    @classmethod
    def f(cls, arg1, arg2, ...): ... 

@classmethod表单是一个函数 装饰器 –有关详细信息,请参见函数定义中的函数定义描述。

可以在类(如C.f())或实例(如C().f())上调用它。该实例除其类外均被忽略。如果为派生类调用类方法,则派生类对象作为隐式第一个参数传递。

类方法不同于C ++或Java静态方法。如果需要这些,请参阅staticmethod()本节。

@staticmethod

静态方法不会收到隐式的第一个参数。要声明静态方法,请使用以下惯用法:

class C:
    @staticmethod
    def f(arg1, arg2, ...): ... 

@staticmethod表单是一个函数 装饰器 –有关详细信息,请参见函数定义中的函数定义描述。

可以在类(如C.f())或实例(如C().f())上调用它。该实例除其类外均被忽略。

Python中的静态方法类似于Java或C ++中的静态方法。有关更高级的概念,请参阅 classmethod()本节。

Official python docs:

@classmethod

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

class C:
    @classmethod
    def f(cls, arg1, arg2, ...): ... 

The @classmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different than C++ or Java static methods. If you want those, see staticmethod() in this section.

@staticmethod

A static method does not receive an implicit first argument. To declare a static method, use this idiom:

class C:
    @staticmethod
    def f(arg1, arg2, ...): ... 

The @staticmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class.

Static methods in Python are similar to those found in Java or C++. For a more advanced concept, see classmethod() in this section.
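
The calling rules quoted above can be exercised with a short sketch. The subclass D and the method g are added here for illustration and do not appear in the documentation excerpt.

```python
class C:
    @classmethod
    def f(cls, arg):
        return (cls.__name__, arg)

    @staticmethod
    def g(arg):
        return arg


class D(C):
    pass


results = [
    C.f(1),    # called on the class
    C().f(1),  # called on an instance: only its class is used
    D.f(1),    # the derived class is passed as cls
    D().f(1),
    C.g(2),    # no implicit argument either way
    C().g(2),
]
print(results)
```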


回答 4

是关于这个问题的简短文章

@staticmethod函数不过是在类内部定义的函数。可调用而无需先实例化该类。它的定义通过继承是不可变的。

@classmethod函数也可以在不实例化类的情况下调用,但是其定义是通过继承遵循Sub类,而不是Parent类。这是因为@classmethod函数的第一个参数必须始终为cls(类)。

Here is a short article on this question

A @staticmethod function is nothing more than a function defined inside a class. It is callable without instantiating the class first. Its definition is immutable via inheritance.

A @classmethod function is also callable without instantiating the class, but its definition follows the subclass, not the parent class, via inheritance. That's because the first argument of a @classmethod function must always be cls (the class).


回答 5

要决定使用@staticmethod还是@classmethod,您必须查看方法内部。如果您的方法访问类中的其他变量/方法,请使用@classmethod。另一方面,如果您的方法未触及类的其他任何部分,请使用@staticmethod。

class Apple:

    _counter = 0

    @staticmethod
    def about_apple():
        print('Apple is good for you.')

        # note you can still access other member of the class
        # but you have to use the class instance 
        # which is not very nice, because you have repeat yourself
        # 
        # For example:
        # @staticmethod
        #    print('Number of apples have been juiced: %s' % Apple._counter)
        #
        # @classmethod
        #    print('Number of apples have been juiced: %s' % cls._counter)
        #
        #    @classmethod is especially useful when you move your function to other class,
        #       you don't have to rename the class reference 

    @classmethod
    def make_apple_juice(cls, number_of_apples):
        print('Make juice:')
        for i in range(number_of_apples):
            cls._juice_this(i)

    @classmethod
    def _juice_this(cls, apple):
        print('Juicing %d...' % apple)
        cls._counter += 1

To decide whether to use @staticmethod or @classmethod you have to look inside your method. If your method accesses other variables/methods in your class then use @classmethod. On the other hand, if your method does not touch any other parts of the class then use @staticmethod.

class Apple:

    _counter = 0

    @staticmethod
    def about_apple():
        print('Apple is good for you.')

        # Note: a static method can still reach other members of the class,
        # but only by naming the class explicitly, which is not very nice
        # because you have to repeat the class name.
        #
        # For example:
        # in a @staticmethod:
        #    print('Number of apples have been juiced: %s' % Apple._counter)
        #
        # in a @classmethod:
        #    print('Number of apples have been juiced: %s' % cls._counter)
        #
        #    @classmethod is especially useful when you move your function to another class:
        #       you don't have to rename the class reference

    @classmethod
    def make_apple_juice(cls, number_of_apples):
        print('Make juice:')
        for i in range(number_of_apples):
            cls._juice_this(i)

    @classmethod
    def _juice_this(cls, apple):
        print('Juicing %d...' % apple)
        cls._counter += 1
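
One subtlety the example does not show is what happens to a class-level counter once subclasses are involved. The sketch below uses a trimmed-down Apple and a hypothetical subclass GreenApple (both simplified for illustration): `cls._counter += 1` rebinds the attribute on whichever class `cls` is, so the subclass ends up with its own counter.

```python
class Apple:
    _counter = 0

    @classmethod
    def juice_one(cls):
        # Reads cls._counter (possibly inherited), then assigns on cls itself.
        cls._counter += 1
        return cls._counter


class GreenApple(Apple):  # hypothetical subclass, for illustration only
    pass


GreenApple.juice_one()
GreenApple.juice_one()
Apple.juice_one()

print(Apple._counter, GreenApple._counter)
```

If a single shared counter is wanted, naming the base class explicitly (Apple._counter) is one way to get it, at the cost of the hard-coded name the answer warns about.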

回答 6

Python中的@staticmethod和@classmethod有什么区别?

您可能已经看到了类似此伪代码的Python代码,该代码演示了各种方法类型的签名,并提供了一个文档字符串来说明每种方法:

class Foo(object):

    def a_normal_instance_method(self, arg_1, kwarg_2=None):
        '''
        Return a value that is a function of the instance with its
        attributes, and other arguments such as arg_1 and kwarg2
        '''

    @staticmethod
    def a_static_method(arg_0):
        '''
        Return a value that is a function of arg_0. It does not know the 
        instance or class it is called from.
        '''

    @classmethod
    def a_class_method(cls, arg1):
        '''
        Return a value that is a function of the class and other arguments.
        respects subclassing, it is called with the class it is called from.
        '''

普通实例方法

首先,我会解释a_normal_instance_method。这就是所谓的“ 实例方法 ”。使用实例方法时,它用作部分函数(与总函数相反,在源代码中查看时为所有值定义的总函数),即在使用时,将第一个参数预定义为具有所有给定属性的对象。它具有绑定到其对象的实例,并且必须从该对象的实例调用它。通常,它将访问实例的各种属性。

例如,这是一个字符串的实例:

', '

如果我们join在该字符串上使用实例方法来连接另一个可迭代对象,则很明显,它是实例的功能,除了是可迭代列表的功能之外,还['a', 'b', 'c']

>>> ', '.join(['a', 'b', 'c'])
'a, b, c'

绑定方法

可以通过点分查找来绑定实例方法,以备后用。

例如,这将str.join方法绑定到':'实例:

>>> join_with_colons = ':'.join 

之后,我们可以将其用作已绑定第一个参数的函数。这样,它就像实例上的部分函数一样工作:

>>> join_with_colons('abcde')
'a:b:c:d:e'
>>> join_with_colons(['FF', 'FF', 'FF', 'FF', 'FF', 'FF'])
'FF:FF:FF:FF:FF:FF'

静态方法

静态方法并没有把实例作为参数。

它与模块级功能非常相似。

但是,模块级功能必须存在于模块中,并且必须专门导入到其他使用该功能的地方。

但是,如果将其附加到对象上,它也将通过导入和继承方便地跟随对象。

静态方法的一个示例是str.maketransstringPython 3 的模块中移出的。它使转换表适合由占用str.translate。从字符串的实例使用时,看起来确实很愚蠢,如下所示,但是从string模块导入函数相当笨拙,并且能够从类中调用它很好,例如str.maketrans

# demonstrate same function whether called from instance or not:
>>> ', '.maketrans('ABC', 'abc')
{65: 97, 66: 98, 67: 99}
>>> str.maketrans('ABC', 'abc')
{65: 97, 66: 98, 67: 99}

在python 2中,您必须从越来越少用的字符串模块中导入此函数:

>>> import string
>>> 'ABCDEFG'.translate(string.maketrans('ABC', 'abc'))
'abcDEFG'

类方法

类方法与实例方法类似,因为它采用了隐式的第一个参数,但是它采用了类,而不是采用实例。通常,它们被用作替代构造函数以更好地使用语义,并且它将支持继承。

内建类方法的最典型示例是dict.fromkeys。它用作dict的替代构造函数(非常适合当您知道键是什么并且想要它们的默认值时)。

>>> dict.fromkeys(['a', 'b', 'c'])
{'c': None, 'b': None, 'a': None}

当我们对dict进行子类化时,可以使用相同的构造函数,该构造函数创建子类的实例。

>>> class MyDict(dict): 'A dict subclass, use to demo classmethods'
>>> md = MyDict.fromkeys(['a', 'b', 'c'])
>>> md
{'a': None, 'c': None, 'b': None}
>>> type(md)
<class '__main__.MyDict'>

看到熊猫的源代码的替代构造其它类似的例子,同时也看到了官方的Python文档classmethodstaticmethod

What is the difference between @staticmethod and @classmethod in Python?

You may have seen Python code like this pseudocode, which demonstrates the signatures of the various method types and provides a docstring to explain each:

class Foo(object):

    def a_normal_instance_method(self, arg_1, kwarg_2=None):
        '''
        Return a value that is a function of the instance with its
        attributes, and other arguments such as arg_1 and kwarg2
        '''

    @staticmethod
    def a_static_method(arg_0):
        '''
        Return a value that is a function of arg_0. It does not know the 
        instance or class it is called from.
        '''

    @classmethod
    def a_class_method(cls, arg1):
        '''
        Return a value that is a function of the class and other arguments.
        respects subclassing, it is called with the class it is called from.
        '''

The Normal Instance Method

First I’ll explain a_normal_instance_method. This is precisely called an “instance method”. When an instance method is used, it behaves like a partially applied function: the first argument is predefined as the instance of the object, with all of its given attributes. It has the instance of the object bound to it, and it must be called from an instance of the object. Typically, it will access various attributes of the instance.

For example, this is an instance of a string:

', '

If we use the instance method join on this string to join another iterable, the result is quite obviously a function of the instance, in addition to being a function of the iterable list, ['a', 'b', 'c']:

>>> ', '.join(['a', 'b', 'c'])
'a, b, c'

Bound methods

Instance methods can be bound via a dotted lookup for use later.

For example, this binds the str.join method to the ':' instance:

>>> join_with_colons = ':'.join 

And later we can use this as a function that already has the first argument bound to it. In this way, it works like a partial function on the instance:

>>> join_with_colons('abcde')
'a:b:c:d:e'
>>> join_with_colons(['FF', 'FF', 'FF', 'FF', 'FF', 'FF'])
'FF:FF:FF:FF:FF:FF'
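
To make the "partial function" comparison concrete, here is a sketch that builds the same callable with functools.partial. Using partial is an illustration added here, not something the original answer does.

```python
from functools import partial

join_with_colons = ':'.join                 # bound method
join_via_partial = partial(str.join, ':')   # same idea, spelled out by hand

parts = ['FF', 'FF', 'FF']
print(join_with_colons(parts))
print(join_via_partial(parts))
```

Both callables have ':' fixed as the first argument of str.join; the dotted lookup simply does the binding for you.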

Static Method

The static method does not take the instance as an argument.

It is very similar to a module level function.

However, a module level function must live in the module and be specially imported to other places where it is used.

If it is attached to the object, however, it will follow the object conveniently through importing and inheritance as well.

An example of a static method is str.maketrans, moved from the string module in Python 3. It makes a translation table suitable for consumption by str.translate. It does seem rather silly when used from an instance of a string, as demonstrated below, but importing the function from the string module is rather clumsy, and it’s nice to be able to call it from the class, as in str.maketrans

# demonstrate same function whether called from instance or not:
>>> ', '.maketrans('ABC', 'abc')
{65: 97, 66: 98, 67: 99}
>>> str.maketrans('ABC', 'abc')
{65: 97, 66: 98, 67: 99}

In python 2, you have to import this function from the increasingly less useful string module:

>>> import string
>>> 'ABCDEFG'.translate(string.maketrans('ABC', 'abc'))
'abcDEFG'

Class Method

A class method is similar to an instance method in that it takes an implicit first argument, but instead of taking the instance, it takes the class. Frequently these are used as alternative constructors for better semantic usage, and they support inheritance.

The most canonical example of a builtin classmethod is dict.fromkeys. It is used as an alternative constructor of dict (well suited for when you know what your keys are and want a default value for them).

>>> dict.fromkeys(['a', 'b', 'c'])
{'c': None, 'b': None, 'a': None}

When we subclass dict, we can use the same constructor, which creates an instance of the subclass.

>>> class MyDict(dict): 'A dict subclass, use to demo classmethods'
>>> md = MyDict.fromkeys(['a', 'b', 'c'])
>>> md
{'a': None, 'c': None, 'b': None}
>>> type(md)
<class '__main__.MyDict'>

See the pandas source code for other similar examples of alternative constructors, and see also the official Python documentation on classmethod and staticmethod.


回答 7

我开始使用C ++,Java和Python学习编程语言,所以这个问题也困扰着我,直到我理解了每种语言的简单用法。

类方法:与Java和C ++不同,Python没有构造函数重载。因此,可以使用实现此目的classmethod。以下示例将对此进行解释

让我们考虑,我们有一个Person类,它有两个参数first_name,并last_name与创建的实例Person

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

现在,如果您只需要使用一个名称创建一个类(仅使用一个名称)first_name,那么您将无法在Python中执行类似的操作。

当您尝试创建对象(实例)时,这将给您一个错误。

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __init__(self, first_name):
        self.first_name = first_name

但是,您可以使用@classmethod以下方法实现相同的目的

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @classmethod
    def get_person(cls, first_name):
        return cls(first_name, "")

静态方法:这很简单,它不受实例或类的约束,您可以使用类名简单地调用它。

因此,假设在上面的示例中,您需要一个first_name不超过20个字符的验证,您只需执行此操作即可。

@staticmethod  
def validate_name(name):
    return len(name) <= 20

你可以简单地使用 class name

Person.validate_name("Gaurang Shah")

I started learning programming language with C++ and then Java and then Python and so this question bothered me a lot as well, until I understood the simple usage of each.

Class Method: Python, unlike Java and C++, doesn’t have constructor overloading, so to achieve this you can use a classmethod. The following example explains this.

Let’s consider a Person class which takes two arguments, first_name and last_name, and creates an instance of Person.

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

Now, if a requirement comes up where you need to create an instance using a single name only, just a first_name, you can’t do something like this in Python.

The second __init__ silently replaces the first, so this will give you an error when you try to create an object (instance) with both names.

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __init__(self, first_name):
        self.first_name = first_name

However, you could achieve the same thing using @classmethod as mentioned below

class Person(object):

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @classmethod
    def get_person(cls, first_name):
        return cls(first_name, "")

Static Method: This is rather simple: it’s not bound to an instance or class, and you can simply call it using the class name.

So let’s say in above example you need a validation that first_name should not exceed 20 characters, you can simply do this.

@staticmethod  
def validate_name(name):
    return len(name) <= 20

and you can simply call it using the class name:

Person.validate_name("Gaurang Shah")
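
Putting the two pieces together, a self-contained sketch of the Person class with both methods in place might look like this (the sample names passed in are only for illustration):

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @classmethod
    def get_person(cls, first_name):
        # Alternative constructor: fills in an empty last name.
        return cls(first_name, "")

    @staticmethod
    def validate_name(name):
        # Needs neither the class nor an instance.
        return len(name) <= 20


p = Person.get_person("Gaurang")
print(p.first_name, repr(p.last_name), Person.validate_name("Gaurang Shah"))
```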

回答 8

我认为一个更好的问题是“何时使用@classmethod与@staticmethod?”

@classmethod允许您轻松访问与类定义关联的私有成员。这是完成单例或工厂类的一种好方法,该类控制已创建对象的实例数量。

@staticmethod可以提供少量的性能提升,但是我还没有看到在类中有效地使用静态方法,而该方法不能作为类外的独立函数来实现。

I think a better question is “When would you use @classmethod vs @staticmethod?”

@classmethod allows you easy access to private members that are associated with the class definition. This is a great way to do singletons, or factory classes that control how many instances of the created objects exist.

@staticmethod provides marginal performance gains, but I have yet to see a productive use of a static method within a class that couldn’t be achieved as a standalone function outside the class.
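
As a rough sketch of the singleton use mentioned above, assuming a hypothetical class Config with a hypothetical accessor named instance (one of several ways to write a singleton in Python, not the only one):

```python
class Config:
    _instance = None  # class-level slot shared by all callers

    @classmethod
    def instance(cls):
        # Create the object on first use, then keep handing back the same one.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


first = Config.instance()
second = Config.instance()
print(first is second)
```

The classmethod reaches the private class attribute through cls, which is the "easy access" the answer refers to.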


回答 9

@decorators是在python 2.4中添加的。如果您使用的是python <2.4,则可以使用classmethod()和staticmethod()函数。

例如,如果您想创建一个工厂方法(一个函数根据得到的参数返回一个类的不同实现的实例),您可以执行以下操作:

class Cluster(object):

    def _is_cluster_for(cls, name):
        """
        see if this class is the cluster with this name
        this is a classmethod
        """ 
        return cls.__name__ == name
    _is_cluster_for = classmethod(_is_cluster_for)

    #static method
    def getCluster(name):
        """
        static factory method, should be in Cluster class
        returns a cluster object for the given name
        """
        for cls in Cluster.__subclasses__():
            if cls._is_cluster_for(name):
                return cls()
    getCluster = staticmethod(getCluster)

还要注意,这是使用类方法和静态方法的一个很好的例子。静态方法显然属于该类,因为它在内部使用类Cluster。类方法仅需要有关类的信息,而无需对象的实例。

_is_cluster_for方法设为类方法的另一个好处是,子类可以决定更改其实现,这可能是因为它非常通用并且可以处理多种类型的集群,因此仅检查类的名称是不够的。

@decorators were added in Python 2.4. If you’re using Python < 2.4 you can use the classmethod() and staticmethod() functions directly.

For example, if you want to create a factory method (A function returning an instance of a different implementation of a class depending on what argument it gets) you can do something like:

class Cluster(object):

    def _is_cluster_for(cls, name):
        """
        see if this class is the cluster with this name
        this is a classmethod
        """ 
        return cls.__name__ == name
    _is_cluster_for = classmethod(_is_cluster_for)

    #static method
    def getCluster(name):
        """
        static factory method, should be in Cluster class
        returns a cluster object for the given name
        """
        for cls in Cluster.__subclasses__():
            if cls._is_cluster_for(name):
                return cls()
    getCluster = staticmethod(getCluster)

Also observe that this is a good example of using a classmethod and a static method. The static method clearly belongs to the class, since it uses the class Cluster internally. The classmethod only needs information about the class, and no instance of the object.

Another benefit of making the _is_cluster_for method a classmethod is that a subclass can decide to change its implementation, maybe because it is pretty generic and can handle more than one type of cluster, so just checking the name of the class would not be enough.
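
For readers on a current Python, the same factory can be sketched with decorator syntax. The subclasses RedisCluster and AnyKafkaCluster, the snake_case name get_cluster and the explicit `return None` are assumptions added for this demo.

```python
class Cluster:
    @classmethod
    def _is_cluster_for(cls, name):
        # Subclasses may override this with a more flexible check.
        return cls.__name__ == name

    @staticmethod
    def get_cluster(name):
        # Walk the direct subclasses and instantiate the first match.
        for cls in Cluster.__subclasses__():
            if cls._is_cluster_for(name):
                return cls()
        return None


class RedisCluster(Cluster):  # hypothetical subclass for the demo
    pass


class AnyKafkaCluster(Cluster):  # hypothetical subclass with a looser match
    @classmethod
    def _is_cluster_for(cls, name):
        return name.startswith("Kafka")


print(type(Cluster.get_cluster("RedisCluster")).__name__)
print(type(Cluster.get_cluster("KafkaEast")).__name__)
```

The second subclass shows the override mentioned above: it accepts any name starting with "Kafka" rather than comparing against its own class name.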


回答 10

静态方法:

  • 没有自变量的简单函数。
  • 处理类属性;不在实例属性上。
  • 可以通过类和实例调用。
  • 内置函数staticmethod()用于创建它们。

静态方法的好处:

  • 它在类范围内本地化函数名称
  • 它将功能代码移近使用位置
  • 与模块级函数相比,导入更方便,因为不必专门导入每种方法

    @staticmethod
    def some_static_method(*args, **kwds):
        pass

类方法:

  • 具有第一个参数作为类名的函数。
  • 可以通过类和实例调用。
  • 这些是使用classmethod内置函数创建的。

     @classmethod
     def some_class_method(cls, *args, **kwds):
         pass

Static Methods:

  • Simple functions with no self argument.
  • Receive neither the class nor the instance implicitly; class attributes are reachable only through the class name, and instance attributes not at all.
  • Can be called through both class and instance.
  • The built-in function staticmethod() is used to create them.

Benefits of Static Methods:

  • It localizes the function name in the class scope
  • It moves the function code closer to where it is used
  • More convenient to import versus module-level functions since each method does not have to be specially imported

    @staticmethod
    def some_static_method(*args, **kwds):
        pass
    

Class Methods:

  • Functions that receive the class as their first argument.
  • Can be called through both class and instance.
  • These are created with the built-in classmethod function.

     @classmethod
     def some_class_method(cls, *args, **kwds):
         pass
    

回答 11

@staticmethod只是禁用默认函数作为方法描述符。classmethod将函数包装在可调用的容器中,该容器将对拥有类的引用作为第一个参数传递:

>>> class C(object):
...  pass
... 
>>> def f():
...  pass
... 
>>> staticmethod(f).__get__(None, C)
<function f at 0x5c1cf0>
>>> classmethod(f).__get__(None, C)
<bound method type.f of <class '__main__.C'>>

实际上,classmethod它具有运行时开销,但可以访问拥有的类。另外,我建议使用元类并将类方法放在该元类上:

>>> class CMeta(type):
...  def foo(cls):
...   print cls
... 
>>> class C(object):
...  __metaclass__ = CMeta
... 
>>> C.foo()
<class '__main__.C'>

@staticmethod just disables the default function as method descriptor. classmethod wraps your function in a container callable that passes a reference to the owning class as first argument:

>>> class C(object):
...  pass
... 
>>> def f():
...  pass
... 
>>> staticmethod(f).__get__(None, C)
<function f at 0x5c1cf0>
>>> classmethod(f).__get__(None, C)
<bound method type.f of <class '__main__.C'>>

As a matter of fact, classmethod has a runtime overhead but makes it possible to access the owning class. Alternatively I recommend using a metaclass and putting the class methods on that metaclass:

>>> class CMeta(type):
...  def foo(cls):
...   print cls
... 
>>> class C(object):
...  __metaclass__ = CMeta
... 
>>> C.foo()
<class '__main__.C'>
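
The transcript above uses Python 2 syntax (the print statement and `__metaclass__`). A Python 3 sketch of the same two ideas follows; foo returns the class instead of printing it so that the result can be checked, which is a small change made for this illustration.

```python
class CMeta(type):
    def foo(cls):
        # Defined on the metaclass, so it is a method of the class object.
        return cls


class C(metaclass=CMeta):
    pass


def f():
    pass


static_lookup = staticmethod(f).__get__(None, C)  # plain function comes back
class_lookup = classmethod(f).__get__(None, C)    # method bound to C

print(static_lookup is f, class_lookup.__self__ is C, C.foo() is C)

# Unlike a classmethod, a metaclass method is not reachable from instances.
try:
    C().foo()
except AttributeError:
    print("instances do not see metaclass methods")
```

That last difference is worth weighing before following the metaclass suggestion: a classmethod can be called on instances, a metaclass method cannot.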

回答 12

关于如何在Python中使用静态,类或抽象方法的权威指南是该主题的一个很好的链接,并总结如下。

@staticmethod函数不过是在类内部定义的函数。可调用而无需先实例化该类。它的定义通过继承是不可变的。

  • Python不必实例化对象的绑定方法。
  • 它简化了代码的可读性,并且不依赖于对象本身的状态。

@classmethod函数也可以在不实例化该类的情况下调用,但是其定义遵循子类,而不是父类,通过继承可以被子类覆盖。这是因为@classmethodfunction 的第一个参数必须始终为cls(类)。

  • 工厂方法,用于使用例如某种预处理为类创建实例。
  • 静态方法调用静态方法:如果将静态方法拆分为多个静态方法,则不应硬编码类名,而应使用类方法

The definitive guide on how to use static, class or abstract methods in Python is one good link for this topic; it is summarized as follows.

A @staticmethod function is nothing more than a function defined inside a class. It is callable without instantiating the class first. Its definition is immutable via inheritance.

  • Python does not have to instantiate a bound method for the object.
  • It eases the readability of the code, and it does not depend on the state of the object itself.

A @classmethod function is also callable without instantiating the class, but its definition follows the subclass, not the parent class, via inheritance, and it can be overridden by a subclass. That’s because the first argument of a @classmethod function must always be cls (the class).

  • Factory methods, which are used to create an instance of a class using, for example, some sort of pre-processing.
  • Static methods calling static methods: if you split a static method into several static methods, you shouldn’t hard-code the class name but use class methods instead.
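
The last point can be sketched as follows. Parser, LowercaseParser, parse and _strip are hypothetical names used only to illustrate the idea.

```python
class Parser:
    @staticmethod
    def _strip(text):
        return text.strip()

    @classmethod
    def parse(cls, text):
        # cls._strip resolves on the actual class, so overrides are honoured
        # and the name "Parser" is never hard-coded.
        return cls._strip(text).split(",")


class LowercaseParser(Parser):
    @staticmethod
    def _strip(text):
        return text.strip().lower()


print(Parser.parse("  A,B "))
print(LowercaseParser.parse("  A,B "))
```

Had parse been a staticmethod calling Parser._strip, the subclass override would have been ignored.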

回答 13

只有第一个参数不同

  • 普通方法:当前对象(如果自动作为(附加)第一个参数传递)
  • classmethod:当前对象的类作为(附加)fist参数自动传递
  • 静态方法:不会自动传递其他参数。您传递给该函数的就是所得到的。

更详细地…

普通方法

调用对象的方法时,它会自动获得一个额外的参数self作为其第一个参数。即方法

def f(self, x, y)

必须使用2个参数调用。self是自动传递的,它是对象本身

类方法

装饰方法时

@classmethod
def f(cls, x, y)

自动提供的参数不是 self,而是的类 self

静态方法

装饰方法时

@staticmethod
def f(x, y)

该方法根本没有任何自动参数。仅提供调用它的参数。

用法

  • classmethod 主要用于替代构造函数。
  • staticmethod不使用对象的状态。它可能是类外部的函数。它仅放在类中以将具有相似功能的功能分组(例如,类似于Java的Math类静态方法)

from math import atan2, cos, pi, sin


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def frompolar(cls, radius, angle):
        """The `cls` argument is the `Point` class itself"""
        return cls(radius * cos(angle), radius * sin(angle))

    @staticmethod
    def angle(x, y):
        """This could be outside the class, but we put it here
        just because we think it is logically related to the class."""
        return atan2(y, x)


p1 = Point(3, 2)
p2 = Point.frompolar(3, pi/4)

angle = Point.angle(3, 2)

Only the first argument differs:

  • normal method: the current object is automatically passed as an (additional) first argument
  • classmethod: the class of the current object is automatically passed as an (additional) first argument
  • staticmethod: no extra arguments are automatically passed. What you passed to the function is what you get.

In more detail…

normal method

When an object’s method is called, it is automatically given an extra argument self as its first argument. That is, method

def f(self, x, y)

must be called with 2 arguments. self is automatically passed, and it is the object itself.

class method

When the method is decorated

@classmethod
def f(cls, x, y)

the automatically provided argument is not self, but the class of self.

static method

When the method is decorated

@staticmethod
def f(x, y)

the method is not given any automatic argument at all. It is only given the parameters that it is called with.

usages

  • classmethod is mostly used for alternative constructors.
  • staticmethod does not use the state of the object. It could be a function external to the class. It is only put inside the class to group functions with similar functionality (for example, like Java’s Math class static methods).

from math import atan2, cos, pi, sin


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def frompolar(cls, radius, angle):
        """The `cls` argument is the `Point` class itself"""
        return cls(radius * cos(angle), radius * sin(angle))

    @staticmethod
    def angle(x, y):
        """This could be outside the class, but we put it here
        just because we think it is logically related to the class."""
        return atan2(y, x)


p1 = Point(3, 2)
p2 = Point.frompolar(3, pi/4)

angle = Point.angle(3, 2)


回答 14

让我先说一下用@classmethod装饰的方法与@staticmethod装饰的方法之间的相似性。

相似:两者都可以在本身上调用,而不仅仅是类的实例。因此,从某种意义上来说,它们都是Class的方法

区别:类方法将接收类本身作为第一个参数,而静态方法则不接收。

因此,从某种意义上说,静态方法并不绑定于Class本身,而只是因为它可能具有相关的功能而挂在这里。

>>> class Klaus:
        @classmethod
        def classmthd(*args):
            return args

        @staticmethod
        def staticmthd(*args):
            return args

# 1. Call classmethod without any arg
>>> Klaus.classmthd()  
(__main__.Klaus,)  # the class gets passed as the first argument

# 2. Call classmethod with 1 arg
>>> Klaus.classmthd('chumma')
(__main__.Klaus, 'chumma')

# 3. Call staticmethod without any arg
>>> Klaus.staticmthd()  
()

# 4. Call staticmethod with 1 arg
>>> Klaus.staticmthd('chumma')
('chumma',)

Let me first describe the similarity between a method decorated with @classmethod and one decorated with @staticmethod.

Similarity: Both of them can be called on the Class itself, rather than just on an instance of the class. So, both of them in a sense are the Class’s methods.

Difference: A classmethod will receive the class itself as the first argument, while a staticmethod does not.

So a static method is, in a sense, not bound to the Class itself and just lives there because it may have related functionality.

>>> class Klaus:
        @classmethod
        def classmthd(*args):
            return args

        @staticmethod
        def staticmthd(*args):
            return args

# 1. Call classmethod without any arg
>>> Klaus.classmthd()  
(__main__.Klaus,)  # the class gets passed as the first argument

# 2. Call classmethod with 1 arg
>>> Klaus.classmthd('chumma')
(__main__.Klaus, 'chumma')

# 3. Call staticmethod without any arg
>>> Klaus.staticmthd()  
()

# 4. Call staticmethod with 1 arg
>>> Klaus.staticmthd('chumma')
('chumma',)

回答 15

关于静态方法与类方法的另一个考虑是继承。假设您有以下类:

class Foo(object):
    @staticmethod
    def bar():
        return "In Foo"

然后,您想覆盖bar()一个子类:

class Foo2(Foo):
    @staticmethod
    def bar():
        return "In Foo2"

这是可行的,但是请注意,现在bar()子类(Foo2)中的实现不再可以利用该类的任何特定优势。例如,假设Foo2有一个magic()要在Foo2实现中使用的名为的方法bar()

class Foo2(Foo):
    @staticmethod
    def bar():
        return "In Foo2"
    @staticmethod
    def magic():
        return "Something useful you'd like to use in bar, but now can't" 

这里的解决办法是打电话Foo2.magic()bar(),但此时你重复自己(如果名称Foo2的改变,你必须记住要更新bar()方法)。

对我来说,这有点违反开放式/封闭式原则,因为做出的决定Foo会影响您在派生类中重构通用代码的能力(即扩展性较小)。如果bar()是a,classmethod我们会没事的:

class Foo(object):
    @classmethod
    def bar(cls):
        return "In Foo"

class Foo2(Foo):
    @classmethod
    def bar(cls):
        return "In Foo2 " + cls.magic()
    @classmethod
    def magic(cls):
        return "MAGIC"

print(Foo2().bar())

给出: In Foo2 MAGIC

Another consideration with respect to staticmethod vs classmethod comes up with inheritance. Say you have the following class:

class Foo(object):
    @staticmethod
    def bar():
        return "In Foo"

And you then want to override bar() in a child class:

class Foo2(Foo):
    @staticmethod
    def bar():
        return "In Foo2"

This works, but note that now the bar() implementation in the child class (Foo2) can no longer take advantage of anything specific to that class. For example, say Foo2 had a method called magic() that you want to use in the Foo2 implementation of bar():

class Foo2(Foo):
    @staticmethod
    def bar():
        return "In Foo2"
    @staticmethod
    def magic():
        return "Something useful you'd like to use in bar, but now can't" 

The workaround here would be to call Foo2.magic() in bar(), but then you’re repeating yourself (if the name of Foo2 changes, you’ll have to remember to update that bar() method).

To me, this is a slight violation of the open/closed principle, since a decision made in Foo is impacting your ability to refactor common code in a derived class (ie it’s less open to extension). If bar() were a classmethod we’d be fine:

class Foo(object):
    @classmethod
    def bar(cls):
        return "In Foo"

class Foo2(Foo):
    @classmethod
    def bar(cls):
        return "In Foo2 " + cls.magic()
    @classmethod
    def magic(cls):
        return "MAGIC"

print(Foo2().bar())

Gives: In Foo2 MAGIC


回答 16

我将尝试通过一个示例来说明基本区别。

class A(object):
    x = 0

    def say_hi(self):
        pass

    @staticmethod
    def say_hi_static():
        pass

    @classmethod
    def say_hi_class(cls):
        pass

    def run_self(self):
        self.x += 1
        print(self.x)  # outputs 1
        self.say_hi()
        self.say_hi_static()
        self.say_hi_class()

    @staticmethod
    def run_static():
        print(A.x)  # outputs 0
        # A.say_hi()  # wrong: say_hi needs an instance
        A.say_hi_static()
        A.say_hi_class()

    @classmethod
    def run_class(cls):
        print(cls.x)  # outputs 0
        # cls.say_hi()  # wrong: say_hi needs an instance
        cls.say_hi_static()
        cls.say_hi_class()

1-我们可以直接调用静态方法和类方法而无需初始化

# A.run_self() #  wrong
A.run_static()
A.run_class()

2-静态方法不能调用self方法,但可以调用其他static和classmethod

3-静态方法属于类,根本不会使用对象。

4-类方法不绑定到对象而是绑定到类。

I will try to explain the basic difference using an example.

class A(object):
    x = 0

    def say_hi(self):
        pass

    @staticmethod
    def say_hi_static():
        pass

    @classmethod
    def say_hi_class(cls):
        pass

    def run_self(self):
        self.x += 1
        print(self.x)  # outputs 1
        self.say_hi()
        self.say_hi_static()
        self.say_hi_class()

    @staticmethod
    def run_static():
        print(A.x)  # outputs 0
        # A.say_hi()  # wrong: say_hi needs an instance
        A.say_hi_static()
        A.say_hi_class()

    @classmethod
    def run_class(cls):
        print(cls.x)  # outputs 0
        # cls.say_hi()  # wrong: say_hi needs an instance
        cls.say_hi_static()
        cls.say_hi_class()

1 - We can call static methods and classmethods directly on the class, without creating an instance

# A.run_self() #  wrong
A.run_static()
A.run_class()

2 - A static method cannot call instance (self) methods, but it can call other static methods and classmethods.

3 - A static method belongs to the class and does not use the object at all.

4 - A class method is bound not to an object but to a class.


回答 17

@classmethod:可用于创建对该类创建的所有实例的共享全局访问……例如由多个用户更新记录….我特别发现创建单例时它也很有效..: )

@static方法:与与…相关联的类或实例无关,但出于可读性考虑,可以使用static方法

@classmethod: can be used to create shared global access for all the instances created of that class, like updating a record by multiple users. I particularly found it useful when creating singletons as well. :)

@staticmethod: has nothing to do with the class or instance it is associated with, but can be used for readability.


回答 18

您可能需要考虑以下两者之间的区别:

class A:
    def foo():  # no self parameter, no decorator
        pass

class B:
    @staticmethod
    def foo():  # no self parameter
        pass

这在python2和python3之间发生了变化:

python2:

>>> A.foo()
TypeError
>>> A().foo()
TypeError
>>> B.foo()
>>> B().foo()

python3:

>>> A.foo()
>>> A().foo()
TypeError
>>> B.foo()
>>> B().foo()

因此@staticmethod,仅在类中直接使用 for方法已成为python3中的可选方法。如果要从类和实例中调用它们,则仍需要使用@staticmethod装饰器。

unutbus的答案很好地涵盖了其他情况。

You might want to consider the difference between:

class A:
    def foo():  # no self parameter, no decorator
        pass

and

class B:
    @staticmethod
    def foo():  # no self parameter
        pass

This has changed between python2 and python3:

python2:

>>> A.foo()
TypeError
>>> A().foo()
TypeError
>>> B.foo()
>>> B().foo()

python3:

>>> A.foo()
>>> A().foo()
TypeError
>>> B.foo()
>>> B().foo()

So using @staticmethod for methods only called directly from the class has become optional in python3. If you want to call them from both class and instance, you still need to use the @staticmethod decorator.

The other cases have been well covered by unutbu's answer.
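
The Python 3 column of the comparison can be checked with a short script. This is a sketch; the return values and the `results` dictionary are added here only so the outcome can be inspected.

```python
class A:
    def foo():  # no self parameter, no decorator
        return "A.foo"


class B:
    @staticmethod
    def foo():  # no self parameter
        return "B.foo"


results = {}
results["A.foo()"] = A.foo()  # in Python 3 this is a plain function lookup

try:
    A().foo()  # the instance is passed as an unexpected first argument
except TypeError as exc:
    results["A().foo()"] = type(exc).__name__

results["B.foo()"] = B.foo()
results["B().foo()"] = B().foo()
```

Only `A().foo()` fails, which matches the Python 3 listing above.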


Answer 19

我的贡献演示之间的差异@classmethod@staticmethod以及实例方法,包括如何实例可以间接调用@staticmethod。但是@staticmethod与其从实例中间接调用a ,不如将其设为私有可能更像是“ pythonic”。这里没有演示从私有方法获取某些东西,但是基本上是相同的概念。

#!python3

from os import system
system('cls')
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

class DemoClass(object):
    # instance methods need a class instance and
    # can access the instance through 'self'
    def instance_method_1(self):
        return 'called from inside the instance_method_1()'

    def instance_method_2(self):
        # an instance outside the class indirectly calls the static_method
        return self.static_method() + ' via instance_method_2()'

    # class methods don't need a class instance, they can't access the
    # instance (self) but they have access to the class itself via 'cls'
    @classmethod
    def class_method(cls):
        return 'called from inside the class_method()'

    # static methods don't have access to 'cls' or 'self', they work like
    # regular functions but belong to the class' namespace
    @staticmethod
    def static_method():
        return 'called from inside the static_method()'
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

# works even if the class hasn't been instantiated
print(DemoClass.class_method() + '\n')
''' called from inside the class_method() '''

# works even if the class hasn't been instantiated
print(DemoClass.static_method() + '\n')
''' called from inside the static_method() '''
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

# >>>>> all methods types can be called on a class instance <<<<<
# instantiate the class
democlassObj = DemoClass()

# call instance_method_1()
print(democlassObj.instance_method_1() + '\n')
''' called from inside the instance_method_1() '''

# # indirectly call static_method through instance_method_2(), there's really no use
# for this since a @staticmethod can be called whether the class has been
# instantiated or not
print(democlassObj.instance_method_2() + '\n')
''' called from inside the static_method() via instance_method_2() '''

# call class_method()
print(democlassObj.class_method() + '\n')
'''  called from inside the class_method() '''

# call static_method()
print(democlassObj.static_method())
''' called from inside the static_method() '''

"""
# whether the class is instantiated or not, this doesn't work
print(DemoClass.instance_method_1() + '\n')
'''
TypeError: instance_method_1() missing 1 required positional argument: 'self'
(Python 3 wording, which varies slightly by version; Python 2 reports an unbound-method TypeError instead)
'''
"""

My contribution demonstrates the difference amongst @classmethod, @staticmethod, and instance methods, including how an instance can indirectly call a @staticmethod. But instead of indirectly calling a @staticmethod from an instance, making it private may be more “pythonic.” Getting something from a private method isn’t demonstrated here but it’s basically the same concept.

#!python3

from os import system
system('cls')
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

class DemoClass(object):
    # instance methods need a class instance and
    # can access the instance through 'self'
    def instance_method_1(self):
        return 'called from inside the instance_method_1()'

    def instance_method_2(self):
        # an instance outside the class indirectly calls the static_method
        return self.static_method() + ' via instance_method_2()'

    # class methods don't need a class instance, they can't access the
    # instance (self) but they have access to the class itself via 'cls'
    @classmethod
    def class_method(cls):
        return 'called from inside the class_method()'

    # static methods don't have access to 'cls' or 'self', they work like
    # regular functions but belong to the class' namespace
    @staticmethod
    def static_method():
        return 'called from inside the static_method()'
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

# works even if the class hasn't been instantiated
print(DemoClass.class_method() + '\n')
''' called from inside the class_method() '''

# works even if the class hasn't been instantiated
print(DemoClass.static_method() + '\n')
''' called from inside the static_method() '''
# %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %   %

# >>>>> all methods types can be called on a class instance <<<<<
# instantiate the class
democlassObj = DemoClass()

# call instance_method_1()
print(democlassObj.instance_method_1() + '\n')
''' called from inside the instance_method_1() '''

# # indirectly call static_method through instance_method_2(), there's really no use
# for this since a @staticmethod can be called whether the class has been
# instantiated or not
print(democlassObj.instance_method_2() + '\n')
''' called from inside the static_method() via instance_method_2() '''

# call class_method()
print(democlassObj.class_method() + '\n')
'''  called from inside the class_method() '''

# call static_method()
print(democlassObj.static_method())
''' called from inside the static_method() '''

"""
# whether the class is instantiated or not, this doesn't work
print(DemoClass.instance_method_1() + '\n')
'''
TypeError: instance_method_1() missing 1 required positional argument: 'self'
(Python 3 wording, which varies slightly by version; Python 2 reports an unbound-method TypeError instead)
'''
"""

Answer 20

类方法将类作为隐式第一个参数接收,就像实例方法接收实例一样。它是绑定到类而不是类对象的方法,因为它使用指向类而不是对象实例的类参数,所以可以访问类的状态。它可以修改适用于该类所有实例的类状态。例如,它可以修改将适用于所有实例的类变量。

另一方面,与类方法或实例方法相比,静态方法不接收隐式的第一个参数。并且无法访问或修改类状态。它仅属于该类,因为从设计的角度来看这是正确的方法。但是就功能而言,在运行时未绑定到该类。

作为准则,请将静态方法用作实用程序,将类方法用作例如factory。或定义一个单例。并使用实例方法对实例的状态和行为进行建模。

希望我很清楚!

A class method receives the class as an implicit first argument, just as an instance method receives the instance. It is bound to the class and not to an object of the class. It has access to the state of the class because it takes a class parameter that points to the class and not to the object instance. It can modify class state that applies across all instances of the class; for example, it can modify a class variable shared by all instances.

On the other hand, a static method does not receive an implicit first argument, unlike class methods and instance methods, so it cannot access or modify class state through such an argument. It belongs to the class only because, from a design point of view, that is where it fits. In terms of functionality, it is not bound to the class at runtime.

As a guideline, use static methods as utilities, and use class methods as factories, for example, or perhaps to define a singleton. Use instance methods to model the state and behavior of instances.

Hope I was clear !
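
A minimal sketch of that guideline: a class method as a factory, a static method as a utility, and an instance method for per-object state. The `Temperature` and `LabTemperature` classes are hypothetical and exist only for this illustration.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # instance state

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Factory: cls is the class the call was made on, so a subclass
        # gets an instance of its own type.
        return cls(cls.f_to_c(fahrenheit))

    @staticmethod
    def f_to_c(fahrenheit):
        # Utility: depends on neither class state nor instance state.
        return (fahrenheit - 32) * 5 / 9

    def is_freezing(self):
        # Instance method: models behavior based on this object's state.
        return self.celsius <= 0


class LabTemperature(Temperature):
    pass


t = LabTemperature.from_fahrenheit(32)
```

Because the factory uses `cls` rather than naming `Temperature` directly, `LabTemperature.from_fahrenheit(...)` returns a `LabTemperature`.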


Answer 21

顾名思义,类方法用于更改类而不是对象。为了更改类,他们将修改类属性(而不是对象属性),因为这是更新类的方式。这就是类方法将类(通常用“ cls”表示)作为第一个参数的原因。

class A(object):
    m=54

    @classmethod
    def class_method(cls):
        print "m is %d" % cls.m

另一方面,静态方法用于执行未绑定到类的功能,即它们不会读取或写入类变量。因此,静态方法不将类作为参数。使用它们是为了使类执行与该类目的不直接相关的功能。

class X(object):
    m=54 #will not be referenced

    @staticmethod
    def static_method():
        print "Referencing/calling a variable or function outside this class. E.g. Some global variable/function."

Class methods, as the name suggests, are used to make changes to classes and not to objects. To make changes to a class, they modify class attributes (not object attributes), since that is how you update a class. This is why class methods take the class (conventionally denoted by ‘cls’) as the first argument.

class A(object):
    m=54

    @classmethod
    def class_method(cls):
        print "m is %d" % cls.m

Static methods, on the other hand, perform functions that are not bound to the class, i.e. they do not read or write class variables. Hence, static methods do not take the class as an argument. They are used so that a class can offer functionality that is not directly related to the purpose of the class.

class X(object):
    m=54 #will not be referenced

    @staticmethod
    def static_method():
        print "Referencing/calling a variable or function outside this class. E.g. Some global variable/function."

Answer 22

从字面上分析@staticmethod可以提供不同的见解。

类的常规方法是隐式动态方法,该方法将实例作为第一个参数。
相反,静态方法不将实例作为第一个参数,因此称为“静态”

静态方法确实是一种正常的功能,与类定义之外的功能相同。
幸运的是,将它分组在类中只是为了靠近它的应用位置,或者您可以滚动查找它。

Analyzing @staticmethod literally provides a different insight.

A normal method of a class is implicitly dynamic: it takes the instance as its first argument.
In contrast, a staticmethod does not take the instance as its first argument, hence the name ‘static’.

A staticmethod is really just a normal function, the same as one defined outside a class.
It is grouped into the class only so that it sits close to where it is used; otherwise you might have to scroll around to find it.


Answer 23

我认为给出一个纯Python版本的staticmethodclassmethod将有助于在语言级别上理解它们之间的区别。

它们都是非数据描述符(如果您先熟悉描述符,会更容易理解它们)。

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f


class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

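    def __get__(self, obj, cls=None):
        # Resolve the class here, not inside inner(): assigning to cls in
        # the closure would make it local there and raise UnboundLocalError.
        if cls is None:
            cls = type(obj)
        def inner(*args, **kwargs):
            return self.f(cls, *args, **kwargs)
        return inner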

I think giving a pure-Python version of staticmethod and classmethod helps in understanding the difference between them at the language level.

Both of them are non-data descriptors (they are easier to understand if you are already familiar with descriptors).

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f


class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

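    def __get__(self, obj, cls=None):
        # Resolve the class here, not inside inner(): assigning to cls in
        # the closure would make it local there and raise UnboundLocalError.
        if cls is None:
            cls = type(obj)
        def inner(*args, **kwargs):
            return self.f(cls, *args, **kwargs)
        return inner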
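
To see the two descriptors at work, they can be used as decorators in place of the built-ins. The sketch below repeats compact versions of the emulations so it runs on its own; the `Demo` class is hypothetical.

```python
class StaticMethod:
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f  # hand back the plain function, nothing is bound


class ClassMethod:
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, cls=None):
        if cls is None:  # defensive: descriptor invoked without an owner class
            cls = type(obj)

        def inner(*args, **kwargs):
            return self.f(cls, *args, **kwargs)  # prepend the class

        return inner


class Demo:
    @StaticMethod
    def add(a, b):
        return a + b

    @ClassMethod
    def name(cls):
        return cls.__name__


from_class = (Demo.add(1, 2), Demo.name())
from_instance = (Demo().add(1, 2), Demo().name())
```

Attribute lookup on either the class or an instance goes through `__get__`, which is why both call styles give the same result.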

Answer 24

静态方法无法访问继承层次结构中的对象,类或父类的服装。可以直接在类上调用它(无需创建对象)。

classmethod无法访问该对象的属性。但是,它可以访问继承层次结构中的类和父类的属性。可以直接在类上调用它(无需创建对象)。如果在该对象上调用,则它与普通方法相同,后者不会访问self.<attribute(s)>并且self.__class__.<attribute(s)>只能访问。

认为我们有一个带有的类b=2,我们将创建一个对象并将其重新设置为b=4其中。静态方法无法访问以前的任何内容。Classmethod .b==2只能通过进行访问cls.b。:普通方法可以同时访问.b==4通过self.b.b==2通过self.__class__.b

我们可以遵循KISS风格(保持简单,愚蠢):不要使用静态方法和类方法,不要在未实例化它们的情况下使用类,仅访问对象的属性self.attribute(s)。在某些语言中,以这种方式实现了OOP,我认为这不是一个坏主意。:)

staticmethod has no implicit access to attributes of the object, of the class, or of parent classes in the inheritance hierarchy. It can be called on the class directly (without creating an object).

classmethod has no access to attributes of the object. It can, however, access attributes of the class and of parent classes in the inheritance hierarchy. It can be called on the class directly (without creating an object). If called on an object, it behaves like a normal method that never touches self.<attribute(s)> and accesses only self.__class__.<attribute(s)>.

Suppose we have a class with b=2; we create an object and re-set b=4 on it. A staticmethod receives neither of these. A classmethod can access only .b==2, via cls.b. A normal method can access both: .b==4 via self.b and .b==2 via self.__class__.b.

We could follow the KISS style (keep it simple, stupid): don’t use staticmethods and classmethods, don’t use classes without instantiating them, and access only the object’s attributes via self.attribute(s). There are languages where OOP is implemented that way, and I think it is not a bad idea. :)
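
The b=2 / b=4 scenario can be written out directly. This is a sketch with a made-up class name, `Sample`; note that a static method could still reach the class by naming it explicitly, it just receives nothing automatically.

```python
class Sample:
    b = 2  # class attribute

    @staticmethod
    def via_static():
        # receives neither self nor cls
        return None

    @classmethod
    def via_class(cls):
        return cls.b

    def via_instance(self):
        return self.b, self.__class__.b


obj = Sample()
obj.b = 4  # shadows the class attribute on this object only

class_view = obj.via_class()
instance_view = obj.via_instance()
```

Setting `obj.b` creates an instance attribute; the class attribute `Sample.b` is unchanged.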


Answer 25

在iPython中对其他相同方法的快速分析表明,该方法会@staticmethod产生少量的性能提升(以纳秒为单位),但否则似乎无济于事。另外,staticmethod()在编译过程中(通过运行脚本执行任何代码之前),通过处理该方法的其他工作可能会消除所有性能提升。

出于代码可读性的考虑,@staticmethod除非您的方法用于纳秒级的工作负载,否则我将避免使用。

A quick hack-up of otherwise identical methods in IPython reveals that @staticmethod yields marginal performance gains (in the nanoseconds), but otherwise it seems to serve no function. Also, any performance gains will probably be wiped out by the additional work of processing the method through staticmethod() when the class is defined (which happens once, when the class body is executed).

For the sake of code readability I’d avoid @staticmethod unless your method will be used for loads of work, where the nanoseconds count.
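
A rough way to repeat that measurement outside IPython is the standard `timeit` module. The sketch below uses a hypothetical `Bench` class; the numbers depend heavily on the machine and Python version, so no expected timings are given.

```python
import timeit


class Bench:
    def method(self):
        return 1

    @staticmethod
    def static():
        return 1


obj = Bench()

# Time 100,000 calls of each; timeit accepts a zero-argument callable.
t_method = timeit.timeit(obj.method, number=100_000)
t_static = timeit.timeit(obj.static, number=100_000)

print(f"bound method: {t_method:.4f}s, staticmethod: {t_static:.4f}s")
```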


How do I list all files of a directory?

Question: How do I list all files of a directory?

如何在Python中列出目录的所有文件并将其添加到list

How can I list all files of a directory in Python and add them to a list?


Answer 0

os.listdir()将为您提供目录中的所有内容- 文件目录

如果只需要文件,则可以使用os.path以下方法将其过滤掉:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

或者,您也可以使用os.walk()它为访问的每个目录生成两个列表 – 为您拆分为文件目录。如果只需要顶层目录,可以在第一次生成目录时中断

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

os.listdir() will get you everything that’s in a directory – files and directories.

If you want just files, you could either filter this down using os.path:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

or you could use os.walk() which will yield two lists for each directory it visits – splitting into files and dirs for you. If you only want the top directory you can just break the first time it yields

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break
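
Both approaches can be wrapped in small functions and tried against a throwaway directory. The function names `list_files` and `list_files_walk` are made up for this sketch, and the temporary directory stands in for `mypath`.

```python
import os
import tempfile


def list_files(mypath):
    """Return the names of regular files directly inside mypath, sorted."""
    return sorted(
        f for f in os.listdir(mypath)
        if os.path.isfile(os.path.join(mypath, f))
    )


def list_files_walk(mypath):
    """Same result via os.walk: use only the first (top-level) triple."""
    _, _, filenames = next(os.walk(mypath))
    return sorted(filenames)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))  # must not show up in the result
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()

    from_listdir = list_files(tmp)
    from_walk = list_files_walk(tmp)
```

`next(os.walk(mypath))` is equivalent to breaking out of the loop after the first iteration.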

Answer 1

我更喜欢使用glob模块,因为它可以进行模式匹配和扩展。

import glob
print(glob.glob("/home/adam/*.txt"))

它将返回包含查询文件的列表:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*.txt"))

It will return a list with the queried files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
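
glob can also descend into subdirectories: since Python 3.5, the pattern `**` matches any number of directory levels when `recursive=True` is passed. The sketch below builds a temporary tree so it is self-contained; the file names are placeholders.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("top.txt", os.path.join("sub", "deep.txt"), "skip.log"):
        open(os.path.join(tmp, rel), "w").close()

    # "**" matches zero or more directory levels when recursive=True
    pattern = os.path.join(tmp, "**", "*.txt")
    found = sorted(
        os.path.relpath(p, tmp) for p in glob.glob(pattern, recursive=True)
    )
```

Here `found` holds the two `.txt` files, one at the top level and one inside `sub`, while `skip.log` is filtered out by the pattern.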

Answer 2

使用Python 2和3获取文件列表


os.listdir()

如何获取当前目录中的所有文件(和目录)(Python 3)

以下是在Python 3中使用oslistdir()函数仅检索当前目录中文件的简单方法。进一步的探索将演示如何返回目录中的文件夹,但是您不会在子目录中拥有该文件,因此可以使用步行-稍后讨论)。

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

glob

我发现glob更容易选择相同类型或相同的文件。看下面的例子:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob 具有列表理解

import glob

mylist = [f for f in glob.glob("*.txt")]

glob 具有功能

该函数在参数中返回给定扩展名(.txt,.docx等)的列表。

import glob

def filebrowser(ext=""):
    "Returns files with an extension"
    return [f for f in glob.glob(f"*{ext}")]

x = filebrowser(".txt")
print(x)

>>> ['example.txt', 'fb.txt', 'intro.txt', 'help.txt']

glob 扩展先前的代码

该函数现在返回与您作为参数传递的字符串匹配的文件列表

import glob

def filesearch(word=""):
    """Returns a list with all files with the word/extension in it"""
    matches = []
    for f in glob.glob("*"):
        if word.startswith("."):
            # the argument is an extension: compare against the file ending
            if f.endswith(word):
                matches.append(f)
        elif word in f:
            matches.append(f)
    # return only after the whole directory has been scanned
    return matches

lookfor = "example", ".py"
for w in lookfor:
    print(f"{w:10} found => {filesearch(w)}")

输出

example    found => []
.py        found => ['search.py']

获取完整的路径名 os.path.abspath

正如您所注意到的,上面的代码中没有文件的完整路径。如果需要绝对路径,则可以使用os.path模块的另一个函数,_getfullpathname将从os.listdir()中获取的文件作为参数。还有其他完整路径的方法,稍后我们将进行检查(如mexmex所建议,我将_getfullpathname替换为abspath)。

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)

 >>> ['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']

使用以下命令获取所有子目录中文件类型的完整路径名 walk

我发现这对于在许多目录中查找内容非常有用,它帮助我找到了一个我不记得其名称的文件:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir():获取当前目录中的文件(Python 2)

在Python 2中,如果要在当前目录中列出文件列表,则必须将参数指定为“。”。或os.listdir方法中的os.getcwd()。

 import os
 arr = os.listdir('.')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

进入目录树

# Method 1
x = os.listdir('..')

# Method 2
x= os.listdir('/')

获取文件:os.listdir()在特定目录中(Python 2和3)

 import os
 arr = os.listdir('F:\\python')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

使用以下命令获取特定子目录的文件 os.listdir()

import os

x = os.listdir("./content")

os.walk('.') -当前目录

 import os
 arr = next(os.walk('.'))[2]
 print(arr)

 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.'))os.path.join('dir', 'file')

 import os
 arr = []
 # next() returns one (dirpath, dirnames, filenames) triple for the top directory
 r, d, f = next(os.walk("F:\\_python"))
 for file in f:
     arr.append(os.path.join(r, file))

 for f in arr:
     print(f)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

next(os.walk('F:\\')) -获取完整路径-列表理解

 [os.path.join(r,file) for r,d,f in [next(os.walk("F:\\_python"))] for file in f]

 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk -获取完整路径-子目录中的所有文件

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() -仅获取txt文件

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)

 >>> ['work.txt', '3ebooks.txt']

使用glob获得的文件的完整路径

如果我需要文件的绝对路径:

import os
from glob import glob
x = [os.path.abspath(f) for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

使用os.path.isfile列表,以避免目录

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

使用pathlib在Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

list comprehension

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

或者,使用pathlib.Path()代替pathlib.Path(".")

在pathlib.Path()中使用glob方法

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

使用os.walk获取所有文件

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

仅获取具有next的文件并进入目录

 import os
 x = next(os.walk('F://python'))[2]
 print(x)

 >>> ['calculator.bat','calculator.py']

仅获取具有next的目录并进入目录

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')

 >>> ['python3','others']

使用以下命令获取所有子目录名称 walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() 从Python 3.5及更高版本开始

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

例子:

例如 1:子目录中有多少个文件?

在此示例中,我们查找所有目录及其子目录中包含的文件数。

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count("F:\\python"))

>>> F:\python : 12057 files

例2:如何将所有文件从一个目录复制到另一个目录?

用于在计算机中排序的脚本,以查找一种类型的所有文件(默认值:pptx)并将其复制到新文件夹中。

import os
import shutil

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

例如 3:如何获取txt文件中的所有文件

如果要使用所有文件名创建一个txt文件,请执行以下操作:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

示例:包含硬盘驱动器所有文件的txt

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

C:\的所有文件都在一个文本文件中

这是先前代码的简短版本。如果您需要从另一个位置开始,请更改开始查找文件的文件夹。这段代码在我的计算机上的文本文件上生成了50 mb的内容,其中包含完整路径的文件少于500.000行。

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(os.path.join(r, file) + "\n")

如何在一个类型的文件夹中写入所有路径的文件

使用此功能,您可以创建一个txt文件,该文件将具有要查找的文件类型的名称(例如pngfile.txt),并带有该类型所有文件的所有完整路径。我认为有时候它会很有用。

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(os.path.join(r, file) + "\n")

# looking for png file (fonts) in the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png

(新)找到所有文件并使用tkinter GUI打开它们

I just wanted to add, in 2019, a little app that searches for all files in a directory and lets you open them by double-clicking the file name in the list.

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Get a list of files with Python 2 and 3


os.listdir()

How to get all the files (and directories) in the current directory (Python 3)

The following are simple ways to retrieve the contents of the current directory in Python 3, using os and its listdir() function. Note that listdir() returns both files and folders, and it does not descend into subdirectories; for that you can use walk (discussed later).

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

glob

I find glob easier for selecting files of the same type or with something in common. Look at the following example:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob with list comprehension

import glob

mylist = [f for f in glob.glob("*.txt")]

glob with a function

The function returns a list of the files with the extension given as the argument (.txt, .docx, etc.).

import glob

def filebrowser(ext=""):
    "Returns files with an extension"
    return [f for f in glob.glob(f"*{ext}")]

x = filebrowser(".txt")
print(x)

>>> ['example.txt', 'fb.txt', 'intro.txt', 'help.txt']

glob extending the previous code

The function now returns a list of the files that match the string you pass as the argument.

import glob

def filesearch(word=""):
    """Returns a list with all files with the word/extension in it"""
    matches = []
    for f in glob.glob("*"):
        if word.startswith("."):
            # the argument is an extension: compare against the file ending
            if f.endswith(word):
                matches.append(f)
        elif word in f:
            matches.append(f)
    # return only after the whole directory has been scanned
    return matches

lookfor = "example", ".py"
for w in lookfor:
    print(f"{w:10} found => {filesearch(w)}")

output

example    found => []
.py        found => ['search.py']

Getting the full path name with os.path.abspath

As you noticed, the code above does not give the full path of each file. If you need the absolute path, you can use os.path.abspath, passing each name returned by os.listdir() as the argument. This works here because those names are relative to the current directory. There are other ways to get the full path, as we will see later (following mexmex's suggestion, I replaced _getfullpathname with abspath).

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)

 >>> ['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']

Get the full path name of a type of file into all subdirectories with walk

I find this very useful to find stuff in many directories, and it helped me find a file about which I didn’t remember the name:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as ‘.’ or os.getcwd() in the os.listdir method.

 import os
 arr = os.listdir('.')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

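To go up in the directory tree

# Method 1: the parent of the current directory
x = os.listdir('..')

# Method 2: the root of the file system (or of the current drive on Windows)
x = os.listdir('/')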

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir('F:\\python')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') – current directory

 import os
 arr = next(os.walk('.'))[2]
 print(arr)

 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.')) and os.path.join('dir', 'file')

 import os
 arr = []
 # next() returns one (dirpath, dirnames, filenames) triple for the top directory
 r, d, f = next(os.walk("F:\\_python"))
 for file in f:
     arr.append(os.path.join(r, file))

 for f in arr:
     print(f)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

next(os.walk('F:\\')) – get the full path – list comprehension

 [os.path.join(r,file) for r,d,f in [next(os.walk("F:\\_python"))] for file in f]

 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk – get full path – all files in sub dirs

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() – get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)

 >>> ['work.txt', '3ebooks.txt']

Using glob to get the full path of the files

If I should need the absolute path of the files:

import os
from glob import glob
x = [os.path.abspath(f) for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
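
Path.glob() only looks at one directory level unless the pattern says otherwise; Path.rglob() searches subdirectories as well. The sketch below builds a temporary tree with placeholder file names to show the difference.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "pkg").mkdir()
    (root / "main.py").touch()
    (root / "pkg" / "util.py").touch()
    (root / "notes.txt").touch()

    # glob("*.py"): only the top level
    top_level = sorted(p.name for p in root.glob("*.py"))

    # rglob("*.py"): the top level and every subdirectory
    recursive = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.py")
    )
```

`top_level` contains only `main.py`, while `recursive` also includes `pkg/util.py`.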

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory

 import os
 x = next(os.walk('F://python'))[2]
 print(x)

 >>> ['calculator.bat','calculator.py']

Get only directories with next and walk in a directory

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')

 >>> ['python3','others']

Get all the subdir names with walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
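
os.scandir() only lists one directory, but it can be combined with recursion to cover subdirectories too. The generator `iter_files` below is a hypothetical helper written for this sketch, tried here on a temporary tree.

```python
import os
import tempfile


def iter_files(path):
    """Yield the paths of all regular files under path, recursing with scandir."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # descend into real subdirectories, but not through symlinks
                yield from iter_files(entry.path)
            elif entry.is_file():
                yield entry.path


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in (
        "one.txt",
        os.path.join("a", "two.txt"),
        os.path.join("a", "b", "three.txt"),
    ):
        open(os.path.join(tmp, rel), "w").close()

    all_files = sorted(os.path.relpath(p, tmp) for p in iter_files(tmp))
```

Each `DirEntry` already knows whether it is a file or a directory, which is where scandir saves work compared with calling os.path.isfile() on every name.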

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count("F:\\python"))

>>> F:\python : 12057 files
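
The same count can be written more compactly, since os.walk already hands over the list of file names for each folder. A sketch under that assumption, with count_files as a hypothetical name that returns the number rather than a formatted string:

```python
import os

def count_files(top):
    """Count the files in top and in all of its subdirectories."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(top))
```

Returning an int leaves the formatting to the caller, for example print(top, ":", count_files(top), "files").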

Ex.2: How to copy all files from a directory to another?

A script to tidy up your computer: it finds all files of a given type (default: pptx) and copies them into a new folder.

import os
import shutil

destination = "F:\\file_copied"
os.makedirs(destination, exist_ok=True)  # make sure the target folder exists

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    # look only at entries whose name starts with `_`
    if dir.startswith('_'):
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
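
The script above depends on Windows paths and a global destination. A portable sketch of the same idea follows; copy_by_extension and its parameters are illustrative names, and it assumes the destination folder lies outside the source tree. Files with the same name overwrite each other in the destination.

```python
import os
import shutil

def copy_by_extension(source, destination, extension=".txt"):
    """Copy every file under source whose name ends with extension into destination.

    Returns the list of source paths that were copied.
    """
    os.makedirs(destination, exist_ok=True)  # create the target folder if needed
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            if name.endswith(extension):
                full_path = os.path.join(dirpath, name)
                shutil.copy(full_path, destination)
                copied.append(full_path)
    return copied
```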

Ex. 3: How to write all the file names to a txt file

In case you want to create a txt file with the names of all the entries in the current directory:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of a hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(os.path.join(root, file))
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the files of C:\ in one text file

This is a shorter version of the previous code. Change the starting folder if you need to begin the search somewhere else. On my computer this code generated a text file of about 50 MB with a little under 500,000 lines, one full file path per line.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(os.path.join(r, file) + "\n")

How to write a file with the paths of all files of one type in a folder

With this function you can create a txt file named after the file type you are looking for (e.g. pngfile.txt) that lists the full path of every file of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with the full paths of all the files of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(os.path.join(r, file) + "\n")

# looking for png files (images) in the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png
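
Note that file.endswith(extension) is case-sensitive, so a search for '.png' would skip names like thumb.PNG. A minimal sketch of a case-insensitive check; has_extension is a hypothetical helper, not a standard library function:

```python
import os

def has_extension(filename, extension):
    """Return True if filename ends with extension, ignoring letter case."""
    # os.path.splitext splits "thumb.PNG" into ("thumb", ".PNG")
    return os.path.splitext(filename)[1].lower() == extension.lower()
```

The extension is expected with its leading dot, as in has_extension("thumb.PNG", ".png").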

(New) Find all files and open them with tkinter GUI

In 2019 I added a small app that searches for all the files in a directory and lets you open one by double-clicking its name in the list.

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Answer 3

import os
os.listdir("somedirectory")

will return a list of all files and directories in “somedirectory”.


Answer 4

A one-line solution to get only the list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or with the directory prepended (absolute pathnames if path is absolute):

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
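
One caveat with the one-liner: if path does not exist, os.walk yields nothing and next() raises StopIteration. A hedged sketch that passes a default to next() instead; the wrapper name files_in is an assumption made for illustration:

```python
import os

def files_in(path):
    """Return the names of the files directly inside path (no subdirectories).

    If path does not exist, os.walk yields nothing, so a default tuple is
    given to next() and an empty list is returned instead of an exception.
    """
    return next(os.walk(path), (path, [], []))[2]
```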

Answer 5

Getting Full File Paths From a Directory and All Its Subdirectories

import os

def get_filepaths(directory):
    """
    Return a list with the full path of every file in the directory
    tree rooted at directory. os.walk visits each directory in the tree
    (including directory itself) and yields a 3-tuple
    (dirpath, dirnames, filenames) for it.
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files: two of them in the root directory, and another in a subfolder called “SUBFOLDER”. You can now do things like:
  • print(full_file_paths), which will print the list:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you’d like, you can open and read the contents, or focus only on files with the extension “.dat” like in the code below:

for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat
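
If the tree is large, building the whole list first may not be needed. A sketch of a generator variant that yields paths one at a time and optionally filters by extension; iter_filepaths and its extension parameter are illustrative additions, not part of the answer's original code:

```python
import os

def iter_filepaths(directory, extension=None):
    """Yield the full path of each file under directory.

    If extension is given (for example ".dat"), only matching files are yielded.
    """
    for root, _directories, files in os.walk(directory):
        for filename in files:
            if extension is None or filename.endswith(extension):
                yield os.path.join(root, filename)
```

Wrapping the call in list(...) gives the same kind of result as get_filepaths when a list is really required.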


Answer 6

Since version 3.4 there are builtin iterators for this which are a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed got increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger’s comment below.
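
Both snippets above list only the given directory. For the whole tree, pathlib also offers Path.rglob. A minimal sketch, where files_recursive is a hypothetical helper name:

```python
from pathlib import Path

def files_recursive(top="."):
    """Return every regular file below top as a sorted list of Path objects."""
    # rglob("*") walks top and all of its subdirectories
    return sorted(p for p in Path(top).rglob("*") if p.is_file())
```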


Answer 7

Preliminary notes

  • Although there’s a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files
  • The statement: “all files of a directory” can be interpreted in two ways:
    1. All direct (or level 1) descendants only
    2. All descendants in the whole directory tree (including the ones in sub-directories)
  • When the question was asked, I imagine that Python 2 was the LTS version; however, the code samples will be run with Python 3(.5). I’ll keep them as Python 2 compliant as possible; also, any code belonging to Python that I post is from v3.5.4, unless otherwise specified. That has consequences related to another keyword in the question: “add them into a list”:

    • In pre-Python 2.2 versions, sequences (iterables) were mostly represented by lists (tuples, sets, …)
    • In Python 2.2, the concept of generator ([Python.Wiki]: Generators), courtesy of [Python 3]: The yield statement, was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, returning a generator (or another lazy iterator) is the default behavior for many of those functions
    • I’m not sure whether returning a list is still mandatory (or whether a generator would do as well), but passing a generator to the list constructor creates a list out of it (and also consumes it). The example below illustrates the differences on [Python 3]: map(function, iterable, …)
    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    


    >>> import sys
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already consumed
    ([], <class 'list'>)
    
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I’m using the same tree on Lnx as well):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3
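
To follow along, the tree listed above can be recreated with a few lines. This is only a sketch: the names FILES, EMPTY_DIRS and build_tree are made up for illustration, and the files are created empty.

```python
import os

# Relative file paths taken from the tree listing above.
FILES = [
    "file0", "file1",
    "dir0/dir00/file000",
    "dir0/dir00/dir000/file0000",
    "dir0/dir01/file010", "dir0/dir01/file011",
    "dir1/file10", "dir1/file11", "dir1/file12",
    "dir2/file20",
    "dir2/dir20/file200",
]
# Directories without files; they have to be created explicitly.
EMPTY_DIRS = ["dir0/dir02/dir020/dir0200", "dir3"]

def build_tree(root_dir="root_dir"):
    """Create the sample directory tree under root_dir with empty files."""
    for rel in EMPTY_DIRS:
        os.makedirs(os.path.join(root_dir, *rel.split("/")), exist_ok=True)
    for rel in FILES:
        path = os.path.join(root_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass  # an empty file is enough for the listing examples
```

Running build_tree() in an empty working directory should give 11 files and 11 directories below root_dir.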
    


Solutions

Programmatic approaches:

  1. [Python 3]: os.listdir(path='.')

    Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..'


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
    ['file0', 'file1']
    

    A more elaborate example (code_os_listdir.py):

    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • There are two implementations:
      • One that uses generators (of course here it seems useless, since I immediately convert the result to a list)
      • The classic one (function names ending in _old)
    • Recursion is used (to get into subdirectories)
    • For each implementation there are two functions:
      • One that starts with an underscore (_): “private” (should not be called directly) – that does all the work
      • The public one (wrapper over previous): it just strips off the initial path (if required) from the returned entries. It’s an ugly implementation, but it’s the only idea that I could come up with at this point
    • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn’t test them in recursive functions, and I am also iterating inside the function over inner generators, so I don’t know how performance-friendly that is
    • Play with the arguments to get different results


    Output:

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']
    


  2. [Python 3]: os.scandir(path='.') (Python 3.5+, backport: [PyPI]: scandir)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator:
    ...     if os.path.isfile(item.path):
    ...         print(item.name)
    ...
    file0
    file1
    

    Notes:

    • It’s similar to os.listdir
    • But it’s also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)
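A minimal sketch of the point made in the documentation quote above: the os.DirEntry objects can answer "is this a file?" themselves, so os.path.isfile is not needed. The function name list_files_scandir is an illustrative assumption, and the context-manager form of os.scandir requires Python 3.6+.

```python
import os


def list_files_scandir(path):
    """Return the sorted names of the regular files directly inside path."""
    # The context manager closes the underlying directory handle (Python 3.6+).
    with os.scandir(path) as entries:
        # entry.is_file() can reuse type information gathered during the scan,
        # so it usually avoids the extra system call that os.path.isfile() makes.
        return sorted(entry.name for entry in entries if entry.is_file())
```

With the root_dir tree used in this answer, list_files_scandir("root_dir") should give the same two names as the os.listdir-based filter.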


  3. [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
    

    Notes:

    • Behind the scenes, it uses os.scandir (os.listdir on older versions)
    • It does the heavy lifting by recursing into subfolders
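Since os.walk does the recursion itself, a flat list of files takes only a few lines. The sketch below also shows one documented feature of the default top-down mode: editing dirnames in place prunes the walk. The names walk_files and skip_dirs are assumptions made for this example.

```python
import os


def walk_files(top, skip_dirs=()):
    """Yield paths of all files under top, skipping directories named in skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True by default
        # Editing dirnames in place tells os.walk not to descend into those dirs.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

For instance, list(walk_files("root_dir", skip_dirs=("dir0",))) would leave out everything below dir0.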


  4. [Python 3]: glob.glob(pathname, *, recursive=False) ([Python 3]: glob.iglob(pathname, *, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).

    Changed in version 3.5: Support for recursive globs using “**”.


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    'root_dir\\*'
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from the beginning
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir\
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1
    

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)
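To illustrate the name-based filtering together with the recursive "**" wildcard, here is a small sketch. The helper name find_by_pattern is an assumption; it relies on glob.iglob, so results are produced lazily.

```python
import glob
import os


def find_by_pattern(root, pattern):
    """Lazily yield paths under root (at any depth) whose name matches pattern."""
    # "**" matches zero or more directory levels when recursive=True (Python 3.5+).
    return glob.iglob(os.path.join(root, "**", pattern), recursive=True)
```

Note that glob skips names starting with a dot by default, so hidden files would not show up here.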


  5. [Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

    >>> import os, pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath('root_dir')
    >>> root_dir_instance.name
    'root_dir'
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:

    • This is one way of achieving our goal
    • It’s the OOP style of handling paths
    • Offers lots of functionalities
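As a rough sketch of the object-oriented style, the helper below (files_in is a made-up name) uses Path.iterdir for direct children and Path.rglob for the whole tree, and returns Path objects rather than strings.

```python
from pathlib import Path


def files_in(folder, recursive=False):
    """Return a sorted list of Path objects for the regular files in folder."""
    root = Path(folder)
    # rglob("*") walks the whole tree; iterdir() only looks at direct children.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file())
```

Each returned item still offers attributes such as .name, .suffix and .parent, which is often handier than slicing strings.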


  6. [Python 2]: dircache.listdir(path) (Python 2 only)


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list
    


  7. [man7]: OPENDIR(3) / [man7]: READDIR(3) / [man7]: CLOSEDIR(3) via [Python 3]: ctypes – A foreign function library for Python (POSIX specific)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_char_p, c_int, c_void_p, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = this_process = CDLL(None, use_errno=True)
    # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
    opendir = libc_dll.opendir
    opendir.argtypes = [c_char_p]
    opendir.restype = c_void_p  # DIR*; the default (c_int) would truncate 64-bit pointers
    readdir = libc_dll.readdir
    readdir.argtypes = [c_void_p]
    readdir.restype = c_void_p  # struct dirent*; cast to LinuxDirent64Ptr below
    closedir = libc_dll.closedir
    closedir.argtypes = [c_void_p]
    closedir.restype = c_int
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if not dir_stream:  # With restype c_void_p, a NULL pointer comes back as None
            print("opendir returned NULL (errno: {:d})".format(get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type & DT_DIR:
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type & DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno():
            print("readdir returned NULL (errno: {:d})".format(get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}\n".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati’s answer) – last notes from item #4.). That would place this approach very close to the Python / C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • It returns data in os.walk’s format. I didn’t bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well, the data (libraries, functions, structs, constants, …) differ


    Output:

    [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
    3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    
    ['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
    


  8. [ActiveState.Docs]: win32file.FindFilesW (Win specific)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/FindClose functions.


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ['dir0', 'dir1', 'dir2', 'dir3']
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ['root_dir\\file0', 'root_dir\\file1']
    


  9. Install some (other) third-party package that does the trick
    • Most likely, will rely on one (or more) of the above (maybe with slight customizations)


Notes:

  • Code is meant to be portable (except places that target a specific area – which are marked) or cross:

    • platform (Nix, Win)
    • Python version (2, 3)
  • Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the “tools” used are flexible in this direction

  • os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython – (master) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 – (master) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn’t strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit – 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don’t know if that is even possible), that could be a problem.
    I must also mention that I didn’t try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility for failure, if the dir depth is larger than the highest possible recursionlimit (on that machine)

  • The code samples are for demonstrative purposes only. That means that I didn’t take into account error handling (I don’t think there’s any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well
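The filter_func idea and the recursion-limit concern from the notes above can be combined in one sketch. This is only an illustration under stated assumptions: the name iter_dir_content is invented here, an explicit stack replaces recursion, and (as in the note) an entry rejected by filter_func is skipped entirely, which for a directory means its whole subtree.

```python
import os


def iter_dir_content(path, filter_func=lambda entry_path: True):
    """Yield every accepted path below path, without using recursion."""
    pending = [path]  # directories that still have to be listed
    while pending:
        current = pending.pop()
        for entry in os.listdir(current):
            entry_with_path = os.path.join(current, entry)
            if not filter_func(entry_with_path):
                continue  # rejected entries are neither yielded nor descended into
            yield entry_with_path
            # Symlinked directories are not followed, to avoid endless cycles.
            if os.path.isdir(entry_with_path) and not os.path.islink(entry_with_path):
                pending.append(entry_with_path)
```

Because the pending directories live in a plain list, the depth of the tree is limited by available memory rather than by recursionlimit. The order of the yielded entries differs from the recursive version.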

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn’t have anything to do with Python.
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I’m not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      

    In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well (not to mention differences between locales).


Answer 8

I really liked adamk’s answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob(r'C:\Users\admin\*\wlp')

The above is terrible: the path is hardcoded and will only ever work on Windows, because both the drive name and the backslashes are hardcoded into the path.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
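For comparison, the same idea can be sketched with pathlib, which handles the separators on its own. The helper name find_named_subfolders is an assumption for illustration; "wlp" is the folder name from the examples above.

```python
from pathlib import Path


def find_named_subfolders(base, name):
    """Return the entries called name that sit one level below base."""
    # Path.glob accepts "/" in patterns on every platform.
    return sorted(Path(base).glob("*/" + name))
```

Calling find_named_subfolders(Path.home(), "wlp") corresponds to the expanduser('~') example, and find_named_subfolders(Path.cwd(), "wlp") to the getcwd() one.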


Answer 9

import os

def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files

Answer 10

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in /usr/bin
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)

So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds it useful.


Answer 11

For better results, you can use the listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works fine with both Python 2 and Python 3.

Here’s the code:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

The listdir() method returns the list of entries for the given directory. The os.path.isfile() function returns True if the given entry is a file. The yield statement suspends the function while keeping its current state, and hands back only the names of entries detected as files. All of the above allows us to loop over the generator function.
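One practical consequence of the generator keeping its state is that a caller can stop early. The sketch below repeats the files() generator and adds a hypothetical helper, first_files, built on itertools.islice. Note that os.listdir still reads the whole directory up front; only the isfile checks are deferred.

```python
import os
from itertools import islice


def files(path):
    """Yield the names of the regular files in path, one at a time."""
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            yield name


def first_files(path, count):
    """Return at most count file names without checking the remaining entries."""
    # islice stops pulling from the generator once count names were produced.
    return list(islice(files(path), count))
```

For example, first_files(".", 3) returns up to three names from the current directory.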


Answer 12

This returns a list of absolute file paths; it does not recurse into subdirectories:

import os

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

Answer 13

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.


Answer 14

A wise teacher told me once that:

When there are several established ways to do something, none of them is good for all cases.

I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

If you care to first declare two functions, this can be done:

import os

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

This solution could be easily generalized with regular expressions (and you might want to add a pattern argument, if you do not want your patterns to always stick to the start or end of the filename).
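A possible shape for that regular-expression generalization is sketched below. The function name dir_filter_regex and its pattern parameter are assumptions for illustration, not part of the functions above.

```python
import os
import re


def dir_filter_regex(dirname=".", pattern=".*"):
    """Return the names in dirname that fully match the regular expression pattern."""
    regex = re.compile(pattern)
    # fullmatch requires the whole name to match; use regex.search for substrings.
    return sorted(name for name in os.listdir(dirname) if regex.fullmatch(name))
```

The call dir_filter_regex('foo/baz', r'radical.*\.txt') would then play the role of dir_filter('foo/baz', radical='radical', extension='.txt').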


Answer 15

Using generators

import os

def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

list_files = get_files('.')
for filename in list_files:
    print(filename)

Answer 16

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

Answer 17

Here’s my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

回答 18

我将提供一个示例liner,其中可以提供sourcepath和文件类型作为输入。该代码返回带有csv扩展名的文件名列表。使用万一需要返回所有文件。这还将递归扫描子目录。

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

根据需要修改文件扩展名和源路径。

I will provide a sample one-liner where the source path and file type can be provided as input. The code returns a list of filenames with the csv extension. Use a wildcard pattern such as *.* in case all files need to be returned. This also recursively scans the subdirectories.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.
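The one-liner needs os and glob to be imported before it runs. A self-contained sketch of the same idea follows; the function name find_by_extension and the sample files are assumptions for illustration, and glob.escape is added only so that unusual characters in directory names are not treated as wildcards.

```python
import glob
import os
import tempfile


def find_by_extension(source_path, extension):
    """Recursively collect files under source_path ending in .<extension>."""
    pattern = '*.' + extension
    return sorted(
        match
        for dirpath, _dirnames, _filenames in os.walk(source_path)
        for match in glob.glob(os.path.join(glob.escape(dirpath), pattern))
    )


# Create a few empty files in a temporary tree to search through.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'nested'))
    for name in ('a.csv', 'b.txt', os.path.join('nested', 'c.csv')):
        open(os.path.join(tmp, name), 'w').close()

    csv_names = [os.path.relpath(p, tmp) for p in find_by_extension(tmp, 'csv')]
    print(csv_names)
```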


回答 19

对于python2:pip install rglob

import rglob
file_list=rglob.rglob("/home/base/dir/", "*")
print file_list

For python2: pip install rglob

import rglob
file_list=rglob.rglob("/home/base/dir/", "*")
print file_list

回答 20

dircache是“自2.6版起不推荐使用:dircache模块已在Python 3.0中删除。”

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

dircache is “Deprecated since version 2.6: The dircache module has been removed in Python 3.0.”

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

如何按值对字典排序?

问题:如何按值对字典排序?

我有一个从数据库的两个字段中读取的值的字典:字符串字段和数字字段。字符串字段是唯一的,因此这是字典的键。

我可以对键进行排序,但是如何根据值进行排序?

注意:我在这里阅读了堆栈溢出问题,如何按字典值对字典列表进行排序?可能会更改我的代码以包含字典列表,但是由于我实际上并不需要字典列表,因此我想知道是否存在一种更简单的解决方案来按升序或降序进行排序。

I have a dictionary of values read from two fields in a database: a string field and a numeric field. The string field is unique, so that is the key of the dictionary.

I can sort on the keys, but how can I sort based on the values?

Note: I have read Stack Overflow question here How do I sort a list of dictionaries by a value of the dictionary? and probably could change my code to have a list of dictionaries, but since I do not really need a list of dictionaries I wanted to know if there is a simpler solution to sort either in ascending or descending order.


回答 0

Python 3.6+

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
{k: v for k, v in sorted(x.items(), key=lambda item: item[1])}
{0: 0, 2: 1, 1: 2, 4: 3, 3: 4}

旧版Python

无法对字典进行排序,只能获得已排序字典的表示形式。字典本质上是无序的,但其他类型(例如列表和元组)不是。因此,您需要一个有序的数据类型来表示排序后的值,该值将是一个列表-可能是一个元组列表。

例如,

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))

sorted_x将是按每个元组中第二个元素排序的元组列表。dict(sorted_x) == x

对于那些希望对键而不是值进行排序的人:

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(0))

在Python3中,由于不允许拆包[1],我们可以使用

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=lambda kv: kv[1])

如果要将输出作为字典,则可以使用collections.OrderedDict

import collections

sorted_dict = collections.OrderedDict(sorted_x)

Python 3.6+

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
{k: v for k, v in sorted(x.items(), key=lambda item: item[1])}
{0: 0, 2: 1, 1: 2, 4: 3, 3: 4}

Older Python

It is not possible to sort a dictionary, only to get a representation of a dictionary that is sorted. Dictionaries are inherently orderless, but other types, such as lists and tuples, are not. So you need an ordered data type to represent sorted values, which will be a list—probably a list of tuples.

For instance,

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))

sorted_x will be a list of tuples sorted by the second element in each tuple. dict(sorted_x) == x.

And for those wishing to sort on keys instead of values:

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(0))

In Python 3, since tuple parameter unpacking is no longer allowed (see PEP 3113), we can use

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=lambda kv: kv[1])

If you want the output as a dict, you can use collections.OrderedDict:

import collections

sorted_dict = collections.OrderedDict(sorted_x)
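As a small sketch of the Python 3.7+ approach (where a plain dict is guaranteed to keep insertion order), the sorted pairs can be passed straight to dict(), and reverse=True gives descending order. The variable names ascending and descending are only illustrative.

```python
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# Sort the (key, value) pairs by value, then rebuild a dict from them.
ascending = dict(sorted(x.items(), key=lambda item: item[1]))
descending = dict(sorted(x.items(), key=lambda item: item[1], reverse=True))

print(list(ascending.items()))   # smallest value first
print(list(descending.items()))  # largest value first
```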

回答 1

简单如: sorted(dict1, key=dict1.get)

好吧,实际上可以执行“按字典值排序”。最近,我不得不在Code Golf(堆栈溢出问题Code golf:单词频率图表)中进行此操作。简而言之,问题是这样的:给定文本,计算遇到每个单词的频率,并显示按频率递减排序的最重要单词列表。

如果您以单词为键构建字典,每个单词的出现次数为值,则简化为:

from collections import defaultdict
d = defaultdict(int)
for w in text.split():
    d[w] += 1

那么您可以获取单词列表,sorted(d, key=d.get)并按使用频率排序-使用单词出现的次数作为排序键,对字典键进行排序迭代。

for w in sorted(d, key=d.get, reverse=True):
    print(w, d[w])

我正在写这个详细的说明,以说明人们通常所说的“我可以很容易地按键对字典进行排序,但是如何按值进行排序”的意思-我认为原始帖子试图解决这样的问题。解决方案是根据值对键列表进行排序,如上所示。

As simple as: sorted(dict1, key=dict1.get)

Well, it is actually possible to do a “sort by dictionary values”. Recently I had to do that in a Code Golf (Stack Overflow question Code golf: Word frequency chart). Abridged, the problem was of the kind: given a text, count how often each word is encountered and display a list of the top words, sorted by decreasing frequency.

If you construct a dictionary with the words as keys and the number of occurrences of each word as value, simplified here as:

from collections import defaultdict
d = defaultdict(int)
for w in text.split():
    d[w] += 1

then you can get a list of the words, ordered by frequency of use, with sorted(d, key=d.get): the sort iterates over the dictionary keys, using the number of word occurrences as the sort key.

for w in sorted(d, key=d.get, reverse=True):
    print(w, d[w])

I am writing this detailed explanation to illustrate what people often mean by “I can easily sort a dictionary by key, but how do I sort by value?”, and I think the original post was trying to address such an issue. The solution is to sort the list of keys based on their values, as shown above.
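To make the word-frequency idea runnable end to end, here is a sketch with a made-up sample sentence standing in for text. It assumes Python 3.7+, where words with equal counts keep the order in which they were first seen, because sorted() is stable.

```python
from collections import defaultdict

# Hypothetical sample input; any string would do.
text = "the cat and the dog and the bird"

counts = defaultdict(int)
for word in text.split():
    counts[word] += 1

# Sort the keys (words) by their counts, most frequent first.
by_frequency = sorted(counts, key=counts.get, reverse=True)

for word in by_frequency:
    print(word, counts[word])
```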


回答 2

您可以使用:

sorted(d.items(), key=lambda x: x[1])

这将按照字典中每个条目的值(从最小到最大)对字典进行排序。

要对其进行降序排序,只需添加reverse=True

sorted(d.items(), key=lambda x: x[1], reverse=True)

输入:

d = {'one':1,'three':3,'five':5,'two':2,'four':4}
a = sorted(d.items(), key=lambda x: x[1])    
print(a)

输出:

[('one', 1), ('two', 2), ('three', 3), ('four', 4), ('five', 5)]

You could use:

sorted(d.items(), key=lambda x: x[1])

This will sort the dictionary by the values of each entry within the dictionary from smallest to largest.

To sort it in descending order just add reverse=True:

sorted(d.items(), key=lambda x: x[1], reverse=True)

Input:

d = {'one':1,'three':3,'five':5,'two':2,'four':4}
a = sorted(d.items(), key=lambda x: x[1])    
print(a)

Output:

[('one', 1), ('two', 2), ('three', 3), ('four', 4), ('five', 5)]

回答 3

字典无法排序,但您可以从中建立排序列表。

字典值的排序列表:

sorted(d.values())

(键,值)对的列表,按值排序:

from operator import itemgetter
sorted(d.items(), key=itemgetter(1))

Dicts can’t be sorted, but you can build a sorted list from them.

A sorted list of dict values:

sorted(d.values())

A list of (key, value) pairs, sorted by value:

from operator import itemgetter
sorted(d.items(), key=itemgetter(1))

回答 4

在最近的Python 2.7中,我们有了新的OrderedDict类型,该类型可以记住添加项目的顺序。

>>> d = {"third": 3, "first": 1, "fourth": 4, "second": 2}

>>> for k, v in d.items():
...     print "%s: %s" % (k, v)
...
second: 2
fourth: 4
third: 3
first: 1

>>> d
{'second': 2, 'fourth': 4, 'third': 3, 'first': 1}

要从原始字典中重新排序,请按以下值排序:

>>> from collections import OrderedDict
>>> d_sorted_by_value = OrderedDict(sorted(d.items(), key=lambda x: x[1]))

OrderedDict的行为类似于普通字典:

>>> for k, v in d_sorted_by_value.items():
...     print "%s: %s" % (k, v)
...
first: 1
second: 2
third: 3
fourth: 4

>>> d_sorted_by_value
OrderedDict([('first', 1), ('second', 2), ('third', 3), ('fourth', 4)])

In recent Python 2.7, we have the new OrderedDict type, which remembers the order in which the items were added.

>>> d = {"third": 3, "first": 1, "fourth": 4, "second": 2}

>>> for k, v in d.items():
...     print "%s: %s" % (k, v)
...
second: 2
fourth: 4
third: 3
first: 1

>>> d
{'second': 2, 'fourth': 4, 'third': 3, 'first': 1}

To make a new ordered dictionary from the original, sorting by the values:

>>> from collections import OrderedDict
>>> d_sorted_by_value = OrderedDict(sorted(d.items(), key=lambda x: x[1]))

The OrderedDict behaves like a normal dict:

>>> for k, v in d_sorted_by_value.items():
...     print "%s: %s" % (k, v)
...
first: 1
second: 2
third: 3
fourth: 4

>>> d_sorted_by_value
OrderedDict([('first', 1), ('second', 2), ('third', 3), ('fourth', 4)])
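The session above is Python 2. A Python 3 sketch of the same steps is below; it also illustrates one way in which an OrderedDict differs from a normal dict: comparing two OrderedDicts takes order into account, while comparing an OrderedDict with a plain dict does not. The variable names used for the comparison results are only illustrative.

```python
from collections import OrderedDict

d = {"third": 3, "first": 1, "fourth": 4, "second": 2}
d_sorted_by_value = OrderedDict(sorted(d.items(), key=lambda kv: kv[1]))

# Iterating an OrderedDict yields keys in insertion (here: sorted) order.
keys_in_order = list(d_sorted_by_value)

# Against a plain dict, only the contents matter.
equal_to_plain_dict = (d_sorted_by_value == d)

# Against another OrderedDict, the order matters as well.
equal_to_other_order = (d_sorted_by_value == OrderedDict(d))

print(keys_in_order, equal_to_plain_dict, equal_to_other_order)
```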

回答 5

更新:2015年12月5日使用Python 3.5

尽管我发现接受的答案很有用,但令我感到惊讶的是,它没有被更新为从标准库集合模块中引用OrderedDict作为可行的现代替代方案,旨在解决这类问题。

from operator import itemgetter
from collections import OrderedDict

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = OrderedDict(sorted(x.items(), key=itemgetter(1)))
# OrderedDict([(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)])

官方的OrderedDict文档也提供了一个非常相似的示例,但是对排序函数使用了lambda:

# regular unsorted dictionary
d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

# dictionary sorted by value
OrderedDict(sorted(d.items(), key=lambda t: t[1]))
# OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

UPDATE: 5 DECEMBER 2015 using Python 3.5

Whilst I found the accepted answer useful, I was also surprised that it hasn’t been updated to reference OrderedDict from the standard library collections module as a viable, modern alternative – designed to solve exactly this type of problem.

from operator import itemgetter
from collections import OrderedDict

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = OrderedDict(sorted(x.items(), key=itemgetter(1)))
# OrderedDict([(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)])

The official OrderedDict documentation offers a very similar example too, but using a lambda for the sort function:

# regular unsorted dictionary
d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

# dictionary sorted by value
OrderedDict(sorted(d.items(), key=lambda t: t[1]))
# OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

回答 6

Hank Gay的答案几乎相同:

sorted([(value,key) for (key,value) in mydict.items()])

或根据John Fouhy的建议进行了稍微优化:

sorted((value,key) for (key,value) in mydict.items())

Pretty much the same as Hank Gay’s answer:

sorted([(value,key) for (key,value) in mydict.items()])

Or optimized slightly as suggested by John Fouhy:

sorted((value,key) for (key,value) in mydict.items())

回答 7

使用namedtuple通常很方便。例如,您有一个“名称”作为键,而“分数”作为值的字典,并且您想对“分数”进行排序:

import collections
Player = collections.namedtuple('Player', 'score name')
d = {'John':5, 'Alex':10, 'Richard': 7}

首先以最低分数排序:

worst = sorted(Player(v,k) for (k,v) in d.items())

首先以最高分排序:

best = sorted([Player(v,k) for (k,v) in d.items()], reverse=True)

现在您可以得到Python的第二好玩家(index = 1)的名称和得分,如下所示:

player = best[1]
player.name
    'Richard'
player.score
    7

It can often be very handy to use namedtuple. For example, you have a dictionary of ‘name’ as keys and ‘score’ as values and you want to sort on ‘score’:

import collections
Player = collections.namedtuple('Player', 'score name')
d = {'John':5, 'Alex':10, 'Richard': 7}

sorting with lowest score first:

worst = sorted(Player(v,k) for (k,v) in d.items())

sorting with highest score first:

best = sorted([Player(v,k) for (k,v) in d.items()], reverse=True)

Now you can get the name and score of, say, the second-best player (index=1) very Pythonically, like this:

player = best[1]
player.name
    'Richard'
player.score
    7
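A variation on the same idea, as a sketch: with the fields declared in the more natural (name, score) order, operator.attrgetter can select the sort field explicitly, so the sort no longer depends on the score being the first field of the tuple.

```python
from collections import namedtuple
from operator import attrgetter

Player = namedtuple('Player', 'name score')
d = {'John': 5, 'Alex': 10, 'Richard': 7}

players = [Player(name, score) for name, score in d.items()]

# Highest score first; attrgetter('score') picks the field to sort on.
best = sorted(players, key=attrgetter('score'), reverse=True)

second_best = best[1]
print(second_best.name, second_best.score)
```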

回答 8

Python 3.6开始,将对内置字典进行排序

好消息,因此OP从数据库中检索到的映射对的原始用例(以唯一的字符串ID作为键,将数值作为值)转换为内置Python v3.6 + dict,现在应该遵守插入顺序。

如果说从数据库查询中得到的两个列表表达式如下:

SELECT a_key, a_value FROM a_table ORDER BY a_value;

将存储在两个Python元组k_seq和v_seq中(按数字索引对齐,并且具有相同的长度),然后:

k_seq = ('foo', 'bar', 'baz')
v_seq = (0, 1, 42)
ordered_map = dict(zip(k_seq, v_seq))

允许以后输出为:

for k, v in ordered_map.items():
    print(k, v)

在这种情况下产生(对于新的Python 3.6+内置字典!):

foo 0
bar 1
baz 42

以v的每个值相同的顺序排列。

当前在我的机器上的Python 3.5安装位置生成:

bar 1
foo 0
baz 42

细节:

正如Raymond Hettinger在2012年提出的(请参见python-dev上的邮件,主题为“更紧凑的字典,迭代速度更快”),现在(2016年),Victor Stinner在给主题为“ Python 3.6 dict的 python-dev的邮件”中宣布紧凑并获得私有版本;由于在Python 3.6中已解决/实现了问题27350 “紧凑且有序的字典”,因此关键字变得有序”,我们现在可以使用内置的字典来维护插入顺序!!

希望这将导致第一步的薄层OrderedDict实现。就像@ JimFasarakis-Hilliard指出的那样,将来还会看到一些OrderedDict类型的用例。我认为整个Python社区都会仔细检查,是否经得起时间的考验以及下一步将是什么。

是时候重新考虑一下我们的编码习惯,不要错过以下稳定排序所带来的可能性:

  • 关键字参数和
  • (中级)字典存储

第一个是因为它在某些情况下简化了函数和方法的实现中的分派。

第二个参数鼓励使用dicts作为处理管道中的中间存储。

Raymond Hettinger 从旧金山Python Meetup Group的演讲2016-DEC-08中提供了解释“ Python 3.6词典背后的技术文档。

也许相当一部分Stack Overflow高修饰度的问答页面会收到此信息的变体,并且许多高质量的答案也需要按版本进行更新。

警告购买者(另请参阅下面的2017年12月15日更新):

正如@ajcr正确指出的那样:“此新实现的顺序保留方面被认为是实现细节,因此不应依赖。” (摘自whatsnew36)并不是很挑剔,引文有点悲观了;-)。它继续显示为“(将来可能会改变,但是希望在更改语言规范以强制所有当前和将来的Python实现保留顺序语义之前,先在几个版本中使用该语言的新dict实现;有助于保持与仍旧有效的随机迭代顺序的旧版语言(例如Python 3.5)的向后兼容性。”

因此,就像在某些人类语言(例如德语)中一样,用法决定了语言的使用方式,现在遗嘱已在whatsnew36中声明。

更新2017-12-15:

发给python-dev列表邮件中,Guido van Rossum声明:

做到这一点。裁定“裁定保留插入顺序”。谢谢!

因此,dict插入顺序的3.6 CPython版本的副作用现在已成为语言规范的一部分(并且不再仅仅是实现细节)。collections.OrderedDict正如雷蒙德·海廷格(Raymond Hettinger)在讨论中所提醒的那样,该邮件线程还浮出了一些与众不同的设计目标。

As of Python 3.6 the built-in dict will be ordered

Good news: the OP’s original use case, mapping pairs retrieved from a database (unique string IDs as keys, numeric values as values) into a built-in Python 3.6+ dict, should now respect the insertion order.

Say the two-column result of a database query like:

SELECT a_key, a_value FROM a_table ORDER BY a_value;

is stored in two Python tuples, k_seq and v_seq (aligned by numerical index and of the same length, of course). Then:

k_seq = ('foo', 'bar', 'baz')
v_seq = (0, 1, 42)
ordered_map = dict(zip(k_seq, v_seq))

allows you to output later as:

for k, v in ordered_map.items():
    print(k, v)

yielding in this case (for the new Python 3.6+ built-in dict!):

foo 0
bar 1
baz 42

in the same ordering per value of v.

whereas the Python 3.5 install on my machine currently yields:

bar 1
foo 0
baz 42

Details:

As proposed in 2012 by Raymond Hettinger (cf. the mail on python-dev with the subject “More compact dictionaries with faster iteration”) and announced in 2016 in a mail by Victor Stinner to python-dev with the subject “Python 3.6 dict becomes compact and gets a private version; and keywords become ordered”, thanks to the implementation of issue 27350 “Compact and ordered dict” in Python 3.6, we can now use a built-in dict to maintain insertion order!

Hopefully this will lead to a thin layer OrderedDict implementation as a first step. As @JimFasarakis-Hilliard indicated, some see use cases for the OrderedDict type also in the future. I think the Python community at large will carefully inspect, if this will stand the test of time, and what the next steps will be.

Time to rethink our coding habits to not miss the possibilities opened by stable ordering of:

  • Keyword arguments and
  • (intermediate) dict storage

The first because it eases dispatch in the implementation of functions and methods in some cases.

The second because it makes it easier to use dicts as intermediate storage in processing pipelines.

Raymond Hettinger kindly provided documentation explaining “The Tech Behind Python 3.6 Dictionaries” – from his San Francisco Python Meetup Group presentation 2016-DEC-08.

Quite a few highly voted Stack Overflow question and answer pages will probably receive variants of this information, and many high-quality answers will need a per-version update too.

Caveat Emptor (but also see below update 2017-12-15):

As @ajcr rightfully notes: “The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon.” (from the whatsnew36). Not to nitpick, but the citation was cut a bit pessimistically ;-). It continues: “(this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).”

So as in some human languages (e.g. German), usage shapes the language, and the will now has been declared … in whatsnew36.

Update 2017-12-15:

In a mail to the python-dev list, Guido van Rossum declared:

Make it so. “Dict keeps insertion order” is the ruling. Thanks!

So, the version 3.6 CPython side-effect of dict insertion ordering is now becoming part of the language spec (and not anymore only an implementation detail). That mail thread also surfaced some distinguishing design goals for collections.OrderedDict as reminded by Raymond Hettinger during discussion.
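A short sketch of what “keeps insertion order” means in practice on Python 3.7+: assigning to an existing key leaves its position unchanged, while deleting a key and inserting it again moves it to the end. The dict name scores is only illustrative.

```python
scores = {'foo': 0, 'bar': 1, 'baz': 42}

# Updating an existing key keeps its original position.
scores['foo'] = 100
order_after_update = list(scores)

# Deleting and re-inserting a key moves it to the end.
del scores['bar']
scores['bar'] = 1
order_after_reinsert = list(scores)

print(order_after_update)
print(order_after_reinsert)
```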


回答 9

我有同样的问题,我这样解决了:

WantedOutput = sorted(MyDict, key=lambda x : MyDict[x]) 

(回答“无法对字典进行排序的人没有读过这个问题!实际上,“我可以对键进行排序,但是如何根据值进行排序?”显然意味着他想要一个列表)键根据其值的值排序。)

请注意,顺序定义不正确(具有相同值的键在输出列表中将以任意顺序排列)。

I had the same problem, and I solved it like this:

WantedOutput = sorted(MyDict, key=lambda x : MyDict[x]) 

(People who answer “It is not possible to sort a dict” did not read the question! In fact, “I can sort on the keys, but how can I sort based on the values?” clearly means that he wants a list of the keys sorted according to the value of their values.)

Please notice that the order is not well defined (keys with the same value will be in an arbitrary order in the output list).
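If a fully determined order is wanted for keys that share a value, one option (a sketch, not the only way) is to sort on a (value, key) tuple so that the key itself breaks ties. The sample dict is made up for illustration.

```python
my_dict = {'pear': 2, 'apple': 2, 'fig': 1}

# Sort keys by value first, then alphabetically by key for equal values.
keys_by_value = sorted(my_dict, key=lambda k: (my_dict[k], k))

print(keys_by_value)
```

This requires the keys to be comparable with each other, which holds for strings or numbers but not for arbitrary mixed types.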


回答 10

如果值是数字,则也可以Countercollections中使用。

from collections import Counter

x = {'hello': 1, 'python': 5, 'world': 3}
c = Counter(x)
print(c.most_common())

>> [('python', 5), ('world', 3), ('hello', 1)]    

If values are numeric you may also use Counter from collections.

from collections import Counter

x = {'hello': 1, 'python': 5, 'world': 3}
c = Counter(x)
print(c.most_common())

>> [('python', 5), ('world', 3), ('hello', 1)]    

回答 11

在Python 2.7中,只需执行以下操作:

from collections import OrderedDict
# regular unsorted dictionary
d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

# dictionary sorted by key
OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

# dictionary sorted by value
OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

复制粘贴自:http : //docs.python.org/dev/library/collections.html#ordereddict-examples-and-recipes

请享用 ;-)

In Python 2.7, simply do:

from collections import OrderedDict
# regular unsorted dictionary
d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

# dictionary sorted by key
OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

# dictionary sorted by value
OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

Copy-pasted from: http://docs.python.org/dev/library/collections.html#ordereddict-examples-and-recipes

Enjoy ;-)


回答 12

这是代码:

import operator
origin_list = [
    {"name": "foo", "rank": 0, "rofl": 20000},
    {"name": "Silly", "rank": 15, "rofl": 1000},
    {"name": "Baa", "rank": 300, "rofl": 20},
    {"name": "Zoo", "rank": 10, "rofl": 200},
    {"name": "Penguin", "rank": -1, "rofl": 10000}
]
print ">> Original >>"
for foo in origin_list:
    print foo

print "\n>> Rofl sort >>"
for foo in sorted(origin_list, key=operator.itemgetter("rofl")):
    print foo

print "\n>> Rank sort >>"
for foo in sorted(origin_list, key=operator.itemgetter("rank")):
    print foo

结果如下:

Original

{'name': 'foo', 'rank': 0, 'rofl': 20000}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Baa', 'rank': 300, 'rofl': 20}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Penguin', 'rank': -1, 'rofl': 10000}

Rofl

{'name': 'Baa', 'rank': 300, 'rofl': 20}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Penguin', 'rank': -1, 'rofl': 10000}
{'name': 'foo', 'rank': 0, 'rofl': 20000}

Rank

{'name': 'Penguin', 'rank': -1, 'rofl': 10000}
{'name': 'foo', 'rank': 0, 'rofl': 20000}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Baa', 'rank': 300, 'rofl': 20}

This is the code:

import operator
origin_list = [
    {"name": "foo", "rank": 0, "rofl": 20000},
    {"name": "Silly", "rank": 15, "rofl": 1000},
    {"name": "Baa", "rank": 300, "rofl": 20},
    {"name": "Zoo", "rank": 10, "rofl": 200},
    {"name": "Penguin", "rank": -1, "rofl": 10000}
]
print ">> Original >>"
for foo in origin_list:
    print foo

print "\n>> Rofl sort >>"
for foo in sorted(origin_list, key=operator.itemgetter("rofl")):
    print foo

print "\n>> Rank sort >>"
for foo in sorted(origin_list, key=operator.itemgetter("rank")):
    print foo

Here are the results:

Original

{'name': 'foo', 'rank': 0, 'rofl': 20000}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Baa', 'rank': 300, 'rofl': 20}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Penguin', 'rank': -1, 'rofl': 10000}

Rofl

{'name': 'Baa', 'rank': 300, 'rofl': 20}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Penguin', 'rank': -1, 'rofl': 10000}
{'name': 'foo', 'rank': 0, 'rofl': 20000}

Rank

{'name': 'Penguin', 'rank': -1, 'rofl': 10000}
{'name': 'foo', 'rank': 0, 'rofl': 20000}
{'name': 'Zoo', 'rank': 10, 'rofl': 200}
{'name': 'Silly', 'rank': 15, 'rofl': 1000}
{'name': 'Baa', 'rank': 300, 'rofl': 20}

回答 13

请尝试以下方法。让我们用以下数据定义一个名为mydict的字典:

mydict = {'carl':40,
          'alan':2,
          'bob':1,
          'danny':3}

如果要按键对字典排序,可以执行以下操作:

for key in sorted(mydict.iterkeys()):
    print "%s: %s" % (key, mydict[key])

这应该返回以下输出:

alan: 2
bob: 1
carl: 40
danny: 3

另一方面,如果要按值对字典排序(如问题中所述),则可以执行以下操作:

for key, value in sorted(mydict.iteritems(), key=lambda (k,v): (v,k)):
    print "%s: %s" % (key, value)

该命令的结果(按值对字典进行排序)应返回以下内容:

bob: 1
alan: 2
danny: 3
carl: 40

Try the following approach. Let us define a dictionary called mydict with the following data:

mydict = {'carl':40,
          'alan':2,
          'bob':1,
          'danny':3}

If one wanted to sort the dictionary by keys, one could do something like:

for key in sorted(mydict.iterkeys()):
    print "%s: %s" % (key, mydict[key])

This should return the following output:

alan: 2
bob: 1
carl: 40
danny: 3

On the other hand, if one wanted to sort a dictionary by value (as is asked in the question), one could do the following:

for key, value in sorted(mydict.iteritems(), key=lambda (k,v): (v,k)):
    print "%s: %s" % (key, value)

The result of this command (sorting the dictionary by value) should return the following:

bob: 1
alan: 2
danny: 3
carl: 40
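The snippets above rely on Python 2 only features (iterkeys, iteritems, the print statement and a tuple-unpacking lambda). A rough Python 3 equivalent, as a sketch, indexes into the (key, value) pair instead of unpacking it:

```python
mydict = {'carl': 40, 'alan': 2, 'bob': 1, 'danny': 3}

# Sort by key.
by_key = sorted(mydict.items())

# Sort by value, using the key as a tie-breaker.
by_value = sorted(mydict.items(), key=lambda kv: (kv[1], kv[0]))

for key, value in by_value:
    print("%s: %s" % (key, value))
```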

回答 14

从Python 3.6开始,dict对象现在按插入顺序排序。它正式在Python 3.7的规范中。

>>> words = {"python": 2, "blah": 4, "alice": 3}
>>> dict(sorted(words.items(), key=lambda x: x[1]))
{'python': 2, 'alice': 3, 'blah': 4}

在此之前,您必须使用OrderedDict

Python 3.7文档说:

在版本3.7中更改:保证字典顺序为插入顺序。此行为是3.6版CPython的实现细节。

Starting from Python 3.6, dict objects are now ordered by insertion order. It’s officially in the specs of Python 3.7.

>>> words = {"python": 2, "blah": 4, "alice": 3}
>>> dict(sorted(words.items(), key=lambda x: x[1]))
{'python': 2, 'alice': 3, 'blah': 4}

Before that, you had to use OrderedDict.

Python 3.7 documentation says:

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.


回答 15

您可以创建一个“倒排索引”

from collections import defaultdict
inverse= defaultdict( list )
for k, v in originalDict.items():
    inverse[v].append( k )

现在您的逆数具有值;每个值都有一个适用键的列表。

for k in sorted(inverse):
    print k, inverse[k]

You can also create an “inverted index”:

from collections import defaultdict
inverse= defaultdict( list )
for k, v in originalDict.items():
    inverse[v].append( k )

Now your inverse has the values; each value has a list of applicable keys.

for k in sorted(inverse):
    print k, inverse[k]
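A self-contained Python 3 sketch of the inverted index, with a made-up sample dict in place of originalDict, shows how keys that share a value end up grouped together:

```python
from collections import defaultdict

# Hypothetical input; two keys share the value 2.
original = {'a': 2, 'b': 1, 'c': 2}

inverse = defaultdict(list)
for key, value in original.items():
    inverse[value].append(key)

# Walk the values in sorted order, each with its list of keys.
grouped = [(value, inverse[value]) for value in sorted(inverse)]

for value, keys in grouped:
    print(value, keys)
```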

回答 16

您可以使用collections.Counter。请注意,这对于数字和非数字值均适用。

>>> x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
>>> from collections import Counter
>>> #To sort in reverse order
>>> Counter(x).most_common()
[(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)]
>>> #To sort in ascending order
>>> Counter(x).most_common()[::-1]
[(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]
>>> #To get a dictionary sorted by values
>>> from collections import OrderedDict
>>> OrderedDict(Counter(x).most_common()[::-1])
OrderedDict([(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)])

You can use the collections.Counter. Note, this will work for both numeric and non-numeric values.

>>> x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
>>> from collections import Counter
>>> #To sort in reverse order
>>> Counter(x).most_common()
[(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)]
>>> #To sort in ascending order
>>> Counter(x).most_common()[::-1]
[(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]
>>> #To get a dictionary sorted by values
>>> from collections import OrderedDict
>>> OrderedDict(Counter(x).most_common()[::-1])
OrderedDict([(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)])
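Two small additions, as a sketch: most_common accepts an optional count when only the top entries are needed, and for ascending order a plain sorted() call on the items is an alternative to reversing the most_common() list.

```python
from collections import Counter

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# Only the two entries with the largest values.
top_two = Counter(x).most_common(2)

# Ascending order without building and reversing the full list.
ascending = sorted(x.items(), key=lambda kv: kv[1])

print(top_two)
print(ascending)
```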

回答 17

您可以使用skip dict,这是一个按值永久排序的字典。

>>> data = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
>>> SkipDict(data)
{0: 0.0, 2: 1.0, 1: 2.0, 4: 3.0, 3: 4.0}

如果使用keys()values()或者items()那么你会在排序顺序通过值迭代。

它是使用跳过列表数据结构实现的。

You can use a skip dict which is a dictionary that’s permanently sorted by value.

>>> data = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
>>> SkipDict(data)
{0: 0.0, 2: 1.0, 1: 2.0, 4: 3.0, 3: 4.0}

If you use keys(), values() or items() then you’ll iterate in sorted order by value.

It’s implemented using the skip list data structure.


回答 18

您还可以使用可以传递给键的自定义函数。

def dict_val(x):
    return x[1]
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=dict_val)

You can also use a custom function, passed as the key argument.

def dict_val(x):
    return x[1]
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=dict_val)

回答 19

from django.utils.datastructures import SortedDict

def sortedDictByKey(self,data):
    """Sorted dictionary order by key"""
    sortedDict = SortedDict()
    if data:
        if isinstance(data, dict):
            sortedKey = sorted(data.keys())
            for k in sortedKey:
                sortedDict[k] = data[k]
    return sortedDict

回答 20

正如Dilettant指出的那样,Python 3.6现在将保持顺序!我以为我会分享我编写的一个函数,该函数可以简化可迭代对象(元组,列表,字典)的排序。在后一种情况下,您可以对键或值进行排序,并且可以考虑数字比较。仅适用于> = 3.6!

当您尝试在包含字符串和整数的可迭代对象上使用sorted时,sorted()将失败。当然,您可以使用str()强制进行字符串比较。但是,在某些情况下,您想进行小于的实际数字比较(在字符串比较中不是这种情况)。因此,我提出了以下建议。当您需要显式数值比较时,可以使用该标志,该标志将尝试通过将所有值都转换为浮点数来进行显式数值排序。如果成功,它将进行数字排序,否则将使用字符串比较。1220num_as_num

欢迎提出改进或推送要求的评论。

def sort_iterable(iterable, sort_on=None, reverse=False, num_as_num=False):
    def _sort(i):
      # sort by 0 = keys, 1 values, None for lists and tuples
      try:
        if num_as_num:
          if i is None:
            _sorted = sorted(iterable, key=lambda v: float(v), reverse=reverse)
          else:
            _sorted = dict(sorted(iterable.items(), key=lambda v: float(v[i]), reverse=reverse))
        else:
          raise TypeError
      except (TypeError, ValueError):
        if i is None:
          _sorted = sorted(iterable, key=lambda v: str(v), reverse=reverse)
        else:
          _sorted = dict(sorted(iterable.items(), key=lambda v: str(v[i]), reverse=reverse))

      return _sorted

    if isinstance(iterable, list):
      sorted_list = _sort(None)
      return sorted_list
    elif isinstance(iterable, tuple):
      sorted_list = tuple(_sort(None))
      return sorted_list
    elif isinstance(iterable, dict):
      if sort_on == 'keys':
        sorted_dict = _sort(0)
        return sorted_dict
      elif sort_on == 'values':
        sorted_dict = _sort(1)
        return sorted_dict
      elif sort_on is not None:
        raise ValueError(f"Unexpected value {sort_on} for sort_on. When sorting a dict, use 'keys' or 'values'")
    else:
      raise TypeError(f"Unexpected type {type(iterable)} for iterable. Expected a list, tuple, or dict")

As pointed out by Dilettant, Python 3.6 will now keep the order! I thought I’d share a function I wrote that eases the sorting of an iterable (tuple, list, dict). In the latter case, you can sort either on keys or values, and it can take numeric comparison into account. Only for >= 3.6!

When you try using sorted on an iterable that holds e.g. strings as well as ints, sorted() will fail. Of course you can force string comparison with str(). However, in some cases you want to do actual numeric comparison where 12 is smaller than 20 (which is not the case in string comparison). So I came up with the following. When you want explicit numeric comparison you can use the flag num_as_num which will try to do explicit numeric sorting by trying to convert all values to floats. If that succeeds, it will do numeric sorting, otherwise it’ll resort to string comparison.

Comments for improvement or push requests welcome.

def sort_iterable(iterable, sort_on=None, reverse=False, num_as_num=False):
    def _sort(i):
      # sort by 0 = keys, 1 values, None for lists and tuples
      try:
        if num_as_num:
          if i is None:
            _sorted = sorted(iterable, key=lambda v: float(v), reverse=reverse)
          else:
            _sorted = dict(sorted(iterable.items(), key=lambda v: float(v[i]), reverse=reverse))
        else:
          raise TypeError
      except (TypeError, ValueError):
        if i is None:
          _sorted = sorted(iterable, key=lambda v: str(v), reverse=reverse)
        else:
          _sorted = dict(sorted(iterable.items(), key=lambda v: str(v[i]), reverse=reverse))

      return _sorted

    if isinstance(iterable, list):
      sorted_list = _sort(None)
      return sorted_list
    elif isinstance(iterable, tuple):
      sorted_list = tuple(_sort(None))
      return sorted_list
    elif isinstance(iterable, dict):
      if sort_on == 'keys':
        sorted_dict = _sort(0)
        return sorted_dict
      elif sort_on == 'values':
        sorted_dict = _sort(1)
        return sorted_dict
      elif sort_on is not None:
        raise ValueError(f"Unexpected value {sort_on} for sort_on. When sorting a dict, use 'keys' or 'values'")
    else:
      raise TypeError(f"Unexpected type {type(iterable)} for iterable. Expected a list, tuple, or dict")
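The difference between string and numeric comparison that motivates num_as_num can be seen in isolation with a few lines. This sketch uses key=float directly rather than the function above, and the sample values are made up.

```python
values = ['12', '20', '3']

# String comparison goes character by character, so '3' sorts last.
as_strings = sorted(values)

# Converting to float for the comparison gives numeric order.
as_numbers = sorted(values, key=float)

# The same key also makes mixed str/int/float input sortable.
mixed_sorted = sorted([20, '12', 3.5], key=float)

print(as_strings, as_numbers, mixed_sorted)
```

The key function only affects the comparison; the original values (and their types) are returned unchanged.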

回答 21

这是在d.values()d.keys()上使用zip的解决方案。该链接(在Dictionary视图对象上)下面的几行是:

这允许使用zip()创建(值,键)对:pair = zip(d.values(),d.keys())。

因此,我们可以执行以下操作:

d = {'key1': 874.7, 'key2': 5, 'key3': 8.1}

d_sorted = sorted(zip(d.values(), d.keys()))

print d_sorted 
# prints: [(5, 'key2'), (8.1, 'key3'), (874.7, 'key1')]

Here is a solution using zip on d.values() and d.keys(). The Python documentation on dictionary view objects notes:

This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()).

So we can do the following:

d = {'key1': 874.7, 'key2': 5, 'key3': 8.1}

d_sorted = sorted(zip(d.values(), d.keys()))

print d_sorted 
# prints: [(5, 'key2'), (8.1, 'key3'), (874.7, 'key1')]

Answer 22

Of course, remember that you need to use OrderedDict here, because regular Python dictionaries did not keep insertion order before Python 3.7.

from collections import OrderedDict
a = OrderedDict(sorted(originalDict.items(), key=lambda x: x[1]))

If you do not have Python 2.7 or higher, the best you can do is iterate over the values in a generator function. (There is an OrderedDict for 2.4 and 2.6 here, but (a) I don’t know how well it works, and (b) you have to download and install it, of course. If you do not have administrative access, then I’m afraid that option is out.)


def gen(originalDict):
    for x, y in sorted(zip(originalDict.keys(), originalDict.values()), key=lambda z: z[1]):
        yield (x, y)
    #Yields as a tuple with (key, value). You can iterate with conditional clauses to get what you want. 

for bleh, meh in gen(myDict):
    if bleh == "foo":
        print(myDict[bleh])

You can also print out every value

for bleh, meh in gen(myDict):
    print(bleh, meh)

Please remember to remove the parentheses after print if not using Python 3.0 or above


Answer 23

Use ValueSortedDict from dicts:

from dicts.sorteddict import ValueSortedDict
d = {1: 2, 3: 4, 4:3, 2:1, 0:0}
sorted_dict = ValueSortedDict(d)
print sorted_dict.items() 

[(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]

Answer 24

This works in 3.1.x:

import operator
slovar_sorted=sorted(slovar.items(), key=operator.itemgetter(1), reverse=True)
print(slovar_sorted)

Answer 25

I just learned a relevant technique from Python for Everybody.

You may use a temporary list to help you to sort the dictionary:

#Assume dictionary to be:
d = {'apple': 500.1, 'banana': 1500.2, 'orange': 1.0, 'pineapple': 789.0}

# create a temporary list
tmp = []

# iterate through the dictionary and append each tuple into the temporary list 
for key, value in d.items():
    tmptuple = (value, key)
    tmp.append(tmptuple)

# sort the list in ascending order
tmp = sorted(tmp)

print (tmp)

If you want to sort the list in descending order, simply change the original sorting line to:

tmp = sorted(tmp, reverse=True)

Using a list comprehension, the one-liner would be:

#Assuming the dictionary looks like
d = {'apple': 500.1, 'banana': 1500.2, 'orange': 1.0, 'pineapple': 789.0}
#One liner for sorting in ascending order
print (sorted([(v, k) for k, v in d.items()]))
#One liner for sorting in descending order
print (sorted([(v, k) for k, v in d.items()], reverse=True))

Sample Output:

#Ascending order
[(1.0, 'orange'), (500.1, 'apple'), (789.0, 'pineapple'), (1500.2, 'banana')]
#Descending order
[(1500.2, 'banana'), (789.0, 'pineapple'), (500.1, 'apple'), (1.0, 'orange')]

Answer 26

Iterate through a dict and sort it by its values in descending order:

$ python --version
Python 3.2.2

$ cat sort_dict_by_val_desc.py 
dictionary = dict(siis = 1, sana = 2, joka = 3, tuli = 4, aina = 5)
for word in sorted(dictionary, key=dictionary.get, reverse=True):
  print(word, dictionary[word])

$ python sort_dict_by_val_desc.py 
aina 5
tuli 4
joka 3
sana 2
siis 1

Answer 27

If your values are integers, and you use Python 2.7 or newer, you can use collections.Counter instead of dict. The most_common method will give you all items, sorted by the value.
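
As a rough illustration of that suggestion, the sketch below builds a Counter from a small, made-up mapping and asks for the items ordered by value. The sample data and variable names are assumptions for the example only.

```python
from collections import Counter

# Hypothetical counts; the keys and values are illustrative only.
counts = Counter({'hello': 1, 'python': 5, 'world': 3})

# most_common() returns (key, value) pairs from the highest value to the lowest.
ranked = counts.most_common()
print(ranked)

# most_common(n) keeps only the n highest-valued pairs.
top_two = counts.most_common(2)
print(top_two)

# For ascending order, reverse the full ranking.
ascending = ranked[::-1]
print(ascending)
```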


Answer 28

For the sake of completeness, I am posting a solution using heapq. Note that this method works for both numeric and non-numeric values.

>>> import heapq
>>> import operator
>>> x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
>>> x_items = list(x.items())  # heapify needs a list; in Python 3, dict.items() is a view
>>> heapq.heapify(x_items)
>>> #To sort in reverse order
>>> heapq.nlargest(len(x_items),x_items, operator.itemgetter(1))
[(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)]
>>> #To sort in ascending order
>>> heapq.nsmallest(len(x_items),x_items, operator.itemgetter(1))
[(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]

Answer 29

Because of the requirement to retain backward compatibility with older versions of Python, I think the OrderedDict solution is very unwise. You want something that works with Python 2.7 and older versions.

But the collections solution mentioned in another answer is absolutely superb, because you retain the connection between the key and the value, which is extremely important for dictionaries.

I don’t agree with the number one choice presented in another answer, because it throws away the keys.

I used the solution mentioned above (code shown below) and retained access to both keys and values. In my case the ordering was on the values, but what mattered was the ordering of the keys after ordering the values.

from collections import Counter

x = {'hello':1, 'python':5, 'world':3}
c=Counter(x)
print(c.most_common())

# [('python', 5), ('world', 3), ('hello', 1)]

How to make a flat list out of a list of lists?

Question: How to make a flat list out of a list of lists?

I wonder whether there is a shortcut to make a simple list out of a list of lists in Python.

I can do that in a for loop, but maybe there is some cool “one-liner”? I tried it with reduce(), but I get an error.

Code

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

Answer 0

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

flatten = lambda l: [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists — as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
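
The copy-count argument above can be sketched as a small model. This is not a measurement: the two helper functions are hypothetical and simply count how many element copies each strategy would perform for L sublists of I items, starting from an empty list as sum(l, []) does.

```python
def copies_with_concat(num_sublists, items_each):
    """Count element copies made by repeated `acc = acc + sublist`."""
    copies = 0
    acc_len = 0
    for _ in range(num_sublists):
        acc_len += items_each   # length of the freshly allocated result list
        copies += acc_len       # every element of that new list is copied in
    return copies


def copies_with_comprehension(num_sublists, items_each):
    """Each item is copied into the result list exactly once."""
    return num_sublists * items_each


for L in (10, 100, 1000):
    print(L, copies_with_concat(L, 5), copies_with_comprehension(L, 5))
```

In this model the concatenation count is I * L * (L + 1) / 2, which is about I * L**2 / 2 for large L, while the comprehension count grows linearly as I * L.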


Answer 1

You can use itertools.chain():

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

Or you can use itertools.chain.from_iterable() which doesn’t require unpacking the list with the * operator:

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain.from_iterable(list2d))
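
One practical difference between the two forms is laziness. The sketch below assumes the sublists come from a generator (the sublists function is a made-up stand-in) and shows that chain.from_iterable can be consumed piece by piece.

```python
import itertools


def sublists():
    """Hypothetical generator standing in for rows produced lazily."""
    yield [1, 2, 3]
    yield [4, 5, 6]
    yield [7]
    yield [8, 9]


# from_iterable pulls sublists one at a time; chain(*sublists()) would
# have to exhaust the generator up front to build the argument tuple.
flat = itertools.chain.from_iterable(sublists())

first_four = list(itertools.islice(flat, 4))
print(first_four)

rest = list(flat)
print(rest)
```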

Answer 2

Note from the author: This is inefficient. But fun, because monoids are awesome. It’s not appropriate for production Python code.

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This just sums the elements of the iterable passed as the first argument, treating the second argument as the initial value of the sum (if it is not given, 0 is used instead, which in this case gives you an error).

Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]],[]), which is equal to [1,3,2,4].

Note that this only works on lists of lists. For lists of lists of lists, you’ll need another solution.
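
The role of the start value and the one-level limitation can be checked with a short experiment. This is a sketch with made-up sample data, not production code.

```python
nested = [[1, 3], [2, 4]]

# Without a start value, sum() begins at 0 and tries 0 + [1, 3].
try:
    sum(nested)
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# With [] as the start value, each step is list + list.
flat = sum(nested, [])
print(flat)

# Only one level is removed: deeper nesting is left untouched.
deeper = sum([[1, [2]], [[3], 4]], [])
print(deeper)
```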


Answer 3

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around timeit), and found

functools.reduce(operator.iconcat, a, [])

to be the fastest solution, both when many small lists and few long lists are concatenated. (operator.iadd is equally fast.)


Code to reproduce the plot:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

Answer 4

from functools import reduce  # Python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x instead of returning a useful value (which reduce() expects).

A faster way to do the reduce version would be

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Answer 5

Don’t reinvent the wheel if you use Django:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
[1, 2, 3, 4, 5, 6]

Pandas:

>>> from pandas.core.common import flatten
>>> list(flatten(l))

Itertools:

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

Matplotlib:

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

Unipath:

>>> from unipath.path import flatten
>>> list(flatten(l))

Setuptools:

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

Answer 6

Here is a general approach that applies to numbers, strings, nested lists and mixed containers.

Code

#from typing import Iterable 
from collections.abc import Iterable                        # Python 3.3+


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

Notes:

  • In Python 3, yield from flatten(x) can replace for sub_x in flatten(x): yield sub_x
  • Since Python 3.3, the abstract base classes live in collections.abc; the old aliases in collections were removed in Python 3.10.

Demo

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

Reference

  • This solution is modified from a recipe in Beazley, D. and B. Jones. Recipe 4.14, Python Cookbook 3rd Ed., O’Reilly Media Inc. Sebastopol, CA: 2013.
  • Found an earlier SO post, possibly the original demonstration.

Answer 7

If you want to flatten a data structure where you don’t know how deeply it is nested, you could use iteration_utilities.deepflatten [1]:

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

It’s a generator so you need to cast the result to a list or explicitly iterate over it.


To flatten only one level and if each of the items is itself iterable you can also use iteration_utilities.flatten which itself is just a thin wrapper around itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Just to add some timings (based on Nico Schlömer’s answer, which didn’t include the function presented in this answer):

[Figure: benchmark plot, run time versus number of inner lists]

It’s a log-log plot to accommodate for the huge range of values spanned. For qualitative reasoning: Lower is better.

The results show that if the iterable contains only a few inner iterables then sum will be fastest, however for long iterables only the itertools.chain.from_iterable, iteration_utilities.deepflatten or the nested comprehension have reasonable performance with itertools.chain.from_iterable being the fastest (as already noticed by Nico Schlömer).

from itertools import chain
from functools import reduce
from collections.abc import Iterable  # the alias in collections was removed in Python 3.10
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

[1] Disclaimer: I’m the author of that library.


Answer 8

I take my statement back: sum is not the winner. It is faster when the list is small, but its performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version had been running for more than a minute and still had not finished!

For medium lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

Using small lists, with timeit’s default number=1000000:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

Answer 9

There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you’re thinking functional, it is as easy as this:

>>> import operator
>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see that reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let’s try with a list:

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance?

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it’s no comparison to reduce with concat.

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

Answer 10

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.


Answer 11

Consider installing the more_itertools package.

> pip install more_itertools

It ships with an implementation for flatten (source, from the itertools recipes):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

As of version 2.4, you can flatten more complicated, nested iterables with more_itertools.collapse (source, contributed by abarnet).

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Answer 12

The reason your function didn’t work is that extend modifies the list in place and returns None rather than the list. You can still return x from the lambda, using something like this:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.
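
One caveat with this form is that, without an initializer, the accumulator x starts out as the first sublist, so the input is modified. A possible variation, shown here only as a sketch, passes a fresh empty list as the third argument to reduce.

```python
from functools import reduce

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

# Without an initializer, x starts as l[0], so extend() would grow the
# first sublist of the input in place.
# Passing a fresh [] as the initializer keeps the input untouched.
flat = reduce(lambda x, y: x.extend(y) or x, l, [])

print(flat)
print(l[0])
```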


Answer 13

def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]

Answer 14

Recursive version

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]
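
A recursive version is limited by Python's recursion limit when the nesting is very deep. As an alternative sketch (the function name flatten_iterative is an assumption for this example, not part of the answer), the same traversal can be written with an explicit stack of iterators.

```python
def flatten_iterative(nested):
    """Flatten arbitrarily nested lists without recursion."""
    result = []
    # Stack of iterators: the top one is the list currently being walked.
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()               # finished this level, go back up
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend into the nested list
        else:
            result.append(item)
    return result


x = [1, 2, [3, 4], [5, [6, [7]]], 8, 9, [10]]
print(flatten_iterative(x))

# Build a list nested 5000 levels deep, beyond the default recursion limit.
deep = [0]
for _ in range(5000):
    deep = [deep]
print(flatten_iterative(deep))
```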

Answer 15

matplotlib.cbook.flatten() will work for nested lists even if they nest more deeply than the example.

import matplotlib.cbook
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print(list(matplotlib.cbook.flatten(l2)))

Result:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

This is 18x faster than underscore._.flatten:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

Answer 16

The accepted answer did not work for me when dealing with text-based lists of variable lengths. Here is an alternate approach that did work for me.

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

Accepted answer that did not work:

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

New proposed solution that did work for me:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']
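
The comprehension above is run only for its side effects and throws away the list of None values it builds. A plain loop expresses the same idea more directly. This is a minimal sketch, not the answer's own code: the name flatten_one_level is hypothetical, and unlike the original it does not skip falsy items such as empty strings.

```python
def flatten_one_level(items):
    """Flatten one level of nesting, keeping strings as single items."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)   # splice the sublist's elements in
        else:
            flat.append(item)   # keep non-list items (e.g. strings) whole
    return flat


l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]
print(flatten_one_level(l))
# ['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']
```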

Answer 17

A drawback of Anil’s function above is that it requires the user to always pass an empty list [] as the second argument. This should instead be a default. Because Python evaluates default argument values once, when the function is defined, a mutable default such as [] would be shared across calls; the list should therefore be created inside the function, with None as the default.

Here’s a working function:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

Testing:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]
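
To see why the default is None rather than [], here is a small sketch of what goes wrong with a mutable default. The function name flatten_shared is hypothetical and exists only to show the pitfall.

```python
def flatten_shared(l, a=[]):      # the default list is created once, at definition time
    for i in l:
        if isinstance(i, list):
            flatten_shared(i, a)
        else:
            a.append(i)
    return a


first = flatten_shared([1, [2]])
first_snapshot = list(first)      # copy now, because `first` is the shared default list
second = flatten_shared([3])      # no second argument: the same list is reused

print(first_snapshot)   # [1, 2]
print(second)           # [1, 2, 3]
print(first is second)  # True
```

The second call returns items left over from the first one, which is why list_flatten creates a fresh list whenever a is None.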

Answer 18

The following seems simplest to me:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

Answer 19

One can also use NumPy’s flat:

import numpy as np
list(np.array(l).flat)

Edit 11/02/2016: Only works when sublists have identical dimensions.


Answer 20

You can use NumPy:

import numpy as np
flat_list = list(np.concatenate(list_of_list))


Answer 21

If you are willing to give up some speed (roughly a factor of ten in the timings below) for a cleaner look, then you could use numpy.concatenate().tolist() or numpy.concatenate().ravel().tolist():

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

You can find out more in the docs for numpy.concatenate and numpy.ravel.


Answer 22

Fastest solution I have found (for large lists anyway):

import numpy as np
# turn the list into an array and flatten() it
np.array(l).flatten()

Done! You can of course turn the result back into a list by calling .tolist() on it. Note that this only works when all sublists have the same length.


Answer 23

Simple code for fans of the underscore.py package:

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

It also handles the harder flatten cases (non-list items and complex nesting):

from underscore import _
# 1 is a non-list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

You can install underscore.py with pip

pip install underscore.py

Answer 24

def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])

Answer 25

Note: The code below applies to Python 3.3+ because it uses yield from. six is also a third-party package, though it is stable. Alternatively, you could use sys.version.


In the case of obj = [[1, 2,], [3, 4], [5, 6]], all of the solutions here are good, including list comprehension and itertools.chain.from_iterable.

However, consider this slightly more complex case:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

There are several problems here:

  • One element, 6, is just a scalar; it’s not iterable, so the above routes will fail here.
  • One element, 'abc', is technically iterable (all strs are). However, reading between the lines a bit, you don’t want to treat it as such–you want to treat it as a single element.
  • The final element, [8, [9, 10]] is itself a nested iterable. Basic list comprehension and chain.from_iterable only extract “1 level down.”

You can remedy this as follows:

>>> from collections.abc import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

Here, you check that the sub-element (1) is iterable with Iterable, an ABC from collections.abc, but also want to ensure that (2) the element is not “string-like.”
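
If only Python 3 needs to be supported, the six dependency can be dropped. The sketch below assumes that "string-like" means str or bytes, which is my assumption rather than something stated in the answer.

```python
from collections.abc import Iterable


def flatten(obj):
    """Yield the leaves of an arbitrarily nested iterable, keeping strings whole."""
    for item in obj:
        # str and bytes are iterable, but are treated here as single elements
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item


obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]
print(list(flatten(obj)))
# [1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]
```

Note that other iterables such as tuples and sets are flattened too, and a dict contributes its keys.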


Answer 26

flat_list = []
for i in list_of_list:
    flat_list+=i

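This code also works, since it simply extends the list at each step. It is similar to the nested-loop version but has only one explicit for loop. The inner iteration still happens inside +=, so the total work remains proportional to the number of elements; the code is shorter, not asymptotically cheaper.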
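
As a quick check, the loop with += gives the same result as list.extend and as itertools.chain.from_iterable. The sample data here is my own.

```python
from itertools import chain

list_of_list = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

flat_list = []
for sub in list_of_list:
    flat_list += sub            # for lists, += extends in place like flat_list.extend(sub)

print(flat_list)
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flat_list == list(chain.from_iterable(list_of_list)))
# True
```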


Answer 27

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

The advantage of this solution over most others here is that if you have a list like:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

while most other solutions throw an error this solution handles them.


Answer 28

This may not be the most efficient way, but I thought I would offer a one-liner (actually a two-liner). Both versions work on arbitrarily nested lists and exploit language features (Python 3.5) and recursion.

def make_list_flat (l):
    flist = []
    flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l]
    return flist

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = make_list_flat(a)
print (flist)

The output is

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

This works in a depth-first manner. The recursion goes down until it finds a non-list element, then extends the local variable flist and returns it to the parent. Whenever flist is returned, it is used to extend the parent’s flist in the list comprehension. Therefore, at the root, a flat list is returned.

The version above creates several local lists and returns them so that they can be used to extend the parent’s list. I think a way around this may be to create a global flist, as below.

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = []
def make_list_flat (l):
    flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l]

make_list_flat(a)
print (flist)

The output is again

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Although I am not sure at this time about the efficiency.


Answer 29

Another unusual approach that works for hetero- and homogeneous lists of integers:

from typing import List


def flatten(l: list) -> List[int]:
    """Flatten an arbitrarily deep nested list of lists of integers.

    Examples:
        >>> flatten([1, 2, [1, [10]]])
        [1, 2, 1, 10]

    Args:
        l: an arbitrarily nested list of integers

    Returns:
        Flattened list of integers
    """
    return [int(i.strip('[ ]')) for i in str(l).split(',')]

Understanding slice notation

Question: Understanding slice notation

I need a good explanation (references are a plus) on Python’s slice notation.

To me, this notation needs a bit of picking up.

It looks extremely powerful, but I haven’t quite got my head around it.


Answer 0

It’s pretty simple really:

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array

There is also the step value, which can be used with any of the above:

a[start:stop:step] # start through not past stop, by step

The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
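
A small sketch of that rule, with sample values of my own. It assumes both indices are non-negative, within the length of the list, and that start is not greater than stop.

```python
a = list(range(10))
start, stop = 2, 7

chunk = a[start:stop]
print(chunk)                       # [2, 3, 4, 5, 6]
print(len(chunk) == stop - start)  # True
print(a[stop])                     # 7, the first value that is not in the slice
```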

The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items

Similarly, step may be a negative number:

a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
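
For instance, a sketch with a one-element list (the values are my own):

```python
a = [42]

print(a[:-2])    # []  no error, although there are not two items to drop
print(a[5:10])   # []  out-of-range bounds are clamped to the list

try:
    a[5]         # plain indexing is not forgiving
except IndexError as exc:
    print("indexing raises:", exc)
```

If a silent empty result would hide a bug, check len(a) explicitly before slicing.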

Relation to slice() object

The slicing operator [] is actually being used in the above code with a slice() object using the : notation (which is only valid within []), i.e.:

a[start:stop:step]

is equivalent to:

a[slice(start, stop, step)]

Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported. To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].

While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
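
As a sketch of what programmatic generation can look like: the helper every_nth and the sample row are hypothetical names chosen for illustration.

```python
def every_nth(seq, n, offset=0):
    """Build the slice in code instead of typing seq[offset::n]."""
    return seq[slice(offset, None, n)]


header = slice(0, 3)   # a named, reusable slice
row = ['id', 'name', 'email', 10, 20, 30]

print(row[header])                          # ['id', 'name', 'email']
print(every_nth(list(range(10)), 3))        # [0, 3, 6, 9]
print(every_nth("abcdefgh", 2, offset=1))   # bdfh
```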


Answer 1

The Python tutorial talks about it (scroll down a bit until you get to the part about slicing).

The ASCII art diagram is helpful too for remembering how slices work:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n.
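
Reading the diagram as edges between characters, a few slices of the same word work out as follows. This is only a sketch of the mental model.

```python
word = "Python"

print(word[0:2])    # Py    everything between edge 0 and edge 2
print(word[2:6])    # thon  edge 6 is the right edge of the last character
print(word[-2:])    # on    edge -2 sits just to the left of 'o'
print(word[:3] + word[3:] == word)   # True: cutting at one edge loses nothing
```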


Answer 2

Enumerating the possibilities allowed by the grammar:

>>> seq[:]                # [seq[0],   seq[1],          ..., seq[-1]    ]
>>> seq[low:]             # [seq[low], seq[low+1],      ..., seq[-1]    ]
>>> seq[:high]            # [seq[0],   seq[1],          ..., seq[high-1]]
>>> seq[low:high]         # [seq[low], seq[low+1],      ..., seq[high-1]]
>>> seq[::stride]         # [seq[0],   seq[stride],     ..., seq[-1]    ]
>>> seq[low::stride]      # [seq[low], seq[low+stride], ..., seq[-1]    ]
>>> seq[:high:stride]     # [seq[0],   seq[stride],     ..., seq[high-1]]
>>> seq[low:high:stride]  # [seq[low], seq[low+stride], ..., seq[high-1]]

Of course, if (high-low-1)%stride != 0, then the end point will be a little lower than high-1.
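
A short sketch of how a stride can leave the last selected index short of high-1. The values are my own, and the formula for last_index assumes 0 <= low < high <= len(seq) and a positive stride.

```python
seq = list(range(10))
low, high, stride = 1, 9, 3

picked = seq[low:high:stride]
# largest index of the form low + k*stride that is still below high
last_index = low + ((high - low - 1) // stride) * stride

print(picked)       # [1, 4, 7]
print(last_index)   # 7, which is below high-1 == 8
```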

If stride is negative, the ordering is changed a bit since we’re counting down:

>>> seq[::-stride]        # [seq[-1],   seq[-1-stride],   ..., seq[0]    ]
>>> seq[high::-stride]    # [seq[high], seq[high-stride], ..., seq[0]    ]
>>> seq[:low:-stride]     # [seq[-1],   seq[-1-stride],   ..., seq[low+1]]
>>> seq[high:low:-stride] # [seq[high], seq[high-stride], ..., seq[low+1]]

Extended slicing (with commas and ellipses) is mostly used only by special data structures (like NumPy arrays); the basic sequences don’t support it.

>>> class slicee:
...     def __getitem__(self, item):
...         return repr(item)
...
>>> slicee()[0, 1:2, ::5, ...]
'(0, slice(1, 2, None), slice(None, None, 5), Ellipsis)'

Answer 3

The answers above don’t discuss slice assignment. To understand slice assignment, it’s helpful to add another concept to the ASCII art:

                +---+---+---+---+---+---+
                | P | y | t | h | o | n |
                +---+---+---+---+---+---+
Slice position: 0   1   2   3   4   5   6
Index position:   0   1   2   3   4   5

>>> p = ['P','y','t','h','o','n']
# Why the two sets of numbers:
# indexing gives items, not lists
>>> p[0]
 'P'
>>> p[5]
 'n'

# Slicing gives lists
>>> p[0:1]
 ['P']
>>> p[0:2]
 ['P','y']

One heuristic is, for a slice from zero to n, think: “zero is the beginning, start at the beginning and take n items in a list”.

>>> p[5] # the last of six items, indexed from zero
 'n'
>>> p[0:5] # does NOT include the last item!
 ['P','y','t','h','o']
>>> p[0:6] # not p[0:5]!!!
 ['P','y','t','h','o','n']

Another heuristic is, “for any slice, replace the start by zero, apply the previous heuristic to get the end of the list, then count the first number back up to chop items off the beginning”

>>> p[0:4] # Start at the beginning and count out 4 items
 ['P','y','t','h']
>>> p[1:4] # Take one item off the front
 ['y','t','h']
>>> p[2:4] # Take two items off the front
 ['t','h']
# etc.

The first rule of slice assignment is that since slicing returns a list, slice assignment requires a list (or other iterable):

>>> p[2:3]
 ['t']
>>> p[2:3] = ['T']
>>> p
 ['P','y','T','h','o','n']
>>> p[2:3] = 5  # an int is not iterable (a string such as 't' would be accepted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only assign an iterable

The second rule of slice assignment, which you can also see above, is that whatever portion of the list is returned by slice indexing, that’s the same portion that is changed by slice assignment:

>>> p[2:4]
 ['T','h']
>>> p[2:4] = ['t','r']
>>> p
 ['P','y','t','r','o','n']

The third rule of slice assignment is, the assigned list (iterable) doesn’t have to have the same length; the indexed slice is simply sliced out and replaced en masse by whatever is being assigned:

>>> p = ['P','y','t','h','o','n'] # Start over
>>> p[2:4] = ['s','p','a','m']
>>> p
 ['P','y','s','p','a','m','o','n']

The trickiest part to get used to is assignment to empty slices. Using heuristic 1 and 2 it’s easy to get your head around indexing an empty slice:

>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
 ['P','y','t','h']
>>> p[1:4]
 ['y','t','h']
>>> p[2:4]
 ['t','h']
>>> p[3:4]
 ['h']
>>> p[4:4]
 []

And then once you’ve seen that, slice assignment to the empty slice makes sense too:

>>> p = ['P','y','t','h','o','n']
>>> p[2:4] = ['x','y'] # Assigned list is same length as slice
>>> p
 ['P','y','x','y','o','n'] # Result is same length
>>> p = ['P','y','t','h','o','n']
>>> p[3:4] = ['x','y'] # Assigned list is longer than slice
>>> p
 ['P','y','t','x','y','o','n'] # The result is longer
>>> p = ['P','y','t','h','o','n']
>>> p[4:4] = ['x','y']
>>> p
 ['P','y','t','h','x','y','o','n'] # The result is longer still

Note that, since we are not changing the second number of the slice (4), the inserted items always stack right up against the ‘o’, even when we’re assigning to the empty slice. So the position for the empty slice assignment is the logical extension of the positions for the non-empty slice assignments.

Backing up a little bit, what happens when you keep going with our procession of counting up the slice beginning?

>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
 ['P','y','t','h']
>>> p[1:4]
 ['y','t','h']
>>> p[2:4]
 ['t','h']
>>> p[3:4]
 ['h']
>>> p[4:4]
 []
>>> p[5:4]
 []
>>> p[6:4]
 []

With slicing, once you’re done, you’re done; it doesn’t start slicing backwards. In Python you don’t get negative strides unless you explicitly ask for them by using a negative number.

>>> p[5:3:-1]
 ['n','o']

There are some weird consequences to the “once you’re done, you’re done” rule:

>>> p[4:4]
 []
>>> p[5:4]
 []
>>> p[6:4]
 []
>>> p[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

In fact, compared to indexing, Python slicing is bizarrely error-proof:

>>> p[100:200]
 []
>>> p[int(2e99):int(1e99)]
 []

This can come in handy sometimes, but it can also lead to somewhat strange behavior:

>>> p
 ['P', 'y', 't', 'h', 'o', 'n']
>>> p[int(2e99):int(1e99)] = ['p','o','w','e','r']
>>> p
 ['P', 'y', 't', 'h', 'o', 'n', 'p', 'o', 'w', 'e', 'r']

Depending on your application, that might… or might not… be what you were hoping for there!


Below is the text of my original answer. It has been useful to many people, so I didn’t want to delete it.

>>> r=[1,2,3,4]
>>> r[1:1]
[]
>>> r[1:1]=[9,8]
>>> r
[1, 9, 8, 2, 3, 4]
>>> r[1:1]=['blah']
>>> r
[1, 'blah', 9, 8, 2, 3, 4]

This may also clarify the difference between slicing and indexing.
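
The same rules cover deleting and appending as well. A small sketch with values of my own:

```python
p = list("Python")

p[2:4] = []           # assigning an empty list removes the slice
print(p)              # ['P', 'y', 'o', 'n']

p[1:1] = ['x']        # assigning to an empty slice inserts at that position
print(p)              # ['P', 'x', 'y', 'o', 'n']

p[len(p):] = ['!']    # the empty slice at the end appends
print(p)              # ['P', 'x', 'y', 'o', 'n', '!']
```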


Answer 4

Explain Python’s slice notation

In short, the colons (:) in subscript notation (subscriptable[subscriptarg]) make slice notation – which has the optional arguments, start, stop, step:

sliceable[start:stop:step]

Python slicing is a computationally fast way to methodically access parts of your data. In my opinion, to be even an intermediate Python programmer, it’s one aspect of the language that it is necessary to be familiar with.

Important Definitions

To begin with, let’s define a few terms:

start: the beginning index of the slice, it will include the element at this index unless it is the same as stop, defaults to 0, i.e. the first index. If it’s negative, it means to start n items from the end.

stop: the ending index of the slice, it does not include the element at this index, defaults to length of the sequence being sliced, that is, up to and including the end.

step: the amount by which the index increases, defaults to 1. If it’s negative, you’re slicing over the iterable in reverse.

How Indexing Works

You can make any of these positive or negative numbers. The meaning of the positive numbers is straightforward, but for negative numbers, just like indexes in Python, you count backwards from the end for the start and stop, and for the step, you simply decrement your index. This example is from the documentation’s tutorial, but I’ve modified it slightly to indicate which item in a sequence each index references:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5 
  -6  -5  -4  -3  -2  -1

How Slicing Works

To use slice notation with a sequence that supports it, you must include at least one colon in the square brackets that follow the sequence (the brackets actually call the sequence’s __getitem__ method, according to the Python data model).

Slice notation works like this:

sequence[start:stop:step]

And recall that there are defaults for start, stop, and step, so to access the defaults, simply leave out the argument.

Slice notation to get the last nine elements from a list (or any other sequence that supports it, like a string) would look like this:

my_list[-9:]

When I see this, I read the part in the brackets as “9th from the end, to the end.” (Actually, I abbreviate it mentally as “-9, on”)

Explanation:

The full notation is

my_list[-9:None:None]

and to substitute the defaults (actually when step is negative, stop's default is -len(my_list) - 1, so None for stop really just means it goes to whichever end step takes it to):

my_list[-9:len(my_list):1]

The colon, :, is what tells Python you’re giving it a slice and not a regular index. That’s why the idiomatic way of making a shallow copy of lists in Python 2 is

list_copy = sequence[:]

And clearing them is with:

del my_list[:]

(Python 3 gets a list.copy and list.clear method.)

When step is negative, the defaults for start and stop change

By default, when the step argument is empty (or None), it is assigned to +1.

But you can pass in a negative integer, and the list (or most other standard slicables) will be sliced from the end to the beginning.

Thus a negative slice will change the defaults for start and stop!
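
One way to see the changed defaults without reading the C source is to ask a slice object to resolve them. This is only a sketch using the built-in slice.indices method; the list and variable names are my own:

```python
items = list(range(10))
n = len(items)

# slice.indices(n) resolves omitted (None) arguments for a sequence of length n.
forward = slice(None, None, 1).indices(n)
backward = slice(None, None, -1).indices(n)

print(forward)   # (0, 10, 1)
print(backward)  # (9, -1, -1)

# The resolved -1 stop is an internal "one before index 0" marker. Written
# by hand, -1 means "last item", so this slice is empty:
print(items[9:-1:-1])  # []

# Leaving stop out is how you reach index 0 when stepping backwards:
print(items[9::-1] == items[::-1])  # True
```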

Confirming this in the source

I like to encourage users to read the source as well as the documentation. The source code for slice objects and this logic is found here. First we determine if step is negative:

 step_is_negative = step_sign < 0;

If so, the lower bound is -1, meaning we slice all the way up to and including the beginning, and the upper bound is the length minus 1, meaning we start at the end. (Note that this internal -1 has different semantics from a -1 that users pass as an index in Python, which indicates the last item.)

if (step_is_negative) {
    lower = PyLong_FromLong(-1L);
    if (lower == NULL)
        goto error;

    upper = PyNumber_Add(length, lower);
    if (upper == NULL)
        goto error;
}

Otherwise step is positive, and the lower bound will be zero and the upper bound (which we go up to but not including) the length of the sliced list.

else {
    lower = _PyLong_Zero;
    Py_INCREF(lower);
    upper = length;
    Py_INCREF(upper);
}

Then, we may need to apply the defaults for start and stop – the default then for start is calculated as the upper bound when step is negative:

if (self->start == Py_None) {
    start = step_is_negative ? upper : lower;
    Py_INCREF(start);
}

and stop, the lower bound:

if (self->stop == Py_None) {
    stop = step_is_negative ? lower : upper;
    Py_INCREF(stop);
}

Give your slices a descriptive name!

You may find it useful to separate forming the slice from passing it to the list.__getitem__ method (that’s what the square brackets do). Even if you’re not new to it, it keeps your code more readable so that others that may have to read your code can more readily understand what you’re doing.

However, you can’t just assign some integers separated by colons to a variable. You need to use the slice object:

last_nine_slice = slice(-9, None)

The second argument, None, is required so that the first argument is interpreted as the start argument; otherwise it would be the stop argument.

You can then pass the slice object to your sequence:

>>> list(range(100))[last_nine_slice]
[91, 92, 93, 94, 95, 96, 97, 98, 99]

It’s interesting that ranges also take slices:

>>> range(100)[last_nine_slice]
range(91, 100)
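
To show that one named slice can be reused across different sequence types, here is a small sketch; the name last_three and the sample data are assumptions for illustration only:

```python
# A single named slice object works on any sequence that supports slicing.
last_three = slice(-3, None)

from_list = [10, 20, 30, 40, 50][last_three]
from_str = "abcdef"[last_three]
from_tuple = (1, 2, 3, 4)[last_three]
from_range = range(100)[last_three]

print(from_list)   # [30, 40, 50]
print(from_str)    # def
print(from_tuple)  # (2, 3, 4)
print(from_range)  # range(97, 100)
```

The result type follows the sequence being sliced: a list gives a list, a string gives a string, and a range gives another range.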

Memory Considerations:

Since slices of Python lists create new objects in memory, another important function to be aware of is itertools.islice. Typically you’ll want to iterate over a slice, not just have it created statically in memory. islice is perfect for this. A caveat, it doesn’t support negative arguments to start, stop, or step, so if that’s an issue you may need to calculate indices or reverse the iterable in advance.

import itertools

length = 100
last_nine_iter = itertools.islice(list(range(length)), length-9, None, 1)
list_last_nine = list(last_nine_iter)

and now:

>>> list_last_nine
[91, 92, 93, 94, 95, 96, 97, 98, 99]

The fact that list slices make a copy is a feature of lists themselves. If you’re slicing advanced objects like a Pandas DataFrame, it may return a view on the original, and not a copy.
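
islice is most useful when the source is not a list at all. The sketch below assumes a hypothetical infinite generator named squares; for the "last n items" of an iterator, where negative indices are unavailable, a bounded collections.deque is one possible workaround rather than the only one:

```python
import itertools
from collections import deque

def squares():
    """Hypothetical endless generator used only for this illustration."""
    n = 0
    while True:
        yield n * n
        n += 1

# Take items lazily; no intermediate list of the whole stream is built.
first_five = list(itertools.islice(squares(), 5))
# start=2, stop=10, step=3 selects the items at positions 2, 5 and 8.
window = list(itertools.islice(squares(), 2, 10, 3))

# islice rejects negative arguments, so keep only the tail with a bounded deque.
last_three = list(deque(iter(range(100)), maxlen=3))

print(first_five, window, last_three)
```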


Answer 5

And a couple of things that weren’t immediately obvious to me when I first saw the slicing syntax:

>>> x = [1,2,3,4,5,6]
>>> x[::-1]
[6, 5, 4, 3, 2, 1]

Easy way to reverse sequences!

And if you wanted, for some reason, every second item in the reversed sequence:

>>> x = [1,2,3,4,5,6]
>>> x[::-2]
[6, 4, 2]

Answer 6

In Python 2.7

Slicing in Python

[a:b:c]

len = length of string, tuple or list

c -- default is +1. The sign of c indicates forward or backward, absolute value of c indicates steps. Default is forward with step size 1. Positive means forward, negative means backward.

a --  When c is positive or blank, default is 0. When c is negative, default is -1.

b --  When c is positive or blank, default is len. When c is negative, default is -(len+1).

Understanding index assignment is very important.

In forward direction, starts at 0 and ends at len-1

In backward direction, starts at -1 and ends at -len

When you say [a:b:c], you are saying: depending on the sign of c (forward or backward), start at a and end at b (excluding the element at index b). Use the indexing rule above and remember you will only find elements in this range:

-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1

But this range continues in both directions infinitely:

...,-len -2 ,-len-1,-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1, len, len +1, len+2 , ....

For example:

             0    1    2   3    4   5   6   7   8   9   10   11
             a    s    t   r    i   n   g
    -9  -8  -7   -6   -5  -4   -3  -2  -1

If your choice of a, b, and c overlaps the range above as you traverse using the rules for a, b, and c, you will get a list with the elements touched during traversal; otherwise you will get an empty list.

One last thing: if a and b are equal, you also get an empty list:

>>> l1
[2, 3, 4]

>>> l1[:]
[2, 3, 4]

>>> l1[::-1] # a default is -1 , b default is -(len+1)
[4, 3, 2]

>>> l1[:-4:-1] # a default is -1
[4, 3, 2]

>>> l1[:-3:-1] # a default is -1
[4, 3]

>>> l1[::] # c default is +1, so a default is 0, b default is len
[2, 3, 4]

>>> l1[::-1] # c is -1 , so a default is -1 and b default is -(len+1)
[4, 3, 2]


>>> l1[-100:-200:-1] # Interesting
[]

>>> l1[-1:-200:-1] # Interesting
[4, 3, 2]


>>> l1[-1:-1:1]
[]


>>> l1[-1:5:1] # Interesting
[4]


>>> l1[1:-7:1]
[]

>>> l1[1:-7:-1] # Interesting
[3, 2]

>>> l1[:-2:-2] # a default is -1, stop(b) at -2 , step(c) by 2 in reverse direction
[4]
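
The "Interesting" cases above can be checked by asking each slice for its resolved bounds. This is a sketch of my own using the built-in slice.indices method; the helper name resolve is hypothetical:

```python
l1 = [2, 3, 4]

def resolve(seq, s):
    """Return the concrete (start, stop, step) and the items they select."""
    bounds = s.indices(len(seq))
    return bounds, [seq[i] for i in range(*bounds)]

cases = [
    slice(None, -4, -1),    # l1[:-4:-1]
    slice(-100, -200, -1),  # l1[-100:-200:-1]
    slice(-1, -200, -1),    # l1[-1:-200:-1]
    slice(1, -7, -1),       # l1[1:-7:-1]
    slice(-1, 5, 1),        # l1[-1:5:1]
]
for s in cases:
    bounds, picked = resolve(l1, s)
    assert picked == l1[s]  # same result as ordinary slicing
    print(s, "->", bounds, picked)
```

Out-of-range values are clipped to the ends of the list before traversal starts, which is why none of these raise an error.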

Answer 7

Found this great table at http://wiki.python.org/moin/MovingToPythonFromOtherLanguages

Python indexes and slices for a six-element list.
Indexes enumerate the elements, slices enumerate the spaces between the elements.

Index from rear:    -6  -5  -4  -3  -2  -1      a=[0,1,2,3,4,5]    a[1:]==[1,2,3,4,5]
Index from front:    0   1   2   3   4   5      len(a)==6          a[:5]==[0,1,2,3,4]
                   +---+---+---+---+---+---+    a[0]==0            a[:-2]==[0,1,2,3]
                   | a | b | c | d | e | f |    a[5]==5            a[1:2]==[1]
                   +---+---+---+---+---+---+    a[-1]==5           a[1:-1]==[1,2,3,4]
Slice from front:  :   1   2   3   4   5   :    a[-2]==4
Slice from rear:   :  -5  -4  -3  -2  -1   :
                                                b=a[:]
                                                b==[0,1,2,3,4,5] (shallow copy of a)

Answer 8

After using it a bit I realise that the simplest description is that it is exactly the same as the arguments in a for loop…

(from:to:step)

Any of them are optional:

(:to:step)
(from::step)
(from:to)

Then the negative indexing just needs you to add the length of the string to the negative indices to understand it.

This works for me anyway…
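
A small sketch of that analogy, with made-up sample values: for non-negative arguments a slice selects the same positions that range() would visit in a for loop, and a negative index can be understood by adding the length.

```python
text = "abcdefghij"
start, stop, step = 1, 8, 2

via_slice = text[start:stop:step]
via_loop = "".join(text[i] for i in range(start, stop, step))
print(via_slice, via_loop)  # bdfh bdfh

# Negative indices: add len(text) to get the equivalent positive index.
negative = text[-4:-1]
positive = text[len(text) - 4:len(text) - 1]
print(negative, positive)  # ghi ghi
```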


Answer 9

I find it easier to remember how it works, and then I can figure out any specific start/stop/step combination.

It’s instructive to understand range() first:

def range(start=0, stop, step=1):  # Illegal syntax, but that's the effect
    i = start
    while (i < stop if step > 0 else i > stop):
        yield i
        i += step

Begin from start, increment by step, do not reach stop. Very simple.

The thing to remember about negative step is that stop is always the excluded end, whether it’s higher or lower. If you want the same slice in opposite order, it’s much cleaner to do the reversal separately: e.g. 'abcde'[1:-2][::-1] slices off one char from left, two from right, then reverses. (See also reversed().)

Sequence slicing is the same, except it first normalizes negative indexes, and it can never go outside the sequence:

TODO: The code below had a bug with “never go outside the sequence” when abs(step)>1; I think I patched it to be correct, but it’s hard to understand.

def this_is_how_slicing_works(seq, start=None, stop=None, step=1):
    n = len(seq)
    # bounds that start and stop are clipped to, depending on direction
    lo, hi = (0, n) if step > 0 else (-1, n - 1)
    if start is None:
        start = (lo if step > 0 else hi)
    else:
        if start < 0:
            start += n
        start = max(lo, min(start, hi))  # clip if still outside bounds
    if stop is None:
        stop = (hi if step > 0 else lo)  # lo is really -1 here, not last element
    else:
        if stop < 0:
            stop += n
        stop = max(lo, min(stop, hi))  # clip if still outside bounds
    for i in range(start, stop, step):
        yield seq[i]

Don’t worry about the is None details – just remember that omitting start and/or stop always does the right thing to give you the whole sequence.

Normalizing negative indexes first allows start and/or stop to be counted from the end independently: 'abcde'[1:-2] == 'abcde'[1:3] == 'bc' despite list(range(1, -2)) == []. The normalization is sometimes thought of as “modulo the length”, but note it adds the length just once: e.g. 'abcde'[-53:42] is just the whole string.


Answer 10

I use the “an index points between elements” method of thinking about it myself, but one way of describing it which sometimes helps others get it is this:

mylist[X:Y]

X is the index of the first element you want.
Y is the index of the first element you don’t want.
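
Two handy consequences of this rule, shown as a minimal sketch with made-up values: the slice has Y - X items (when both are in range and X <= Y), and splitting at any index k loses nothing.

```python
mylist = ['a', 'b', 'c', 'd', 'e', 'f']
X, Y = 1, 4

chunk = mylist[X:Y]
print(chunk)                 # ['b', 'c', 'd']
print(len(chunk) == Y - X)   # True

# Because Y is excluded, the two halves of a split never overlap.
k = 2
head, tail = mylist[:k], mylist[k:]
print(head + tail == mylist)  # True
```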


Answer 11

Index:
      ------------>
  0   1   2   3   4
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
 -5  -4  -3  -2  -1
      <------------

Slice:
    <---------------|
|--------------->
:   1   2   3   4   :
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
:  -4  -3  -2  -1   :
|--------------->
    <---------------|

I hope this will help you to model the list in Python.

Reference: http://wiki.python.org/moin/MovingToPythonFromOtherLanguages


Answer 12

Python slicing notation:

a[start:end:step]
  • For start and end, negative values are interpreted as being relative to the end of the sequence.
  • Positive indices for end indicate the position after the last element to be included.
  • Blank values default to start 0, end len(a), and step 1 (note that an explicit end of -0 is just 0, so a[0:-0] is empty).
  • Using a negative step reverses the direction of traversal, and with it the defaults for start and end.

The notation extends to (numpy) matrices and multidimensional arrays. For example, to slice entire columns you can use:

m[::,0:2:] ## slice the first two columns

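For NumPy arrays, a slice is a view that refers to the same underlying data, not a copy; for a plain list, a slice is a new list that still holds references to the same element objects. If you want a fully separate copy, you can use deepcopy() (or the array's copy() method for a NumPy array).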
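
For plain Python lists, the difference between sharing elements and a fully separate copy can be sketched like this; the nested list and the variable names are assumptions for illustration, and the standard-library copy module is used instead of NumPy:

```python
import copy

grid = [[1, 2], [3, 4], [5, 6]]

part = grid[0:2]       # new outer list, but the inner lists are shared
part.append([9, 9])    # changes only the new outer list
part[0][0] = 100       # visible through grid, since the inner list is shared

independent = copy.deepcopy(grid[0:2])  # inner lists are copied as well
independent[0][0] = -1                  # does not affect grid

print(grid)         # [[100, 2], [3, 4], [5, 6]]
print(part)         # [[100, 2], [3, 4], [9, 9]]
print(independent)  # [[-1, 2], [3, 4]]
```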


Answer 13

You can also use slice assignment to remove one or more elements from a list:

>>> r = [1, 'blah', 9, 8, 2, 3, 4]
>>> r[1:4] = []
>>> r
[1, 2, 3, 4]
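
Slice assignment can also insert and replace, and del with a slice is an equivalent way to remove. A short sketch, with variable names of my own choosing:

```python
r = [1, 'blah', 9, 8, 2, 3, 4]
r[1:4] = []                  # remove three items
print(r)                     # [1, 2, 3, 4]

r2 = [1, 'blah', 9, 8, 2, 3, 4]
del r2[1:4]                  # same effect using del
print(r2 == r)               # True

r[1:1] = ['a', 'b']          # an empty slice inserts without removing anything
print(r)                     # [1, 'a', 'b', 2, 3, 4]

# With a step other than 1, the replacement must have exactly as many
# items as the slice selects (here positions 0, 2 and 4).
r[::2] = [0, 0, 0]
print(r)                     # [0, 'a', 0, 2, 0, 4]
```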

Answer 14

This is just for some extra info… Consider the list below

>>> l=[12,23,345,456,67,7,945,467]

A few other tricks for reversing the list:

>>> l[len(l):-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]

>>> l[:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]

>>> l[len(l)::-1]
[467, 945, 7, 67, 456, 345, 23, 12]

>>> l[::-1]
[467, 945, 7, 67, 456, 345, 23, 12]

>>> l[-1:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]

Answer 15

This is how I teach slices to newbies:

Understanding the difference between indexing and slicing:

Wiki Python has this amazing picture which clearly distinguishes indexing and slicing.

Enter image description here

It is a list with six elements in it. To understand slicing better, consider that list as a set of six boxes placed together. Each box has a letter in it.

Indexing is like dealing with the contents of a box. You can check the contents of any box, but you can’t check the contents of multiple boxes at once. You can even replace the contents of the box, but you can’t place two items in one box or replace two items at a time.

In [122]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']

In [123]: alpha
Out[123]: ['a', 'b', 'c', 'd', 'e', 'f']

In [124]: alpha[0]
Out[124]: 'a'

In [127]: alpha[0] = 'A'

In [128]: alpha
Out[128]: ['A', 'b', 'c', 'd', 'e', 'f']

In [129]: alpha[0,1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-129-c7eb16585371> in <module>()
----> 1 alpha[0,1]

TypeError: list indices must be integers, not tuple

Slicing is like dealing with boxes themselves. You can pick up the first box and place it on another table. To pick up the box, all you need to know is the position of beginning and ending of the box.

You can even pick up the first three boxes or the last two boxes or all boxes between 1 and 4. So, you can pick any set of boxes if you know the beginning and ending. These positions are called start and stop positions.

The interesting thing is that you can replace multiple boxes at once. Also you can place multiple boxes wherever you like.

In [130]: alpha[0:1]
Out[130]: ['A']

In [131]: alpha[0:1] = 'a'

In [132]: alpha
Out[132]: ['a', 'b', 'c', 'd', 'e', 'f']

In [133]: alpha[0:2] = ['A', 'B']

In [134]: alpha
Out[134]: ['A', 'B', 'c', 'd', 'e', 'f']

In [135]: alpha[2:2] = ['x', 'xx']

In [136]: alpha
Out[136]: ['A', 'B', 'x', 'xx', 'c', 'd', 'e', 'f']

Slicing With Step:

Till now you have picked boxes contiguously. But sometimes you need to pick them up at intervals. For example, you can pick up every second box. You can even pick up every third box from the end. This value is called the step size. It represents the gap between your successive pickups. The step size should be positive if you are picking boxes from the beginning to the end, and negative if you are picking from the end to the beginning.

In [137]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']

In [142]: alpha[1:5:2]
Out[142]: ['b', 'd']

In [143]: alpha[-1:-5:-2]
Out[143]: ['f', 'd']

In [144]: alpha[1:5:-2]
Out[144]: []

In [145]: alpha[-1:-5:2]
Out[145]: []

How Python Figures Out Missing Parameters:

When slicing, if you leave out any parameter, Python tries to figure it out automatically.

If you check the source code of CPython, you will find a function called PySlice_GetIndicesEx() which figures out indices to a slice for any given parameters. Here is the logical equivalent code in Python.

This function takes a Python object and optional parameters for slicing and returns the start, stop, step, and slice length for the requested slice.

def py_slice_get_indices_ex(obj, start=None, stop=None, step=None):

    length = len(obj)

    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")

    if start is None:
        start = 0 if step > 0 else length - 1
    else:
        if start < 0:
            start += length
        if start < 0:
            start = 0 if step > 0 else -1
        if start >= length:
            start = length if step > 0 else length - 1

    if stop is None:
        stop = length if step > 0 else -1
    else:
        if stop < 0:
            stop += length
        if stop < 0:
            stop = 0 if step > 0 else -1
        if stop >= length:
            stop = length if step > 0 else length - 1

    if (step < 0 and stop >= start) or (step > 0 and start >= stop):
        slice_length = 0
    elif step < 0:
        slice_length = (stop - start + 1) // step + 1  # integer division, as in C
    else:
        slice_length = (stop - start - 1) // step + 1

    return (start, stop, step, slice_length)

This is the intelligence that is present behind slices. Since Python has a built-in type called slice, you can pass it some parameters and check how smartly it calculates the missing ones.

In [21]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']

In [22]: s = slice(None, None, None)

In [23]: s
Out[23]: slice(None, None, None)

In [24]: s.indices(len(alpha))
Out[24]: (0, 6, 1)

In [25]: list(range(*s.indices(len(alpha))))
Out[25]: [0, 1, 2, 3, 4, 5]

In [26]: s = slice(None, None, -1)

In [27]: list(range(*s.indices(len(alpha))))
Out[27]: [5, 4, 3, 2, 1, 0]

In [28]: s = slice(None, 3, -1)

In [29]: list(range(*s.indices(len(alpha))))
Out[29]: [5, 4]
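
The slice length that the function above also returns can be recovered from slice.indices as well. The helper below is a sketch of my own (the name slice_length is hypothetical) and relies on Python 3, where range objects support len():

```python
def slice_length(s, n):
    """Number of items slice s would select from a sequence of length n."""
    return len(range(*s.indices(n)))

alpha = ['a', 'b', 'c', 'd', 'e', 'f']

for s in (slice(None), slice(None, None, -1), slice(None, 3, -1), slice(1, 5, 2)):
    # The computed length always agrees with actually taking the slice.
    assert slice_length(s, len(alpha)) == len(alpha[s])
    print(s, s.indices(len(alpha)), slice_length(s, len(alpha)))
```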

Note: This post was originally written in my blog, The Intelligence Behind Python Slices.


Answer 16

As a general rule, writing code with a lot of hardcoded index values leads to a readability and maintenance mess. For example, if you come back to the code a year later, you’ll look at it and wonder what you were thinking when you wrote it. The solution shown is simply a way of more clearly stating what your code is actually doing. In general, the built-in slice() creates a slice object that can be used anywhere a slice is allowed. For example:

>>> items = [0, 1, 2, 3, 4, 5, 6]
>>> a = slice(2, 4)
>>> items[2:4]
[2, 3]
>>> items[a]
[2, 3]
>>> items[a] = [10,11]
>>> items
[0, 1, 10, 11, 4, 5, 6]
>>> del items[a]
>>> items
[0, 1, 4, 5, 6]

If you have a slice instance s, you can get more information about it by looking at its s.start, s.stop, and s.step attributes, respectively. For example:

>>> a = slice(10, 50, 2)
>>> a.start
10
>>> a.stop
50
>>> a.step
2
>>>
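
As an illustration of replacing hardcoded indices with named slices, here is a sketch that pulls fields out of a fixed-width record. The record layout and the field names are invented for this example:

```python
# Hypothetical fixed-width layout: date (10), space, vendor (10), qty (4), space, price.
record = "{:<10} {:<10}{:>4} {}".format("2024-05-17", "ACME", "0042", "19.99")

DATE = slice(0, 10)
VENDOR = slice(11, 21)
QTY = slice(21, 25)
PRICE = slice(26, None)

date = record[DATE]
vendor = record[VENDOR].strip()   # drop the padding spaces
quantity = int(record[QTY])
price_text = record[PRICE]

print(date, vendor, quantity, price_text)
```

If the layout changes, only the slice definitions at the top need to be updated.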

Answer 17

1. Slice Notation

To make it simple, remember slice has only one form:

s[start:end:step]

and here is how it works:

  • s: an object that can be sliced
  • start: first index to start iteration
  • end: last index, NOTE that end index will not be included in the resulted slice
  • step: pick element every step index

Another important thing: start, end, and step can all be omitted! If they are omitted, their default values are used: 0, len(s), and 1 respectively.

So possible variations are:

# Mostly used variations
s[start:end]
s[start:]
s[:end]

# Step-related variations
s[:end:step]
s[start::step]
s[::step]

# Make a copy
s[:]

NOTE: If start >= end (considering only when step>0), Python will return an empty slice [].

2. Pitfalls

The part above explains the core features of how slicing works, and it will work on most occasions. However, there are pitfalls you should watch out for, and this part explains them.

Negative indexes

The very first thing that confuses Python learners is that an index can be negative! Don’t panic: a negative index means count backwards.

For example:

s[-5:]    # Start at the 5th index from the end of array,
          # thus returning the last 5 elements.
s[:-5]    # Start at index 0, and end until the 5th index from end of array,
          # thus returning s[0:len(s)-5].

Negative step

Making things more confusing is that step can be negative too!

A negative step means iterating over the array backwards: from the end towards the start, with the start index included and the end index excluded from the result, just as with a positive step.

NOTE: when step is negative, the default value for start is len(s) - 1, the last index (while the default for end is not 0 but the position just before index 0, because s[::-1] contains s[0]). For example:

s[::-1]            # Reversed slice
s[len(s)::-1]      # The same as above, reversed slice
s[0:len(s):-1]     # Empty list

Out of range error?

Be surprised: slice does not raise an IndexError when the index is out of range!

If the index is out of range, Python will try its best to set the index to 0 or len(s) according to the situation. For example:

s[:len(s)+5]      # The same as s[:len(s)]
s[-len(s)-5::]    # The same as s[0:]
s[len(s)+5::-1]   # The same as s[len(s)::-1], and the same as s[::-1]
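
The contrast between slicing and plain indexing can be sketched as follows; the variable names are my own:

```python
s = list(range(10))

clamped = s[3:15]     # end is clipped to len(s)
beyond = s[20:30]     # both bounds are past the end, so the result is empty
reversed_all = s[len(s) + 5::-1]  # start is clipped to the last index

try:
    s[11]             # plain indexing is not clipped
    raised = False
except IndexError:
    raised = True

print(clamped, beyond, reversed_all == s[::-1], raised)
```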

3. Examples

Let’s finish this answer with examples, explaining everything we have discussed:

# Create our array for demonstration
In [1]: s = [i for i in range(10)]

In [2]: s
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [3]: s[2:]   # From index 2 to last index
Out[3]: [2, 3, 4, 5, 6, 7, 8, 9]

In [4]: s[:8]   # From index 0 up to index 8
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7]

In [5]: s[4:7]  # From index 4 (included) up to index 7(excluded)
Out[5]: [4, 5, 6]

In [6]: s[:-2]  # Up to second last index (negative index)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7]

In [7]: s[-2:]  # From second last index (negative index)
Out[7]: [8, 9]

In [8]: s[::-1] # From last to first in reverse order (negative step)
Out[8]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [9]: s[::-2] # All odd numbers in reversed order
Out[9]: [9, 7, 5, 3, 1]

In [11]: s[-2::-2] # All even numbers in reversed order
Out[11]: [8, 6, 4, 2, 0]

In [12]: s[3:15]   # End is out of range, and Python will set it to len(s).
Out[12]: [3, 4, 5, 6, 7, 8, 9]

In [14]: s[5:1]    # Start > end; return empty list
Out[14]: []

In [15]: s[11]     # Access index 11 (greater than len(s)) will raise an IndexError
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-15-79ffc22473a3> in <module>()
----> 1 s[11]

IndexError: list index out of range

Answer 18

The previous answers don’t discuss multi-dimensional array slicing which is possible using the famous NumPy package:

Slicing can also be applied to multi-dimensional arrays.

# Here, a is a NumPy array

>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> a[:2, 0:3:2]
array([[1, 3],
       [5, 7]])

The “:2” before the comma operates on the first dimension and the “0:3:2” after the comma operates on the second dimension.


Answer 19

#!/usr/bin/env python3

def fmt(index):
    # Render one slice field: blank when omitted (None), else 2 columns wide
    return '  ' if index is None else '{:2d}'.format(index)


def slicegraphical(s, lista):

    if len(s) > 9:
        print("Enter a string of at most 9 characters, "
              "so the output lines up nicely")
        return

    # Draw the string in boxes, with positive and negative indexes underneath
    border = '  ' + '+---' * len(s) + '+'
    print(border)
    print('  ' + ''.join('| {} '.format(letter) for letter in s) + '|')
    print(border)
    print('  ' + ''.join('{:<4d}'.format(i) for i in range(len(s) + 1)))
    print(' ' + ''.join('{:<4d}'.format(i) for i in range(-len(s), 0)))
    print()

    for triada in lista:
        if len(triada) == 1:
            # A single index, e.g. s[4]
            label = '[{} ]'.format(fmt(triada[0]))
            result = s[triada[0]]
        else:
            # Two or three fields; slice() accepts None for an omitted field
            label = '[' + ' :'.join(fmt(i) for i in triada) + ' ]'
            result = s[slice(*triada)]
        print('{}{:<13} =  {}'.format(s, label, result))


if __name__ == '__main__':
    # Change "s" to whatever string you like; make it 9 characters for
    # better representation.
    s = 'COMPUTERS'

    # Add different lists to this list to experiment with indexes.
    # To represent e.g. s[::], use [None, None, None];
    # for s[2:], use [2, None].

    lista = [[4,7],[2,5,2],[-5,1,-1],[4],[-4,-6,-1], [2,-3,1],[2,-3,-1], [None,None,-1],[-5,None],[-5,0,-1],[-5,None,-1],[-1,1,-2]]

    slicegraphical(s, lista)

You can run this script and experiment with it. Below are some samples that I got from the script.

  +---+---+---+---+---+---+---+---+---+
  | C | O | M | P | U | T | E | R | S |
  +---+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7   8   9   
 -9  -8  -7  -6  -5  -4  -3  -2  -1 

COMPUTERS[ 4 : 7 ]     =  UTE
COMPUTERS[ 2 : 5 : 2 ] =  MU
COMPUTERS[-5 : 1 :-1 ] =  UPM
COMPUTERS[ 4 ]         =  U
COMPUTERS[-4 :-6 :-1 ] =  TU
COMPUTERS[ 2 :-3 : 1 ] =  MPUT
COMPUTERS[ 2 :-3 :-1 ] =  
COMPUTERS[   :   :-1 ] =  SRETUPMOC
COMPUTERS[-5 :   ]     =  UTERS
COMPUTERS[-5 : 0 :-1 ] =  UPMO
COMPUTERS[-5 :   :-1 ] =  UPMOC
COMPUTERS[-1 : 1 :-2 ] =  SEUM

When using a negative step, notice that the answer is shifted to the right by 1.


Answer 20

My brain seems happy to accept that lst[start:end] contains the start-th item. I might even say that it is a ‘natural assumption’.

But occasionally a doubt creeps in and my brain asks for reassurance that it does not contain the end-th element.

In these moments I rely on this simple theorem:

for any n,    lst = lst[:n] + lst[n:]

This pretty property tells me that lst[start:end] does not contain the end-th item because it is in lst[end:].

Note that this theorem is true for any n at all. For example, you can check that

lst = list(range(10))  # in Python 3, range objects do not support +
lst[:-42] + lst[-42:] == lst

returns True.
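
To check the theorem for more than one value of n, a small sketch along these lines can be used (the range of n values is an arbitrary choice that deliberately runs far past both ends of the list):

```python
lst = list(range(10))

# n runs far past both ends of the list on purpose
checks = [lst[:n] + lst[n:] == lst for n in range(-42, 43)]
print(all(checks))
```

The property holds for out-of-range n as well because slice bounds are clamped: one half becomes empty and the other becomes the whole list.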


Answer 21

In my opinion, you will understand and memorize better the Python string slicing notation if you look at it the following way (read on).

Let’s work with the following string …

azString = "abcdefghijklmnopqrstuvwxyz"

For those who don’t know, you can create any substring from azString using the notation azString[x:y]

Coming from other programming languages, that’s when the common sense gets compromised. What are x and y?

I had to sit down and run several scenarios in my quest for a memorization technique that will help me remember what x and y are and help me slice strings properly at the first attempt.

My conclusion is that x and y should be seen as the boundary indexes that surround the string we want to extract. So we should see the expression as azString[index1:index2] or, even more clearly, as azString[index_of_first_character:index_after_the_last_character].

Here is an example visualization of that …

Letters   a b c d e f g h i j ...
         ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
             ┊           ┊
Indexes  0 1 2 3 4 5 6 7 8 9 ...
             ┊           ┊
cdefgh    index1       index2

So all you have to do is set index1 and index2 to the values that surround the desired substring. For instance, to get the substring “cdefgh”, you can use azString[2:8], because the index on the left side of “c” is 2 and the one on the right side of “h” is 8.

Remember that we are setting the boundaries. And those boundaries are the positions where you could place some brackets that will be wrapped around the substring like this …

a b [ c d e f g h ] i j

That trick works all the time and is easy to memorize.
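
The boundary idea can be tried out with a short sketch. The two variable names simply mirror the wording above and are not part of any API:

```python
azString = "abcdefghijklmnopqrstuvwxyz"

index_of_first_character = azString.index("c")            # boundary left of "c"
index_after_the_last_character = azString.index("h") + 1  # boundary right of "h"

substring = azString[index_of_first_character:index_after_the_last_character]
print(substring)

# The slice length is simply the distance between the two boundaries
assert len(substring) == index_after_the_last_character - index_of_first_character
```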


Answer 22

Most of the previous answers clear up questions about slice notation.

The extended indexing syntax used for slicing is aList[start:stop:step].

More slicing examples: 15 Extended Slices


Answer 23

In Python, the most basic form for slicing is the following:

l[start:end]

where l is some collection, start is an inclusive index, and end is an exclusive index.

In [1]: l = list(range(10))

In [2]: l[:5] # First five elements
Out[2]: [0, 1, 2, 3, 4]

In [3]: l[-5:] # Last five elements
Out[3]: [5, 6, 7, 8, 9]

When slicing from the start, you can omit the zero index, and when slicing to the end, you can omit the final index since it is redundant, so do not be verbose:

In [5]: l[:3] == l[0:3]
Out[5]: True

In [6]: l[7:] == l[7:len(l)]
Out[6]: True

Negative integers are useful when doing offsets relative to the end of a collection:

In [7]: l[:-1] # Include all elements but the last one
Out[7]: [0, 1, 2, 3, 4, 5, 6, 7, 8]

In [8]: l[-3:] # Take the last three elements
Out[8]: [7, 8, 9]

It is possible to provide indices that are out of bounds when slicing such as:

In [9]: l[:20] # 20 is out of index bounds, and l[20] will raise an IndexError exception
Out[9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [11]: l[-20:] # -20 is out of index bounds, and l[-20] will raise an IndexError exception
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the result of slicing a collection is a whole new collection. In addition, when using slice notation in assignments, the assigned sequence does not need to have the same length as the slice it replaces. The values before and after the assigned slice will be kept, and the collection will shrink or grow to contain the new values:

In [16]: l[2:6] = list('abc') # Assigning fewer elements than the ones contained in the sliced collection l[2:6]

In [17]: l
Out[17]: [0, 1, 'a', 'b', 'c', 6, 7, 8, 9]

In [18]: l[2:5] = list('hello') # Assigning more elements than the ones contained in the sliced collection l [2:5]

In [19]: l
Out[19]: [0, 1, 'h', 'e', 'l', 'l', 'o', 6, 7, 8, 9]

If you omit the start and end index, you will make a copy of the collection:

In [14]: l_copy = l[:]

In [15]: l == l_copy and l is not l_copy
Out[15]: True

If the start and end indexes are omitted when performing an assignment operation, the entire content of the collection will be replaced with a copy of what is referenced:

In [20]: l[:] = list('hello...')

In [21]: l
Out[21]: ['h', 'e', 'l', 'l', 'o', '.', '.', '.']

Besides basic slicing, it is also possible to apply the following notation:

l[start:end:step]

where l is a collection, start is an inclusive index, end is an exclusive index, and step is a stride that can be used to take every nth item in l.

In [22]: l = list(range(10))

In [23]: l[::2] # Take the elements whose indexes are even
Out[23]: [0, 2, 4, 6, 8]

In [24]: l[1::2] # Take the elements whose indexes are odd
Out[24]: [1, 3, 5, 7, 9]

Using step provides a useful trick to reverse a collection in Python:

In [25]: l[::-1]
Out[25]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

It is also possible to use other negative integers for step, as in the following example:

In[28]:  l[::-2]
Out[28]: [9, 7, 5, 3, 1]

However, using a negative value for step could become very confusing. Moreover, in order to be Pythonic, you should avoid using start, end, and step in a single slice. In case this is required, consider doing this in two assignments (one to slice, and the other to stride).

In [29]: l = l[::2] # This step is for striding

In [30]: l
Out[30]: [0, 2, 4, 6, 8]

In [31]: l = l[1:-1] # This step is for slicing

In [32]: l
Out[32]: [2, 4, 6]

Answer 24

I want to add one Hello, World! example that explains the basics of slices for the very beginners. It helped me a lot.

Let’s have a list with six values ['P', 'Y', 'T', 'H', 'O', 'N']:

+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
  0   1   2   3   4   5

Now the simplest slices of that list are its sublists. The notation is [<index>:<index>] and the key is to read it like this:

[ start cutting before this index : end cutting before this index ]

Now if you make a slice [2:5] of the list above, this will happen:

        |           |
+---+---|---+---+---|---+
| P | Y | T | H | O | N |
+---+---|---+---+---|---+
  0   1 | 2   3   4 | 5

You made a cut before the element with index 2 and another cut before the element with index 5. So the result will be a slice between those two cuts, a list ['T', 'H', 'O'].


Answer 25

I personally think about it like a for loop:

a[start:end:step]
# for(i = start; i < end; i += step)

Also, note that negative values for start and end are relative to the end of the list and computed in the example above by given_index + a.shape[0].
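
The for-loop analogy can be written out as a small Python sketch. The helper name slice_like_loop is hypothetical, and the sketch assumes start and end are already in-range, non-negative indexes; the only extra detail is that the comparison flips when step is negative:

```python
def slice_like_loop(a, start, end, step):
    """Mimic a[start:end:step] with an explicit loop.

    Assumes start and end are in-range, non-negative indexes.
    """
    result = []
    i = start
    # for(i = start; i < end; i += step), with the test flipped for step < 0
    while (i < end) if step > 0 else (i > end):
        result.append(a[i])
        i += step
    return result


a = list(range(10))
assert slice_like_loop(a, 2, 8, 3) == a[2:8:3]
assert slice_like_loop(a, 8, 2, -3) == a[8:2:-3]
```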


Answer 26

Below is an example of how a string is indexed:

 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1

s = "Name string"

Slicing example: [start:end:step]

s[start:end] # Items start through end-1
s[start:]    # Items start through the rest of the string
s[:end]      # Items from the beginning through end-1
s[:]         # A copy of the whole string

Below is the example usage:

print(s[0])       # N
print(s[0:2])     # Na
print(s[0:7])     # Name st
print(s[0:7:2])   # Nm t
print(s[0:-1:2])  # Nm ti

Answer 27

If you feel negative indices in slicing are confusing, here’s a very easy way to think about it: just replace the negative index with len + index (that is, len minus its absolute value). So for example, replace -3 with len(list) - 3.

The best way to illustrate what slicing does internally is just show it in code that implements this operation:

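def simple_slice(seq, start=None, end=None, step=1):
  # Simplified model: handles positive steps only.
  # (Renamed so it does not shadow the built-in slice and list.)

  # Take care of missing start/end parameters
  start = 0 if start is None else start
  end = len(seq) if end is None else end

  # Take care of negative start/end parameters
  start = len(seq) + start if start < 0 else start
  end = len(seq) + end if end < 0 else end

  # Clamp out-of-range values the way real slicing does
  start = min(max(start, 0), len(seq))
  end = min(max(end, 0), len(seq))

  # Now just execute a for-loop with start, end and step
  return [seq[i] for i in range(start, end, step)]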
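
For a version that also covers negative steps and out-of-range values, one option is to let Python resolve the indexes itself. The sketch below uses the built-in slice.indices method; the function name emulate_slice is hypothetical:

```python
def emulate_slice(seq, start=None, end=None, step=None):
    # slice.indices(length) resolves defaults, negative values and
    # out-of-range values into concrete (start, stop, step) integers
    indices = range(*slice(start, end, step).indices(len(seq)))
    return [seq[i] for i in indices]


data = list(range(10))
assert emulate_slice(data, -3) == data[-3:]
assert emulate_slice(data, None, None, -1) == data[::-1]
assert emulate_slice(data, 15, None, -2) == data[15::-2]
assert emulate_slice(data, 2, 100) == data[2:100]
```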

Answer 28

The basic slicing technique is to define the starting point, the stopping point, and the step size – also known as stride.

First, we will create a list of values to use in our slicing.

Create two lists to slice. The first is a numeric list from 1 to 9 (List A). The second is also a numeric list, from 0 to 8 (List B):

A = list(range(1, 10, 1)) # Start, stop, and step
B = list(range(9))

print("This is List A:", A)
print("This is List B:", B)

Index the number 3 from A and the number 6 from B.

print(A[2])
print(B[6])

Basic Slicing

Extended indexing syntax used for slicing is aList[start:stop:step]. In the corresponding slice(start, stop, step) object, the start and step arguments both default to None; the only required argument is stop. Did you notice this is similar to how range was used to define lists A and B? This is because the slice object represents the set of indices specified by range(start, stop, step) (Python 3.4 documentation).

As you can see, indexing with a single number and no colon, as in A[2] above, returns one element.

It is important to note, the first element is index 0, not index 1. This is why we are using 2 lists for this exercise. List A’s elements are numbered according to the ordinal position (the first element is 1, the second element is 2, etc.) while List B’s elements are the numbers that would be used to index them ([0] for the first element 0, etc.).

With extended indexing syntax, we retrieve a range of values. For example, all values are retrieved with a colon.

A[:]

To retrieve a subset of elements, the start and stop positions need to be defined.

Given the pattern aList[start:stop], retrieve the first two elements from List A.
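
A minimal sketch of that last exercise, reusing List A as defined above:

```python
A = list(range(1, 10, 1))

first_two = A[0:2]   # start at index 0, stop before index 2
print(first_two)

# Omitting start is equivalent, since it defaults to the beginning
assert first_two == A[:2]
```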


Answer 29

I don’t think the Python tutorial diagram (cited in various other answers) is a good one, because the picture it suggests works for a positive stride but not for a negative stride.

This is the diagram:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

From the diagram, I expect a[-4:-6:-1] to be yP but it is ty.

>>> a = "Python"
>>> a[2:4:1] # as expected
'th'
>>> a[-4:-6:-1] # off by 1
'ty'

What always works is to think in characters or slots and use indexing as a half-open interval: right-open for a positive stride, left-open for a negative stride.

This way, I can think of a[-4:-6:-1] as a(-6,-4] in interval terminology.

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5  
  -6  -5  -4  -3  -2  -1

 +---+---+---+---+---+---+---+---+---+---+---+---+
 | P | y | t | h | o | n | P | y | t | h | o | n |
 +---+---+---+---+---+---+---+---+---+---+---+---+
  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5  
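
The half-open interval reading can be checked with a short sketch. The helper chars_in_interval is hypothetical and simply collects the characters at a closed range of indexes:

```python
a = "Python"

def chars_in_interval(text, first, last):
    """Hypothetical helper: characters at indexes first..last, both inclusive."""
    return [text[i] for i in range(first, last + 1)]

# Positive stride: a[2:4] is the right-open interval [2, 4), i.e. indexes 2..3
assert list(a[2:4:1]) == chars_in_interval(a, 2, 3)

# Negative stride: a[-4:-6:-1] is the left-open interval (-6, -4],
# i.e. indexes -5..-4, visited from right to left
assert list(a[-4:-6:-1]) == chars_in_interval(a, -5, -4)[::-1]
```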

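What does the yield keyword do in Python? A detailed answer

What does the yield keyword do? To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

1. What is an iterable?

When you create a list, you can read its items one by one. Reading its items one by one is called iteration: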

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

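Everything you can use "for... in..." on is an iterable, such as lists and strings.

These iterables are handy because you can read them as often as you wish, but you store all the values in memory, and this is not always what you want when you have a lot of values.

2. What is a generator?

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly: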

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

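A generator expression is written with () instead of []. However, you cannot run for i in mygenerator a second time, because generators can only be used once: they calculate 0 (0*0), then forget about it and calculate 1 (1*1), and finally calculate 4 (2*2), one value at a time.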
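
A minimal sketch of the "only once" behavior, consuming the same generator expression twice:

```python
mygenerator = (x * x for x in range(3))

first_pass = list(mygenerator)   # consumes the generator
second_pass = list(mygenerator)  # nothing left to produce

print(first_pass)
print(second_pass)
```

The second pass is empty because the generator was exhausted by the first one; to iterate again, a new generator has to be created.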

3. Now the key point: what is yield? What does the yield keyword do?

yield is a keyword that is used like return, except that using yield makes the function return a generator.

>>> def create_generator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

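This is a useless example, but it is handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object. Then your code continues from where it left off each time the for loop uses the generator.

Now the hard part:

The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then it returns the first value of the loop. Each subsequent call runs another iteration of the loop you have written in the function and returns the next value. This continues until the generator is considered empty.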
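
The following sketch makes that lazy execution visible. The events list is only an illustrative device for recording when the function body actually runs:

```python
events = []  # records what happens, in order

def create_generator():
    events.append("body started")
    for i in range(3):
        yield i * i
    events.append("body finished")

mygenerator = create_generator()
assert events == []                 # calling the function ran none of its body

first = next(mygenerator)           # runs up to the first yield
assert first == 0
assert events == ["body started"]

rest = list(mygenerator)            # resumes after the yield until the function ends
assert rest == [1, 4]
assert events == ["body started", "body finished"]
```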


4. An example: controlling generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

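Note: for Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm))

This can be useful for various things, such as controlling access to a resource.

5. itertools, your best friend

The itertools module contains special functions for manipulating iterables. Ever wished to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map/Zip without creating another list?

Then just import itertools.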
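
As a brief sketch of the first two wishes, itertools.tee can stand in for "duplicating" a generator and itertools.chain for joining two of them (the squares generator is just sample data):

```python
import itertools

def squares():
    for i in range(3):
        yield i * i

# "Duplicate" a generator: tee returns independent iterators over the same values
copy_a, copy_b = itertools.tee(squares())
assert list(copy_a) == [0, 1, 4]
assert list(copy_b) == [0, 1, 4]

# Chain two generators into a single stream
chained = list(itertools.chain(squares(), squares()))
assert chained == [0, 1, 4, 0, 1, 4]
```

Note that tee buffers values internally, so the original generator should not be used directly once it has been passed to tee.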

An example? Let's look at the possible orders of arrival for a four-horse race:

>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]

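6. Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method).

Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.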
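
To illustrate the two roles, here is a minimal sketch with a hypothetical Countdown iterable and its CountdownIterator. The class names are made up for this example; only __iter__, __next__, iter() and next() come from Python itself:

```python
class Countdown:
    """Hypothetical iterable: counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # An iterable hands out a fresh iterator on every call
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator that remembers the current position."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
assert list(countdown) == [3, 2, 1]
assert list(countdown) == [3, 2, 1]  # the iterable can be iterated again

iterator = iter(countdown)           # what a for loop does behind the scenes
assert next(iterator) == 3
assert next(iterator) == 2
```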

There is more about how for loops work in this article.



​Python实用宝典 ( pythondict.com )

Pandas Performance Optimization