标签归档:list

根据布尔值列表过滤列表

问题:根据布尔值列表过滤列表

我有一个值列表,需要根据布尔值列表中的值进行过滤:

list_a = [1, 2, 4, 6]
filter = [True, False, True, False]

我使用以下行生成一个新的过滤列表:

filtered_list = [i for indx,i in enumerate(list_a) if filter[indx] == True]

结果是:

print filtered_list
[1,4]

这条线工作正常,但是(对我而言)看起来有些过分了,我想知道是否有更简单的方法来实现这一目标。


忠告

以下答案提供了两个好的建议:

1-不要filter像我一样命名列表,因为它是内置函数。

2-不要比较True像我做的事情,if filter[idx]==True..因为这是不必要的。只需使用if filter[idx]就足够了。

I have a list of values which I need to filter given the values in a list of booleans:

list_a = [1, 2, 4, 6]
filter = [True, False, True, False]

I generate a new filtered list with the following line:

filtered_list = [i for indx,i in enumerate(list_a) if filter[indx] == True]

which results in:

print filtered_list
[1,4]

The line works but looks (to me) a bit overkill and I was wondering if there was a simpler way to achieve the same.


Advices

Summary of two good advices given in the answers below:

1- Don’t name a list filter like I did because it is a built-in function.

2- Don’t compare things to True like I did with if filter[idx]==True.. since it’s unnecessary. Just using if filter[idx] is enough.


回答 0

您正在寻找itertools.compress

>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]

时序比较(py3.x):

>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
100000 loops, best of 3: 1.98 us per loop

>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil))              #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop

>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil))              #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v] 
100 loops, best of 3: 7.65 ms per loop

不要filter用作变量名,它是一个内置函数。

You’re looking for itertools.compress:

>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]

Timing comparisons(py3.x):

>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
100000 loops, best of 3: 1.98 us per loop

>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil))              #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop

>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil))              #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v] 
100 loops, best of 3: 7.65 ms per loop

Don’t use filter as a variable name, it is a built-in function.


回答 1

像这样:

filtered_list = [i for (i, v) in zip(list_a, filter) if v]

使用zip是在多个索引上并行迭代的pythonic方式,无需任何索引。假设两个序列的长度相同(最短用完后拉链停止)。使用itertools这种简单的情况有点过分…

在示例中您应该真正停止做的一件事是将事物与True进行比较,这通常不是必需的。相反if filter[idx]==True: ...,您可以简单地编写if filter[idx]: ...

Like so:

filtered_list = [i for (i, v) in zip(list_a, filter) if v]

Using zip is the pythonic way to iterate over multiple sequences in parallel, without needing any indexing. This assumes both sequences have the same length (zip stops after the shortest runs out). Using itertools for such a simple case is a bit overkill …

One thing you do in your example you should really stop doing is comparing things to True, this is usually not necessary. Instead of if filter[idx]==True: ..., you can simply write if filter[idx]: ....


回答 2

使用numpy:

In [128]: list_a = np.array([1, 2, 4, 6])
In [129]: filter = np.array([True, False, True, False])
In [130]: list_a[filter]

Out[130]: array([1, 4])

或者,如果list_a可以是一个numpy数组但不能过滤,请查看Alex Szatmary的答案

Numpy通常也可以大大提高速度

In [133]: list_a = [1, 2, 4, 6]*10000
In [134]: fil = [True, False, True, False]*10000
In [135]: list_a_np = np.array(list_a)
In [136]: fil_np = np.array(fil)

In [139]: %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop

In [140]: %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop

With numpy:

In [128]: list_a = np.array([1, 2, 4, 6])
In [129]: filter = np.array([True, False, True, False])
In [130]: list_a[filter]

Out[130]: array([1, 4])

or see Alex Szatmary’s answer if list_a can be a numpy array but not filter

Numpy usually gives you a big speed boost as well

In [133]: list_a = [1, 2, 4, 6]*10000
In [134]: fil = [True, False, True, False]*10000
In [135]: list_a_np = np.array(list_a)
In [136]: fil_np = np.array(fil)

In [139]: %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop

In [140]: %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop

回答 3

为此,请使用numpy,即,如果您有一个数组a,而不是list_a

a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])

To do this using numpy, ie, if you have an array, a, instead of list_a:

a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])

回答 4

filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]
filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]

回答 5

使用python 3,您可以list_a[filter]用来获取True值。要获得False价值,请使用list_a[~filter]

With python 3 you can use list_a[filter] to get True values. To get False values use list_a[~filter]


Python使用枚举内部列表理解

问题:Python使用枚举内部列表理解

假设我有一个这样的列表:

mylist = ["a","b","c","d"]

要获得打印的值及其索引,我可以使用Python的enumerate函数,如下所示

>>> for i,j in enumerate(mylist):
...     print i,j
...
0 a
1 b
2 c
3 d
>>>

现在,当我尝试在A中使用它时,出现list comprehension了这个错误

>>> [i,j for i,j in enumerate(mylist)]
  File "<stdin>", line 1
    [i,j for i,j in enumerate(mylist)]
           ^
SyntaxError: invalid syntax

所以,我的问题是:在列表理解中使用枚举的正确方法是什么?

Lets suppose I have a list like this:

mylist = ["a","b","c","d"]

To get the values printed along with their index I can use Python’s enumerate function like this

>>> for i,j in enumerate(mylist):
...     print i,j
...
0 a
1 b
2 c
3 d
>>>

Now, when I try to use it inside a list comprehension it gives me this error

>>> [i,j for i,j in enumerate(mylist)]
  File "<stdin>", line 1
    [i,j for i,j in enumerate(mylist)]
           ^
SyntaxError: invalid syntax

So, my question is: what is the correct way of using enumerate inside list comprehension?


回答 0

试试这个:

[(i, j) for i, j in enumerate(mylist)]

您需要放入i,j一个元组以使列表理解起作用。另外,鉴于enumerate() 已经返回一个元组,您可以直接将其返回而无需先拆包:

[pair for pair in enumerate(mylist)]

无论哪种方式,返回的结果都是预期的:

> [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Try this:

[(i, j) for i, j in enumerate(mylist)]

You need to put i,j inside a tuple for the list comprehension to work. Alternatively, given that enumerate() already returns a tuple, you can return it directly without unpacking it first:

[pair for pair in enumerate(mylist)]

Either way, the result that gets returned is as expected:

> [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

回答 1

只需明确一点,这enumerate与列表理解语法无关。

此列表推导返回一个元组列表:

[(i,j) for i in range(3) for j in 'abc']

这是字典列表:

[{i:j} for i in range(3) for j in 'abc']

列表清单:

[[i,j] for i in range(3) for j in 'abc']

语法错误:

[i,j for i in range(3) for j in 'abc']

这是不一致的(IMHO),并且与字典理解语法混淆:

>>> {i:j for i,j in enumerate('abcdef')}
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'}

和一组元组:

>>> {(i,j) for i,j in enumerate('abcdef')}
set([(0, 'a'), (4, 'e'), (1, 'b'), (2, 'c'), (5, 'f'), (3, 'd')])

正如ÓscarLópez所述,您可以直接通过枚举元组:

>>> [t for t in enumerate('abcdef') ] 
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

Just to be really clear, this has nothing to do with enumerate and everything to do with list comprehension syntax.

This list comprehension returns a list of tuples:

[(i,j) for i in range(3) for j in 'abc']

this a list of dicts:

[{i:j} for i in range(3) for j in 'abc']

a list of lists:

[[i,j] for i in range(3) for j in 'abc']

a syntax error:

[i,j for i in range(3) for j in 'abc']

Which is inconsistent (IMHO) and confusing with dictionary comprehensions syntax:

>>> {i:j for i,j in enumerate('abcdef')}
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'}

And a set of tuples:

>>> {(i,j) for i,j in enumerate('abcdef')}
set([(0, 'a'), (4, 'e'), (1, 'b'), (2, 'c'), (5, 'f'), (3, 'd')])

As Óscar López stated, you can just pass the enumerate tuple directly:

>>> [t for t in enumerate('abcdef') ] 
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

回答 2

或者,如果您不坚持使用列表理解:

>>> mylist = ["a","b","c","d"]
>>> list(enumerate(mylist))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Or, if you don’t insist on using a list comprehension:

>>> mylist = ["a","b","c","d"]
>>> list(enumerate(mylist))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

回答 3

如果您使用的是长列表,则列表理解似乎会更快,更不用说可读性了。

~$ python -mtimeit -s"mylist = ['a','b','c','d']" "list(enumerate(mylist))"
1000000 loops, best of 3: 1.61 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[(i, j) for i, j in enumerate(mylist)]"
1000000 loops, best of 3: 0.978 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[t for t in enumerate(mylist)]"
1000000 loops, best of 3: 0.767 usec per loop

If you’re using long lists, it appears the list comprehension’s faster, not to mention more readable.

~$ python -mtimeit -s"mylist = ['a','b','c','d']" "list(enumerate(mylist))"
1000000 loops, best of 3: 1.61 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[(i, j) for i, j in enumerate(mylist)]"
1000000 loops, best of 3: 0.978 usec per loop
~$ python -mtimeit -s"mylist = ['a','b','c','d']" "[t for t in enumerate(mylist)]"
1000000 loops, best of 3: 0.767 usec per loop

回答 4

这是一种实现方法:

>>> mylist = ['a', 'b', 'c', 'd']
>>> [item for item in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

或者,您可以执行以下操作:

>>> [(i, j) for i, j in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

出现错误的原因是您错过了()ij使其成为一个元组。

Here’s a way to do it:

>>> mylist = ['a', 'b', 'c', 'd']
>>> [item for item in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

Alternatively, you can do:

>>> [(i, j) for i, j in enumerate(mylist)]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

The reason you got an error was that you were missing the () around i and j to make it a tuple.


回答 5

明确说明元组。

[(i, j) for (i, j) in enumerate(mylist)]

Be explicit about the tuples.

[(i, j) for (i, j) in enumerate(mylist)]

回答 6

所有人都很好回答。我知道这里的问题是枚举所特有的,但是这样的话,又是另一个角度

from itertools import izip, count
a = ["5", "6", "1", "2"]
tupleList = list( izip( count(), a ) )
print(tupleList)

如果必须在性能方面并行迭代多个列表,它将变得更加强大。只是一个想法

a = ["5", "6", "1", "2"]
b = ["a", "b", "c", "d"]
tupleList = list( izip( count(), a, b ) )
print(tupleList)

All great answer guys. I know the question here is specific to enumeration but how about something like this, just another perspective

from itertools import izip, count
a = ["5", "6", "1", "2"]
tupleList = list( izip( count(), a ) )
print(tupleList)

It becomes more powerful, if one has to iterate multiple lists in parallel in terms of performance. Just a thought

a = ["5", "6", "1", "2"]
b = ["a", "b", "c", "d"]
tupleList = list( izip( count(), a, b ) )
print(tupleList)

如何将键值元组列表转换成字典?

问题:如何将键值元组列表转换成字典?

我有一个列表,看起来像:

[('A', 1), ('B', 2), ('C', 3)]

我想把它变成一个像这样的字典:

{'A': 1, 'B': 2, 'C': 3}

最好的方法是什么?

编辑:我的元组列表实际上更像是:

[(A, 12937012397), (BERA, 2034927830), (CE, 2349057340)]

I have a list that looks like:

[('A', 1), ('B', 2), ('C', 3)]

I want to turn it into a dictionary that looks like:

{'A': 1, 'B': 2, 'C': 3}

What’s the best way to go about this?

EDIT: My list of tuples is actually more like:

[(A, 12937012397), (BERA, 2034927830), (CE, 2349057340)]

回答 0

这给了我与尝试拆分列表并压缩列表相同的错误。ValueError:字典更新序列元素#0的长度为1916;2个为必填项

那是你的实际问题。

答案是列表中的元素与您认为的不一样。如果键入,myList[0]您会发现列表的第一个元素不是二元组,例如('A', 1),而是1916长度的iterable。

一旦您真正有了原始问题(myList = [('A',1),('B',2),...])中所述表格的列表,您所需要做的就是dict(myList)

This gives me the same error as trying to split the list up and zip it. ValueError: dictionary update sequence element #0 has length 1916; 2 is required

THAT is your actual question.

The answer is that the elements of your list are not what you think they are. If you type myList[0] you will find that the first element of your list is not a two-tuple, e.g. ('A', 1), but rather a 1916-length iterable.

Once you actually have a list in the form you stated in your original question (myList = [('A',1),('B',2),...]), all you need to do is dict(myList).


回答 1

>>> dict([('A', 1), ('B', 2), ('C', 3)])
{'A': 1, 'C': 3, 'B': 2}
>>> dict([('A', 1), ('B', 2), ('C', 3)])
{'A': 1, 'C': 3, 'B': 2}

回答 2

你有尝试过吗?

>>> l=[('A',1), ('B',2), ('C',3)]
>>> d=dict(l)
>>> d
{'A': 1, 'C': 3, 'B': 2}

Have you tried this?

>>> l=[('A',1), ('B',2), ('C',3)]
>>> d=dict(l)
>>> d
{'A': 1, 'C': 3, 'B': 2}

回答 3

这是处理重复的元组“键”的方法:

# An example
l = [('A', 1), ('B', 2), ('C', 3), ('A', 5), ('D', 0), ('D', 9)]

# A solution
d = dict()
[d [t [0]].append(t [1]) if t [0] in list(d.keys()) 
 else d.update({t [0]: [t [1]]}) for t in l]
d

OUTPUT: {'A': [1, 5], 'B': [2], 'C': [3], 'D': [0, 9]}

Here is a way to handle duplicate tuple “keys”:

# An example
l = [('A', 1), ('B', 2), ('C', 3), ('A', 5), ('D', 0), ('D', 9)]

# A solution
d = dict()
[d [t [0]].append(t [1]) if t [0] in list(d.keys()) 
 else d.update({t [0]: [t [1]]}) for t in l]
d

OUTPUT: {'A': [1, 5], 'B': [2], 'C': [3], 'D': [0, 9]}

回答 4

使用字典推导的另一种方式

>>> t = [('A', 1), ('B', 2), ('C', 3)]
>>> d = { i:j for i,j in t }
>>> d
{'A': 1, 'B': 2, 'C': 3}

Another way using dictionary comprehensions,

>>> t = [('A', 1), ('B', 2), ('C', 3)]
>>> d = { i:j for i,j in t }
>>> d
{'A': 1, 'B': 2, 'C': 3}

回答 5

如果Tuple没有重复键,则很简单。

tup = [("A",0),("B",3),("C",5)]
dic = dict(tup)
print(dic)

如果元组具有键重复。

tup = [("A",0),("B",3),("C",5),("A",9),("B",4)]
dic = {}
for i, j in tup:
    dic.setdefault(i,[]).append(j)
print(dic)

If Tuple has no key repetitions, it’s Simple.

tup = [("A",0),("B",3),("C",5)]
dic = dict(tup)
print(dic)

If tuple has key repetitions.

tup = [("A",0),("B",3),("C",5),("A",9),("B",4)]
dic = {}
for i, j in tup:
    dic.setdefault(i,[]).append(j)
print(dic)

回答 6

l=[['A', 1], ['B', 2], ['C', 3]]
d={}
for i,j in l:
d.setdefault(i,j)
print(d)
l=[['A', 1], ['B', 2], ['C', 3]]
d={}
for i,j in l:
d.setdefault(i,j)
print(d)

从字符串列表的元素中删除结尾的换行符

问题:从字符串列表的元素中删除结尾的换行符

我必须采用以下形式的大量单词:

['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

然后使用strip功能,将其转换为:

['this', 'is', 'a', 'list', 'of', 'words']

我以为我写的东西行得通,但是我不断收到错误消息:

“’list’对象没有属性’strip’”

这是我尝试过的代码:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

I have to take a large list of words in the form:

['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

and then using the strip function, turn it into:

['this', 'is', 'a', 'list', 'of', 'words']

I thought that what I had written would work, but I keep getting an error saying:

“‘list’ object has no attribute ‘strip'”

Here is the code that I tried:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

回答 0

>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> map(str.strip, my_list)
['this', 'is', 'a', 'list', 'of', 'words']
>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> map(str.strip, my_list)
['this', 'is', 'a', 'list', 'of', 'words']

回答 1

清单理解力? [x.strip() for x in lst]

list comprehension? [x.strip() for x in lst]


回答 2

您可以使用列表推导

strip_list = [item.strip() for item in lines]

map功能:

# with a lambda
strip_list = map(lambda it: it.strip(), lines)

# without a lambda
strip_list = map(str.strip, lines)

You can use lists comprehensions:

strip_list = [item.strip() for item in lines]

Or the map function:

# with a lambda
strip_list = map(lambda it: it.strip(), lines)

# without a lambda
strip_list = map(str.strip, lines)

回答 3

这可以使用PEP 202中定义的列表理解来完成

[w.strip() for w in  ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

This can be done using list comprehensions as defined in PEP 202

[w.strip() for w in  ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

回答 4

所有其他答案,主要是关于列表理解的,都很棒。但是只是为了解释您的错误:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

a是您列表的成员,而不是索引。您可以这样写:

[...]
for a in lines:
    strip_list.append(a.strip())

另一个重要的评论:您可以通过以下方式创建一个空列表:

strip_list = [0] * 20

但这不是那么有用,因为可以.append 内容追加到列表中。在您的情况下,创建带有默认值的列表是没有用的,因为在附加剥离字符串时,将逐项构建该列表。

因此,您的代码应类似于:

strip_list = []
for a in lines:
    strip_list.append(a.strip())

但是,可以肯定的是,最好的选择就是这个,因为这是完全一样的:

stripped = [line.strip() for line in lines]

如果您遇到的不仅仅是a复杂的事情.strip,请将其放在函数中并执行相同的操作。这是使用列表最易读的方式。

All other answers, and mainly about list comprehension, are great. But just to explain your error:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

a is a member of your list, not an index. What you could write is this:

[...]
for a in lines:
    strip_list.append(a.strip())

Another important comment: you can create an empty list this way:

strip_list = [0] * 20

But this is not so useful, as .append appends stuff to your list. In your case, it’s not useful to create a list with defaut values, as you’ll build it item per item when appending stripped strings.

So your code should be like:

strip_list = []
for a in lines:
    strip_list.append(a.strip())

But, for sure, the best one is this one, as this is exactly the same thing:

stripped = [line.strip() for line in lines]

In case you have something more complicated than just a .strip, put this in a function, and do the same. That’s the most readable way to work with lists.


回答 5

如果您只需要删除结尾的空格,则可以使用str.rstrip(),它的效率应比str.strip()

>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']

If you need to remove just trailing whitespace, you could use str.rstrip(), which should be slightly more efficient than str.strip():

>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']

回答 6

my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])

输出:

['this', 'is', 'a', 'list', 'of', 'words']
my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])

Output:

['this', 'is', 'a', 'list', 'of', 'words']

检查值是否已存在于字典列表中?

问题:检查值是否已存在于字典列表中?

我有一个Python字典列表,如下所示:

a = [
    {'main_color': 'red', 'second_color':'blue'},
    {'main_color': 'yellow', 'second_color':'green'},
    {'main_color': 'yellow', 'second_color':'blue'},
]

我想检查列表中是否已存在具有特定键/值的字典,如下所示:

// is a dict with 'main_color'='red' in the list already?
// if not: add item

I’ve got a Python list of dictionaries, as follows:

a = [
    {'main_color': 'red', 'second_color':'blue'},
    {'main_color': 'yellow', 'second_color':'green'},
    {'main_color': 'yellow', 'second_color':'blue'},
]

I’d like to check whether a dictionary with a particular key/value already exists in the list, as follows:

// is a dict with 'main_color'='red' in the list already?
// if not: add item

回答 0

这是一种实现方法:

if not any(d['main_color'] == 'red' for d in a):
    # does not exist

括号中的部分是一个生成器表达式,该表达式True将为每个具有您要查找的键-值对的字典返回,否则为False


如果密钥也可能丢失,则上面的代码可以给您一个KeyError。您可以使用get并提供默认值来解决此问题。如果不提供默认值,None则返回。

if not any(d.get('main_color', default_value) == 'red' for d in a):
    # does not exist

Here’s one way to do it:

if not any(d['main_color'] == 'red' for d in a):
    # does not exist

The part in parentheses is a generator expression that returns True for each dictionary that has the key-value pair you are looking for, otherwise False.


If the key could also be missing the above code can give you a KeyError. You can fix this by using get and providing a default value. If you don’t provide a default value, None is returned.

if not any(d.get('main_color', default_value) == 'red' for d in a):
    # does not exist

回答 1

也许这会有所帮助:

a = [{ 'main_color': 'red', 'second_color':'blue'},
     { 'main_color': 'yellow', 'second_color':'green'},
     { 'main_color': 'yellow', 'second_color':'blue'}]

def in_dictlist((key, value), my_dictlist):
    for this in my_dictlist:
        if this[key] == value:
            return this
    return {}

print in_dictlist(('main_color','red'), a)
print in_dictlist(('main_color','pink'), a)

Maybe this helps:

a = [{ 'main_color': 'red', 'second_color':'blue'},
     { 'main_color': 'yellow', 'second_color':'green'},
     { 'main_color': 'yellow', 'second_color':'blue'}]

def in_dictlist((key, value), my_dictlist):
    for this in my_dictlist:
        if this[key] == value:
            return this
    return {}

print in_dictlist(('main_color','red'), a)
print in_dictlist(('main_color','pink'), a)

回答 2

遵循这些原则的功能也许就是您所追求的:

 def add_unique_to_dict_list(dict_list, key, value):
  for d in dict_list:
     if key in d:
        return d[key]

  dict_list.append({ key: value })
  return value

Perhaps a function along these lines is what you’re after:

 def add_unique_to_dict_list(dict_list, key, value):
  for d in dict_list:
     if key in d:
        return d[key]

  dict_list.append({ key: value })
  return value

回答 3

基于@Mark Byers的一个很好的答案,并紧接着@Florent问题,仅表明它也可以在具有超过2个键的dic列表中使用2个条件:

names = []
names.append({'first': 'Nil', 'last': 'Elliot', 'suffix': 'III'})
names.append({'first': 'Max', 'last': 'Sam', 'suffix': 'IX'})
names.append({'first': 'Anthony', 'last': 'Mark', 'suffix': 'IX'})

if not any(d['first'] == 'Anthony' and d['last'] == 'Mark' for d in names):

    print('Not exists!')
else:
    print('Exists!')

结果:

Exists!

Based on @Mark Byers great answer, and following @Florent question, just to indicate that it will also work with 2 conditions on list of dics with more than 2 keys:

names = []
names.append({'first': 'Nil', 'last': 'Elliot', 'suffix': 'III'})
names.append({'first': 'Max', 'last': 'Sam', 'suffix': 'IX'})
names.append({'first': 'Anthony', 'last': 'Mark', 'suffix': 'IX'})

if not any(d['first'] == 'Anthony' and d['last'] == 'Mark' for d in names):

    print('Not exists!')
else:
    print('Exists!')

Result:

Exists!

在Python中,“。append()”和“ + = []”之间有什么区别?

问题:在Python中,“。append()”和“ + = []”之间有什么区别?

之间有什么区别?

some_list1 = []
some_list1.append("something")

some_list2 = []
some_list2 += ["something"]

What is the difference between:

some_list1 = []
some_list1.append("something")

and

some_list2 = []
some_list2 += ["something"]

回答 0

对于您而言,唯一的区别是性能:append是两倍的速度。

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

通常情况下,append会将一个项目添加到列表中,而+=将右侧列表的所有元素复制到左侧列表中。

更新:性能分析

比较字节码,我们可以假设appendversion在LOAD_ATTR+ CALL_FUNCTION和+ = version-中浪费了周期BUILD_LIST。显然BUILD_LIST大于LOAD_ATTR+ CALL_FUNCTION

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

我们可以通过减少LOAD_ATTR开销来进一步提高性能:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

For your case the only difference is performance: append is twice as fast.

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

In general case append will add one item to the list, while += will copy all elements of right-hand-side list into the left-hand-side list.

Update: perf analysis

Comparing bytecodes we can assume that append version wastes cycles in LOAD_ATTR + CALL_FUNCTION, and += version — in BUILD_LIST. Apparently BUILD_LIST outweighs LOAD_ATTR + CALL_FUNCTION.

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

We can improve performance even more by removing LOAD_ATTR overhead:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

回答 1

在您给出的示例中,append和之间在输出方面没有区别+=。但是append和之间+(这是最初询问的问题)之间存在区别。

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

如您所见,append+=具有相同的结果;他们将项目添加到列表中,而不生成新列表。使用+添加两个列表并生成一个新列表。

In the example you gave, there is no difference, in terms of output, between append and +=. But there is a difference between append and + (which the question originally asked about).

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

As you can see, append and += have the same result; they add the item to the list, without producing a new list. Using + adds the two lists and produces a new list.


回答 2

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

看到append将单个元素添加到列表,可以是任何元素。+=[]加入列表。

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

See that append adds a single element to the list, which may be anything. +=[] joins the lists.


回答 3

+ =是一个分配。使用它时,您实际上是在说“ some_list2 = some_list2 + [‘something’]”。分配涉及重新绑定,因此:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

+ =运算符通常还应该像list + list通常那样创建一个新的列表对象:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

但是实际上:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

这是因为Python列表实现了__iadd __()来使+ =扩展分配短路,而调用list.extend()。(这有点奇怪:它通常按照您的意思进行,但出于令人困惑的原因。)

通常,如果要添加/扩展现有列表,并且希望保留对同一列表的引用(而不是创建新列表),则最好明确并坚持使用append()/ extend()方法。

+= is an assignment. When you use it you’re really saying ‘some_list2= some_list2+[‘something’]’. Assignments involve rebinding, so:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

The += operator should also normally create a new list object like list+list normally does:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

However in reality:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

This is because Python lists implement __iadd__() to make a += augmented assignment short-circuit and call list.extend() instead. (It’s a bit of a strange wart this: it usually does what you meant, but for confusing reasons.)

In general, if you’re appending/extended an existing list, and you want to keep the reference to the same list (instead of making a new one), it’s best to be explicit and stick with the append()/extend() methods.


回答 4

 some_list2 += ["something"]

实际上是

 some_list2.extend(["something"])

对于一个值,没有区别。文档指出:

s.append(x) 与… s[len(s):len(s)] = [x]
s.extend(x) 相同s[len(s):len(s)] = x

因此显然s.append(x)s.extend([x])

 some_list2 += ["something"]

is actually

 some_list2.extend(["something"])

for one value, there is no difference. Documentation states, that:

s.append(x) same as s[len(s):len(s)] = [x]
s.extend(x) same as s[len(s):len(s)] = x

Thus obviously s.append(x) is same as s.extend([x])


回答 5

区别在于,连接将使结果列表变平,而附加将使级别保持完整:

因此,例如:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

使用append,最终得到一个列表列表:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

而是使用串联,最终得到一个平面列表:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

The difference is that concatenate will flatten the resulting list, whereas append will keep the levels intact:

So for example with:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

Using append, you end up with a list of lists:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

Using concatenate instead, you end up with a flat list:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

回答 6

这里的性能测试不正确:

  1. 您不应只运行一次配置文件。
  2. 如果比较append与+ = []的次数,则应将append声明为局部函数。
  3. 时间结果在不同的python版本上是不同的:64位和32位

例如

timeit.Timer(’for xrange(100)中的i:app(i)’,’s = []; app = s.append’)。timeit()

好的测试可以在这里找到:http : //markandclick.com/1/post/2012/01/python-list-append-vs.html

The performance tests here are not correct:

  1. You shouldn’t run the profile only once.
  2. If comparing append vs. += [] number of times you should declare append as a local function.
  3. time results are different on different python versions: 64 and 32 bit

e.g.

timeit.Timer(‘for i in xrange(100): app(i)’, ‘s = [] ; app = s.append’).timeit()

good tests can be found here: http://markandclick.com/1/post/2012/01/python-list-append-vs.html


回答 7

除了其他答案中描述的方面之外,当您尝试构建列表列表时,append和+ []的行为也非常不同。

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1 + [‘5’,’6’]将“ 5”和“ 6”添加到list1作为单独的元素。list1.append([‘5’,’6’])将列表[‘5’,’6’]作为单个元素添加到list1。

In addition to the aspects described in the other answers, append and +[] have very different behaviors when you’re trying to build a list of lists.

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1+[‘5′,’6’] adds ‘5’ and ‘6’ to the list1 as individual elements. list1.append([‘5′,’6’]) adds the list [‘5′,’6’] to the list1 as a single element.


回答 8

其他答案中提到的重新绑定行为在某些情况下确实很重要:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

这是因为即使对象已就地突变,增强分配也会始终重新绑定。这里的绑定恰好是a[1] = *mutated list*,不适用于元组。

The rebinding behaviour mentioned in other answers does matter in certain circumstances:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

That’s because augmented assignment always rebinds, even if the object was mutated in-place. The rebinding here happens to be a[1] = *mutated list*, which doesn’t work for tuples.


回答 9

让我们先举一个例子

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

let’s take an example first

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

回答 10

append()方法将单个项目添加到现有列表中

some_list1 = []
some_list1.append("something")

因此,这里some_list1将被修改。

更新:

而使用+组合现有列表中列表的元素(多个元素),类似于扩展(由Flux纠正)。

some_list2 = []
some_list2 += ["something"]

因此,这里some_list2和[“ something”]是结合在一起的两个列表。

The append() method adds a single item to the existing list

some_list1 = []
some_list1.append("something")

So here the some_list1 will get modified.

Updated:

Whereas using + to combine the elements of lists (more than one element) in the existing list similar to the extend (as corrected by Flux).

some_list2 = []
some_list2 += ["something"]

So here the some_list2 and [“something”] are the two lists that are combined.


回答 11

“ +” 不会改变列表

.append()改变旧列表

“+” does not mutate the list

.append() mutates the old list


从列表或元组中明确选择项目

问题:从列表或元组中明确选择项目

我有以下Python列表(也可以是元组):

myList = ['foo', 'bar', 'baz', 'quux']

我可以说

>>> myList[0:3]
['foo', 'bar', 'baz']
>>> myList[::2]
['foo', 'baz']
>>> myList[1::2]
['bar', 'quux']

如何显式挑选索引没有特定模式的项目?例如,我要选择[0,2,3]。或者,从1000个很大的清单中,我要选择[87, 342, 217, 998, 500]。是否有一些Python语法可以做到这一点?看起来像这样:

>>> myBigList[87, 342, 217, 998, 500]

I have the following Python list (can also be a tuple):

myList = ['foo', 'bar', 'baz', 'quux']

I can say

>>> myList[0:3]
['foo', 'bar', 'baz']
>>> myList[::2]
['foo', 'baz']
>>> myList[1::2]
['bar', 'quux']

How do I explicitly pick out items whose indices have no specific patterns? For example, I want to select [0,2,3]. Or from a very big list of 1000 items, I want to select [87, 342, 217, 998, 500]. Is there some Python syntax that does that? Something that looks like:

>>> myBigList[87, 342, 217, 998, 500]

回答 0

list( myBigList[i] for i in [87, 342, 217, 998, 500] )

我将答案与python 2.5.2进行了比较:

  • 19.7微秒: [ myBigList[i] for i in [87, 342, 217, 998, 500] ]

  • 20.6 USEC: map(myBigList.__getitem__, (87, 342, 217, 998, 500))

  • 22.7 USEC: itemgetter(87, 342, 217, 998, 500)(myBigList)

  • 24.6 USEC: list( myBigList[i] for i in [87, 342, 217, 998, 500] )

请注意,在Python 3中,第1个已更改为与第4个相同。


另一种选择是以a开头,numpy.array它允许通过列表或a进行索引numpy.array

>>> import numpy
>>> myBigList = numpy.array(range(1000))
>>> myBigList[(87, 342, 217, 998, 500)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index
>>> myBigList[[87, 342, 217, 998, 500]]
array([ 87, 342, 217, 998, 500])
>>> myBigList[numpy.array([87, 342, 217, 998, 500])]
array([ 87, 342, 217, 998, 500])

tuple不工作方式相同那些片。

list( myBigList[i] for i in [87, 342, 217, 998, 500] )

I compared the answers with python 2.5.2:

  • 19.7 usec: [ myBigList[i] for i in [87, 342, 217, 998, 500] ]

  • 20.6 usec: map(myBigList.__getitem__, (87, 342, 217, 998, 500))

  • 22.7 usec: itemgetter(87, 342, 217, 998, 500)(myBigList)

  • 24.6 usec: list( myBigList[i] for i in [87, 342, 217, 998, 500] )

Note that in Python 3, the 1st was changed to be the same as the 4th.


Another option would be to start out with a numpy.array which allows indexing via a list or a numpy.array:

>>> import numpy
>>> myBigList = numpy.array(range(1000))
>>> myBigList[(87, 342, 217, 998, 500)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index
>>> myBigList[[87, 342, 217, 998, 500]]
array([ 87, 342, 217, 998, 500])
>>> myBigList[numpy.array([87, 342, 217, 998, 500])]
array([ 87, 342, 217, 998, 500])

The tuple doesn’t work the same way as those are slices.


回答 1

那这个呢:

from operator import itemgetter
itemgetter(0,2,3)(myList)
('foo', 'baz', 'quux')

What about this:

from operator import itemgetter
itemgetter(0,2,3)(myList)
('foo', 'baz', 'quux')

回答 2

它不是内置的,但是如果您愿意,可以创建一个将元组作为“索引”的list的子类:

class MyList(list):

    def __getitem__(self, index):
        if isinstance(index, tuple):
            return [self[i] for i in index]
        return super(MyList, self).__getitem__(index)


seq = MyList("foo bar baaz quux mumble".split())
print seq[0]
print seq[2,4]
print seq[1::2]

印刷

foo
['baaz', 'mumble']
['bar', 'quux']

It isn’t built-in, but you can make a subclass of list that takes tuples as “indexes” if you’d like:

class MyList(list):

    def __getitem__(self, index):
        if isinstance(index, tuple):
            return [self[i] for i in index]
        return super(MyList, self).__getitem__(index)


seq = MyList("foo bar baaz quux mumble".split())
print seq[0]
print seq[2,4]
print seq[1::2]

printing

foo
['baaz', 'mumble']
['bar', 'quux']

回答 3

也许列表理解是按顺序进行的:

L = ['a', 'b', 'c', 'd', 'e', 'f']
print [ L[index] for index in [1,3,5] ]

生成:

['b', 'd', 'f']

那是您要找的东西吗?

Maybe a list comprehension is in order:

L = ['a', 'b', 'c', 'd', 'e', 'f']
print [ L[index] for index in [1,3,5] ]

Produces:

['b', 'd', 'f']

Is that what you are looking for?


回答 4

>>> map(myList.__getitem__, (2,2,1,3))
('baz', 'baz', 'bar', 'quux')

您也可以创建自己的List类,该类支持将元组用作__getitem__要执行的操作的参数myList[(2,2,1,3)]

>>> map(myList.__getitem__, (2,2,1,3))
('baz', 'baz', 'bar', 'quux')

You can also create your own List class which supports tuples as arguments to __getitem__ if you want to be able to do myList[(2,2,1,3)].


回答 5

我只想指出,即使itemgetter的语法看起来也很整洁,但是在大型列表上执行时有点慢。

import timeit
from operator import itemgetter
start=timeit.default_timer()
for i in range(1000000):
    itemgetter(0,2,3)(myList)
print ("Itemgetter took ", (timeit.default_timer()-start))

物品获取者1.065209062149279

start=timeit.default_timer()
for i in range(1000000):
    myList[0],myList[2],myList[3]
print ("Multiple slice took ", (timeit.default_timer()-start))

多个切片花费0.6225321444745759

I just want to point out, even syntax of itemgetter looks really neat, but it’s kinda slow when perform on large list.

import timeit
from operator import itemgetter
start=timeit.default_timer()
for i in range(1000000):
    itemgetter(0,2,3)(myList)
print ("Itemgetter took ", (timeit.default_timer()-start))

Itemgetter took 1.065209062149279

start=timeit.default_timer()
for i in range(1000000):
    myList[0],myList[2],myList[3]
print ("Multiple slice took ", (timeit.default_timer()-start))

Multiple slice took 0.6225321444745759


回答 6

另一个可能的解决方案:

sek=[]
L=[1,2,3,4,5,6,7,8,9,0]
for i in [2, 4, 7, 0, 3]:
   a=[L[i]]
   sek=sek+a
print (sek)

Another possible solution:

sek=[]
L=[1,2,3,4,5,6,7,8,9,0]
for i in [2, 4, 7, 0, 3]:
   a=[L[i]]
   sek=sek+a
print (sek)

回答 7

当你有一个布尔numpy数组时,就像 mask

[mylist[i] for i in np.arange(len(mask), dtype=int)[mask]]

适用于任何序列或np.array的lambda:

subseq = lambda myseq, mask : [myseq[i] for i in np.arange(len(mask), dtype=int)[mask]]

newseq = subseq(myseq, mask)

like often when you have a boolean numpy array like mask

[mylist[i] for i in np.arange(len(mask), dtype=int)[mask]]

A lambda that works for any sequence or np.array:

subseq = lambda myseq, mask : [myseq[i] for i in np.arange(len(mask), dtype=int)[mask]]

newseq = subseq(myseq, mask)


Python列表可以有多大?

问题:Python列表可以有多大?

在Python中,列表可以有多大?我需要大约12000个元素的列表。我仍然可以运行列表方法(例如排序等)吗?

In Python, how big can a list get? I need a list of about 12000 elements. Will I still be able to run list methods such as sorting, etc?


回答 0

根据源代码,列表的最大大小为PY_SSIZE_T_MAX/sizeof(PyObject*)

PY_SSIZE_T_MAXpyport.h中定义为((size_t) -1)>>1

在常规的32位系统上,这是(4294967295/2)/ 4或536870912。

因此,在32位系统上,python列表的最大大小为536,870,912个元素。

只要您拥有的元素数量等于或小于此数量,所有列表函数都应正确运行。

According to the source code, the maximum size of a list is PY_SSIZE_T_MAX/sizeof(PyObject*).

PY_SSIZE_T_MAX is defined in pyport.h to be ((size_t) -1)>>1

On a regular 32bit system, this is (4294967295 / 2) / 4 or 536870912.

Therefore the maximum size of a python list on a 32 bit system is 536,870,912 elements.

As long as the number of elements you have is equal or below this, all list functions should operate correctly.


回答 1

Python文档所述

sys.maxsize

平台的Py_ssize_t类型支持的最大正整数,因此列表,字符串,字典和许多其他容器可以具有的最大大小。

在我的计算机(Linux x86_64)中:

>>> import sys
>>> print sys.maxsize
9223372036854775807

As the Python documentation says:

sys.maxsize

The largest positive integer supported by the platform’s Py_ssize_t type, and thus the maximum size lists, strings, dicts, and many other containers can have.

In my computer (Linux x86_64):

>>> import sys
>>> print sys.maxsize
9223372036854775807

回答 2

当然可以。实际上,您可以轻松地自己看到:

l = range(12000)
l = sorted(l, reverse=True)

在我的机器上运行这些行需要:

real    0m0.036s
user    0m0.024s
sys  0m0.004s

但是可以肯定,正如其他人所说。数组越大,操作将越慢。

Sure it is OK. Actually you can see for yourself easily:

l = range(12000)
l = sorted(l, reverse=True)

Running the those lines on my machine took:

real    0m0.036s
user    0m0.024s
sys  0m0.004s

But sure as everyone else said. The larger the array the slower the operations will be.


回答 3

在临时代码中,我创建了包含数百万个元素的列表。我相信Python的列表实现仅受系统上内存量的限制。

此外,尽管列表很大,但列表方法/函数仍应继续工作。

如果您关心性能,那么值得研究一下NumPy之类的库。

In casual code I’ve created lists with millions of elements. I believe that Python’s implementation of lists are only bound by the amount of memory on your system.

In addition, the list methods / functions should continue to work despite the size of the list.

If you care about performance, it might be worthwhile to look into a library such as NumPy.


回答 4

清单的性能特征在Effbot 进行了描述。

Python列表实际上是作为用于快速随机访问的向量实现的,因此容器基本上将容纳与内存中的空间一样多的项目。(您需要用于列表中包含的指针的空间以及在内存中用于指向的对象的空间。)

追加是O(1)(摊销的恒定复杂度),但是,插入/从序列中间删除将需要O(n)(线性复杂度)重新排序,这将随着列表中元素数量的增加而变慢。

您的排序问题更加细微,因为比较操作可能会花费无数的时间。如果您执行的比较缓慢,则需要花费很长时间,尽管这不是Python的list数据类型的错。

反转只需要交换列表中所有指针所需的时间O(n)(由于触摸每个指针一次,所以有必要(线性复杂度))。

Performance characteristics for lists are described on Effbot.

Python lists are actually implemented as vector for fast random access, so the container will basically hold as many items as there is space for in memory. (You need space for pointers contained in the list as well as space in memory for the object(s) being pointed to.)

Appending is O(1) (amortized constant complexity), however, inserting into/deleting from the middle of the sequence will require an O(n) (linear complexity) reordering, which will get slower as the number of elements in your list.

Your sorting question is more nuanced, since the comparison operation can take an unbounded amount of time. If you’re performing really slow comparisons, it will take a long time, though it’s no fault of Python’s list data type.

Reversal just takes the amount of time it required to swap all the pointers in the list (necessarily O(n) (linear complexity), since you touch each pointer once).


回答 5

12000个元素在Python中什么都没有…实际上,只要Python解释器在您的系统上具有内存,元素的数量就可以增加。

12000 elements is nothing in Python… and actually the number of elements can go as far as the Python interpreter has memory on your system.


回答 6

对于不同的系统,它会有所不同(取决于RAM)。最简单的找出方法是

import six six.MAXSIZE 9223372036854775807 这使的最大尺寸listdict太,按照该文件

It varies for different systems (depends on RAM). The easiest way to find out is

import six six.MAXSIZE 9223372036854775807 This gives the max size of list and dict too ,as per the documentation


回答 7

我想说,您仅受可用RAM总量的限制。显然,数组越大,对其进行的操作就越长。

I’d say you’re only limited by the total amount of RAM available. Obviously the larger the array the longer operations on it will take.


回答 8

我是在x64位系统上从这里获得的:win32上的Python 3.7.0b5(v3.7.0b5:abb8802389,2018年5月31日,01:54:01)[MSC v.1913 64位(AMD64)]

I got this from here on a x64 bit system: Python 3.7.0b5 (v3.7.0b5:abb8802389, May 31 2018, 01:54:01) [MSC v.1913 64 bit (AMD64)] on win32


回答 9

列表号没有限制。导致错误的主要原因是RAM。请升级您的内存大小。

There is no limitation of list number. The main reason which causes your error is the RAM. Please upgrade your memory size.


递归子文件夹搜索并返回列表python中的文件

问题:递归子文件夹搜索并返回列表python中的文件

我正在编写一个脚本,以递归地遍历主文件夹中的子文件夹并根据某种文件类型构建一个列表。我的脚本有问题。当前设置如下

for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,subFolder,item))

问题是subFolder变量正在拉入子文件夹列表,而不是ITEM文件所在的文件夹。我曾考虑过为子文件夹运行一个for循环,然后加入路径的第一部分,但我想出了ID双重检查以了解在此之前是否有人提出建议。谢谢你的帮助!

I am working on a script to recursively go through subfolders in a mainfolder and build a list off a certain file type. I am having an issue with the script. Its currently set as follows

for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,subFolder,item))

the problem is that the subFolder variable is pulling in a list of subfolders rather than the folder that the ITEM file is located. I was thinking of running a for loop for the subfolder before and join the first part of the path but I figured Id double check to see if anyone has any suggestions before that. Thanks for your help!


回答 0

您应该使用dirpath调用的root。在dirnames供给这样你就可以进行清理,如果有文件夹,你不想os.walk递归到。

import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']

编辑:

经过最新一轮的投票之后,我发现这glob是一个用于扩展选择的更好工具。

import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

也是生成器版本

from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))

适用于Python的Edit2 3.4+

from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))

You should be using the dirpath which you call root. The dirnames are supplied so you can prune it if there are folders that you don’t wish os.walk to recurse into.

import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']

Edit:

After the latest downvote, it occurred to me that glob is a better tool for selecting by extension.

import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Also a generator version

from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))

Edit2 for Python 3.4+

from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))

回答 1

Python 3.5中进行了更改:支持使用“ **”的递归glob。

glob.glob()有一个新的递归参数

如果要获取下面的每个.txt文件my_path(递归包括子目录):

import glob

files = glob.glob(my_path + '/**/*.txt', recursive=True)

# my_path/     the dir
# **/       every file and dir under my_path
# *.txt     every file that ends with '.txt'

如果需要迭代器,可以使用iglob作为替代方案:

for file in glob.iglob(my_path, recursive=False):
    # ...

Changed in Python 3.5: Support for recursive globs using “**”.

glob.glob() got a new recursive parameter.

If you want to get every .txt file under my_path (recursively including subdirs):

import glob

files = glob.glob(my_path + '/**/*.txt', recursive=True)

# my_path/     the dir
# **/       every file and dir under my_path
# *.txt     every file that ends with '.txt'

If you need an iterator you can use iglob as an alternative:

for file in glob.iglob(my_path, recursive=False):
    # ...

回答 2

我将把John La Rooy的列表理解理解为nest for的表达,以防万一其他人难以理解它。

result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

应等效于:

import glob

result = []

for x in os.walk(PATH):
    for y in glob.glob(os.path.join(x[0], '*.txt')):
        result.append(y)

这是列表理解os.walkglob.glob函数的文档。

I will translate John La Rooy’s list comprehension to nested for’s, just in case anyone else has trouble understanding it.

result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Should be equivalent to:

import glob

result = []

for x in os.walk(PATH):
    for y in glob.glob(os.path.join(x[0], '*.txt')):
        result.append(y)

Here’s the documentation for list comprehension and the functions os.walk and glob.glob.


回答 3

这似乎是我能想到的最快的解决方案,并且比任何解决方案都快os.walk而且要快得多glob

  • 它也将免费为您提供所有嵌套子文件夹的列表。
  • 您可以搜索几个不同的扩展名。
  • 您也可以选择通过改变返回的文件不管是完整路径或只是名称f.pathf.name(不改变它的子文件夹!)。

Args :dir: str, ext: list
函数返回两个列表:subfolders, files

有关详细的速度分析,请参见下文。

def run_fast_scandir(dir, ext):    # dir: str, ext: list
    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files


subfolders, files = run_fast_scandir(folder, [".jpg"])


速度分析

各种方法来获取所有子文件夹和主文件夹中具有特定文件扩展名的所有文件。

tl; dr:
fast_scandir显然是获胜的,并且是除os.walk之外所有其他解决方案的两倍。
os.walk落后第二名。
-使用glob会大大减慢该过程。
-没有任何结果使用自然排序。这意味着将对结果进行如下排序:1、10、2。要进行自然排序(1、2、10),请查看https://stackoverflow.com/a/48030307/2441026


结果:

fast_scandir    took  499 ms. Found files: 16596. Found subfolders: 439
os.walk         took  589 ms. Found files: 16596
find_files      took  919 ms. Found files: 16596
glob.iglob      took  998 ms. Found files: 16596
glob.glob       took 1002 ms. Found files: 16596
pathlib.rglob   took 1041 ms. Found files: 16596
os.walk-glob    took 1043 ms. Found files: 16596

测试使用W7x64,Python 3.8.1和20次运行完成。439个(部分嵌套的)子文件夹中的16596个文件。
find_files来自https://stackoverflow.com/a/45646357/2441026,可让您搜索多个扩展名。
fast_scandir由我自己编写,并且还将返回子文件夹列表。您可以给它提供扩展名列表以进行搜索(我测试了一个列表,其中一个条目很简单,if ... == ".jpg"并且没有显着差异)。


# -*- coding: utf-8 -*-
# Python 3


import time
import os
from glob import glob, iglob
from pathlib import Path


directory = r"<folder>"
RUNS = 20


def run_os_walk():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [os.path.join(dp, f) for dp, dn, filenames in os.walk(directory) for f in filenames if
                  os.path.splitext(f)[1].lower() == '.jpg']
    print(f"os.walk\t\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_os_walk_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [y for x in os.walk(directory) for y in glob(os.path.join(x[0], '*.jpg'))]
    print(f"os.walk-glob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    print(f"glob.glob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_iglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(iglob(os.path.join(directory, '**', '*.jpg'), recursive=True))
    print(f"glob.iglob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_pathlib_rglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(Path(directory).rglob("*.jpg"))
    print(f"pathlib.rglob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def find_files(files, dirs=[], extensions=[]):
    # https://stackoverflow.com/a/45646357/2441026

    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1].lower() in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return


def run_fast_scandir(dir, ext):    # dir: str, ext: list
    # https://stackoverflow.com/a/59803793/2441026

    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files



if __name__ == '__main__':
    run_os_walk()
    run_os_walk_glob()
    run_glob()
    run_iglob()
    run_pathlib_rglob()


    a = time.time_ns()
    for i in range(RUNS):
        files = []
        find_files(files, dirs=[directory], extensions=[".jpg"])
    print(f"find_files\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}")


    a = time.time_ns()
    for i in range(RUNS):
        subf, files = run_fast_scandir(directory, [".jpg"])
    print(f"fast_scandir\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}. Found subfolders: {len(subf)}")

This seems to be the fastest solution I could come up with, and is faster than os.walk and a lot faster than any glob solution.

  • It will also give you a list of all nested subfolders at basically no cost.
  • You can search for several different extensions.
  • You can also choose to return either full paths or just the names for the files by changing f.path to f.name (do not change it for subfolders!).

Args: dir: str, ext: list.
Function returns two lists: subfolders, files.

See below for a detailed speed anaylsis.

def run_fast_scandir(dir, ext):    # dir: str, ext: list
    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files


subfolders, files = run_fast_scandir(folder, [".jpg"])


Speed analysis

for various methods to get all files with a specific file extension inside all subfolders and the main folder.

tl;dr:
fast_scandir clearly wins and is twice as fast as all other solutions, except os.walk.
os.walk is second place slighly slower.
– using glob will greatly slow down the process.
– None of the results use natural sorting. This means results will be sorted like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026


Results:

fast_scandir    took  499 ms. Found files: 16596. Found subfolders: 439
os.walk         took  589 ms. Found files: 16596
find_files      took  919 ms. Found files: 16596
glob.iglob      took  998 ms. Found files: 16596
glob.glob       took 1002 ms. Found files: 16596
pathlib.rglob   took 1041 ms. Found files: 16596
os.walk-glob    took 1043 ms. Found files: 16596

Tests were done with W7x64, Python 3.8.1, 20 runs. 16596 files in 439 (partially nested) subfolders.
find_files is from https://stackoverflow.com/a/45646357/2441026 and lets you search for several extensions.
fast_scandir was written by myself and will also return a list of subfolders. You can give it a list of extensions to search for (I tested a list with one entry to a simple if ... == ".jpg" and there was no significant difference).


# -*- coding: utf-8 -*-
# Python 3


import time
import os
from glob import glob, iglob
from pathlib import Path


directory = r"<folder>"
RUNS = 20


def run_os_walk():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [os.path.join(dp, f) for dp, dn, filenames in os.walk(directory) for f in filenames if
                  os.path.splitext(f)[1].lower() == '.jpg']
    print(f"os.walk\t\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_os_walk_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [y for x in os.walk(directory) for y in glob(os.path.join(x[0], '*.jpg'))]
    print(f"os.walk-glob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    print(f"glob.glob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_iglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(iglob(os.path.join(directory, '**', '*.jpg'), recursive=True))
    print(f"glob.iglob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_pathlib_rglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(Path(directory).rglob("*.jpg"))
    print(f"pathlib.rglob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def find_files(files, dirs=[], extensions=[]):
    # https://stackoverflow.com/a/45646357/2441026

    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1].lower() in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return


def run_fast_scandir(dir, ext):    # dir: str, ext: list
    # https://stackoverflow.com/a/59803793/2441026

    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files



if __name__ == '__main__':
    run_os_walk()
    run_os_walk_glob()
    run_glob()
    run_iglob()
    run_pathlib_rglob()


    a = time.time_ns()
    for i in range(RUNS):
        files = []
        find_files(files, dirs=[directory], extensions=[".jpg"])
    print(f"find_files\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}")


    a = time.time_ns()
    for i in range(RUNS):
        subf, files = run_fast_scandir(directory, [".jpg"])
    print(f"fast_scandir\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}. Found subfolders: {len(subf)}")

回答 4

新的pathlib库将其简化为一行:

from pathlib import Path
result = list(Path(PATH).glob('**/*.txt'))

您还可以使用生成器版本:

from pathlib import Path
for file in Path(PATH).glob('**/*.txt'):
    pass

这将返回Path对象,您几乎可以将其用于任何对象,或者通过获取字符串形式的文件名file.name

The new pathlib library simplifies this to one line:

from pathlib import Path
result = list(Path(PATH).glob('**/*.txt'))

You can also use the generator version:

from pathlib import Path
for file in Path(PATH).glob('**/*.txt'):
    pass

This returns Path objects, which you can use for pretty much anything, or get the file name as a string by file.name.


回答 5

它不是最Python的答案,但我会把它放在这里很有趣,因为这是递归中的一个很好的类

def find_files( files, dirs=[], extensions=[]):
    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1] in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return

在我的机器上,我有两个文件夹,root并且root2

mender@multivax ]ls -R root root2
root:
temp1 temp2

root/temp1:
temp1.1 temp1.2

root/temp1/temp1.1:
f1.mid

root/temp1/temp1.2:
f.mi  f.mid

root/temp2:
tmp.mid

root2:
dummie.txt temp3

root2/temp3:
song.mid

比方说,我想找到所有.txt和所有.mid文件在任何这些目录中,然后我可以做

files = []
find_files( files, dirs=['root','root2'], extensions=['.mid','.txt'] )
print(files)

#['root2/dummie.txt',
# 'root/temp2/tmp.mid',
# 'root2/temp3/song.mid',
# 'root/temp1/temp1.1/f1.mid',
# 'root/temp1/temp1.2/f.mid']

Its not the most pythonic answer, but I’ll put it here for fun because it’s a neat lesson in recursion

def find_files( files, dirs=[], extensions=[]):
    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1] in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return

On my machine I have two folders, root and root2

mender@multivax ]ls -R root root2
root:
temp1 temp2

root/temp1:
temp1.1 temp1.2

root/temp1/temp1.1:
f1.mid

root/temp1/temp1.2:
f.mi  f.mid

root/temp2:
tmp.mid

root2:
dummie.txt temp3

root2/temp3:
song.mid

Lets say I want to find all .txt and all .mid files in either of these directories, then I can just do

files = []
find_files( files, dirs=['root','root2'], extensions=['.mid','.txt'] )
print(files)

#['root2/dummie.txt',
# 'root/temp2/tmp.mid',
# 'root2/temp3/song.mid',
# 'root/temp1/temp1.1/f1.mid',
# 'root/temp1/temp1.2/f.mid']

回答 6

递归是Python 3.5中的新增功能,因此它不适用于Python 2.7。这是使用r字符串的示例,因此您只需按Win,Lin,…上的路径提供路径即可。

import glob

mypath=r"C:\Users\dj\Desktop\nba"

files = glob.glob(mypath + r'\**\*.py', recursive=True)
# print(files) # as list
for f in files:
    print(f) # nice looking single line per file

注意:它将列出所有文件,无论文件应多深。

Recursive is new in Python 3.5, so it won’t work on Python 2.7. Here is the example that uses r strings so you just need to provide the path as is on either Win, Lin, …

import glob

mypath=r"C:\Users\dj\Desktop\nba"

files = glob.glob(mypath + r'\**\*.py', recursive=True)
# print(files) # as list
for f in files:
    print(f) # nice looking single line per file

Note: It will list all files, no matter how deep it should go.


回答 7

您可以通过这种方式返回绝对路径文件列表。

def list_files_recursive(path):
    """
    Function that receives as a parameter a directory path
    :return list_: File List and Its Absolute Paths
    """

    import os

    files = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))

    lst = [file for file in files]
    return lst


if __name__ == '__main__':

    result = list_files_recursive('/tmp')
    print(result)

You can do it this way to return you a list of absolute path files.

def list_files_recursive(path):
    """
    Function that receives as a parameter a directory path
    :return list_: File List and Its Absolute Paths
    """

    import os

    files = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))

    lst = [file for file in files]
    return lst


if __name__ == '__main__':

    result = list_files_recursive('/tmp')
    print(result)


回答 8

如果您不介意安装其他灯库,则可以执行以下操作:

pip install plazy

用法:

import plazy

txt_filter = lambda x : True if x.endswith('.txt') else False
files = plazy.list_files(root='data', filter_func=txt_filter, is_include_root=True)

结果应如下所示:

['data/a.txt', 'data/b.txt', 'data/sub_dir/c.txt']

它同时适用于Python 2.7和Python 3。

GitHub:https : //github.com/kyzas/plazy#list-files

免责声明:我是的作者plazy

If you don’t mind installing an additional light library, you can do this:

pip install plazy

Usage:

import plazy

txt_filter = lambda x : True if x.endswith('.txt') else False
files = plazy.list_files(root='data', filter_func=txt_filter, is_include_root=True)

The result should look something like this:

['data/a.txt', 'data/b.txt', 'data/sub_dir/c.txt']

It works on both Python 2.7 and Python 3.

Github: https://github.com/kyzas/plazy#list-files

Disclaimer: I’m an author of plazy.


回答 9

此功能将递归地仅将文件放入列表中。希望你能。

import os


def ls_files(dir):
    files = list()
    for item in os.listdir(dir):
        abspath = os.path.join(dir, item)
        try:
            if os.path.isdir(abspath):
                files = files + ls_files(abspath)
            else:
                files.append(abspath)
        except FileNotFoundError as err:
            print('invalid directory\n', 'Error: ', err)
    return files

This function will recursively put only files into a list.

import os


def ls_files(dir):
    files = list()
    for item in os.listdir(dir):
        abspath = os.path.join(dir, item)
        try:
            if os.path.isdir(abspath):
                files = files + ls_files(abspath)
            else:
                files.append(abspath)
        except FileNotFoundError as err:
            print('invalid directory\n', 'Error: ', err)
    return files

回答 10

您最初的解决方案几乎是正确的,但是变量“ root”是在递归路径周围动态更新的。os.walk()是一个递归生成器。每个元组集(根,subFolder,文件)都是按照设置方式针对特定根的。

root = 'C:\\'
subFolder = ['Users', 'ProgramFiles', 'ProgramFiles (x86)', 'Windows', ...]
files = ['foo1.txt', 'foo2.txt', 'foo3.txt', ...]

root = 'C:\\Users\\'
subFolder = ['UserAccount1', 'UserAccount2', ...]
files = ['bar1.txt', 'bar2.txt', 'bar3.txt', ...]

...

我对您的代码做了些微调整,以打印完整列表。

import os
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,item))
            print(fileNamePath)

希望这可以帮助!

Your original solution was very nearly correct, but the variable “root” is dynamically updated as it recursively paths around. os.walk() is a recursive generator. Each tuple set of (root, subFolder, files) is for a specific root the way you have it setup.

i.e.

root = 'C:\\'
subFolder = ['Users', 'ProgramFiles', 'ProgramFiles (x86)', 'Windows', ...]
files = ['foo1.txt', 'foo2.txt', 'foo3.txt', ...]

root = 'C:\\Users\\'
subFolder = ['UserAccount1', 'UserAccount2', ...]
files = ['bar1.txt', 'bar2.txt', 'bar3.txt', ...]

...

I made a slight tweak to your code to print a full list.

import os
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,item))
            print(fileNamePath)

Hope this helps!


函数式编程中的“ pythonic”等同于“ fold”函数是什么?

问题:函数式编程中的“ pythonic”等同于“ fold”函数是什么?

在Haskell中,实现以下目标的最惯用的方法是:

foldl (+) 0 [1,2,3,4,5]
--> 15

或等效的Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

显然,Python提供了reduce与fold完全相同的功能,但实际上是如上所述的fold的实现,但是,有人告诉我,“ pythonic”编程方式是避免使用lambda术语和高阶函数,并尽可能使用列表理解。因此,有没有一种首选的方式来折叠列表或不是Python reduce函数的类似列表的结构,或者是reduce惯用的方式来实现此目的?

What is the most idiomatic way to achieve something like the following, in Haskell:

foldl (+) 0 [1,2,3,4,5]
--> 15

Or its equivalent in Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

Obviously, Python provides the reduce function, which is an implementation of fold, exactly as above, however, I was told that the ‘pythonic’ way of programming was to avoid lambda terms and higher-order functions, preferring list-comprehensions where possible. Therefore, is there a preferred way of folding a list, or list-like structure in Python that isn’t the reduce function, or is reduce the idiomatic way of achieving this?


回答 0

用Python方式对数组求和的方法是使用sum。为了其他目的,有时可以使用reduce(来自functools模块)和operator模块的某种组合,例如:

def product(xs):
    return reduce(operator.mul, xs, 1)

请注意,reduce实际上这是一个foldl用Haskell表示的。没有执行折叠的特殊语法,没有内置函数foldr,实际上reduce与非关联运算符一起使用被认为是不良样式。

使用高阶函数是相当Python的;它很好地利用了Python的原理,即一切都是对象,包括函数和类。没错,lambda被某些Pythonista所反对,但这主要是因为它们变得复杂时往往不太可读。

The Pythonic way of summing an array is using sum. For other purposes, you can sometimes use some combination of reduce (from the functools module) and the operator module, e.g.:

def product(xs):
    return reduce(operator.mul, xs, 1)

Be aware that reduce is actually a foldl, in Haskell terms. There is no special syntax to perform folds, there’s no builtin foldr, and actually using reduce with non-associative operators is considered bad style.

Using higher-order functions is quite pythonic; it makes good use of Python’s principle that everything is an object, including functions and classes. You are right that lambdas are frowned upon by some Pythonistas, but mostly because they tend not to be very readable when they get complex.


回答 1

哈斯克尔

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

显然,这是一个说明问题的简单例子。在Python中,您只需要这样做sum([1,2,3,4,5]),甚至Haskell纯粹主义者通常也会更喜欢sum [1,2,3,4,5]

对于没有明显便利函数的非平凡场景,惯用的pythonic方法是显式写出for循环并使用可变变量分配,而不是使用reduceor fold

那根本不是功能样式,而是“ pythonic”方式。Python并非为功能纯正者设计。了解Python如何支持流控制的异常,以了解非功能性惯用python的情况。

Haskell

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

Obviously, that is a trivial example to illustrate a point. In Python you would just do sum([1,2,3,4,5]) and even Haskell purists would generally prefer sum [1,2,3,4,5].

For non-trivial scenarios when there is no obvious convenience function, the idiomatic pythonic approach is to explicitly write out the for loop and use mutable variable assignment instead of using reduce or a fold.

That is not at all the functional style, but that is the “pythonic” way. Python is not designed for functional purists. See how Python favors exceptions for flow control to see how non-functional idiomatic python is.


回答 2

在Python 3中,reduce已被删除:版本说明。不过,您可以使用functools模块

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

另一方面,文档表达了对for-loop而不是的偏好reduce,因此:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result

In Python 3, the reduce has been removed: Release notes. Nevertheless you can use the functools module

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

On the other hand, the documentation expresses preference towards for-loop instead of reduce, hence:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result

回答 3

您也可以重新发明轮子:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)

You can reinvent the wheel as well:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)

回答 4

并不是真正回答这个问题,而是折叠和折叠的一线:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

Not really answer to the question, but one-liners for foldl and foldr:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

回答 5

开始Python 3.8并引入赋值表达式(PEP 572):=运算符),这使您可以为表达式的结果命名,我们可以使用列表推导来复制其他语言称为fold / foldleft / reduce的操作:

给定一个列表,一个约简函数和一个累加器:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

我们可以折叠itemsf在为了获得所得accumulation

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

或以压缩形式形成:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

请注意,这实际上也是“ scanleft”操作,因为列表理解的结果表示每个步骤的累加状态:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), which gives the possibility to name the result of an expression, we can use a list comprehension to replicate what other languages call fold/foldleft/reduce operations:

Given a list, a reducing function and an accumulator:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

we can fold items with f in order to obtain the resulting accumulation:

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

or in a condensed formed:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

Note that this is actually also a “scanleft” operation as the result of the list comprehension represents the state of the accumulation at each step:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120

回答 6

对这个(减少)问题的实际答案是:只需使用一个循环!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

这将比reduce更快,而PyPy之类的东西可以优化循环。

顺便说一句,总和的情况应该用sum函数来解决

The actual answer to this (reduce) problem is: Just use a loop!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

This will be faster than a reduce and things like PyPy can optimize loops like that.

BTW, the sum case should be solved with the sum function


回答 7

我相信这个问题的一些回答者错过了该fold功能作为抽象工具的广泛含义。是的,sum可以对整数列表执行相同的操作,但这是不重要的情况。fold更通用。当您具有一系列形状各异的数据结构并想要清晰地表达聚合时,此功能很有用。因此,不必for每次都用一个聚合变量建立一个循环并手动重新计算它,fold函数(或reduce似乎对应的Python版本)允许程序员通过简单地提供以下内容来更清楚地表达聚合的意图:两件事情:

  • 聚合的默认起始值​​或“种子”值。
  • 该函数采用聚合的当前值(以“种子”开头)和列表中的下一个元素,并返回下一个聚合值。

I believe some of the respondents of this question have missed the broader implication of the fold function as an abstract tool. Yes, sum can do the same thing for a list of integers, but this is a trivial case. fold is more generic. It is useful when you have a sequence of data structures of varying shape and want to cleanly express an aggregation. So instead of having to build up a for loop with an aggregate variable and manually recompute it each time, a fold function (or the Python version, which reduce appears to correspond to) allows the programmer to express the intent of the aggregation much more plainly by simply providing two things:

  • A default starting or “seed” value for the aggregation.
  • A function that takes the current value of the aggregation (starting with the “seed”) and the next element in the list, and returns the next aggregation value.

回答 8

我参加聚会可能已经很晚了,但是我们可以foldr使用简单的lambda演算和curried函数创建自定义项。这是我在python中实现的foldr。

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

即使实现是递归的(可能很慢),它也会分别打印值15120

I may be quite late to the party, but we can create custom foldr using simple lambda calculus and curried function. Here is my implementation of foldr in python.

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

Even though the implementation is recursive (might be slow), it will print the values 15 and 120 respectively