Question: How to check if one of the following items is in a list?
I’m trying to find a short way to see if any of the following items is in a list, but my first attempt does not work. Besides writing a function to accomplish this, is there any short way to check if one of multiple items is in a list?
>>> a = [2,3,4]
>>> print (1 or 2) in a
False
>>> print (2 or 1) in a
True
Answer 0
>>> L1 = [2,3,4]
>>> L2 = [1,2]
>>> [i for i in L1 if i in L2]
[2]
>>> S1 = set(L1)
>>> S2 = set(L2)
>>> S1.intersection(S2)
set([2])
Both empty lists and empty sets are falsy, so you can use the result directly as a truth value.
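As a minimal sketch of using that result directly in a condition (written for Python 3, which is an assumption since the session above is Python 2; the names `haystack` and `wanted` are only illustrative):

```python
haystack = [2, 3, 4]
wanted = [1, 2]

# The intersection is non-empty here, so it counts as true in a condition.
common = set(haystack).intersection(wanted)
if common:
    print("found:", sorted(common))

# No overlap gives an empty set, which counts as false.
nothing = set(haystack).intersection([7, 8])
if not nothing:
    print("no match")
```

The same pattern works with the list comprehension, since an empty list is falsy as well.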
Answer 1
Ah, Tobias you beat me to it. I was thinking of this slight variation on your solution:
>>> a = [1,2,3,4]
>>> b = [2,7]
>>> any(x in a for x in b)
True
Answer 2
Maybe a bit more lazy:
a = [1,2,3,4]
b = [2,7]
print any((True for x in a if x in b))
Answer 3
Think about what the code actually says!
>>> (1 or 2)
1
>>> (2 or 1)
2
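That should probably explain it. :) Python implements a “lazy or”, which should come as no surprise. It behaves something like this:
def lazy_or(x, y):  # `or` is a keyword, so the sketch needs another name
    if x: return x  # first operand is truthy: return it without looking at y
    return y        # otherwise the result is y, whatever its truth value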
In the first example, x == 1 and y == 2. In the second example, it’s vice versa. That’s why it returns different values depending on their order.
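To tie this back to the question, here is a small sketch (Python 3 syntax, an assumption since the question uses the Python 2 print statement) showing what the membership test actually receives, next to a version that tests each candidate separately:

```python
a = [2, 3, 4]

# `or` returns its first truthy operand, so the membership test
# only ever sees that one value.
first = (1 or 2)    # evaluates to 1
second = (2 or 1)   # evaluates to 2
print(first in a)   # same as: 1 in a
print(second in a)  # same as: 2 in a

# Testing each candidate separately gives the intended result.
found = any(x in a for x in (1, 2))
print(found)
```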
Answer 4
a = {2,3,4}
if {1,2} & a:
    pass
Code golf version. Consider using a set if it makes sense to do so.
I find this more readable than a list comprehension.
Answer 5
1 line without list comprehensions.
>>> any(map(lambda each: each in [2,3,4], [1,2]))
True
>>> any(map(lambda each: each in [2,3,4], [1,5]))
False
>>> any(map(lambda each: each in [2,3,4], [2,4]))
True
Answer 6
Best I could come up with:
any([True for e in (1, 2) if e in a])
Answer 7
In Python 3 (3.5 or later) we can make use of the unpacking asterisk. Given two lists a and b:
bool(len({*a} & {*b}))
Edit: incorporate alkanen’s suggestion
Answer 8
When you think “check to see if a in b”, think hashes (in this case, sets). The fastest way is to hash the list you want to check, and then check each item in there.
This is why Joe Koberg’s answer is fast: checking set intersection is very fast.
When you don’t have a lot of data though, making sets can be a waste of time. So, you can make a set of the list and just check each item:
tocheck = [1,2] # items to check
a = [2,3,4] # the list
a = set(a) # convert to set (O(len(a)))
print [i for i in tocheck if i in a] # check items (O(len(tocheck)))
When the number of items you want to check is small, the difference can be negligible. But check lots of numbers against a large list…
tests:
from timeit import timeit
methods = ['''tocheck = [1,2] # items to check
a = [2,3,4] # the list
a = set(a) # convert to set (O(n))
[i for i in tocheck if i in a] # check items (O(m))''',
'''L1 = [2,3,4]
L2 = [1,2]
[i for i in L1 if i in L2]''',
'''S1 = set([2,3,4])
S2 = set([1,2])
S1.intersection(S2)''',
'''a = [1,2]
b = [2,3,4]
any(x in a for x in b)''']
for method in methods:
    print timeit(method, number=10000)
print
methods = ['''tocheck = range(200,300) # items to check
a = range(2, 10000) # the list
a = set(a) # convert to set (O(n))
[i for i in tocheck if i in a] # check items (O(m))''',
'''L1 = range(2, 10000)
L2 = range(200,300)
[i for i in L1 if i in L2]''',
'''S1 = set(range(2, 10000))
S2 = set(range(200,300))
S1.intersection(S2)''',
'''a = range(200,300)
b = range(2, 10000)
any(x in a for x in b)''']
for method in methods:
    print timeit(method, number=1000)
speeds:
M1: 0.0170331001282 # make one set
M2: 0.0164539813995 # list comprehension
M3: 0.0286040306091 # set intersection
M4: 0.0305438041687 # any
M1: 0.49850320816 # make one set
M2: 25.2735087872 # list comprehension
M3: 0.466138124466 # set intersection
M4: 0.668627977371 # any
The method that is consistently fast is to make one set (of the list), but the intersection works on large data sets the best!
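The "make one set, then check each item" approach can be wrapped in a small helper. This is only a sketch in Python 3 syntax; the name `contains_any` and its parameter names are made up for illustration, and it assumes every element is hashable:

```python
def contains_any(candidates, items):
    """Return True if at least one of `candidates` occurs in `items`."""
    lookup = set(items)  # one O(len(items)) pass to build the hash set
    # any() stops at the first hit; each membership test is O(1) on average
    return any(c in lookup for c in candidates)

print(contains_any([1, 2], [2, 3, 4]))
print(contains_any(range(200, 300), range(2, 10000)))
print(contains_any([0, 1], [2, 3, 4]))
```

Unlike the list comprehension in the answer, this stops as soon as one match is found, which is enough when only a yes/no result is needed.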
Answer 9
In some cases (e.g. unique list elements), set operations can be used.
>>> a=[2,3,4]
>>> set(a) - set([2,3]) != set(a)
True
>>>
Or, using set.isdisjoint(),
>>> not set(a).isdisjoint(set([2,3]))
True
>>> not set(a).isdisjoint(set([5,6]))
False
>>>
Answer 10
This will do it in one line.
>>> a=[2,3,4]
>>> b=[1,2]
>>> bool(sum(map(lambda x: x in b, a)))
True
Answer 11
I collected several of the solutions mentioned in other answers and in comments, then ran a speed test. not set(a).isdisjoint(b) turned out to be the fastest, and it also did not slow down much when the result was False.
Each of the three runs tests a small sample of the possible configurations of a and b. The times are in microseconds.
Any with generator and max
2.093 1.997 7.879
Any with generator
0.907 0.692 2.337
Any with list
1.294 1.452 2.137
True in list
1.219 1.348 2.148
Set with &
1.364 1.749 1.412
Set intersection explicit set(b)
1.424 1.787 1.517
Set intersection implicit set(b)
0.964 1.298 0.976
Set isdisjoint explicit set(b)
1.062 1.094 1.241
Set isdisjoint implicit set(b)
0.622 0.621 0.753
import timeit
def printtimes(t):
    print '{:.3f}'.format(t/10.0),
setup1 = 'a = range(10); b = range(9,15)'
setup2 = 'a = range(10); b = range(10)'
setup3 = 'a = range(10); b = range(10,20)'
print 'Any with generator and max\n\t',
printtimes(timeit.Timer('any(x in max(a,b,key=len) for x in min(b,a,key=len))',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('any(x in max(a,b,key=len) for x in min(b,a,key=len))',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('any(x in max(a,b,key=len) for x in min(b,a,key=len))',setup=setup3).timeit(10000000))
print
print 'Any with generator\n\t',
printtimes(timeit.Timer('any(i in a for i in b)',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('any(i in a for i in b)',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('any(i in a for i in b)',setup=setup3).timeit(10000000))
print
print 'Any with list\n\t',
printtimes(timeit.Timer('any([i in a for i in b])',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('any([i in a for i in b])',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('any([i in a for i in b])',setup=setup3).timeit(10000000))
print
print 'True in list\n\t',
printtimes(timeit.Timer('True in [i in a for i in b]',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('True in [i in a for i in b]',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('True in [i in a for i in b]',setup=setup3).timeit(10000000))
print
print 'Set with &\n\t',
printtimes(timeit.Timer('bool(set(a) & set(b))',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('bool(set(a) & set(b))',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('bool(set(a) & set(b))',setup=setup3).timeit(10000000))
print
print 'Set intersection explicit set(b)\n\t',
printtimes(timeit.Timer('bool(set(a).intersection(set(b)))',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('bool(set(a).intersection(set(b)))',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('bool(set(a).intersection(set(b)))',setup=setup3).timeit(10000000))
print
print 'Set intersection implicit set(b)\n\t',
printtimes(timeit.Timer('bool(set(a).intersection(b))',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('bool(set(a).intersection(b))',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('bool(set(a).intersection(b))',setup=setup3).timeit(10000000))
print
print 'Set isdisjoint explicit set(b)\n\t',
printtimes(timeit.Timer('not set(a).isdisjoint(set(b))',setup=setup1).timeit(10000000))
printtimes(timeit.Timer('not set(a).isdisjoint(set(b))',setup=setup2).timeit(10000000))
printtimes(timeit.Timer('not set(a).isdisjoint(set(b))',setup=setup3).timeit(10000000))
print
print 'Set isdisjoint implicit set(b)\n\t',
printtimes(timeit.Timer('not set(a).isdisjoint(b)',setup=setup1).timeit(10000000))
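printtimes(timeit.Timer('not set(a).isdisjoint(b)',setup=setup1).timeit(10000000))  # as posted, this reuses setup1 rather than setup2, so the middle timing in this row repeats the setup1 case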
printtimes(timeit.Timer('not set(a).isdisjoint(b)',setup=setup3).timeit(10000000))
print
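For everyday use, the variant that came out fastest above can be wrapped in a one-line helper. This is a sketch in Python 3 syntax rather than part of the benchmark; `shares_item` is a hypothetical name, and the elements of `a` are assumed to be hashable:

```python
def shares_item(a, b):
    """Return True if iterables `a` and `b` have at least one item in common."""
    # isdisjoint() accepts any iterable, so only `a` is converted to a set.
    return not set(a).isdisjoint(b)

print(shares_item(range(10), range(9, 15)))   # overlap at 9
print(shares_item(range(10), range(10, 20)))  # no overlap
```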
Answer 12
I have to say that my situation might not be what you are looking for, but it may provide an alternative to your thinking.
I have tried both the set() and any() methods but still had problems with speed. Then I remembered that Raymond Hettinger said everything in Python is a dictionary and that you should use a dict whenever you can. So that’s what I tried.
I used a defaultdict with int to indicate negative results and used the items in the first list as keys for the second list (converted to a defaultdict). Because a dict gives you instant lookup, you know immediately whether an item exists in the defaultdict. I know you don’t always get to change the data structure of your second list, but if you are able to from the start, then it’s much faster. You may have to convert list2 (the larger list) to a defaultdict, where each key is a potential value you want to check from the small list, and each value is either 1 (hit) or 0 (no hit, the default).
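from collections import defaultdict

already_indexed = defaultdict(int)  # missing keys default to 0 (no hit)

def check_exist(small_list, default_list):
    for item in small_list:
        if default_list[item] == 1:  # note: looking up a missing key also inserts it with value 0
            return True
    return False

# The original fragment uses `continue`, so it presumably ran inside a loop;
# `candidate_lists` is a hypothetical name for whatever that loop iterated over.
candidate_lists = [[1, 2], [2, 3], [4, 5]]
for small_list in candidate_lists:
    if check_exist(small_list, already_indexed):
        continue  # at least one item was indexed earlier
    else:
        for x in small_list:
            already_indexed[x] = 1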
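For comparison, the same bookkeeping can be sketched with a plain set in place of the defaultdict. This swaps in a different data structure from the one the answer uses, and the function name `first_unseen_batches` and the sample data are hypothetical (Python 3 syntax):

```python
def first_unseen_batches(batches):
    """Keep each batch that shares no item with the batches kept so far."""
    already_indexed = set()
    kept = []
    for small_list in batches:
        if not already_indexed.isdisjoint(small_list):
            continue  # at least one item was seen before
        already_indexed.update(small_list)  # remember every item of this batch
        kept.append(small_list)
    return kept

print(first_unseen_batches([[1, 2], [2, 3], [4, 5], [5, 6]]))
```

A set lookup does not insert missing keys, so the index only grows when a batch is actually kept.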
Answer 13
Simple.
_new_list = []
for item in a:
    if item in b:
        _new_list.append(item)
    else:
        pass
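The loop above collects every common item. If only a yes/no answer is needed, a variant that returns at the first match might look like the sketch below; `any_in` is a hypothetical name and the code is written for Python 3:

```python
def any_in(a, b):
    """Return True as soon as one item of `a` is found in `b`."""
    for item in a:
        if item in b:
            return True  # stop at the first match
    return False

print(any_in([1, 2], [2, 3, 4]))
print(any_in([1, 5], [2, 3, 4]))
```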