Question: Can Python test the membership of multiple values in a list?

I want to test whether two or more values are all members of a list, but I’m getting an unexpected result:

>>> 'a','b' in ['b', 'a', 'foo', 'bar']
('a', True)

So, can Python test the membership of multiple values at once in a list? What does that result mean?


Answer 0

This does what you want, and will work in nearly all cases:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
True

The expression 'a','b' in ['b', 'a', 'foo', 'bar'] doesn’t work as expected because Python interprets it as a tuple whose second element is the result of the membership test:

>>> 'a', 'b'
('a', 'b')
>>> 'a', 5 + 2
('a', 7)
>>> 'a', 'x' in 'xerxes'
('a', True)
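
To make the grouping explicit, here is a small sketch contrasting three spellings of the test. The variable names are only illustrative and are not from the original answer.

```python
values = ['b', 'a', 'foo', 'bar']

# The comma binds last, so this builds the tuple ('a', ('b' in values)).
unparenthesized = 'a', 'b' in values

# Parenthesizing the pair asks a different question: is the tuple
# ('a', 'b') itself an element of the list?
tuple_is_member = ('a', 'b') in values

# Testing each value separately is what the question is actually after.
both_present = 'a' in values and 'b' in values

print(unparenthesized, tuple_is_member, both_present)
```

Adding parentheses around the pair does not help here, since that only changes which single object is looked up.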

Other Options

There are other ways to execute this test, but they won’t work for as many different kinds of inputs. As Kabie points out, you can solve this problem using sets…

>>> set(['a', 'b']).issubset(set(['a', 'b', 'foo', 'bar']))
True
>>> {'a', 'b'} <= {'a', 'b', 'foo', 'bar'}
True

…sometimes:

>>> {'a', ['b']} <= {'a', ['b'], 'foo', 'bar'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Sets can only be created with hashable elements. But the generator expression all(x in container for x in items) can handle almost any container type. The only requirement is that container be re-iterable (i.e. not a generator). items can be any iterable at all.

>>> container = [['b'], 'a', 'foo', 'bar']
>>> items = (i for i in ('a', ['b']))
>>> all(x in container for x in items)
True

Speed Tests

In many cases, the subset test will be faster than all, but the difference isn’t shocking. (The question is moot, of course, when sets aren’t an option at all.) Converting lists to sets just for the purpose of a test like this won’t always be worth the trouble. And converting generators to sets can sometimes be incredibly wasteful, slowing programs down by many orders of magnitude.

Here are a few benchmarks for illustration. The biggest difference comes when both container and items are relatively small. In that case, the subset approach is about an order of magnitude faster:

>>> smallset = set(range(10))
>>> smallsubset = set(range(5))
>>> %timeit smallset >= smallsubset
110 ns ± 0.702 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit all(x in smallset for x in smallsubset)
951 ns ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This looks like a big difference. But as long as container is a set, all is still perfectly usable at vastly larger scales:

>>> bigset = set(range(100000))
>>> bigsubset = set(range(50000))
>>> %timeit bigset >= bigsubset
1.14 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit all(x in bigset for x in bigsubset)
5.96 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using subset testing is still faster, but only by about 5x at this scale. The speed boost is due to Python’s fast C-backed implementation of set, but the fundamental algorithm is the same in both cases.
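
The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module; the function names and the choice of number=10 below are my own assumptions, and absolute timings will differ from machine to machine.

```python
import timeit

bigset = set(range(100000))
bigsubset = set(range(50000))

def subset_test():
    # Set comparison, carried out inside the set implementation.
    return bigset >= bigsubset

def all_test():
    # One membership lookup per element, driven from Python code.
    return all(x in bigset for x in bigsubset)

# number=10 keeps the run short; timeit returns the total seconds taken.
subset_seconds = timeit.timeit(subset_test, number=10)
all_seconds = timeit.timeit(all_test, number=10)
print(f"subset: {subset_seconds:.4f} s, all: {all_seconds:.4f} s")
```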

If your items are already stored in a list for other reasons, then you’ll have to convert them to a set before using the subset test approach. Then the speedup drops to a little under 3x:

>>> bigsubseq = list(bigsubset)  # definition not shown in the original; assumed here
>>> %timeit bigset >= set(bigsubseq)
2.1 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And if your container is a sequence, and needs to be converted first, then the speedup is even smaller:

>>> bigseq = list(bigset)  # definition not shown in the original; assumed here
>>> %timeit set(bigseq) >= set(bigsubseq)
4.36 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The only time we get disastrously slow results is when we leave container as a sequence:

>>> %timeit all(x in bigseq for x in bigsubseq)
184 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And of course, we’ll only do that if we must. If all the items in bigseq are hashable, then we’ll do this instead:

>>> %timeit bigset = set(bigseq); all(x in bigset for x in bigsubseq)
7.24 ms ± 78 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The alternative (set(bigseq) >= set(bigsubseq), timed above at 4.36 ms) is just 1.66x faster than that.

So subset testing is generally faster, but not by an incredible margin. On the other hand, let’s look at when all is faster. What if items is ten million values long, and is likely to have values that aren’t in container?

>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); set(bigset) >= set(hugeiter)
13.1 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); all(x in bigset for x in hugeiter)
2.33 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Converting the generator into a set turns out to be incredibly wasteful in this case. The set constructor has to consume the entire generator. But the short-circuiting behavior of all ensures that only a small portion of the generator needs to be consumed, so it’s faster than a subset test by nearly four orders of magnitude (13.1 s versus 2.33 ms).

This is an extreme example, admittedly. But as it shows, you can’t assume that one approach or the other will be faster in all cases.
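
To see the short-circuiting directly, the sketch below wraps the items in a small counting generator. The counting helper is hypothetical, written only for this illustration.

```python
def counting(iterable, counter):
    """Yield items from iterable, recording how many were requested."""
    for item in iterable:
        counter[0] += 1
        yield item

container = set(range(100000))
consumed = [0]

# The first value, -1, is missing from container, so all() can stop
# after a single lookup instead of walking the whole range.
items = counting(range(-1, 1000000), consumed)
result = all(x in container for x in items)

print(result, consumed[0])
```

Building set(items) first would have pulled every value through the generator before any comparison could start.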

The Upshot

Most of the time, converting container to a set is worth it, at least if all its elements are hashable. That’s because a membership test with in is O(1) on average for sets, while for sequences it is O(n).

On the other hand, using subset testing is probably only worth it sometimes. Definitely do it if your test items are already stored in a set. Otherwise, all is only a little slower, and doesn’t require any additional storage. It can also be used with large generators of items, and sometimes provides a massive speedup in that case.
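
Putting that advice together, one possible helper looks like the sketch below. The name contains_all is my own, and the function assumes container can be iterated more than once, as noted earlier.

```python
def contains_all(container, items):
    """Return True if every element of items is found in container.

    Hypothetical helper: uses a set for fast lookups when the container's
    elements are hashable, and falls back to the container otherwise.
    """
    try:
        lookup = set(container)
    except TypeError:
        # At least one element is unhashable, so keep the container as is.
        lookup = container

    def present(x):
        try:
            return x in lookup
        except TypeError:
            # x itself is unhashable, so a set lookup cannot be used for it.
            return x in container

    # all() stops at the first missing item, so items may be a generator.
    return all(present(x) for x in items)


print(contains_all(['b', 'a', 'foo', 'bar'], ['a', 'b']))
print(contains_all([['b'], 'a', 'foo', 'bar'], (i for i in ('a', ['b']))))
```

The fallback inside present keeps the list semantics for unhashable test items, which a bare set lookup would reject with a TypeError.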


Answer 1

Another way to do it:

>>> set(['a','b']).issubset( ['b','a','foo','bar'] )
True
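
One detail worth knowing, shown in the sketch below with illustrative variable names: the issubset method accepts any iterable, while the <= operator needs a set on both sides.

```python
wanted = {'a', 'b'}
values = ['b', 'a', 'foo', 'bar']

# The method form accepts the list directly.
method_result = wanted.issubset(values)

# The operator form raises TypeError when the right-hand side is a list.
try:
    operator_result = wanted <= values
except TypeError:
    operator_result = None  # comparison between set and list is unsupported

print(method_result, operator_result)
```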

Answer 2

I’m pretty sure in has higher precedence than the comma, so your statement is being interpreted as 'a', ('b' in ['b' ...]), which then evaluates to 'a', True since 'b' is in the list.

See the previous answer for how to do what you want.
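
If you want to check the grouping rather than take it on trust, the standard ast module can show how the expression is parsed. This is only a quick sketch; the variable names are mine.

```python
import ast

tree = ast.parse("'a', 'b' in ['b', 'a', 'foo', 'bar']", mode="eval")
top = tree.body

# The outermost node is the tuple; the comparison sits inside it
# as the tuple's second element.
print(type(top).__name__)                     # Tuple
print(type(top.elts[1]).__name__)             # Compare
print(type(top.elts[1].ops[0]).__name__)      # In
```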


Answer 3

If you want to check that all of your inputs match:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

If you want to check for at least one match:

>>> any(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
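
When the values are hashable, the same two questions can also be asked with set methods. This is a sketch with illustrative names, not part of the original answer.

```python
values = ['b', 'a', 'foo', 'bar']
wanted = ['a', 'z']

# Every wanted value present? Mirrors the all(...) form.
all_match = set(wanted).issubset(values)

# At least one wanted value present? Mirrors the any(...) form.
any_match = not set(wanted).isdisjoint(values)

print(all_match, any_match)
```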

Answer 4

The Python parser treats that statement as a tuple, where the first value is 'a' and the second value is the expression 'b' in ['b', 'a', 'foo', 'bar'] (which evaluates to True).

You can write a simple function to do what you want, though:

def all_in(candidates, sequence):
    for element in candidates:
        if element not in sequence:
            return False
    return True

And call it like this:

>>> all_in(('a', 'b'), ['b', 'a', 'foo', 'bar'])
True

Answer 5

[x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]

The reason I think this is better than the chosen answer is that you really don’t need to call the all() function: an empty list evaluates to False in if statements, and a non-empty list evaluates to True. Note that this checks whether at least one of the values is present (what any() tests), not whether all of them are.

if [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]:
    ...  # do something

Example:

>>> [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]
['a', 'b']
>>> [x for x in ['G','F'] if x in ['b', 'a', 'foo', 'bar']]
[]
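
If you want this filtering approach to answer the original question (are all the values present?), one option is to compare lengths. This is a sketch with illustrative names.

```python
values = ['b', 'a', 'foo', 'bar']
wanted = ['a', 'z']

# Keep only the wanted values that actually occur in values.
matches = [x for x in wanted if x in values]

at_least_one = bool(matches)             # what the bare `if` checks
every_one = len(matches) == len(wanted)  # stricter: nothing was filtered out

print(matches, at_least_one, every_one)
```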

Answer 6

I would say we can even leave those square brackets out.

array = ['b', 'a', 'foo', 'bar']
all(i in array for i in ('a', 'b'))

Answer 7

Neither of the answers presented here handles repeated elements. For example, if you are testing whether [1,2,2] is a sublist of [1,2,3,4], both will return True. That may be what you mean to do, but I just wanted to clarify. If you want to return False for [1,2,2] in [1,2,3,4], you would need to sort both lists and check each item with a moving index on each list. It’s just a slightly more complicated for loop.
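
As an alternative to the sort-and-walk loop described above, here is a sketch that counts occurrences with collections.Counter instead. This swaps in a different technique from the one the answer describes, the function name is hypothetical, and it assumes all elements are hashable.

```python
from collections import Counter

def contains_with_multiplicity(container, items):
    """True if container holds each value at least as often as items does."""
    available = Counter(container)
    needed = Counter(items)
    # A Counter returns 0 for values it has never seen.
    return all(available[value] >= count for value, count in needed.items())


print(contains_with_multiplicity([1, 2, 3, 4], [1, 2, 2]))
print(contains_with_multiplicity([1, 2, 2, 3], [1, 2, 2]))
```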


Answer 8

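How can you be Pythonic without lambdas? (Not to be taken seriously.) But this way works too, provided orig_array lists the test values in the same order as test_array:

orig_array = [ ..... ]
test_array = [ ... ]

# In Python 3, filter returns an iterator, so wrap it in list() before comparing.
list(filter(lambda x: x in test_array, orig_array)) == test_array

Leave out the comparison if you want to test whether any of the values are in the array:

# An empty list is falsy; a non-empty one is truthy.
list(filter(lambda x: x in test_array, orig_array))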

Answer 9

Here’s how I did it:

A = ['a','b','c']
B = ['c']
logic = [(x in B) for x in A]  # one True/False per element of A
if True in logic:              # at least one element of A is in B
    pass  # do something
