问题:如何验证一个列表是否是另一个列表的子集?
我需要验证列表是否是另一个列表的子集-我想要的只是布尔返回值。
在相交之后在较小列表上测试相等性是最快的方法吗?鉴于需要比较的数据集数量,性能至关重要。
根据讨论添加更多事实:
在许多测试中,两个列表中的两个列表是否相同?它作为静态查找表之一来执行。
需要列表吗?事实并非如此-静态查找表可以是执行效果最好的任何内容。动态命令是一个字典,我们从中提取密钥以执行静态查找。
在这种情况下,最佳解决方案是什么?
I need to verify if a list is a subset of another – a boolean return is all I seek.
Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.
Adding further facts based on discussions:
Will either of the lists be the same for many tests? It does as one of them is a static lookup table.
Does it need to be a list? It does not – the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.
What would be the optimal solution given the scenario?
回答 0
Python为此提供的性能函数是set.issubset
。但是,它确实有一些限制,使得不清楚是否是您问题的答案。
列表可能包含多个项目并具有特定顺序。一套没有。此外,设置仅适用于可哈希对象。
您是在询问子集还是子序列(这意味着您需要一个字符串搜索算法)?在许多测试中,两个列表中的两个列表是否相同?列表中包含哪些数据类型?而且,这是否需要列出清单?
您的其他文章与字典和列表相交,使类型更清晰,并且确实推荐使用字典键视图来实现类似集合的功能。在那种情况下,之所以可以工作是因为字典键的行为就像一个集合(以至于在我们使用Python进行集合之前,我们都使用字典)。一个人想知道问题如何在三个小时内变得不那么具体。
The performant function Python provides for this is set.issubset
. It does have a few restrictions that make it unclear if it’s the answer to your question, however.
A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.
Are you asking about subset or subsequence (which means you’ll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?
Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.
回答 1
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>>
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>>
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
回答 2
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
all(x in two for x in one)
说明:生成器通过遍历列表one
检查该项是否在list中来创建布尔值two
。 如果每个项目都是真实的,则返回,否则all()
返回。True
False
还有一个优点是all
,在缺少元素的第一个实例上返回False,而不必处理每个项目。
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
all(x in two for x in one)
Explanation: Generator creating booleans by looping through list one
checking if that item is in list two
. all()
returns True
if every item is truthy, else False
.
There is also an advantage that all
return False on the first instance of a missing element rather than having to process every item.
回答 3
假设项目是可哈希的
>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False
如果您不在乎重复的项目,例如 [1, 2, 2]
并且[1, 2]
然后只需使用:
>>> set([1, 2, 2]).issubset([1, 2])
True
在相交之后在较小列表上测试相等性是最快的方法吗?
.issubset
将是最快的方法。在测试之前检查长度issubset
不会提高速度,因为您仍然有O(N + M)个项目要进行迭代和检查。
Assuming the items are hashable
>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False
If you don’t care about duplicate items eg. [1, 2, 2]
and [1, 2]
then just use:
>>> set([1, 2, 2]).issubset([1, 2])
True
Is testing equality on the smaller list after an intersection the fastest way to do this?
.issubset
will be the fastest way to do it. Checking the length before testing issubset
will not improve speed because you still have O(N + M) items to iterate through and check.
回答 4
另一种解决方案是使用intersection
。
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one).intersection(set(two)) == set(one)
集的交集将包含 set one
(要么)
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one) & (set(two)) == set(one)
One more solution would be to use a intersection
.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one).intersection(set(two)) == set(one)
The intersection of the sets would contain of set one
(OR)
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one) & (set(two)) == set(one)
回答 5
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(x in two for x in one) == set([True])
如果list1在列表2中:
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(x in two for x in one) == set([True])
If list1 is in list 2:
回答 6
集合论不适用于列表,因为重复使用集合论会导致错误的答案。
例如:
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)
没有任何意义。是的,它给出了错误的答案,但这是不正确的,因为集合论只是在进行比较:1,3,5与1,3,4,5。您必须包括所有重复项。
取而代之的是,您必须计算每个项目的每次出现次数,并进行大于等于的检查。这不是很昂贵,因为它不使用O(N ^ 2)运算,并且不需要快速排序。
#!/usr/bin/env python
from collections import Counter
def containedInFirst(a, b):
a_count = Counter(a)
b_count = Counter(b)
for key in b_count:
if a_count.has_key(key) == False:
return False
if b_count[key] > a_count[key]:
return False
return True
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)
a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)
然后运行它,您将得到:
$ python contained.py
b in a: False
b in a: True
Set theory is inappropriate for lists since duplicates will result in wrong answers using set theory.
For example:
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)
has no meaning. Yes, it gives a false answer but this is not correct since set theory is just comparing: 1,3,5 versus 1,3,4,5. You must include all duplicates.
Instead you must count each occurrence of each item and do a greater than equal to check. This is not very expensive, because it is not using O(N^2) operations and does not require quick sort.
#!/usr/bin/env python
from collections import Counter
def containedInFirst(a, b):
a_count = Counter(a)
b_count = Counter(b)
for key in b_count:
if a_count.has_key(key) == False:
return False
if b_count[key] > a_count[key]:
return False
return True
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)
a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)
Then running this you get:
$ python contained.py
b in a: False
b in a: True
回答 7
如果我迟到聚会请原谅。;)
要检查一个set A
是否是set B
,Python
具有A.issubset(B)
和的子集A <= B
。它set
只能工作,而且效果很好,但是内部实现的复杂性是未知的。参考:https : //docs.python.org/2/library/sets.html#set-objects
我想出了一种算法来检查是否list A
是list B
以下注释的子集。
检查一个列表是否是另一个列表的子集
is_subset = True;
A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];
print(A, B);
# skip checking if list A has elements more than list B
if len(A) > len(B):
is_subset = False;
else:
# complexity of sorting using quicksort or merge sort: O(n ln n)
# use best sorting algorithm available to minimize complexity
A.sort();
B.sort();
print(A, B);
# complexity: O(n^2)
# for a in A:
# if a not in B:
# is_subset = False;
# break;
# complexity: O(n)
is_found = False;
last_index_j = 0;
for i in range(len(A)):
for j in range(last_index_j, len(B)):
is_found = False;
print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");
if B[j] <= A[i]:
if A[i] == B[j]:
is_found = True;
last_index_j = j;
else:
is_found = False;
break;
if is_found:
print("Found: " + str(A[i]));
last_index_j = last_index_j + 1;
break;
else:
print("Not found: " + str(A[i]));
if is_found == False:
is_subset = False;
break;
print("subset") if is_subset else print("not subset");
输出量
[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset
Pardon me if I am late to the party. ;)
To check if one set A
is subset of set B
, Python
has A.issubset(B)
and A <= B
. It works on set
only and works great BUT the complexity of internal implementation is unknown. Reference: https://docs.python.org/2/library/sets.html#set-objects
I came up with an algorithm to check if list A
is a subset of list B
with following remarks.
- To reduce complexity of finding subset, I find it appropriate to
sort
both lists first before comparing elements to qualify for
subset.
- It helped me to
break
the loop
when value of element of second list B[j]
is greater than value of element of first list A[i]
.
last_index_j
is used to start loop
over list B
where it last left off. It helps avoid starting comparisons from the start of
list B
(which is, as you might guess unnecessary, to start list B
from index 0
in subsequent iterations
.)
Complexity will be O(n ln n)
each for sorting both lists and O(n)
for checking for subset.
O(n ln n) + O(n ln n) + O(n) = O(n ln n)
.
Code has lots of print
statements to see what’s going on at each iteration
of the loop
. These are meant for understanding
only.
Check if one list is subset of another list
is_subset = True;
A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];
print(A, B);
# skip checking if list A has elements more than list B
if len(A) > len(B):
is_subset = False;
else:
# complexity of sorting using quicksort or merge sort: O(n ln n)
# use best sorting algorithm available to minimize complexity
A.sort();
B.sort();
print(A, B);
# complexity: O(n^2)
# for a in A:
# if a not in B:
# is_subset = False;
# break;
# complexity: O(n)
is_found = False;
last_index_j = 0;
for i in range(len(A)):
for j in range(last_index_j, len(B)):
is_found = False;
print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");
if B[j] <= A[i]:
if A[i] == B[j]:
is_found = True;
last_index_j = j;
else:
is_found = False;
break;
if is_found:
print("Found: " + str(A[i]));
last_index_j = last_index_j + 1;
break;
else:
print("Not found: " + str(A[i]));
if is_found == False:
is_subset = False;
break;
print("subset") if is_subset else print("not subset");
Output
[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset
回答 8
下面的代码检查给定集合是否为另一个集合的“适当子集”
def is_proper_subset(set, superset):
return all(x in superset for x in set) and len(set)<len(superset)
Below code checks whether a given set is a “proper subset” of another set
def is_proper_subset(set, superset):
return all(x in superset for x in set) and len(set)<len(superset)
回答 9
在python 3.5中,您可以执行[*set()][index]
来获取元素。这是比其他方法慢得多的解决方案。
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
result = set(x in two for x in one)
[*result][0] == True
或仅与len和set
len(set(a+b)) == len(set(a))
In python 3.5 you can do a [*set()][index]
to get the element. It is much slower solution than other methods.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
result = set(x in two for x in one)
[*result][0] == True
or just with len and set
len(set(a+b)) == len(set(a))
回答 10
这是我如何知道一个列表是否是另一个列表的子集,对于我来说,顺序对我很重要。
def is_subset(list_long,list_short):
short_length = len(list_short)
subset_list = []
for i in range(len(list_long)-short_length+1):
subset_list.append(list_long[i:i+short_length])
if list_short in subset_list:
return True
else: return False
Here is how I know if one list is a subset of another one, the sequence matters to me in my case.
def is_subset(list_long,list_short):
short_length = len(list_short)
subset_list = []
for i in range(len(list_long)-short_length+1):
subset_list.append(list_long[i:i+short_length])
if list_short in subset_list:
return True
else: return False
回答 11
大多数解决方案都认为列表没有重复项。如果您的列表确实有重复,可以尝试以下操作:
def isSubList(subList,mlist):
uniqueElements=set(subList)
for e in uniqueElements:
if subList.count(e) > mlist.count(e):
return False
# It is sublist
return True
这样可以确保子列表中的元素绝不会与列表中的元素不同,也不会包含更多的公共元素。
lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]
print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False
Most of the solutions consider that the lists do not have duplicates. In case your lists do have duplicates you can try this:
def isSubList(subList,mlist):
uniqueElements=set(subList)
for e in uniqueElements:
if subList.count(e) > mlist.count(e):
return False
# It is sublist
return True
It ensures the sublist never has different elements than list or a greater amount of a common element.
lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]
print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False
回答 12
由于没有人考虑过将字符串与字符串进行比较,因此这是我的建议。
当然,您可能想检查管道(“ |”)是否不在两个列表中,并且可能自动选择另一个字符,但是您明白了。
使用空字符串作为分隔符不是解决方案,因为数字可以有几个数字([12,3]!= [1,23])
def issublist(l1,l2):
return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])
Since no one has considered comparing two strings, here’s my proposal.
You may of course want to check if the pipe (“|”) is not part of either lists and maybe chose automatically another char, but you got the idea.
Using an empty string as separator is not a solution since the numbers can have several digits ([12,3] != [1,23])
def issublist(l1,l2):
return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])
回答 13
如果您要问一个列表是否“包含”在另一个列表中,则:
>>>if listA in listB: return True
如果您要询问listA中的每个元素在listB中是否具有相等数量的匹配元素,请尝试:
all(True if listA.count(item) <= listB.count(item) else False for item in listA)
If you are asking if one list is “contained” in another list then:
>>>if listA in listB: return True
If you are asking if each element in listA has an equal number of matching elements in listB try:
all(True if listA.count(item) <= listB.count(item) else False for item in listA)
回答 14
如果a2 is subset of a1
,那么Length of set(a1 + a2) == Length of set(a1)
a1 = [1, 2, 3, 4, 5]
a2 = [1, 2, 3]
len(set(a1)) == len(set(a1 + a2))
If a2 is subset of a1
, then Length of set(a1 + a2) == Length of set(a1)
a1 = [1, 2, 3, 4, 5]
a2 = [1, 2, 3]
len(set(a1)) == len(set(a1 + a2))