如何验证一个列表是否是另一个列表的子集?

问题:如何验证一个列表是否是另一个列表的子集?

我需要验证列表是否是另一个列表的子集-我想要的只是布尔返回值。

在相交之后在较小列表上测试相等性是最快的方法吗?鉴于需要比较的数据集数量,性能至关重要。

根据讨论添加更多事实:

  1. 在许多测试中,两个列表中的两个列表是否相同?它作为静态查找表之一来执行。

  2. 需要列表吗?事实并非如此-静态查找表可以是执行效果最好的任何内容。动态命令是一个字典,我们从中提取密钥以执行静态查找。

在这种情况下,最佳解决方案是什么?

I need to verify if a list is a subset of another – a boolean return is all I seek.

Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.

Adding further facts based on discussions:

  1. Will either of the lists be the same for many tests? It does as one of them is a static lookup table.

  2. Does it need to be a list? It does not – the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.

What would be the optimal solution given the scenario?


回答 0

Python为此提供的性能函数是set.issubset。但是,它确实有一些限制,使得不清楚是否是您问题的答案。

列表可能包含多个项目并具有特定顺序。一套没有。此外,设置仅适用于可哈希对象。

您是在询问子集还是子序列(这意味着您需要一个字符串搜索算法)?在许多测试中,两个列表中的两个列表是否相同?列表中包含哪些数据类型?而且,这是否需要列出清单?

您的其他文章与字典和列表相交,使类型更清晰,并且确实推荐使用字典键视图来实现类似集合的功能。在那种情况下,之所以可以工作是因为字典键的行为就像一个集合(以至于在我们使用Python进行集合之前,我们都使用字典)。一个人想知道问题如何在三个小时内变得不那么具体。

The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it’s the answer to your question, however.

A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.

Are you asking about subset or subsequence (which means you’ll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?

Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.


回答 1

>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>> 
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>> 
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

回答 2

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

all(x in two for x in one)

说明:生成器通过遍历列表one检查该项是否在list中来创建布尔值two。 如果每个项目都是真实的,则返回,否则all()返回。TrueFalse

还有一个优点是all,在缺少元素的第一个实例上返回False,而不必处理每个项目。

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

all(x in two for x in one)

Explanation: Generator creating booleans by looping through list one checking if that item is in list two. all() returns True if every item is truthy, else False.

There is also an advantage that all return False on the first instance of a missing element rather than having to process every item.


回答 3

假设项目是可哈希的

>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False

如果您不在乎重复的项目,例如 [1, 2, 2]并且[1, 2]然后只需使用:

>>> set([1, 2, 2]).issubset([1, 2])
True

在相交之后在较小列表上测试相等性是最快的方法吗?

.issubset将是最快的方法。在测试之前检查长度issubset不会提高速度,因为您仍然有O(N + M)个项目要进行迭代和检查。

Assuming the items are hashable

>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False

If you don’t care about duplicate items eg. [1, 2, 2] and [1, 2] then just use:

>>> set([1, 2, 2]).issubset([1, 2])
True

Is testing equality on the smaller list after an intersection the fastest way to do this?

.issubset will be the fastest way to do it. Checking the length before testing issubset will not improve speed because you still have O(N + M) items to iterate through and check.


回答 4

另一种解决方案是使用intersection

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one).intersection(set(two)) == set(one)

集的交集将包含 set one

(要么)

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one) & (set(two)) == set(one)

One more solution would be to use a intersection.

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one).intersection(set(two)) == set(one)

The intersection of the sets would contain of set one

(OR)

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one) & (set(two)) == set(one)

回答 5

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(x in two for x in one) == set([True])

如果list1在列表2中:

  • (x in two for x in one)生成的列表True

  • 当我们做一个set(x in two for x in one)只有一个元素(真)。

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(x in two for x in one) == set([True])

If list1 is in list 2:

  • (x in two for x in one) generates a list of True.

  • when we do a set(x in two for x in one) has only one element (True).


回答 6

集合论不适用于列表,因为重复使用集合论会导致错误的答案。

例如:

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)

没有任何意义。是的,它给出了错误的答案,但这是不正确的,因为集合论只是在进行比较:1,3,5与1,3,4,5。您必须包括所有重复项。

取而代之的是,您必须计算每个项目的每次出现次数,并进行大于等于的检查。这不是很昂贵,因为它不使用O(N ^ 2)运算,并且不需要快速排序。

#!/usr/bin/env python

from collections import Counter

def containedInFirst(a, b):
  a_count = Counter(a)
  b_count = Counter(b)
  for key in b_count:
    if a_count.has_key(key) == False:
      return False
    if b_count[key] > a_count[key]:
      return False
  return True


a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

然后运行它,您将得到:

$ python contained.py 
b in a:  False
b in a:  True

Set theory is inappropriate for lists since duplicates will result in wrong answers using set theory.

For example:

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)

has no meaning. Yes, it gives a false answer but this is not correct since set theory is just comparing: 1,3,5 versus 1,3,4,5. You must include all duplicates.

Instead you must count each occurrence of each item and do a greater than equal to check. This is not very expensive, because it is not using O(N^2) operations and does not require quick sort.

#!/usr/bin/env python

from collections import Counter

def containedInFirst(a, b):
  a_count = Counter(a)
  b_count = Counter(b)
  for key in b_count:
    if a_count.has_key(key) == False:
      return False
    if b_count[key] > a_count[key]:
      return False
  return True


a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

Then running this you get:

$ python contained.py 
b in a:  False
b in a:  True

回答 7

如果我迟到聚会请原谅。;)

要检查一个set A是否是set BPython具有A.issubset(B)和的子集A <= B。它set只能工作,而且效果很好,但是内部实现的复杂性是未知的。参考:https : //docs.python.org/2/library/sets.html#set-objects

我想出了一种算法来检查是否list Alist B以下注释的子集。

  • 为了降低查找子集的复杂性,我发现 sort先比较两个列表,然后再比较要符合子集条件的元素。
  • 它帮助我breakloop当第二列表元素的值B[j]比第一个列表元素的值A[i]
  • last_index_j用于启动looplist B其最后一次离开。它有助于避免从的开头开始进行比较 list B(您可能会不必要地list Bindex 0随后的开始iterations)进行比较。
  • O(n ln n)排序列表和O(n)检查子集都将很复杂。
    O(n ln n) + O(n ln n) + O(n) = O(n ln n)

  • 代码中有很多print语句,以查看每个处iteration发生了什么loop。这些仅用于理解。

检查一个列表是否是另一个列表的子集

is_subset = True;

A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];

print(A, B);

# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False;
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort();
    B.sort();

    print(A, B);

    # complexity: O(n^2)
    # for a in A:
    #   if a not in B:
    #       is_subset = False;
    #       break;

    # complexity: O(n)
    is_found = False;
    last_index_j = 0;

    for i in range(len(A)):
        for j in range(last_index_j, len(B)):
            is_found = False;

            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");

            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True;
                last_index_j = j;
            else:
                is_found = False;
                break;

            if is_found:
                print("Found: " + str(A[i]));
                last_index_j = last_index_j + 1;
                break;
            else:
                print("Not found: " + str(A[i]));

        if is_found == False:
            is_subset = False;
            break;

print("subset") if is_subset else print("not subset");

输出量

[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset

Pardon me if I am late to the party. ;)

To check if one set A is subset of set B, Python has A.issubset(B) and A <= B. It works on set only and works great BUT the complexity of internal implementation is unknown. Reference: https://docs.python.org/2/library/sets.html#set-objects

I came up with an algorithm to check if list A is a subset of list B with following remarks.

  • To reduce complexity of finding subset, I find it appropriate to sort both lists first before comparing elements to qualify for subset.
  • It helped me to break the loop when value of element of second list B[j] is greater than value of element of first list A[i].
  • last_index_j is used to start loop over list B where it last left off. It helps avoid starting comparisons from the start of list B (which is, as you might guess unnecessary, to start list B from index 0 in subsequent iterations.)
  • Complexity will be O(n ln n) each for sorting both lists and O(n) for checking for subset.
    O(n ln n) + O(n ln n) + O(n) = O(n ln n).

  • Code has lots of print statements to see what’s going on at each iteration of the loop. These are meant for understanding only.

Check if one list is subset of another list

is_subset = True;

A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];

print(A, B);

# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False;
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort();
    B.sort();

    print(A, B);

    # complexity: O(n^2)
    # for a in A:
    #   if a not in B:
    #       is_subset = False;
    #       break;

    # complexity: O(n)
    is_found = False;
    last_index_j = 0;

    for i in range(len(A)):
        for j in range(last_index_j, len(B)):
            is_found = False;

            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");

            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True;
                last_index_j = j;
            else:
                is_found = False;
                break;

            if is_found:
                print("Found: " + str(A[i]));
                last_index_j = last_index_j + 1;
                break;
            else:
                print("Not found: " + str(A[i]));

        if is_found == False:
            is_subset = False;
            break;

print("subset") if is_subset else print("not subset");

Output

[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset

回答 8

下面的代码检查给定集合是否为另一个集合的“适当子集”

 def is_proper_subset(set, superset):
     return all(x in superset for x in set) and len(set)<len(superset)

Below code checks whether a given set is a “proper subset” of another set

 def is_proper_subset(set, superset):
     return all(x in superset for x in set) and len(set)<len(superset)

回答 9

在python 3.5中,您可以执行[*set()][index]来获取元素。这是比其他方法慢得多的解决方案。

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

result = set(x in two for x in one)

[*result][0] == True

或仅与len和set

len(set(a+b)) == len(set(a))

In python 3.5 you can do a [*set()][index] to get the element. It is much slower solution than other methods.

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

result = set(x in two for x in one)

[*result][0] == True

or just with len and set

len(set(a+b)) == len(set(a))

回答 10

这是我如何知道一个列表是否是另一个列表的子集,对于我来说,顺序对我很重要。

def is_subset(list_long,list_short):
    short_length = len(list_short)
    subset_list = []
    for i in range(len(list_long)-short_length+1):
        subset_list.append(list_long[i:i+short_length])
    if list_short in subset_list:
        return True
    else: return False

Here is how I know if one list is a subset of another one, the sequence matters to me in my case.

def is_subset(list_long,list_short):
    short_length = len(list_short)
    subset_list = []
    for i in range(len(list_long)-short_length+1):
        subset_list.append(list_long[i:i+short_length])
    if list_short in subset_list:
        return True
    else: return False

回答 11

大多数解决方案都认为列表没有重复项。如果您的列表确实有重复,可以尝试以下操作:

def isSubList(subList,mlist):
    uniqueElements=set(subList)
    for e in uniqueElements:
        if subList.count(e) > mlist.count(e):
            return False     
    # It is sublist
    return True

这样可以确保子列表中的元素绝不会与列表中的元素不同,也不会包含更多的公共元素。

lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]

print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False

Most of the solutions consider that the lists do not have duplicates. In case your lists do have duplicates you can try this:

def isSubList(subList,mlist):
    uniqueElements=set(subList)
    for e in uniqueElements:
        if subList.count(e) > mlist.count(e):
            return False     
    # It is sublist
    return True

It ensures the sublist never has different elements than list or a greater amount of a common element.

lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]

print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False

回答 12

由于没有人考虑过将字符串与字符串进行比较,因此这是我的建议。

当然,您可能想检查管道(“ |”)是否不在两个列表中,并且可能自动选择另一个字符,但是您明白了。

使用空字符串作为分隔符不是解决方案,因为数字可以有几个数字([12,3]!= [1,23])

def issublist(l1,l2):
    return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])

Since no one has considered comparing two strings, here’s my proposal.

You may of course want to check if the pipe (“|”) is not part of either lists and maybe chose automatically another char, but you got the idea.

Using an empty string as separator is not a solution since the numbers can have several digits ([12,3] != [1,23])

def issublist(l1,l2):
    return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])

回答 13

如果您要问一个列表是否“包含”在另一个列表中,则:

>>>if listA in listB: return True

如果您要询问listA中的每个元素在listB中是否具有相等数量的匹配元素,请尝试:

all(True if listA.count(item) <= listB.count(item) else False for item in listA)

If you are asking if one list is “contained” in another list then:

>>>if listA in listB: return True

If you are asking if each element in listA has an equal number of matching elements in listB try:

all(True if listA.count(item) <= listB.count(item) else False for item in listA)

回答 14

如果a2 is subset of a1,那么Length of set(a1 + a2) == Length of set(a1)

a1 = [1, 2, 3, 4, 5]
a2 = [1, 2, 3]

len(set(a1)) == len(set(a1 + a2))

If a2 is subset of a1, then Length of set(a1 + a2) == Length of set(a1)

a1 = [1, 2, 3, 4, 5]
a2 = [1, 2, 3]

len(set(a1)) == len(set(a1 + a2))