Python在一个列表中查找不在另一个列表中的元素[重复]

问题:Python在一个列表中查找不在另一个列表中的元素[重复]

我需要比较两个列表,以便创建在一个列表中找到但不在另一个列表中找到的特定元素的新列表。例如:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 

我想遍历list_1,并将list_2中所有在list_1中找不到的元素附加到main_list。

结果应为:

main_list=["f", "m"]

如何使用python做到这一点?

I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

main_list=["f", "m"]

How can I do it with python?


回答 0

TL; DR:
解决方案(1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

解决方案(2) 您需要一个排序列表

def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)




说明:
(1)可以使用与NumPy的setdiff1darray1array2assume_unique= False)。

assume_unique询问用户数组是否已经唯一。
如果为False,则首先确定唯一元素。
如果为True,则函数将假定元素已经是唯一的,并且函数将跳过确定唯一元素的操作。

这产生了独特的值array1不是array2assume_uniqueFalse默认。

如果您担心 唯一元素(基于Chinny84响应),则只需使用(其中assume_unique=False=>默认值):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2) 对于想要对答案进行排序的人,我做了一个自定义函数:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

要获得答案,请运行:

main_list = setdiff_sorted(list_2,list_1)

旁注:
(a)解决方案2(自定义函数setdiff_sorted)返回一个列表(与解决方案1中的数组相比)。

(b)如果不确定这些元素是否唯一,则只需setdiff1d在解决方案A和B中都使用NumPy的默认设置。并发症的例子是什么?见注释(c)。

(c)如果两个列表中的任何一个都不唯一,情况将有所不同。
list_2的不是唯一的:list2 = ["a", "f", "c", "m", "m"]。保持list1原样:list_1 = ["a", "b", "c", "d", "e"]
设置assume_uniqueyields 的默认值["f", "m"](在两种解决方案中)。但是,如果您设置了assume_unique=True,两种解决方案都可以["f", "m", "m"]。为什么?这是因为用户认为元素是唯一的)。因此,最好保持assume_unique为其默认值。请注意,两个答案均已排序。

TL;DR:
SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2) You want a sorted list

def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)




EXPLANATIONS:
(1) You can use NumPy’s setdiff1d (array1,array2,assume_unique=False).

assume_unique asks the user IF the arrays ARE ALREADY UNIQUE.
If False, then the unique elements are determined first.
If True, the function will assume that the elements are already unique AND function will skip determining the unique elements.

This yields the unique values in array1 that are not in array2. assume_unique is False by default.

If you are concerned with the unique elements (based on the response of Chinny84), then simply use (where assume_unique=False => the default value):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2) For those who want answers to be sorted, I’ve made a custom function:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:
(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).

(b) If you aren’t sure if the elements are unique, just use the default setting of NumPy’s setdiff1d in both solutions A and B. What can be an example of a complication? See note (c).

(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list2 = ["a", "f", "c", "m", "m"]. Keep list1 as is: list_1 = ["a", "b", "c", "d", "e"]
Setting the default value of assume_unique yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? This is because the user ASSUMED that the elements are unique). Hence, IT IS BETTER TO KEEP assume_unique to its default value. Note that both answers are sorted.


回答 1

您可以使用集:

main_list = list(set(list_2) - set(list_1))

输出:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

根据@JonClements的评论,这是一个更简洁的版本:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']

You can use sets:

main_list = list(set(list_2) - set(list_1))

Output:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

Per @JonClements’ comment, here is a tidier version:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']

回答 2

不知道为什么当您拥有本机方法时,上述说明为何如此复杂:

main_list = list(set(list_2)-set(list_1))

Not sure why the above explanations are so complicated when you have native methods available:

main_list = list(set(list_2)-set(list_1))

回答 3

使用这样的列表理解

main_list = [item for item in list_2 if item not in list_1]

输出:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

编辑:

就像下面的注释中提到的那样,如果列表很大,则以上并不是理想的解决方案。在这种情况下,更好的选择是转换list_1set第一个:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]

Use a list comprehension like this:

main_list = [item for item in list_2 if item not in list_1]

Output:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

Edit:

Like mentioned in the comments below, with large lists, the above is not the ideal solution. When that’s the case, a better option would be converting list_1 to a set first:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]

回答 4

如果您想要一种单线解决方案(忽略导入),该解决方案仅需要O(max(n, m))长度n和长度输入工作,m而不需要O(n * m)工作,则可以使用以下itertools模块

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

这利用了功能函数在构造上采用回调函数的优势,从而允许它创建一次回调并在每个元素中重用它,而无需将其存储在某个位置(因为filterfalse在内部存储);列表推导和生成器表达式可以做到这一点,但这很丑陋。†

在一行中得到与以下结果相同的结果:

main_list = [x for x in list_2 if x not in list_1]

速度:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

当然,如果比较是按位置进行的,则:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

应该生成:

main_list = [2, 3, 4]

(因为in list_2中的值与in 中的相同索引相匹配list_1),您绝对应该使用Patrick的答案,该答案不涉及临时lists或sets(即使sets大致相同O(1),它们每张支票的“常数”因数也比简单的等式支票高) )并且涉及O(min(n, m))工作,比其他任何答案都要少,并且如果您的问题对位置敏感,则是唯一正确的答案当匹配元素以不匹配的偏移量出现时解决方案。

†:使用列表理解来做与单行代码相同的方法是滥用嵌套循环以在“最外层”循环中创建和缓存值,例如:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

这也给Python 3带来了次要的性能优势(因为现在set_1它在理解代码中处于本地范围内,而不是从每次检查的嵌套范围中查找;在Python 2上则没有关系,因为Python 2并未使用闭包列出理解;它们的作用范围与所使用的作用域相同)。

If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

This takes advantage of the functional functions taking a callback function on construction, allowing it to create the callback once and reuse it for every element without needing to store it somewhere (because filterfalse stores it internally); list comprehensions and generator expressions can do this, but it’s ugly.†

That gets the same results in a single line as:

main_list = [x for x in list_2 if x not in list_1]

with the speed of:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

Of course, if the comparisons are intended to be positional, so:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

should produce:

main_list = [2, 3, 4]

(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick’s answer, which involves no temporary lists or sets (even with sets being roughly O(1), they have a higher “constant” factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer, and if your problem is position sensitive, is the only correct solution when matching elements appear at mismatched offsets.

†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the “outermost” loop, e.g.:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn’t matter, because Python 2 doesn’t use closures for list comprehensions; they operate in the same scope they’re used in).


回答 5

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

输出:

['f', 'm']
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

output:

['f', 'm']

回答 6

我会将zip这些列表放在一起,以逐个元素比较它们。

main_list = [b for a, b in zip(list1, list2) if a!= b]

I would zip the lists together to compare them element by element.

main_list = [b for a, b in zip(list1, list2) if a!= b]

回答 7

我使用了两种方法,发现一种方法比其他方法有用。这是我的答案:

我的输入数据:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

方法1:np.setdiff1d我喜欢这种方法,因为它保留了位置

test= list(np.setdiff1d(testmod_mpp,crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

方法2:尽管给出的答案与方法1相同,但扰乱了顺序

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

方法1完全np.setdiff1d符合我的要求。此答案仅供参考。

I used two methods and I found one method useful over other. Here is my answer:

My input data:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

Method1: np.setdiff1d I like this approach over other because it preserves the position

test= list(np.setdiff1d(testmod_mpp,crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

Method2: Though it gives same answer as in Method1 but disturbs the order

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

Method1 np.setdiff1d meets my requirements perfectly. This answer for information.


回答 8

如果应该考虑发生的次数,则可能需要使用类似以下内容的方法collections.Counter

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

如所承诺的,这也可以将不同的出现次数称为“差异”:

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']

If the number of occurences should be taken into account you probably need to use something like collections.Counter:

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

As promised this can also handle differing number of occurences as “difference”:

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']

回答 9

从ser1中删除ser2中存在的项目。

输入项

ser1 = pd.Series([1、2、3、4、5])ser2 = pd.Series([4、5、6、7、8])

ser1 [〜ser1.isin(ser2)]

From ser1 remove items present in ser2.

Input

ser1 = pd.Series([1, 2, 3, 4, 5]) ser2 = pd.Series([4, 5, 6, 7, 8])

Solution

ser1[~ser1.isin(ser2)]