问题:从另一个列表中删除一个列表中出现的所有元素

假设我有两个列表,l1l2。我想表演l1 - l2,返回l1not中的所有元素l2

我可以想到一个幼稚的循环方法来执行此操作,但这实际上效率很低。什么是Python高效的方法?

例如,如果我有l1 = [1,2,6,8] and l2 = [2,3,5,8]l1 - l2应返回[1,6]

Let’s say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 not in l2.

I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?

As an example, if I have l1 = [1,2,6,8] and l2 = [2,3,5,8], l1 - l2 should return [1,6]


回答 0

Python具有称为List Comprehensions的语言功能,非常适合使这种事情变得非常容易。以下语句完全满足您的要求,并将结果存储在l3

l3 = [x for x in l1 if x not in l2]

l3将包含[1, 6]

Python has a language feature called List Comprehensions that is perfectly suited to making this sort of thing extremely easy. The following statement does exactly what you want and stores the result in l3:

l3 = [x for x in l1 if x not in l2]

l3 will contain [1, 6].


回答 1

一种方法是使用集合:

>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])

One way is to use sets:

>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])

回答 2

或者,您也可以filter其与lambda表达式配合使用以获取所需的结果。例如:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  `filter` returns the a iterator object. Here I'm type-casting 
#     v  it to `list` in order to display the resultant value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]

性能比较

在这里,我正在比较此处提到的所有答案的效果。不出所料,Arkku的 set运营速度最快。

  • Arkku的设置差异 -第一次(每个循环0.124 微秒

    mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
    10000000 loops, best of 3: 0.124 usec per loop
  • 带有set查找的Daniel Pryden的列表理解 -第二(每个循环0.302 微秒

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.302 usec per loop
  • 普通列表上的甜甜圈列表理解 -第三(每个循环0.552微秒)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.552 usec per loop
  • Moinuddin Quadri的使用filter -第四(每个循环0.972微秒)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
    1000000 loops, best of 3: 0.972 usec per loop
  • Akshay Hazari’s组合使用reduce+filter -第五(每个循环3.97usec)

    mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
    100000 loops, best of 3: 3.97 usec per loop

PS: set不维持顺序,并从列表中删除重复的元素。因此,如果您需要使用任何设置差异,请不要使用。

As an alternative, you may also use filter with the lambda expression to get the desired result. For example:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  `filter` returns the a iterator object. Here I'm type-casting 
#     v  it to `list` in order to display the resultant value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]

Performance Comparison

Here I am comparing the performance of all the answers mentioned here. As expected, Arkku’s set based operation is fastest.

  • Arkku’s Set Difference – First (0.124 usec per loop)

    mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
    10000000 loops, best of 3: 0.124 usec per loop
    
  • Daniel Pryden’s List Comprehension with set lookup – Second (0.302 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.302 usec per loop
    
  • Donut’s List Comprehension on plain list – Third (0.552 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.552 usec per loop
    
  • Moinuddin Quadri’s using filter – Fourth (0.972 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
    1000000 loops, best of 3: 0.972 usec per loop
    
  • Akshay Hazari’s using combination of reduce + filter – Fifth (3.97 usec per loop)

    mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
    100000 loops, best of 3: 3.97 usec per loop
    

PS: set does not maintain the order and removes the duplicate elements from the list. Hence, do not use set difference if you need any of these.


回答 3

在此处扩展Donut的答案和其他答案,通过使用生成器理解而不是列表理解,以及通过使用set数据结构,您可以获得甚至更好的结果(因为in运算符在列表中为O(n)但O(1)在一组上)。

所以这是一个适合您的函数:

def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)

结果将是可迭代的,将延迟获取已过滤列表。如果您需要一个真实的列表对象(例如,如果需要对len()结果进行操作),则可以轻松构建一个列表,如下所示:

filtered_list = list(filter_list(full_list, excludes))

Expanding on Donut’s answer and the other answers here, you can get even better results by using a generator comprehension instead of a list comprehension, and by using a set data structure (since the in operator is O(n) on a list but O(1) on a set).

So here’s a function that would work for you:

def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)

The result will be an iterable that will lazily fetch the filtered list. If you need a real list object (e.g. if you need to do a len() on the result), then you can easily build a list like so:

filtered_list = list(filter_list(full_list, excludes))

回答 4

使用Python设置类型。那将是最Python的。:)

另外,由于它是本机的,因此它也应该是最优化的方法。

看到:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm(适用于较旧的python)

# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2

Use the Python set type. That would be the most Pythonic. :)

Also, since it’s native, it should be the most optimized method too.

See:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm (for older python)

# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2

回答 5

使用 Set Comprehensions {x in l2中的x}或set(l2)进行设置,然后使用List Comprehensions获取列表

l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]

基准测试代码:

import time

l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))

l2set = {x for x in l2}

tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)

tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)

print("speedup %fx"%(difflist/diffset))

基准测试结果:

0.0015058517456054688
3.968189239501953
speedup 2635.179227x    

use Set Comprehensions {x for x in l2} or set(l2) to get set, then use List Comprehensions to get list

l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]

benchmark test code:

import time

l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))

l2set = {x for x in l2}

tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)

tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)

print("speedup %fx"%(difflist/diffset))

benchmark test result:

0.0015058517456054688
3.968189239501953
speedup 2635.179227x    

回答 6

替代解决方案:

reduce(lambda x,y : filter(lambda z: z!=y,x) ,[2,3,5,8],[1,2,6,8])

Alternate Solution :

reduce(lambda x,y : filter(lambda z: z!=y,x) ,[2,3,5,8],[1,2,6,8])

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。