Best way to find the intersection of multiple sets?

Question: Best way to find the intersection of multiple sets?

I have a list of sets:

setlist = [s1,s2,s3...]

I want s1 ∩ s2 ∩ s3 …

I can write a function to do it by performing a series of pairwise s1.intersection(s2), etc.

Is there a recommended, better, or built-in way?


Answer 0

From Python 2.6 onward, set.intersection() accepts multiple arguments, for example:

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *setlist unpacks the list into separate positional arguments.

Note that set.intersection is not a static method. Calling it on the class uses functional notation: the first set acts as the receiver and is intersected with the remaining arguments. So if the argument list is empty, the call fails with a TypeError.
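
The empty-list case can be guarded explicitly. The following is a minimal sketch, assuming Python 3 and assuming that an empty input should produce an empty set; the helper name intersect_all is hypothetical and not part of the original answer.

```python
def intersect_all(sets):
    """Intersect an iterable of sets; return an empty set if there are none."""
    sets = list(sets)
    if not sets:
        # Assumption: an empty input yields an empty set. Strictly speaking the
        # intersection of no sets is the universal set, which Python cannot
        # represent, so callers may prefer to raise an error here instead.
        return set()
    # The first element must be a real set; the rest may be any iterables.
    return set.intersection(*sets)


print(intersect_all([{1, 2, 3}, {2, 3, 4}, {2, 5}]))  # {2}
print(intersect_all([]))                              # set()
```

Whether an empty result or an exception is the right behavior for empty input depends on the caller, so treat the return value above as one possible choice.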


Answer 1

As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])

Answer 2

Clearly set.intersection is what you want here, but in case you ever need a generalisation of “take the sum of all these”, “take the product of all these”, “take the xor of all these”, what you are looking for is the reduce function:

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

or

print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}
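
To make the generalisation concrete, here is a short sketch (Python 3) that applies the same reduce pattern to the sum, product, and xor cases mentioned above. The variable names are illustrative only.

```python
from functools import reduce
from operator import add, and_, mul, xor

numbers = [1, 2, 3, 4]

total = reduce(add, numbers)      # ((1 + 2) + 3) + 4
product = reduce(mul, numbers)    # ((1 * 2) * 3) * 4
xor_all = reduce(xor, numbers)    # ((1 ^ 2) ^ 3) ^ 4
common = reduce(and_, [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}])

print(total, product, xor_all, common)  # 10 24 4 {3}

# An initial value makes reduce safe on empty input when an identity exists.
print(reduce(add, [], 0))  # 0
```

Sum, product, and xor all have an identity element (0, 1, and 0) that can serve as the initial value. Set intersection has no representable identity, which is why the empty case needs separate handling.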

Answer 3

If you don’t have Python 2.6 or higher, the alternative is to write an explicit for loop:

def set_list_intersection(set_list):
  if not set_list:
    return set()
  # Copy the first set so that &= does not modify the caller's data in place.
  result = set(set_list[0])
  for s in set_list[1:]:
    result &= s
  return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print set_list_intersection(set_list)
# Output: set([1])
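
An explicit loop also leaves room for small optimizations that a one-liner hides. The following is a sketch, assuming Python 3; the function name intersect_smallest_first and the ordering strategy are illustrative additions, not part of the original answer.

```python
def intersect_smallest_first(set_list):
    """Intersect sets, starting from the smallest and stopping early if empty."""
    if not set_list:
        return set()
    # Starting with the smallest set keeps the running result small.
    ordered = sorted(set_list, key=len)
    result = set(ordered[0])  # copy so the input sets are left untouched
    for s in ordered[1:]:
        result &= s
        if not result:
            break  # nothing can be added back, so stop early
    return result


set_list = [{1, 2, 3, 4}, {1, 3}, {1, 4, 5}]
print(intersect_smallest_first(set_list))  # {1}
```

Whether the sorting pays off depends on how many sets there are and how much their sizes differ, so this is worth measuring before adopting.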

You can also use reduce:

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print reduce(lambda s1, s2: s1 & s2, set_list)
# Output: set([1])

However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I’ve always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what’s actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it’s better to write out the accumulation loop explicitly.


Answer 4

Here I’m offering a generic function for multiple set intersection trying to take advantage of the best method available:

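try:
    from functools import reduce  # Python 3; older Python 2 has reduce as a builtin
except ImportError:
    pass

def multiple_set_intersection(*sets):
    """Return multiple set intersection."""
    try:
        return set.intersection(*sets)
    except TypeError:  # this is Python < 2.6 or no arguments
        pass

    try:
        a_set = sets[0]
    except IndexError:  # no arguments
        return set()  # return empty set

    # Fold pairwise starting from a_set, so one or two arguments also work.
    return reduce(set.intersection, sets[1:], a_set)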

Guido might dislike reduce, but I’m kind of fond of it :)


Answer 5

Jean-François Fabre's set.intersection(*list_of_sets) answer is definitely the most Pythonic and is rightly the accepted answer.

For those that want to use reduce, the following will also work:

from functools import reduce  # required on Python 3; reduce is a builtin on Python 2
reduce(set.intersection, list_of_sets)