标签归档:set

如何在不使用“ |”的情况下将两组连接在一起

问题:如何在不使用“ |”的情况下将两组连接在一起

假定ST被分配了集合。不使用join运算符|,如何找到两个集合的并集?例如,找到交叉点:

S = {1, 2, 3, 4}
T = {3, 4, 5, 6}
S_intersect_T = { i for i in S if i in T }

那么如何在不使用的情况下在一行中找到两个集合的并集|呢?

Assume that S and T are assigned sets. Without using the join operator |, how can I find the union of the two sets? This, for example, finds the intersection:

S = {1, 2, 3, 4}
T = {3, 4, 5, 6}
S_intersect_T = { i for i in S if i in T }

So how can I find the union of two sets in one line without using |?


回答 0

您可以对集合使用联合方法: set.union(other_set)

请注意,它会返回一个新集合,即不会对其自身进行修改。

You can use union method for sets: set.union(other_set)

Note that it returns a new set i.e it doesn’t modify itself.


回答 1

您可以使用or_别名:

>>> from operator import or_
>>> from functools import reduce # python3 required
>>> reduce(or_, [{1, 2, 3, 4}, {3, 4, 5, 6}])
set([1, 2, 3, 4, 5, 6])

You could use or_ alias:

>>> from operator import or_
>>> from functools import reduce # python3 required
>>> reduce(or_, [{1, 2, 3, 4}, {3, 4, 5, 6}])
set([1, 2, 3, 4, 5, 6])

回答 2

如果您可以修改原始集(在某些情况下可能需要这样做),可以使用set.update()

S.update(T)

返回值是None,但S将更新为原始S和的并集T

If you are fine with modifying the original set (which you may want to do in some cases), you can use set.update():

S.update(T)

The return value is None, but S will be updated to be the union of the original S and T.


回答 3

假设您也无法使用s.union(t),等于s | t,您可以尝试

>>> from itertools import chain
>>> set(chain(s,t))
set([1, 2, 3, 4, 5, 6])

或者,如果您想理解,

>>> {i for j in (s,t) for i in j}
set([1, 2, 3, 4, 5, 6])

Assuming you also can’t use s.union(t), which is equivalent to s | t, you could try

>>> from itertools import chain
>>> set(chain(s,t))
set([1, 2, 3, 4, 5, 6])

Or, if you want a comprehension,

>>> {i for j in (s,t) for i in j}
set([1, 2, 3, 4, 5, 6])

回答 4

如果加入表示您是工会,请尝试以下方法:

set(list(s) + list(t))

这有点骇人听闻,但我想不出更好的衬里来做。

If by join you mean union, try this:

set(list(s) + list(t))

It’s a bit of a hack, but I can’t think of a better one liner to do it.


回答 5

假设您有2个清单

 A = [1,2,3,4]
 B = [3,4,5,6]

所以你可以找到A联盟B如下

 union = set(A).union(set(B))

另外,如果要查找相交和非相交,请按照以下步骤进行操作

 intersection = set(A).intersection(set(B))
 non_intersection = union - intersection

Suppose you have 2 lists

 A = [1,2,3,4]
 B = [3,4,5,6]

so you can find A Union B as follow

 union = set(A).union(set(B))

also if you want to find intersection and non-intersection you do that as follow

 intersection = set(A).intersection(set(B))
 non_intersection = union - intersection

回答 6

您可以像这样将两个集合解压缩成一个集合:

>>> set_1 = {1, 2, 3, 4}
>>> set_2 = {3, 4, 5, 6}
>>> union = {*set_1, *set_2}
>>> union
{1, 2, 3, 4, 5, 6}

*解包集。拆包是将可迭代项(例如集合或列表)表示为它产生的每个项目的地方。这意味着上面的示例简化为{1, 2, 3, 4, 3, 4, 5, 6},然后简化为,{1, 2, 3, 4, 5, 6}因为该集合只能包含唯一项。

You can just unpack both sets into one like this:

>>> set_1 = {1, 2, 3, 4}
>>> set_2 = {3, 4, 5, 6}
>>> union = {*set_1, *set_2}
>>> union
{1, 2, 3, 4, 5, 6}

The * unpacks the set. Unpacking is where an iterable (e.g. a set or list) is represented as every item it yields. This means the above example simplifies to {1, 2, 3, 4, 3, 4, 5, 6} which then simplifies to {1, 2, 3, 4, 5, 6} because the set can only contain unique items.


回答 7

您可以做union或简单的列表理解

[A.add(_) for _ in B]

A将拥有B的所有元素

You can do union or simple list comprehension

[A.add(_) for _ in B]

A would have all the elements of B


如何在Python中创建一组集?

问题:如何在Python中创建一组集?

我正在尝试在Python中设置一组。我不知道该怎么做。

从空集开始xx

xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)

但我明白了

TypeError: unhashable type: 'list'

要么

TypeError: unhashable type: 'set'

Python中可能有一组集合吗?

我正在处理大量集合,但我希望不必处理重复的集合(集合A1,集合A2,….的集合B,如果Ai = Aj,则“将取消”两个集合)

I’m trying to make a set of sets in Python. I can’t figure out how to do it.

Starting with the empty set xx:

xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)

but I get

TypeError: unhashable type: 'list'

or

TypeError: unhashable type: 'set'

Is it possible to have a set of sets in Python?

I am dealing with a large collection of sets and I want to be able to not have to deal duplicate sets (a set B of sets A1, A2, …., An would “cancel” two sets if Ai = Aj)


回答 0

Python的抱怨是因为内部set对象是可变的,因此不可散列。解决方案是frozenset用于内部集,以表明您无意修改它们。

Python’s complaining because the inner set objects are mutable and thus not hashable. The solution is to use frozenset for the inner sets, to indicate that you have no intention of modifying them.


回答 1

人们已经提到您可以使用Frozenset()做到这一点,所以我将添加一个代码来实现此目的:

例如,您要从以下列表列表中创建一组集合:

t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]

您可以通过以下方式创建集合:

t1 = set(frozenset(i) for i in t)

People already mentioned that you can do this with a frozenset(), so I will just add a code how to achieve this:

For example you want to create a set of sets from the following list of lists:

t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]

you can create your set in the following way:

t1 = set(frozenset(i) for i in t)

回答 2

frozenset在内部使用。


回答 3

所以我有完全相同的问题。我想制作一个可以作为一组集合使用的数据结构。问题在于集合必须包含不可变的对象。因此,您可以做的只是将其作为一组元组。对我来说很好!

A = set()
A.add( (2,3,4) )##adds the element
A.add( (2,3,4) )##does not add the same element
A.add( (2,3,5) )##adds the element, because it is different!

So I had the exact same problem. I wanted to make a data structure that works as a set of sets. The problem is that the sets must contain immutable objects. So, what you can do is simply make it as a set of tuples. That worked fine for me!

A = set()
A.add( (2,3,4) )##adds the element
A.add( (2,3,4) )##does not add the same element
A.add( (2,3,5) )##adds the element, because it is different!

回答 4

截至2020年,Python官方文档建议使用frozenset表示集合集。

As of 2020, the official Python documentation advise using frozenset to represent sets of sets.


将列表转换为集合会更改元素顺序

问题:将列表转换为集合会更改元素顺序

最近,我注意到当我将a转换listset元素的顺序发生变化,由字符排序。

考虑以下示例:

x=[1,2,20,6,210]
print x 
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

我的问题是-

  1. 为什么会这样呢?
  2. 如何进行设置操作(尤其是“设置差异”)而不丢失初始顺序?

Recently I noticed that when I am converting a list to set the order of elements is changed and is sorted by character.

Consider this example:

x=[1,2,20,6,210]
print x 
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

My questions are –

  1. Why is this happening?
  2. How can I do set operations (especially Set Difference) without losing the initial order?

回答 0

  1. A set是无序的数据结构,因此它不保留插入顺序。

  2. 这取决于您的要求。如果您有一个普通列表,并且想要在保留列表顺序的同时删除一些元素集,则可以通过列表理解来做到这一点:

    >>> a = [1, 2, 20, 6, 210]
    >>> b = set([6, 20, 1])
    >>> [x for x in a if x not in b]
    [2, 210]

    如果需要同时支持快速成员资格测试保留插入顺序的数据结构,则可以使用Python字典的键,从Python 3.7开始保证可以保留插入顺序:

    >>> a = dict.fromkeys([1, 2, 20, 6, 210])
    >>> b = dict.fromkeys([6, 20, 1])
    >>> dict.fromkeys(x for x in a if x not in b)
    {2: None, 210: None}

    b并不需要在这里订购–您也可以使用set。请注意,a.keys() - b.keys()返回的设置差为set,因此不会保留插入顺序。

    在旧版本的Python中,您可以collections.OrderedDict改用:

    >>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
    >>> b = collections.OrderedDict.fromkeys([6, 20, 1])
    >>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
    OrderedDict([(2, None), (210, None)])
  1. A set is an unordered data structure, so it does not preserve the insertion order.

  2. This depends on your requirements. If you have an normal list, and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:

    >>> a = [1, 2, 20, 6, 210]
    >>> b = set([6, 20, 1])
    >>> [x for x in a if x not in b]
    [2, 210]
    

    If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which starting from Python 3.7 is guaranteed to preserve the insertion order:

    >>> a = dict.fromkeys([1, 2, 20, 6, 210])
    >>> b = dict.fromkeys([6, 20, 1])
    >>> dict.fromkeys(x for x in a if x not in b)
    {2: None, 210: None}
    

    b doesn’t really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won’t preserve the insertion order.

    In older versions of Python, you can use collections.OrderedDict instead:

    >>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
    >>> b = collections.OrderedDict.fromkeys([6, 20, 1])
    >>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
    OrderedDict([(2, None), (210, None)])
    

回答 1

在Python 3.6中,set()现在应该保持顺序,但是对于Python 2和Python 3还有另一种解决方案:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]

In Python 3.6, set() now should keep the order, but there is another solution for Python 2 and 3:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]

回答 2

回答第一个问题时,集合是针对集合操作进行优化的数据结构。像数学集一样,它不强制或维持元素的任何特定顺序。集合的抽象概念不强制执行顺序,因此不需要强制执行。从列表创建集合时,Python可以根据其用于集合的内部实现的需要自由更改元素的顺序,从而能够高效地执行集合操作。

Answering your first question, a set is a data structure optimized for set operations. Like a mathematical set, it does not enforce or maintain any particular order of the elements. The abstract concept of a set does not enforce order, so the implementation is not required to. When you create a set from a list, Python has the liberty to change the order of the elements for the needs of the internal implementation it uses for a set, which is able to perform set operations efficiently.


回答 3

通过以下功能删除重复项并保留顺序

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]

检查此链接

remove duplicates and preserve order by below function

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]

check this link


回答 4

在数学中,有集合有序集合(osets)。

  • set:唯一元素的无序容器(实现)
  • oset:唯一元素的有序容器(未实现)

在Python中,仅直接实现集合。我们可以使用常规的dict键(3.7+)模拟osets 。

给定

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 6}

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

演示版

删除副本,保留插入顺序。

list(oset)
# [1, 2, 20, 6, 210]

对dict键进行类似集合的操作。

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}

细节

注意:无序结构并不排除有序元素。相反,不能保证维持订单。例:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

可能会很高兴地发现列表多集(mset)是另外两种引人入胜的数学数据结构:

  • list:允许重复的元素的有序容器(已实现)
  • mset:允许重复的元素的无序容器(NotImplemented)*

摘要

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*  

*可以使用collections.Counter()dict样的多重性(计数)映射间接模拟多重集。

In mathematics, there are sets and ordered sets (osets).

  • set: an unordered container of unique elements (Implemented)
  • oset: an ordered container of unique elements (NotImplemented)

In Python, only sets are directly implemented. We can emulate osets with regular dict keys (3.7+).

Given

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 6}

Code

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

Demo

Replicates are removed, insertion-order is preserved.

list(oset)
# [1, 2, 20, 6, 210]

Set-like operations on dict keys.

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}

Details

Note: an unordered structure does not preclude ordered elements. Rather, maintained order is not guaranteed. Example:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

One may be pleased to discover that a list and multiset (mset) are two more fascinating, mathematical data structures:

  • list: an ordered container of elements that permits replicates (Implemented)
  • mset: an unordered container of elements that permits replicates (NotImplemented)*

Summary

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*  

*A multiset can be indirectly emulated with collections.Counter(), a dict-like mapping of multiplicities (counts).


回答 5

如其他答案所示,集合是不保留元素顺序的数据结构(和数学概念)-

但是,通过使用集合和字典的组合,可以实现所需的功能-尝试使用以下代码段:

# save the element order in a dict:
x_dict = dict(x,y for y, x in enumerate(my_list) )
x_set = set(my_list)
#perform desired set operations
...
#retrieve ordered list from the set:
new_list = [None] * len(new_set)
for element in new_set:
   new_list[x_dict[element]] = element

As denoted in other answers, sets are data structures (and mathematical concepts) that do not preserve the element order –

However, by using a combination of sets and dictionaries, it is possible that you can achieve wathever you want – try using these snippets:

# save the element order in a dict:
x_dict = dict(x,y for y, x in enumerate(my_list) )
x_set = set(my_list)
#perform desired set operations
...
#retrieve ordered list from the set:
new_list = [None] * len(new_set)
for element in new_set:
   new_list[x_dict[element]] = element

回答 6

在Sven的答案的基础上,我发现使用了collections.OrderedDict这样的代码,它帮助我完成了想要的工作,并允许我向dict中添加更多项:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

如果要添加项目,但仍将其视为一组,则可以执行以下操作:

z['nextitem']=None

您可以在字典上执行类似z.keys()的操作并获取集合:

z.keys()
[1, 2, 20, 6, 210]

Building on Sven’s answer, I found using collections.OrderedDict like so helped me accomplish what you want plus allow me to add more items to the dict:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

If you want to add items but still treat it like a set you can just do:

z['nextitem']=None

And you can perform an operation like z.keys() on the dict and get the set:

z.keys()
[1, 2, 20, 6, 210]

回答 7

上面最高分数概念的实现将其带回到列表中:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

在Python 3.6和Python 2.7上进行了简短测试。

An implementation of the highest score concept above that brings it back to a list:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

Tested (briefly) on Python 3.6 and Python 2.7.


回答 8

如果您要在两个初始列表中进行少量元素设置差值运算,而不是使用collections.OrderedDict使实现复杂化并使可读性降低的元素,则可以使用:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if not n in evens_set and not n in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

它的时间复杂度不是很好,但是它整洁且易于阅读。

In case you have a small number of elements in your two initial lists on which you want to do set difference operation, instead of using collections.OrderedDict which complicates the implementation and makes it less readable, you can use:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if not n in evens_set and not n in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

Its time complexity is not that good but it is neat and easy to read.


回答 9

有趣的是,人们总是使用“现实世界中的问题”开玩笑来解释理论科学中的定义。

如果设置有顺序,则首先需要弄清楚以下问题。如果列表中有重复的元素,那么将其变成集合时的顺序应该是什么?如果我们将两个集合并集,顺序是什么?如果在同一元素上以不同顺序相交的两个集合相交,顺序是什么?

另外,set在搜索特定键方面要快得多,这在set操作中非常有用(这就是为什么需要set而不是list的原因)。

如果您真的在乎索引,只需将其保留为列表即可。如果仍要对许多列表中的元素进行设置操作,最简单的方法是为每个列表创建一个字典,该列表中的集合具有相同的键以及包含原始列表中所有键索引的list值。

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}

It’s interesting that people always use ‘real world problem’ to make joke on the definition in theoretical science.

If set has order, you first need to figure out the following problems. If your list has duplicate elements, what should the order be when you turn it into a set? What is the order if we union two sets? What is the order if we intersect two sets with different order on the same elements?

Plus, set is much faster in searching for a particular key which is very good in sets operation (and that’s why you need a set, but not list).

If you really care about the index, just keep it as a list. If you still want to do set operation on the elements in many lists, the simplest way is creating a dictionary for each list with the same keys in the set along with a value of list containing all the index of the key in the original list.

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}

回答 10

这是一种简单的方法:

x=[1,2,20,6,210]
print sorted(set(x))

Here’s an easy way to do it:

x=[1,2,20,6,210]
print sorted(set(x))

使用花括号在Python中初始化Set

问题:使用花括号在Python中初始化Set

我正在学习python,并且对初始化集有一个新手问题。通过测试,我发现可以像这样初始化一个集合:

my_set = {'foo', 'bar', 'baz'}

与标准方式相反,以这种方式进行操作是否有任何缺点:

my_set = set(['foo', 'bar', 'baz'])

还是仅仅是样式问题?

I’m learning python, and I have a novice question about initializing sets. Through testing, I’ve discovered that a set can be initialized like so:

my_set = {'foo', 'bar', 'baz'}

Are there any disadvantages of doing it this way, as opposed to the standard way of:

my_set = set(['foo', 'bar', 'baz'])

or is it just a question of style?


回答 0

设置文字语法有两个明显的问题:

my_set = {'foo', 'bar', 'baz'}
  1. 在Python 2.7之前不可用

  2. 无法使用该语法表示空集(使用{}创建空dict)

这些可能对您很重要,也可能不重要。

概述此语法的文档部分在此处

There are two obvious issues with the set literal syntax:

my_set = {'foo', 'bar', 'baz'}
  1. It’s not available before Python 2.7

  2. There’s no way to express an empty set using that syntax (using {} creates an empty dict)

Those may or may not be important to you.

The section of the docs outlining this syntax is here.


回答 1

也比较之间的差别{},并set()用一个字的说法。

>>> a = set('aardvark')
>>> a
{'d', 'v', 'a', 'r', 'k'} 
>>> b = {'aardvark'}
>>> b
{'aardvark'}

但两者ab都是套路。

Compare also the difference between {} and set() with a single word argument.

>>> a = set('aardvark')
>>> a
{'d', 'v', 'a', 'r', 'k'} 
>>> b = {'aardvark'}
>>> b
{'aardvark'}

but both a and b are sets of course.


回答 2

Python 3文档与python 2.7相同):

花括号或set()函数可用于创建集合。注意:要创建一个空集,您必须使用set()而不是{}; 后者将创建一个空字典,这是我们将在下一节中讨论的数据结构。

在python 2.7中:

>>> my_set = {'foo', 'bar', 'baz', 'baz', 'foo'}
>>> my_set
set(['bar', 'foo', 'baz'])

请注意,{}它也用于map/ dict

>>> m = {'a':2,3:'d'}
>>> m[3]
'd'
>>> m={}
>>> type(m)
<type 'dict'> 

还可以使用综合语法来初始化集:

>>> a = {x for x in """didn't know about {} and sets """ if x not in 'set' }
>>> a
set(['a', ' ', 'b', 'd', "'", 'i', 'k', 'o', 'n', 'u', 'w', '{', '}'])

From Python 3 documentation (the same holds for python 2.7):

Curly braces or the set() function can be used to create sets. Note: to create an empty set you have to use set(), not {}; the latter creates an empty dictionary, a data structure that we discuss in the next section.

in python 2.7:

>>> my_set = {'foo', 'bar', 'baz', 'baz', 'foo'}
>>> my_set
set(['bar', 'foo', 'baz'])

Be aware that {} is also used for map/dict:

>>> m = {'a':2,3:'d'}
>>> m[3]
'd'
>>> m={}
>>> type(m)
<type 'dict'> 

One can also use comprehensive syntax to initialize sets:

>>> a = {x for x in """didn't know about {} and sets """ if x not in 'set' }
>>> a
set(['a', ' ', 'b', 'd', "'", 'i', 'k', 'o', 'n', 'u', 'w', '{', '}'])

回答 3

您需要 empty_set = set()初始化一个空集。{}是空字典。

You need to do empty_set = set() to initialize an empty set. {} is am empty dictionaty.


为什么tuple(set([(1,“ a”,“ b”,“ c”,“ z”,“ f”]))==元组(set([(a,b,c) “ z”,“ f”,1]))85%的时间启用了哈希随机化?

问题:为什么tuple(set([(1,“ a”,“ b”,“ c”,“ z”,“ f”]))==元组(set([(a,b,c) “ z”,“ f”,1]))85%的时间启用了哈希随机化?

鉴于零比雷埃夫斯对另一个问题的回答,我们认为

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

True在启用散列随机化的情况下,大约打印时间的85%。为什么是85%?

Given Zero Piraeus’ answer to another question, we have that

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

Prints True about 85% of the time with hash randomization enabled. Why 85%?


回答 0

我假设这个问题的所有读者都读过:

首先要注意的是,哈希随机化是由解释器启动决定的。

两组字母的哈希值都相同,因此唯一重要的是是否发生冲突(顺序会受到影响)。


通过第二个链接的推论,我们知道这些集合的支持数组从长度8开始:

_ _ _ _ _ _ _ _

在第一种情况下,我们插入1

_ 1 _ _ _ _ _ _

然后插入其余部分:

α 1 ? ? ? ? ? ?

然后将其重新映射为32号:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

在第二种情况下,我们插入其余部分:

? β ? ? ? ? ? ?

然后尝试插入1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

然后它将被修复:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

因此,迭代次数是否不同仅取决于β是否存在。


β的机率是5个字母中的任何一个都将以1模8 哈希以1模32哈希的概率。

由于任何哈希到1模32的东西也哈希到1模8,我们想找到32个插槽的机会,所以五个插槽之一在插槽1中:

5 (number of letters) / 32 (number of slots)

5/32为0.15625,因此在两个固定结构之间有15.625%的订单概率不同


一点也不奇怪,这正是零比雷埃夫斯所测量的。


¹从技术上讲,这并不明显。我们可以假装5个散列中的每一个都是唯一的,因为需要重新哈希处理,但是由于线性探测,实际上更有可能发生“成束的”结构……但是因为我们只查看是否占用了一个插槽,所以实际上不会影响我们。

I’m going to assume any readers of this question to have read both:

The first thing to note is that hash randomization is decided on interpreter start-up.

The hash of each letter will be the same for both sets, so the only thing that can matter is if there is a collision (where order will be affected).


By the deductions of that second link we know the backing array for these sets starts at length 8:

_ _ _ _ _ _ _ _

In the first case, we insert 1:

_ 1 _ _ _ _ _ _

and then insert the rest:

α 1 ? ? ? ? ? ?

Then it is rehashed to size 32:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

In the second case, we insert the rest:

? β ? ? ? ? ? ?

And then try to insert 1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

And then it will be rehashed:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

So whether the iteration orders are different depends solely on whether β exists.


The chance of a β is the chance that any of the 5 letters will hash to 1 modulo 8 and hash to 1 modulo 32.

Since anything that hashes to 1 modulo 32 also hashes to 1 modulo 8, we want to find the chance that of the 32 slots, one of the five is in slot 1:

5 (number of letters) / 32 (number of slots)

5/32 is 0.15625, so there is a 15.625% chance¹ of the orders being different between the two set constructions.


Not very strangely at all, this is exactly what Zero Piraeus measured.


¹Technically even this isn’t obvious. We can pretend every one of the 5 hashes uniquely because of rehashing, but because of linear probing it’s actually more likely for “bunched” structures to occur… but because we’re only looking at whether a single slot is occupied, this doesn’t actually affect us.


2套并集不包含所有项目

问题:2套并集不包含所有项目

在下面的联合中更改两个集合的顺序时,为什么会得到不同的结果?

set1 = {1, 2, 3}
set2 = {True, False}

print(set1 | set2)
# {False, 1, 2, 3}

print(set2 | set1)
#{False, True, 2, 3}

How come when I change the order of the two sets in the unions below, I get different results?

set1 = {1, 2, 3}
set2 = {True, False}

print(set1 | set2)
# {False, 1, 2, 3}

print(set2 | set1)
#{False, True, 2, 3}

回答 0

为什么union()不包含所有项目

1True是等价的,被认为是重复的。同样,0False也等效:

>>> 1 == True
True
>>> 0 == False
True

使用哪个等效值

当遇到多个等效值时,集合将保持第一个可见:

>>> {0, False}
{0}
>>> {False, 0}
{False}

使价值观与众不同的方法

为了使它们与众不同,只需将它们(value, type)成对存储:

>>> set1 = {(1, int), (2, int), (3, int)}
>>> set2 = {(True, bool), (False, bool)}
>>> set1 | set2
{(3, <class 'int'>), (1, <class 'int'>), (2, <class 'int'>),
 (True, <class 'bool'>), (False, <class 'bool'>)}
>>> set1 & set2
set()

区分值的另一种方法是将它们存储为字符串:

>>> set1 = {'1', '2', '3'}
>>> set2 = {'True', 'False'}
>>> set1 | set2
{'2', '3', 'False', 'True', '1'}
>>> set1 & set2
set()

希望这可以消除谜团并显示前进的方向:-)


从评论中救出:

这是为破坏十字型等价(即标准技术0.0 == 0True == 1以及Decimal(8.5) == 8.5)该技术是在Python 2.7的正则表达式模块用于力的unicode正则表达式被从其他等效的STR正则表达式清楚地高速缓存。是在Python也使用的技术类型参数为true时,对于functools.lru_cache()为3。

如果OP需要除默认等效关系以外的其他内容,则需要定义一些新关系。根据使用情况,可能是字符串不区分大小写,Unicode规范化,视觉外观(看起来不同的事物被认为是不同的),身份(没有两个不同的对象被认为是相等的),值/类型对或其他一些定义等价关系的函数。给定OP的特定示例,他/她似乎期望按类型区分或视觉区分。

Why the union() doesn’t contain all items

The 1 and True are equivalent and considered to be duplicates. Likewise the 0 and False are equivalent as well:

>>> 1 == True
True
>>> 0 == False
True

Which equivalent value is used

When multiple equivalent values are encountered, sets keep the first one seen:

>>> {0, False}
{0}
>>> {False, 0}
{False}

Ways to make the values be distinct

To get them to be treated as distinct, just store them in a (value, type) pair:

>>> set1 = {(1, int), (2, int), (3, int)}
>>> set2 = {(True, bool), (False, bool)}
>>> set1 | set2
{(3, <class 'int'>), (1, <class 'int'>), (2, <class 'int'>),
 (True, <class 'bool'>), (False, <class 'bool'>)}
>>> set1 & set2
set()

Another way to make the values distinct is to store them as strings:

>>> set1 = {'1', '2', '3'}
>>> set2 = {'True', 'False'}
>>> set1 | set2
{'2', '3', 'False', 'True', '1'}
>>> set1 & set2
set()

Hope this clears up the mystery and shows the way forward :-)


Rescued from the comments:

This is the standard technique for breaking cross-type equivalence (i.e. 0.0 == 0, True == 1, and Decimal(8.5) == 8.5). The technique is used in Python 2.7’s regular expression module to force unicode regexes to be cached distinctly from otherwise equivalent str regexes. The technique is also used in Python 3 for functools.lru_cache() when the typed parameter is true.

If the OP needs something other than the default equivalence relation, then some new relation needs to be defined. Depending the use case, that could be case-insensitivity for strings, normalization for unicode, visual appearance (things that look different are considered different), identity (no two distinct objects are considered equal), a value/type pair, or some other function that defines an equivalence relation. Given the OPs specific example, it would seem that he/she expected either distinction by type or visual distinction.


回答 1

在Python,False0被认为是等效的,True1。因为True1被视为相同的值,所以它们中的一个只能同时出现在集合中。哪一个取决于它们添加到集合中的顺序。在第一行中,set1将其用作第一集合,因此我们可以得到1结果集合。在第二组中,True在第一组中,因此True包含在结果中。

In Python, False and 0 are considered equivalent, as are True and 1. Because True and 1 are considered the same value, only one of them can be present in a set a the same time. Which one depends on the order they are added to the set in. In the first line, set1 is used as the first set, so we get 1 in the resulting set. In the second set, True is in the first set, so True is included in the result.


回答 2

如果您查看https://docs.python.org/3/library/stdtypes.html#boolean-values部分4.12.10。布尔值:

布尔值是两个常量对象False和True。它们用于表示真值(尽管其他值也可以视为假或真)。在数字上下文中(例如,用作算术运算符的参数时),它们的行为分别类似于整数0和1

If you look at https://docs.python.org/3/library/stdtypes.html#boolean-values section 4.12.10. Boolean Values:

Boolean values are the two constant objects False and True. They are used to represent truth values (although other values can also be considered false or true). In numeric contexts (for example when used as the argument to an arithmetic operator), they behave like the integers 0 and 1, respectively.


回答 3

为布尔定义了比较运算符(==!=),TrueFalse匹配1和0。

这就是为什么在集合联合中,当它检查是否True已经在新集合中时,会得到一个真实的答案:

>>> True in {1}
True
>>> 1 in {True}
True

The comparison operator (==, !=) is defined for boolean True and False to match 1 and 0.

That’s why, in the set union, when it checks whether True is in the new set already, it gets a truthy answer:

>>> True in {1}
True
>>> 1 in {True}
True

如何在python中将项目添加到空集

问题:如何在python中将项目添加到空集

我有以下步骤:

def myProc(invIndex, keyWord):
    D={}
    for i in range(len(keyWord)):
        if keyWord[i] in invIndex.keys():
                    D.update(invIndex[query[i]])
    return D

但是我收到以下错误:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: cannot convert dictionary update sequence element #0 to a sequence

如果D包含元素,则不会出现任何错误。但是我需要D在开始时为空。

I have the following procedure:

def myProc(invIndex, keyWord):
    D={}
    for i in range(len(keyWord)):
        if keyWord[i] in invIndex.keys():
                    D.update(invIndex[query[i]])
    return D

But I am getting the following error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: cannot convert dictionary update sequence element #0 to a sequence

I do not get any error if D contains elements. But I need D to be empty at the beginning.


回答 0

D = {} 是未设置的字典。

>>> d = {}
>>> type(d)
<type 'dict'>

用途 D = set()

>>> d = set()
>>> type(d)
<type 'set'>
>>> d.update({1})
>>> d.add(2)
>>> d.update([3,3,3])
>>> d
set([1, 2, 3])

D = {} is a dictionary not set.

>>> d = {}
>>> type(d)
<type 'dict'>

Use D = set():

>>> d = set()
>>> type(d)
<type 'set'>
>>> d.update({1})
>>> d.add(2)
>>> d.update([3,3,3])
>>> d
set([1, 2, 3])

回答 1

>>> d = {}
>>> D = set()
>>> type(d)
<type 'dict'>
>>> type(D)
<type 'set'>

您制作的是字典而不是Set。

update字典中的方法用于从上一个字典更新新字典,就像这样,

>>> abc = {1: 2}
>>> d.update(abc)
>>> d
{1: 2}

而在集合中,它用于向集合中添加元素。

>>> D.update([1, 2])
>>> D
set([1, 2])
>>> d = {}
>>> D = set()
>>> type(d)
<type 'dict'>
>>> type(D)
<type 'set'>

What you’ve made is a dictionary and not a Set.

The update method in dictionary is used to update the new dictionary from a previous one, like so,

>>> abc = {1: 2}
>>> d.update(abc)
>>> d
{1: 2}

Whereas in sets, it is used to add elements to the set.

>>> D.update([1, 2])
>>> D
set([1, 2])

回答 2

当您将变量分配给空括号{}例如:时new_set = {},它将成为字典。要创建一个空集,请将变量分配给“ set()”,即:new_set = set()

When you assign a variable to empty curly braces {} eg: new_set = {}, it becomes a dictionary. To create an empty set, assign the variable to a ‘set()’ ie: new_set = set()


如何获得集合的所有子集?(电源组)

问题:如何获得集合的所有子集?(电源组)

给定一套

{0, 1, 2, 3}

如何产生子集:

[set(),
 {0},
 {1},
 {2},
 {3},
 {0, 1},
 {0, 2},
 {0, 3},
 {1, 2},
 {1, 3},
 {2, 3},
 {0, 1, 2},
 {0, 1, 3},
 {0, 2, 3},
 {1, 2, 3},
 {0, 1, 2, 3}]

Given a set

{0, 1, 2, 3}

How can I produce the subsets:

[set(),
 {0},
 {1},
 {2},
 {3},
 {0, 1},
 {0, 2},
 {0, 3},
 {1, 2},
 {1, 3},
 {2, 3},
 {0, 1, 2},
 {0, 1, 3},
 {0, 2, 3},
 {1, 2, 3},
 {0, 1, 2, 3}]

回答 0

Pythonitertools页面对此有一个精确的powerset配方:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

输出:

>>> list(powerset("abcd"))
[(), ('a',), ('b',), ('c',), ('d',), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd')]

如果您不喜欢开头的空元组,则可以更改range语句range(1, len(s)+1)以避免使用0长度的组合。

The Python itertools page has exactly a powerset recipe for this:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

Output:

>>> list(powerset("abcd"))
[(), ('a',), ('b',), ('c',), ('d',), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd')]

If you don’t like that empty tuple at the beginning, you can just change the range statement to range(1, len(s)+1) to avoid a 0-length combination.


回答 1

这是有关电源组的更多代码。这是从头开始写的:

>>> def powerset(s):
...     x = len(s)
...     for i in range(1 << x):
...         print [s[j] for j in range(x) if (i & (1 << j))]
...
>>> powerset([4,5,6])
[]
[4]
[5]
[4, 5]
[6]
[4, 6]
[5, 6]
[4, 5, 6]

马克·鲁沙科夫(Mark Rushakoff)的评论在这里适用:“如果您不喜欢开头的空元组,请继续。”您可以将range语句更改为range(1,len(s)+1)以避免长度为0的组合”除了在我的情况下更改for i in range(1 << x)for i in range(1, 1 << x)


回到今年以后,我现在将其编写为:

def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1 << x):
        yield [ss for mask, ss in zip(masks, s) if i & mask]

然后,测试代码如下所示:

print(list(powerset([4, 5, 6])))

使用yield意味着您无需在单个内存中计算所有结果。在主循环之外预先计算掩码被认为是值得进行的优化。

Here is more code for a powerset. This is written from scratch:

>>> def powerset(s):
...     x = len(s)
...     for i in range(1 << x):
...         print [s[j] for j in range(x) if (i & (1 << j))]
...
>>> powerset([4,5,6])
[]
[4]
[5]
[4, 5]
[6]
[4, 6]
[5, 6]
[4, 5, 6]

Mark Rushakoff’s comment is applicable here: “If you don’t like that empty tuple at the beginning, on.”you can just change the range statement to range(1, len(s)+1) to avoid a 0-length combination”, except in my case you change for i in range(1 << x) to for i in range(1, 1 << x).


Returning to this years later, I’d now write it like this:

def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1 << x):
        yield [ss for mask, ss in zip(masks, s) if i & mask]

And then the test code would look like this, say:

print(list(powerset([4, 5, 6])))

Using yield means that you do not need to calculate all results in a single piece of memory. Precalculating the masks outside the main loop is assumed to be a worthwhile optimization.


回答 2

如果您正在寻找一个快速的答案,我刚刚在Google上搜索了“ python power set”,并提出了以下建议:Python Power Set Generator

这是该页面中代码的复制粘贴:

def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) <= 1:
        yield seq
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]]+item
            yield item

可以这样使用:

 l = [1, 2, 3, 4]
 r = [x for x in powerset(l)]

现在r是您想要的所有元素的列表,可以进行排序和打印:

r.sort()
print r
[[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 4], [1, 3], [1, 3, 4], [1, 4], [2], [2, 3], [2, 3, 4], [2, 4], [3], [3, 4], [4]]

If you’re looking for a quick answer, I just searched “python power set” on google and came up with this: Python Power Set Generator

Here’s a copy-paste from the code in that page:

def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) <= 1:
        yield seq
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]]+item
            yield item

This can be used like this:

 l = [1, 2, 3, 4]
 r = [x for x in powerset(l)]

Now r is a list of all the elements you wanted, and can be sorted and printed:

r.sort()
print r
[[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 4], [1, 3], [1, 3, 4], [1, 4], [2], [2, 3], [2, 3, 4], [2, 4], [3], [3, 4], [4]]

回答 3

def powerset(lst):
    return reduce(lambda result, x: result + [subset + [x] for subset in result],
                  lst, [[]])
def powerset(lst):
    return reduce(lambda result, x: result + [subset + [x] for subset in result],
                  lst, [[]])

回答 4

powerset有一个改进:

def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) <= 0:
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]]+item
            yield item

There is a refinement of powerset:

def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) <= 0:
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]]+item
            yield item

回答 5

TL; DR(直接进入简化)

我知道我以前已经添加了答案,但是我真的很喜欢我的新实现。我将一个集合作为输入,但是实际上它可以是任何迭代的,并且我返回的是集合的集合,即输入的幂集。我喜欢这种方法,因为它更符合幂集所有子集)的数学定义。

def power_set(A):
    """A is an iterable (list, tuple, set, str, etc)
    returns a set which is the power set of A."""
    length = len(A)
    l = [a for a in A]
    ps = set()

    for i in range(2 ** length):
        selector = f'{i:0{length}b}'
        subset = {l[j] for j, bit in enumerate(selector) if bit == '1'}
        ps.add(frozenset(subset))

    return ps

如果您想确切地在答案中发布输出,请使用以下命令:

>>> [set(s) for s in power_set({1, 2, 3, 4})]
[{3, 4},
 {2},
 {1, 4},
 {2, 3, 4},
 {2, 3},
 {1, 2, 4},
 {1, 2},
 {1, 2, 3},
 {3},
 {2, 4},
 {1},
 {1, 2, 3, 4},
 set(),
 {1, 3},
 {1, 3, 4},
 {4}]

说明

已知功率集的元素数为2 ** len(A),因此可以在for循环中清楚地看到。

我需要将输入(最好是一组)转换为列表,因为一组是唯一无序元素的数据结构,而顺序对于生成子集至关重要。

selector是此算法的关键。请注意,selector它的长度与输入集的长度相同,为了使之成为可能,它使用带填充的f字符串。基本上,这使我可以选择将在每次迭代期间添加到每个子集的元素。假设输入集包含3个元素{0, 1, 2},那么选择器将采用0到7(含)之间的值,二进制形式为:

000 # 0
001 # 1
010 # 2
011 # 3
100 # 4
101 # 5
110 # 6
111 # 7

因此,无论是否应添加原始集合的元素,每个位都可以用作指示符。查看二进制数字,然后将每个数字都视为超集的元素,这1意味着j应添加索引处的元素,并且不应添加0此元素。

我使用集合推导在每次迭代时生成一个子集,并将此子集转换为,frozenset以便可以将其添加到ps(幂集)。否则,我将无法添加它,因为Python中的集合仅包含不可变的对象。

简化版

您可以使用一些python理解来简化代码,因此可以摆脱那些for循环。您还zip可以避免使用j索引,并且代码最终将如下所示:

def power_set(A):
    length = len(A)
    return {
        frozenset({e for e, b in zip(A, f'{i:{length}b}') if b == '1'})
        for i in range(2 ** length)
    }

而已。我喜欢这种算法的原因是它比其他算法更清晰,更直观,因为itertools即使它按预期工作,依靠它看起来也很神奇。

TL;DR (go directly to Simplification)

I know I have previously added an answer, but I really like my new implementation. I am taking a set as input, but it actually could be any iterable, and I am returning a set of sets which is the power set of the input. I like this approach because it is more aligned with the mathematical definition of power set (set of all subsets).

def power_set(A):
    """A is an iterable (list, tuple, set, str, etc)
    returns a set which is the power set of A."""
    length = len(A)
    l = [a for a in A]
    ps = set()

    for i in range(2 ** length):
        selector = f'{i:0{length}b}'
        subset = {l[j] for j, bit in enumerate(selector) if bit == '1'}
        ps.add(frozenset(subset))

    return ps

If you want exactly the output you posted in your answer use this:

>>> [set(s) for s in power_set({1, 2, 3, 4})]
[{3, 4},
 {2},
 {1, 4},
 {2, 3, 4},
 {2, 3},
 {1, 2, 4},
 {1, 2},
 {1, 2, 3},
 {3},
 {2, 4},
 {1},
 {1, 2, 3, 4},
 set(),
 {1, 3},
 {1, 3, 4},
 {4}]

Explanation

It is known that the number of elements of the power set is 2 ** len(A), so that could clearly be seen in the for loop.

I need to convert the input (ideally a set) into a list because by a set is a data structure of unique unordered elements, and the order will be crucial to generate the subsets.

selector is key in this algorithm. Note that selector has the same length as the input set, and to make this possible it is using an f-string with padding. Basically, this allows me to select the elements that will be added to each subset during each iteration. Let’s say the input set has 3 elements {0, 1, 2}, so selector will take values between 0 and 7 (inclusive), which in binary are:

000 # 0
001 # 1
010 # 2
011 # 3
100 # 4
101 # 5
110 # 6
111 # 7

So, each bit could serve as an indicator if an element of the original set should be added or not. Look at the binary numbers, and just think of each number as an element of the super set in which 1 means that an element at index j should be added, and 0 means that this element should not be added.

I am using a set comprehension to generate a subset at each iteration, and I convert this subset into a frozenset so I can add it to ps (power set). Otherwise, I won’t be able to add it because a set in Python consists only of immutable objects.

Simplification

You can simplify the code using some python comprehensions, so you can get rid of those for loops. You can also use zip to avoid using j index and the code will end up as the following:

def power_set(A):
    length = len(A)
    return {
        frozenset({e for e, b in zip(A, f'{i:{length}b}') if b == '1'})
        for i in range(2 ** length)
    }

That’s it. What I like of this algorithm is that is clearer and more intuitive than others because it looks quite magical to rely on itertools even though it works as expected.


回答 6

def get_power_set(s):
  power_set=[[]]
  for elem in s:
    # iterate over the sub sets so far
    for sub_set in power_set:
      # add a new subset consisting of the subset at hand added elem
      power_set=power_set+[list(sub_set)+[elem]]
  return power_set

例如:

get_power_set([1,2,3])

[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
def get_power_set(s):
  power_set=[[]]
  for elem in s:
    # iterate over the sub sets so far
    for sub_set in power_set:
      # add a new subset consisting of the subset at hand added elem
      power_set=power_set+[list(sub_set)+[elem]]
  return power_set

For example:

get_power_set([1,2,3])

yield

[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]

回答 7

我发现以下算法非常清楚和简单:

def get_powerset(some_list):
    """Returns all subsets of size 0 - len(some_list) for some_list"""
    if len(some_list) == 0:
        return [[]]

    subsets = []
    first_element = some_list[0]
    remaining_list = some_list[1:]
    # Strategy: get all the subsets of remaining_list. For each
    # of those subsets, a full subset list will contain both
    # the original subset as well as a version of the subset
    # that contains first_element
    for partial_subset in get_powerset(remaining_list):
        subsets.append(partial_subset)
        subsets.append(partial_subset[:] + [first_element])

    return subsets

生成功率集的另一种方法是生成所有具有n位的二进制数。作为幂集,带n数字的位数为2 ^ n。该算法的原理是,子集中可能存在或不存在元素,因为二进制数字可能是一个或零,但不能同时存在。

def power_set(items):
    N = len(items)
    # enumerate the 2 ** N possible combinations
    for i in range(2 ** N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo

在上MITx时,我找到了两种算法:6.00.2x计算思维和数据科学概论,我认为这是我所见过的最容易理解的算法之一。

I have found the following algorithm very clear and simple:

def get_powerset(some_list):
    """Returns all subsets of size 0 - len(some_list) for some_list"""
    if len(some_list) == 0:
        return [[]]

    subsets = []
    first_element = some_list[0]
    remaining_list = some_list[1:]
    # Strategy: get all the subsets of remaining_list. For each
    # of those subsets, a full subset list will contain both
    # the original subset as well as a version of the subset
    # that contains first_element
    for partial_subset in get_powerset(remaining_list):
        subsets.append(partial_subset)
        subsets.append(partial_subset[:] + [first_element])

    return subsets

Another way one can generate the powerset is by generating all binary numbers that have n bits. As a power set the amount of number with n digits is 2 ^ n. The principle of this algorithm is that an element could be present or not in a subset as a binary digit could be one or zero but not both.

def power_set(items):
    N = len(items)
    # enumerate the 2 ** N possible combinations
    for i in range(2 ** N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo

I found both algorithms when I was taking MITx: 6.00.2x Introduction to Computational Thinking and Data Science, and I consider it is one of the easiest algorithms to understand I have seen.


回答 8

我只是想提供最容易理解的解决方案,即反代码高尔夫版本。

from itertools import combinations

l = ["x", "y", "z", ]

def powerset(items):
    combo = []
    for r in range(len(items) + 1):
        #use a list to coerce a actual list from the combinations generator
        combo.append(list(combinations(items,r)))
    return combo

l_powerset = powerset(l)

for i, item in enumerate(l_powerset):
    print "All sets of length ", i
    print item

结果

全部套长0

[()]

全部套长1

[('x',), ('y',), ('z',)]

全套长度2

[('x', 'y'), ('x', 'z'), ('y', 'z')]

全套长度3

[('x', 'y', 'z')]

有关更多信息,请参见itertools文档,以及有关电源集的Wikipedia条目

I just wanted to provide the most comprehensible solution, the anti code-golf version.

from itertools import combinations

l = ["x", "y", "z", ]

def powerset(items):
    combo = []
    for r in range(len(items) + 1):
        #use a list to coerce a actual list from the combinations generator
        combo.append(list(combinations(items,r)))
    return combo

l_powerset = powerset(l)

for i, item in enumerate(l_powerset):
    print "All sets of length ", i
    print item

The results

All sets of length 0

[()]

All sets of length 1

[('x',), ('y',), ('z',)]

All sets of length 2

[('x', 'y'), ('x', 'z'), ('y', 'z')]

All sets of length 3

[('x', 'y', 'z')]

For more see the itertools docs, also the wikipedia entry on power sets


回答 9

只是一个快速的动力设定刷新器!

X的幂集,简单来说就是X的所有子集的集合,包括空集

示例集X =(a,b,c)

幂集= {{a,b,c},{a,b},{a,c},{b,c},{a},{b},{c},{}}

这是查找功率集的另一种方法:

def power_set(input):
    # returns a list of all subsets of the list a
    if (len(input) == 0):
        return [[]]
    else:
        main_subset = [ ]
        for small_subset in power_set(input[1:]):
            main_subset += [small_subset]
            main_subset += [[input[0]] + small_subset]
        return main_subset

print(power_set([0,1,2,3]))

完全归功于来源

Just a quick power set refresher !

Power set of a set X, is simply the set of all subsets of X including the empty set

Example set X = (a,b,c)

Power Set = { { a , b , c } , { a , b } , { a , c } , { b , c } , { a } , { b } , { c } , { } }

Here is another way of finding power set:

def power_set(input):
    # returns a list of all subsets of the list a
    if (len(input) == 0):
        return [[]]
    else:
        main_subset = [ ]
        for small_subset in power_set(input[1:]):
            main_subset += [small_subset]
            main_subset += [[input[0]] + small_subset]
        return main_subset

print(power_set([0,1,2,3]))

full credit to source


回答 10

这可以很自然地通过itertools.product以下方式完成:

import itertools

def powerset(l):
    for sl in itertools.product(*[[[], [i]] for i in l]):
        yield {j for i in sl for j in i}

This can be done very naturally with itertools.product:

import itertools

def powerset(l):
    for sl in itertools.product(*[[[], [i]] for i in l]):
        yield {j for i in sl for j in i}

回答 11

一种简单的方法是利用2的补数算法利用整数的内部表示。

整数的二进制表示形式为{000,001,010,011,100,101,110,111}对于范围从0到7的数字。对于整数计数器值,将1视为集合中包含的对应元素,将’0’视为整数作为排除,我们可以根据计数顺序生成子集。必须从生成数字0pow(2,n) -1其中n是数组的长度,即二进制表示形式的位数。

基于它的简单子集生成器函数可以编写如下。它基本上依赖

def subsets(array):
    if not array:
        return
    else:
        length = len(array)
        for max_int in range(0x1 << length):
            subset = []
            for i in range(length):
                if max_int & (0x1 << i):
                    subset.append(array[i])
            yield subset

然后可以用作

def get_subsets(array):
    powerset = []
    for i in subsets(array):
        powerser.append(i)
    return powerset

测验

在本地文件中添加以下内容

if __name__ == '__main__':
    sample = ['b',  'd',  'f']

    for i in range(len(sample)):
        print "Subsets for " , sample[i:], " are ", get_subsets(sample[i:])

提供以下输出

Subsets for  ['b', 'd', 'f']  are  [[], ['b'], ['d'], ['b', 'd'], ['f'], ['b', 'f'], ['d', 'f'], ['b', 'd', 'f']]
Subsets for  ['d', 'f']  are  [[], ['d'], ['f'], ['d', 'f']]
Subsets for  ['f']  are  [[], ['f']]

A simple way would be to harness the internal representation of integers under 2’s complement arithmetic.

Binary representation of integers is as {000, 001, 010, 011, 100, 101, 110, 111} for numbers ranging from 0 to 7. For an integer counter value, considering 1 as inclusion of corresponding element in collection and ‘0’ as exclusion we can generate subsets based on the counting sequence. Numbers have to be generated from 0 to pow(2,n) -1 where n is the length of array i.e. number of bits in binary representation.

A simple Subset Generator Function based on it can be written as below. It basically relies

def subsets(array):
    if not array:
        return
    else:
        length = len(array)
        for max_int in range(0x1 << length):
            subset = []
            for i in range(length):
                if max_int & (0x1 << i):
                    subset.append(array[i])
            yield subset

and then it can be used as

def get_subsets(array):
    powerset = []
    for i in subsets(array):
        powerser.append(i)
    return powerset

Testing

Adding following in local file

if __name__ == '__main__':
    sample = ['b',  'd',  'f']

    for i in range(len(sample)):
        print "Subsets for " , sample[i:], " are ", get_subsets(sample[i:])

gives following output

Subsets for  ['b', 'd', 'f']  are  [[], ['b'], ['d'], ['b', 'd'], ['f'], ['b', 'f'], ['d', 'f'], ['b', 'd', 'f']]
Subsets for  ['d', 'f']  are  [[], ['d'], ['f'], ['d', 'f']]
Subsets for  ['f']  are  [[], ['f']]

回答 12

空集是所有子集的一部分,则可以使用:

def subsets(iterable):
    for n in range(len(iterable) + 1):
        yield from combinations(iterable, n)

With empty set, which is part of all the subsets, you could use:

def subsets(iterable):
    for n in range(len(iterable) + 1):
        yield from combinations(iterable, n)

回答 13

几乎所有这些答案都使用list而不是set来代替,这对我来说似乎有点作弊。因此,出于好奇,我尝试真正地制作一个简单的版本set并为其他“ Python新手”进行总结。

我发现在处理Python的set实现方面有很多奇怪之处。给我的主要惊喜是处理空集。这与Ruby的Set实现相反,在Ruby中,我可以简单地进行操作Set[Set[]]并获得一个Set包含一个empty的内容Set,因此我最初发现它有些混乱。

回顾powerset一下,在使用sets时,我遇到了两个问题:

  1. set()需要一个可迭代的,所以set(set())会返回,set() 因为空集的可迭代是空的(我猜:)
  2. 获得一组集合,set({set()})并且set.add(set)由于不可散列而无法工作set()

为了解决这两个问题,我使用了frozenset(),这意味着我没有得到我想要的东西(类型从字面上看set),而是利用了整体set界面。

def powerset(original_set):
  # below gives us a set with one empty set in it
  ps = set({frozenset()}) 
  for member in original_set:
    subset = set()
    for m in ps:
      # to be added into subset, needs to be
      # frozenset.union(set) so it's hashable
      subset.add(m.union(set([member]))
    ps = ps.union(subset)
  return ps

下面我们frozenset正确地得到2²(16)s作为输出:

In [1]: powerset(set([1,2,3,4]))
Out[2]:
{frozenset(),
 frozenset({3, 4}),
 frozenset({2}),
 frozenset({1, 4}),
 frozenset({3}),
 frozenset({2, 3}),
 frozenset({2, 3, 4}),
 frozenset({1, 2}),
 frozenset({2, 4}),
 frozenset({1}),
 frozenset({1, 2, 4}),
 frozenset({1, 3}),
 frozenset({1, 2, 3}),
 frozenset({4}),
 frozenset({1, 3, 4}),
 frozenset({1, 2, 3, 4})}

既然没有办法有一个setset,S在Python如果你想将这些frozensets转换setS,你必须回映射到listlist(map(set, powerset(set([1,2,3,4])))) )或修改上面。

Almost all of these answers use list rather than set, which felt like a bit of a cheat to me. So, out of curiosity I tried to do a simple version truly on set and summarize for other “new to Python” folks.

I found there’s a couple oddities in dealing with Python’s set implementation. The main surprise to me was handling empty sets. This is in contrast to Ruby’s Set implementation, where I can simply do Set[Set[]] and get a Set containing one empty Set, so I found it initially a little confusing.

To review, in doing powerset with sets, I encountered two problems:

  1. set() takes an iterable, so set(set()) will return set() because the empty set iterable is empty (duh I guess :))
  2. to get a set of sets, set({set()}) and set.add(set) won’t work because set() isn’t hashable

To solve both issues, I made use of frozenset(), which means I don’t quite get what I want (type is literally set), but makes use of the overall set interace.

def powerset(original_set):
  # below gives us a set with one empty set in it
  ps = set({frozenset()}) 
  for member in original_set:
    subset = set()
    for m in ps:
      # to be added into subset, needs to be
      # frozenset.union(set) so it's hashable
      subset.add(m.union(set([member]))
    ps = ps.union(subset)
  return ps

Below we get 2² (16) frozensets correctly as output:

In [1]: powerset(set([1,2,3,4]))
Out[2]:
{frozenset(),
 frozenset({3, 4}),
 frozenset({2}),
 frozenset({1, 4}),
 frozenset({3}),
 frozenset({2, 3}),
 frozenset({2, 3, 4}),
 frozenset({1, 2}),
 frozenset({2, 4}),
 frozenset({1}),
 frozenset({1, 2, 4}),
 frozenset({1, 3}),
 frozenset({1, 2, 3}),
 frozenset({4}),
 frozenset({1, 3, 4}),
 frozenset({1, 2, 3, 4})}

As there’s no way to have a set of sets in Python, if you want to turn these frozensets into sets, you’ll have to map them back into a list (list(map(set, powerset(set([1,2,3,4])))) ) or modify the above.


回答 14

也许这个问题越来越老了,但我希望我的代码能对某人有所帮助。

def powSet(set):
    if len(set) == 0:
       return [[]]
    return addtoAll(set[0],powSet(set[1:])) + powSet(set[1:])

def addtoAll(e, set):
   for c in set:
       c.append(e)
   return set

Perhaps the question is getting old, but I hope my code will help someone.

def powSet(set):
    if len(set) == 0:
       return [[]]
    return addtoAll(set[0],powSet(set[1:])) + powSet(set[1:])

def addtoAll(e, set):
   for c in set:
       c.append(e)
   return set

回答 15

使用powerset()包中的函数more_itertools

产生可迭代的所有可能子集

>>> list(powerset([1, 2, 3]))
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

如果要设置,请使用:

list(map(set, powerset(iterable)))

Use function powerset() from package more_itertools.

Yields all possible subsets of the iterable

>>> list(powerset([1, 2, 3]))
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

If you want sets, use:

list(map(set, powerset(iterable)))

回答 16

递归获取所有子集。疯狂屁股一线

from typing import List

def subsets(xs: list) -> List[list]:
    return subsets(xs[1:]) + [x + [xs[0]] for x in subsets(xs[1:])] if xs else [[]]

基于Haskell解决方案

subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = map (x:) (subsets xs) ++ subsets xs

Getting all the subsets with recursion. Crazy-ass one-liner

from typing import List

def subsets(xs: list) -> List[list]:
    return subsets(xs[1:]) + [x + [xs[0]] for x in subsets(xs[1:])] if xs else [[]]

Based on a Haskell solution

subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = map (x:) (subsets xs) ++ subsets xs

回答 17

def findsubsets(s, n): 
    return list(itertools.combinations(s, n)) 

def allsubsets(s) :
    a = []
    for x in range(1,len(s)+1):
        a.append(map(set,findsubsets(s,x)))      
    return a
def findsubsets(s, n): 
    return list(itertools.combinations(s, n)) 

def allsubsets(s) :
    a = []
    for x in range(1,len(s)+1):
        a.append(map(set,findsubsets(s,x)))      
    return a

回答 18

您可以这样做:

def powerset(x):
    m=[]
    if not x:
        m.append(x)
    else:
        A = x[0]
        B = x[1:]
        for z in powerset(B):
            m.append(z)
            r = [A] + z
            m.append(r)
    return m

print(powerset([1, 2, 3, 4]))

输出:

[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3], [4], [1, 4], [2, 4], [1, 2, 4], [3, 4], [1, 3, 4], [2, 3, 4], [1, 2, 3, 4]]

You can do it like this:

def powerset(x):
    m=[]
    if not x:
        m.append(x)
    else:
        A = x[0]
        B = x[1:]
        for z in powerset(B):
            m.append(z)
            r = [A] + z
            m.append(r)
    return m

print(powerset([1, 2, 3, 4]))

Output:

[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3], [4], [1, 4], [2, 4], [1, 2, 4], [3, 4], [1, 3, 4], [2, 3, 4], [1, 2, 3, 4]]

回答 19

我知道这为时已晚

已经有许多其他解决方案,但仍然…

def power_set(lst):
    pw_set = [[]]

    for i in range(0,len(lst)):
        for j in range(0,len(pw_set)):
            ele = pw_set[j].copy()
            ele = ele + [lst[i]]
            pw_set = pw_set + [ele]

    return pw_set

I know this is too late

There are many other solutions already but still…

def power_set(lst):
    pw_set = [[]]

    for i in range(0,len(lst)):
        for j in range(0,len(pw_set)):
            ele = pw_set[j].copy()
            ele = ele + [lst[i]]
            pw_set = pw_set + [ele]

    return pw_set

回答 20

这很疯狂,因为这些答案均未提供实际的Python集的返回值。这是一个凌乱的实现,它将提供实际上是Python的powerset set

test_set = set(['yo', 'whatup', 'money'])
def powerset( base_set ):
    """ modified from pydoc's itertools recipe shown above"""
    from itertools import chain, combinations
    base_list = list( base_set )
    combo_list = [ combinations(base_list, r) for r in range(len(base_set)+1) ]

    powerset = set([])
    for ll in combo_list:
        list_of_frozensets = list( map( frozenset, map( list, ll ) ) ) 
        set_of_frozensets = set( list_of_frozensets )
        powerset = powerset.union( set_of_frozensets )

    return powerset

print powerset( test_set )
# >>> set([ frozenset(['money','whatup']), frozenset(['money','whatup','yo']), 
#        frozenset(['whatup']), frozenset(['whatup','yo']), frozenset(['yo']),
#        frozenset(['money','yo']), frozenset(['money']), frozenset([]) ])

不过,我希望看到更好的实现。

This is wild because none of these answers actually provide the return of an actual Python set. Here is a messy implementation that will give a powerset that actually is a Python set.

test_set = set(['yo', 'whatup', 'money'])
def powerset( base_set ):
    """ modified from pydoc's itertools recipe shown above"""
    from itertools import chain, combinations
    base_list = list( base_set )
    combo_list = [ combinations(base_list, r) for r in range(len(base_set)+1) ]

    powerset = set([])
    for ll in combo_list:
        list_of_frozensets = list( map( frozenset, map( list, ll ) ) ) 
        set_of_frozensets = set( list_of_frozensets )
        powerset = powerset.union( set_of_frozensets )

    return powerset

print powerset( test_set )
# >>> set([ frozenset(['money','whatup']), frozenset(['money','whatup','yo']), 
#        frozenset(['whatup']), frozenset(['whatup','yo']), frozenset(['yo']),
#        frozenset(['money','yo']), frozenset(['money']), frozenset([]) ])

I’d love to see a better implementation, though.


回答 21

这是我利用组合但仅使用内置功能的快速实现。

def powerSet(array):
    length = str(len(array))
    formatter = '{:0' + length + 'b}'
    combinations = []
    for i in xrange(2**int(length)):
        combinations.append(formatter.format(i))
    sets = set()
    currentSet = []
    for combo in combinations:
        for i,val in enumerate(combo):
            if val=='1':
                currentSet.append(array[i])
        sets.add(tuple(sorted(currentSet)))
        currentSet = []
    return sets

Here is my quick implementation utilizing combinations but using only built-ins.

def powerSet(array):
    length = str(len(array))
    formatter = '{:0' + length + 'b}'
    combinations = []
    for i in xrange(2**int(length)):
        combinations.append(formatter.format(i))
    sets = set()
    currentSet = []
    for combo in combinations:
        for i,val in enumerate(combo):
            if val=='1':
                currentSet.append(array[i])
        sets.add(tuple(sorted(currentSet)))
        currentSet = []
    return sets

回答 22

范围n中的所有子集均已设置:

n = int(input())
l = [i for i in range (1, n + 1)]

for number in range(2 ** n) :
    binary = bin(number)[: 1 : -1]
    subset = [l[i] for i in range(len(binary)) if binary[i] == "1"]
    print(set(sorted(subset)) if number > 0 else "{}")

All subsets in range n as set:

n = int(input())
l = [i for i in range (1, n + 1)]

for number in range(2 ** n) :
    binary = bin(number)[: 1 : -1]
    subset = [l[i] for i in range(len(binary)) if binary[i] == "1"]
    print(set(sorted(subset)) if number > 0 else "{}")

回答 23

import math    
def printPowerSet(set,set_size): 
    pow_set_size =int(math.pow(2, set_size))
    for counter in range(pow_set_size):
    for j in range(set_size):  
        if((counter & (1 << j)) > 0):
            print(set[j], end = "")
    print("")
set = ['a', 'b', 'c']
printPowerSet(set,3)
import math    
def printPowerSet(set,set_size): 
    pow_set_size =int(math.pow(2, set_size))
    for counter in range(pow_set_size):
    for j in range(set_size):  
        if((counter & (1 << j)) > 0):
            print(set[j], end = "")
    print("")
set = ['a', 'b', 'c']
printPowerSet(set,3)

回答 24

我在《发现计算机科学:跨学科问题,原理和Python编程。2015年版》这本书中看到了一个问题的变体。在练习10.2.11中,输入只是一个整数,而输出应为幂集。这是我的递归解决方案(除了基本的python3,不使用其他任何东西)

def powerSetR(n):
    assert n >= 0
    if n == 0:
        return [[]]
    else:
        input_set = list(range(1, n+1)) # [1,2,...n]
        main_subset = [ ]
        for small_subset in powerSetR(n-1):
            main_subset += [small_subset]
            main_subset += [ [input_set[-1]] + small_subset]
        return main_subset

superset = powerSetR(4)
print(superset)       
print("Number of sublists:", len(superset))

输出是

[[],[4],[3],[4、3],[2],[4、2],[3、2],[4、3、2],[1],[4、1 ],[3、1],[4、3、1],[2、1],[4、2、1],[3、2、1],[4、3、2、1]]的数量子列表:16

A variation of the question, is an exercise I see on the book “Discovering Computer Science: Interdisciplinary Problems, Principles, and Python Programming. 2015 edition”. In that exercise 10.2.11, the input is just an integer number, and the output should be the power sets. Here is my recursive solution (not using anything else but basic python3 )

def powerSetR(n):
    assert n >= 0
    if n == 0:
        return [[]]
    else:
        input_set = list(range(1, n+1)) # [1,2,...n]
        main_subset = [ ]
        for small_subset in powerSetR(n-1):
            main_subset += [small_subset]
            main_subset += [ [input_set[-1]] + small_subset]
        return main_subset

superset = powerSetR(4)
print(superset)       
print("Number of sublists:", len(superset))

And the output is

[[], [4], [3], [4, 3], [2], [4, 2], [3, 2], [4, 3, 2], [1], [4, 1], [3, 1], [4, 3, 1], [2, 1], [4, 2, 1], [3, 2, 1], [4, 3, 2, 1]] Number of sublists: 16


回答 25

我没有遇到过该more_itertools.powerset功能,建议使用它。我还建议不要使用的默认输出顺序itertools.combinations,通常是希望最小化位置之间的距离,并对位置之间距离较短的项目子集进行排序,使其高于/之间距离更大的项目。

itertools食谱页面显示它使用chain.from_iterable

  • 请注意,r这里与二项式系数的下部的标准符号匹配,s通常n在数学课本和计算器中称为“ n选择r”
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

这里的其他示例[1,2,3,4]以2个元组以“字典顺序”的顺序列出(当我们将数字打印为整数时)给出的幂集。如果我在旁边写上数字之间的距离(即差),它表明了我的观点:

121
132
143
231
242
341

子集的正确顺序应该是首先“耗尽”最小距离的顺序,如下所示:

121
231
341
132
242
143

在这里使用数字会使此顺序看起来“错误”,但是考虑一下字母,["a","b","c","d"]这很清楚为什么对于按此顺序获得幂集可能很有​​用:

ab ⇒ 1
bc ⇒ 1
cd ⇒ 1
ac ⇒ 2
bd ⇒ 2
ad ⇒ 3

对于更多项目,此效果更加明显,就我的目的而言,它使能够有意义地描述Powerset的索引范围之间有所不同。

(关于组合代码中算法的输出顺序,有很多关于格雷码等的文章,我不认为这是附带问题)。

我实际上只是编写了一个相当复杂的程序,该程序使用此快速整数分区代码以正确的顺序输出值,但是随后我发现more_itertools.powerset并且对于大多数使用方法而言,仅使用该函数就可以了,就像这样:

from more_itertools import powerset
from numpy import ediff1d

def ps_sorter(tup):
    l = len(tup)
    d = ediff1d(tup).tolist()
    return l, d

ps = powerset([1,2,3,4])

ps = sorted(ps, key=ps_sorter)

for x in ps:
    print(x)

()
(1,)
(2,)
(3,)
(4,)
(1, 2)
(2, 3)
(3, 4)
(1, 3)
(2, 4)
(1, 4)
(1, 2, 3)
(2, 3, 4)
(1, 2, 4)
(1, 3, 4)
(1, 2, 3, 4)

我写了一些更复杂的代码,这将很好地打印幂(见回购了漂亮的印刷,我这里不包括的功能:print_partitionsprint_partitions_by_length,和pprint_tuple)。

这一切都非常简单,但是如果您想要一些可以直接访问不同级别的Powerset的代码,它可能仍然有用:

from itertools import permutations as permute
from numpy import cumsum

# http://jeromekelleher.net/generating-integer-partitions.html
# via
# /programming/10035752/elegant-python-code-for-integer-partitioning#comment25080713_10036764

def asc_int_partitions(n):
    a = [0 for i in range(n + 1)]
    k = 1
    y = n - 1
    while k != 0:
        x = a[k - 1] + 1
        k -= 1
        while 2 * x <= y:
            a[k] = x
            y -= x
            k += 1
        l = k + 1
        while x <= y:
            a[k] = x
            a[l] = y
            yield tuple(a[:k + 2])
            x += 1
            y -= 1
        a[k] = x + y
        y = x + y - 1
        yield tuple(a[:k + 1])

# https://stackoverflow.com/a/6285330/2668831
def uniquely_permute(iterable, enforce_sort=False, r=None):
    previous = tuple()
    if enforce_sort: # potential waste of effort (default: False)
        iterable = sorted(iterable)
    for p in permute(iterable, r):
        if p > previous:
            previous = p
            yield p

def sum_min(p):
    return sum(p), min(p)

def partitions_by_length(max_n, sorting=True, permuting=False):
    partition_dict = {0: ()}
    for n in range(1,max_n+1):
        partition_dict.setdefault(n, [])
        partitions = list(asc_int_partitions(n))
        for p in partitions:
            if permuting:
                perms = uniquely_permute(p)
                for perm in perms:
                    partition_dict.get(len(p)).append(perm)
            else:
                partition_dict.get(len(p)).append(p)
    if not sorting:
        return partition_dict
    for k in partition_dict:
        partition_dict.update({k: sorted(partition_dict.get(k), key=sum_min)})
    return partition_dict

def print_partitions_by_length(max_n, sorting=True, permuting=True):
    partition_dict = partitions_by_length(max_n, sorting=sorting, permuting=permuting)
    for k in partition_dict:
        if k == 0:
            print(tuple(partition_dict.get(k)), end="")
        for p in partition_dict.get(k):
            print(pprint_tuple(p), end=" ")
        print()
    return

def generate_powerset(items, subset_handler=tuple, verbose=False):
    """
    Generate the powerset of an iterable `items`.

    Handling of the elements of the iterable is by whichever function is passed as
    `subset_handler`, which must be able to handle the `None` value for the
    empty set. The function `string_handler` will join the elements of the subset
    with the empty string (useful when `items` is an iterable of `str` variables).
    """
    ps = {0: [subset_handler()]}
    n = len(items)
    p_dict = partitions_by_length(n-1, sorting=True, permuting=True)
    for p_len, parts in p_dict.items():
        ps.setdefault(p_len, [])
        if p_len == 0:
            # singletons
            for offset in range(n):
                subset = subset_handler([items[offset]])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        for pcount, partition in enumerate(parts):
            distance = sum(partition)
            indices = (cumsum(partition)).tolist()
            for offset in range(n - distance):
                subset = subset_handler([items[offset]] + [items[offset:][i] for i in indices])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - distance - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        if verbose and p_len < n-1:
            print()
    return ps

作为示例,我编写了一个CLI演示程序,该程序将字符串作为命令行参数:

python string_powerset.py abcdef

a, b, c, d, e, f

ab, bc, cd, de, ef
ac, bd, ce, df
ad, be, cf
ae, bf
af

abc, bcd, cde, def
abd, bce, cdf
acd, bde, cef
abe, bcf
ade, bef
ace, bdf
abf
aef
acf
adf

abcd, bcde, cdef
abce, bcdf
abde, bcef
acde, bdef
abcf
abef
adef
abdf
acdf
acef

abcde, bcdef
abcdf
abcef
abdef
acdef

abcdef

I hadn’t come across the more_itertools.powerset function and would recommend using that. I also recommend not using the default ordering of the output from itertools.combinations, often instead you want to minimise the distance between the positions and sort the subsets of items with shorter distance between them above/before the items with larger distance between them.

The itertools recipes page shows it uses chain.from_iterable

  • Note that the r here matches the standard notation for the lower part of a binomial coefficient, the s is usually referred to as n in mathematics texts and on calculators (“n Choose r”)
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

The other examples here give the powerset of [1,2,3,4] in such a way that the 2-tuples are listed in “lexicographic” order (when we print the numbers as integers). If I write the distance between the numbers alongside it (i.e. the difference), it shows my point:

12 ⇒ 1
13 ⇒ 2
14 ⇒ 3
23 ⇒ 1
24 ⇒ 2
34 ⇒ 1

The correct order for subsets should be the order which ‘exhausts’ the minimal distance first, like so:

12 ⇒ 1
23 ⇒ 1
34 ⇒ 1
13 ⇒ 2
24 ⇒ 2
14 ⇒ 3

Using numbers here makes this ordering look ‘wrong’, but consider for example the letters ["a","b","c","d"] it is clearer why this might be useful to obtain the powerset in this order:

ab ⇒ 1
bc ⇒ 1
cd ⇒ 1
ac ⇒ 2
bd ⇒ 2
ad ⇒ 3

This effect is more pronounced with more items, and for my purposes it makes the difference between being able to describe the ranges of the indexes of the powerset meaningfully.

(There is a lot written on Gray codes etc. for the output order of algorithms in combinatorics, I don’t see it as a side issue).

I actually just wrote a fairly involved program which used this fast integer partition code to output the values in the proper order, but then I discovered more_itertools.powerset and for most uses it’s probably fine to just use that function like so:

from more_itertools import powerset
from numpy import ediff1d

def ps_sorter(tup):
    l = len(tup)
    d = ediff1d(tup).tolist()
    return l, d

ps = powerset([1,2,3,4])

ps = sorted(ps, key=ps_sorter)

for x in ps:
    print(x)

()
(1,)
(2,)
(3,)
(4,)
(1, 2)
(2, 3)
(3, 4)
(1, 3)
(2, 4)
(1, 4)
(1, 2, 3)
(2, 3, 4)
(1, 2, 4)
(1, 3, 4)
(1, 2, 3, 4)

I wrote some more involved code which will print the powerset nicely (see the repo for pretty printing functions I’ve not included here: print_partitions, print_partitions_by_length, and pprint_tuple).

This is all pretty simple, but still might be useful if you want some code that’ll let you get straight to accessing the different levels of the powerset:

from itertools import permutations as permute
from numpy import cumsum

# http://jeromekelleher.net/generating-integer-partitions.html
# via
# https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer-partitioning#comment25080713_10036764

def asc_int_partitions(n):
    a = [0 for i in range(n + 1)]
    k = 1
    y = n - 1
    while k != 0:
        x = a[k - 1] + 1
        k -= 1
        while 2 * x <= y:
            a[k] = x
            y -= x
            k += 1
        l = k + 1
        while x <= y:
            a[k] = x
            a[l] = y
            yield tuple(a[:k + 2])
            x += 1
            y -= 1
        a[k] = x + y
        y = x + y - 1
        yield tuple(a[:k + 1])

# https://stackoverflow.com/a/6285330/2668831
def uniquely_permute(iterable, enforce_sort=False, r=None):
    previous = tuple()
    if enforce_sort: # potential waste of effort (default: False)
        iterable = sorted(iterable)
    for p in permute(iterable, r):
        if p > previous:
            previous = p
            yield p

def sum_min(p):
    return sum(p), min(p)

def partitions_by_length(max_n, sorting=True, permuting=False):
    partition_dict = {0: ()}
    for n in range(1,max_n+1):
        partition_dict.setdefault(n, [])
        partitions = list(asc_int_partitions(n))
        for p in partitions:
            if permuting:
                perms = uniquely_permute(p)
                for perm in perms:
                    partition_dict.get(len(p)).append(perm)
            else:
                partition_dict.get(len(p)).append(p)
    if not sorting:
        return partition_dict
    for k in partition_dict:
        partition_dict.update({k: sorted(partition_dict.get(k), key=sum_min)})
    return partition_dict

def print_partitions_by_length(max_n, sorting=True, permuting=True):
    partition_dict = partitions_by_length(max_n, sorting=sorting, permuting=permuting)
    for k in partition_dict:
        if k == 0:
            print(tuple(partition_dict.get(k)), end="")
        for p in partition_dict.get(k):
            print(pprint_tuple(p), end=" ")
        print()
    return

def generate_powerset(items, subset_handler=tuple, verbose=False):
    """
    Generate the powerset of an iterable `items`.

    Handling of the elements of the iterable is by whichever function is passed as
    `subset_handler`, which must be able to handle the `None` value for the
    empty set. The function `string_handler` will join the elements of the subset
    with the empty string (useful when `items` is an iterable of `str` variables).
    """
    ps = {0: [subset_handler()]}
    n = len(items)
    p_dict = partitions_by_length(n-1, sorting=True, permuting=True)
    for p_len, parts in p_dict.items():
        ps.setdefault(p_len, [])
        if p_len == 0:
            # singletons
            for offset in range(n):
                subset = subset_handler([items[offset]])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        for pcount, partition in enumerate(parts):
            distance = sum(partition)
            indices = (cumsum(partition)).tolist()
            for offset in range(n - distance):
                subset = subset_handler([items[offset]] + [items[offset:][i] for i in indices])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - distance - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        if verbose and p_len < n-1:
            print()
    return ps

As an example, I wrote a CLI demo program which takes a string as a command line argument:

python string_powerset.py abcdef

a, b, c, d, e, f

ab, bc, cd, de, ef
ac, bd, ce, df
ad, be, cf
ae, bf
af

abc, bcd, cde, def
abd, bce, cdf
acd, bde, cef
abe, bcf
ade, bef
ace, bdf
abf
aef
acf
adf

abcd, bcde, cdef
abce, bcdf
abde, bcef
acde, bdef
abcf
abef
adef
abdf
acdf
acef

abcde, bcdef
abcdf
abcef
abdef
acdef

abcdef

回答 26

如果您想要任何特定长度的子集,可以这样操作:

from itertools import combinations
someSet = {0, 1, 2, 3}
([x for i in range(len(someSet)+1) for x in combinations(someSet,i)])

通常,对于任意长度的子集,您可以修改范围偏差。输出是

[(),(0,),(1,),(2,),(3,),(0,1),(0,2),(0,3),(1,2),(1 ,3),(2,3),(0,1,2),(0,1,3),(0,2,3),(1,2,3),(0,1,2,3 )]

If you want any specific length of subsets you can do it like this:

from itertools import combinations
someSet = {0, 1, 2, 3}
([x for i in range(len(someSet)+1) for x in combinations(someSet,i)])

More generally for arbitary length subsets you can modify the range arugment. The output is

[(), (0,), (1,), (2,), (3,), (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (0, 1, 2, 3)]


回答 27

def powerset(some_set):
    res = [(a,b) for a in some_set for b in some_set]
    return res
def powerset(some_set):
    res = [(a,b) for a in some_set for b in some_set]
    return res

如何在python中构造一组列表项?

问题:如何在python中构造一组列表项?

list在python中有一个文件名,我想set从所有文件名中构造一个。

filelist=[]
for filename in filelist:
    set(filename)

这似乎不起作用。怎么办

I have a list of filenames in python and I would want to construct a set out of all the filenames.

filelist=[]
for filename in filelist:
    set(filename)

This does not seem to work. How can do this?


回答 0

如果您具有可散列对象的列表(文件名可能是字符串,那么它们应该算在内):

lst = ['foo.py', 'bar.py', 'baz.py', 'qux.py', Ellipsis]

您可以直接构造集合:

s = set(lst)

实际上,set将这种方式与任何可迭代对象一起使用! (鸭子打字不好吗?)


如果要迭代进行:

s = set()
for item in iterable:
    s.add(item)

但是很少需要这样做。我只提到它是因为该set.add方法非常有用。

If you have a list of hashable objects (filenames would probably be strings, so they should count):

lst = ['foo.py', 'bar.py', 'baz.py', 'qux.py', Ellipsis]

you can construct the set directly:

s = set(lst)

In fact, set will work this way with any iterable object! (Isn’t duck typing great?)


If you want to do it iteratively:

s = set()
for item in iterable:
    s.add(item)

But there’s rarely a need to do it this way. I only mention it because the set.add method is quite useful.


回答 1

最直接的解决方案是:

 s = set(filelist)

原始代码中的问题是未将值分配给集合。这是您的代码的固定版本:

 s = set()
 for filename in filelist:
     s.add(filename)
 print(s)

The most direct solution is this:

s = set(filelist)

The issue in your original code is that the values weren’t being assigned to the set. Here’s the fixed-up version of your code:

s = set()
for filename in filelist:
    s.add(filename)
print(s)

回答 2

你可以做

my_set = set(my_list)

或者,对于Python 3,

my_set = {*my_list}

从列表创建一个集合。相反,您也可以

my_list = list(my_set)

或者,对于Python 3,

my_list = [*my_set]

从集合创建列表。

只需注意,将列表转换为集合时,列表中元素的顺序通常会丢失,因为集合本质上是无序的。(不过,CPython中的一个exceptions似乎是如果列表仅包含非负整数,但是我认为这是CPython中集合实现的结果,并且这种行为在不同的Python实现之间会有所不同。)

You can do

my_set = set(my_list)

or, in Python 3,

my_set = {*my_list}

to create a set from a list. Conversely, you can also do

my_list = list(my_set)

or, in Python 3,

my_list = [*my_set]

to create a list from a set.

Just note that the order of the elements in a list is generally lost when converting the list to a set since a set is inherently unordered. (One exception in CPython, though, seems to be if the list consists only of non-negative integers, but I assume this is a consequence of the implementation of sets in CPython and that this behavior can vary between different Python implementations.)


回答 3

这是另一种解决方案:

>>>list1=["C:\\","D:\\","E:\\","C:\\"]
>>>set1=set(list1)
>>>set1
set(['E:\\', 'D:\\', 'C:\\'])

在这段代码中,我使用了set方法,以便将其变成一个集合,然后从列表中删除了所有重复的值

Here is another solution:

>>>list1=["C:\\","D:\\","E:\\","C:\\"]
>>>set1=set(list1)
>>>set1
set(['E:\\', 'D:\\', 'C:\\'])

In this code I have used the set method in order to turn it into a set and then it removed all duplicate values from the list


回答 4

一种像这样的迭代方式构造集合的一般方法:

aset = {e for e in alist}

One general way to construct set in iterative way like this:

aset = {e for e in alist}

回答 5

简单地说:

new_list = set(your_list)

Simply put the line:

new_list = set(your_list)

如何将可迭代的内容添加到集合中?

问题:如何将可迭代的内容添加到集合中?

将可迭代的所有项目添加到现有项目的“一种明显方法”set什么?

What is the “one […] obvious way” to add all items of an iterable to an existing set?


回答 0

您可以像这样list将a的元素添加set

>>> foo = set(range(0, 4))
>>> foo
set([0, 1, 2, 3])
>>> foo.update(range(2, 6))
>>> foo
set([0, 1, 2, 3, 4, 5])

You can add elements of a list to a set like this:

>>> foo = set(range(0, 4))
>>> foo
set([0, 1, 2, 3])
>>> foo.update(range(2, 6))
>>> foo
set([0, 1, 2, 3, 4, 5])

回答 1

为了使任何可能相信(例如)aset.add()在循环中进行的工作都具有竞争优势的人受益,aset.update()以下示例说明了如何在公开之前快速检验自己的信念:

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 294 usec per loop

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 950 usec per loop

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 458 usec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 598 usec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 1.89 msec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 891 usec per loop

看来循环方法的每项成本是该方法的三倍以上update

使用|= set()成本大约是原来的1.5倍update,但循环添加每个单独项目的成本却是原来的一半。

For the benefit of anyone who might believe e.g. that doing aset.add() in a loop would have performance competitive with doing aset.update(), here’s an example of how you can test your beliefs quickly before going public:

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 294 usec per loop

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 950 usec per loop

>\python27\python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 458 usec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 598 usec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 1.89 msec per loop

>\python27\python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 891 usec per loop

Looks like the cost per item of the loop approach is over THREE times that of the update approach.

Using |= set() costs about 1.5x what update does but half of what adding each individual item in a loop does.


回答 2

您可以使用set()函数将一个可迭代对象转换为一个集合,然后使用标准集合更新运算符(| =)将新集合中的唯一值添加到现有集合中。

>>> a = { 1, 2, 3 }
>>> b = ( 3, 4, 5 )
>>> a |= set(b)
>>> a
set([1, 2, 3, 4, 5])

You can use the set() function to convert an iterable into a set, and then use standard set update operator (|=) to add the unique values from your new set into the existing one.

>>> a = { 1, 2, 3 }
>>> b = ( 3, 4, 5 )
>>> a |= set(b)
>>> a
set([1, 2, 3, 4, 5])

回答 3

只是快速更新,使用python 3进行计时:

#!/usr/local/bin python3
from timeit import Timer

a = set(range(1, 100000))
b = list(range(50000, 150000))

def one_by_one(s, l):
    for i in l:
        s.add(i)    

def cast_to_list_and_back(s, l):
    s = set(list(s) + l)

def update_set(s,l):
    s.update(l)

结果是:

one_by_one 10.184448844986036
cast_to_list_and_back 7.969255169969983
update_set 2.212590195937082

Just a quick update, timings using python 3:

#!/usr/local/bin python3
from timeit import Timer

a = set(range(1, 100000))
b = list(range(50000, 150000))

def one_by_one(s, l):
    for i in l:
        s.add(i)    

def cast_to_list_and_back(s, l):
    s = set(list(s) + l)

def update_set(s,l):
    s.update(l)

results are:

one_by_one 10.184448844986036
cast_to_list_and_back 7.969255169969983
update_set 2.212590195937082

回答 4

使用列表理解。

使用列表来短路可迭代的创建:)

>>> x = [1, 2, 3, 4]
>>> 
>>> k = x.__iter__()
>>> k
<listiterator object at 0x100517490>
>>> l = [y for y in k]
>>> l
[1, 2, 3, 4]
>>> 
>>> z = Set([1,2])
>>> z.update(l)
>>> z
set([1, 2, 3, 4])
>>> 

[编辑:错过了问题的设定部分]

Use list comprehension.

Short circuiting the creation of iterable using a list for example :)

>>> x = [1, 2, 3, 4]
>>> 
>>> k = x.__iter__()
>>> k
<listiterator object at 0x100517490>
>>> l = [y for y in k]
>>> l
[1, 2, 3, 4]
>>> 
>>> z = Set([1,2])
>>> z.update(l)
>>> z
set([1, 2, 3, 4])
>>> 

[Edit: missed the set part of question]


回答 5

for item in items:
   extant_set.add(item)

作为记录,我认为这样的主张“应该有一种-最好只有一种-显而易见的方式”。是假的。它假设许多技术娴熟的人都会做出这样的假设,而每个人的想法都是相同的。对一个人显而易见的东西对另一个人不是那么明显。

我认为我提出的解决方案清晰易读,并且可以满足您的要求。我不相信它会带来任何性能上的损失-尽管我承认我可能会遗漏一些东西。但是尽管如此,对于其他开发人员来说,它可能并不明显且更可取。

for item in items:
   extant_set.add(item)

For the record, I think the assertion that “There should be one– and preferably only one –obvious way to do it.” is bogus. It makes an assumption that many technical minded people make, that everyone thinks alike. What is obvious to one person is not so obvious to another.

I would argue that my proposed solution is clearly readable, and does what you ask. I don’t believe there are any performance hits involved with it–though I admit I might be missing something. But despite all of that, it might not be obvious and preferable to another developer.