Python是否有序集?

问题:Python是否有序集?

Python有一个有序的字典。有序套呢?

Python has an ordered dictionary. What about an ordered set?


回答 0

为此,有一个有序的设置(可能的新链接)配方,可从Python 2文档中引用。无需修改即可在Py2.6或更高版本以及3.0或更高版本上运行。该接口几乎与普通集合完全相同,不同之处在于初始化应使用列表进行。

OrderedSet([1, 2, 3])

这是一个MutableSet,因此for的签名.union与set 的签名不匹配,但是由于它包含__or__类似的内容,因此可以轻松添加:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set

There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn’t match that of set, but since it includes __or__ something similar can easily be added:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set

回答 1

有序集在功能上是有序字典的特例。

字典的键是唯一的。因此,如果人们不理会有序字典中的值(例如,通过分配它们None),那么实质上就是一个有序集合。

对于Python 3.1的存在collections.OrderedDict。以下是OrderedSet的示例实现。(请注意,只有很少的方法需要定义或重写:collections.OrderedDictcollections.MutableSet。做繁重)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))

    difference = __sub__ 
    difference_update = __isub__
    intersection = __and__
    intersection_update = __iand__
    issubset = __le__
    issuperset = __ge__
    symmetric_difference = __xor__
    symmetric_difference_update = __ixor__
    union = __or__

An ordered set is functionally a special case of an ordered dictionary.

The keys of a dictionary are unique. Thus, if one disregards the values in an ordered dictionary (e.g. by assigning them None), then one has essentially an ordered set.

As of Python 3.1 there is collections.OrderedDict. The following is an example implementation of an OrderedSet. (Note that only few methods need to be defined or overridden: collections.OrderedDict and collections.MutableSet do the heavy lifting.)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))

    difference = __sub__ 
    difference_update = __isub__
    intersection = __and__
    intersection_update = __iand__
    issubset = __le__
    issuperset = __ge__
    symmetric_difference = __xor__
    symmetric_difference_update = __ixor__
    union = __or__

回答 2

答案是否定的,但是您可以collections.OrderedDict在Python标准库中仅使用键(和None)作为同一目的。

更新:从Python 3.7(和CPython 3.6)开始,标准dict可以保证保持顺序,并且比更具性能OrderedDict。(但是,为了向后兼容,尤其是为了可读性,您可能希望继续使用OrderedDict。)

这是一个示例,该示例说明如何dict在保留订单的同时用作有序集来过滤出重复项,从而模拟有序集。使用dictclass方法fromkeys()创建字典,然后简单地要求它keys()

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here’s an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']

回答 3

我可以做得比OrderedSet更好:bollton具有2/3兼容的纯Python IndexedSet类型,该类型不仅是有序集合,而且还支持索引编制(与列表一样)。

简单地pip install boltons(或复制setutils.py到您的代码库中),导入IndexedSet和:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

一切都是独一无二的,并保持秩序。全面披露:我写了IndexedSet,但是这也意味着如果有任何问题,您可以给我个麻烦。:)

I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).

Simply pip install boltons (or copy setutils.py into your codebase), import the IndexedSet and:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet, but that also means you can bug me if there are any issues. :)


回答 4

在PyPI上的实现

尽管其他人指出,Python中还没有内置的插入顺序保留集实现(但是),但我感到这个问题缺少一个答案,该答案指出了在PyPI上可以找到的内容。

有软件包:

其中一些实现是基于Raymond Hettinger提交给ActiveState配方的,在其他答案中也提到了该配方

一些差异

  • 有序集(1.1版)
    • 优势:O(1)用于按索引查找(例如my_set[5]
  • oset(0.1.3版)
    • 优势:O(1)为 remove(item)
    • 缺点:显然是O(n)用于按索引查找

这两个实现都为add(item)__contains__(item)item in my_set)具有O(1 )。

Implementations on PyPI

While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.

There are the packages:

Some of these implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here.

Some differences

  • ordered-set (version 1.1)
    • advantage: O(1) for lookups by index (e.g. my_set[5])
  • oset (version 0.1.3)
    • advantage: O(1) for remove(item)
    • disadvantage: apparently O(n) for lookups by index

Both implementations have O(1) for add(item) and __contains__(item) (item in my_set).


回答 5

如果您使用排序集来维护排序顺序,请考虑使用PyPI中的排序集实现。该sortedcontainers模块提供了一个SortedSet的只是这个目的。一些好处:纯Python,快速C实现,100%单元测试覆盖率,数小时的压力测试。

通过pip从PyPI安装很容易:

pip install sortedcontainers

请注意,如果不能pip install,则只需从开源存储库中下拉sortedlist.py和sortedset.py文件。

安装完成后,您可以:

from sortedcontainers import SortedSet
help(SortedSet)

sortedcontainers模块还与几种替代实现保持性能比较

对于询问Python的bag数据类型的评论,还有一种SortedList数据类型,可用于有效地实现bag。

If you’re using the ordered set to maintain a sorted order, consider using a sorted set implementation from PyPI. The sortedcontainers module provides a SortedSet for just this purpose. Some benefits: pure-Python, fast-as-C implementations, 100% unit test coverage, hours of stress testing.

Installing from PyPI is easy with pip:

pip install sortedcontainers

Note that if you can’t pip install, simply pull down the sortedlist.py and sortedset.py files from the open-source repository.

Once installed you can simply:

from sortedcontainers import SortedSet
help(SortedSet)

The sortedcontainers module also maintains a performance comparison with several alternative implementations.

For the comment that asked about Python’s bag data type, there’s alternatively a SortedList data type which can be used to efficiently implement a bag.


回答 6

如果您已经在代码中使用了pandas,则其Index对象的行为就很像有序集,如本文中所示。

文章中的示例:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA & indB  # intersection
indA | indB  # union
indA - indB  # difference
indA ^ indB  # symmetric difference

In case you’re already using pandas in your code, its Index object behaves pretty like an ordered set, as shown in this article.

Examples from the article:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA & indB  # intersection
indA | indB  # union
indA - indB  # difference
indA ^ indB  # symmetric difference

回答 7

太迟了一点,但我已经写了一类setlist作为一部分collections-extended完全实现双方SequenceSet

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub:https : //github.com/mlenzen/collections-extended

文档:http : //collections-extended.lenzm.net/en/latest/

PyPI:https://pypi.python.org/pypi/collections-extended

A little late to the game, but I’ve written a class setlist as part of collections-extended that fully implements both Sequence and Set

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub: https://github.com/mlenzen/collections-extended

Documentation: http://collections-extended.lenzm.net/en/latest/

PyPI: https://pypi.python.org/pypi/collections-extended


回答 8

OrderedSet官方图书馆没有。我为所有数据结构制作了详尽的备忘单,以供您参考。

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}

There’s no OrderedSet in official library. I make an exhaustive cheatsheet of all the data structure for your reference.

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}

回答 9

所述ParallelRegression包提供了SETLIST()有序集类,它是多个方法完成比基于ActiveState的配方的选项。它支持可用于列表的所有方法以及大多数(如果不是全部)可用于集合的方法。

The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.


回答 10

正如其他答案所提到的,对于python 3.7+,该字典按定义排序。除了子类化之外,OrderedDict我们还可以子类化abc.collections.MutableSettyping.MutableSet使用dict的键存储值。

class OrderedSet(typing.MutableSet[T]):
    """A set that preserves insertion order by internally using a dict."""

    def __init__(self, iterable: t.Iterator[T]):
        self._d = dict.fromkeys(iterable)

    def add(self, x: T) -> None:
        self._d[x] = None

    def discard(self, x: T) -> None:
        self._d.pop(x)

    def __contains__(self, x: object) -> bool:
        return self._d.__contains__(x)

    def __len__(self) -> int:
        return self._d.__len__()

    def __iter__(self) -> t.Iterator[T]:
        return self._d.__iter__()

然后:

x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]

我将此代码放在一个小的库中,所以任何人都可以pip install

As other answers mention, as for python 3.7+, the dict is ordered by definition. Instead of subclassing OrderedDict we can subclass abc.collections.MutableSet or typing.MutableSet using the dict’s keys to store our values.

class OrderedSet(typing.MutableSet[T]):
    """A set that preserves insertion order by internally using a dict."""

    def __init__(self, iterable: t.Iterator[T]):
        self._d = dict.fromkeys(iterable)

    def add(self, x: T) -> None:
        self._d[x] = None

    def discard(self, x: T) -> None:
        self._d.pop(x)

    def __contains__(self, x: object) -> bool:
        return self._d.__contains__(x)

    def __len__(self) -> int:
        return self._d.__len__()

    def __iter__(self) -> t.Iterator[T]:
        return self._d.__iter__()

Then just:

x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]

I put this code in a small library, so anyone can just pip install it.


回答 11

对于许多目的,只需调用sorted就足够了。例如

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

如果要重复使用此功能,则调用sorted函数会产生开销,因此,只要您完成更改集合的操作,就可能希望保存结果列表。如果您需要维护唯一元素并进行排序,那么我建议您使用具有任意值(例如无)的集合中的OrderedDict。

For many purposes simply calling sorted will suffice. For example

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

If you are going to use this repeatedly, there will be overhead incurred by calling the sorted function so you might want to save the resulting list, as long as you’re done changing the set. If you need to maintain unique elements and sorted, I agree with the suggestion of using OrderedDict from collections with an arbitrary value such as None.


回答 12

因此,我还有一个小清单,很明显可以引入非唯一值。

我搜索了某种唯一列表的存在,但是后来意识到在添加元素之前测试元素的存在就可以了。

if(not new_element in my_list):
    my_list.append(new_element)

我不知道这种简单方法是否有警告,但可以解决我的问题。

So i also had a small list where i clearly had the possibility of introducing non-unique values.

I searched for the existence of a unique list of some sort, but then realized that testing the existence of the element before adding it works just fine.

if(not new_element in my_list):
    my_list.append(new_element)

I don’t know if there are caveats to this simple approach, but it solves my problem.