标签归档:set

将值附加到Python中的集合

问题:将值附加到Python中的集合

我有这样的一套:

keep = set(generic_drugs_mapping[drug] for drug in drug_input)

如何将值添加[0,1,2,3,4,5,6,7,8,9,10]到此集中?

I have a set like this:

keep = set(generic_drugs_mapping[drug] for drug in drug_input)

How do I add values [0,1,2,3,4,5,6,7,8,9,10] into this set?


回答 0

keep.update(yoursequenceofvalues)

例如,keep.update(xrange(11))对于您的特定示例。或者,如果由于某些其他原因必须在循环中生成值,

for ...whatever...:
  onemorevalue = ...whatever...
  keep.add(onemorevalue)

但是,当然.update,在其他可行的情况下,只需一次调用即可批量执行此操作更快,更方便。

keep.update(yoursequenceofvalues)

e.g, keep.update(xrange(11)) for your specific example. Or, if you have to produce the values in a loop for some other reason,

for ...whatever...:
  onemorevalue = ...whatever...
  keep.add(onemorevalue)

But, of course, doing it in bulk with a single .update call is faster and handier, when otherwise feasible.


回答 1

定义集合

a = set()

使用添加附加单个值

a.add(1)
a.add(2)

使用update从元组,集合,列表或冻结集合中添加元素

a.update([3,4])

>> print(a)
{1, 2, 3, 4}

如果要添加元组或冻结集本身,请使用add

a.add((5, 6))

>> print(a)
{1, 2, 3, 4, (5, 6)}

注意:由于集合元素必须是可哈希的,并且列表被认为是可变的,因此不能将列表添加到集合中。您也不能将其他集合添加到集合中。但是,您可以从列表和集合中添加元素,如“ .update”方法所示。

Define set

a = set()

Use add to append single values

a.add(1)
a.add(2)

Use update to add elements from tuples, sets, lists or frozen-sets

a.update([3,4])

>> print(a)
{1, 2, 3, 4}

If you want to add a tuple or frozen-set itself, use add

a.add((5, 6))

>> print(a)
{1, 2, 3, 4, (5, 6)}

Note: Since set elements must be hashable, and lists are considered mutable, you cannot add a list to a set. You also cannot add other sets to a set. You can however, add the elements from lists and sets as demonstrated with the “.update” method.


回答 2

您还可以使用|运算符来连接两个集合(集合理论中的集):

>>> my_set = {1}
>>> my_set = my_set | {2}
>>> my_set
{1, 2}

或更短的形式使用|=

>>> my_set = {1}
>>> my_set |= {2}
>>> my_set
{1, 2}

注意:在Python 2.7之前的版本中,请使用set([...])代替{...}

You can also use the | operator to concatenate two sets (union in set theory):

>>> my_set = {1}
>>> my_set = my_set | {2}
>>> my_set
{1, 2}

Or a shorter form using |=:

>>> my_set = {1}
>>> my_set |= {2}
>>> my_set
{1, 2}

Note: In versions prior to Python 2.7, use set([...]) instead of {...}.


回答 3

update像这样使用:

keep.update(newvalues)

Use update like this:

keep.update(newvalues)

回答 4

这个问题是第一个在Google查询“ Python如何添加要设置的元素”时显示在Google上的问题,因此,值得注意的是,如果您想将整个字符串添加到集合中,则应添加.add(),不.update()

假设您有一个foo_str内容为的字符串'this is a sentence',并且有一些bar_set等于set()

如果这样做 bar_set.update(foo_str),您设置的内容将是{'t', 'a', ' ', 'e', 's', 'n', 'h', 'c', 'i'}

如果这样做bar_set.add(foo_str),您设置的内容将是{'this is a sentence'}

This question is the first one that shows up on Google when one looks up “Python how to add elements to set”, so it’s worth noting explicitly that, if you want to add a whole string to a set, it should be added with .add(), not .update().

Say you have a string foo_str whose contents are 'this is a sentence', and you have some set bar_set equal to set().

If you do bar_set.update(foo_str), the contents of your set will be {'t', 'a', ' ', 'e', 's', 'n', 'h', 'c', 'i'}.

If you do bar_set.add(foo_str), the contents of your set will be {'this is a sentence'}.


回答 5

我喜欢这样做的方法是将原始集合和想要添加的值都转换为列表,添加它们,然后将它们转换回集合,如下所示:

setMenu = {"Eggs", "Bacon"}
print(setMenu)
> {'Bacon', 'Eggs'}
setMenu = set(list(setMenu) + list({"Spam"}))
print(setMenu)
> {'Bacon', 'Spam', 'Eggs'}
setAdditions = {"Lobster", "Sausage"}
setMenu = set(list(setMenu) + list(setAdditions))
print(setMenu)
> {'Lobster', 'Spam', 'Eggs', 'Sausage', 'Bacon'}

这样,我也可以使用相同的逻辑轻松添加多个集合,TypeError: unhashable type: 'set'如果尝试使用该.update()方法,则会得到一个提示。

The way I like to do this is to convert both the original set and the values I’d like to add into lists, add them, and then convert them back into a set, like this:

setMenu = {"Eggs", "Bacon"}
print(setMenu)
> {'Bacon', 'Eggs'}
setMenu = set(list(setMenu) + list({"Spam"}))
print(setMenu)
> {'Bacon', 'Spam', 'Eggs'}
setAdditions = {"Lobster", "Sausage"}
setMenu = set(list(setMenu) + list(setAdditions))
print(setMenu)
> {'Lobster', 'Spam', 'Eggs', 'Sausage', 'Bacon'}

This way I can also easily add multiple sets using the same logic, which gets me an TypeError: unhashable type: 'set' if I try doing it with the .update() method.


回答 6

keep.update((0,1,2,3,4,5,6,7,8,9,10))

要么

keep.update(np.arange(11))
keep.update((0,1,2,3,4,5,6,7,8,9,10))

Or

keep.update(np.arange(11))

回答 7

对我来说,在Python 3中,它的工作方式很简单:

keep = keep.union((0,1,2,3,4,5,6,7,8,9,10))

我不知道这是否正确…

For me, in Python 3, it’s working simply in this way:

keep = keep.union((0,1,2,3,4,5,6,7,8,9,10))

I don’t know if it may be correct…


Python是否有序集?

问题:Python是否有序集?

Python有一个有序的字典。有序套呢?

Python has an ordered dictionary. What about an ordered set?


回答 0

为此,有一个有序的设置(可能的新链接)配方,可从Python 2文档中引用。无需修改即可在Py2.6或更高版本以及3.0或更高版本上运行。该接口几乎与普通集合完全相同,不同之处在于初始化应使用列表进行。

OrderedSet([1, 2, 3])

这是一个MutableSet,因此for的签名.union与set 的签名不匹配,但是由于它包含__or__类似的内容,因此可以轻松添加:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set

There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn’t match that of set, but since it includes __or__ something similar can easily be added:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set

回答 1

有序集在功能上是有序字典的特例。

字典的键是唯一的。因此,如果人们不理会有序字典中的值(例如,通过分配它们None),那么实质上就是一个有序集合。

对于Python 3.1的存在collections.OrderedDict。以下是OrderedSet的示例实现。(请注意,只有很少的方法需要定义或重写:collections.OrderedDictcollections.MutableSet。做繁重)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))

    difference = __sub__ 
    difference_update = __isub__
    intersection = __and__
    intersection_update = __iand__
    issubset = __le__
    issuperset = __ge__
    symmetric_difference = __xor__
    symmetric_difference_update = __ixor__
    union = __or__

An ordered set is functionally a special case of an ordered dictionary.

The keys of a dictionary are unique. Thus, if one disregards the values in an ordered dictionary (e.g. by assigning them None), then one has essentially an ordered set.

As of Python 3.1 there is collections.OrderedDict. The following is an example implementation of an OrderedSet. (Note that only few methods need to be defined or overridden: collections.OrderedDict and collections.MutableSet do the heavy lifting.)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))

    difference = __sub__ 
    difference_update = __isub__
    intersection = __and__
    intersection_update = __iand__
    issubset = __le__
    issuperset = __ge__
    symmetric_difference = __xor__
    symmetric_difference_update = __ixor__
    union = __or__

回答 2

答案是否定的,但是您可以collections.OrderedDict在Python标准库中仅使用键(和None)作为同一目的。

更新:从Python 3.7(和CPython 3.6)开始,标准dict可以保证保持顺序,并且比更具性能OrderedDict。(但是,为了向后兼容,尤其是为了可读性,您可能希望继续使用OrderedDict。)

这是一个示例,该示例说明如何dict在保留订单的同时用作有序集来过滤出重复项,从而模拟有序集。使用dictclass方法fromkeys()创建字典,然后简单地要求它keys()

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here’s an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']

回答 3

我可以做得比OrderedSet更好:bollton具有2/3兼容的纯Python IndexedSet类型,该类型不仅是有序集合,而且还支持索引编制(与列表一样)。

简单地pip install boltons(或复制setutils.py到您的代码库中),导入IndexedSet和:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

一切都是独一无二的,并保持秩序。全面披露:我写了IndexedSet,但是这也意味着如果有任何问题,您可以给我个麻烦。:)

I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).

Simply pip install boltons (or copy setutils.py into your codebase), import the IndexedSet and:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet, but that also means you can bug me if there are any issues. :)


回答 4

在PyPI上的实现

尽管其他人指出,Python中还没有内置的插入顺序保留集实现(但是),但我感到这个问题缺少一个答案,该答案指出了在PyPI上可以找到的内容。

有软件包:

其中一些实现是基于Raymond Hettinger提交给ActiveState配方的,在其他答案中也提到了该配方

一些差异

  • 有序集(1.1版)
    • 优势:O(1)用于按索引查找(例如my_set[5]
  • oset(0.1.3版)
    • 优势:O(1)为 remove(item)
    • 缺点:显然是O(n)用于按索引查找

这两个实现都为add(item)__contains__(item)item in my_set)具有O(1 )。

Implementations on PyPI

While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.

There are the packages:

Some of these implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here.

Some differences

  • ordered-set (version 1.1)
    • advantage: O(1) for lookups by index (e.g. my_set[5])
  • oset (version 0.1.3)
    • advantage: O(1) for remove(item)
    • disadvantage: apparently O(n) for lookups by index

Both implementations have O(1) for add(item) and __contains__(item) (item in my_set).


回答 5

如果您使用排序集来维护排序顺序,请考虑使用PyPI中的排序集实现。该sortedcontainers模块提供了一个SortedSet的只是这个目的。一些好处:纯Python,快速C实现,100%单元测试覆盖率,数小时的压力测试。

通过pip从PyPI安装很容易:

pip install sortedcontainers

请注意,如果不能pip install,则只需从开源存储库中下拉sortedlist.py和sortedset.py文件。

安装完成后,您可以:

from sortedcontainers import SortedSet
help(SortedSet)

sortedcontainers模块还与几种替代实现保持性能比较

对于询问Python的bag数据类型的评论,还有一种SortedList数据类型,可用于有效地实现bag。

If you’re using the ordered set to maintain a sorted order, consider using a sorted set implementation from PyPI. The sortedcontainers module provides a SortedSet for just this purpose. Some benefits: pure-Python, fast-as-C implementations, 100% unit test coverage, hours of stress testing.

Installing from PyPI is easy with pip:

pip install sortedcontainers

Note that if you can’t pip install, simply pull down the sortedlist.py and sortedset.py files from the open-source repository.

Once installed you can simply:

from sortedcontainers import SortedSet
help(SortedSet)

The sortedcontainers module also maintains a performance comparison with several alternative implementations.

For the comment that asked about Python’s bag data type, there’s alternatively a SortedList data type which can be used to efficiently implement a bag.


回答 6

如果您已经在代码中使用了pandas,则其Index对象的行为就很像有序集,如本文中所示。

文章中的示例:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA & indB  # intersection
indA | indB  # union
indA - indB  # difference
indA ^ indB  # symmetric difference

In case you’re already using pandas in your code, its Index object behaves pretty like an ordered set, as shown in this article.

Examples from the article:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA & indB  # intersection
indA | indB  # union
indA - indB  # difference
indA ^ indB  # symmetric difference

回答 7

太迟了一点,但我已经写了一类setlist作为一部分collections-extended完全实现双方SequenceSet

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub:https : //github.com/mlenzen/collections-extended

文档:http : //collections-extended.lenzm.net/en/latest/

PyPI:https://pypi.python.org/pypi/collections-extended

A little late to the game, but I’ve written a class setlist as part of collections-extended that fully implements both Sequence and Set

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub: https://github.com/mlenzen/collections-extended

Documentation: http://collections-extended.lenzm.net/en/latest/

PyPI: https://pypi.python.org/pypi/collections-extended


回答 8

OrderedSet官方图书馆没有。我为所有数据结构制作了详尽的备忘单,以供您参考。

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}

There’s no OrderedSet in official library. I make an exhaustive cheatsheet of all the data structure for your reference.

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}

回答 9

所述ParallelRegression包提供了SETLIST()有序集类,它是多个方法完成比基于ActiveState的配方的选项。它支持可用于列表的所有方法以及大多数(如果不是全部)可用于集合的方法。

The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.


回答 10

正如其他答案所提到的,对于python 3.7+,该字典按定义排序。除了子类化之外,OrderedDict我们还可以子类化abc.collections.MutableSettyping.MutableSet使用dict的键存储值。

class OrderedSet(typing.MutableSet[T]):
    """A set that preserves insertion order by internally using a dict."""

    def __init__(self, iterable: t.Iterator[T]):
        self._d = dict.fromkeys(iterable)

    def add(self, x: T) -> None:
        self._d[x] = None

    def discard(self, x: T) -> None:
        self._d.pop(x)

    def __contains__(self, x: object) -> bool:
        return self._d.__contains__(x)

    def __len__(self) -> int:
        return self._d.__len__()

    def __iter__(self) -> t.Iterator[T]:
        return self._d.__iter__()

然后:

x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]

我将此代码放在一个小的库中,所以任何人都可以pip install

As other answers mention, as for python 3.7+, the dict is ordered by definition. Instead of subclassing OrderedDict we can subclass abc.collections.MutableSet or typing.MutableSet using the dict’s keys to store our values.

class OrderedSet(typing.MutableSet[T]):
    """A set that preserves insertion order by internally using a dict."""

    def __init__(self, iterable: t.Iterator[T]):
        self._d = dict.fromkeys(iterable)

    def add(self, x: T) -> None:
        self._d[x] = None

    def discard(self, x: T) -> None:
        self._d.pop(x)

    def __contains__(self, x: object) -> bool:
        return self._d.__contains__(x)

    def __len__(self) -> int:
        return self._d.__len__()

    def __iter__(self) -> t.Iterator[T]:
        return self._d.__iter__()

Then just:

x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]

I put this code in a small library, so anyone can just pip install it.


回答 11

对于许多目的,只需调用sorted就足够了。例如

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

如果要重复使用此功能,则调用sorted函数会产生开销,因此,只要您完成更改集合的操作,就可能希望保存结果列表。如果您需要维护唯一元素并进行排序,那么我建议您使用具有任意值(例如无)的集合中的OrderedDict。

For many purposes simply calling sorted will suffice. For example

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

If you are going to use this repeatedly, there will be overhead incurred by calling the sorted function so you might want to save the resulting list, as long as you’re done changing the set. If you need to maintain unique elements and sorted, I agree with the suggestion of using OrderedDict from collections with an arbitrary value such as None.


回答 12

因此,我还有一个小清单,很明显可以引入非唯一值。

我搜索了某种唯一列表的存在,但是后来意识到在添加元素之前测试元素的存在就可以了。

if(not new_element in my_list):
    my_list.append(new_element)

我不知道这种简单方法是否有警告,但可以解决我的问题。

So i also had a small list where i clearly had the possibility of introducing non-unique values.

I searched for the existence of a unique list of some sort, but then realized that testing the existence of the element before adding it works just fine.

if(not new_element in my_list):
    my_list.append(new_element)

I don’t know if there are caveats to this simple approach, but it solves my problem.


空集文字?

问题:空集文字?

[] =空 list

() =空 tuple

{} =空 dict

空有类似的记号set吗?还是我必须写set()

[] = empty list

() = empty tuple

{} = empty dict

Is there a similar notation for an empty set? Or do I have to write set()?


回答 0

不,空集没有文字语法。你必须写set()

No, there’s no literal syntax for the empty set. You have to write set().


回答 1

一定请使用 set()创建一个空集。

但是,如果您想打动别人,请告诉他们您可以使用文字和*Python> = 3.5(请参阅PEP 448)创建一个空集,方法是:

>>> s = {*()}  # or {*{}} or {*[]}
>>> print(s)
set()

这基本上是一种更简洁的方法{_ for _ in ()},但是,请不要这样做。

By all means, please use set() to create an empty set.

But, if you want to impress people, tell them that you can create an empty set using literals and * with Python >= 3.5 (see PEP 448) by doing:

>>> s = {*()}  # or {*{}} or {*[]}
>>> print(s)
set()

this is basically a more condensed way of doing {_ for _ in ()}, but, don’t do this.


回答 2

只是为了扩展公认的答案:

从version 2.73.1python起,set文字已经{}以用法的形式出现了{1,2,3},但是{}它本身仍然用于空字典。

Python 2.7(第一行在Python <2.7中无效)

>>> {1,2,3}.__class__
<type 'set'>
>>> {}.__class__
<type 'dict'>

Python 3.x

>>> {1,4,5}.__class__
<class 'set'>
>>> {}.__class__
<type 'dict'>

此处更多内容:https//docs.python.org/3/whatsnew/2.7.html#other-language-changes

Just to extend the accepted answer:

From version 2.7 and 3.1 python has got set literal {} in form of usage {1,2,3}, but {} itself still used for empty dict.

Python 2.7 (first line is invalid in Python <2.7)

>>> {1,2,3}.__class__
<type 'set'>
>>> {}.__class__
<type 'dict'>

Python 3.x

>>> {1,4,5}.__class__
<class 'set'>
>>> {}.__class__
<type 'dict'>

More here: https://docs.python.org/3/whatsnew/2.7.html#other-language-changes


回答 3

这取决于您是否要使用文字进行比较或赋值。

如果要将现有集设为空,则可以使用该.clear()方法,尤其是在要避免创建新对象的情况下。如果要进行比较,请使用set()或检查长度是否为0。

例:

#create a new set    
a=set([1,2,3,'foo','bar'])
#or, using a literal:
a={1,2,3,'foo','bar'}

#create an empty set
a=set()
#or, use the clear method
a.clear()

#comparison to a new blank set
if a==set():
    #do something

#length-checking comparison
if len(a)==0:
    #do something

It depends on if you want the literal for a comparison, or for assignment.

If you want to make an existing set empty, you can use the .clear() metod, especially if you want to avoid creating a new object. If you want to do a comparison, use set() or check if the length is 0.

example:

#create a new set    
a=set([1,2,3,'foo','bar'])
#or, using a literal:
a={1,2,3,'foo','bar'}

#create an empty set
a=set()
#or, use the clear method
a.clear()

#comparison to a new blank set
if a==set():
    #do something

#length-checking comparison
if len(a)==0:
    #do something

回答 4

更加疯狂的想法是:在Python 3接受Unicode标识符的情况下,您可以声明一个变量ϕ = frozenset()(ϕ为U + 03D5)并使用它。

Adding to the crazy ideas: with Python 3 accepting unicode identifiers, you could declare a variable ϕ = frozenset() (ϕ is U+03D5) and use it instead.


回答 5

是。适用于非空dict / set的相同表示法适用于空dict / set。

注意非空dictset文字之间的区别:

{1: 'a', 2: 'b', 3: 'c'}-一个数字键-值对的内部使得dict
{'aaa', 'bbb', 'ccc'}-一个元组值的内部使一个set

所以:

{}==零个键值对==空dict
{*()}==空值元组==空set

但是事实是您可以做到,但这并不意味着您应该这样做。除非您有很强的理由,否则最好显式构造一个空集,例如:

a = set()

注意:正如评论中注意到的那样,{()}不是一个空集合。这是一个包含1个元素的集合:空元组。

Yes. The same notation that works for non-empty dict/set works for empty ones.

Notice the difference between non-empty dict and set literals:

{1: 'a', 2: 'b', 3: 'c'} — a number of key-value pairs inside makes a dict
{'aaa', 'bbb', 'ccc'} — a tuple of values inside makes a set

So:

{} == zero number of key-value pairs == empty dict
{*()} == empty tuple of values == empty set

However the fact, that you can do it, doesn’t mean you should. Unless you have some strong reasons, it’s better to construct an empty set explicitly, like:

a = set()

NB: As ctrueden noticed in comments, {()} is not an empty set. It’s a set with 1 element: empty tuple.


回答 6

有几种方法可以在Python中创建空Set:

  1. 使用 set()方法
    这是python中的内置方法,可在该变量中创建Empty set。
  2. 使用clear()方法(创造性的工程师技术LOL)
    请参见以下示例:

    sets = {“ Hi”,“ How”,“ are”,“ You”,“ All”}
    类型(集合)(此行输出:set)
    集合.clear()
    print(sets)(此行输出:{})
    type(sets)(此行输出:set)

因此,这是创建空Set的2种方法。

There are few ways to create empty Set in Python :

  1. Using set() method
    This is the built-in method in python that creates Empty set in that variable.
  2. Using clear() method (creative Engineer Technique LOL)
    See this Example:

    sets={“Hi”,”How”,”are”,”You”,”All”}
    type(sets)  (This Line Output : set)
    sets.clear()
    print(sets)  (This Line Output : {})
    type(sets)  (This Line Output : set)

So, This are 2 ways to create empty Set.


如何从集合中检索元素而不删除它?

问题:如何从集合中检索元素而不删除它?

假设以下内容:

>>> s = set([1, 2, 3])

如何获得的值(任意值)出来s而不做s.pop()?我想将该项目保留在集合中,直到我确定可以删除它为止-只有在异步调用另一个主机后才能确定。

快速又肮脏:

>>> elem = s.pop()
>>> s.add(elem)

但是您知道更好的方法吗?理想的情况是恒定时间。

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it – something I can only be sure of after an asynchronous call to another host.

Quick and dirty:

>>> elem = s.pop()
>>> s.add(elem)

But do you know of a better way? Ideally in constant time.


回答 0

不需要复制整个集合的两个选项:

for e in s:
    break
# e is now an element from s

要么…

e = next(iter(s))

但是总的来说,集合不支持索引或切片。

Two options that don’t require copying the whole set:

for e in s:
    break
# e is now an element from s

Or…

e = next(iter(s))

But in general, sets don’t support indexing or slicing.


回答 1

最少的代码是:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

显然,这将创建一个新列表,其中包含该集合的每个成员,因此如果您的集合很大,就不会很大。

Least code would be:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

Obviously this would create a new list which contains each member of the set, so not great if your set is very large.


回答 2

我想知道这些功能如何在不同的集合上执行,所以我做了一个基准测试:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

在此处输入图片说明

该图清楚地表明,一些方法(RandomSampleSetUnpackingListIndex)依赖于集的大小,并应在一般情况下可避免(至少在性能可能是重要的)。正如其他答案已经显示的那样,最快的方法是ForLoop

但是,只要使用恒定时间方法之一,性能差异就可以忽略不计。


iteration_utilities(免责声明:我是作者)包含此用例的便捷功能first

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

我也将其包含在上述基准中。它可以与其他两个“快速”解决方案竞争,但是两者之间的差异并不大。

I wondered how the functions will perform for different sets, so I did a benchmark:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

enter image description here

This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.

However as long as one of the constant time approaches is used the performance difference will be negligible.


iteration_utilities (Disclaimer: I’m the author) contains a convenience function for this use-case: first:

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

I also included it in the benchmark above. It can compete with the other two “fast” solutions but the difference isn’t much either way.


回答 3

tl; dr

for first_item in muh_set: break仍然是Python 3.x中的最佳方法。诅咒你,圭多。

do这样做

欢迎使用从wr推断出来的另一组Python 3.x计时出色的特定于Python 2.x的响应。与AChampion同样有用的特定于Python 3.x的响应不同,下面的时间安排建议了上面提到的时间异常解决方案,包括:

欢乐代码段

开启,收听,定时:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

快速过时的计时

看哪!按最快到最慢的片段排序:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

全家人的面部植物

毫不奇怪,手动迭代的速度至少是下一个最快解决方案的两倍。尽管与Bad Old Python 2.x相比(以前的手动迭代速度至少快四倍),差距有所减小,但PPE 20狂热者对我而言最冗长的解决方案是最好的解决方案感到失望。至少将一个集转换为一个列表以仅提取该集的第一个元素是预期的那样可怕。感谢Guido,愿他的光芒继续指导我们。

令人惊讶的是,基于RNG的解决方案绝对可怕。列表转换不好,但random 确实会带来糟糕的结果。对于随机数上帝来说如此之多。

我只是希望那些无定形set.get_first()的人已经为我们准备了一种方法。如果您正在阅读本文,他们:“请。做点什么。”

tl;dr

for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.

y u do this

Welcome to yet another set of Python 3.x timings, extrapolated from wr.‘s excellent Python 2.x-specific response. Unlike AChampion‘s equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:

Code Snippets for Great Joy

Turn on, tune in, time it:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Quickly Obsoleted Timeless Timings

Behold! Ordered by fastest to slowest snippets:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

Faceplants for the Whole Family

Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.

Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.

I just wish the amorphous They would PEP up a set.get_first() method for us already. If you’re reading this, They: “Please. Do something.”


回答 4

为了提供不同方法背后的一些时序图,请考虑以下代码。 get()是我自定义添加到Python的setobject.c中,只是一个pop()而没有删除该元素。

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

输出为:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

这意味着for / break解决方案是最快的(有时比自定义get()解决方案更快)。

To provide some timing figures behind the different approaches, consider the following code. The get() is my custom addition to Python’s setobject.c, being just a pop() without removing the element.

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

The output is:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).


回答 5

由于您需要一个随机元素,因此也可以使用:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

该文档似乎并未提及的性能random.sample。从包含大量列表和大量集合的非常快速的实证检验来看,列表似乎是恒定时间,但集合却并非如此。同样,对集合的迭代也不是随机的。顺序不确定,但可以预测:

>>> list(set(range(10))) == range(10)
True 

如果随机性很重要,并且您需要在恒定时间内(大量集合)使用一堆元素,那么我会先使用random.sample并转换为列表:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

Since you want a random element, this will also work:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

The documentation doesn’t seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn’t random; the order is undefined but predictable:

>>> list(set(range(10))) == range(10)
True 

If randomness is important and you need a bunch of elements in constant time (large sets), I’d use random.sample and convert to a list first:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

回答 6

似乎是最紧凑的(6个符号),但是获得集合元素的方法很慢(由PEP 3132实现):

e,*_=s

在Python 3.5+中,您还可以使用以下7符号表达式(感谢PEP 448):

[*s][0]

这两种选择在我的机器上都比for循环方法慢大约1000倍。

Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):

e,*_=s

With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):

[*s][0]

Both options are roughly 1000 times slower on my machine than the for-loop method.


回答 7

我使用编写的实用程序函数。它的名称有点误导,因为它暗示它可能是随机物品或类似物品。

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

回答 8

在@wr之后。帖子,我得到类似的结果(对于Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

输出:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

但是,在更改基础集(例如,调用remove())时,对于可迭代的示例(foriter)来说效果很差:

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

结果是:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

Following @wr. post, I get similar results (for Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Output:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Results in:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

回答 9

我通常对小型馆藏所做的是创建这样的解析器/转换器方法

def convertSetToList(setName):
return list(setName)

然后我可以使用新列表并按索引号访问

userFields = convertSetToList(user)
name = request.json[userFields[0]]

作为列表,您将拥有可能需要使用的所有其他方法

What I usually do for small collections is to create kind of parser/converter method like this

def convertSetToList(setName):
return list(setName)

Then I can use the new list and access by index number

userFields = convertSetToList(user)
name = request.json[userFields[0]]

As a list you will have all the other methods that you may need to work with


回答 10

怎么s.copy().pop()样 我尚未计时,但它应该可以正常工作,而且很简单。但是,它适用于小型集,因为它可以复制整个集。

How about s.copy().pop()? I haven’t timed it, but it should work and it’s simple. It works best for small sets however, as it copies the whole set.


回答 11

另一种选择是将字典与不需要的值一起使用。例如,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

您可以将键视为一组,但它们只是一个数组:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

这种选择的副作用是您的代码将与较旧set的Python 早期版本向后兼容。这也许不是最佳答案,但这是另一种选择。

编辑:您甚至可以执行以下操作来隐藏使用字典而不是数组或集合的事实:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

Another option is to use a dictionary with values you don’t care about. E.g.,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

You can treat the keys as a set except that they’re just an array:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It’s maybe not the best answer but it’s another option.

Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

获得两个列表之间的差异

问题:获得两个列表之间的差异

我在Python中有两个列表,如下所示:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

我需要用第一个列表中没有的项目创建第三个列表。从示例中,我必须得到:

temp3 = ['Three', 'Four']

有没有循环和检查的快速方法吗?

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with items from the first list which aren’t present in the second one. From the example I have to get:

temp3 = ['Three', 'Four']

Are there any fast ways without cycles and checking?


回答 0

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

当心

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

您可能希望/希望它等于的位置set([1, 3])。如果确实要set([1, 3])作为答案,则需要使用set([1, 2]).symmetric_difference(set([2, 3]))

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you’ll need to use set([1, 2]).symmetric_difference(set([2, 3])).


回答 1

现有解决方案均提供以下一项或多项:

  • 比O(n * m)性能快。
  • 保留输入列表的顺序。

但是到目前为止,还没有解决方案。如果两者都想要,请尝试以下操作:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

性能测试

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000)
print timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000)
print timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000)

结果:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

我介绍的方法以及保留顺序也比集合减法要快(略),因为它不需要构造不必要的集合。如果第一个列表比第二个列表长很多,并且散列很昂贵,则性能差异将更加明显。这是第二个测试,证明了这一点:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

结果:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000)
print timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000)
print timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000)

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

The method I presented as well as preserving order is also (slightly) faster than the set subtraction because it doesn’t require construction of an unnecessary set. The performance difference would be more noticable if the first list is considerably longer than the second and if hashing is expensive. Here’s a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer

回答 2

temp3 = [item for item in temp1 if item not in temp2]
temp3 = [item for item in temp1 if item not in temp2]

回答 3

可以使用以下简单函数找到两个列表(例如list1和list2)之间的差异。

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

要么

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

通过使用上述功能,可以使用diff(temp2, temp1)或找到差异diff(temp1, temp2)。两者都会给出结果['Four', 'Three']。您不必担心列表的顺序或先给出哪个列表。

Python文档参考

The difference between two lists (say list1 and list2) can be found using the following simple function.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

or

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

By Using the above function, the difference can be found using diff(temp2, temp1) or diff(temp1, temp2). Both will give the result ['Four', 'Three']. You don’t have to worry about the order of the list or which list is to be given first.

Python doc reference


回答 4

如果您需要递归的区别,我已经为python编写了一个软件包:https : //github.com/seperman/deepdiff

安装

从PyPi安装:

pip install deepdiff

用法示例

输入

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

同一对象返回空

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

项目类型已更改

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

物品的价值已经改变

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

添加和/或删除项目

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

弦差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

弦差异2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

类型变更

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

清单差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

清单差异2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

列出差异忽略顺序或重复项:(具有与上述相同的字典)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

包含字典的列表:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

套装:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

命名元组:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

自定义对象:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

对象属性添加:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

In case you want the difference recursively, I have written a package for python: https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named Tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

回答 5

可以使用python XOR运算符完成。

  • 这将删除每个列表中的重复项
  • 这将显示temp1与temp2和temp2与temp1的差异。

set(temp1) ^ set(temp2)

Can be done using python XOR operator.

  • This will remove the duplicates in each list
  • This will show difference of temp1 from temp2 and temp2 from temp1.

set(temp1) ^ set(temp2)

回答 6

最简单的方法

使用set()。difference(set())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

答案是 set([1])

可以打印为列表,

print list(set(list_a).difference(set(list_b)))

most simple way,

use set().difference(set())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

answer is set([1])

can print as a list,

print list(set(list_a).difference(set(list_b)))

回答 7

如果您真的在考虑性能,请使用numpy!

这是完整的笔记本,是github上的要点,其中包括list,numpy和pandas之间的比较。

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

在此处输入图片说明

If you are really looking into performance, then use numpy!

Here is the full notebook as a gist on github with comparison between list, numpy, and pandas.

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

enter image description here


回答 8

因为目前的解决方案都无法产生元组,所以我会抛出:

temp3 = tuple(set(temp1) - set(temp2))

或者:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
temp3 = tuple(x for x in temp1 if x not in set(temp2))

像其他非元组在该方向上产生答案一样,它保留了顺序

i’ll toss in since none of the present solutions yield a tuple:

temp3 = tuple(set(temp1) - set(temp2))

alternatively:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
temp3 = tuple(x for x in temp1 if x not in set(temp2))

Like the other non-tuple yielding answers in this direction, it preserves order


回答 9

我想要的东西,将采取两个列表,并可以做什么diffbash呢。因为当您搜索“ python diff two list”时该问题首先弹出,并且不是很具体,所以我将发布我提出的内容。

使用SequenceMatherfrom difflib可以像比较两个列表diff。其他答案都不会告诉您差异发生的位置,但是这个答案确实可以。一些答案只能在一个方向上有所不同。一些重新排列元素。有些不处理重复项。但是此解决方案为您提供了两个列表之间的真正区别:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

输出:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

当然,如果您的应用程序做出与其他答案相同的假设,则您将从中受益最大。但是,如果您正在寻找真正的diff功能,那么这是唯一的方法。

例如,其他答案都无法处理:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

但这确实做到了:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]

I wanted something that would take two lists and could do what diff in bash does. Since this question pops up first when you search for “python diff two lists” and is not very specific, I will post what I came up with.

Using SequenceMather from difflib you can compare two lists like diff does. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some don’t handle duplicates. But this solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for a true diff functionality, then this is the only way to go.

For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]

回答 10

尝试这个:

temp3 = set(temp1) - set(temp2)

Try this:

temp3 = set(temp1) - set(temp2)

回答 11

这可能比Mark的列表理解速度还要快:

list(itertools.filterfalse(set(temp2).__contains__, temp1))

this could be even faster than Mark’s list comprehension:

list(itertools.filterfalse(set(temp2).__contains__, temp1))

回答 12

这是Counter最简单情况的答案。

这比上面的双向差异短,因为它只完全满足问题的要求:生成第一个列表的列表,而不生成第二个列表。

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

另外,根据您对可读性的偏好,它可以提供不错的一线:

diff = list((Counter(lst1) - Counter(lst2)).elements())

输出:

['Three', 'Four']

请注意,list(...)如果仅在呼叫上进行迭代,则可以将其删除。

由于此解决方案使用计数器,因此与许多基于集合的答案相比,它可以正确处理数量。例如在此输入上:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

输出为:

['Two', 'Two', 'Three', 'Three', 'Four']

Here’s a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what’s in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly vs the many set-based answers. For example on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']

回答 13

如果对difflist的元素进行排序和设置,则可以使用幼稚的方法。

list1=[1,2,3,4,5]
list2=[1,2,3]

print list1[len(list2):]

或使用本机set方法:

subset=set(list1).difference(list2)

print subset

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print "Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000)
print "Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000)

天真的解决方案:0.0787101593292

本机设置解决方案:0.998837615564

You could use a naive method if the elements of the difflist are sorted and sets.

list1=[1,2,3,4,5]
list2=[1,2,3]

print list1[len(list2):]

or with native set methods:

subset=set(list1).difference(list2)

print subset

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print "Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000)
print "Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000)

Naive solution: 0.0787101593292

Native set solution: 0.998837615564


回答 14

为此我在游戏中为时不晚,但是您可以将上述某些代码的性能与此进行比较,其中两个最快的竞争者是:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

对于基本的编码我深表歉意。

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    if z == 0:
        start = time.clock()
        lists = (str(list(set(x)-set(y))+list(set(y)-set(y))))
        times = ("1 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 1:
        start = time.clock()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 2:
        start = time.clock()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 3:
        start = time.clock()
        lists = (filterfalse(set(y).__contains__, x))
        times = ("4 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 4:
        start = time.clock()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 5:
        start = time.clock()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.clock() - start))
        return (lists,times)

    else:    
        start = time.clock()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.clock() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

I am little too late in the game for this but you can do a comparison of performance of some of the above mentioned code with this, two of the fastest contenders are,

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    if z == 0:
        start = time.clock()
        lists = (str(list(set(x)-set(y))+list(set(y)-set(y))))
        times = ("1 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 1:
        start = time.clock()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 2:
        start = time.clock()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 3:
        start = time.clock()
        lists = (filterfalse(set(y).__contains__, x))
        times = ("4 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 4:
        start = time.clock()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 5:
        start = time.clock()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.clock() - start))
        return (lists,times)

    else:    
        start = time.clock()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.clock() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

回答 15

这是一些比较两个字符串列表的简单的,保留顺序的方法。

一种不寻常的方法,使用pathlib

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

假设两个列表都包含以相同的开头的字符串。有关更多详细信息,请参阅文档。注意,与设置操作相比,它并不是特别快。


使用以下方法的直接实现itertools.zip_longest

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

This assumes both lists contain strings with equivalent beginnings. See the docs for more details. Note, it is not particularly fast compared to set operations.


A straight-forward implementation using itertools.zip_longest:

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']

回答 16

这是另一个解决方案:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

This is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

回答 17

如果遇到问题TypeError: unhashable type: 'list',则需要将列表或集合转换为元组,例如

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

另请参阅如何在python中比较列表/集合的列表?

If you run into TypeError: unhashable type: 'list' you need to turn lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

See also How to compare a list of lists/sets in python?


回答 18

假设我们有两个清单

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

从上面的两个列表中可以看出,列表2中存在项目1、3、5,而项目7、9中则不存在。另一方面,列表1中存在项目1、3、5,而项目2、4中不存在。

返回包含项目7、9和2、4的新列表的最佳解决方案是什么?

上面的所有答案都找到了解决方案,现在最最佳的是什么?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

使用timeit我们可以看到结果

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, 
list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, 
list1, list2")

print('Using two for loops', t1.timeit(number=100000), 'Milliseconds')
print('Using two for loops', t2.timeit(number=100000), 'Milliseconds')

退货

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 Milliseconds
Using symmetric_difference 0.11285737506113946 Milliseconds

Process finished with exit code 0

Let’s say we have two lists

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

we can see from the above two lists that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All answers above find the solution, now whats the most optimal?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit we can see the results

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, 
list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, 
list1, list2")

print('Using two for loops', t1.timeit(number=100000), 'Milliseconds')
print('Using two for loops', t2.timeit(number=100000), 'Milliseconds')

returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 Milliseconds
Using symmetric_difference 0.11285737506113946 Milliseconds

Process finished with exit code 0

回答 19

arulmr解决方案的单行版本

def diff(listA, listB):
    return set(listA) - set(listB) | set(listA) -set(listB)

single line version of arulmr solution

def diff(listA, listB):
    return set(listA) - set(listB) | set(listA) -set(listB)

回答 20

如果您想要更像变更集的东西…可以使用Counter

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to list b, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

if you want something more like a changeset… could use Counter

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to list b, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

回答 21

我们可以计算交集减去列表的并集:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 

We can calculate intersection minus union of lists:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 

回答 22

只需一行即可解决。给定的问题是两个列表(temp1和temp2)在第三个列表(temp3)中返​​回它们的差。

temp3 = list(set(temp1).difference(set(temp2)))

This can be solved with one line. The question is given two lists (temp1 and temp2) return their difference in a third list (temp3).

temp3 = list(set(temp1).difference(set(temp2)))

回答 23

这是区分两个列表(无论内容如何)的一种简单方法,您可以得到如下所示的结果:

>>> from sets import Set
>>>
>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> Set(l1).symmetric_difference(Set(l2))
Set([False, 'xvdbd', None, 12])

希望这会有所帮助。

Here is an simple way to distinguish two lists (whatever the contents are), you can get the result as shown below :

>>> from sets import Set
>>>
>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> Set(l1).symmetric_difference(Set(l2))
Set([False, 'xvdbd', None, 12])

Hope this will helpful.


回答 24

我更喜欢使用转换为集合,然后使用“ difference()”函数。完整的代码是:

temp1 = ['One', 'Two', 'Three', 'Four'  ]                   
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

输出:

>>>print(temp3)
['Three', 'Four']

这是最容易理解的,如果将来使用大数据,将来会更容易,如果不需要重复数据,将其转换为数据集将删除重复数据。希望能帮助到你 ;-)

I prefer to use converting to sets and then using the “difference()” function. The full code is :

temp1 = ['One', 'Two', 'Three', 'Four'  ]                   
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output:

>>>print(temp3)
['Three', 'Four']

It’s the easiest to undersand, and morover in future if you work with large data, converting it to sets will remove duplicates if duplicates are not required. Hope it helps ;-)


回答 25

(list(set(a)-set(b))+list(set(b)-set(a)))
(list(set(a)-set(b))+list(set(b)-set(a)))

回答 26

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

例如,如果list1 = [10, 15, 20, 25, 30, 35, 40]list2 = [25, 40, 35]则返回的列表将是output = [10, 20, 30, 15]

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

e.g. if list1 = [10, 15, 20, 25, 30, 35, 40] and list2 = [25, 40, 35] then the returned list will be output = [10, 20, 30, 15]