Python集与列表

问题:Python集与列表

在Python中,哪种数据结构更有效/更快速?假设顺序对我而言并不重要,并且无论如何我都将检查重复项,那么Python设置是否比Python列表慢?

In Python, which data structure is more efficient/speedy? Assuming that order is not important to me and I would be checking for duplicates anyway, is a Python set slower than a Python list?


回答 0

这取决于您打算如何处理。

在确定对象是否存在于集合中时,集合要快得多(如中所示x in s),但是在遍历其内容时要比列表慢。

您可以使用timeit模块查看哪种情况适合您的情况。

It depends on what you are intending to do with it.

Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.

You can use the timeit module to see which is faster for your situation.


回答 1

当您只想遍历值时,列表比集合要快一些。

但是,如果要检查项目中是否包含项目,则集合的速度明显快于列表。它们只能包含唯一项。

事实证明,除了不变性之外,元组的执行几乎与列表完全相同。

反复进行

>>> def iter_test(iterable):
...     for i in iterable:
...         pass
...
>>> from timeit import timeit
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = set(range(10000))",
...     number=100000)
12.666952133178711
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = list(range(10000))",
...     number=100000)
9.917098999023438
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = tuple(range(10000))",
...     number=100000)
9.865639209747314

确定是否存在对象

>>> def in_test(iterable):
...     for i in range(1000):
...         if i in iterable:
...             pass
...
>>> from timeit import timeit
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = set(range(1000))",
...     number=10000)
0.5591847896575928
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = list(range(1000))",
...     number=10000)
50.18339991569519
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = tuple(range(1000))",
...     number=10000)
51.597304821014404

Lists are slightly faster than sets when you just want to iterate over the values.

Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

It turns out tuples perform in almost exactly the same way as lists, except for their immutability.

Iterating

>>> def iter_test(iterable):
...     for i in iterable:
...         pass
...
>>> from timeit import timeit
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = set(range(10000))",
...     number=100000)
12.666952133178711
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = list(range(10000))",
...     number=100000)
9.917098999023438
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = tuple(range(10000))",
...     number=100000)
9.865639209747314

Determine if an object is present

>>> def in_test(iterable):
...     for i in range(1000):
...         if i in iterable:
...             pass
...
>>> from timeit import timeit
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = set(range(1000))",
...     number=10000)
0.5591847896575928
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = list(range(1000))",
...     number=10000)
50.18339991569519
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = tuple(range(1000))",
...     number=10000)
51.597304821014404

回答 2

列表效果:

>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = range(10**6)', number=100000)
0.008128150348026608

设置效果:

>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=100000)
0.005674857488571661

您可能要考虑元组,因为它们与列表相似,但是无法修改。它们占用的内存略少,并且访问速度更快。它们不像列表那样灵活,但效率更高。它们的正常用途是用作字典键。

集也是序列结构,但与列表和元组有两个区别。尽管集合确实具有顺序,但是该顺序是任意的,不在程序员的控制之下。第二个区别是集合中的元素必须唯一。

set根据定义。[ python | Wiki ]。

>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}

List performance:

>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = range(10**6)', number=100000)
0.008128150348026608

Set performance:

>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=100000)
0.005674857488571661

You may want to consider Tuples as they’re similar to lists but can’t be modified. They take up slightly less memory and are faster to access. They aren’t as flexible but are more efficient than lists. Their normal use is to serve as dictionary keys.

Sets are also sequence structures but with two differences from lists and tuples. Although sets do have an order, that order is arbitrary and not under the programmer’s control. The second difference is that the elements in a set must be unique.

set by definition. [python | wiki].

>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}

回答 3

Set由于近乎即时的“包含”检查而获胜:https//en.wikipedia.org/wiki/Hash_table

列表实现:通常是一个数组,靠近金属层较低,适合于迭代和按元素索引随机访问。

设置实现:https : //en.wikipedia.org/wiki/Hash_table,它不会在列表上进行迭代,而是通过计算键中的哈希值来找到元素,因此它取决于键元素和哈希值的性质功能。类似于用于字典的内容。我怀疑list如果元素很少(<5)可能会更快,元素计数越大,set包含检查的性能越好。它也可以快速添加和删除元素。还请始终牢记,构建一套需要付出代价!

注意:如果list已经对进行了排序,则搜索list可能会很快,但是对于通常情况set,包含检查的a 会更快,更简单。

Set wins due to near instant ‘contains’ checks: https://en.wikipedia.org/wiki/Hash_table

List implementation: usually an array, low level close to the metal good for iteration and random access by element index.

Set implementation: https://en.wikipedia.org/wiki/Hash_table, it does not iterate on a list, but finds the element by computing a hash from the key, so it depends on the nature of the key elements and the hash function. Similar to what is used for dict. I suspect list could be faster if you have very few elements (< 5), the larger element count the better the set will perform for a contains check. It is also fast for element addition and removal. Also always keep in mind that building a set has a cost !

NOTE: If the list is already sorted, searching the list could be quite fast on small lists, but with more data a set is faster for contains checks.


回答 4

tl; dr

数据结构(DS)很重要,因为它们用于对数据执行操作,这基本上意味着:接受一些输入,对其进行处理,然后返回输出

在某些特定情况下,某些数据结构比其他数据结构更有用。因此,询问哪个(DS)更有效/更快是相当不公平的。这就像问刀和叉之间哪种工具更有效。我的意思是所有情况都取决于情况。

清单

列表是可变序列通常用于存储同类项目的集合

套装

集合对象是不同的可哈希对象无序集合。它通常用于测试成员资格,从序列中删除重复项以及计算数学运算(例如交集,并集,差和对称差)。

用法

从一些答案中可以明显看出,迭代值时列表比集合快得多。另一方面,检查项目是否包含列表时,集合比列表快。因此,对于某些特定操作,您唯一能说的是列表比集合要好,反之亦然。

tl;dr

Data structures (DS) are important because they are used to perform operations on data which basically implies: take some input, process it, and give back the output.

Some data structures are more useful than others in some particular cases. Therefore, it is quite unfair to ask which (DS) is more efficient/speedy. It is like asking which tool is more efficient between a knife and fork. I mean all depends on the situation.

Lists

A list is mutable sequence, typically used to store collections of homogeneous items.

Sets

A set object is an unordered collection of distinct hashable objects. It is commonly used to test membership, remove duplicates from a sequence, and compute mathematical operations such as intersection, union, difference, and symmetric difference.

Usage

From some of the answers, it is clear that a list is quite faster than a set when iterating over the values. On the other hand, a set is faster than a list when checking if an item is contained within it. Therefore, the only thing you can say is that a list is better than a set for some particular operations and vice-versa.


回答 5

当使用CPython检查值是否为少量文字之一时,我对结果感兴趣。set在Python 3 vs中获胜tuplelist并且or

from timeit import timeit

def in_test1():
  for i in range(1000):
    if i in (314, 628):
      pass

def in_test2():
  for i in range(1000):
    if i in [314, 628]:
      pass

def in_test3():
  for i in range(1000):
    if i in {314, 628}:
      pass

def in_test4():
  for i in range(1000):
    if i == 314 or i == 628:
      pass

print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))

输出:

tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469

对于3到5个字面量,set仍然会以较大幅度获胜,并or成为最慢的。

在Python 2中,set总是最慢的。or是最快的2至3文本和tuplelist是具有4个或多个文字更快。我无法区分tuplevs 的速度list

当要测试的值被缓存在函数之外的全局变量中,而不是在循环中创建文字set时,即使在Python 2中,每次也会赢。

这些结果适用于Core i7上的64位CPython。

I was interested in the results when checking, with CPython, if a value is one of a small number of literals. set wins in Python 3 vs tuple, list and or:

from timeit import timeit

def in_test1():
  for i in range(1000):
    if i in (314, 628):
      pass

def in_test2():
  for i in range(1000):
    if i in [314, 628]:
      pass

def in_test3():
  for i in range(1000):
    if i in {314, 628}:
      pass

def in_test4():
  for i in range(1000):
    if i == 314 or i == 628:
      pass

print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))

Output:

tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469

For 3 to 5 literals, set still wins by a wide margin, and or becomes the slowest.

In Python 2, set is always the slowest. or is the fastest for 2 to 3 literals, and tuple and list are faster with 4 or more literals. I couldn’t distinguish the speed of tuple vs list.

When the values to test were cached in a global variable out of the function, rather than creating the literal within the loop, set won every time, even in Python 2.

These results apply to 64-bit CPython on a Core i7.


回答 6

我建议您使用用例仅限于引用或搜索存在的Set实现,以及使用用例需要您执行迭代的Tuple实现。列表是低级别的实现,需要大量的内存开销。

I would recommend a Set implementation where the use case is limit to referencing or search for existence and Tuple implementation where the use case requires you to perform iteration. A list is a low-level implementation and requires significant memory overhead.


回答 7

from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)
#Source Code

def calc(data, type):
start = datetime.now()
if data in type:
print ""
end = datetime.now()
print end-start

calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)

比较所有3的10次迭代后的输出: 比较

from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)
#Source Code

def calc(data, type):
start = datetime.now()
if data in type:
print ""
end = datetime.now()
print end-start

calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)

Output after comparing 10 iterations for all 3 : Comparison


回答 8

集合更快,而且您可以通过集合获得更多功能,比如说您有两个集合:

set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}

我们可以轻松地加入两个集合:

set3 = set1.union(set2)

找出两者的共同点:

set3 = set1.intersection(set2)

找出两者的不同之处:

set3 = set1.difference(set2)

以及更多!只是尝试一下,它们很有趣!此外,如果您必须处理2个列表中的不同值或2个列表中的公用值,我更喜欢将列表转换为集合,许多程序员都采用这种方式。希望它对您有帮助:-)

Sets are faster, morover you get more functions with sets, such as lets say you have two sets :

set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}

We can easily join two sets:

set3 = set1.union(set2)

Find out what is common in both:

set3 = set1.intersection(set2)

Find out what is different in both:

set3 = set1.difference(set2)

And much more! Just try them out, they are fun! Moreover if you have to work on the different values within 2 list or common values within 2 lists, I prefer to convert your lists to sets, and many programmers do in that way. Hope it helps you :-)