Tag Archives: set

Python set to list

Question: Python set to list

How can I convert a set to a list in Python? Using

a = set(["Blah", "Hello"])
a = list(a)

doesn’t work. It gives me:

TypeError: 'set' object is not callable

Answer 0

Your code does work (tested with cpython 2.4, 2.5, 2.6, 2.7, 3.1 and 3.2):

>>> a = set(["Blah", "Hello"])
>>> a = list(a) # You probably wrote a = list(a()) here or list = set() above
>>> a
['Blah', 'Hello']

Check that you didn’t overwrite list by accident:

>>> assert list == __builtins__.list
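On Python 3 the builtins live in the builtins module, so a check that also works inside imported modules (where __builtins__ can be a dict instead of a module) might look like this. is_shadowed is a hypothetical helper written for this sketch:

```python
import builtins


def is_shadowed(name, namespace):
    """Return True if name is bound in namespace to something other than the builtin.

    is_shadowed is a hypothetical helper written for this example.
    """
    return name in namespace and namespace[name] is not getattr(builtins, name)


# With the builtin intact, the conversion from the question works.
a = set(["Blah", "Hello"])
a = list(a)
print(sorted(a))  # ['Blah', 'Hello'] (sorted, because set order is arbitrary)

# Simulate the accident in a separate namespace instead of the real globals.
accident = {"list": set()}
print(is_shadowed("list", accident))   # True
print(is_shadowed("list", globals()))  # False, unless list was rebound in this module
print(list is builtins.list)           # True here; False would mean list is shadowed
```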

Answer 1

You’ve shadowed the built-in set by accidentally using it as a variable name. Here is a simple way to replicate your error:

>>> set=set()
>>> set=set()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not callable

The first line rebinds set to an instance of set. The second line tries to call that instance, which of course fails.

Here is a less confusing version that uses a different name for each variable, in a fresh interpreter:

>>> a=set()
>>> b=a()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not callable

Hopefully it is obvious that calling a is an error.
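When the shadowing happened at module level or in the interactive interpreter, deleting the offending global name lets lookups fall through to the builtin again. A sketch that reproduces the error from this answer and then recovers (it has to run at module level, not inside a function):

```python
set = set()  # rebinds the global name "set" to an empty set instance

try:
    set()  # this calls the instance, not the type
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'set' object is not callable

del set  # drop the global binding; name lookup falls back to the builtin
restored = set(["Blah", "Hello"])
print(sorted(restored))  # ['Blah', 'Hello']
```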


Answer 2

Before you wrote set(XXXXX), you had already used “set” as a variable, e.g.:

set = 90 #you have used "set" as an object
…
…
a = set(["Blah", "Hello"])
a = list(a)

Answer 3

This will work:

>>> t = [1,1,2,2,3,3,4,5]
>>> print list(set(t))
[1, 2, 3, 4, 5]

However, if you have used “list” or “set” as a variable name, you will get an error like:

TypeError: 'set' object is not callable

For example:

>>> set = [1,1,2,2,3,3,4,5]
>>> print list(set(set))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

A similar error occurs if you have used “list” as a variable name.
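Note that the [1,2,3,4,5] output relies on where small integers happen to land in the set's table; set order is not guaranteed in general. When the deduplicated list needs a predictable order, sorting makes that explicit. A Python 3 sketch:

```python
t = [1, 1, 2, 2, 3, 3, 4, 5]
unique = sorted(set(t))  # deduplicate, then impose an explicit order
print(unique)  # [1, 2, 3, 4, 5]

words = ["pear", "apple", "pear", "fig"]
print(sorted(set(words)))  # ['apple', 'fig', 'pear']
```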


Answer 4

s = set([1,2,3])
print [ x for x in iter(s) ]

Answer 5

Your code works with Python 3.2.1 on Win7 x64

a = set(["Blah", "Hello"])
a = list(a)
type(a)
<class 'list'>

Answer 6

Try using a combination of map and a lambda function (on Python 3, map returns an iterator, so the result is wrapped in list()):

aList = list(map(lambda x: x, set([1, 2, 6, 9, 0])))

This is a very convenient approach if you have a set of numbers stored as strings and you want to convert it to a list of integers:

aList = list(map(lambda x: int(x), set(['1', '2', '3', '7', '12'])))
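Because set order is arbitrary, the converted list also comes out in arbitrary order. A Python 3 sketch that passes int to map directly (the lambda is not needed), materializes the result with list(), and sorts after converting, so the order is numeric rather than lexicographic:

```python
number_strings = {'1', '2', '3', '7', '12'}

# map(int, ...) does the same job as map(lambda x: int(x), ...).
# On Python 3, map returns a lazy iterator, so list() materializes it.
as_ints = list(map(int, number_strings))

# Set order is arbitrary, so sort after converting to get numeric order.
print(sorted(as_ints))  # [1, 2, 3, 7, 12]

# Sorting the strings instead would compare them lexicographically.
print(sorted(number_strings))  # ['1', '12', '2', '3', '7']
```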

Why is the order in dictionaries and sets arbitrary?

Question: Why is the order in dictionaries and sets arbitrary?

I don’t understand how looping over a dictionary or set in Python happens in ‘arbitrary’ order.

I mean, it’s a programming language so everything in the language must be 100% determined, correct? Python must have some kind of algorithm that decides which part of the dictionary or set is chosen, 1st, second and so on.

What am I missing?


Answer 0

Note: This answer was written before the implementation of the dict type changed, in Python 3.6. Most of the implementation details in this answer still apply, but the listing order of keys in dictionaries is no longer determined by hash values. The set implementation remains unchanged.

The order is not arbitrary, but depends on the insertion and deletion history of the dictionary or set, as well as on the specific Python implementation. For the remainder of this answer, for ‘dictionary’, you can also read ‘set’; sets are implemented as dictionaries with just keys and no values.

Keys are hashed, and hash values are assigned to slots in a dynamic table (it can grow or shrink based on needs). That mapping process can lead to collisions, meaning that a key will have to be placed in a different slot, depending on what is already there.

Listing the contents loops over the slots, and so keys are listed in the order they currently reside in the table.

Take the keys 'foo' and 'bar', for example, and let’s assume the table size is 8 slots. In Python 2.7, hash('foo') is -4177197833195190597 and hash('bar') is 327024216814240868. Modulo 8, that means these two keys are slotted in slots 3 and 4:

>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4

This informs their listing order:

>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}

All slots except 3 and 4 are empty; looping over the table first lists slot 3, then slot 4, so 'foo' is listed before 'bar'.

bar and baz, however, have hash values that are exactly 8 apart and thus map to the exact same slot, 4:

>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4

Their order now depends on which key was slotted first; the second key will have to be moved to another slot:

>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}

The table order differs here, because one or the other key was slotted first.
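The dependence on insertion order can be reproduced with a toy open-addressing table. This is a simplified model, not CPython's code: it never resizes (so it holds at most 8 distinct keys), it uses plain linear probing (try the next slot, wrapping around) instead of CPython's probe sequence, and it uses small integers as keys, because their hashes are the integers themselves while str hashes are randomized per process on Python 3. The class name ToyHashTable is made up for this sketch.

```python
class ToyHashTable:
    """A fixed-size open-addressing table that iterates in slot order."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def add(self, key):
        index = hash(key) % len(self.slots)
        # Linear probing: walk forward until a free slot (or the key itself) is found.
        while self.slots[index] is not None and self.slots[index] != key:
            index = (index + 1) % len(self.slots)
        self.slots[index] = key

    def __iter__(self):
        # Listing the contents just walks the slots from left to right.
        return (key for key in self.slots if key is not None)


def build(keys):
    table = ToyHashTable()
    for key in keys:
        table.add(key)
    return list(table)


# 4 and 12 are 8 apart, so both map to slot 4 of an 8-slot table.
print(build([4, 12]))  # [4, 12]
print(build([12, 4]))  # [12, 4]
print(build([12, 3]))  # [3, 12]  (slot 3 comes before slot 4)
```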

The technical name for the underlying structure used by CPython (the most commonly used Python implementation) is a hash table, one that uses open addressing. If you are curious, and understand C well enough, take a look at the C implementation for all the (well-documented) details. You could also watch this PyCon 2010 presentation by Brandon Rhodes about how the CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.

Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary or set is then also dependent on the random hash seed for the current Python invocation.
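The effect of the random seed can be observed by asking fresh interpreters for a hash with the PYTHONHASHSEED environment variable fixed. A sketch using subprocess and sys.executable (hash_in_child is a hypothetical helper name); only str and bytes hashes are salted, so the integer check at the end should not depend on the seed:

```python
import os
import subprocess
import sys


def hash_in_child(value, seed):
    """Return hash(value) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Same seed: same hash in every run. A different seed almost certainly differs.
print(hash_in_child("foo", 1) == hash_in_child("foo", 1))  # True
print(hash_in_child("foo", 1), hash_in_child("foo", 2))

# Integer hashes are not salted.
print(hash_in_child(8, 1) == hash_in_child(8, 2) == 8)  # True
```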

Other implementations are free to use a different structure for dictionaries, as long as they satisfy the documented Python interface for them, but I believe that all implementations so far use a variation of the hash table.

CPython 3.6 introduces a new dict implementation that maintains insertion order, and is faster and more memory efficient to boot. Rather than keep a large sparse table where each row references the stored hash value, and the key and value objects, the new implementation adds a smaller hash array that only references indices in a separate ‘dense’ table (one that only contains as many rows as there are actual key-value pairs), and it is the dense table that happens to list the contained items in order. See the proposal to Python-Dev for more details. Note that in Python 3.6 this is considered an implementation detail, Python-the-language does not specify that other implementations have to retain order. This changed in Python 3.7, where this detail was elevated to be a language specification; for any implementation to be properly compatible with Python 3.7 or newer it must copy this order-preserving behaviour. And to be explicit: this change doesn’t apply to sets, as sets already have a ‘small’ hash structure.
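A quick illustration of the Python 3.7+ dict guarantee. Order follows the insertion history, so deleting a key and adding it again moves it to the end, while updating a value in place does not:

```python
d = {}
d["banana"] = 1
d["apple"] = 2
d["cherry"] = 3
print(list(d))  # ['banana', 'apple', 'cherry']: insertion order, not sorted order

del d["banana"]
d["banana"] = 4
print(list(d))  # ['apple', 'cherry', 'banana']: a re-inserted key goes to the end

d["apple"] = 5  # updating an existing key keeps its position
print(list(d))  # ['apple', 'cherry', 'banana']
```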

Python 2.7 and newer also provide an OrderedDict class, a subclass of dict that adds an additional data structure to record key order. At the price of some speed and extra memory, this class remembers in what order you inserted keys; listing keys, values or items will then do so in that order. It uses a doubly-linked list stored in an additional dictionary to keep the order up to date efficiently. See the post by Raymond Hettinger outlining the idea. OrderedDict objects have other advantages, such as being re-orderable.
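A short sketch of the re-ordering mentioned above, using OrderedDict.move_to_end (available since Python 3.2). Comparisons between two OrderedDicts are also order-sensitive, unlike plain dict comparisons:

```python
from collections import OrderedDict

od = OrderedDict.fromkeys(["a", "b", "c"])
od.move_to_end("a")              # move an existing key to the end
print(list(od))                  # ['b', 'c', 'a']
od.move_to_end("c", last=False)  # or to the front
print(list(od))                  # ['c', 'b', 'a']

# OrderedDict-to-OrderedDict equality takes order into account,
print(od == OrderedDict.fromkeys(["a", "b", "c"]))  # False
# while plain dict equality does not.
print(dict(od) == dict.fromkeys(["a", "b", "c"]))   # True
```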

If you wanted an ordered set, you can install the oset package; it works on Python 2.5 and up.
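If installing a package is not an option, on Python 3.7+ the keys of a plain dict can stand in for an insertion-ordered set for deduplication and membership tests. This is a minimal sketch of that workaround, not the oset package's API:

```python
items = ["b", "a", "b", "c", "a"]

# dict keys keep first-seen order (Python 3.7+); the values are just None.
ordered = dict.fromkeys(items)
print(list(ordered))    # ['b', 'a', 'c']
print("a" in ordered)   # True: membership is a hash lookup, as with a set

ordered["d"] = None     # roughly set.add
ordered.pop("b", None)  # roughly set.discard (no error if the key is missing)
print(list(ordered))    # ['a', 'c', 'd']
```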


Answer 1

This is more a response to the question “Python 3.41 A set” before it was closed as a duplicate.


The others are right: don’t rely on the order. Don’t even pretend there is one.

That said, there is one thing you can rely on:

list(myset) == list(myset)

That is, the order is stable.
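That stability applies to one unmodified set object. Two sets that compare equal can iterate in different orders if they were built with different insertion histories, so comparing iteration orders is not a substitute for comparing sets. A sketch (which orders actually appear is a CPython implementation detail, so the code does not rely on them):

```python
s1 = set()
s1.add(1)
s1.add(9)  # 1 and 9 share an initial slot in CPython's minimum 8-slot table (both are 1 modulo 8)

s2 = set()
s2.add(9)
s2.add(1)

print(s1 == s2)                  # True: same elements
print(list(s1) == list(s1))      # True: stable for one unchanged set
print(list(s1), list(s2))        # may or may not be in the same order
print(sorted(s1) == sorted(s2))  # True: sort before comparing as lists
```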


Understanding why there is a perceived order requires understanding a few things:

  • That Python uses hash sets,

  • How CPython’s hash set is stored in memory and

  • How numbers get hashed

From the top:

A hash set is a method of storing random data with really fast lookup times.

It has a backing array:

# A C array; items may be NULL,
# a pointer to an object, or a
# special dummy object
_ _ 4 _ _ 2 _ _ 6

We shall ignore the special dummy object, which exists only to make removals easier to deal with, because we won’t be removing from these sets.

In order to have really fast lookup, you do some magic to calculate a hash from an object. The only rule is that two objects which are equal have the same hash. (But if two objects have the same hash they can be unequal.)

You then make an index by taking the hash modulo the array length:

hash(4) % len(storage) = index 2

This makes it really fast to access elements.

Hashes are only most of the story, as hash(n) % len(storage) and hash(m) % len(storage) can result in the same number. In that case, several different strategies can try to resolve the conflict. CPython uses “linear probing” for up to 9 steps before jumping elsewhere in the table using the upper bits of the hash, so it checks the slots immediately following the initial one before looking further away.

CPython’s hash sets are stored like this:

  • A hash set can be no more than 2/3 full. If there are 20 elements and the backing array is 30 elements long, the backing store will resize to be larger. This is because you get collisions more often with small backing stores, and collisions slow everything down.

  • The backing store grows by a factor of 4, starting at 8 (8, 32, 128, …), except for large sets (over 50k elements), which grow by a factor of 2.

So when you create a set, the backing store has length 8. When it holds 5 elements and you add one more, it will briefly contain 6 elements. Since 6 > ²⁄₃·8, this triggers a resize, and the backing store quadruples to size 32.
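The growth rule can be modeled in a few lines. This sketch encodes the rule as described in this answer (resize once the table would be at least 2/3 full, growing to the next power of two above four times the element count, or twice the count above 50,000 elements); CPython's exact thresholds have changed between versions, so treat the sizes as illustrative. table_size_after is a made-up helper:

```python
def table_size_after(n_elements, initial_size=8):
    """Backing-array size after adding n distinct elements, under the rule above."""
    size = initial_size
    for used in range(1, n_elements + 1):
        # Resize as soon as the table is at least 2/3 full.
        if used * 3 >= size * 2:
            factor = 2 if used > 50_000 else 4
            target = used * factor
            size = initial_size
            while size <= target:
                size *= 2  # smallest power of two strictly above the target
    return size


for n in (5, 6, 10, 14, 21, 22):
    print(n, table_size_after(n))
# 5 -> 8, 6 -> 32, 10 -> 32, 14 -> 32, 21 -> 32, 22 -> 128
```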

Finally, hash(n) just returns n for integers of moderate size (except -1, which is special).


So, let’s look at the first one:

v_set = {88,11,1,33,21,3,7,55,37,8}

len(v_set) is 10, so the backing store is at least 15(+1) after all items have been added. The relevant power of 2 is 32. So the backing store is:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

We have

hash(88) % 32 = 24
hash(11) % 32 = 11
hash(1)  % 32 = 1
hash(33) % 32 = 1
hash(21) % 32 = 21
hash(3)  % 32 = 3
hash(7)  % 32 = 7
hash(55) % 32 = 23
hash(37) % 32 = 5
hash(8)  % 32 = 8
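The slot numbers above can be recomputed directly, since in CPython an int of this size hashes to itself. A sketch that groups the elements of v_set by their initial slot in a 32-slot table, which shows the single collision between 1 and 33:

```python
from collections import defaultdict

v_set_elements = [88, 11, 1, 33, 21, 3, 7, 55, 37, 8]
table_size = 32

# Group elements by their initial slot, hash(n) % table_size.
slots = defaultdict(list)
for n in v_set_elements:
    slots[hash(n) % table_size].append(n)

for slot in sorted(slots):
    print(slot, slots[slot])

collisions = {slot: members for slot, members in slots.items() if len(members) > 1}
print(collisions)  # {1: [1, 33]}
```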

so these insert as:

__  1 __  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
   33 ← Can't also be where 1 is;
        either 1 or 33 has to move

So we would expect an order like

{[1 or 33], 3, 37, 7, 8, 11, 21, 55, 88}

with the 1 or 33 that isn’t at the start somewhere else. This will use linear probing, so we will either have:

       ↓
__  1 33  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

or

       ↓
__ 33  1  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

You might expect the 33 to be the one that’s displaced because the 1 was already there, but due to the resizing that happens as the set is being built, this isn’t actually the case. Every time the set gets rebuilt, the items already added are effectively reordered.

Now you can see why

{7,5,11,1,4,13,55,12,2,3,6,20,9,10}

might be in order. There are 14 elements, so the backing store is at least 21+1, which means 32:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

The numbers 1 to 13, except 8 (which is not in the set), hash to slots 1 through 13. 20 goes in slot 20.

__  1  2  3  4  5  6  7 __  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ __ __ __ __ __ __ __ __ __

55 goes in slot hash(55) % 32 which is 23:

__  1  2  3  4  5  6  7 __  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ 55 __ __ __ __ __ __ __ __

If we chose 50 instead, we’d expect

__  1  2  3  4  5  6  7 __  9 10 11 12 13 __ __ __ __ 50 __ 20 __ __ __ __ __ __ __ __ __ __ __

And lo and behold:

{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 50}
#>>> {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 50, 20}
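The same reasoning can be written as a toy model that puts each integer at hash(n) % 32 and reads the slots from left to right. It assumes a 32-slot table and no collisions, which holds for both sets discussed here; it models the argument above rather than CPython's actual code, and predicted_order is a made-up name:

```python
def predicted_order(elements, table_size=32):
    """Toy model: each int goes to slot hash(n) % table_size; slots are read left to right."""
    slots = [None] * table_size
    for n in elements:
        index = hash(n) % table_size
        if slots[index] is not None:
            # This toy model only covers the collision-free case.
            raise ValueError(f"collision at slot {index}")
        slots[index] = n
    return [n for n in slots if n is not None]


base = [7, 5, 11, 1, 4, 13, 12, 2, 3, 6, 20, 9, 10]
print(predicted_order(base + [55]))
# [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 55]
print(predicted_order(base + [50]))
# [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 50, 20]
```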

pop is implemented quite simply by the looks of things: it traverses the backing array and pops the first occupied slot it finds.


This is all implementation detail.


Answer 2

“Arbitrary” isn’t the same thing as “non-determined”.

What they’re saying is that there are no useful properties of dictionary iteration order that are “in the public interface”. There almost certainly are many properties of the iteration order that are fully determined by the code that currently implements dictionary iteration, but the authors aren’t promising them to you as something you can use. This gives them more freedom to change these properties between Python versions (or even just in different operating conditions, or completely at random at runtime) without worrying that your program will break.

Thus if you write a program that depends on any property at all of dictionary order, then you are “breaking the contract” of using the dictionary type, and the Python developers are not promising that this will always work, even if it appears to work for now when you test it. It’s basically the equivalent of relying on “undefined behaviour” in C.


Answer 3

The other answers to this question are excellent and well written. The OP asks “how” which I interpret as “how do they get away with” or “why”.

The Python documentation says dictionaries are not ordered because the Python dictionary implements the abstract data type associative array. As they say:

the order in which the bindings are returned may be arbitrary

In other words, a computer science student cannot assume that an associative array is ordered. The same is true for sets in math:

the order in which the elements of a set are listed is irrelevant

and in computer science:

a set is an abstract data type that can store certain values, without any particular order

Implementing a dictionary using a hash table is an implementation detail that is interesting in that it has the same properties as associative arrays as far as order is concerned.


Answer 4

Python uses a hash table to store dictionaries, so there is no order in dictionaries or other iterable objects that use a hash table.

But regarding the indices of items in a hash object, Python calculates the indices based on the following code in hashtable.c:

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

Therefore, since the hash value of an integer is the integer itself* (except for -1, whose hash is -2) and ht->num_buckets - 1 is a constant, the index is the bitwise AND of (ht->num_buckets - 1) and the number itself. For other objects, the index is computed the same way from their hash value.

Consider the following example with a set, which uses a hash table:

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For number 33 we have :

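33 & (ht->num_buckets - 1) = 1

That is actually:

0b100001 & 0b11111 == 0b1  # 1, the index of 33

Note that in this case (ht->num_buckets - 1) is 32-1=31, or 0b11111, not 8-1=7: a set of eight elements would fill more than 2/3 of an 8-slot table, so it has already been resized to 32 slots, and the listed order follows those 32 slots.

And for 1919:

0b11101111111 & 0b11111 == 0b11111  # 31, the index of 1919

And for 333:

0b101001101 & 0b11111 == 0b1101  # 13, the index of 333 (45 also maps to slot 13; it was added first, so 333 is moved to a later slot by collision resolution)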
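Masking with num_buckets - 1 gives the same result as taking the hash modulo num_buckets whenever num_buckets is a power of two, which is one reason such tables keep power-of-two sizes. A quick check in Python, where this also holds for negative numbers, using 32 buckets as an example:

```python
num_buckets = 32         # must be a power of two for the equivalence to hold
mask = num_buckets - 1   # 0b11111

for n in (33, 1919, 333, 45, 2000, -7):
    # Keeping the low bits is the same as taking the remainder.
    assert n & mask == n % num_buckets
    print(n, bin(n & mask), n & mask)
```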

For more details about Python’s hash function, it is worth reading the following comments from the Python source code:

Major subtleties ahead: Most hash schemes depend on having a “good” hash function, in the sense of simulating randomness. Python doesn’t: its most important hash functions (for strings and ints) are very regular in common cases:

>>> map(hash, (0, 1, 2, 3))
  [0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
  [-1658398457, -1658398460, -1658398459, -1658398462]

This isn’t necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are “consecutive” strings. So this gives better-than-random behavior in common cases, and that’s very desirable.

OTOH, when collisions occur, the tendency to fill contiguous slices of the hash table makes a good collision resolution strategy crucial. Taking only the last i bits of the hash code is also vulnerable: for example, consider the list [i << 16 for i in range(20000)] as a set of keys. Since ints are their own hash codes, and this fits in a dict of size 2**15, the last 15 bits of every hash code are all 0: they all map to the same table index.

But catering to unusual cases should not slow the usual ones, so we just take the last i bits anyway. It’s up to collision resolution to do the rest. If we usually find the key we’re looking for on the first try (and, it turns out, we usually do — the table load factor is kept under 2/3, so the odds are solidly in our favor), then it makes best sense to keep the initial index computation dirt cheap.


* The hash function for class int (simplified):

class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value
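The class above is a simplification. In CPython the hash of an int equals the int itself only for moderately sized values; larger values are reduced modulo a prime exposed as sys.hash_info.modulus (2**61 - 1 on typical 64-bit builds). A few checks, assuming CPython:

```python
import sys

print(hash(5), hash(-5))  # 5 -5
print(hash(-1))           # -2: -1 is reserved to signal errors at the C level

P = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds
print(hash(P))             # 0: large ints are reduced modulo P
print(hash(P + 5))         # 5

# Numbers that compare equal hash equally, whatever their type.
print(hash(1) == hash(1.0) == hash(True))  # True
```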


Answer 5

As of Python 3.7 (and already in CPython 3.6), dictionary items retain their insertion order.


How is set() implemented?

Question: How is set() implemented?

I’ve seen people say that set objects in python have O(1) membership-checking. How are they implemented internally to allow this? What sort of data structure does it use? What other implications does that implementation have?

Every answer here was really enlightening, but I can only accept one, so I’ll go with the closest answer to my original question. Thanks all for the info!


Answer 0

According to this thread:

Indeed, CPython’s sets are implemented as something like dictionaries with dummy values (the keys being the members of the set), with some optimization(s) that exploit this lack of values

So basically a set uses a hashtable as its underlying data structure. This explains the O(1) membership checking, since looking up an item in a hashtable is an O(1) operation, on average.

If you are so inclined you can even browse the CPython source code for set which, according to Achim Domma, is mostly a cut-and-paste from the dict implementation.
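One way to see the hash table at work is to count equality comparisons. In this sketch (the Key class is made up for illustration), a set membership test calls __eq__ only on the stored entry whose hash matches, while a list membership test compares element by element from the front:

```python
class Key:
    """Wraps a value and counts how often __eq__ is called."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


n = 1000
as_list = [Key(i) for i in range(n)]
as_set = set(as_list)

Key.eq_calls = 0
in_list = Key(n - 1) in as_list  # scans from the front until it finds a match
list_calls = Key.eq_calls

Key.eq_calls = 0
in_set = Key(n - 1) in as_set    # goes straight to the slot chosen by the hash
set_calls = Key.eq_calls

print(in_list, in_set)        # True True
print(list_calls, set_calls)  # 1000 1
```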


Answer 1

When people say sets have O(1) membership-checking, they are talking about the average case. In the worst case (when all hashed values collide) membership-checking is O(n). See the Python wiki on time complexity.

The Wikipedia article says the best case time complexity for a hash table that does not resize is O(1 + k/n). This result does not directly apply to Python sets since Python sets use a hash table that resizes.

A little further on the Wikipedia article says that for the average case, and assuming a simple uniform hashing function, the time complexity is O(1/(1-k/n)), where k/n can be bounded by a constant c<1.

Big-O refers only to asymptotic behavior as n → ∞. Since k/n can be bounded by a constant, c<1, independent of n,

O(1/(1-k/n)) is no bigger than O(1/(1-c)) which is equivalent to O(constant) = O(1).

So assuming uniform simple hashing, on average, membership-checking for Python sets is O(1).
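The worst case is easy to provoke with a deliberately bad hash function. In this sketch (CountingKey and BadKey are made-up classes), a lookup for a missing key never calls __eq__ when the hashes are distinct, but has to compare against every stored element when all elements share one hash:

```python
class CountingKey:
    """Wraps a value and counts how often __eq__ is called."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


class BadKey(CountingKey):
    def __hash__(self):
        return 0  # every instance collides with every other


def comparisons_for_missing_lookup(key_class, n):
    items = {key_class(i) for i in range(n)}
    CountingKey.eq_calls = 0        # ignore comparisons made while building the set
    found = key_class(-1) in items  # -1 was never added
    assert not found
    return CountingKey.eq_calls


n = 200
good = comparisons_for_missing_lookup(CountingKey, n)
bad = comparisons_for_missing_lookup(BadKey, n)
print(good)      # 0: no stored hash matches, so __eq__ is never needed
print(bad >= n)  # True: every stored element shares the hash and gets compared
```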


Answer 2

I think it’s a common mistake: set lookups (or hash table lookups, for that matter) are not O(1).
From Wikipedia:

In the simplest model, the hash function is completely unspecified and the table does not resize. For the best possible choice of hash function, a table of size n with open addressing has no collisions and holds up to n elements, with a single comparison for successful lookup, and a table of size n with chaining and k keys has the minimum max(0, k-n) collisions and O(1 + k/n) comparisons for lookup. For the worst choice of hash function, every insertion causes a collision, and hash tables degenerate to linear search, with Ω(k) amortized comparisons per insertion and up to k comparisons for a successful lookup.

Related: Is a Java hashmap really O(1)?


Answer 3

We all have easy access to the source, where the comment preceding set_lookkey() says:

/* set object implementation
 Written and maintained by Raymond D. Hettinger <python@rcn.com>
 Derived from Lib/sets.py and Objects/dictobject.c.
 The basic lookup function used by all operations.
 This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
 The initial probe index is computed as hash mod the table size.
 Subsequent probe indices are computed as explained in Objects/dictobject.c.
 To improve cache locality, each probe inspects a series of consecutive
 nearby entries before moving on to probes elsewhere in memory.  This leaves
 us with a hybrid of linear probing and open addressing.  The linear probing
 reduces the cost of hash collisions because consecutive memory accesses
 tend to be much cheaper than scattered probes.  After LINEAR_PROBES steps,
 we then use open addressing with the upper bits from the hash value.  This
 helps break-up long chains of collisions.
 All arithmetic on hash should ignore overflow.
 Unlike the dictionary implementation, the lookkey function can return
 NULL if the rich comparison returns an error.
*/


...
#ifndef LINEAR_PROBES
#define LINEAR_PROBES 9
#endif

/* This must be >= 1 */
#define PERTURB_SHIFT 5

static setentry *
set_lookkey(PySetObject *so, PyObject *key, Py_hash_t hash)  
{
...
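Read together with the constants above, the probe order can be modeled in Python. This is a simplified sketch of the lookup loop roughly as it appears in recent CPython 3 versions of setobject.c (details differ between versions): start at hash & mask, check the following entries as well when a run of LINEAR_PROBES fits before the end of the table, then jump using i = (i * 5 + 1 + perturb) & mask after shifting perturb right by PERTURB_SHIFT. It assumes a 64-bit size_t, and probe_sequence is a made-up name:

```python
LINEAR_PROBES = 9
PERTURB_SHIFT = 5
SIZE_T_MASK = (1 << 64) - 1  # assumes a 64-bit size_t


def probe_sequence(hash_value, mask, limit):
    """Return the first `limit` slot indices a lookup would inspect (simplified model)."""
    perturb = hash_value & SIZE_T_MASK  # C treats the hash as an unsigned size_t here
    i = perturb & mask
    order = []
    while True:
        # Linear probing only when the whole run fits before the end of the table.
        probes = LINEAR_PROBES if i + LINEAR_PROBES <= mask else 0
        for offset in range(probes + 1):
            order.append(i + offset)
            if len(order) == limit:
                return order
        # After the linear run, jump elsewhere, mixing in the upper bits of the hash.
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + 1 + perturb) & mask


print(probe_sequence(1, 7, 4))    # [1, 6, 7, 4]: 8-slot table, no linear run
print(probe_sequence(1, 31, 12))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 6, 7]: run, then a jump
```

Note that later probes can revisit slots already inspected by an earlier linear run.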

Answer 4

To emphasize the difference between sets and dicts a little more, here is an excerpt from the comments in setobject.c, which clarifies the main difference between sets and dicts:

Use cases for sets differ considerably from dictionaries where looked-up keys are more likely to be present. In contrast, sets are primarily about membership testing where the presence of an element is not known in advance. Accordingly, the set implementation needs to optimize for both the found and not-found case.

source on github


How to JSON serialize sets?

Question: How to JSON serialize sets?

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to JSON-encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I’m not even sure where to begin in converting the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I’d like to make the default method able to handle all the datatypes that the original encoder chokes on (I’m using Mongo as a data source, so dates seem to raise this error too).

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There’s a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there’s no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing “non-primitive” objects.

That is why the only solution I could think of was to extend the JSONEncoder and replace the default method to switch on different cases, but I’m not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I’ve looked through previous questions and can’t seem to find the best approach to case-specific encoding (which unfortunately seems like what I’m going to need to do here).


Answer 0

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).
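Here is a quick sketch of the information loss described above, using the standard json module's `default` hook. A JSONEncoder subclass would behave the same way.

- If a set is encoded as a list, the decoded value is indistinguishable from a real list.
- If a set is encoded with `dict.fromkeys(s)`, the decoded value is indistinguishable from a real dict.

The sample data is made up, and it uses a one-element set so that the output order is deterministic.

```python
import json

data = {"tags": {"red"}, "ids": ["red"]}  # made-up sample: one set, one list

# Sets encoded as lists: after decoding, "tags" looks exactly like "ids".
as_lists = json.loads(json.dumps(data, default=list))

# Sets encoded as dicts via dict.fromkeys(s): after decoding, "tags" is an
# ordinary dict whose values are None, indistinguishable from a real dict.
as_dicts = json.loads(json.dumps(data, default=dict.fromkeys))

print(as_lists)  # {'tags': ['red'], 'ids': ['red']}
print(as_dicts)  # {'tags': {'red': None}, 'ids': ['red']}
```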

A more sophisticated solution is to build out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

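# Python 2 code: it relies on `unicode` and on pickle's ASCII-only default protocol
from decimal import Decimal  # needed for the sample session below
from json import dumps, loads, JSONEncoder
import pickle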

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python’s pickle module. These each support a much greater range of datatypes.


Answer 1

You can create a custom encoder that returns a list when it encounters a set. Here’s an example:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.
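Here is a minimal sketch of the `{'type': 'set', 'list': ...}` idea, together with a matching decoder. `TaggedSetEncoder` and `decode_tagged_sets` are hypothetical names. The decoding half relies on the `object_hook` parameter of json.loads, which is called with every decoded JSON object.

```python
import json

class TaggedSetEncoder(json.JSONEncoder):
    """Encode sets as {'type': 'set', 'list': [...]} so they can be restored."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return {'type': 'set', 'list': list(obj)}
        return super().default(obj)  # raises TypeError for anything else

def decode_tagged_sets(dct):
    """object_hook for json.loads: turn tagged dicts back into sets."""
    if dct.get('type') == 'set' and 'list' in dct:
        return set(dct['list'])
    return dct

original = {'primes': {2, 3, 5}, 'name': 'example'}
encoded = json.dumps(original, cls=TaggedSetEncoder)
restored = json.loads(encoded, object_hook=decode_tagged_sets)
print(restored == original)  # True
```

This design has two caveats. A frozenset comes back as a plain set. And any genuine dict that happens to have `'type': 'set'` and a `'list'` key would also be turned into a set.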

To illustrate nested types, consider serializing this:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

Answer 2

I adapted Raymond Hettinger’s solution to Python 3.

Here is what has changed:

  • unicode is gone (in Python 3, str is already Unicode)
  • the call to the parent's default now uses super()
  • base64 is used to turn the bytes produced by pickle.dumps into a str, because the json module cannot serialize bytes in Python 3

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints something like (set order varies): [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

Answer 3

Only objects (dictionaries), arrays (lists), and primitive types (numbers, strings, booleans, and null) are available in JSON; sets have no JSON equivalent.
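To see which Python types map onto those JSON types, you can probe the standard library's json.dumps directly. This sketch uses arbitrary sample values. It shows that tuples are written as arrays and None as null, while a set raises TypeError.

```python
import json

# Python values that already have a JSON counterpart.
encoded = json.dumps({"int": 1, "float": 1.5, "str": "x", "bool": True,
                      "none": None, "tuple": (1, 2), "list": [3]})
print(encoded)
# {"int": 1, "float": 1.5, "str": "x", "bool": true, "none": null, "tuple": [1, 2], "list": [3]}

# A set has no JSON counterpart, so the default encoder refuses it.
set_error = None
try:
    json.dumps({1, 2})
except TypeError as exc:
    set_error = exc
print(set_error)
```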


Answer 4

You don’t need to make a custom encoder class to supply the default method – it can be passed in as a keyword argument:

import json

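def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    # Returning obj unchanged makes json fail with a confusing "Circular reference
    # detected" ValueError; raise TypeError like json.JSONEncoder.default does.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")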

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

results in [1, 2, 3] in all supported Python versions.
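The same `default` keyword can also cover the other types mentioned in the question. The encoder calls the hook for every value it cannot serialize natively, wherever that value is nested. It then encodes whatever the hook returns, so one function handles sets and dates at any depth.

This is a sketch with two assumptions:

- `to_jsonable` is a hypothetical name.
- Dates are encoded as ISO 8601 strings, so they come back as plain strings, not dates.

```python
import json
from datetime import date, datetime

def to_jsonable(obj):
    """Hypothetical `default` hook: sets -> lists, dates/datetimes -> ISO 8601 strings."""
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    if isinstance(obj, date):  # datetime is a subclass of date, so this covers both
        return obj.isoformat()
    # Same behavior as json.JSONEncoder.default for anything we don't handle.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

doc = {"starts": datetime(2020, 1, 2, 3, 4, 5),
       "tags": {"mongo"},
       "nested": [{"on": date(2020, 1, 2)}]}
encoded = json.dumps(doc, default=to_jsonable)
print(encoded)
# {"starts": "2020-01-02T03:04:05", "tags": ["mongo"], "nested": [{"on": "2020-01-02"}]}
```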


Answer 5

If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger’s answer can be used:

import json
import collections.abc

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.abc.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        # collections.Set was removed in Python 3.10; the ABC lives in collections.abc
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

Answer 6

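If you just need a quick dump and don’t want to implement a custom encoder, the third-party simplejson package can do it. simplejson is a drop-in replacement for the json module and is installed separately. Its dumps accepts an iterable_as_array flag; the standard library's json.dumps does not:

import simplejson as json

json_string = json.dumps(data, iterable_as_array=True)

This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the JSON back. If you want to preserve the types, you need to write a custom encoder.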
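With only the standard library json module, you can roughly approximate this behavior by passing `list` itself as the `default` hook. This is a sketch and may differ from simplejson's flag in edge cases. json.dumps only calls the hook for objects it cannot already encode, so any such iterable (a set, a generator, and so on) becomes an array. A non-iterable object makes list() raise TypeError.

```python
import json

data = {"ids": {7}, "squares": (n * n for n in range(3))}

# json.dumps calls `default` only for objects it cannot encode itself; list()
# turns any such iterable (set, generator, ...) into a JSON array.
json_string = json.dumps(data, default=list)
print(json_string)  # {"ids": [7], "squares": [0, 1, 4]}
```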


Answer 7

One shortcoming of the accepted solution is that its output is very Python-specific: a human cannot read its raw JSON output, and another language (e.g. JavaScript) cannot load it. Example:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

I propose a solution that downgrades the set to a dict containing a list on the way out. The matching decoder hook turns it back into a set when it is loaded into Python. This keeps the output human-readable and language-agnostic:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j, object_hook=as_python_object)  # the hook turns {"__set__": [...]} back into a set
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {4, 5, 6}]

Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.
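Here is a small sketch of that collision. It uses compact, hypothetical `encode_sets`/`decode_sets` hooks that follow the same `__set__` convention. An ordinary dict that happens to use the reserved key is decoded as a set.

```python
import json

def encode_sets(obj):
    """default hook: write sets as {"__set__": [...]}."""
    if isinstance(obj, (set, frozenset)):
        return {"__set__": list(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_sets(dct):
    """object_hook: any JSON object with a "__set__" key becomes a set."""
    if "__set__" in dct:
        return set(dct["__set__"])
    return dct

genuine = {"__set__": [1, 2]}  # ordinary data that happens to use the reserved key
roundtrip = json.loads(json.dumps(genuine, default=encode_sets), object_hook=decode_sets)
print(roundtrip)  # {1, 2}: the plain dict came back as a set
```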


How to convert a set to a list in Python?

Question: How to convert a set to a list in Python?

I am trying to convert a set to a list in Python 2.6. I’m using this syntax:

first_list = [1,2,3,4]
my_set=set(first_list)
my_list = list(my_set)

However, I get the following stack trace:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: 'set' object is not callable

How can I fix this?


Answer 0

If my_set was created as ([1,2,3,4]), it is already a list:

>>> type(my_set)
<type 'list'>

Do you want something like this?

>>> my_set = set([1,2,3,4])
>>> my_list = list(my_set)
>>> print(my_list)
[1, 2, 3, 4]

EDIT: output of the code from your last comment:

>>> my_list = [1,2,3,4]
>>> my_set = set(my_list)
>>> my_new_list = list(my_set)
>>> print my_new_list
[1, 2, 3, 4]

I’m wondering if you did something like this:

>>> set=set()
>>> set([1,2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not callable

Answer 1

Instead of:

first_list = [1,2,3,4]
my_set=set(first_list)
my_list = list(my_set)

Why not shortcut the process:

my_list = list(set([1,2,3,4]))

This removes the duplicates from your list and gives you a list back.
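Going through a set does not preserve the original order of the list. If order matters, one common alternative is `dict.fromkeys`, which assumes Python 3.7+, where dicts keep insertion order:

```python
items = [3, 1, 3, 2, 1]

unique_any_order = list(set(items))           # duplicates gone, order not guaranteed
unique_in_order = list(dict.fromkeys(items))  # first occurrences, in original order

print(unique_in_order)  # [3, 1, 2]
```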


Answer 2

[EDITED] It seems you redefined list earlier, using it as a variable name, like this:

list = set([1,2,3,4]) # oops
#...
first_list = [1,2,3,4]
my_set=set(first_list)
my_list = list(my_set)

And you’ll get:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: 'set' object is not callable
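If that is what happened, deleting the shadowing name restores the built-in. Here is a minimal sketch for Python 3, run at module level. In Python 2 the module is called `__builtin__` instead of `builtins`. Renaming the variable so it no longer clashes with `list` is the cleaner long-term fix.

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__

list = set([1, 2, 3, 4])  # oops: the name `list` now refers to a set instance

shadow_error = None
try:
    list(set([5]))  # calls the set instance, not the list type
except TypeError as exc:
    shadow_error = str(exc)  # "'set' object is not callable"

del list  # drop the shadowing global; lookup falls back to builtins
assert list is builtins.list

my_list = list(set([1, 2, 3, 4]))
print(shadow_error, my_list)
```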

Answer 3

Whenever you’re stuck on this kind of problem, first check the type of the object you want to convert:

type(my_set)

Then use:

list(my_set)

to convert it to a list. You can now use the new list like any normal list in Python.


Answer 4

Simply type:

list(my_set)

This turns a set such as {'1', '2'} into a list such as ['1', '2'] (the element order follows the set's iteration order, which is arbitrary).


Answer 5

Review your first line. Your stack trace is clearly not from the code you’ve pasted here, so I don’t know precisely what you’ve done.

>>> my_set=([1,2,3,4])
>>> my_set
[1, 2, 3, 4]
>>> type(my_set)
<type 'list'>
>>> list(my_set)
[1, 2, 3, 4]
>>> type(_)
<type 'list'>

What you wanted was set([1, 2, 3, 4]).

>>> my_set = set([1, 2, 3, 4])
>>> my_set
set([1, 2, 3, 4])
>>> type(my_set)
<type 'set'>
>>> list(my_set)
[1, 2, 3, 4]
>>> type(_)
<type 'list'>

The “not callable” exception means you were doing something like set()() – attempting to call a set instance.


Answer 6

I don’t think ([1, 2]) creates a set; it creates a list. To create a set, you should use set([1, 2]).

These parentheses just enclose your expression, as if you had written:

if (condition1
    and condition2 == 3):
    print(something)

They aren’t ignored, but they do nothing to your expression.

Note: (something, something_else) will create a tuple, which is neither a list nor a set.
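Here is a short sketch of what the brackets actually produce. It uses Python 3; set literals like `{1, 2}` need Python 2.7 or later.

```python
a = ([1, 2])     # parentheses only group: this is a list
b = (1, 2)       # the comma makes a tuple
c = (1)          # just the int 1
d = (1,)         # a one-element tuple, again thanks to the comma
e = {1, 2}       # set literal (Python 2.7+ and 3.x)
f = set([1, 2])  # set built from a list

for value in (a, b, c, d, e, f):
    print(type(value).__name__, value)
```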


Answer 7

Python is a dynamically typed language, which means that you cannot define the type of the variable as you do in C or C++:

type variable = value

or

type variable(value)

In Python, you convert between types by calling the target type’s constructor (such as set() or list()); a variable simply refers to whatever object you assign to it:

my_set = set([1,2,3])
type(my_set)

will give you <type 'set'> for an answer (<class 'set'> in Python 3).

If you have a list, do this:

my_list = [1,2,3]
my_set = set(my_list)

Answer 8

Hmm, I bet that somewhere in the previous lines you have something like:

list = set(something)

Am I wrong?


How to “perfectly” override a dict?

Question: How to “perfectly” override a dict?

How can I make as “perfect” a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

It would seem that there should be some tiny set of primitives I can override to make this work, but according to all my research and attempts it seems like this isn’t the case:

  • If I override __getitem__/__setitem__, then get/set don’t work. How can I make them work? Surely I don’t need to implement them individually?

  • Am I preventing pickling from working, and do I need to implement __setstate__ etc?

  • Do I need repr, update and __init__?

  • Should I just use MutableMapping (it seems one shouldn’t use UserDict or DictMixin)? If so, how? The docs aren’t exactly enlightening.

Here is my first go at it, get() doesn’t work and no doubt there are many other minor problems:

class arbitrary_dict(dict):
    """A dictionary that applies an arbitrary key-altering function
       before accessing the keys."""

    def __keytransform__(self, key):
        return key

    # Overridden methods. List from 
    # https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    # Note: I'm using dict directly, since super(dict, self) doesn't work.
    # I'm not sure why, perhaps dict is not a new-style class.

    def __getitem__(self, key):
        return dict.__getitem__(self, self.__keytransform__(key))

    def __setitem__(self, key, value):
        return dict.__setitem__(self, self.__keytransform__(key), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, self.__keytransform__(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.__keytransform__(key))


class lcdict(arbitrary_dict):
    def __keytransform__(self, key):
        return str(key).lower()

Answer 0

You can write an object that behaves like a dict quite easily with ABCs (Abstract Base Classes) from the collections.abc module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.

from collections.abc import MutableMapping


class TransformedDict(MutableMapping):
    """A dictionary that applies an arbitrary key-altering
       function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self._keytransform(key)]

    def __setitem__(self, key, value):
        self.store[self._keytransform(key)] = value

    def __delitem__(self, key):
        del self.store[self._keytransform(key)]

    def __iter__(self):
        return iter(self.store)
    
    def __len__(self):
        return len(self.store)

    def _keytransform(self, key):
        return key

You get a few free methods from the ABC:

class MyTransformedDict(TransformedDict):

    def _keytransform(self, key):
        return key.lower()


s = MyTransformedDict([('Test', 'test')])

assert s.get('TEST') is s['test']   # free get
assert 'TeSt' in s                  # free __contains__
                                    # free setdefault, __eq__, and so on

import pickle
# works too since we just use a normal dict
assert pickle.loads(pickle.dumps(s)) == s

I wouldn’t subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.
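To see the ABC tell you that you missed a method, here is a sketch of a deliberately incomplete MutableMapping subclass; the class name `Incomplete` is hypothetical. Instantiating it raises TypeError naming the missing abstract method. The exact wording of that message differs between Python versions, so check only that the method name appears.

```python
from collections.abc import MutableMapping

class Incomplete(MutableMapping):
    """Deliberately leaves out __len__ to trigger the ABC's check."""

    def __init__(self):
        self.store = {}

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

error_message = None
try:
    Incomplete()
except TypeError as exc:
    error_message = str(exc)  # mentions the missing abstract method, __len__
print(error_message)
```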


Answer 1

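How can I make as “perfect” a subclass of dict as possible?

The end goal is to have a simple dict in which the keys are lowercase.

  • If I override __getitem__/__setitem__, then get/set don’t work. How do I make them work? Surely I don’t need to implement them individually?

  • Am I preventing pickling from working, and do I need to implement __setstate__ etc.?

  • Do I need repr, update and __init__?

  • Should I just use MutableMapping (it seems one shouldn’t use UserDict or DictMixin)? If so, how? The docs aren’t exactly enlightening.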

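The accepted answer would be my first approach. It has some issues, though, and no one has addressed the alternative of actually subclassing a dict, so I’m going to do that here.

What’s wrong with the accepted answer?

This seems like a rather simple request to me:

How can I make as “perfect” a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

The accepted answer doesn’t actually subclass dict, and a test for this fails:

>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False

Ideally, any type-checking code would test for the interface we expect, or for an abstract base class. But if our data objects are passed into functions that test for dict, and we can’t “fix” those functions, this code will fail.

Other quibbles one might make:

  • The accepted answer is also missing the classmethod fromkeys.
  • The accepted answer also has a redundant __dict__, and therefore takes up more memory:

    >>> s.foo = 'bar'
    >>> s.__dict__
    {'foo': 'bar', 'store': {'test': 'test'}}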

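Actually subclassing dict

We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

If I override __getitem__/__setitem__, then get/set don’t work. How do I make them work? Surely I don’t need to implement them individually?

Well, implementing each of them individually is the downside of this approach and the upside of using MutableMapping (see the accepted answer), but it’s really not that much more work.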
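Each method needs its own override because dict's C implementation does not route its other methods through an overridden `__setitem__`. Here is a sketch in Python 3 syntax that overrides only `__setitem__` and shows which paths bypass it; the class `OnlySetitem` is hypothetical.

```python
class OnlySetitem(dict):
    """Overrides only __setitem__, lowercasing keys."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

d = OnlySetitem()
d['A'] = 1                # uses the override: stored under 'a'
d.update(B=2)             # dict.update bypasses it: stored under 'B'
d.setdefault('C', 3)      # so does setdefault: stored under 'C'
built = OnlySetitem(D=4)  # and so does the constructor: stored under 'D'

print(sorted(d))    # ['B', 'C', 'a']
print(list(built))  # ['D']
```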

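First, let’s factor out the differences between Python 2 and Python 3. We also create a singleton (_RaiseKeyError) so we can tell whether an argument was actually passed to dict.pop, and a function that ensures our string keys are lowercase: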

from itertools import chain
try:              # Python 2
    str_base = basestring
    items = 'iteritems'
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = 'items'

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str

Now we implement the class. I’m using super with the full arguments so that this code works in both Python 2 and 3:

class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())

我们使用的样板化的做法对任何方法或特殊方法引用的关键,但在其他方面,通过继承,我们获得方法:lenclearitemskeyspopitem,和values是免费的。尽管这需要一些仔细的思考才能正确解决,但看到它可行却是微不足道的。

(请注意,haskey在Python 2 中已弃用,在Python 3中已删除。)

这是一些用法:

>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
{'foo': None}
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
{'bar': None, 'foo': None}
>>> ld.popitem()
('bar', None)

我是否在阻止酸洗,我需要实施 __setstate__等吗?

酸洗

dict子类的泡菜就可以了:

>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
{'foo': None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>

__repr__

我需要repr,update和__init__吗?

我们定义了update__init__,但是__repr__默认情况下您会很漂亮:

>>> ld # without __repr__ defined for the class, we get this
{'foo': None}

但是,最好编写一个,__repr__以提高代码的可调试性。理想的测试是eval(repr(obj)) == obj。如果您的代码很简单,我强烈建议您:

>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True

您会看到,这正是我们重新创建等效对象所需要的-这可能会出现在我们的日志或回溯中:

>>> ld
LowerDict({'a': 1, 'c': 3, 'b': 2})

结论

我应该只使用mutablemapping(似乎不应该使用UserDictDictMixin)吗?如果是这样,怎么办?这些文档并不完全具有启发性。

是的,这些是更多几行代码,但是它们旨在变得更全面。我的第一个倾向是使用公认的答案,如果有问题,我将看一下我的答案-因为它有点复杂,而且没有ABC可以帮助我正确设置界面。

过早的优化将使搜索性能变得更加复杂。 MutableMapping更简单-在其他所有条件相同的情况下,它可以立即获得优势。不过,要列出所有差异,让我们进行比较和对比。

我应该补充一点,是有人试图将类似的字典放入collections模块中,但是被拒绝了。您可能应该这样做:

my_dict[transform(key)]

它应该更容易调试。

比较和对比

MutableMapping(缺少fromkeys)实现的6个接口函数和带有dict子类的11 个接口函数。我并不需要实现__iter__或者__len__,而是我要实现getsetdefaultpopupdatecopy__contains__,和fromkeys-但这些都是相当琐碎,因为我可以使用继承大多数这些实现的。

MutableMapping实现在Python中dict实现了一些用C 实现的东西-因此,我希望dict在某些情况下子类的性能更高。

我们__eq__在两种方法上都获得了自由-只有当另一个dict都为小写时,这两种方法才假定相等-但是,我再次认为,dict子类的比较会更快。

摘要:

  • 子类化MutableMapping更简单,发生错误的机会更少,但更慢,占用更多内存(请参阅冗余字典),并且失败isinstance(x, dict)
  • 子类化dict更快,使用更少的内存并通过isinstance(x, dict),但是实现起来却更加复杂。

哪个更完美?那取决于您对完美的定义。

How can I make as “perfect” a subclass of dict as possible?

The end goal is to have a simple dict in which the keys are lowercase.

  • If I override __getitem__/__setitem__, then get/set don’t work. How do I make them work? Surely I don’t need to implement them individually?

  • Am I preventing pickling from working, and do I need to implement __setstate__ etc?

  • Do I need repr, update and __init__?

  • Should I just use mutablemapping (it seems one shouldn’t use UserDict or DictMixin)? If so, how? The docs aren’t exactly enlightening.

The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I’m going to do that here.

What’s wrong with the accepted answer?

This seems like a rather simple request to me:

How can I make as “perfect” a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

The accepted answer doesn’t actually subclass dict, and a test for this fails:

>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False

Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict – and we can’t “fix” those functions, this code will fail.

Other quibbles one might make:

  • The accepted answer is also missing the classmethod: fromkeys.
  • The accepted answer also has a redundant __dict__ – therefore taking up more space in memory:

    >>> s.foo = 'bar'
    >>> s.__dict__
    {'foo': 'bar', 'store': {'test': 'test'}}
    

Actually subclassing dict

We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

If I override __getitem__/__setitem__, then get/set don’t work. How do I make them work? Surely I don’t need to implement them individually?

Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it’s really not that much more work.
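Here is a quick way to see the problem from the question. The hypothetical OnlyItemHooks class below (not from the answer) overrides only __getitem__ and __setitem__. On CPython, dict's built-in get, update and constructor work on the underlying hash table directly rather than calling those hooks, so keys bypass the lowercasing. This is a Python 3 sketch:

```python
class OnlyItemHooks(dict):
    """Hypothetical subclass that lowercases keys in __getitem__/__setitem__ only."""

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d = OnlyItemHooks()
d['FOO'] = 1               # our __setitem__ runs: stored as 'foo'
print(d['Foo'])            # 1, via our __getitem__
print(d.get('Foo'))        # None: dict.get does not call __getitem__

e = OnlyItemHooks(BAR=2)   # dict.__init__ does not call __setitem__
d.update(BAZ=3)            # neither does dict.update
print(list(e), sorted(d))  # ['BAR'] ['BAZ', 'foo']
```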

First, let’s factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase:

from itertools import chain
try:              # Python 2
    str_base = basestring
    items = 'iteritems'
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = 'items'

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str

Now we implement – I’m using super with the full arguments so that this code works for Python 2 and 3:

class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())

We use an almost boiler-plate approach for any method or special method that references a key, but otherwise, by inheritance, we get methods: len, clear, items, keys, popitem, and values for free. While this required some careful thought to get right, it is trivial to see that this works.
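On Python 3 alone, the compatibility shim can be dropped and zero-argument super() used instead. What follows is a trimmed sketch of that idea, not the full class above. It keeps only a subset of the overrides (pop, setdefault, copy, fromkeys and __repr__ are omitted), and its update uses a plain loop instead of _process_args. The point is to show that len, keys, items and == really are inherited from dict unchanged:

```python
class LowerDict(dict):
    """Trimmed Python 3-only sketch: str keys are lowercased on the way in."""
    __slots__ = ()  # no per-instance __dict__

    @staticmethod
    def _lower(key):
        return key.lower() if isinstance(key, str) else key

    def __init__(self, mapping=(), **kwargs):
        super().__init__()
        self.update(mapping, **kwargs)

    def update(self, mapping=(), **kwargs):
        if hasattr(mapping, 'items'):
            mapping = mapping.items()
        for k, v in list(mapping) + list(kwargs.items()):
            super().__setitem__(self._lower(k), v)

    def __getitem__(self, k):
        return super().__getitem__(self._lower(k))

    def __setitem__(self, k, v):
        super().__setitem__(self._lower(k), v)

    def __delitem__(self, k):
        super().__delitem__(self._lower(k))

    def __contains__(self, k):
        return super().__contains__(self._lower(k))

    def get(self, k, default=None):
        return super().get(self._lower(k), default)


ld = LowerDict({'Foo': 1}, BAR=2)
# len, keys, items and == are inherited from dict untouched
print(len(ld), sorted(ld.keys()), sorted(ld.items()))
print(ld == {'foo': 1, 'bar': 2}, ld == {'Foo': 1, 'BAR': 2})  # True False
```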

(Note that has_key was deprecated in Python 2 and removed in Python 3.)

Here’s some usage:

>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
{'foo': None}
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
{'bar': None, 'foo': None}
>>> ld.popitem()
('bar', None)

Am I preventing pickling from working, and do I need to implement __setstate__ etc?

Pickling

And the dict subclass pickles just fine:

>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
{'foo': None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>
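The same result holds in a minimal case. The hypothetical LowerKeys class below (Python 3) lowercases only in __getitem__. The round trip restores the same subclass, and because of __slots__ = (), instances carry no __dict__ at all:

```python
import pickle


class LowerKeys(dict):
    """Hypothetical minimal dict subclass; __slots__ = () means no instance __dict__."""
    __slots__ = ()

    def __getitem__(self, k):
        return super().__getitem__(k.lower() if isinstance(k, str) else k)


d = LowerKeys(foo=None)
restored = pickle.loads(pickle.dumps(d))
print(type(restored).__name__, restored, restored['FOO'])  # LowerKeys {'foo': None} None
```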

__repr__

Do I need repr, update and __init__?

We defined update and __init__, but you have a beautiful __repr__ by default:

>>> ld # without __repr__ defined for the class, we get this
{'foo': None}

However, it’s good to write a __repr__ to improve the debuggability of your code. The ideal test is eval(repr(obj)) == obj. If it’s easy to do for your code, I strongly recommend it:

>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True

You see, it’s exactly what we need to recreate an equivalent object – this is something that might show up in our logs or in backtraces:

>>> ld
LowerDict({'a': 1, 'c': 3, 'b': 2})

Conclusion

Should I just use mutablemapping (it seems one shouldn’t use UserDict or DictMixin)? If so, how? The docs aren’t exactly enlightening.

Yeah, these are a few more lines of code, but they’re intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I’d then look at my answer – as it’s a little more complicated, and there’s no ABC to help me get my interface right.

Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler – so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let’s compare and contrast.

I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead:

my_dict[transform(key)]

It should be far easier to debug.
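Here is a minimal sketch of that explicit style, using a plain dict. The name transform comes from the snippet above, but its body (lowercase str keys, leave other keys alone) is my assumption:

```python
def transform(key):
    """Hypothetical key normaliser: lowercase str keys, leave other keys untouched."""
    return key.lower() if isinstance(key, str) else key


my_dict = {}
my_dict[transform('Hello')] = 1
my_dict[transform(42)] = 'answer'
print(my_dict[transform('HELLO')])  # 1
print(my_dict)                      # {'hello': 1, 42: 'answer'}
```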

Compare and contrast

There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don’t need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys – but these are fairly trivial, since I can use inheritance for most of those implementations.

The MutableMapping implements some things in Python that dict implements in C – so I would expect a dict subclass to be more performant in some cases.

We get a free __eq__ in both approaches – both of which assume equality only if another dict is all lowercase – but again, I think the dict subclass will compare more quickly.

Summary:

  • subclassing MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see the redundant __dict__ above), and fails isinstance(x, dict)
  • subclassing dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.
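For comparison, here is a sketch of the MutableMapping route (Python 3, collections.abc). The class name and the _key helper are hypothetical. The backing dict is called store to match the s.__dict__ output quoted earlier. Only the five abstract methods are written by hand; get, __contains__ and update come from the ABC mixins. The result is not a dict:

```python
from collections.abc import MutableMapping


class LowerMapping(MutableMapping):
    """Hypothetical MutableMapping-based mapping with lowercased str keys."""

    def __init__(self, *args, **kwargs):
        self.store = {}                # lives in the instance __dict__
        self.update(*args, **kwargs)   # mixin update() routes through __setitem__

    @staticmethod
    def _key(key):
        return key.lower() if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.store[self._key(key)]

    def __setitem__(self, key, value):
        self.store[self._key(key)] = value

    def __delitem__(self, key):
        del self.store[self._key(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


m = LowerMapping({'Foo': 1}, BAR=2)
print(m.get('FOO'), 'bar' in m)       # 1 True (both provided by the ABC)
print(isinstance(m, dict))            # False
print(isinstance(m, MutableMapping))  # True
```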

Which is more perfect? That depends on your definition of perfect.


Answer 2

My requirements were a bit stricter:

  • I had to retain case info (the strings are paths to files displayed to the user, but it’s a Windows app, so internally all operations must be case-insensitive)
  • I needed keys to be as small as possible (it did make a difference in memory performance, chopping 110 MB off 370 MB). This meant that caching a lowercase version of the keys was not an option.
  • I needed creation of the data structures to be as fast as possible (again this made a difference in performance, speed this time). I had to go with a builtin.

My initial thought was to replace our clunky Path class with a case-insensitive unicode subclass, but:

  • it proved hard to get that right – see: A case insensitive string class in python
  • it turns out that explicit dict key handling makes code verbose, messy and error-prone (structures are passed hither and thither, and it is not clear whether they have CIstr instances as keys/elements; it is easy to forget, plus some_dict[CIstr(path)] is ugly)

So I finally had to write that case-insensitive dict. Thanks to code by @AaronHall, that was made 10 times easier.

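# Python 2 code (relies on unicode, basestring and dict.iteritems)
from itertools import chain  # used by LowerDict._process_args below

class CIstr(unicode):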
    """See https://stackoverflow.com/a/43122305/281545, especially for inlines"""
    __slots__ = () # does make a difference in memory performance

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented
    def __lt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other):
        if isinstance(other, CIstr):
            return self.lower() >= other.lower()
        return NotImplemented
    def __gt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other):
        if isinstance(other, CIstr):
            return self.lower() <= other.lower()
        return NotImplemented
    #--repr
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(CIstr, self).__repr__())

def _ci_str(maybe_str):
    """dict keys can be any hashable object - only call CIstr if str"""
    return CIstr(maybe_str) if isinstance(maybe_str, basestring) else maybe_str

class LowerDict(dict):
    """Dictionary that transforms its keys to CIstr instances.
    Adapted from: https://stackoverflow.com/a/39375731/281545
    """
    __slots__ = () # no __dict__ - that would be redundant

    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'iteritems'):
            mapping = getattr(mapping, 'iteritems')()
        return ((_ci_str(k), v) for k, v in
                chain(mapping, getattr(kwargs, 'iteritems')()))
    def __init__(self, mapping=(), **kwargs):
        # dicts take a mapping or iterable as their optional first argument
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(_ci_str(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(_ci_str(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(_ci_str(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    def get(self, k, default=None):
        return super(LowerDict, self).get(_ci_str(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(_ci_str(k), default)
    __no_default = object()
    def pop(self, k, v=__no_default):
        if v is LowerDict.__no_default:
            # super will raise KeyError if no default and key does not exist
            return super(LowerDict, self).pop(_ci_str(k))
        return super(LowerDict, self).pop(_ci_str(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(_ci_str(k))
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((_ci_str(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(LowerDict, self).__repr__())
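The code above is Python 2 (unicode, basestring, iteritems). Below is a rough Python 3 sketch of just the CIstr key type. It is my own port, not tested by the author: it subclasses str instead, and it omits the ordering methods for brevity. The path strings are made-up examples. Such a key hashes and compares case-insensitively but keeps its original spelling for display:

```python
class CIstr(str):
    """Python 3 sketch: case-insensitive hash/equality, original casing preserved."""
    __slots__ = ()  # empty __slots__ is allowed on str subclasses

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented


paths = {CIstr('Docs/Report.TXT'): 'report'}
print(paths[CIstr('docs/report.txt')])  # report
print(list(paths))                      # ['Docs/Report.TXT']: display casing kept
```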

Implicit vs. explicit is still a problem. Once the dust settles, though, I think renaming attributes/variables to start with ci (plus a big fat doc comment explaining that ci stands for case-insensitive) is a perfect solution, because readers of the code must be fully aware that they are dealing with case-insensitive underlying data structures. This will hopefully fix some hard-to-reproduce bugs, which I suspect boil down to case sensitivity.

Comments/corrections welcome :)


Answer 3

All you will have to do is

class BatchCollection(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)  # self must be passed explicitly

OR

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

A sample from my own code:

### EXAMPLE
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        # only accept (database_name, table_name) keys with iterable values
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, Iterable)):
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

Note: tested only in Python 3.
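One caveat with validating in __setitem__: as with any dict subclass, dict's own constructor and update() do not call the overridden __setitem__, so invalid entries can still get in that way. The trimmed Python 3 sketch below shows both paths. It is based on the class above but raises ValueError rather than a bare Exception, which is my choice:

```python
from collections.abc import Iterable


class BatchCollection(dict):
    """Trimmed sketch: validates keys/values in __setitem__ only."""

    def __setitem__(self, key, item):
        if isinstance(key, tuple) and len(key) == 2 and isinstance(item, Iterable):
            super().__setitem__(key, item)
        else:
            raise ValueError("key must be (database_name, table_name), value iterable")


batches = BatchCollection()
batches[('db', 'users')] = [1, 2, 3]   # accepted by __setitem__

rejected = False
try:
    batches['not-a-tuple'] = [1]       # rejected by __setitem__
except ValueError:
    rejected = True

# dict.__init__ and dict.update skip __setitem__, so these get through unchecked
sneaky = BatchCollection({'bad': 1})
batches.update(other=42)
print(rejected, sneaky, batches)
```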


Answer 4

After trying out both of the top two suggestions, I’ve settled on a shady-looking middle route for Python 2.7. Maybe Python 3 is saner, but for me:

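from collections import MutableMapping  # Python 2.7; use collections.abc on Python 3

class MyDict(MutableMapping):
    # ... the few __methods__ that MutableMapping requires
    # (__getitem__, __setitem__, __delitem__, __iter__, __len__)
    # and then this monstrosity
    @property
    def __class__(self):
        return dict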

which I really hate, but it seems to fit my needs, which are:

  • can override **my_dict
    • if you inherit from dict, this bypasses your code. Try it out.
    • this makes #2 unacceptable for me at all times, as this is quite common in Python code
  • masquerades as isinstance(my_dict, dict)
    • rules out MutableMapping alone, so #1 is not enough
    • I heartily recommend #1 if you don’t need this; it’s simple and predictable
  • fully controllable behavior
    • so I cannot inherit from dict
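Here is a runnable Python 3 sketch of the trick described above. The class name and the lowercasing behaviour are hypothetical. The sketch relies on isinstance() falling back to the object's __class__ attribute when the real type does not match, and on ** unpacking of a non-dict mapping going through keys() and __getitem__:

```python
from collections.abc import MutableMapping


class LowerMasquerade(MutableMapping):
    """Hypothetical MutableMapping that lowercases str keys and reports dict as __class__."""

    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(*args, **kwargs)

    def _key(self, key):
        return key.lower() if isinstance(key, str) else key

    def __getitem__(self, key):
        return self._store[self._key(key)]

    def __setitem__(self, key, value):
        self._store[self._key(key)] = value

    def __delitem__(self, key):
        del self._store[self._key(key)]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    @property
    def __class__(self):
        # isinstance() consults obj.__class__ when the actual type check fails
        return dict


m = LowerMasquerade(Foo=1)
print(isinstance(m, dict), type(m).__name__)  # True LowerMasquerade
print(dict(**m))                              # {'foo': 1}: ** used keys()/__getitem__
```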

If you need to tell yourself apart from others, personally I use something like this (though I’d recommend better names):

def __am_i_me(self):
    return True

@classmethod
def __is_it_me(cls, other):
    try:
        return other.__am_i_me()
    except Exception:
        return False

As long as you only need to recognize yourself internally, this makes it harder to call __am_i_me by accident, thanks to Python’s name mangling: inside the class body the name is rewritten to _MyDict__am_i_me, so anything calling from outside this class would have to spell out that mangled name. It is slightly more private than _single_underscore methods, both in practice and culturally.
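Here is a self-contained Python 3 sketch of that name-mangling behaviour. It uses a plain class rather than the MutableMapping subclass. The public is_mine wrapper is hypothetical, added only so the private check can be called from outside:

```python
class MyDict:
    def __am_i_me(self):              # stored on the class as _MyDict__am_i_me
        return True

    @classmethod
    def __is_it_me(cls, other):
        try:
            return other.__am_i_me()  # compiled as other._MyDict__am_i_me()
        except Exception:
            return False

    @classmethod
    def is_mine(cls, other):
        # hypothetical public wrapper around the mangled classmethod
        return cls.__is_it_me(other)


print(MyDict.is_mine(MyDict()), MyDict.is_mine({}))  # True False
print(hasattr(MyDict, '__am_i_me'), hasattr(MyDict, '_MyDict__am_i_me'))  # False True
```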

So far I have no complaints, aside from the seriously shady-looking __class__ override. I’d be thrilled to hear of any problems that others encounter with this, though; I don’t fully understand the consequences. But so far I’ve had no problems whatsoever, and this allowed me to migrate a lot of middling-quality code in lots of locations without needing any changes.


As evidence: https://repl.it/repls/TraumaticToughCockatoo

Basically: copy the current #2 option, add print 'method_name' lines to every method, and then try this and watch the output:

d = LowerDict()  # prints "init", or whatever your print statement said
print '------'
splatted = dict(**d)  # note that there are no prints here

You’ll see similar behavior for other scenarios. Say your fake-dict is a wrapper around some other datatype, so there’s no reasonable way to store the data in the backing-dict; **your_dict will be empty, regardless of what every other method does.

This works correctly for MutableMapping, but as soon as you inherit from dict it becomes uncontrollable.


Edit: as an update, this has been running without a single issue for almost two years now, on several hundred thousand (eh, might be a couple million) lines of complicated, legacy-ridden python. So I’m pretty happy with it :)

Edit 2: apparently I mis-copied this or something long ago. @classmethod __class__ does not work for isinstance checks – @property __class__ does: https://repl.it/repls/UnitedScientificSequence


Python Sets vs Lists

Question: Python Sets vs Lists

In Python, which data structure is more efficient/speedy? Assuming that order is not important to me and I would be checking for duplicates anyway, is a Python set slower than a Python list?


Answer 0

It depends on what you are intending to do with it.

Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.

You can use the timeit module to see which is faster for your situation.
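Here is a minimal timeit sketch for the membership case. The values n=10_000 and number=1_000 are arbitrary. The looked-up value -1 is deliberately absent, so the list has to scan every element:

```python
from timeit import timeit


def compare_membership(n=10_000, number=1_000):
    """Time `-1 in container` for a list and a set holding the same n ints."""
    setup = f"data = list(range({n})); as_set = set(data)"
    list_time = timeit("-1 in data", setup=setup, number=number)
    set_time = timeit("-1 in as_set", setup=setup, number=number)
    return list_time, set_time


list_time, set_time = compare_membership()
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```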


Answer 1

Lists are slightly faster than sets when you just want to iterate over the values.

Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

It turns out tuples perform in almost exactly the same way as lists, except for their immutability.
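The question mentions checking for duplicates anyway, so here is a sketch of that task using a set to hold the items seen so far. Each membership check is O(1) on average, where a list would make it O(n). The helper name first_duplicate is hypothetical:

```python
def first_duplicate(items):
    """Return the first item that appears a second time, or None if all are unique."""
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None


print(first_duplicate([3, 1, 4, 1, 5]))  # 1
print(first_duplicate('abc'))            # None
```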

Iterating

>>> def iter_test(iterable):
...     for i in iterable:
...         pass
...
>>> from timeit import timeit
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = set(range(10000))",
...     number=100000)
12.666952133178711
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = list(range(10000))",
...     number=100000)
9.917098999023438
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = tuple(range(10000))",
...     number=100000)
9.865639209747314

Determine if an object is present

>>> def in_test(iterable):
...     for i in range(1000):
...         if i in iterable:
...             pass
...
>>> from timeit import timeit
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = set(range(1000))",
...     number=10000)
0.5591847896575928
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = list(range(1000))",
...     number=10000)
50.18339991569519
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = tuple(range(1000))",
...     number=10000)
51.597304821014404

Answer 2

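List performance (caveat: on Python 3, range(10**6) is a range object, not a list, and checking whether an integer is in a range takes constant time, so use list(range(10**6)) to time an actual list):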

>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = range(10**6)', number=100000)
0.008128150348026608

Set performance:

>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=100000)
0.005674857488571661
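To check what the first benchmark actually measured on Python 3, time the same statement against the range object and against a real list. The value number=20 is arbitrary and kept small because the list case scans a million elements per run:

```python
from timeit import timeit

range_time = timeit('10**6 in a', setup='a = range(10**6)', number=20)
list_time = timeit('10**6 in a', setup='a = list(range(10**6))', number=20)
print(type(range(10**6)).__name__)  # range
print(f'range: {range_time:.6f}s  list: {list_time:.6f}s')
```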

You may want to consider tuples, as they’re similar to lists but can’t be modified. They take up slightly less memory and can be marginally faster to access. They aren’t as flexible, but they are more efficient than lists. One typical use is as dictionary keys, which lists cannot be because they are unhashable.
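Here is a short illustration of the dictionary-key point and the memory point. Object sizes are CPython-specific and vary by version and platform:

```python
import sys

coords = {(0, 0): 'origin', (1, 2): 'point'}  # tuples of hashable items are hashable
print(coords[(1, 2)])                         # point

error = None
try:
    bad = {[0, 0]: 'origin'}                  # lists are mutable, hence unhashable
except TypeError as exc:
    error = str(exc)
print(error)

# On CPython, a tuple is typically a little smaller than a list with the same items
print(sys.getsizeof((1, 2, 3)), sys.getsizeof([1, 2, 3]))
```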

Sets are also collections, but they differ from lists and tuples in two ways. First, sets are not sequences: although iterating over a set yields its elements in some order, that order is arbitrary and not under the programmer’s control, and sets cannot be indexed. Second, the elements in a set must be unique.

A set drops duplicates by definition (see the Python documentation and the Python wiki):

>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}

Answer 3

Set wins due to near-instant ‘contains’ checks (constant time on average), because it is implemented as a hash table: https://en.wikipedia.org/wiki/Hash_table

List implementation: usually an array; low level, close to the metal, and good for iteration and for random access by element index.

Set implementation: https://en.wikipedia.org/wiki/Hash_table. It does not iterate over the elements; it finds an element by computing a hash from the key, so its performance depends on the nature of the key elements and of the hash function. It is similar to what is used for dict. I suspect a list could be faster if you have very few elements (< 5); the larger the element count, the better a set performs for a contains check. Sets are also fast for adding and removing elements. Always keep in mind, though, that building a set has a cost!

NOTE: If the list is already sorted, you can search it with a binary search, which is quite fast; in the general case, though, a contains check on a set is faster and simpler.
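Here is a minimal sketch of the sorted-list case. It uses the standard bisect module for binary search. The helper name sorted_contains and the sample data are hypothetical, not taken from the answer:

```python
import bisect

def sorted_contains(sorted_list, value):
    """Binary-search membership test; sorted_list must already be sorted."""
    i = bisect.bisect_left(sorted_list, value)  # leftmost insertion point
    return i < len(sorted_list) and sorted_list[i] == value

data = list(range(0, 100, 3))  # already sorted: 0, 3, 6, ..., 99
lookup = set(data)             # building the set is a one-off O(n) cost

print(sorted_contains(data, 42), 42 in lookup)  # True True
print(sorted_contains(data, 43), 43 in lookup)  # False False
```

Binary search costs O(log n) comparisons per lookup, versus O(1) on average for the set. In exchange, it avoids building and storing a second structure.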


Answer 4

tl;dr

Data structures (DS) are important because they are used to perform operations on data which basically implies: take some input, process it, and give back the output.

Some data structures are more useful than others in some particular cases. Therefore, it is quite unfair to ask which data structure is more efficient or faster in general. It is like asking which tool is more efficient, a knife or a fork: it all depends on the situation.

Lists

A list is a mutable sequence, typically used to store collections of homogeneous items.

Sets

A set object is an unordered collection of distinct hashable objects. It is commonly used to test membership, remove duplicates from a sequence, and compute mathematical operations such as intersection, union, difference, and symmetric difference.
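This short sketch shows the "remove duplicates" use case. set() drops duplicates but does not keep the original order. dict.fromkeys() also drops duplicates, and on Python 3.7+ (where dicts preserve insertion order) it keeps first-seen order. The variable names are illustrative only:

```python
items = ["b", "a", "b", "c", "a"]

unique_any_order = set(items)                 # duplicates gone, order not guaranteed
unique_in_order = list(dict.fromkeys(items))  # duplicates gone, first-seen order (3.7+)

print(sorted(unique_any_order))  # ['a', 'b', 'c']
print(unique_in_order)           # ['b', 'a', 'c']
```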

Usage

From some of the answers, it is clear that a list is somewhat faster than a set when iterating over the values. On the other hand, a set is faster than a list when checking whether an item is contained in it. Therefore, all you can say is that a list is better than a set for some particular operations, and vice versa.


Answer 5

I was interested in the results when checking, with CPython, if a value is one of a small number of literals. set wins in Python 3 vs tuple, list and or:

from timeit import timeit

def in_test1():
  for i in range(1000):
    if i in (314, 628):
      pass

def in_test2():
  for i in range(1000):
    if i in [314, 628]:
      pass

def in_test3():
  for i in range(1000):
    if i in {314, 628}:
      pass

def in_test4():
  for i in range(1000):
    if i == 314 or i == 628:
      pass

print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))

Output:

tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469

For 3 to 5 literals, set still wins by a wide margin, and or becomes the slowest.

In Python 2, set is always the slowest. or is the fastest for 2 to 3 literals, and tuple and list are faster with 4 or more literals. I couldn’t distinguish the speed of tuple vs list.

When the values to test were cached in a global variable out of the function, rather than creating the literal within the loop, set won every time, even in Python 2.

These results apply to 64-bit CPython on a Core i7.
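The answer does not explain the Python 3 result, but here is one plausible reason. When a set literal of constants appears directly after in, CPython 3 compiles it into a single prebuilt frozenset constant, so the loop never rebuilds the set. This is a CPython implementation detail rather than a language guarantee. The sketch below shows how to check it on your own interpreter; the function name is hypothetical:

```python
def uses_set_literal(i):
    return i in {314, 628}

# On CPython 3 the literal is expected to appear among the compiled constants
# as a frozenset, rather than being built again on every call.
consts = uses_set_literal.__code__.co_consts
print([c for c in consts if isinstance(c, frozenset)])
```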


Answer 6

I would recommend a set where the use case is limited to referencing or searching for existence, and a tuple where the use case requires you to iterate. A list carries somewhat more memory overhead than a tuple, because it over-allocates space so that appends are cheap.
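One rough way to compare container overhead on CPython is sys.getsizeof. It counts only the container itself (its header and its array of references), not the objects it refers to. Exact numbers vary by version and platform, so none are quoted here:

```python
import sys

values = range(1000)
as_list = list(values)
as_tuple = tuple(values)

# Both hold references to the same int objects; only the container sizes differ.
print("list: ", sys.getsizeof(as_list))
print("tuple:", sys.getsizeof(as_tuple))
```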


Answer 7

from datetime import datetime

listA = list(range(10000000))
setA = set(listA)
tupA = tuple(listA)

def calc(data, container):
    # Time a single membership test of data in container
    start = datetime.now()
    if data in container:
        print("")
    end = datetime.now()
    print(end - start)

calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)

Output after comparing 10 iterations for all three: [Comparison chart]


Answer 8

Sets are faster for membership tests, and you also get more operations with sets. Say you have two sets:

set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}

We can easily join two sets:

set3 = set1.union(set2)

Find out what is common in both:

set3 = set1.intersection(set2)

Find out what is in set1 but not in set2:

set3 = set1.difference(set2)

And much more! Just try them out, they are fun! Moreover, if you have to work with the values that differ between two lists, or the values common to two lists, I prefer to convert the lists to sets, and many programmers do it that way. Hope it helps you :-)
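When both operands are sets, the operators |, &, - and ^ do the same as union, intersection, difference and symmetric_difference. The methods additionally accept any iterable. This sketch reuses the example sets and also shows the list-to-set conversion; the two lists are made up for illustration:

```python
set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}

# sorted() only makes the printed output deterministic.
print(sorted(set1 | set2))  # union: all six names
print(sorted(set1 & set2))  # ['Harry Potter', 'James Bond']
print(sorted(set1 - set2))  # ['Iron Man']
print(sorted(set1 ^ set2))  # ['Black Widow', 'Captain America', 'Hulk', 'Iron Man']

# Common and differing values of two lists, via sets:
list_a = [1, 2, 2, 3]
list_b = [2, 3, 4]
print(sorted(set(list_a) & set(list_b)))  # [2, 3]
print(sorted(set(list_a) ^ set(list_b)))  # [1, 4]
```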


Add list to set?

Question: Add list to set?

Tested on Python 2.6 interpreter:

>>> a=set('abcde')
>>> a
set(['a', 'c', 'b', 'e', 'd'])
>>> l=['f','g']
>>> l
['f', 'g']
>>> a.add(l)
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    a.add(l)
TypeError: list objects are unhashable

I think that I can’t add the list to the set because there’s no way Python can tell if I have added the same list twice. Is there a workaround?

EDIT: I want to add the list itself, not its elements.


Answer 0

You can’t add a list to a set because lists are mutable, meaning that you can change the contents of the list after adding it to the set.

You can however add tuples to the set, because you cannot change the contents of a tuple:

>>> a.add(('f', 'g'))
>>> print a
set(['a', 'c', 'b', 'e', 'd', ('f', 'g')])

Edit: some explanation: The documentation defines a set as an unordered collection of distinct hashable objects. The objects have to be hashable so that finding, adding and removing elements can be done faster than looking at each individual element every time you perform these operations. The specific algorithms used are explained in the Wikipedia article. Python’s hashing algorithms are explained on effbot.org, and Python’s __hash__ function is described in the Python reference.

Some facts:

  • Set elements as well as dictionary keys have to be hashable
  • Some unhashable datatypes:
    • list: use tuple instead
    • set: use frozenset instead
    • dict: has no official counterpart, but there are some recipes
  • Object instances are hashable by default with each instance having a unique hash. You can override this behavior as explained in the python reference.
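Here is a minimal Python 3 sketch of these facts. Tuples and frozensets can go into a set, but a list cannot. A user-defined class can be made hashable by defining __hash__ consistently with __eq__. The Point class is a hypothetical example, not part of the question:

```python
s = set()
s.add(("f", "g"))             # tuple: hashable as long as its items are
s.add(frozenset({"x", "y"}))  # frozenset: immutable, hashable counterpart of set

try:
    s.add(["f", "g"])         # list: mutable, so unhashable
except TypeError as exc:
    print("list rejected:", exc)

class Point:
    """Hypothetical value type. Defining __eq__ alone would make it unhashable
    in Python 3, so __hash__ is defined over the same fields."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

s.add(Point(1, 2))
s.add(Point(1, 2))  # equal to the element already present, so not added again
print(len(s))       # 3
```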

Answer 1

Use set.update() or |=

>>> a = set('abc')
>>> l = ['d', 'e']
>>> a.update(l)
>>> a
{'e', 'b', 'c', 'd', 'a'}

>>> l = ['f', 'g']
>>> a |= set(l)
>>> a
{'e', 'b', 'f', 'c', 'd', 'g', 'a'}

edit: If you want to add the list itself and not its members, then you must use a tuple, unfortunately. Set members must be hashable.


Answer 2

To add the elements of a list to a set, use update

From https://docs.python.org/2/library/sets.html

s.update(t): return set s with elements added from t

E.g.

>>> s = set([1, 2])
>>> l = [3, 4]
>>> s.update(l)
>>> s
{1, 2, 3, 4}

If you instead want to add the entire list as a single element to the set, you can’t because lists aren’t hashable. You could instead add a tuple, e.g. s.add(tuple(l)). See also TypeError: unhashable type: ‘list’ when using built-in set function for more information on that.
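One detail worth knowing in current Python 3: set.update modifies the set in place and returns None, so its result should not be assigned back. It also accepts several iterables in one call. Here is a short sketch with made-up values:

```python
s = {1, 2}
l = [3, 4]

result = s.update(l)    # modifies s in place
print(result)           # None
print(sorted(s))        # [1, 2, 3, 4]

s.update([5], (6,))     # several iterables at once
s.add(tuple(l))         # the whole list as a single, hashable element
print((3, 4) in s)      # True
```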


Answer 3

Hopefully this helps:

>>> seta = set('1234')
>>> listb = ['a','b','c']
>>> seta.union(listb)
set(['a', 'c', 'b', '1', '3', '2', '4'])
>>> seta
set(['1', '3', '2', '4'])
>>> seta = seta.union(listb)
>>> seta
set(['a', 'c', 'b', '1', '3', '2', '4'])

Answer 4

Note the method set.update(). The documentation says:

Update a set with the union of itself and others.


Answer 5

List objects are unhashable. You might want to turn them into tuples, though.


Answer 6

Sets can’t have mutable (changeable) elements/members. A list, being mutable, cannot be a member of a set.

As sets are mutable, you cannot have a set of sets! You can have a set of frozensets though.

(The same kind of “mutability requirement” applies to the keys of a dict.)

Other answers have already given you code; I hope this gives a bit of insight. I’m hoping Alex Martelli will answer with even more details.
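This small sketch illustrates the set-of-frozensets point and the matching rule for dict keys (the values are arbitrary):

```python
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2, because {1, 2} and {2, 1} are the same frozenset

try:
    nested = {{1, 2}}  # a plain set is mutable, hence unhashable
except TypeError as exc:
    print("set of sets rejected:", exc)

lookup = {frozenset({"a", "b"}): "pair"}  # the same rule applies to dict keys
print(lookup[frozenset({"b", "a"})])      # pair
```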


Answer 7

You want to add a tuple, not a list:

>>> a=set('abcde')
>>> a
set(['a', 'c', 'b', 'e', 'd'])
>>> l=['f','g']
>>> l
['f', 'g']
>>> t = tuple(l)
>>> t
('f', 'g')
>>> a.add(t)
>>> a
set(['a', 'c', 'b', 'e', 'd', ('f', 'g')])

If you have a list, you can convert it to a tuple, as shown above. A tuple is immutable, so it can be added to the set.


Answer 8

I found I needed to do something similar today. The algorithm knew when it was creating a new list that needed to be added to the set, but not when it would have finished operating on the list.

Anyway, the behaviour I wanted was for set to use id rather than hash. As such I found mydict[id(mylist)] = mylist instead of myset.add(mylist) to offer the behaviour I wanted.
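Here is a minimal sketch of that identity-keyed approach. The helper name add_by_identity is hypothetical. Note that an id() is only guaranteed unique while the object is alive. That holds here, because the dict keeps a reference to each list:

```python
def add_by_identity(store, lst):
    """Record lst keyed by its identity rather than its (unhashable) contents."""
    store[id(lst)] = lst

seen = {}
a = ["f", "g"]
b = ["f", "g"]             # equal contents, but a different object

add_by_identity(seen, a)
add_by_identity(seen, a)   # same object again: no new entry
add_by_identity(seen, b)
print(len(seen))           # 2

a.append("h")              # mutating a list does not disturb identity lookup
print(id(a) in seen)       # True
```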


Answer 9

You’ll want to use tuples, which are hashable (you can’t hash a mutable object like a list).

>>> a = set("abcde")
>>> a
set(['a', 'c', 'b', 'e', 'd'])
>>> t = ('f', 'g')
>>> a.add(t)
>>> a
set(['a', 'c', 'b', 'e', 'd', ('f', 'g')])

Answer 10

Here is how I usually do it:

def add_list_to_set(my_list, my_set):
    # Add every element of my_list to my_set (same effect as my_set.update(my_list))
    for each in my_list:
        my_set.add(each)
    return my_set

Answer 11

This should do it, where L is a list of lists:

set(tuple(i) for i in L)
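For example, take a made-up list of lists L. Converting each inner list to a tuple lets the set drop the duplicate list:

```python
L = [["f", "g"], ["a", "b"], ["f", "g"]]

unique = set(tuple(i) for i in L)
print(sorted(unique))  # [('a', 'b'), ('f', 'g')]

# Convert back to lists if needed; the original order is not preserved.
as_lists = [list(t) for t in unique]
```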

Best way to find the intersection of multiple sets?

Question: Best way to find the intersection of multiple sets?

I have a list of sets:

setlist = [s1,s2,s3...]

I want s1 ∩ s2 ∩ s3 …

I can write a function to do it by performing a series of pairwise s1.intersection(s2), etc.

Is there a recommended, better, or built-in way?


Answer 0

From Python version 2.6 on you can use multiple arguments to set.intersection(), like

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *a_list is argument unpacking (list expansion)

Note that set.intersection is not a static method, but this uses the functional notation to apply intersection of the first set with the rest of the list. So if the argument list is empty this will fail.
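If the list might be empty, one way to avoid the failure is to check first. This sketch returns an empty set for empty input. That is an assumed convention: mathematically, the intersection of zero sets is not defined without a universe. The function name intersect_all is hypothetical:

```python
def intersect_all(setlist):
    """Intersection of every set in setlist; empty input gives an empty set."""
    if not setlist:
        return set()
    # Calling intersection on the first set lets the remaining items be any
    # iterables, and it returns a new set (the inputs are not modified).
    return setlist[0].intersection(*setlist[1:])

print(intersect_all([{1, 2, 3}, {2, 3, 4}, [3, 2, 6]]))  # {2, 3}
print(intersect_all([]))                                 # set()
```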


Answer 1

As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])

Answer 2

Clearly set.intersection is what you want here, but in case you ever need a generalisation of “take the sum of all these”, “take the product of all these”, “take the xor of all these”, what you are looking for is the reduce function:

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

or

print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

Answer 3

If you don’t have Python 2.6 or higher, the alternative is to write an explicit for loop:

def set_list_intersection(set_list):
  if not set_list:
    return set()
  result = set(set_list[0])  # copy, so &= below does not modify the caller's first set
  for s in set_list[1:]:
    result &= s
  return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print set_list_intersection(set_list)
# Output: set([1])

You can also use reduce:

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print reduce(lambda s1, s2: s1 & s2, set_list)
# Output: set([1])

However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I’ve always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what’s actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it’s better to write out the accumulation loop explicitly.


Answer 4

Here I’m offering a generic function for multiple set intersection trying to take advantage of the best method available:

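try:
    from functools import reduce  # Python 2.6+ and 3.x
except ImportError:
    pass  # older Python 2: reduce is a builtin

def multiple_set_intersection(*sets):
    """Return multiple set intersection."""
    try:
        return set.intersection(*sets)
    except TypeError: # this is Python < 2.6 or no arguments
        pass

    try: a_set= sets[0]
    except IndexError: # no arguments
        return set() # return empty set

    # Fold pairwise from a copy of the first set; the initializer makes this work
    # for a single set too, and the caller's first set is never modified.
    return reduce(set.intersection, sets[1:], set(a_set))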

Guido might dislike reduce, but I’m kind of fond of it :)


Answer 5

Jean-François Fabre’s set.intersection(*list_of_sets) answer is definitely the most Pythonic and is rightly the accepted answer.

For those who want to use reduce (in Python 3, import it first with from functools import reduce), the following will also work:

reduce(set.intersection, list_of_sets)


In Python, when to use a Dictionary, List or Set?

Question: In Python, when to use a Dictionary, List or Set?

When should I use a dictionary, list or set?

Are there scenarios that are more suited for each data type?


Answer 0

A list keeps order, dict and set don’t: when you care about order, therefore, you must use list (if your choice of containers is limited to these three, of course;-). Note that since Python 3.7, dict does preserve insertion order; set still does not.

dict associates with each key a value, while list and set just contain values: very different use cases, obviously.

set requires items to be hashable, list doesn’t: if you have non-hashable items, therefore, you cannot use set and must instead use list.

set forbids duplicates, list does not: also a crucial distinction. (A “multiset”, which maps duplicates into a different count for items present more than once, can be found in collections.Counter — you could build one as a dict, if for some weird reason you couldn’t import collections, or, in pre-2.7 Python as a collections.defaultdict(int), using the items as keys and the associated value as the count).
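This small sketch shows the multiset options mentioned above, using made-up data: collections.Counter, and the pre-2.7 collections.defaultdict(int) equivalent:

```python
from collections import Counter, defaultdict

words = ["a", "b", "a", "c", "a", "b"]

counts = Counter(words)          # multiset: item -> number of occurrences
print(counts["a"], counts["z"])  # 3 0 (missing items count as zero)

manual = defaultdict(int)        # pre-2.7 style: build the counts by hand
for w in words:
    manual[w] += 1

print(dict(manual) == dict(counts))  # True
print(sorted(set(words)))            # ['a', 'b', 'c']: a set keeps only distinct items
```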

Checking for membership of a value in a set (or dict, for keys) is blazingly fast (taking about a constant, short time), while in a list it takes time proportional to the list’s length in the average and worst cases. So, if you have hashable items, don’t care either way about order or duplicates, and want speedy membership checking, set is better than list.


Answer 1

  • Do you just need an ordered sequence of items? Go for a list.
  • Do you just need to know whether or not you’ve already got a particular value, but without ordering (and you don’t need to store duplicates)? Use a set.
  • Do you need to associate values with keys, so you can look them up efficiently (by key) later on? Use a dictionary.

Answer 2

When you want an unordered collection of unique elements, use a set. (For example, when you want the set of all the words used in a document).

When you want to collect an immutable ordered list of elements, use a tuple. (For example, when you want a (name, phone_number) pair that you wish to use as an element in a set, you would need a tuple rather than a list, since set elements must be hashable, and mutable built-ins such as list are not.)

When you want to collect a mutable ordered list of elements, use a list. (For example, when you want to append new phone numbers to a list: [number1, number2, …]).

When you want a mapping from keys to values, use a dict. (For example, when you want a telephone book which maps names to phone numbers: {'John Smith' : '555-1212'}). Note that before Python 3.7, the keys in a dict are unordered: if you iterate through a dict (telephone book), the keys (names) may show up in any order. Since Python 3.7, dicts preserve insertion order.
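The following sketch puts those four cases together (the names, numbers and text are made up):

```python
# tuple: an immutable (name, phone_number) pair, usable as a set element
contacts = {("John Smith", "555-1212"), ("Jane Doe", "555-3434")}

# list: a mutable, ordered collection you can append to
numbers = ["555-1212"]
numbers.append("555-3434")

# dict: maps each name to a phone number
phone_book = {"John Smith": "555-1212"}
phone_book["Jane Doe"] = "555-3434"

# set: the distinct words used in a document
words = set("the cat saw the other cat".split())

print(phone_book["Jane Doe"])  # 555-3434
print(sorted(words))           # ['cat', 'other', 'saw', 'the']
```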


Answer 3

  • Use a dictionary when you have a set of unique keys that map to values.

  • Use a list if you have an ordered collection of items.

  • Use a set to store an unordered set of items.


Answer 4

In short, use:

list – if you need an ordered sequence of items.

dict – if you need to associate values with keys.

set – if you need to keep only unique elements.

Detailed Explanation

List

A list is a mutable sequence, typically used to store collections of homogeneous items.

A list implements all of the common sequence operations:

  • x in l and x not in l
  • l[i], l[i:j], l[i:j:k]
  • len(l), min(l), max(l)
  • l.count(x)
  • l.index(x[, i[, j]]) – index of the first occurrence of x in l (at or after index i and before index j)

A list also implements all of the mutable sequence operations:

  • l[i] = x – item i of l is replaced by x
  • l[i:j] = t – slice of l from i to j is replaced by the contents of the iterable t
  • del l[i:j] – same as l[i:j] = []
  • l[i:j:k] = t – the elements of l[i:j:k] are replaced by those of t
  • del l[i:j:k] – removes the elements of l[i:j:k] from the list
  • l.append(x) – appends x to the end of the sequence
  • l.clear() – removes all items from l (same as del l[:])
  • l.copy() – creates a shallow copy of l (same as l[:])
  • l.extend(t) or l += t – extends l with the contents of t
  • l *= n – updates l with its contents repeated n times
  • l.insert(i, x) – inserts x into l at the index given by i
  • l.pop([i]) – retrieves the item at i and also removes it from l
  • l.remove(x) – remove the first item from l where l[i] is equal to x
  • l.reverse() – reverses the items of l in place

A list can be used as a stack by taking advantage of the methods append and pop.
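Here is a minimal sketch of that stack usage. append pushes onto the end, and pop with no argument removes from the end, so the last item in is the first out:

```python
stack = []
stack.append(1)  # push
stack.append(2)
stack.append(3)

top = stack.pop()  # removes and returns the last item
peek = stack[-1]   # look at the new top without removing it
print(top, peek, stack)  # 3 2 [1, 2]
```

For a first-in, first-out queue, collections.deque is a better fit, because list.pop(0) has to shift every remaining element.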

Dictionary

A dictionary maps hashable values to arbitrary objects. A dictionary is a mutable object. The main operations on a dictionary are storing a value with some key and extracting the value given the key.

In a dictionary, you cannot use unhashable values as keys, that is, values containing lists, dictionaries or other mutable types.

Set

A set is an unordered collection of distinct hashable objects. A set is commonly used for membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.


Answer 5

Although this doesn’t cover sets, it is a good explanation of dicts and lists:

Lists are what they seem – a list of values. Each one of them is numbered, starting from zero – the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats’ names.

Dictionaries are similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. In python, the word is called a ‘key’, and the definition a ‘value’. The values in a dictionary aren’t numbered – they aren’t in any specific order, either – the key does the same thing. You can add, remove, and modify the values in dictionaries. Example: telephone book.

http://www.sthurlow.com/python/lesson06/


Answer 6

For C++ I was always having this flow chart in mind: In which scenario do I use a particular STL container?, so I was curious if something similar is available for Python3 as well, but I had no luck.

What you need to keep in mind for Python is that, unlike C++, it has no single formal standard. Hence there might be large differences between Python interpreters (e.g. CPython, PyPy). The following flow chart is for CPython.

Additionally, I found no good way to incorporate the following data structures into the diagram: bytes, bytearray, tuple, namedtuple, ChainMap, Counter, and array.

  • OrderedDict and deque are available via collections module.
  • heapq is available from the heapq module
  • LifoQueue, Queue, and PriorityQueue are available via the queue module, which is designed for concurrent (multi-threaded) access. (There is also multiprocessing.Queue, which is meant for passing data between processes rather than threads.)
  • dict, set, frozenset, and list are built-ins, of course
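This brief sketch covers two of the non-builtin structures listed above, using arbitrary data. collections.deque gives fast appends and pops at both ends. The heapq functions maintain a binary min-heap inside a plain list:

```python
from collections import deque
import heapq

q = deque([1, 2, 3])
q.appendleft(0)      # O(1) at the left end, unlike list.insert(0, x)
q.append(4)
first = q.popleft()  # removes and returns the leftmost item

heap = []
for x in [5, 1, 4, 2]:
    heapq.heappush(heap, x)
smallest_first = [heapq.heappop(heap) for _ in range(len(heap))]

print(first, list(q))  # 0 [1, 2, 3, 4]
print(smallest_first)  # [1, 2, 4, 5]
```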

I would be grateful if anyone could improve this answer and provide a better diagram in any respect; feel free. [Flowchart]

PS: the diagram has been made with yed. The graphml file is here


Answer 7

In combination with lists, dicts and sets, there is another interesting Python object: OrderedDict.

Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.

OrderedDicts could be useful when you need to preserve the order of the keys, for example working with documents: It’s common to need the vector representation of all terms in a document. So using OrderedDicts you can efficiently verify if a term has been read before, add terms, extract terms, and after all the manipulations you can extract the ordered vector representation of them.
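Here is a minimal sketch of that document use case. It assumes a simple whitespace tokenizer, and the sample text is made up. On Python 3.7+ a plain dict also preserves insertion order, so OrderedDict is mainly needed on older versions or for its extra methods such as move_to_end:

```python
from collections import OrderedDict

doc = "the cat sat on the mat the end".split()

term_counts = OrderedDict()
for term in doc:
    if term not in term_counts:  # fast "seen before?" check
        term_counts[term] = 0
    term_counts[term] += 1

print(list(term_counts))           # ['the', 'cat', 'sat', 'on', 'mat', 'end']
print(list(term_counts.values()))  # [3, 1, 1, 1, 1, 1]
```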


Answer 8

Lists are what they seem – a list of values. Each one of them is numbered, starting from zero – the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats’ names.

Tuples are just like lists, but you can’t change their values. The values you give a tuple at the start are the values you are stuck with for the rest of the program. Again, each value is numbered starting from zero, for easy reference. Example: the names of the months of the year.

Dictionaries are similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. In python, the word is called a ‘key’, and the definition a ‘value’. The values in a dictionary aren’t numbered – they aren’t in any specific order, either – the key does the same thing. You can add, remove, and modify the values in dictionaries. Example: telephone book.


Answer 9

When using them, I keep an exhaustive cheat sheet of their methods; here it is for your reference:

class ContainerMethods:
    """Cheat sheet of the public methods of Python's built-in containers.

    The number in each attribute name is how many methods it lists.
    """
    def __init__(self):
        self.list_methods_11 = {
                    'Add':{'append','extend','insert'},
                    'Subtract':{'pop','remove'},
                    'Sort':{'reverse', 'sort'},
                    'Search':{'count', 'index'},
                    'Entire':{'clear','copy'},
                            }
        # Each dict value must be a single object, so both methods share one set.
        self.tuple_methods_2 = {'Search':{'count','index'}}

        self.dict_methods_11 = {
                    'Views':{'keys', 'values', 'items'},
                    'Add':{'update'},
                    'Subtract':{'pop', 'popitem',},
                    'Extract':{'get','setdefault',},
                    'Entire':{ 'clear', 'copy','fromkeys'},
                            }
        self.set_methods_17 ={
                    # Lists are unhashable and cannot be set elements, so this is one flat set:
                    # add/update insert elements; the *_update methods apply a set operation in place.
                    'Add':{'add', 'update',
                           'difference_update','symmetric_difference_update','intersection_update'},
                    'Subtract':{'pop', 'remove','discard'},
                    'Relation':{'isdisjoint', 'issubset', 'issuperset'},
                    'operation':{'union', 'intersection','difference', 'symmetric_difference'},
                    'Entire':{'clear', 'copy'}}
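
Since the set of methods can change between Python versions, a cheat sheet like the one above is easy to double-check against your own interpreter. A minimal sketch, assuming "public method" means any attribute whose name does not start with an underscore; public_methods is a hypothetical helper name:

```python
def public_methods(container_type):
    """Hypothetical helper: names of a type's public (non-underscore) attributes."""
    return {name for name in dir(container_type) if not name.startswith("_")}

for container_type in (list, tuple, dict, set):
    names = public_methods(container_type)
    # Print the type name, how many public methods it has, and their names.
    print(container_type.__name__, len(names), sorted(names))
```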

Answer 10

Dictionary: A Python dictionary works like a hash table, with the key as the index and the object as the value.

List: A list holds objects in an array, indexed by each object's position in the array.

Set: A set is an unordered collection of unique objects that can quickly tell whether an object is present in the set or not.
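
A small sketch contrasting the three lookups described above; the fruit names and the length-as-value mapping are arbitrary illustration data:

```python
items = ["apple", "banana", "cherry"]

as_list = list(items)                          # indexed by position
as_set = set(items)                            # hashed, unordered, no duplicates
as_dict = {name: len(name) for name in items}  # key -> value, hash-based

print(as_list[1])          # banana  (lookup by position)
print("cherry" in as_set)  # True    (hash-based membership test)
print(as_dict["apple"])    # 5       (lookup by key)

# A set keeps only one copy of each object.
print(len(set(["apple", "apple", "pear"])))  # 2
```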