为什么我们需要Python(或任何不可变数据类型)中的元组?

问题:为什么我们需要Python(或任何不可变数据类型)中的元组?

我已经阅读了几本python教程(《 Dive Into Python》,其中之一),以及Python.org上的语言参考-我不明白为什么该语言需要元组。

元组与列表或集合相比没有任何方法,如果我必须将元组转换为集合或列表以对其进行排序,那么首先使用元组的意义是什么?

不变性?

为什么有人会关心变量是否位于与最初分配时不同的内存位置?Python不可变性的全部工作似乎都过分强调了。

在C / C ++中,如果我分配了一个指针并指向一些有效的内存,则在使用该地址之前,只要它不为null,我都不在乎该地址位于何处。

每当我引用该变量时,都不需要知道指针是否仍指向原始地址。我只是检查null并使用(或不使用)。

在Python中,当我分配一个字符串(或元组)并将其分配给x,然后修改该字符串时,为什么我不在乎它是否是原始对象?只要变量指向我的数据,那就很重要。

>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167

x 仍然引用我想要的数据,为什么有人需要关心其ID是相同还是不同?

I’ve read several python tutorials (Dive Into Python, for one), and the language reference on Python.org – I don’t see why the language needs tuples.

Tuples have no methods compared to a list or set, and if I must convert a tuple to a set or list to be able to sort them, what’s the point of using a tuple in the first place?

Immutability?

Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.

In C/C++ if I allocate a pointer and point to some valid memory, I don’t care where the address is located as long as it’s not null before I use it.

Whenever I reference that variable, I don’t need to know if the pointer is still pointing to the original address or not. I just check for null and use it (or not).

In Python, when I allocate a string (or tuple) assign it to x, then modify the string, why do I care if it’s the original object? As long as the variable points to my data, that’s all that matters.

>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167

x still references the data I want, why does anyone need to care if its id is the same or different?


回答 0

  1. 不变的对象可以实现实质性的优化;这大概就是为什么字符串在Java中也是不可变的,它是与Python完全分开但同时开发的,而在真正功能的语言中几乎所有东西都是不可变的。

  2. 特别是在Python中,只有不可变的对象才可以是可哈希的(因此,集合的成员或字典中的键)也是可以哈希的。再次,这种优化提供了优化,但不仅仅是“实质性”(设计存储完全可变对象的体面哈希表是一场噩梦-要么在对哈希进行哈希处理后立即复制所有内容,要么进行检查对象是否哈希的噩梦)自从您上次引用它以来,它已经改变了,它的头变得丑陋。

优化问题示例:

$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop
  1. immutable objects can allow substantial optimization; this is presumably why strings are also immutable in Java, developed quite separately but about the same time as Python, and just about everything is immutable in truly-functional languages.

  2. in Python in particular, only immutables can be hashable (and, therefore, members of sets, or keys in dictionaries). Again, this afford optimization, but far more than just “substantial” (designing decent hash tables storing completely mutable objects is a nightmare — either you take copies of everything as soon as you hash it, or the nightmare of checking whether the object’s hash has changed since you last took a reference to it rears its ugly head).

Example of optimization issue:

$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop

回答 1

上面的答案都没有指出元组与列表的真正问题,许多Python新手似乎还没有完全理解。

元组和列表有不同的用途。列表存储同类数据。您可以并且应该有这样的列表:

["Bob", "Joe", "John", "Sam"]

正确使用列表的原因是因为这些列表都是同类型的数据,尤其是人们的名字。但采取这样的清单:

["Billy", "Bob", "Joe", 42]

该清单是一个人的全名和年龄。那不是一种数据。存储该信息的正确方法是在元组或对象中。可以说我们有几个:

[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]

元组和列表的不变性和可变性不是主要区别。列表是相同种类的项目的列表:文件,名称,对象。元组是一组不同类型的对象。它们有不同的用途,许多Python编码器滥用了元组的含义列表。

请不要。


编辑:

我认为这篇博客文章解释了为什么我觉得比我做得更好:http : //news.e-scribe.com/397

None of the answers above point out the real issue of tuples vs lists, which many new to Python seem to not fully understand.

Tuples and lists serve different purposes. Lists store homogenous data. You can and should have a list like this:

["Bob", "Joe", "John", "Sam"]

The reason that is a correct use of lists is because those are all homogenous types of data, specifically, people’s names. But take a list like this:

["Billy", "Bob", "Joe", 42]

That list is one person’s full name, and their age. That isn’t one type of data. The correct way to store that information is either in a tuple, or in an object. Lets say we have a few :

[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]

The immutability and mutability of Tuples and Lists is not the main difference. A list is a list of the same kind of items: files, names, objects. Tuples are a grouping of different types of objects. They have different uses, and many Python coders abuse lists for what tuples are meant for.

Please don’t.


Edit:

I think this blog post explains why I think this better than I did: http://news.e-scribe.com/397


回答 2

如果必须将元组转换为集合或列表才能对其进行排序,那么首先使用元组有什么意义?

在这种情况下,可能没有意义。这不是问题,因为这不是您考虑使用元组的情况之一。

如您所指出的,元组是不可变的。具有不可变类型的原因适用于元组:

  • 复制效率:您可以为它添加别名(将变量绑定到引用),而不是复制不可变的对象
  • 比较效率:使用按引用复制时,可以通过比较位置而不是内容来比较两个变量
  • 实习:您最多需要存储任何不变值的一份副本
  • 无需在并发代码中同步对不可变对象的访问
  • const正确性:不允许更改某些值。(对我而言)这是不可变类型的主要原因。

请注意,特定的Python实现可能无法利用上述所有功能。

字典键必须是不可变的,否则更改键对象的属性可能会使基础数据结构的不变性失效。因此,元组可以潜在地用作键。这是const正确性的结果。

另请参见Dive Into Python中的介绍元组 ” 。

if I must convert a tuple to a set or list to be able to sort them, what’s the point of using a tuple in the first place?

In this particular case, there probably isn’t a point. This is a non-issue, because this isn’t one of the cases where you’d consider using a tuple.

As you point out, tuples are immutable. The reasons for having immutable types apply to tuples:

  • copy efficiency: rather than copying an immutable object, you can alias it (bind a variable to a reference)
  • comparison efficiency: when you’re using copy-by-reference, you can compare two variables by comparing location, rather than content
  • interning: you need to store at most one copy of any immutable value
  • there’s no need to synchronize access to immutable objects in concurrent code
  • const correctness: some values shouldn’t be allowed to change. This (to me) is the main reason for immutable types.

Note that a particular Python implementation may not make use of all of the above features.

Dictionary keys must be immutable, otherwise changing the properties of a key-object can invalidate invariants of the underlying data structure. Tuples can thus potentially be used as keys. This is a consequence of const correctness.

See also “Introducing tuples“, from Dive Into Python.


回答 3

有时我们喜欢使用对象作为字典键

就其价值而言,最近的元组(2.6+)index()count()方法

Sometimes we like to use objects as dictionary keys

For what it’s worth, tuples recently (2.6+) grew index() and count() methods


回答 4

我总是发现对于同一基本数据结构(数组)有两种完全独立的类型是一个笨拙的设计,但实际上并不是一个真正的问题。(每种语言都有其缺陷,包括Python,但这并不是很重要。)

为什么有人会关心变量是否位于与最初分配时不同的内存位置?Python不可变性的全部工作似乎都过分强调了。

这些是不同的东西。可变性与它在内存中的存储位置无关。这意味着它指向内容无法更改。

Python对象创建后无法更改位置,无论是否可变。(更准确地说,id()的值不能改变,实际上是相同的。)可变对象的内部存储可以改变,但这是一个隐藏的实现细节。

>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167

这不是在修改(“变异”)变量。它正在创建一个具有相同名称的新变量,并丢弃旧变量。与变异操作比较:

>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L

正如其他人指出的那样,这允许将数组用作字典以及其他需要不变性的数据结构的键。

请注意,字典的键不必完全不变。只有用作密钥的部分才是不变的。对于某些用途,这是一个重要的区别。例如,您可能有一个代表用户的类,该类通过唯一的用户名比较相等性和哈希值。然后,您可以将其他可变数据挂在类上-“用户已登录”,等等。由于这不会影响相等性或哈希,因此可以将其用作字典中的键并且完全有效。这在Python中不是很常见;我只是指出这一点,因为几个人声称密钥必须是“不可变的”,这只是部分正确的。不过,我已经在C ++映射和集合中使用了很多次。

I’ve always found having two completely separate types for the same basic data structure (arrays) to be an awkward design, but not a real problem in practice. (Every language has its warts, Python included, but this isn’t an important one.)

Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.

These are different things. Mutability isn’t related to the place it’s stored in memory; it means the stuff it points to can’t change.

Python objects can’t change location after they’re created, mutable or not. (More accurately, the value of id() can’t change–same thing, in practice.) The internal storage of mutable objects can change, but that’s a hidden implementation detail.

>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167

This isn’t modifying (“mutating”) the variable; it’s creating a new variable with the same name, and discarding the old one. Compare to a mutating operation:

>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L

As others have pointed out, this allows using arrays as keys to dictionaries, and other data structures that need immutability.

Note that keys for dictionaries do not have to be completely immutable. Only the part of it used as a key needs to be immutable; for some uses, this is an important distinction. For example, you could have a class representing a user, which compares equality and a hash by the unique username. You could then hang other mutable data on the class–“user is logged in”, etc. Since this doesn’t affect equality or the hash, it’s possible and perfectly valid to use this as a key in a dictionary. This isn’t too commonly needed in Python; I just point it out since several people have claimed that keys need to be “immutable”, which is only partially correct. I’ve used this many times with C++ maps and sets, though.


回答 5

正如小偷在评论中所提供的那样,Guido的观点未被完全接受/赞赏:“列表用于同构数据,元组用于异构数据”。当然,许多反对者将此解释为意味着列表中的所有元素应为同一类型。

我喜欢以不同的方式看待它,与过去的其他人一样:

blue= 0, 0, 255
alist= ["red", "green", blue]

请注意,即使type(alist [1])!= type(alist [2]),我也认为列表是同质的。

如果我可以更改元素的顺序,并且代码中没有问题(除了假设,例如“应该排序”),则应使用列表。如果不行(就像blue上面的元组一样),那么我应该使用一个元组。

As gnibbler offered in a comment, Guido had an opinion that is not fully accepted/appreciated: “lists are for homogeneous data, tuples are for heterogeneous data”. Of course, many of the opposers interpreted this as meaning that all elements of a list should be of the same type.

I like to see it differently, not unlike others also have in the past:

blue= 0, 0, 255
alist= ["red", "green", blue]

Note that I consider alist to be homogeneous, even if type(alist[1]) != type(alist[2]).

If I can change the order of the elements and I won’t have issues in my code (apart from assumptions, e.g. “it should be sorted”), then a list should be used. If not (like in the tuple blue above), then I should use a tuple.


回答 6

它们很重要,因为它们可以保证调用者不会忽略传递给它们的对象。如果您这样做:

a = [1,1,1]
doWork(a)

调用方法无法保证呼叫后a的值。然而,

a = (1,1,1)
doWorK(a)

现在,您作为此代码的调用者或阅读者知道a是相同的。在这种情况下,您始终可以复制列表并通过该列表,但是现在您是在浪费时间,而不是使用更具语义意义的语言构造。

They are important since they guarantee the caller that the object they pass won’t be mutated. If you do this:

a = [1,1,1]
doWork(a)

The caller has no guarantee of the value of a after the call. However,

a = (1,1,1)
doWorK(a)

Now you as the caller or as a reader of this code know that a is the same. You could always for this scenario make a copy of the list and pass that but now you are wasting cycles instead of using a language construct that makes more semantic sense.


回答 7

你可以在这里看到一些讨论

you can see here for some discussion on this


回答 8

您的问题(和后续评论)集中在id()在分配期间是否发生变化。专注于不变对象替换和可变对象修改之间的差异的后续影响,而不是差异本身,也许不是最佳方法。

在继续之前,请确保下面演示的行为符合您对Python的期望。

>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2

在这种情况下,即使仅为a1分配了新值,a2的内容也被更改。与以下内容对比:

>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1

在后一种情况下,我们替换了整个列表,而不是更新其内容。对于不可变类型(例如元组),这是唯一允许的行为。

为什么这么重要?假设您有一个字典:

>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1,2): 'three'}
>>> t1[0] = 0  ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1   ## as seen here, the dict is still intact
{(1,2): 'three'}

使用元组,可以安全地防止字典的键“从其下方”更改为散列为不同值的项目。这对于有效执行至关重要。

Your question (and follow-up comments) focus on whether the id() changes during an assignment. Focusing on this follow-on effect of the difference between immutable object replacement and mutable object modification rather than the difference itself is perhaps not the best approach.

Before we continue, make sure that the behavior demonstrated below is what you expect from Python.

>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2

In this case, the contents of a2 was changed, even though only a1 had a new value assigned. Contrast to the following:

>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1

In this latter case, we replaced the entire list, rather than updating its contents. With immutable types such as tuples, this is the only behavior allowed.

Why does this matter? Let’s say you have a dict:

>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1,2): 'three'}
>>> t1[0] = 0  ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1   ## as seen here, the dict is still intact
{(1,2): 'three'}

Using a tuple, the dictionary is safe from having its keys changed “out from under it” to items which hash to a different value. This is critical to allow efficient implementation.