Question: The “is” operator behaves unexpectedly with integers
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that “small” integers are stored in a different way than larger integers and the `is` operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don’t know in advance whether they are numbers or not?
Answer 0
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828
Here’s what I found in the Python 2 documentation, “Plain Integer Objects” (It’s the same for Python 3):
The current implementation keeps an
array of integer objects for all
integers between -5 and 256, when you
create an int in that range you
actually just get back a reference to
the existing object. So it should be
possible to change the value of 1. I
suspect the behaviour of Python in
this case is undefined. :-)
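A quick way to probe that range yourself is sketched below. It assumes CPython; the helper name `is_cached` is made up for this illustration, and building the integers with `int(str(n))` at run time is a choice made here so that the compiler cannot merge equal literals. Other implementations, or other versions, may well report a different range.

```python
def is_cached(n):
    """Return True if two separately built ints with value n are one object."""
    a = int(str(n))  # built at run time, so no compile-time constant merging
    b = int(str(n))
    return a is b


# Collect every value in a small window that the interpreter shares.
shared = [n for n in range(-10, 300) if is_cached(n)]

if shared:
    print("shared ints run from", shared[0], "to", shared[-1])
else:
    print("no shared ints found in this window")
```

On the CPython versions discussed in this thread, the reported range should match the -5 to 256 quoted from the documentation.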
Answer 1
Python’s “is” operator behaves unexpectedly with integers?
In summary – let me emphasize: Do not use `is` to compare integers.
This isn’t behavior you should have any expectations about.
Instead, use `==` and `!=` to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does `is` do? It is a comparison operator. From the documentation:
The operators `is` and `is not` test for object identity: `x is y` is true if and only if x and y are the same object. `x is not y` yields the inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
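A small sketch of that equivalence, using lists rather than integers so that no caching can influence the result (the variable names are only for illustration):

```python
a = [1, 2, 3]
b = a            # a second name for the same list object
c = [1, 2, 3]    # an equal but separately created list

same_ab = a is b
same_ac = a is c

# For objects that are alive at the same time, `is` agrees with comparing ids.
assert same_ab == (id(a) == id(b))
assert same_ac == (id(a) == id(c))

print(same_ab, same_ac, a == c)  # prints: True False True
```

The equivalence only holds while both objects are alive, which is why the comparison is done on names that keep them referenced.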
From the documentation:
`id`
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same `id()` value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for `id`.
So what is the use-case for `is`? PEP8 describes:
Comparisons to singletons like `None` should always be done with `is` or `is not`, never the equality operators.
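One reason for that advice, sketched with a deliberately contrived class (`AlwaysEqual` is hypothetical and exists only for this illustration): `==` can be overridden by the object being compared, while `is` cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

equal_to_none = (obj == None)  # noqa: E711 - deliberately the wrong test
is_none = (obj is None)

print(equal_to_none, is_none)  # prints: True False
```

Only the identity test gives the answer you actually want when checking for `None`.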
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at `256` referenced by both `a` and `b` are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of `257` in memory. Since integers are immutable, this wastes memory. Let’s hope we’re not wasting a lot of it. We’re probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, provided it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. `==`.
What `is` does
`is` checks that the `id` of two objects is the same. In CPython, the `id` is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use `is` then?
This can be a very fast check relative to, say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we have limited use-cases for it. In fact, we mostly want to use it to check for `None`, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with `is`, but these are relatively rare. Here’s an example (it will work in Python 2 and 3):
SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with `is` and a sentinel, we are able to differentiate between when `bar` is called with no arguments and when it is called with `None`. These are the primary use-cases for `is` – do not use it to test for equality of integers, strings, tuples, or other things like these.
Answer 2
It depends on whether you’re looking to see if 2 things are equal, or the same object.
`is` checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency:
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use `==` to compare equality of arbitrary objects. You can specify the behavior with the `__eq__` and `__ne__` methods.
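As a minimal sketch of what that looks like, here is a made-up `Point` class (not from the question) that defines value equality. Two equal points are still separate objects, so `==` and `is` give different answers.

```python
class Point:
    """Hypothetical value class compared by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Python 3 derives this from __eq__ automatically; Python 2 does not.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with the custom equality.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)
print(p == q, p != q, p is q)  # prints: True False False
```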
Answer 3
I’m late but, you want some source with your answer? I’ll try and word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I’m going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new `int` object is `PyLong_FromLong(long v)`. The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don’t know about you but I see this and think: Let’s find that array!
If you haven’t fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the `Objects` subdirectory of the main source code directory tree.
`PyLong_FromLong` deals with `long` objects so it shouldn’t be hard to deduce that we need to peek inside `longobject.c`. After looking inside you might think things are chaotic; they are, but fear not, the function we’re looking for is chilling at line 230 waiting for us to check it out. It’s a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    // remainder of the function omitted
Now, we’re no C master-code-haxxorz but we’re also not dumb; we can see `CHECK_SMALL_INT(ival);` peeking at us all seductively, and we can understand it has something to do with this. Let’s check it out:
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)
So it’s a macro that calls the function `get_small_int` if the value `ival` satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are `NSMALLNEGINTS` and `NSMALLPOSINTS`? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is `if (-5 <= ival && ival < 257)` call `get_small_int`.
Next let’s look at `get_small_int` in all its glory (well, we’ll just look at its body because that’s where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a `PyObject`, assert that the previous condition holds and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
`small_ints` looks a lot like that array we’ve been searching for, and it is! We could’ve just read the damn documentation and we would’ve known all along:
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new `int` in the range `[-NSMALLNEGINTS, NSMALLPOSINTS)` you’ll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing `id()` directly or checking for identity with `is` on it will return exactly the same thing.
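The lookup can be modeled in a few lines of Python. This is only an illustration of the indexing scheme, not CPython's code: `Boxed`, `from_long`, and the list-based `small_ints` are stand-ins invented here, while the two constants take the values of the macros shown above.

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257


class Boxed:
    """Stand-in for a heap-allocated integer object."""

    def __init__(self, value):
        self.value = value


# Preallocate one shared object per value in [-5, 257).
small_ints = [Boxed(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def from_long(ival):
    """Return the shared object for small values, a fresh one otherwise."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
        # Index 0 holds -5, so shift the value by NSMALLNEGINTS.
        return small_ints[ival + NSMALLNEGINTS]
    return Boxed(ival)


print(from_long(256) is from_long(256))  # prints: True
print(from_long(257) is from_long(257))  # prints: False
```

The offset `ival + NSMALLNEGINTS` is the same trick as in `get_small_int`: it maps the lowest cached value, -5, to index 0.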
But, when are they allocated??
During initialization in `_PyLong_Init` Python will gladly enter a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, `257 is 257`? What’s up?
This is actually easier to explain, and I have attempted to do so already; it’s due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same `PyLongObject` representing `257`. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it’s now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So `is` will return `True`.
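The same effect can be checked without typing a literal on both sides of `is` (which newer Pythons warn about). The sketch below assumes CPython; the file name `"<demo>"` and the variable names are arbitrary choices for this illustration.

```python
# Compile two assignments of the same literal as ONE unit of code.
code = compile("a = 257\nb = 257", "<demo>", "exec")

# Equal constants within a single code object are stored once.
occurrences = code.co_consts.count(257)

namespace = {}
exec(code, namespace)
same_object = namespace["a"] is namespace["b"]

print(occurrences, same_object)
```

On CPython this is expected to show one stored constant and a shared object, matching the `LOAD_CONST 0` pair in the disassembly above. Other implementations are free to behave differently.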
Answer 4
As you can check in the source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is created as a different object.
It is better to use `==` for that purpose.
Answer 5
I think your hypothesis is correct. Experiment with `id` (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that numbers `<= 255` are treated as literals and anything above is treated differently!
Answer 6
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It’s better to think about equality. Identity is essentially an implementation detail for value objects – since they’re immutable, there’s no effective difference between having multiple refs to the same object or multiple objects.
Answer 7
There’s another issue that isn’t pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty `tuple`, `str`, and `bytes`, and some short strings (in CPython 3.6, it’s the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn’t limited to `int` values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn’t come with a pre-created `float` value for `42.23e100`. So, what’s going on here?
The CPython compiler will merge constant values of some known-immutable types like `int`, `float`, `str`, `bytes`, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since `c` and `d` are defined in separate statements, their values aren’t merged. Since `e` and `f` are defined in the same statement, their values are merged.
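You can imitate the interactive interpreter's per-statement compilation from a script. The following is a rough sketch that assumes CPython-like behavior; the `"<stmt N>"` file names are arbitrary labels, and the identity results are printed rather than promised, since they are implementation details.

```python
namespace = {}

# Two separate compilation units, like two lines typed at the prompt.
exec(compile("c = 257", "<stmt 1>", "exec"), namespace)
exec(compile("d = 257", "<stmt 2>", "exec"), namespace)

# One compilation unit containing both literals.
exec(compile("e, f = 258, 258", "<stmt 3>", "exec"), namespace)

separate_units = namespace["c"] is namespace["d"]
same_unit = namespace["e"] is namespace["f"]

print("separate units share the object:", separate_units)
print("same unit shares the object:", same_unit)
```

Whatever the two lines print on your interpreter, `==` gives the same answer in both cases, which is the point of the rule at the end of this answer.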
You can see what’s going on by disassembling the bytecode. Try defining a function that does `i, j = 258, 258` and then calling `dis.dis` on it, and you’ll see that there’s a single constant value `(258, 258)`:
>>> import dis
>>> def f(): i, j = 258, 258
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((258, 258))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 258, (258, 258))
>>> [id(c) for c in (f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])]
[4305296480, 4305296480, 4305296480]
You may notice that the compiler has stored `258` as a constant even though it’s not actually used by the bytecode, which gives you an idea of how little optimization CPython’s compiler does. Which means that (non-empty) tuples actually don’t end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, `dis` it, and look at the `co_consts`: there’s a `1` and a `2`, two `(1, 2)` tuples that share the same `1` and `2` but are not identical, and a `((1, 2), (1, 2))` tuple that has the two distinct equal tuples.
There’s one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn’t restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the `str` type, and to strings of internal storage kind “ascii compact”, “compact”, or “legacy ready”, and in many cases only “ascii compact” will get interned.
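If you ever do need identity for equal strings, you can request interning explicitly instead of relying on those rules. A small sketch, assuming Python 3 (where the function lives in `sys`; the example strings are arbitrary):

```python
import sys

# Build two equal strings at run time so no literal merging is involved.
parts = ["identity", " ", "versus", " ", "equality"]
s1 = "".join(parts)
s2 = "".join(parts)

# Whether s1 and s2 are already one object is up to the implementation.
print(s1 == s2)  # prints: True

# sys.intern returns the canonical copy, so equal strings map to one object.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # prints: True
```

Even then, `==` remains the right tool for comparing string values; interning is mainly a lookup-speed optimization.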
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it’s not worth relying on them in your code. The only safe rule is:
- Do not write code that assumes two equal but separately-created immutable values are identical (don’t use `x is y`, use `x == y`)
- Do not write code that assumes two equal but separately-created immutable values are distinct (don’t use `x is not y`, use `x != y`)
Or, in other words, only use `is` to test for the documented singletons (like `None`) or for objects that are only created in one place in the code (like the `_sentinel = object()` idiom).
Answer 8
`is` is the identity equality operator (functioning like `id(a) == id(b)`); it’s just that two equal numbers aren’t necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP’s `===` operator, on the other hand, is described as checking equality and type: `x == y and type(x) == type(y)` as per Paulo Freitas’ comment. This will suffice for common numbers, but differs from `is` for classes that define `__eq__` in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
PHP apparently allows the same thing for “built-in” classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it’s used as a number. Quite why you’d want to emulate Visual Basic’s `Now` instead of showing that it is an evaluation with `time.time()` I don’t know.
Greg Hewgill (OP) made one clarifying comment “My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value.”
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with `==` or `is`. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use `isinstance` with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built-in number classes have a base class `numbers.Number`, but it has the same problem:
import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)
By the way, NumPy will produce separate instances of low numbers.
I don’t actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call `PyNumber_Check`, but even that function has been debated, and it’s certainly not portable. We’ll just have to be less particular about what we test for now.
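For what it’s worth, the `isinstance` idea can be sketched as follows. The helper name `same` is made up here, and the check inherits the incompleteness described above: any numeric type that is not registered with `numbers.Number` falls through to the identity test.

```python
import numbers


def same(a, b):
    """Compare numbers by value and everything else by identity."""
    if isinstance(a, numbers.Number) and isinstance(b, numbers.Number):
        return a == b
    return a is b


big1 = int("1000")
big2 = int("1000")
items = []

print(same(big1, big2))    # numbers are compared with ==, prints: True
print(same(items, items))  # the same list object, prints: True
print(same([], []))        # equal but distinct lists, prints: False
```

Note that `bool` counts as a number here too, since it is a subclass of `int`.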
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme’s `number?`, or Haskell’s type class `Num`. `is` checks object identity, not value equality. PHP has a colorful history as well, where `===` apparently behaves as `is` only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
Answer 9
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That’s expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that’s unexpected.
Answer 10
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (`is` and `is not`) are used with certain types of literals (e.g. strings, ints). These can often work by accident in CPython, but are not guaranteed by the language spec. The warning advises users to use equality tests (`==` and `!=`) instead.
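You can watch the warning being raised by compiling a snippet yourself. This sketch assumes Python 3.8 or later; the source string and the `"<demo>"` file name are just placeholders, and the exact wording of the message varies between versions.

```python
import warnings

source = "x = 1000\nresult = x is 1000\n"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    compile(source, "<demo>", "exec")

# Keep only the SyntaxWarning messages emitted during compilation.
messages = [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]

for message in messages:
    print(message)
```

Because the warning is issued at compile time, the snippet never has to be executed for the problem to be reported.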