## 问题：为什么在Python 3中“范围（1000000000000000（1000000000000001））”这么快？

``1000000000000000 in range(1000000000000001)``

``1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens``

``````def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return``````

It is my understanding that the `range()` function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

``````1000000000000000 in range(1000000000000001)
``````

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

``````1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens
``````

If I try to implement my own range function, the result is not so nice!!

``````def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
``````

What is the `range()` object doing under the hood that makes it so fast?

Martijn Pieters’ answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for `range` to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for `__contains__` function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of `xrange` in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

## 回答 0

Python 3 `range()`对象不会立即产生数字。它是一个智能序列对象，可按需生成数字。它包含的只是您的开始，结束和步长值，然后在对对象进行迭代时，每次迭代都会计算下一个整数。

``````class my_range(object):
def __init__(self, start, stop=None, step=1):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step

def __len__(self):
return self.length

def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('Index out of range: {}'.format(i))

def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0``````

* 由于Python整数是无界的，所以时间接近恒定，因此数学运算也随着N的增长而及时增长，这使其成为O（log N）运算。由于所有操作均以优化的C代码执行，并且Python将整数值存储在30位块中，因此，由于此处涉及的整数大小，您会用光内存，然后再看到任何性能影响。

The Python 3 `range()` object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the `object.__contains__` hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

The advantage of the `range` type over a regular `list` or `tuple` is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the `start`, `stop` and `step` values, calculating individual items and subranges as needed).

So at a minimum, your `range()` object would do:

``````class my_range(object):
def __init__(self, start, stop=None, step=1):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step

def __len__(self):
return self.length

def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('Index out of range: {}'.format(i))

def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0
``````

This is still missing several things that a real `range()` supports (such as the `.index()` or `.count()` methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the `__contains__` implementation to only focus on integer tests; if you give a real `range()` object a non-integer value (including subclasses of `int`), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

## 回答 1

``````>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]``````

``````>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]``````

``````>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True``````

``````>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1``````

（作为一个旁注，如果您使用`print(iter(a))`，则会注意到`range`使用与相同的`listiterator`类型`list`。它是如何工作的？A 除了`listiterator`使用`list`C的C实现这一事实外，没有使用任何其他特殊方法`__getitem__`，因此对于`range`太。）

The fundamental misunderstanding here is in thinking that `range` is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

``````>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]
``````

If it were a generator, iterating it once would exhaust it:

``````>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]
``````

What `range` actually is, is a sequence, just like a list. You can even test this:

``````>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True
``````

This means it has to follow all the rules of being a sequence:

``````>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1
``````

The difference between a `range` and a `list` is that a `range` is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its `start`, `stop`, and `step`, and creates the values on demand on `__getitem__`.

(As a side note, if you `print(iter(a))`, you’ll notice that `range` uses the same `listiterator` type as `list`. How does that work? A `listiterator` doesn’t use anything special about `list` except for the fact that it provides a C implementation of `__getitem__`, so it works fine for `range` too.)

Now, there’s nothing that says that `Sequence.__contains__` has to be constant time—in fact, for obvious examples of sequences like `list`, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement `range.__contains__` to just check it mathematically (`(val - start) % step`, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact CPython 3.0-3.1 didn’t include it.)

If `range` actually were a generator, like `my_crappy_range`, then it wouldn’t make sense to test `__contains__` this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is `1` still `in` the generator? Should testing for `1` cause it to iterate and consume all the values up to `1` (or up to the first value `>= 1`)?

## 回答 2

1. 检查数字在`start`和之间`stop`，以及
2. 检查步幅值是否不会“超过”我们的数字。

1. `4 <= 994 < 1000`
2. `(994 - 4) % 2 == 0`

``````static int
range_contains_long(rangeobject *r, PyObject *ob)
{
int cmp1, cmp2, cmp3;
PyObject *tmp1 = NULL;
PyObject *tmp2 = NULL;
PyObject *zero = NULL;
int result = -1;

zero = PyLong_FromLong(0);
if (zero == NULL) /* MemoryError in int(0) */
goto end;

/* Check if the value can possibly be in the range. */

cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
if (cmp1 == -1)
goto end;
if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}
else { /* negative steps: stop < ob <= start */
cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
}

if (cmp2 == -1 || cmp3 == -1) /* TypeError */
goto end;
if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}

/* Check that the stride does not invalidate ob's membership. */
tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
end:
Py_XDECREF(tmp1);
Py_XDECREF(tmp2);
Py_XDECREF(zero);
return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);

return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}``````

``/* result = ((int(ob) - start) % step) == 0 */ ``

``````>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
...
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :)
True
>>> x_ in r  # iterates for ages.. :(
^\Quit (core dumped)``````

Use the source, Luke!

In CPython, `range(...).__contains__` (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

1. Check that the number is between `start` and `stop`, and
2. Check that the stride value doesn’t “step over” our number.

For example, `994` is in `range(4, 1000, 2)` because:

1. `4 <= 994 < 1000`, and
2. `(994 - 4) % 2 == 0`.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

``````static int
range_contains_long(rangeobject *r, PyObject *ob)
{
int cmp1, cmp2, cmp3;
PyObject *tmp1 = NULL;
PyObject *tmp2 = NULL;
PyObject *zero = NULL;
int result = -1;

zero = PyLong_FromLong(0);
if (zero == NULL) /* MemoryError in int(0) */
goto end;

/* Check if the value can possibly be in the range. */

cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
if (cmp1 == -1)
goto end;
if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}
else { /* negative steps: stop < ob <= start */
cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
}

if (cmp2 == -1 || cmp3 == -1) /* TypeError */
goto end;
if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}

/* Check that the stride does not invalidate ob's membership. */
tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
end:
Py_XDECREF(tmp1);
Py_XDECREF(tmp2);
Py_XDECREF(zero);
return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);

return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
``````

The “meat” of the idea is mentioned in the line:

``````/* result = ((int(ob) - start) % step) == 0 */
``````

As a final note – look at the `range_contains` function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using `_PySequence_IterSearch`! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

``````>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
...
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :)
True
>>> x_ in r  # iterates for ages.. :(
^\Quit (core dumped)
``````

## 回答 3

``````static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);

return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}``````

``````def range_contains (rangeObj, obj):
if isinstance(obj, int):
return range_contains_long(rangeObj, obj)

# default logic by iterating
return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
if r.step > 0:
# positive step: r.start <= num < r.stop
cmp2 = r.start <= num
cmp3 = num < r.stop
else:
# negative step: r.start >= num > r.stop
cmp2 = num <= r.start
cmp3 = r.stop < num

# outside of the range boundaries
if not cmp2 or not cmp3:
return False

# num must be on a valid step inside the boundaries
return (num - r.start) % r.step == 0``````

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

``````static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);

return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
``````

So for `PyLong` objects (which is `int` in Python 3), it will use the `range_contains_long` function to determine the result. And that function essentially checks if `ob` is in the specified range (although it looks a bit more complex in C).

If it’s not an `int` object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

``````def range_contains (rangeObj, obj):
if isinstance(obj, int):
return range_contains_long(rangeObj, obj)

# default logic by iterating
return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
if r.step > 0:
# positive step: r.start <= num < r.stop
cmp2 = r.start <= num
cmp3 = num < r.stop
else:
# negative step: r.start >= num > r.stop
cmp2 = num <= r.start
cmp3 = r.stop < num

# outside of the range boundaries
if not cmp2 or not cmp3:
return False

# num must be on a valid step inside the boundaries
return (num - r.start) % r.step == 0
``````

## 回答 4

*实际上，您甚至可以单独使用索引免费获得迭代，但是在2.3 `xrange`对象中获得了自定义迭代器。

**第一个版本实际上是重新实现了它，并且弄错了细节-例如，它将给您`MyIntSubclass(2) in range(5) == False`。但是Daniel Stutzbach的补丁更新版本恢复了以前的大部分代码，包括对通用代码的后备支持，`_PySequence_IterSearch``range.__contains__`在不应用优化的情况下会缓慢地降低3.2 之前版本的隐式使用。

If you’re wondering why this optimization was added to `range.__contains__`, and why it wasn’t added to `xrange.__contains__` in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize `[x]range.__contains__`. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because “xrange has behaved like this for such a long time that I don’t see what it buys us to commit the patch this late.” (2.7 was nearly out at that point.)

Meanwhile:

Originally, `xrange` was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the `len` function.

This wasn’t quite true; an `xrange` object actually supported a few other things that come automatically with indexing and `len`,* including `__contains__` (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and `xrange`/`range` claimed to implement `collections.Sequence`, even though it still only handled the same “very little behavior”. Nobody noticed that problem until issue 9213. The patch for that issue not only added `index` and `count` to 3.2’s `range`, it also re-worked the optimized `__contains__` (which shares the same math with `index`, and is directly used by `count`).**This change went in for 3.2 as well, and was not backported to 2.x, because “it’s a bugfix that adds new methods”. (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.

* In fact, you even get iteration for free with indexing alone, but in 2.3 `xrange` objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong—e.g., it would give you `MyIntSubclass(2) in range(5) == False`. But Daniel Stutzbach’s updated version of the patch restored most of the previous code, including the fallback to the generic, slow `_PySequence_IterSearch` that pre-3.2 `range.__contains__` was implicitly using when the optimization doesn’t apply.

## 回答 5

``````>>> r = range(5)
>>> for i in r:
print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]``````

The other answers explained it well already, but I’d like to offer another experiment illustrating the nature of range objects:

``````>>> r = range(5)
>>> for i in r:
print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]
``````

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.

## 回答 6

`sys.maxsize in range(sys.maxsize)` 相当快

`Decimal(sys.maxsize) in range(sys.maxsize)` 很慢

（在这种情况下， `range`，因此，如果python收到意外的Decimal，则python将比较所有数字）

It’s all about a lazy approach to the evaluation and some extra optimization of `range`. Values in ranges don’t need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not such big, consider `sys.maxsize`

`sys.maxsize in range(sys.maxsize)` is pretty fast

due to optimization – it’s easy to compare given integer just with min and max of range.

but:

`Decimal(sys.maxsize) in range(sys.maxsize)` is pretty slow.

(in this case, there is no optimization in `range`, so if python receives unexpected Decimal, python will compare all numbers)

You should be aware of an implementation detail but should not be relied upon, because this may change in the future.

## TL;DR

The object returned by `range()` is actually a `range` object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the interface which is actually what gets called when an object appears on the right hand side of the `in` operator. The `__contains__()` method returns a `bool` of whether or not the item on the left-hand-side of the `in` is in the object. Since `range` objects know their bounds and stride, this is very easy to implement in O(1).

## 回答 8

1. 由于优化，将给定的整数与最小和最大范围进行比较非常容易。
2. 在Python3 中range（）函数之所以如此之快，是因为这里我们对边界使用数学推理，而不是范围对象的直接迭代。
3. 所以在这里解释逻辑：
• 检查数字是否在开始和停止之间。
• 检查步长精度值是否不超过我们的数字。
4. 例如，997在range（4，1000，3）内是因为：

`4 <= 997 < 1000, and (997 - 4) % 3 == 0.`

1. Due to optimization, it is very easy to compare given integers just with min and max range.
2. The reason that range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
3. So for explaining the logic here:
• Check whether the number is between the start and stop.
• Check whether the step precision value doesn’t go over our number.
4. Take an example, 997 is in range(4, 1000, 3) because:

`4 <= 997 < 1000, and (997 - 4) % 3 == 0.`

## 回答 9

Try `x-1 in (i for i in range(x))` for large `x` values, which uses a generator comprehension to avoid invoking the `range.__contains__` optimisation.