为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

问题:为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

据我了解,该range()函数实际上是Python 3中的一种对象类型,它会生成器一样动态生成其内容。

在这种情况下,我本以为下一行会花费过多的时间,因为要确定1个四舍五入是否在该范围内,必须生成一个四舍五入值:

1000000000000000 in range(1000000000000001)

此外:似乎无论我添加多少个零,计算多少都花费相同的时间(基本上是瞬时的)。

我也尝试过这样的事情,但是计算仍然是即时的:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

如果我尝试实现自己的范围函数,结果将不是很好!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

使range()物体如此之快的物体在做什么?


选择Martijn Pieters的答案是因为它的完整性,但也看到了abarnert的第一个答案,它很好地讨论了在Python 3中range成为完整序列的含义,以及一些有关__contains__跨Python实现的函数优化潜在不一致的信息/警告。。abarnert的其他答案更加详细,并为那些对Python 3优化背后的历史(以及xrangePython 2中缺乏优化)感兴趣的人提供了链接。pokewim的答案感兴趣的人提供了相关的C源代码和说明。

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1000000000000000 in range(1000000000000001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

If I try to implement my own range function, the result is not so nice!!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?


Martijn Pieters’ answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.


回答 0

Python 3 range()对象不会立即产生数字。它是一个智能序列对象,可按需生成数字。它包含的只是您的开始,结束和步长值,然后在对对象进行迭代时,每次迭代都会计算下一个整数。

该对象还实现了object.__contains__hook,并计算您的电话号码是否在其范围内。计算是一个(近)恒定时间运算*。永远不需要扫描范围内的所有可能整数。

range()对象文档中

所述的优点range类型通过常规listtuple是一个范围对象将始终以相同的内存(小)数量,无论它代表的范围内的大小(因为它仅存储startstopstep值,计算各个项目和子范围如所须)。

因此,您的range()对象至少可以做到:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

这仍然缺少实际range()支持的几项内容(例如.index().count()方法,哈希,相等性测试或切片),但应该可以给您一个提示。

我还简化了__contains__实现,只专注于整数测试。如果您为实物range()提供非整数值(包括的子类int),则会启动慢速扫描以查看是否存在匹配项,就好像您对所有包含的值的列表使用了包含测试一样。这样做是为了继续支持其他数字类型,这些数字类型恰好支持使用整数进行相等性测试,但也不希望同时支持整数算术。请参阅实现收容测试的原始Python问题


* 由于Python整数是无界的,所以时间接近恒定,因此数学运算也随着N的增长而及时增长,这使其成为O(log N)运算。由于所有操作均以优化的C代码执行,并且Python将整数值存储在30位块中,因此,由于此处涉及的整数大小,您会用光内存,然后再看到任何性能影响。

The Python 3 range() object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.


* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.


回答 1

此处的根本误解是认为range是生成器。不是。实际上,它不是任何迭代器。

您可以很容易地说出这一点:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

如果它是一个生成器,则对其进行一次迭代将耗尽它:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

什么range实际上是,是一个序列,就像一个列表。您甚至可以测试一下:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

这意味着它必须遵循成为序列的所有规则:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

一个之间的差range和一list在于,range动态序列; 它不记得所有的价值,它只是记住它startstopstep,并根据需要创建的值__getitem__

(作为一个旁注,如果您使用print(iter(a)),则会注意到range使用与相同的listiterator类型list。它是如何工作的?A 除了listiterator使用listC的C实现这一事实外,没有使用任何其他特殊方法__getitem__,因此对于range太。)


现在,没有什么可以说Sequence.__contains__必须是恒定时间的-实际上,对于类似的明显示例list,事实并非如此。但是没有什么可以说是不可能的。与range.__contains__(val - start) % step实际进行计算和测试所有值相比,仅对其进行数学检查(,但具有一些额外的复杂性来处理否定步骤)要容易实现,那么为什么这样做会更好呢?

但是似乎没有什么语言可以保证会发生这种情况。正如Ashwini Chaudhari指出的那样,如果您给它提供一个非整数值,而不是转换为整数并进行数学测试,它将落到对所有值进行迭代并逐一进行比较的过程中。不仅因为CPython 3.2+和PyPy 3.x版本恰好包含此优化,而且这是一个显而易见的好主意且易于实现,所以Iron Iron或NewKickAssPython 3.x没有理由不能放弃它。(实际上,CPython 3.0-3.1 并未包含它。)


如果range实际上是一个生成器(如)my_crappy_range,那么以__contains__这种方式进行测试就没有意义,或者至少有一种合理的方式并不明显。如果您已经迭代了前三个值,那么生成器1仍然in是吗?测试是否应该1使其迭代并消耗所有值1(或直到第一个值>= 1)?

The fundamental misunderstanding here is in thinking that range is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

What range actually is, is a sequence, just like a list. You can even test this:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

This means it has to follow all the rules of being a sequence:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you’ll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn’t use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)


Now, there’s nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact CPython 3.0-3.1 didn’t include it.)


If range actually were a generator, like my_crappy_range, then it wouldn’t make sense to test __contains__ this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?


回答 2

使用消息来源,卢克!

在CPython中,range(...).__contains__(方法包装器)最终将委托给一个简单的计算,该计算将检查该值是否可以在该范围内。速度之所以如此,是因为我们使用关于边界的数学推理,而不是range对象的直接迭代。解释所使用的逻辑:

  1. 检查数字在start和之间stop,以及
  2. 检查步幅值是否不会“超过”我们的数字。

例如,994range(4, 1000, 2)因为:

  1. 4 <= 994 < 1000
  2. (994 - 4) % 2 == 0

完整的C代码包含在下面,由于内存管理和引用计数的详细信息,因此较为冗长,但这里存在基本思想:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

该行的“实质”在该行中提到:

/* result = ((int(ob) - start) % step) == 0 */ 

最后一点-查看range_contains代码段底部的函数。如果确切的类型检查失败,那么我们将不使用描述的巧妙算法,而是使用_PySequence_IterSearch!退回到该范围的愚蠢迭代搜索。您可以在解释器中检查此行为(我在这里使用v3.5.0):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn’t “step over” our number.

For example, 994 is in range(4, 1000, 2) because:

  1. 4 <= 994 < 1000, and
  2. (994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The “meat” of the idea is mentioned in the line:

/* result = ((int(ob) - start) % step) == 0 */ 

As a final note – look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

回答 3

为了补充Martijn的答案,这是源代码的相关部分(在C中,因为range对象是用本机代码编写的):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

因此对于PyLong对象(int在Python 3中是),它将使用该range_contains_long函数确定结果。该函数实际上检查是否ob在指定范围内(尽管在C语言中看起来更复杂)。

如果不是int对象,它将退回到迭代,直到找到(或没有)值为止。

整个逻辑可以像这样转换为伪Python:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

回答 4

如果您想知道为什么将此优化添加到range.__contains__,以及为什么将其添加到xrange.__contains__2.7:

首先,正如Ashwini Chaudhary所发现的, 发行1766304已明确打开以进行优化[x]range.__contains__接受了此修补程序并签入了3.2版本,但没有回迁到2.7版本,因为“ xrange表现得如此之久,以至于我看不到它为什么让我们提交最新的修补程序。” (当时2.7快要淘汰了。)

与此同时:

最初xrange是一个非相当序列的对象。如 3.1文档所说:

范围对象的行为很少:它们仅支持索引,迭代和 len功能。

这不是真的。一个xrange对象实际支持,与索引和自动出现一些其他的东西len *包括__contains__(通过线性搜索)。但是,没有人认为有必要在那时将它们完整地序列化。

然后,作为实现抽象基类 PEP的一部分,重要的是弄清楚应将哪些内置类型标记为实现哪些ABC和xrange/ range声称实现collections.Sequence,即使它仍仅处理相同的“非常少的行为”。在发布9213之前,没有人注意到这个问题。该问题的补丁不仅增加indexcount3.2的range,它也重新工作的优化__contains__(共享相同的数学index,并直接使用count)。** 此更改也适用于3.2,并且没有回移植到2.x,因为“这是一个添加了新方法的错误修正”。(此时,2.7已经超过了rc状态。)

因此,有两次机会可以将此优化回溯到2.7,但都被拒绝了。


*实际上,您甚至可以单独使用索引免费获得迭代,但是在2.3 xrange对象中获得了自定义迭代器。

**第一个版本实际上是重新实现了它,并且弄错了细节-例如,它将给您MyIntSubclass(2) in range(5) == False。但是Daniel Stutzbach的补丁更新版本恢复了以前的大部分代码,包括对通用代码的后备支持,_PySequence_IterSearchrange.__contains__在不应用优化的情况下会缓慢地降低3.2 之前版本的隐式使用。

If you’re wondering why this optimization was added to range.__contains__, and why it wasn’t added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because “xrange has behaved like this for such a long time that I don’t see what it buys us to commit the patch this late.” (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn’t quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same “very little behavior”. Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2’s range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).**This change went in for 3.2 as well, and was not backported to 2.x, because “it’s a bugfix that adds new methods”. (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.


* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach’s updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn’t apply.


回答 5

其他答案已经很好地说明了这一点,但是我想提供另一个实验来说明范围对象的性质:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

如您所见,范围对象是一个记住其范围的对象,可以多次使用(即使在其上进行迭代),而不仅仅是一次生成器。

The other answers explained it well already, but I’d like to offer another experiment illustrating the nature of range objects:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.


回答 6

这是关于一个偷懒的办法来评估和一些额外的优化range。直到实际使用时才需要计算范围内的值,或者由于额外的优化甚至不需要进一步计算。

顺便说一句,您的整数不是那么大,请考虑 sys.maxsize

sys.maxsize in range(sys.maxsize) 相当快

由于优化-比较给定的整数和范围的最小值和最大值很容易。

但:

Decimal(sys.maxsize) in range(sys.maxsize) 很慢

(在这种情况下, range,因此,如果python收到意外的Decimal,则python将比较所有数字)

您应该了解实现细节,但不应依赖它,因为将来可能会改变。

It’s all about a lazy approach to the evaluation and some extra optimization of range. Values in ranges don’t need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not such big, consider sys.maxsize

sys.maxsize in range(sys.maxsize) is pretty fast

due to optimization – it’s easy to compare given integer just with min and max of range.

but:

Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.

(in this case, there is no optimization in range, so if python receives unexpected Decimal, python will compare all numbers)

You should be aware of an implementation detail but should not be relied upon, because this may change in the future.


回答 7

TL; DR

传回的物件range()实际上是range对象。该对象实现了迭代器接口,因此您可以按顺序迭代其值,就像生成器,列表或元组一样。

但是,它实现了__contains__接口,该接口实际上是当对象出现在in操作员右侧时调用的接口。该__contains__()方法返回a bool左侧项目是否in在对象中。由于range对象知道其边界和步幅,因此在O(1)中非常容易实现。

TL;DR

The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface which is actually what gets called when an object appears on the right hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand-side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).


回答 8

  1. 由于优化,将给定的整数与最小和最大范围进行比较非常容易。
  2. 在Python3 中range()函数之所以如此之快,是因为这里我们对边界使用数学推理,而不是范围对象的直接迭代。
  3. 所以在这里解释逻辑:
    • 检查数字是否在开始和停止之间。
    • 检查步长精度值是否不超过我们的数字。
  4. 例如,997在range(4,1000,3)内是因为:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.

  1. Due to optimization, it is very easy to compare given integers just with min and max range.
  2. The reason that range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
  3. So for explaining the logic here:
    • Check whether the number is between the start and stop.
    • Check whether the step precision value doesn’t go over our number.
  4. Take an example, 997 is in range(4, 1000, 3) because:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.


回答 9

尝试x-1 in (i for i in range(x))使用较大的x值,该值使用生成器理解来避免调用range.__contains__优化。

Try x-1 in (i for i in range(x)) for large x values, which uses a generator comprehension to avoid invoking the range.__contains__ optimisation.