标签归档:performance

检查列表中是否存在值的最快方法

问题:检查列表中是否存在值的最快方法

最快的方法是什么才能知道列表中是否存在值(列表中包含数百万个值)及其索引是什么?

我知道列表中的所有值都是唯一的,如本例所示。

我尝试的第一种方法是(在我的实际代码中为3.8秒):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

我尝试的第二种方法是(速度提高了2倍:实际代码为1.9秒):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

堆栈溢出用户建议的方法(我的实际代码为2.74秒):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

在我的真实代码中,第一种方法耗时3.81秒,第二种方法耗时1.88秒。这是一个很好的改进,但是:

我是使用Python /脚本的初学者,有没有更快的方法来完成相同的事情并节省更多的处理时间?

我的应用程序更具体的说明:

在Blender API中,我可以访问粒子列表:

particles = [1, 2, 3, 4, etc.]

从那里,我可以访问粒子的位置:

particles[x].location = [x,y,z]

对于每个粒子,我通过搜索每个粒子位置来测试是否存在邻居:

if [x+1,y,z] in particles.location
    "Find the identity of this neighbour particle in x:the particle's index
    in the array"
    particles.index([x+1,y,z])

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

I know that all values in the list are unique as in this example.

The first method I try is (3.8 sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I try is (2x faster: 1.9 sec for my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Proposed methods from Stack Overflow user (2.74 sec for my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, the first method takes 3.81 sec and the second method takes 1.88 sec. It’s a good improvement, but:

I’m a beginner with Python/scripting, and is there a faster way to do the same things and save more processing time?

More specific explication for my application:

In the Blender API I can access a list of particles:

particles = [1, 2, 3, 4, etc.]

From there, I can access a particle’s location:

particles[x].location = [x,y,z]

And for each particle I test if a neighbour exists by searching each particle location like so:

if [x+1,y,z] in particles.location
    "Find the identity of this neighbour particle in x:the particle's index
    in the array"
    particles.index([x+1,y,z])

回答 0

7 in a

最清晰,最快的方法。

您也可以考虑使用set,但是从列表中构造该集合所花费的时间可能比更快的成员资格测试所节省的时间还要长。唯一可以确定的基准就是基准测试。(这还取决于您需要执行哪些操作)

7 in a

Clearest and fastest way to do it.

You can also consider using a set, but constructing that set from your list may take more time than faster membership testing will save. The only way to be certain is to benchmark well. (this also depends on what operations you require)


回答 1

正如其他人所述,in对于大型列表,它可能非常慢。这里是表演一些比较insetbisect。请注意时间(以秒为单位)是对数刻度。

在此处输入图片说明

测试代码:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a,b,c):
    start_time = time.time()
    for i,x in enumerate(a):
        if x in b:
            c[i] = 1
    return(time.time()-start_time)   

def method_set_in(a,b,c):
    start_time = time.time()
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = 1
    return(time.time()-start_time)

def method_bisect(a,b,c):
    start_time = time.time()
    b.sort()
    for i,x in enumerate(a):
        index = bisect.bisect_left(b,x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return(time.time()-start_time)

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = [x for x in range(1000,20000,1000)]
    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        time_method_in.append(math.log(method_in(a,b,c)))
        time_method_set_in.append(math.log(method_set_in(a,b,c)))
        time_method_bisect.append(math.log(method_bisect(a,b,c)))

    plt.plot(Nls,time_method_in,marker='o',color='r',linestyle='-',label='in')
    plt.plot(Nls,time_method_set_in,marker='o',color='b',linestyle='-',label='set')
    plt.plot(Nls,time_method_bisect,marker='o',color='g',linestyle='-',label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

As stated by others, in can be very slow for large lists. Here are some comparisons of the performances for in, set and bisect. Note the time (in second) is in log scale.

enter image description here

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a,b,c):
    start_time = time.time()
    for i,x in enumerate(a):
        if x in b:
            c[i] = 1
    return(time.time()-start_time)   

def method_set_in(a,b,c):
    start_time = time.time()
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = 1
    return(time.time()-start_time)

def method_bisect(a,b,c):
    start_time = time.time()
    b.sort()
    for i,x in enumerate(a):
        index = bisect.bisect_left(b,x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return(time.time()-start_time)

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = [x for x in range(1000,20000,1000)]
    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        time_method_in.append(math.log(method_in(a,b,c)))
        time_method_set_in.append(math.log(method_set_in(a,b,c)))
        time_method_bisect.append(math.log(method_bisect(a,b,c)))

    plt.plot(Nls,time_method_in,marker='o',color='r',linestyle='-',label='in')
    plt.plot(Nls,time_method_set_in,marker='o',color='b',linestyle='-',label='set')
    plt.plot(Nls,time_method_bisect,marker='o',color='g',linestyle='-',label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

回答 2

您可以将物品放入set。集合查找非常有效。

尝试:

s = set(a)
if 7 in s:
  # do stuff

编辑在注释中,您说您想获取元素的索引。不幸的是,集合没有元素位置的概念。另一种方法是对列表进行预排序,然后在每次需要查找元素时使用二进制搜索

You could put your items into a set. Set lookups are very efficient.

Try:

s = set(a)
if 7 in s:
  # do stuff

edit In a comment you say that you’d like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.


回答 3

def check_availability(element, collection: iter):
    return element in collection

用法

check_availability('a', [1,2,3,4,'a','b','c'])

我相信这是知道所选值是否在数组中的最快方法。

def check_availability(element, collection: iter):
    return element in collection

Usage

check_availability('a', [1,2,3,4,'a','b','c'])

I believe this is the fastest way to know if a chosen value is in an array.


回答 4

a = [4,2,3,1,5,6]

index = dict((y,x) for x,y in enumerate(a))
try:
   a_index = index[7]
except KeyError:
   print "Not found"
else:
   print "found"

如果a不变,这将是一个好主意,因此我们可以做一次dict()部分,然后重复使用它。如果确实发生变化,请提供您正在做的更多详细信息。

a = [4,2,3,1,5,6]

index = dict((y,x) for x,y in enumerate(a))
try:
   a_index = index[7]
except KeyError:
   print "Not found"
else:
   print "found"

This will only be a good idea if a doesn’t change and thus we can do the dict() part once and then use it repeatedly. If a does change, please provide more detail on what you are doing.


回答 5

最初的问题是:

最快的方法是什么才能知道列表中是否存在值(列表中包含数百万个值)及其索引是什么?

因此,有两件事可以找到:

  1. 是列表中的一项,并且
  2. 什么是索引(如果在列表中)。

为此,我修改了@xslittlegrass代码以在所有情况下计算索引,并添加了其他方法。

结果

在此处输入图片说明

方法是:

  1. in-基本上如果b中的x:返回b.index(x)
  2. try–try / catch on b.index(x)(跳过必须检查b中的x)
  3. set-基本上,如果x在set(b)中:返回b.index(x)
  4. bisect-用索引对其b进行排序,对sorted(b)中的x进行二进制搜索。请注意@xslittlegrass的mod,它返回排序后的b中的索引,而不是原始b)
  5. 反向-为b形成反向查找字典d; 然后d [x]提供x的索引。

结果表明,方法5最快。

有趣的是,tryset方法在时间上是等效的。


测试代码

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    " Use to produced 0 argument function for call it"
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(a):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()

The original question was:

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

Thus there are two things to find:

  1. is an item in the list, and
  2. what is the index (if in the list).

Towards this, I modified @xslittlegrass code to compute indexes in all cases, and added an additional method.

Results

enter image description here

Methods are:

  1. in–basically if x in b: return b.index(x)
  2. try–try/catch on b.index(x) (skips having to check if x in b)
  3. set–basically if x in set(b): return b.index(x)
  4. bisect–sort b with its index, binary search for x in sorted(b). Note mod from @xslittlegrass who returns the index in the sorted b, rather than the original b)
  5. reverse–form a reverse lookup dictionary d for b; then d[x] provides the index of x.

Results show that method 5 is the fastest.

Interestingly the try and the set methods are equivalent in time.


Test Code

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    " Use to produced 0 argument function for call it"
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(a):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()

回答 6

听起来您的应用程序可能会受益于使用Bloom Filter数据结构的优势。

简而言之,布隆过滤器查询可以很快告诉您集合中是否绝对没有值。否则,您可以进行较慢的查找,以获取列表中可能存在的值的索引。因此,如果您的应用程序倾向于比“已找到”结果更频繁地获得“未找到”结果,则可以通过添加Bloom Filter来加快速度。

有关详细信息,Wikipedia很好地概述了布隆过滤器的工作方式,并且对“ python布隆过滤器库”的网络搜索将至少提供一些有用的实现。

It sounds like your application might gain advantage from the use of a Bloom Filter data structure.

In short, a bloom filter look-up can tell you very quickly if a value is DEFINITELY NOT present in a set. Otherwise, you can do a slower look-up to get the index of a value that POSSIBLY MIGHT BE in the list. So if your application tends to get the “not found” result much more often then the “found” result, you might see a speed up by adding a Bloom Filter.

For details, Wikipedia provides a good overview of how Bloom Filters work, and a web search for “python bloom filter library” will provide at least a couple useful implementations.


回答 7

请注意,in运算符不仅测试相等性(==),还测试身份(is),s 的in逻辑大致等同于以下内容(它实际上是用C编写的,但不是用Python编写的,至少是用CPython编写的):list

for element in s:
    if element is target:
        # fast check for identity implies equality
        return True
    if element == target:
        # slower check for actual equality
        return True
return False

在大多数情况下,这个细节是无关紧要的,但是在某些情况下,它可能会使Python新手感到惊讶,例如,numpy.NAN具有不等于自身的异常特性:

>>> import numpy
>>> numpy.NAN == numpy.NAN
False
>>> numpy.NAN is numpy.NAN
True
>>> numpy.NAN in [numpy.NAN]
True

要区分这些异常情况,可以使用any()

>>> lst = [numpy.NAN, 1 , 2]
>>> any(element == numpy.NAN for element in lst)
False
>>> any(element is numpy.NAN for element in lst)
True 

注意s 的in逻辑为:listany()

any(element is target or element == target for element in lst)

但是,我要强调的是,这是一个in极端的情况,在绝大多数情况下,运算符都是经过高度优化的,而这正是您想要的(当然是使用a list或使用a set)。

Be aware that the in operator tests not only equality (==) but also identity (is), the in logic for lists is roughly equivalent to the following (it’s actually written in C and not Python though, at least in CPython):

for element in s:
    if element is target:
        # fast check for identity implies equality
        return True
    if element == target:
        # slower check for actual equality
        return True
return False

In most circumstances this detail is irrelevant, but in some circumstances it might leave a Python novice surprised, for example, numpy.NAN has the unusual property of being not being equal to itself:

>>> import numpy
>>> numpy.NAN == numpy.NAN
False
>>> numpy.NAN is numpy.NAN
True
>>> numpy.NAN in [numpy.NAN]
True

To distinguish between these unusual cases you could use any() like:

>>> lst = [numpy.NAN, 1 , 2]
>>> any(element == numpy.NAN for element in lst)
False
>>> any(element is numpy.NAN for element in lst)
True 

Note the in logic for lists with any() would be:

any(element is target or element == target for element in lst)

However, I should emphasize that this is an edge case, and for the vast majority of cases the in operator is highly optimised and exactly what you want of course (either with a list or with a set).


回答 8

或使用__contains__

sequence.__contains__(value)

演示:

>>> l=[1,2,3]
>>> l.__contains__(3)
True
>>> 

Or use __contains__:

sequence.__contains__(value)

Demo:

>>> l=[1,2,3]
>>> l.__contains__(3)
True
>>> 

回答 9

@Winston Ewert的解决方案极大地提高了非常大的列表的速度,但是这个stackoverflow答案表明,如果经常到达除外分支,则try:/ except:/ else:构造将变慢。一种替代方法是利用该.get()方法使用dict:

a = [4,2,3,1,5,6]

index = dict((y, x) for x, y in enumerate(a))

b = index.get(7, None)
if b is not None:
    "Do something with variable b"

.get(key, default)方法仅适用于无法保证键将包含在dict中的情况。如果关键存在,它返回值(将dict[key]),但是,当它不是,.get()返回默认值(在这里None)。在这种情况下,您需要确保所选的默认值不会在中a

@Winston Ewert’s solution yields a big speed-up for very large lists, but this stackoverflow answer indicates that the the try:/except:/else: construct will be slowed down if the except branch is often reached. An alternative is to take advantage of the .get() method for the dict:

a = [4,2,3,1,5,6]

index = dict((y, x) for x, y in enumerate(a))

b = index.get(7, None)
if b is not None:
    "Do something with variable b"

The .get(key, default) method is just for the case when you can’t guarantee a key will be in the dict. If key is present, it returns the value (as would dict[key]), but when it is not, .get() returns your default value (here None). You need to make sure in this case that the chosen default will not be in a.


回答 10

这不是代码,而是用于快速搜索的算法。

如果您的列表和要查找的值都是数字,那么这很简单。如果是字符串:请看底部:

  • -让“ n”为列表的长度
  • -可选步骤:如果需要元素索引:将第二列添加到元素的当前索引(0到n-1)-稍后再说
  • 订购列表或列表的副本(.sort())
  • 依次通过:
    • 将您的数字与列表的第n / 2个元素进行比较
      • 如果更大,则在索引n / 2-n之间再次循环
      • 如果较小,则在索引0-n / 2之间再次循环
      • 如果相同:您找到了
  • 不断缩小列表的范围,直到找到它或只有2个数字(在您要查找的数字的下方和上方)
  • 这将在最多19个步骤中找到1.000.000列表中的任何元素(准确地说是log(2)n)

如果您还需要号码的原始位置,请在第二个索引列中查找。

如果您的列表不是由数字组成的,则该方法仍然有效并且将是最快的,但是您可能需要定义一个可以比较/排序字符串的函数。

当然,这需要sorted()方法的投资,但是如果您继续重复使用相同的列表进行检查,那可能是值得的。

This is not the code, but the algorithm for very fast searching.

If your list and the value you are looking for are all numbers, this is pretty straightforward. If strings: look at the bottom:

  • -Let “n” be the length of your list
  • -Optional step: if you need the index of the element: add a second column to the list with current index of elements (0 to n-1) – see later
  • Order your list or a copy of it (.sort())
  • Loop through:
    • Compare your number to the n/2th element of the list
      • If larger, loop again between indexes n/2-n
      • If smaller, loop again between indexes 0-n/2
      • If the same: you found it
  • Keep narrowing the list until you have found it or only have 2 numbers (below and above the one you are looking for)
  • This will find any element in at most 19 steps for a list of 1.000.000 (log(2)n to be precise)

If you also need the original position of your number, look for it in the second, index column.

If your list is not made of numbers, the method still works and will be fastest, but you may need to define a function which can compare/order strings.

Of course, this needs the investment of the sorted() method, but if you keep reusing the same list for checking, it may be worth it.


回答 11

因为问题不一定总是被理解为最快的技术方法-我总是建议理解/编写最直接的最快方法:列表理解,单线

[i for i in list_from_which_to_search if i in list_to_search_in]

我对list_to_search_in所有项目都拥有一个,并想返回中的项目索引list_from_which_to_search

这将在一个不错的列表中返回索引。

还有其他方法可以解决此问题-但是列表理解速度足够快,并且可以以足够快的速度编写它来解决问题。

Because the question is not always supposed to be understood as the fastest technical way – I always suggest the most straightforward fastest way to understand/write: a list comprehension, one-liner

[i for i in list_from_which_to_search if i in list_to_search_in]

I had a list_to_search_in with all the items, and wanted to return the indexes of the items in the list_from_which_to_search.

This returns the indexes in a nice list.

There are other ways to check this problem – however list comprehensions are quick enough, adding to the fact of writing it quick enough, to solve a problem.


回答 12

对我而言,这是0.030秒(实际),0.026秒(用户)和0.004秒(系统)。

try:
print("Started")
x = ["a", "b", "c", "d", "e", "f"]

i = 0

while i < len(x):
    i += 1
    if x[i] == "e":
        print("Found")
except IndexError:
    pass

For me it was 0.030 sec (real), 0.026 sec (user), and 0.004 sec (sys).

try:
print("Started")
x = ["a", "b", "c", "d", "e", "f"]

i = 0

while i < len(x):
    i += 1
    if x[i] == "e":
        print("Found")
except IndexError:
    pass

回答 13

检查乘积等于k的数组中是否存在两个元素的代码:

n = len(arr1)
for i in arr1:
    if k%i==0:
        print(i)

Code to check whether two elements exist in array whose product equals k:

n = len(arr1)
for i in arr1:
    if k%i==0:
        print(i)

获得两个列表之间的差异

问题:获得两个列表之间的差异

我在Python中有两个列表,如下所示:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

我需要用第一个列表中没有的项目创建第三个列表。从示例中,我必须得到:

temp3 = ['Three', 'Four']

有没有循环和检查的快速方法吗?

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with items from the first list which aren’t present in the second one. From the example I have to get:

temp3 = ['Three', 'Four']

Are there any fast ways without cycles and checking?


回答 0

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

当心

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

您可能希望/希望它等于的位置set([1, 3])。如果确实要set([1, 3])作为答案,则需要使用set([1, 2]).symmetric_difference(set([2, 3]))

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you’ll need to use set([1, 2]).symmetric_difference(set([2, 3])).


回答 1

现有解决方案均提供以下一项或多项:

  • 比O(n * m)性能快。
  • 保留输入列表的顺序。

但是到目前为止,还没有解决方案。如果两者都想要,请尝试以下操作:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

性能测试

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000)
print timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000)
print timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000)

结果:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

我介绍的方法以及保留顺序也比集合减法要快(略),因为它不需要构造不必要的集合。如果第一个列表比第二个列表长很多,并且散列很昂贵,则性能差异将更加明显。这是第二个测试,证明了这一点:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

结果:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000)
print timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000)
print timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000)

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

The method I presented as well as preserving order is also (slightly) faster than the set subtraction because it doesn’t require construction of an unnecessary set. The performance difference would be more noticable if the first list is considerably longer than the second and if hashing is expensive. Here’s a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer

回答 2

temp3 = [item for item in temp1 if item not in temp2]
temp3 = [item for item in temp1 if item not in temp2]

回答 3

可以使用以下简单函数找到两个列表(例如list1和list2)之间的差异。

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

要么

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

通过使用上述功能,可以使用diff(temp2, temp1)或找到差异diff(temp1, temp2)。两者都会给出结果['Four', 'Three']。您不必担心列表的顺序或先给出哪个列表。

Python文档参考

The difference between two lists (say list1 and list2) can be found using the following simple function.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

or

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

By Using the above function, the difference can be found using diff(temp2, temp1) or diff(temp1, temp2). Both will give the result ['Four', 'Three']. You don’t have to worry about the order of the list or which list is to be given first.

Python doc reference


回答 4

如果您需要递归的区别,我已经为python编写了一个软件包:https : //github.com/seperman/deepdiff

安装

从PyPi安装:

pip install deepdiff

用法示例

输入

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

同一对象返回空

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

项目类型已更改

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

物品的价值已经改变

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

添加和/或删除项目

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

弦差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

弦差异2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

类型变更

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

清单差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

清单差异2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

列出差异忽略顺序或重复项:(具有与上述相同的字典)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

包含字典的列表:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

套装:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

命名元组:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

自定义对象:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

对象属性添加:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

In case you want the difference recursively, I have written a package for python: https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named Tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

回答 5

可以使用python XOR运算符完成。

  • 这将删除每个列表中的重复项
  • 这将显示temp1与temp2和temp2与temp1的差异。

set(temp1) ^ set(temp2)

Can be done using python XOR operator.

  • This will remove the duplicates in each list
  • This will show difference of temp1 from temp2 and temp2 from temp1.

set(temp1) ^ set(temp2)

回答 6

最简单的方法

使用set()。difference(set())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

答案是 set([1])

可以打印为列表,

print list(set(list_a).difference(set(list_b)))

most simple way,

use set().difference(set())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

answer is set([1])

can print as a list,

print list(set(list_a).difference(set(list_b)))

回答 7

如果您真的在考虑性能,请使用numpy!

这是完整的笔记本,是github上的要点,其中包括list,numpy和pandas之间的比较。

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

在此处输入图片说明

If you are really looking into performance, then use numpy!

Here is the full notebook as a gist on github with comparison between list, numpy, and pandas.

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

enter image description here


回答 8

因为目前的解决方案都无法产生元组,所以我会抛出:

temp3 = tuple(set(temp1) - set(temp2))

或者:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
temp3 = tuple(x for x in temp1 if x not in set(temp2))

像其他非元组在该方向上产生答案一样,它保留了顺序

i’ll toss in since none of the present solutions yield a tuple:

temp3 = tuple(set(temp1) - set(temp2))

alternatively:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
temp3 = tuple(x for x in temp1 if x not in set(temp2))

Like the other non-tuple yielding answers in this direction, it preserves order


回答 9

我想要的东西,将采取两个列表,并可以做什么diffbash呢。因为当您搜索“ python diff two list”时该问题首先弹出,并且不是很具体,所以我将发布我提出的内容。

使用SequenceMatherfrom difflib可以像比较两个列表diff。其他答案都不会告诉您差异发生的位置,但是这个答案确实可以。一些答案只能在一个方向上有所不同。一些重新排列元素。有些不处理重复项。但是此解决方案为您提供了两个列表之间的真正区别:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

输出:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

当然,如果您的应用程序做出与其他答案相同的假设,则您将从中受益最大。但是,如果您正在寻找真正的diff功能,那么这是唯一的方法。

例如,其他答案都无法处理:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

但这确实做到了:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]

I wanted something that would take two lists and could do what diff in bash does. Since this question pops up first when you search for “python diff two lists” and is not very specific, I will post what I came up with.

Using SequenceMather from difflib you can compare two lists like diff does. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some don’t handle duplicates. But this solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for a true diff functionality, then this is the only way to go.

For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]

回答 10

尝试这个:

temp3 = set(temp1) - set(temp2)

Try this:

temp3 = set(temp1) - set(temp2)

回答 11

这可能比Mark的列表理解速度还要快:

list(itertools.filterfalse(set(temp2).__contains__, temp1))

this could be even faster than Mark’s list comprehension:

list(itertools.filterfalse(set(temp2).__contains__, temp1))

回答 12

这是Counter最简单情况的答案。

这比上面的双向差异短,因为它只完全满足问题的要求:生成第一个列表的列表,而不生成第二个列表。

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

另外,根据您对可读性的偏好,它可以提供不错的一线:

diff = list((Counter(lst1) - Counter(lst2)).elements())

输出:

['Three', 'Four']

请注意,list(...)如果仅在呼叫上进行迭代,则可以将其删除。

由于此解决方案使用计数器,因此与许多基于集合的答案相比,它可以正确处理数量。例如在此输入上:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

输出为:

['Two', 'Two', 'Three', 'Three', 'Four']

Here’s a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what’s in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly vs the many set-based answers. For example on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']

回答 13

如果对difflist的元素进行排序和设置,则可以使用幼稚的方法。

list1=[1,2,3,4,5]
list2=[1,2,3]

print list1[len(list2):]

或使用本机set方法:

subset=set(list1).difference(list2)

print subset

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print "Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000)
print "Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000)

天真的解决方案:0.0787101593292

本机设置解决方案:0.998837615564

You could use a naive method if the elements of the difflist are sorted and sets.

list1=[1,2,3,4,5]
list2=[1,2,3]

print list1[len(list2):]

or with native set methods:

subset=set(list1).difference(list2)

print subset

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print "Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000)
print "Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000)

Naive solution: 0.0787101593292

Native set solution: 0.998837615564


回答 14

为此我在游戏中为时不晚,但是您可以将上述某些代码的性能与此进行比较,其中两个最快的竞争者是:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

对于基本的编码我深表歉意。

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    if z == 0:
        start = time.clock()
        lists = (str(list(set(x)-set(y))+list(set(y)-set(y))))
        times = ("1 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 1:
        start = time.clock()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 2:
        start = time.clock()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 3:
        start = time.clock()
        lists = (filterfalse(set(y).__contains__, x))
        times = ("4 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 4:
        start = time.clock()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 5:
        start = time.clock()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.clock() - start))
        return (lists,times)

    else:    
        start = time.clock()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.clock() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

I am little too late in the game for this but you can do a comparison of performance of some of the above mentioned code with this, two of the fastest contenders are,

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    if z == 0:
        start = time.clock()
        lists = (str(list(set(x)-set(y))+list(set(y)-set(y))))
        times = ("1 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 1:
        start = time.clock()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 2:
        start = time.clock()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 3:
        start = time.clock()
        lists = (filterfalse(set(y).__contains__, x))
        times = ("4 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 4:
        start = time.clock()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 5:
        start = time.clock()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.clock() - start))
        return (lists,times)

    else:    
        start = time.clock()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.clock() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

回答 15

这是一些比较两个字符串列表的简单的,保留顺序的方法。

一种不寻常的方法,使用pathlib

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

假设两个列表都包含以相同的开头的字符串。有关更多详细信息,请参阅文档。注意,与设置操作相比,它并不是特别快。


使用以下方法的直接实现itertools.zip_longest

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

This assumes both lists contain strings with equivalent beginnings. See the docs for more details. Note, it is not particularly fast compared to set operations.


A straight-forward implementation using itertools.zip_longest:

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']

回答 16

这是另一个解决方案:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

This is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

回答 17

如果遇到问题TypeError: unhashable type: 'list',则需要将列表或集合转换为元组,例如

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

另请参阅如何在python中比较列表/集合的列表?

If you run into TypeError: unhashable type: 'list' you need to turn lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

See also How to compare a list of lists/sets in python?


回答 18

假设我们有两个清单

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

从上面的两个列表中可以看出,列表2中存在项目1、3、5,而项目7、9中则不存在。另一方面,列表1中存在项目1、3、5,而项目2、4中不存在。

返回包含项目7、9和2、4的新列表的最佳解决方案是什么?

上面的所有答案都找到了解决方案,现在最最佳的是什么?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

使用timeit我们可以看到结果

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, 
list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, 
list1, list2")

print('Using two for loops', t1.timeit(number=100000), 'Milliseconds')
print('Using two for loops', t2.timeit(number=100000), 'Milliseconds')

退货

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 Milliseconds
Using symmetric_difference 0.11285737506113946 Milliseconds

Process finished with exit code 0

Let’s say we have two lists

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

we can see from the above two lists that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All answers above find the solution, now whats the most optimal?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit we can see the results

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, 
list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, 
list1, list2")

print('Using two for loops', t1.timeit(number=100000), 'Milliseconds')
print('Using two for loops', t2.timeit(number=100000), 'Milliseconds')

returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 Milliseconds
Using symmetric_difference 0.11285737506113946 Milliseconds

Process finished with exit code 0

回答 19

arulmr解决方案的单行版本

def diff(listA, listB):
    return set(listA) - set(listB) | set(listA) -set(listB)

single line version of arulmr solution

def diff(listA, listB):
    return set(listA) - set(listB) | set(listA) -set(listB)

回答 20

如果您想要更像变更集的东西…可以使用Counter

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to list b, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

if you want something more like a changeset… could use Counter

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to list b, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

回答 21

我们可以计算交集减去列表的并集:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 

We can calculate intersection minus union of lists:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 

回答 22

只需一行即可解决。给定的问题是两个列表(temp1和temp2)在第三个列表(temp3)中返​​回它们的差。

temp3 = list(set(temp1).difference(set(temp2)))

This can be solved with one line. The question is given two lists (temp1 and temp2) return their difference in a third list (temp3).

temp3 = list(set(temp1).difference(set(temp2)))

回答 23

这是区分两个列表(无论内容如何)的一种简单方法,您可以得到如下所示的结果:

>>> from sets import Set
>>>
>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> Set(l1).symmetric_difference(Set(l2))
Set([False, 'xvdbd', None, 12])

希望这会有所帮助。

Here is an simple way to distinguish two lists (whatever the contents are), you can get the result as shown below :

>>> from sets import Set
>>>
>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> Set(l1).symmetric_difference(Set(l2))
Set([False, 'xvdbd', None, 12])

Hope this will helpful.


回答 24

我更喜欢使用转换为集合,然后使用“ difference()”函数。完整的代码是:

temp1 = ['One', 'Two', 'Three', 'Four'  ]                   
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

输出:

>>>print(temp3)
['Three', 'Four']

这是最容易理解的,如果将来使用大数据,将来会更容易,如果不需要重复数据,将其转换为数据集将删除重复数据。希望能帮助到你 ;-)

I prefer to use converting to sets and then using the “difference()” function. The full code is :

temp1 = ['One', 'Two', 'Three', 'Four'  ]                   
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output:

>>>print(temp3)
['Three', 'Four']

It’s the easiest to undersand, and morover in future if you work with large data, converting it to sets will remove duplicates if duplicates are not required. Hope it helps ;-)


回答 25

(list(set(a)-set(b))+list(set(b)-set(a)))
(list(set(a)-set(b))+list(set(b)-set(a)))

回答 26

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

例如,如果list1 = [10, 15, 20, 25, 30, 35, 40]list2 = [25, 40, 35]则返回的列表将是output = [10, 20, 30, 15]

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

e.g. if list1 = [10, 15, 20, 25, 30, 35, 40] and list2 = [25, 40, 35] then the returned list will be output = [10, 20, 30, 15]


为什么[]比list()快?

问题:为什么[]比list()快?

我最近比较了[]和的处理速度,并list()惊讶地发现它的[]运行速度是的三倍以上list()。我跑了相同的测试与{}dict(),结果几乎相同:[]{}两个花了大约0.128sec /百万次,而list()dict()把每个粗0.428sec /万次。

为什么是这样?不要[]{}(可能()'',太)立即传回了一些空的股票面值的副本,而其明确命名同行(list()dict()tuple()str())完全去创建一个对象,他们是否真的有元素?

我不知道这两种方法有何不同,但我很想找出答案。我在文档中或SO上都找不到答案,而寻找空括号却比我预期的要麻烦得多。

通过分别调用timeit.timeit("[]")timeit.timeit("list()"),和timeit.timeit("{}")timeit.timeit("dict()")来比较列表和字典,以获得计时结果。我正在运行Python 2.7.9。

我最近发现“ 为什么True慢于if? ”比较了if Trueto 的性能,if 1并且似乎触及了类似的文字对全局场景;也许也值得考虑。

I recently compared the processing speeds of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and the results were practically identical: [] and {} both took around 0.128sec / million cycles, while list() and dict() took roughly 0.428sec / million cycles each.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copies of some empty stock literal while their explicitly-named counterparts (list(), dict(), tuple(), str()) fully go about creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I’d love to find out. I couldn’t find an answer in the docs or on SO, and searching for empty brackets turned out to be more problematic than I’d expected.

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries, respectively. I’m running Python 2.7.9.

I recently discovered “Why is if True slower than if 1?” that compares the performance of if True to if 1 and seems to touch on a similar literal-versus-global scenario; perhaps it’s worth considering as well.


回答 0

因为[]{}文字语法。Python可以创建字节码仅用于创建列表或字典对象:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list()dict()是单独的对象。它们的名称需要解析,必须包含堆栈以推入参数,必须存储框架以供以后检索,并且必须进行调用。这都需要更多时间。

对于空的情况,这意味着您至少要有一个LOAD_NAME(必须在全局命名空间以及__builtin__模块中进行搜索),后跟一个CALL_FUNCTION必须保留当前帧的:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

您可以使用以下命令分别计时名称查找timeit

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

时间差异可能是字典哈希冲突。从调用这些对象的时间中减去这些时间,然后将结果与使用文字的时间进行比较:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

因此,1.00 - 0.31 - 0.30 == 0.39每1000万次调用必须调用该对象花费了额外的几秒钟。

您可以通过将全局名称别名为本地名称来避免全局查找成本(使用timeit设置,绑定到名称的所有内容都是本地名称):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

但您永远无法克服这些CALL_FUNCTION成本。

Because [] and {} are literal syntax. Python can create bytecode just to create the list or dictionary objects:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list() and dict() are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (using a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you never can overcome that CALL_FUNCTION cost.


回答 1

list()需要全局查找和函数调用,但需要[]编译为一条指令。看到:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

回答 2

因为list是一个功能转化说一个字符串列表对象,而[]用于创建一个列表蝙蝠。尝试一下(可能对您更有意义):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

y = ["wham bam"]
>>> y
["wham bam"]

为您提供包含您所输入内容的实际列表。

Because list is a function to convert say a string to a list object, while [] is used to create a list off the bat. Try this (might make more sense to you):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

While

y = ["wham bam"]
>>> y
["wham bam"]

Gives you a actual list containing whatever you put in it.


回答 3

至此,答案非常好,并完全涵盖了这个问题。对于那些感兴趣的人,我将进一步从字节码中删除。我正在使用CPython的最新仓库;在这方面,旧版本的行为类似,但可能会稍作更改。

这是每个BUILD_LIST针对for []CALL_FUNCTIONfor 的执行情况的细分list()


BUILD_LIST指令:

您应该只查看恐怖:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

我知道那令人费解。这是多么简单:

  • 使用创建新列表PyList_New(主要是为新的列表对象分配内存),以oparg信号指示堆栈上的参数数量。开门见山。
  • 检查是否没有问题if (list==NULL)
  • 使用PyList_SET_ITEM(宏)添加位于堆栈上的所有参数(在我们的示例中此参数未执行)。

难怪它很快!它是为创建新列表而定制的,仅此而已:-)

CALL_FUNCTION指令:

窥视代码处理时,这是您看到的第一件事CALL_FUNCTION

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

看起来很无害吧?好吧,不是,不幸的call_function是,不是一个会立即调用该函数的直截了当的家伙,它不会。相反,它从堆栈中获取对象,获取堆栈中的所有参数,然后根据对象的类型进行切换。它是:

我们正在调用list类型,传入的参数call_functionPyList_Type。CPython现在必须调用一个泛型函数来处理名为的所有可调用对象_PyObject_FastCallKeywords,还有更多函数调用。

该函数再次检查某些函数类型(我不明白为什么),然后在为kwargs创建字典后,如果需要,继续调用_PyObject_FastCallDict

_PyObject_FastCallDict终于把我们带到某个地方!执行后甚至更多的检查抓住了tp_call从插槽type中的type我们在通过了,那就是它抓住type.tp_call。然后,它根据传入的参数来创建元组_PyStack_AsTuple,最后可以最终进行调用

tp_call,它将匹配type.__call__并最终创建列表对象。它调用与之__new__对应的列表PyType_GenericNew并为其分配内存PyType_GenericAlloc这实际上是它与追上的部分PyList_New,最后。所有以前的内容对于以通用方式处理对象都是必需的。

最后,使用任何可用参数type_call调用list.__init__并初始化列表,然后继续返回原来的方式。:-)

最后,记住 LOAD_NAME,这是另一个在这里做出贡献的家伙。


很容易看到,在处理我们的输入时,Python通常必须跳过圈以真正找到合适的C函数来完成工作。它不具有立即调用它的功能,因为它是动态的,有人可能会掩盖list并且男孩会做很多人做的事情),因此必须采取另一条路。

这是哪里 list()损失很多的地方:正在探索的Python需要做以找出它应该做什么。

另一方面,字面语法恰好意味着一回事。它无法更改,并且始终以预定的方式运行。

脚注:所有功能名称均可能从一个版本更改为另一个版本。关键点仍然存在,并且很可能在将来的任何版本中都存在,这是动态查找使事情变慢的原因。

The answers here are great, to the point and fully cover this question. I’ll drop a further step down from byte-code for those interested. I’m using the most recent repo of CPython; older versions behave similar in this regard but slight changes might be in place.

Here’s a break down of the execution for each of these, BUILD_LIST for [] and CALL_FUNCTION for list().


The BUILD_LIST instruction:

You should just view the horror:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

Terribly convoluted, I know. This is how simple it is:

  • Create a new list with PyList_New (this mainly allocates the memory for a new list object), oparg signalling the number of arguments on the stack. Straight to the point.
  • Check that nothing went wrong with if (list==NULL).
  • Add any arguments (in our case this isn’t executed) located on the stack with PyList_SET_ITEM (a macro).

No wonder it is fast! It’s custom-made for creating new lists, nothing else :-)

The CALL_FUNCTION instruction:

Here’s the first thing you see when you peek at the code handling CALL_FUNCTION:

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

Looks pretty harmless, right? Well, no, unfortunately not, call_function is not a straightforward guy that will call the function immediately, it can’t. Instead, it grabs the object from the stack, grabs all arguments of the stack and then switches based on the type of the object; is it a:

We’re calling the list type, the argument passed in to call_function is PyList_Type. CPython now has to call a generic function to handle any callable objects named _PyObject_FastCallKeywords, yay more function calls.

This function again makes some checks for certain function types (which I cannot understand why) and then, after creating a dict for kwargs if required, goes on to call _PyObject_FastCallDict.

_PyObject_FastCallDict finally gets us somewhere! After performing even more checks it grabs the tp_call slot from the type of the type we’ve passed in, that is, it grabs type.tp_call. It then proceeds to create a tuple out of of the arguments passed in with _PyStack_AsTuple and, finally, a call can finally be made!

tp_call, which matches type.__call__ takes over and finally creates the list object. It calls the lists __new__ which corresponds to PyType_GenericNew and allocates memory for it with PyType_GenericAlloc: This is actually the part where it catches up with PyList_New, finally. All the previous are necessary to handle objects in a generic fashion.

In the end, type_call calls list.__init__ and initializes the list with any available arguments, then we go on a returning back the way we came. :-)

Finally, remmeber the LOAD_NAME, that’s another guy that contributes here.


It’s easy to see that, when dealing with our input, Python generally has to jump through hoops in order to actually find out the appropriate C function to do the job. It doesn’t have the curtesy of immediately calling it because it’s dynamic, someone might mask list (and boy do many people do) and another path must be taken.

This is where list() loses much: The exploring Python needs to do to find out what the heck it should do.

Literal syntax, on the other hand, means exactly one thing; it cannot be changed and always behaves in a pre-determined way.

Footnote: All function names are subject to change from one release to the other. The point still stands and most likely will stand in any future versions, it’s the dynamic look-up that slows things down.


回答 4

为什么[]要比list()

最大的原因是Python list()就像用户定义的函数一样对待,这意味着您可以通过别名别名来拦截它list并做一些不同的事情(例如使用您自己的子类列表或双端队列)。

它将立即使用创建新的内置列表实例[]

我的解释旨在为您提供直觉。

说明

[] 通常称为文字语法。

在语法中,这称为“列表显示”。从文档

列表显示是括在方括号中的一系列可能为空的表达式:

list_display ::=  "[" [starred_list | comprehension] "]"

列表显示将产生一个新的列表对象,其内容由表达式列表或理解列表指定。提供逗号分隔的表达式列表时,将按从左到右的顺序评估其元素,并将其按此顺序放入列表对象中。提供理解后,将根据理解产生的元素来构建列表。

简而言之,这意味着将list创建一个内置类型的对象。

不能回避这一点-这意味着Python可以尽快完成它。

另一方面,list()可以list使用内置列表构造函数拦截创建内置对象的过程。

例如,假设我们希望创建噪音较大的列表:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

然后,我们可以list在模块级别的全局范围内截取该名称,然后在创建时list,实际上创建了子类型列表:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

同样,我们可以将其从全局命名空间中删除

del list

并将其放在内置命名空间中:

import builtins
builtins.list = List

现在:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

并注意列表显示无条件创建列表:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

我们可能只是暂时执行此操作,所以请撤消更改-首先List从内置文件中删除新对象:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

哦,不,我们失去了原来的踪迹。

不用担心,我们仍然可以得到list-它是列表文字的类型:

>>> builtins.list = type([])
>>> list()
[]

所以…

为什么[]要比list()

如我们所见-我们可以覆盖list-但是我们不能截取文字类型的创建。使用时,list我们必须进行查找以查看是否存在任何内容。

然后,我们必须调用已查找的任何可调用对象。从语法上:

调用使用一系列可能为空的参数来调用可调用对象(例如,函数):

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

我们可以看到它对任何名称都具有相同的作用,而不仅仅是列表:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

因为[]在Python字节码级别没有函数调用:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

它只是直接建立列表而无需在字节码级别进行任何查找或调用。

结论

我们已经证明了list可以使用范围规则用用户代码拦截,并且可以list()查找可调用对象然后调用它。

[]列表显示或文字显示则避免了名称查找和函数调用。

Why is [] faster than list()?

The biggest reason is that Python treats list() just like a user-defined function, which means you can intercept it by aliasing something else to list and do something different (like use your own subclassed list or perhaps a deque).

It immediately creates a new instance of a builtin list with [].

My explanation seeks to give you the intuition for this.

Explanation

[] is commonly known as literal syntax.

In the grammar, this is referred to as a “list display”. From the docs:

A list display is a possibly empty series of expressions enclosed in square brackets:

list_display ::=  "[" [starred_list | comprehension] "]"

A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension.

In short, this means that a builtin object of type list is created.

There is no circumventing this – which means Python can do it as quickly as it may.

On the other hand, list() can be intercepted from creating a builtin list using the builtin list constructor.

For example, say we want our lists to be created noisily:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

We could then intercept the name list on the module level global scope, and then when we create a list, we actually create our subtyped list:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

Similarly we could remove it from the global namespace

del list

and put it in the builtin namespace:

import builtins
builtins.list = List

And now:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

And note that the list display creates a list unconditionally:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

We probably only do this temporarily, so lets undo our changes – first remove the new List object from the builtins:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

Oh, no, we lost track of the original.

Not to worry, we can still get list – it’s the type of a list literal:

>>> builtins.list = type([])
>>> list()
[]

So…

Why is [] faster than list()?

As we’ve seen – we can overwrite list – but we can’t intercept the creation of the literal type. When we use list we have to do the lookups to see if anything is there.

Then we have to call whatever callable we have looked up. From the grammar:

A call calls a callable object (e.g., a function) with a possibly empty series of arguments:

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

We can see that it does the same thing for any name, not just list:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

For [] there is no function call at the Python bytecode level:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

It simply goes straight to building the list without any lookups or calls at the bytecode level.

Conclusion

We have demonstrated that list can be intercepted with user code using the scoping rules, and that list() looks for a callable and then calls it.

Whereas [] is a list display, or a literal, and thus avoids the name lookup and function call.


如果PyPy快6.3倍,为什么我不应该在CPython上使用PyPy?

问题:如果PyPy快6.3倍,为什么我不应该在CPython上使用PyPy?

我已经听到很多有关PyPy项目的信息。他们声称它比其站点上的CPython解释器快6.3倍。

每当我们谈论诸如Python之类的动态语言时,速度都是头等大事。为了解决这个问题,他们说PyPy快6.3倍。

第二个问题是并行性,臭名昭著的Global Interpreter Lock(GIL)。为此,PyPy表示可以提供无GIL的Python

如果PyPy可以解决这些巨大的挑战,那么它的哪些弱点正在阻碍广泛采用?也就是说,是什么原因导致我这样的人,一个典型的Python开发,切换到PyPy 现在

I’ve been hearing a lot about the PyPy project. They claim it is 6.3 times faster than the CPython interpreter on their site.

Whenever we talk about dynamic languages like Python, speed is one of the top issues. To solve this, they say PyPy is 6.3 times faster.

The second issue is parallelism, the infamous Global Interpreter Lock (GIL). For this, PyPy says it can give GIL-less Python.

If PyPy can solve these great challenges, what are its weaknesses that are preventing wider adoption? That is to say, what’s preventing someone like me, a typical Python developer, from switching to PyPy right now?


回答 0

注意: PyPy现在比2013年提出这个问题时更加成熟,并且得到了更好的支持。避免从过时的信息中得出结论。


  1. 正如其他人很快提到的,PyPy 对C扩展提供了长期的支持。它具有支持,但通常速度低于Python,并且充其量也只是个问题。因此,许多模块只需要 CPython。PyPy不支持numpy PyPy现在支持numpy。某些扩展仍然不受支持(Pandas,SciPy等),请在进行更改之前先查看支持的软件包的列表
  2. 目前,对Python 3的支持尚处于试验阶段。 刚刚达到稳定!自2014年6月20日起,PyPy3 2.3.1-Fulcrum退出了
  3. PyPy有时并不真正更快“脚本”,其中有很多人使用Python进行。这些是运行时间短的程序,它们执行简单和小的操作。由于PyPy是JIT编译器,因此其主要优点来自运行时间长和简单的类型(例如数字)。坦率地说,与CPython相比,PyPy的JIT之前速度非常差
  4. 惯性。迁移到PyPy通常需要重新配置工具,对于某些人和组织而言,这简直就是太多的工作。

我会说,这些是影响我的主要原因。

NOTE: PyPy is more mature and better supported now than it was in 2013, when this question was asked. Avoid drawing conclusions from out-of-date information.


  1. PyPy, as others have been quick to mention, has tenuous support for C extensions. It has support, but typically at slower-than-Python speeds and it’s iffy at best. Hence a lot of modules simply require CPython. PyPy doesn’t support numpy PyPy now supports numpy. Some extensions are still not supported (Pandas, SciPy, etc.), take a look at the list of supported packages before making the change.
  2. Python 3 support is experimental at the moment. has just reached stable! As of 20th June 2014, PyPy3 2.3.1 – Fulcrum is out!
  3. PyPy sometimes isn’t actually faster for “scripts”, which a lot of people use Python for. These are the short-running programs that do something simple and small. Because PyPy is a JIT compiler its main advantages come from long run times and simple types (such as numbers). Frankly, PyPy’s pre-JIT speeds are pretty bad compared to CPython.
  4. Inertia. Moving to PyPy often requires retooling, which for some people and organizations is simply too much work.

Those are the main reasons that affect me, I’d say.


回答 1

该网站也没有权利要求PyPy比CPython的快6.3倍。报价:

所有基准的几何平均值比CPython快0.16或6.3倍

这与您所做的一揽子声明完全不同,当您了解差异时,您将至少了解一组不能仅仅说“使用PyPy”的原因。听起来好像我很挑剔,但是了解为什么这两个陈述完全不同是至关重要的。

分解:

  • 他们所做的陈述仅适用于他们所使用的基准。它完全没有说明您的程序(除非您的程序与其基准之一完全相同)。

  • 该声明大约是一组基准的平均值。没有人声称运行PyPy甚至可以为他们测试过的程序带来6.3倍的改进。

  • 没有人声称PyPy甚至可以运行CPython运行的所有程序,更不用说更快了。

That site does not claim PyPy is 6.3 times faster than CPython. To quote:

The geometric average of all benchmarks is 0.16 or 6.3 times faster than CPython

This is a very different statement to the blanket statement you made, and when you understand the difference, you’ll understand at least one set of reasons why you can’t just say “use PyPy”. It might sound like I’m nit-picking, but understanding why these two statements are totally different is vital.

To break that down:

  • The statement they make only applies to the benchmarks they’ve used. It says absolutely nothing about your program (unless your program is exactly the same as one of their benchmarks).

  • The statement is about an average of a group of benchmarks. There is no claim that running PyPy will give a 6.3 times improvement even for the programs they have tested.

  • There is no claim that PyPy will even run all the programs that CPython runs at all, let alone faster.


回答 2

由于pypy并非100%兼容,因此需要8 gig的ram进行编译,这是一个不断变化的目标,并且处于高度试验阶段,而cpython是稳定的,这是模块构建器默认的目标,长达20年(包括无法在pypy上运行的c扩展名) ),并且已经广泛部署。

Pypy可能永远不会成为参考实现,但是它是一个很好的工具。

Because pypy is not 100% compatible, takes 8 gigs of ram to compile, is a moving target, and highly experimental, where cpython is stable, the default target for module builders for 2 decades (including c extensions that don’t work on pypy), and already widely deployed.

Pypy will likely never be the reference implementation, but it is a good tool to have.


回答 3

第二个问题更容易回答:如果所有代码都是纯Python,则基本上可以使用PyPy替代。但是,许多广泛使用的库(包括一些标准库)都是用C编写的,并作为Python扩展进行编译。其中有些可以与PyPy一起使用,有些则不能。PyPy提供了与Python相同的“面向前”工具-也就是说,它是Python-,但是它的内在功能是不同的,因此与这些内在功能连接的工具将不起作用。

关于第一个问题,我想这有点像第一个Catch-22:PyPy一直在迅速发展,以提高速度并增强与其他代码的互操作性。这使其比官方更具实验性。

我认为,如果PyPy进入稳定状态,则有可能开始被更广泛地使用。我也认为Python摆脱C的支持是很棒的。但这不会一会儿发生。PyPy还没有达到临界质量的地方是几乎对自己有用的,足以做你想要的一切,这将激励人们以填补空白。

The second question is easier to answer: you basically can use PyPy as a drop-in replacement if all your code is pure Python. However, many widely used libraries (including some of the standard library) are written in C and compiled as Python extensions. Some of these can be made to work with PyPy, some can’t. PyPy provides the same “forward-facing” tool as Python — that is, it is Python — but its innards are different, so tools that interface with those innards won’t work.

As for the first question, I imagine it is sort of a Catch-22 with the first: PyPy has been evolving rapidly in an effort to improve speed and enhance interoperability with other code. This has made it more experimental than official.

I think it’s possible that if PyPy gets into a stable state, it may start getting more widely used. I also think it would be great for Python to move away from its C underpinnings. But it won’t happen for a while. PyPy hasn’t yet reached the critical mass where it is almost useful enough on its own to do everything you’d want, which would motivate people to fill in the gaps.


回答 4

我对此主题做了一个小型基准测试。尽管许多其他发布者在兼容性方面都提出了很好的观点,但我的经验是,PyPy仅仅移动一些位并没有那么快。对于Python的许多用途,它实际上仅存在于在两个或多个服务之间转换位。例如,很少有Web应用程序对数据集执行CPU密集型分析。相反,它们从客户端获取一些字节,将其存储在某种数据库中,然后再将其返回给其他客户端。有时,数据格式会更改。

BDFL和CPython开发人员是一群非常聪明的人,并设法帮助CPython在这种情况下表现出色。这是一个无耻的博客插件:http : //www.hydrogen18.com/blog/unpickling-buffers.html。我正在使用Stackless,它是从CPython派生的,并保留了完整的C模块接口。在那种情况下,我发现使用PyPy没有任何优势。

I did a small benchmark on this topic. While many of the other posters have made good points about compatibility, my experience has been that PyPy isn’t that much faster for just moving around bits. For many uses of Python, it really only exists to translate bits between two or more services. For example, not many web applications are performing CPU intensive analysis of datasets. Instead, they take some bytes from a client, store them in some sort of database, and later return them to other clients. Sometimes the format of the data is changed.

The BDFL and the CPython developers are a remarkably intelligent group of people and have a managed to help CPython perform excellent in such a scenario. Here’s a shameless blog plug: http://www.hydrogen18.com/blog/unpickling-buffers.html . I’m using Stackless, which is derived from CPython and retains the full C module interface. I didn’t find any advantage to using PyPy in that case.


回答 5

问:如果与CPython相比,PyPy可以解决这些巨大的挑战(速度,内存消耗,并行性),那么它的哪些弱点在阻止更广泛的采用?

答:首先,很少有证据表明PyPy团队可以解决问题的速度一般。长期证据表明,PyPy运行某些Python代码要比CPython慢​​,而且这一缺点似乎深深地植根于PyPy。

其次,在相当多的情况下,当前版本的PyPy消耗的内存比CPython多得多。因此,PyPy尚未解决内存消耗问题。

无论PyPy解决所提到的巨大挑战,并在一般更快,较少的内存饿了,和更友好的并行与CPython是一个悬而未决的问题无法在短期内得到解决。有人押注,PyPy将永远无法提供一种通用解决方案,使它在所有情况下均能统治CPython 2.7和3.3。

如果PyPy总体上要比CPython更好,这是值得怀疑的,那么影响其广泛采用的主要弱点将是与CPython的兼容性。还存在一些问题,例如CPython可在更广泛的CPU和OS上运行,但是与PyPy的性能和CPython兼容性目标相比,这些问题的重要性要小得多。


问:为什么现在不能放弃用PyPy替换CPython?

答:PyPy并非100%与CPython兼容,因为它没有在后台模拟CPython。有些程序可能仍依赖于PyPy中缺少的CPython的独特功能,例如C绑定,Python对象和方法的C实现,或CPython垃圾收集器的增量性质。

Q: If PyPy can solve these great challenges (speed, memory consumption, parallelism) in comparison to CPython, what are its weaknesses that are preventing wider adoption?

A: First, there is little evidence that the PyPy team can solve the speed problem in general. Long-term evidence is showing that PyPy runs certain Python codes slower than CPython and this drawback seems to be rooted very deeply in PyPy.

Secondly, the current version of PyPy consumes much more memory than CPython in a rather large set of cases. So PyPy didn’t solve the memory consumption problem yet.

Whether PyPy solves the mentioned great challenges and will in general be faster, less memory hungry, and more friendly to parallelism than CPython is an open question that cannot be solved in the short term. Some people are betting that PyPy will never be able to offer a general solution enabling it to dominate CPython 2.7 and 3.3 in all cases.

If PyPy succeeds to be better than CPython in general, which is questionable, the main weakness affecting its wider adoption will be its compatibility with CPython. There also exist issues such as the fact that CPython runs on a wider range of CPUs and OSes, but these issues are much less important compared to PyPy’s performance and CPython-compatibility goals.


Q: Why can’t I do drop in replacement of CPython with PyPy now?

A: PyPy isn’t 100% compatible with CPython because it isn’t simulating CPython under the hood. Some programs may still depend on CPython’s unique features that are absent in PyPy such as C bindings, C implementations of Python object&methods, or the incremental nature of CPython’s garbage collector.


回答 6

CPython具有引用计数和垃圾收集,PyPy仅具有垃圾收集。

因此,对象倾向于更早地删除,并__del__在CPython中以更可预测的方式调用。一些软件依赖于这种行为,因此它们还没有准备好迁移到PyPy。

某些其他软件可同时使用这两种软件,但CPython使用较少的内存,因为较早时释放了未使用的对象。(我没有任何度量来表明这有多重要,还有哪些其他实现细节会影响内存使用。)

CPython has reference counting and garbage collection, PyPy has garbage collection only.

So objects tend to be deleted earlier and __del__ is called in a more predictable way in CPython. Some software relies on this behavior, thus they are not ready for migrating to PyPy.

Some other software works with both, but uses less memory with CPython, because unused objects are freed earlier. (I don’t have any measurements to indicate how significant this is and what other implementation details affect the memory use.)


回答 7

对于许多项目,在速度方面,不同的python之间实际上有0%的差异。那就是那些受工程时间支配并且所有python都具有相同数量的库支持的库。

For a lot of projects, there is actually 0% difference between the different pythons in terms of speed. That is those that are dominated by engineering time and where all pythons have the same amount of library support.


回答 8

简单地说:PyPy提供了CPython所缺乏的速度,但却牺牲了它的兼容性。但是,大多数人选择Python是因为它具有灵活性和“含电池”功能(高兼容性),而不是因为它的速度(尽管它仍然是首选)。

To make this simple: PyPy provides the speed that’s lacked by CPython but sacrifices its compatibility. Most people, however, choose Python for its flexibility and its “battery-included” feature (high compatibility), not for its speed (it’s still preferred though).


回答 9

我发现了一些例子,其中PyPy比Python慢​​。但是:仅在Windows上。

C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop

C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop

因此,如果您想到的是PyPy,请忘记Windows。在Linux上,您可以实现出色的加速。示例(列出1到1,000,000之间的所有素数):

from sympy import sieve
primes = list(sieve.primerange(1, 10**6))

PyPy的运行速度比Python快10(!)倍。但不在Windows上。那里只有3倍的速度。

I’ve found examples, where PyPy is slower than Python. But: Only on Windows.

C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop

C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop

So, if you think of PyPy, forget Windows. On Linux, you can achieve awesome accelerations. Example (list all primes between 1 and 1,000,000):

from sympy import sieve
primes = list(sieve.primerange(1, 10**6))

This runs 10(!) times faster on PyPy than on Python. But not on windows. There it is only 3x as fast.


回答 10

PyPy已经支持Python 3一段时间了,但是根据Anthony Shaw在2018年4月2日发布的HackerNoon帖子中所述,PyPy3仍然比PyPy(Python 2)慢几倍。

对于许多科学计算,尤其是矩阵计算,numpy是更好的选择(请参阅FAQ:我应该安装numpy还是numpypy?)。

Pypy不支持gmpy2。您可以改用gmpy_cffi, 尽管我尚未测试过它的速度,并且该项目在2014年发布了一个版本。

对于Project Euler问题,我经常使用PyPy,对于简单的数值计算通常from __future__ import division足以满足我的目的,但是截至2018年,Python 3支持仍在开发中,最好的选择是在64位Linux上。Windows PyPy3.5 v6.0(截至2018年12月)为最新版本。

PyPy has had Python 3 support for a while, but according to this HackerNoon post by Anthony Shaw from April 2nd, 2018, PyPy3 is still several times slower than PyPy (Python 2).

For many scientific calculations, particularly matrix calculations, numpy is a better choice (see FAQ: Should I install numpy or numpypy?).

Pypy does not support gmpy2. You can instead make use of gmpy_cffi though I haven’t tested its speed and the project had one release in 2014.

For Project Euler problems, I make frequent use of PyPy, and for simple numerical calculations often from __future__ import division is sufficient for my purposes, but Python 3 support is still being worked on as of 2018, with your best bet being on 64-bit Linux. Windows PyPy3.5 v6.0, the latest as of December 2018, is in beta.


回答 11

支持的Python版本

引用PythonZen

可读性很重要。

例如,Python 3.7引入了数据类,Python 3.8引入了fstring =

Python 3.7和Python 3.8中可能还有其他更重要的功能。关键是PyPy目前不支持Python 3.7或Python 3.8。

Supported Python Versions

To cite the Zen of Python:

Readability counts.

For example, Python 3.7 introduced dataclasses and Python 3.8 introduced fstring =.

There might be other features in Python 3.7 and Python 3.8 which are more important to you. The point is that PyPy does not support Python 3.7 or Python 3.8 at the moment.


与Project Euler的速度比较:C,Python,Erlang,Haskell

问题:与Project Euler的速度比较:C,Python,Erlang,Haskell

我已经采取了问题#12项目欧拉作为编程锻炼和比较我的(肯定不是最优的)实现在C,Python和Erlang和Haskell的。为了获得更高的执行时间,我搜索的第一个三角形数的除数大于1000,而不是原始问题中所述的500。

结果如下:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

带有PyPy的Python:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

摘要:

  • C:100%
  • Python:692%(使用PyPy时为118%)
  • Erlang:436%(135%感谢RichardC)
  • 哈斯克尔:1421%

我认为C具有很大的优势,因为它使用long进行计算,而不使用其他任意三个整数。另外,它不需要先加载运行时(其他加载项吗?)。

问题1: Erlang,Python和Haskell是否会由于使用任意长度的整数而导致速度降低,或者只要值小于,它们是否会失去速度MAXINT

问题2: 为什么Haskell这么慢?是否有编译器标志可以使您刹车?或者它是我的实现?(后者很有可能是因为Haskell是一本对我有七个印章的书。)

问题3: 您能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

编辑:

问题4: 我的功能实现是否允许LCO(最后一次调用优化,又称为尾部递归消除),从而避免在调用堆栈上添加不必要的帧?

尽管我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图在四种语言中尽可能地实现相同的算法。


使用的源代码:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.

The result is the following:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

Python with PyPy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

Summary:

  • C: 100%
  • Python: 692% (118% with PyPy)
  • Erlang: 436% (135% thanks to RichardC)
  • Haskell: 1421%

I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn’t need to load a runtime first (Do the others?).

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question 2: Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

EDIT:

Question 4: Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?

I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.


Source codes used:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

回答 0

使用GHC 7.0.3gcc 4.4.6Linux 2.6.29一个x86_64的Core2双核(2.5GHz的)机器上,编译使用ghc -O2 -fllvm -fforce-recomp用于Haskell和gcc -O3 -lm为C.

  • 您的C例程运行8.4秒(比您的运行速度快,原因可能是-O3
  • Haskell解决方案可在36秒内运行(由于显示-O2标记)
  • 您的factorCount'代码未明确输入且默认为Integer(感谢Daniel在这里纠正我的误诊!)。使用给出显式类型签名(无论如何都是标准做法)Int,时间更改为11.1秒
  • factorCount'你不必要的呼唤fromIntegral。修复不会导致任何变化(编译器很聪明,对您来说很幸运)。
  • modrem更快更充分的地方使用了。这将时间更改为8.5秒
  • factorCount'不断应用两个永不更改的自变量(numbersqrt)。工人/包装工人的转型为我们提供了:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

没错,7.95秒。始终比C解决方案快半秒。没有-fllvm标记,我仍然会收到消息8.182 seconds,因此NCG后端在这种情况下也表现良好。

结论:Haskell很棒。

结果代码

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

编辑:现在我们已经探索了,让我们解决问题

问题1:erlang,python和haskell是否由于使用任意长度的整数而失去速度,还是只要它们的值小于MAXINT,它们就不会丢失吗?

在Haskell中,使用Integer速度慢于Int但慢多少取决于所执行的计算。幸运的是(对于64位计算机)Int就足够了。出于可移植性的考虑,您可能应该重写我的代码以使用Int64Word64(C不是唯一带有的语言long)。

问题2:为什么haskell这么慢?是否有编译器标志可以使您刹车,或者它是我的实现?(后者很可能因为haskell是一本对我有七个印章的书。)

问题3:能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

这就是我上面回答的。答案是

  • 0)通过使用优化 -O2
  • 1)尽可能使用快速(特别是:不可装箱)类型
  • 2)rem不是mod(经常被遗忘的优化),并且
  • 3)worker / wrapper转换(也许是最常见的优化)。

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

是的,这不是问题。做得好,很高兴您考虑了这一点。

Using GHC 7.0.3, gcc 4.4.6, Linux 2.6.29 on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp for Haskell and gcc -O3 -lm for C.

  • Your C routine runs in 8.4 seconds (faster than your run probably because of -O3)
  • The Haskell solution runs in 36 seconds (due to the -O2 flag)
  • Your factorCount' code isn’t explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds
  • in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you).
  • You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds.
  • factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

That’s right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm flag I’m still getting 8.182 seconds, so the NCG backend is doing well in this case too.

Conclusion: Haskell is awesome.

Resulting Code

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

EDIT: So now that we’ve explored that, lets address the questions

Question 1: Do erlang, python and haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

In Haskell, using Integer is slower than Int but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int is sufficient. For portability sake you should probably rewrite my code to use Int64 or Word64 (C isn’t the only language with a long).

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

That was what I answered above. The answer was

  • 0) Use optimization via -O2
  • 1) Use fast (notably: unbox-able) types when possible
  • 2) rem not mod (a frequently forgotten optimization) and
  • 3) worker/wrapper transformation (perhaps the most common optimization).

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, that wasn’t the issue. Good work and glad you considered this.


回答 1

Erlang实现存在一些问题。作为以下内容的基准,我测得的未经修改的Erlang程序的执行时间为47.6秒,而C代码为12.7秒。

如果要运行计算密集型的Erlang代码,您应该做的第一件事就是使用本机代码。使用进行编译,erlc +native euler12时间缩短至41.3秒。但是,这比这种代码的本机编译要低得多(仅为15%),问题是您使用-compile(export_all)。这对于实验很有用,但是所有功能都可以从外部访问的事实导致本机编译器非常保守。(普通的BEAM仿真器不会受到太大影响。)用替换该声明可以-export([solve/0]).提供更好的加速:31.5秒(比基线快将近35%)。

但是代码本身有一个问题:对于factorCount循环中的每个迭代,您都可以执行以下测试:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C代码不会执行此操作。通常,在同一代码的不同实现之间进行公平的比较可能比较棘手,特别是如果算法是数字的,因为您需要确保它们实际上在做相同的事情。尽管某个实现最终会达到相同的结果,但由于某个地方的某些类型转换而导致的一种实现中的轻微舍入错误可能会导致其执行的迭代次数要多于另一个实现。

为了消除这种可能的错误源(并消除每次迭代中的额外测试),我重新编写了factorCount函数,如下所示,该函数非常类似于C代码:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

此重写(no export_all)和本机编译为我提供了以下运行时间:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

与C代码相比,还算不错:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

考虑到Erlang根本不适合编写数字代码,在这样的程序上,它仅比C慢50%。

最后,关于您的问题:

问题1:erlang,python和haskell是否由于使用任意长度的整数而导致速度变慢,或者只要值小于MAXINT,它们是否会变慢?

是的,有点。在Erlang中,没有办法说“结合使用32/64位算术”,因此,除非编译器可以证明整数的某些界限(通常不能),否则它必须检查所有计算以查看是否可以将它们放入单个标记的单词中,或者是否必须将它们转换为堆分配的bignum。即使在运行时实际上没有使用大数,也必须执行这些检查。另一方面,这意味着您知道,如果您突然给它比以前更大的输入,该算法将永远不会因为意外的整数环绕而失败。

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

是的,您的Erlang代码在上次通话优化方面是正确的。

There are some problems with the Erlang implementation. As baseline for the following, my measured execution time for your unmodified Erlang program was 47.6 seconds, compared to 12.7 seconds for the C code.

The first thing you should do if you want to run computationally intensive Erlang code is to use native code. Compiling with erlc +native euler12 got the time down to 41.3 seconds. This is however a much lower speedup (just 15%) than expected from native compilation on this kind of code, and the problem is your use of -compile(export_all). This is useful for experimentation, but the fact that all functions are potentially reachable from the outside causes the native compiler to be very conservative. (The normal BEAM emulator is not that much affected.) Replacing this declaration with -export([solve/0]). gives a much better speedup: 31.5 seconds (almost 35% from the baseline).

But the code itself has a problem: for each iteration in the factorCount loop, you perform this test:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

The C code doesn’t do this. In general, it can be tricky to make a fair comparison between different implementations of the same code, and in particular if the algorithm is numerical, because you need to be sure that they are actually doing the same thing. A slight rounding error in one implementation due to some typecast somewhere may cause it to do many more iterations than the other even though both eventually reach the same result.

To eliminate this possible error source (and get rid of the extra test in each iteration), I rewrote the factorCount function as follows, closely modelled on the C code:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

This rewrite, no export_all, and native compilation, gave me the following run time:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

which is not too bad compared to the C code:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

considering that Erlang is not at all geared towards writing numerical code, being only 50% slower than C on a program like this is pretty good.

Finally, regarding your questions:

Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Yes, somewhat. In Erlang, there is no way of saying “use 32/64-bit arithmetic with wrap-around”, so unless the compiler can prove some bounds on your integers (and it usually can’t), it must check all computations to see if they can fit in a single tagged word or if it has to turn them into heap-allocated bignums. Even if no bignums are ever used in practice at runtime, these checks will have to be performed. On the other hand, that means you know that the algorithm will never fail because of an unexpected integer wraparound if you suddenly give it larger inputs than before.

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, your Erlang code is correct with respect to last call optimization.


回答 2

关于Python优化,除了使用PyPy(在代码零更改的情况下实现相当不错的加速)之外,您还可以使用PyPy的翻译工具链来编译兼容RPython的版本,或者使用Cython来构建扩展模块,两者Cython模块比我的测试中的C版本要快,而Cython模块的速度几乎是后者的两倍。作为参考,我还包括C和PyPy基准测试结果:

C(与编译gcc -O3 -lm

% time ./euler12-c 
842161320

./euler12-c  11.95s 
 user 0.00s 
 system 99% 
 cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  
16.44s user 
0.01s system 
99% cpu 16.449 total

RPython(使用最新的PyPy修订版c2f583445aee

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  
10.54s user 0.00s 
system 99% 
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  
6.27s user 0.00s 
system 99% 
cpu 6.274 total

RPython版本有几个关键更改。要转换为独立程序,您需要定义您的target,在这种情况下为main函数。它应该接受,sys.argv因为它是唯一的参数,并且需要返回一个int。您可以使用translate.py对其进行翻译,% translate.py euler12-rpython.py后者将转换为C并为您编译。

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

Cython版本被重写为扩展模块_euler12.pyx,我从普通的python文件导入并调用了该模块。在_euler12.pyx本质上是相同的版本,有一些额外的静态类型声明。setup.py具有使用来构建扩展的普通样板python setup.py build_ext --inplace

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

老实说,我对RPython或Cython的经验很少,并对结果感到惊喜。如果您使用的是CPython,则在Cython扩展模块中编写CPU密集型代码似乎是优化程序的一种简便方法。

In regards to Python optimization, in addition to using PyPy (for pretty impressive speed-ups with zero change to your code), you could use PyPy’s translation toolchain to compile an RPython-compliant version, or Cython to build an extension module, both of which are faster than the C version in my testing, with the Cython module nearly twice as fast. For reference I include C and PyPy benchmark results as well:

C (compiled with gcc -O3 -lm)

% time ./euler12-c 
842161320

./euler12-c  11.95s 
 user 0.00s 
 system 99% 
 cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  
16.44s user 
0.01s system 
99% cpu 16.449 total

RPython (using latest PyPy revision, c2f583445aee)

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  
10.54s user 0.00s 
system 99% 
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  
6.27s user 0.00s 
system 99% 
cpu 6.274 total

The RPython version has a couple of key changes. To translate into a standalone program you need to define your target, which in this case is the main function. It’s expected to accept sys.argv as it’s only argument, and is required to return an int. You can translate it by using translate.py, % translate.py euler12-rpython.py which translates to C and compiles it for you.

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

The Cython version was rewritten as an extension module _euler12.pyx, which I import and call from a normal python file. The _euler12.pyx is essentially the same as your version, with some additional static type declarations. The setup.py has the normal boilerplate to build the extension, using python setup.py build_ext --inplace.

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

I honestly have very little experience with either RPython or Cython, and was pleasantly surprised at the results. If you are using CPython, writing your CPU-intensive bits of code in a Cython extension module seems like a really easy way to optimize your program.


回答 3

问题3:您能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

C实现是次优的(由Thomas M. DuBuisson暗示),该版本使用64位整数(即long数据类型)。稍后我将研究程序集列表,但经过深思熟虑的猜测,编译后的代码中存在一些内存访问,这使得使用64位整数的速度大大降低。这就是生成的代码(因为可以在SSE寄存器中容纳更少的64位整数,或者将双精度数舍入为64位整数的事实比较慢)。

这是修改后的代码(只需将int替换为long,并且我显式内联了factorCount,尽管我认为gcc -O3不需要这样做):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

运行+计时它给出:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

作为参考,Thomas在前面的答案中的haskell实现给出:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

结论:ghc是一个很棒的编译器,它没有任何缺点,但是gcc通常会生成更快的代码。

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

The C implementation is suboptimal (as hinted at by Thomas M. DuBuisson), the version uses 64-bit integers (i.e. long datatype). I’ll investigate the assembly listing later, but with an educated guess, there are some memory accesses going on in the compiled code, which make using 64-bit integers significantly slower. It’s that or generated code (be it the fact that you can fit less 64-bit ints in a SSE register or round a double to a 64-bit integer is slower).

Here is the modified code (simply replace long with int and I explicitly inlined factorCount, although I do not think that this is necessary with gcc -O3):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

Running + timing it gives:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

For reference, the haskell implementation by Thomas in the earlier answer gives:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

Conclusion: Taking nothing away from ghc, its a great compiler, but gcc normally generates faster code.


回答 4

看一下这个博客。在过去的一年左右的时间里,他用Haskell和Python解决了一些Project Euler问题,并且他通常发现Haskell的速度要快得多。我认为在这些语言之间,它与您的流畅程度和编码风格有关。

说到Python速度,您使用的是错误的实现!尝试使用PyPy,对于这样的事情,您会发现它的运行速度要快得多。

Take a look at this blog. Over the past year or so he’s done a few of the Project Euler problems in Haskell and Python, and he’s generally found Haskell to be much faster. I think that between those languages it has more to do with your fluency and coding style.

When it comes to Python speed, you’re using the wrong implementation! Try PyPy, and for things like this you’ll find it to be much, much faster.


回答 5

通过使用Haskell软件包中的某些功能,可以大大加快Haskell的实现。在这种情况下,我使用了素数,它仅与“ cabal install primes”一起安装;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

时间:

您的原始程序:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

改进的实施

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

如您所见,这台计算机在您运行16秒钟的同一台计算机上运行的时间为38毫秒:)

编译命令:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

Your Haskell implementation could be greatly sped up by using some functions from Haskell packages. In this case I used primes, which is just installed with ‘cabal install primes’ ;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

Timings:

Your original program:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

Improved implementation

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

As you can see, this one runs in 38 milliseconds on the same machine where yours ran in 16 seconds :)

Compilation commands:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

回答 6

纯娱乐。以下是更“原生”的Haskell实现:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

使用 ghc -O3,这可以在我的机器(1.73GHz Core i7)上持续运行0.55-0.58秒。

对于C版本,更有效的factorCount函数:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

使用来将main中的longs更改为int gcc -O3 -lm,这始终在0.31-0.35秒内运行。

如果利用第n个三角形数= n *(n + 1)/ 2,并且n和(n + 1)具有完全不同的素因式分解这一事实,这两个函数都可以运行得更快。可以乘以一半以找出整体的因素数量。以下:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

会将c代码运行时间减少到0.17-0.19秒,并且它可以处理更大的搜索-大于10000的因素在我的计算机上花费大约43秒。我对感兴趣的读者留下了类似的haskell提速。

Just for fun. The following is a more ‘native’ Haskell implementation:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

Using ghc -O3, this consistently runs in 0.55-0.58 seconds on my machine (1.73GHz Core i7).

A more efficient factorCount function for the C version:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

Changing longs to ints in main, using gcc -O3 -lm, this consistently runs in 0.31-0.35 seconds.

Both can be made to run even faster if you take advantage of the fact that the nth triangle number = n*(n+1)/2, and n and (n+1) have completely disparate prime factorizations, so the number of factors of each half can be multiplied to find the number of factors of the whole. The following:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

will reduce the c code run time to 0.17-0.19 seconds, and it can handle much larger searches — greater than 10000 factors takes about 43 seconds on my machine. I leave a similar haskell speedup to the interested reader.


回答 7

问题1:erlang,python和haskell是否由于使用任意长度的整数而导致速度变慢,或者只要值小于MAXINT,它们是否会变慢?

这不太可能。关于Erlang和Haskell,我不能说太多(嗯,下面可能是关于Haskell的话),但是我可以指出Python中的许多其他瓶颈。程序每次尝试使用Python中的某些值执行操作时,都应验证这些值是否来自正确的类型,这会花费一些时间。您的factorCount函数只是分配一个具有range (1, isquare + 1)不同时间的列表,而运行时malloc风格的内存分配要比在C语言中对带有计数器的范围进行迭代要慢得多。factorCount()方法被多次调用,因此分配了很多列表。另外,让我们不要忘记Python是被解释的,而CPython解释器并没有将重点放在优化上。

编辑:哦,嗯,我注意到您正在使用Python 3,因此range()不会返回列表,而是返回一个生成器。在这种情况下,我关于分配列表的观点是错误的:该函数只是分配range对象,尽管这些对象效率不高,但效率却不如分配包含许多项目的列表。

问题2:为什么haskell这么慢?是否有编译器标志可以使您刹车,或者它是我的实现?(后者很可能因为haskell是一本对我有七个印章的书。)

您在使用拥抱吗?拥抱是一个相当慢的解释器。如果您正在使用它,也许您可​​以获得GHC的美好时光 -但我只是混淆假设,一个好的Haskell编译器在幕后所做的工作非常有趣,并且超出了我的理解范围:)

问题3:能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

我会说你在玩一个有趣的游戏。了解各种语言的最好的部分就是尽可能以最不同的方式使用它们:)但是我离题,我只是对此没有任何建议。抱歉,在这种情况下,希望有人能为您提供帮助:)

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

据我所记得,您只需要确保递归调用是返回值之前的最后一个命令。换句话说,下面的函数可以使用这种优化:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

但是,如果您的函数如下所示,则您将没有这样的优化,因为在递归调用之后有一个运算(乘法):

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

我将操作分为一些局部变量,以明确执行哪些操作。但是,最常见的是如下所示查看这些功能,但对于我要指出的内容,它们是等效的:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

注意,由编译器/解释器决定是否进行尾递归。例如,如果我没有记错的话,Python解释器就不会这样做(我之所以使用Python是因为它的语法很流利)。无论如何,如果您发现奇怪的东西,例如带有两个参数的阶乘函数(并且其中一个参数的名称具有诸如accaccumulator等等),现在您知道人们为什么这样做了:)

Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

This is unlikely. I cannot say much about Erlang and Haskell (well, maybe a bit about Haskell below) but I can point a lot of other bottlenecks in Python. Every time the program tries to execute an operation with some values in Python, it should verify whether the values are from the proper type, and it costs a bit of time. Your factorCount function just allocates a list with range (1, isquare + 1) various times, and runtime, malloc-styled memory allocation is way slower than iterating on a range with a counter as you do in C. Notably, the factorCount() is called multiple times and so allocates a lot of lists. Also, let us not forget that Python is interpreted and the CPython interpreter has no great focus on being optimized.

EDIT: oh, well, I note that you are using Python 3 so range() does not return a list, but a generator. In this case, my point about allocating lists is half-wrong: the function just allocates range objects, which are inefficient nonetheless but not as inefficient as allocating a list with a lot of items.

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Are you using Hugs? Hugs is a considerably slow interpreter. If you are using it, maybe you can get a better time with GHC – but I am only cogitating hypotesis, the kind of stuff a good Haskell compiler does under the hood is pretty fascinating and way beyond my comprehension :)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

I’d say you are playing an unfunny game. The best part of knowing various languages is to use them the most different way possible :) But I digress, I just do not have any recommendation for this point. Sorry, I hope someone can help you in this case :)

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

As far as I remember, you just need to make sure that your recursive call is the last command before returning a value. In other words, a function like the one below could use such optimization:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

However, you would not have such optimization if your function were such as the one below, because there is an operation (multiplication) after the recursive call:

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

I separated the operations in some local variables for make it clear which operations are executed. However, the most usual is to see these functions as below, but they are equivalent for the point I am making:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

Note that it is up to the compiler/interpreter to decide if it will make tail recursion. For example, the Python interpreter does not do it if I remember well (I used Python in my example only because of its fluent syntax). Anyway, if you find strange stuff such as factorial functions with two parameters (and one of the parameters has names such as acc, accumulator etc.) now you know why people do it :)


回答 8

使用Haskell,您实际上不需要显式考虑递归。

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

在上面的代码中,我用普通的列表操作替换了@Thomas答案中的显式递归。该代码仍然执行完全相同的操作,而无需我们担心尾部递归。它运行(〜7.49s)比@Thomas的答案(〜7.04s)在我的装有GHC 7.6.2的机器上的速度慢约6%,而@Raedwulf的C版本的运行速度约为3.15s。一年以来,GHC似乎有所改善。

PS。我知道这是一个老问题,我在Google搜索中偶然发现了这个问题(现在我忘记了正在搜索的内容…)。只是想评论有关LCO的问题,并总体上表达我对Haskell的看法。我想对最佳答案发表评论,但评论不允许使用代码块。

With Haskell, you really don’t need to think in recursions explicitly.

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

In the above code, I have replaced explicit recursions in @Thomas’ answer with common list operations. The code still does exactly the same thing without us worrying about tail recursion. It runs (~ 7.49s) about 6% slower than the version in @Thomas’ answer (~ 7.04s) on my machine with GHC 7.6.2, while the C version from @Raedwulf runs ~ 3.15s. It seems GHC has improved over the year.

PS. I know it is an old question, and I stumble upon it from google searches (I forgot what I was searching, now…). Just wanted to comment on the question about LCO and express my feelings about Haskell in general. I wanted to comment on the top answer, but comments do not allow code blocks.


回答 9

有关C版本的更多数字和说明。显然,这些年来没有人这样做。记住要对这个答案进行投票,这样它才能在每个人的视野和学习中获得领先。

第一步:作者程序的基准

笔记本电脑规格:

  • CPU i3 M380(931 MHz-最大省电模式)
  • 4GB内存
  • Win7 64位
  • Microsoft Visual Studio 2012旗舰版
  • Cygwin与GCC 4.9.3
  • Python 2.7.10

命令:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

文件名是: integertype_architecture_compiler.exe

  • 现在,integertype与原始程序相同(稍后会详细介绍)
  • 架构是x86还是x64,具体取决于编译器设置
  • 编译器是gcc还是vs2012

第二步:再次调查,改进并进行基准测试

VS比gcc快250%。两种编译器的速度应该相似。显然,代码或编译器选项出了点问题。让我们调查一下!

第一个关注点是整数类型。转换可能会很昂贵,并且一致性对于更好的代码生成和优化很重要。所有整数应为同一类型。

这是一个混合的混乱intlong现在。我们将对此进行改进。使用什么类型?最快的。要对它们进行基准测试!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

整数类型为int long int32_t uint32_t int64_tand uint64_tfrom#include <stdint.h>

在C中有很多整数类型,还有一些带符号/无符号可玩,以及选择编译为x86或x64(不要与实际整数大小混淆)的选择。有很多版本可以编译和运行^^

第三步:了解数字

明确的结论:

  • 32位整数比64位等效项快200%
  • 无符号的64位整数比有符号的64位快25%(不幸的是,我对此没有任何解释)

技巧问题:“ C中int和long的大小是多少?”
正确的答案是:int的大小和C中的long大小没有明确定义!

从C规范:

int至少为32位
长至少是一个int

从gcc手册页(-m32和-m64标志):

32位环境将int,long和指针设置为32位,并生成可在任何i386系统上运行的代码。
64位环境将int设置为32位长,将指针设置为64位,并为AMD的x86-64体系结构生成代码。

从MSDN文档(数据类型范围)https://msdn.microsoft.com/zh-cn/library/s3f49ktz%28v=vs.110%29.aspx

int,4字节,也称为有符号
long,4字节,也称为long int和有符号long int

结论:吸取的教训

  • 32位整数比64位整数快。

  • 标准整数类型在C和C ++中都没有很好的定义,它们随编译器和体系结构的不同而不同。需要一致性和可预测性时,请使用中的uint32_t整数族#include <stdint.h>

  • 速度问题解决了。所有其他语言都落后了数百%,C&C ++再次获胜!他们总是这样做。下一个改进将是使用OpenMP:D的多线程

Some more numbers and explanations for the C version. Apparently noone did it during all those years. Remember to upvote this answer so it can get on top for everyone to see and learn.

Step One: Benchmark of the author’s programs

Laptop Specifications:

  • CPU i3 M380 (931 MHz – maximum battery saving mode)
  • 4GB memory
  • Win7 64 bits
  • Microsoft Visual Studio 2012 Ultimate
  • Cygwin with gcc 4.9.3
  • Python 2.7.10

Commands:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

.

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

Filenames are: integertype_architecture_compiler.exe

  • integertype is the same as the original program for now (more on that later)
  • architecture is x86 or x64 depending on the compiler settings
  • compiler is gcc or vs2012

Step Two: Investigate, Improve and Benchmark Again

VS is 250% faster than gcc. The two compilers should give a similar speed. Obviously, something is wrong with the code or the compiler options. Let’s investigate!

The first point of interest is the integer types. Conversions can be expensive and consistency is important for better code generation & optimizations. All integers should be the same type.

It’s a mixed mess of int and long right now. We’re going to improve that. What type to use? The fastest. Gotta benchmark them’all!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

Integer types are int long int32_t uint32_t int64_t and uint64_t from #include <stdint.h>

There are LOTS of integer types in C, plus some signed/unsigned to play with, plus the choice to compile as x86 or x64 (not to be confused with the actual integer size). That is a lot of versions to compile and run ^^

Step Three: Understanding the Numbers

Definitive conclusions:

  • 32 bits integers are ~200% faster than 64 bits equivalents
  • unsigned 64 bits integers are 25 % faster than signed 64 bits (Unfortunately, I have no explanation for that)

Trick question: “What are the sizes of int and long in C?”
The right answer is: The size of int and long in C are not well-defined!

From the C spec:

int is at least 32 bits
long is at least an int

From the gcc man page (-m32 and -m64 flags):

The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system.
The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD’s x86-64 architecture.

From MSDN documentation (Data Type Ranges) https://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.110%29.aspx :

int, 4 bytes, also knows as signed
long, 4 bytes, also knows as long int and signed long int

To Conclude This: Lessons Learned

  • 32 bits integers are faster than 64 bits integers.

  • Standard integers types are not well defined in C nor C++, they vary depending on compilers and architectures. When you need consistency and predictability, use the uint32_t integer family from #include <stdint.h>.

  • Speed issues solved. All other languages are back hundreds percent behind, C & C++ win again ! They always do. The next improvement will be multithreading using OpenMP :D


回答 10

查看您的Erlang实现。时间包括启动整个虚拟机,运行程序以及停止虚拟机。非常确定,设置和停止erlang vm会花费一些时间。

如果计时是在erlang虚拟机本身内部完成的,则结果将有所不同,因为在这种情况下,我们仅拥有有关程序的实际时间。否则,我相信启动和加载Erlang Vm的过程加上停止它(如您在程序中所述)所花费的总时间都包含在您用于计时的方法的总时间中。程序正在输出。考虑使用erlang计时本身,当我们想在虚拟机本身中计时程序时使用 timer:tc/1 or timer:tc/2 or timer:tc/3。这样,来自erlang的结果将排除启动和停止/杀死/停止虚拟机所花费的时间。那就是我的理由,考虑一下,然后再次尝试基准测试。

我实际上建议我们尝试在那些语言的运行时内对那些具有运行时的语言计时(对于具有运行时的语言),以便获得精确的值。例如,C与Erlang,Python和Haskell一样,没有启动和关闭运行时系统的开销(对此有98%的把握-我可以纠正)。因此(基于此推理),我最后说,该基准对于在运行时系统之上运行的语言而言不够精确/不够合理。让我们再次进行这些更改。

编辑:即使所有语言都有运行时系统,启动每种语言和停止它的开销也会有所不同。所以我建议我们从运行时系统中进行计时(适用于这种语言)。众所周知,Erlang VM在启动时会有相当大的开销!

Looking at your Erlang implementation. The timing has included the start up of the entire virtual machine, running your program and halting the virtual machine. Am pretty sure that setting up and halting the erlang vm takes some time.

If the timing was done within the erlang virtual machine itself, results would be different as in that case we would have the actual time for only the program in question. Otherwise, i believe that the total time taken by the process of starting and loading of the Erlang Vm plus that of halting it (as you put it in your program) are all included in the total time which the method you are using to time the program is outputting. Consider using the erlang timing itself which we use when we want to time our programs within the virtual machine itself timer:tc/1 or timer:tc/2 or timer:tc/3. In this way, the results from erlang will exclude the time taken to start and stop/kill/halt the virtual machine. That is my reasoning there, think about it, and then try your bench mark again.

I actually suggest that we try to time the program (for languages that have a runtime), within the runtime of those languages in order to get a precise value. C for example has no overhead of starting and shutting down a runtime system as does Erlang, Python and Haskell (98% sure of this – i stand correction). So (based on this reasoning) i conclude by saying that this benchmark wasnot precise /fair enough for languages running on top of a runtime system. Lets do it again with these changes.

EDIT: besides even if all the languages had runtime systems, the overhead of starting each and halting it would differ. so i suggest we time from within the runtime systems (for the languages for which this applies). The Erlang VM is known to have considerable overhead at start up!


回答 11

问题1:Erlang,Python和Haskell是否由于使用任意长度的整数而导致速度下降,还是只要它们的值小于MAXINT,它们就不会丢失吗?

对于Erlang,可以否定回答第一个问题。通过适当地使用Erlang可以回答最后一个问题,如下所示:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

因为它比您最初的C示例要快,所以我想会有很多问题,因为其他问题已经详细介绍了。

这个Erlang模块在便宜的上网本上执行大约需要5秒钟的时间。它使用erlang中的网络线程模型,从而演示了如何利用事件模型。它可以分布在许多节点上。而且速度很快。不是我的代码。

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: \~w.\~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

以下测试在以下环境下进行:Intel(R)Atom(TM)CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question one can be answered in the negative for Erlang. The last question is answered by using Erlang appropriately, as in:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

Since it’s faster than your initial C example, I would guess there are numerous problems as others have already covered in detail.

This Erlang module executes on a cheap netbook in about 5 seconds … It uses the network threads model in erlang and, as such demonstrates how to take advantage of the event model. It could be distributed over many nodes. And it’s fast. Not my code.

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: \~w.\~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

The test below took place on an: Intel(R) Atom(TM) CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

回答 12

C ++ 11,<我20ms的运行在这里

我了解您想获得一些技巧来帮助您改善特定于语言的知识,但是由于本文对此已作了详尽介绍,因此我想为可能看过您对问题的数学评论等内容的人添加一些上下文,并想知道为什么这样做代码太慢了。

该答案主要是提供上下文,以希望帮助人们更轻松地评估您的问题/其他答案中的代码。

这段代码仅基于以下几点使用了与丑陋语言无关的(丑陋的)优化:

  1. 每个训练序列号的形式为n(n + 1)/ 2
  2. n和n + 1是互质的
  3. 除数是一个乘法函数

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the later iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

我的台式机平均要花19毫秒,而笔记本电脑平均要花80毫秒,这与我在这里看到的大多数其他代码相去甚远。毫无疑问,仍有许多优化措施可用。

C++11, < 20ms for meRun it here

I understand that you want tips to help improve your language specific knowledge, but since that has been well covered here, I thought I would add some context for people who may have looked at the mathematica comment on your question, etc, and wondered why this code was so much slower.

This answer is mainly to provide context to hopefully help people evaluate the code in your question / other answers more easily.

This code uses only a couple of (uglyish) optimisations, unrelated to the language used, based on:

  1. every traingle number is of the form n(n+1)/2
  2. n and n+1 are coprime
  3. the number of divisors is a multiplicative function

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the later iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

That takes around 19ms on average for my desktop and 80ms for my laptop, a far cry from most of the other code I’ve seen here. And there are, no doubt, many optimisations still available.


回答 13

尝试去:

package main

import "fmt"
import "math"

func main() {
    var n, m, c int
    for i := 1; ; i++ {
        n, m, c = i * (i + 1) / 2, int(math.Sqrt(float64(n))), 0
        for f := 1; f < m; f++ {
            if n % f == 0 { c++ }
    }
    c *= 2
    if m * m == n { c ++ }
    if c > 1001 {
        fmt.Println(n)
        break
        }
    }
}

我得到:

原始C版本:9.1690 100%
执行:8.2520 111%

但是使用:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

我得到:

原始c版本:9.1690 100%
thaumkid的c版本:0.1060 8650%
初版:8.2520 111%
次版:0.0230 39865%

我也尝试了Python3.6和pypy3.3-5.5-alpha:

原始C版本:8.629 100%
thaumkid的C版本:0.109 7916%
Python3.6:54.795 16%
pypy3.3-5.5-alpha:13.291 65%

然后用下面的代码我得到:

原始c版本:8.629 100%
thaumkid的c版本:0.109 8650%
Python3.6:1.489 580%
pypy3.3-5.5-alpha:0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n/2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)/2)
    n = n + 1

print (t)

Trying GO:

package main

import "fmt"
import "math"

func main() {
    var n, m, c int
    for i := 1; ; i++ {
        n, m, c = i * (i + 1) / 2, int(math.Sqrt(float64(n))), 0
        for f := 1; f < m; f++ {
            if n % f == 0 { c++ }
    }
    c *= 2
    if m * m == n { c ++ }
    if c > 1001 {
        fmt.Println(n)
        break
        }
    }
}

I get:

original c version: 9.1690 100%
go: 8.2520 111%

But using:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

I get:

original c version: 9.1690 100%
thaumkid’s c version: 0.1060 8650%
first go version: 8.2520 111%
second go version: 0.0230 39865%

I also tried Python3.6 and pypy3.3-5.5-alpha:

original c version: 8.629 100%
thaumkid’s c version: 0.109 7916%
Python3.6: 54.795 16%
pypy3.3-5.5-alpha: 13.291 65%

and then with following code I got:

original c version: 8.629 100%
thaumkid’s c version: 0.109 8650%
Python3.6: 1.489 580%
pypy3.3-5.5-alpha: 0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n/2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)/2)
    n = n + 1

print (t)

回答 14

更改: case (divisor(T,round(math:sqrt(T))) > 500) of

至: case (divisor(T,round(math:sqrt(T))) > 1000) of

这将为Erlang多进程示例提供正确的答案。

Change: case (divisor(T,round(math:sqrt(T))) > 500) of

To: case (divisor(T,round(math:sqrt(T))) > 1000) of

This will produce the correct answer for the Erlang multi-process example.


回答 15

我假设只有在涉及的因素有很多小因素的情况下,因素的数量才会很大。因此,我使用了thaumkid出色的算法,但首先使用了对因数的近似值,该值永远不会太小。这很简单:检查最多29个主要因子,然后检查剩余数量并计算n个因子的上限。用它来计算因子数量的上限,如果该数量足够高,请计算因子的确切数量。

下面的代码不需要出于正确性的这种假设,而是要快速。似乎可行;十万分之一的数字中只有大约一个给出的估计值足够高,需要进行全面检查。

这是代码:

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

这将在0.7秒内找到第14,753,024个三角形,其中有13824个因数;在34秒内找到了879,207,615个三角形,其中有61,440个因数;在10分钟5秒内发现了第123,524,486,975个三角形,具有138,240个因数;在第26,467,792,064个三角形中,具有172,032个因数。 21分25秒(2.4 GHz Core2 Duo),因此该代码平均每个数字仅需要116个处理器周期。最后一个三角形数字本身大于2 ^ 68,因此

I made the assumption that the number of factors is only large if the numbers involved have many small factors. So I used thaumkid’s excellent algorithm, but first used an approximation to the factor count that is never too small. It’s quite simple: Check for prime factors up to 29, then check the remaining number and calculate an upper bound for the nmber of factors. Use this to calculate an upper bound for the number of factors, and if that number is high enough, calculate the exact number of factors.

The code below doesn’t need this assumption for correctness, but to be fast. It seems to work; only about one in 100,000 numbers gives an estimate that is high enough to require a full check.

Here’s the code:

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

This finds the 14,753,024th triangular with 13824 factors in about 0.7 seconds, the 879,207,615th triangular number with 61,440 factors in 34 seconds, the 12,524,486,975th triangular number with 138,240 factors in 10 minutes 5 seconds, and the 26,467,792,064th triangular number with 172,032 factors in 21 minutes 25 seconds (2.4GHz Core2 Duo), so this code takes only 116 processor cycles per number on average. The last triangular number itself is larger than 2^68, so


回答 16

我将“ Jannich Brendle”版本修改为1000,而不是500。并列出了euler12.bin,euler12.erl,p12dist.erl的结果。两种erl代码都使用“ + native”进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

I modified “Jannich Brendle” version to 1000 instead 500. And list the result of euler12.bin, euler12.erl, p12dist.erl. Both erl codes use ‘+native’ to compile.

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

回答 17

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

超时时间

2.79s用户0.00s系统99%cpu 2.794

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

time ./a.out

2.79s user 0.00s system 99% cpu 2.794 total


建议使用哪个Python内存分析器?[关闭]

问题:建议使用哪个Python内存分析器?[关闭]

我想知道我的Python应用程序的内存使用情况,尤其想知道哪些代码块/部分或对象消耗了最多的内存。Google搜索显示商用的是Python Memory Validator(仅限Windows)。

开源的是PySizerHeapy

我没有尝试过任何人,所以我想知道哪个是最好的考虑因素:

  1. 提供大多数细节。

  2. 我必须要做最少的更改,也可以不做任何更改。

I want to know the memory usage of my Python application and specifically want to know what code blocks/portions or objects are consuming most memory. Google search shows a commercial one is Python Memory Validator (Windows only).

And open source ones are PySizer and Heapy.

I haven’t tried anyone, so I wanted to know which one is the best considering:

  1. Gives most details.

  2. I have to do least or no changes to my code.


回答 0

很容易使用。在代码中的某些时候,您必须编写以下代码:

from guppy import hpy
h = hpy()
print(h.heap())

这将为您提供如下输出:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

您还可以从何处引用对象,并获取有关该对象的统计信息,但是以某种方式,该文档上的文档很少。

还有一个用Tk编写的图形浏览器。

Heapy is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you some output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse.

There is a graphical browser as well, written in Tk.


回答 1

由于没有人提到它,因此我将指向我的模块memory_profiler,该能够打印内存使用情况的报告,并且可以在Unix和Windows上运行(在最后一个版本中需要psutil)。输出不是很详细,但是目标是让您概述代码在哪里消耗了更多的内存,而不是对分配的对象进行详尽的分析。

在用函数修饰功能@profile并使用-m memory_profiler标记运行代码后,它将打印如下一行报告:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

Since nobody has mentioned it I’ll point to my module memory_profiler which is capable of printing line-by-line report of memory usage and works on Unix and Windows (needs psutil on this last one). Output is not very detailed but the goal is to give you an overview of where the code is consuming more memory and not a exhaustive analysis on allocated objects.

After decorating your function with @profile and running your code with the -m memory_profiler flag it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

回答 2

我推荐Dowser。设置非常容易,您需要对代码进行零更改。您可以通过简单的Web界面随时查看每种类型的对象计数,查看活动对象列表,查看对活动对象的引用。

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

您导入memdebug,然后调用memdebug.start。就这样。

我没有尝试过PySizer或Heapy。我会很感激别人的评论。

更新

上面的代码用于CherryPy 2.XCherryPy 3.Xserver.quickstart方法已删除,并且engine.start不带有该blocking标志。因此,如果您正在使用CherryPy 3.X

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()

I recommend Dowser. It is very easy to setup, and you need zero changes to your code. You can view counts of objects of each type through time, view list of live objects, view references to live objects, all from the simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug, and call memdebug.start. That’s all.

I haven’t tried PySizer or Heapy. I would appreciate others’ reviews.

UPDATE

The above code is for CherryPy 2.X, CherryPy 3.X the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.X

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()

回答 3

考虑objgraph库(请参阅http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks以获得示例用例)。


回答 4

up(还有另一个)是Python的Memory Usage Profiler。该工具集的重点在于识别内存泄漏。

Muppy试图帮助开发人员识别Python应用程序的内存泄漏。它可以跟踪运行时的内存使用情况,并标识泄漏的对象。另外,提供了一些工具,这些工具可以定位未释放对象的来源。

Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is laid on the identification of memory leaks.

Muppy tries to help developers to identity memory leaks of Python applications. It enables the tracking of memory usage during runtime and the identification of objects which are leaking. Additionally, tools are provided which allow to locate the source of not released objects.


回答 5

我正在为Python开发一个名为memprof的内存分析器:

http://jmdana.github.io/memprof/

它允许您在执行装饰方法期间记录和绘制变量的内存使用情况。您只需要使用以下方法导入库:

from memprof import memprof

并使用以下方法装饰您的方法:

@memprof

这是有关情节外观的示例:

在此处输入图片说明

该项目托管在GitHub中:

https://github.com/jmdana/memprof

I’m developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof

This is an example on how the plots look like:

enter image description here

The project is hosted in GitHub:

https://github.com/jmdana/memprof


回答 6

我发现苹果酱比Heapy或PySizer更具功能。如果您恰巧正在运行wsgi Webapp,则Dozer是Dowser的一个很好的中间件包装

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a wsgi webapp, then Dozer is a nice middleware wrapper of Dowser


回答 7

也尝试pytracemalloc项目,该项目提供每个Python行号的内存使用情况。

编辑(2014/04):现在它具有Qt GUI来分析快照。

Try also the pytracemalloc project which provides the memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.


为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

问题:为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

据我了解,该range()函数实际上是Python 3中的一种对象类型,它会生成器一样动态生成其内容。

在这种情况下,我本以为下一行会花费过多的时间,因为要确定1个四舍五入是否在该范围内,必须生成一个四舍五入值:

1000000000000000 in range(1000000000000001)

此外:似乎无论我添加多少个零,计算多少都花费相同的时间(基本上是瞬时的)。

我也尝试过这样的事情,但是计算仍然是即时的:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

如果我尝试实现自己的范围函数,结果将不是很好!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

使range()物体如此之快的物体在做什么?


选择Martijn Pieters的答案是因为它的完整性,但也看到了abarnert的第一个答案,它很好地讨论了在Python 3中range成为完整序列的含义,以及一些有关__contains__跨Python实现的函数优化潜在不一致的信息/警告。。abarnert的其他答案更加详细,并为那些对Python 3优化背后的历史(以及xrangePython 2中缺乏优化)感兴趣的人提供了链接。pokewim的答案感兴趣的人提供了相关的C源代码和说明。

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1000000000000000 in range(1000000000000001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

If I try to implement my own range function, the result is not so nice!!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?


Martijn Pieters’ answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.


回答 0

Python 3 range()对象不会立即产生数字。它是一个智能序列对象,可按需生成数字。它包含的只是您的开始,结束和步长值,然后在对对象进行迭代时,每次迭代都会计算下一个整数。

该对象还实现了object.__contains__hook,并计算您的电话号码是否在其范围内。计算是一个(近)恒定时间运算*。永远不需要扫描范围内的所有可能整数。

range()对象文档中

所述的优点range类型通过常规listtuple是一个范围对象将始终以相同的内存(小)数量,无论它代表的范围内的大小(因为它仅存储startstopstep值,计算各个项目和子范围如所须)。

因此,您的range()对象至少可以做到:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

这仍然缺少实际range()支持的几项内容(例如.index().count()方法,哈希,相等性测试或切片),但应该可以给您一个提示。

我还简化了__contains__实现,只专注于整数测试。如果您为实物range()提供非整数值(包括的子类int),则会启动慢速扫描以查看是否存在匹配项,就好像您对所有包含的值的列表使用了包含测试一样。这样做是为了继续支持其他数字类型,这些数字类型恰好支持使用整数进行相等性测试,但也不希望同时支持整数算术。请参阅实现收容测试的原始Python问题


* 由于Python整数是无界的,所以时间接近恒定,因此数学运算也随着N的增长而及时增长,这使其成为O(log N)运算。由于所有操作均以优化的C代码执行,并且Python将整数值存储在30位块中,因此,由于此处涉及的整数大小,您会用光内存,然后再看到任何性能影响。

The Python 3 range() object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.


* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.


回答 1

此处的根本误解是认为range是生成器。不是。实际上,它不是任何迭代器。

您可以很容易地说出这一点:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

如果它是一个生成器,则对其进行一次迭代将耗尽它:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

什么range实际上是,是一个序列,就像一个列表。您甚至可以测试一下:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

这意味着它必须遵循成为序列的所有规则:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

一个之间的差range和一list在于,range动态序列; 它不记得所有的价值,它只是记住它startstopstep,并根据需要创建的值__getitem__

(作为一个旁注,如果您使用print(iter(a)),则会注意到range使用与相同的listiterator类型list。它是如何工作的?A 除了listiterator使用listC的C实现这一事实外,没有使用任何其他特殊方法__getitem__,因此对于range太。)


现在,没有什么可以说Sequence.__contains__必须是恒定时间的-实际上,对于类似的明显示例list,事实并非如此。但是没有什么可以说是不可能的。与range.__contains__(val - start) % step实际进行计算和测试所有值相比,仅对其进行数学检查(,但具有一些额外的复杂性来处理否定步骤)要容易实现,那么为什么这样做会更好呢?

但是似乎没有什么语言可以保证会发生这种情况。正如Ashwini Chaudhari指出的那样,如果您给它提供一个非整数值,而不是转换为整数并进行数学测试,它将落到对所有值进行迭代并逐一进行比较的过程中。不仅因为CPython 3.2+和PyPy 3.x版本恰好包含此优化,而且这是一个显而易见的好主意且易于实现,所以Iron Iron或NewKickAssPython 3.x没有理由不能放弃它。(实际上,CPython 3.0-3.1 并未包含它。)


如果range实际上是一个生成器(如)my_crappy_range,那么以__contains__这种方式进行测试就没有意义,或者至少有一种合理的方式并不明显。如果您已经迭代了前三个值,那么生成器1仍然in是吗?测试是否应该1使其迭代并消耗所有值1(或直到第一个值>= 1)?

The fundamental misunderstanding here is in thinking that range is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

What range actually is, is a sequence, just like a list. You can even test this:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

This means it has to follow all the rules of being a sequence:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you’ll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn’t use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)


Now, there’s nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact CPython 3.0-3.1 didn’t include it.)


If range actually were a generator, like my_crappy_range, then it wouldn’t make sense to test __contains__ this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?


回答 2

使用消息来源,卢克!

在CPython中,range(...).__contains__(方法包装器)最终将委托给一个简单的计算,该计算将检查该值是否可以在该范围内。速度之所以如此,是因为我们使用关于边界的数学推理,而不是range对象的直接迭代。解释所使用的逻辑:

  1. 检查数字在start和之间stop,以及
  2. 检查步幅值是否不会“超过”我们的数字。

例如,994range(4, 1000, 2)因为:

  1. 4 <= 994 < 1000
  2. (994 - 4) % 2 == 0

完整的C代码包含在下面,由于内存管理和引用计数的详细信息,因此较为冗长,但这里存在基本思想:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

该行的“实质”在该行中提到:

/* result = ((int(ob) - start) % step) == 0 */ 

最后一点-查看range_contains代码段底部的函数。如果确切的类型检查失败,那么我们将不使用描述的巧妙算法,而是使用_PySequence_IterSearch!退回到该范围的愚蠢迭代搜索。您可以在解释器中检查此行为(我在这里使用v3.5.0):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn’t “step over” our number.

For example, 994 is in range(4, 1000, 2) because:

  1. 4 <= 994 < 1000, and
  2. (994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The “meat” of the idea is mentioned in the line:

/* result = ((int(ob) - start) % step) == 0 */ 

As a final note – look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

回答 3

为了补充Martijn的答案,这是源代码的相关部分(在C中,因为range对象是用本机代码编写的):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

因此对于PyLong对象(int在Python 3中是),它将使用该range_contains_long函数确定结果。该函数实际上检查是否ob在指定范围内(尽管在C语言中看起来更复杂)。

如果不是int对象,它将退回到迭代,直到找到(或没有)值为止。

整个逻辑可以像这样转换为伪Python:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

回答 4

如果您想知道为什么将此优化添加到range.__contains__,以及为什么将其添加到xrange.__contains__2.7:

首先,正如Ashwini Chaudhary所发现的, 发行1766304已明确打开以进行优化[x]range.__contains__接受了此修补程序并签入了3.2版本,但没有回迁到2.7版本,因为“ xrange表现得如此之久,以至于我看不到它为什么让我们提交最新的修补程序。” (当时2.7快要淘汰了。)

与此同时:

最初xrange是一个非相当序列的对象。如 3.1文档所说:

范围对象的行为很少:它们仅支持索引,迭代和 len功能。

这不是真的。一个xrange对象实际支持,与索引和自动出现一些其他的东西len *包括__contains__(通过线性搜索)。但是,没有人认为有必要在那时将它们完整地序列化。

然后,作为实现抽象基类 PEP的一部分,重要的是弄清楚应将哪些内置类型标记为实现哪些ABC和xrange/ range声称实现collections.Sequence,即使它仍仅处理相同的“非常少的行为”。在发布9213之前,没有人注意到这个问题。该问题的补丁不仅增加indexcount3.2的range,它也重新工作的优化__contains__(共享相同的数学index,并直接使用count)。** 此更改也适用于3.2,并且没有回移植到2.x,因为“这是一个添加了新方法的错误修正”。(此时,2.7已经超过了rc状态。)

因此,有两次机会可以将此优化回溯到2.7,但都被拒绝了。


*实际上,您甚至可以单独使用索引免费获得迭代,但是在2.3 xrange对象中获得了自定义迭代器。

**第一个版本实际上是重新实现了它,并且弄错了细节-例如,它将给您MyIntSubclass(2) in range(5) == False。但是Daniel Stutzbach的补丁更新版本恢复了以前的大部分代码,包括对通用代码的后备支持,_PySequence_IterSearchrange.__contains__在不应用优化的情况下会缓慢地降低3.2 之前版本的隐式使用。

If you’re wondering why this optimization was added to range.__contains__, and why it wasn’t added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because “xrange has behaved like this for such a long time that I don’t see what it buys us to commit the patch this late.” (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn’t quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same “very little behavior”. Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2’s range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because “it’s a bugfix that adds new methods”. (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.


* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach’s updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn’t apply.


回答 5

其他答案已经很好地说明了这一点,但是我想提供另一个实验来说明范围对象的性质:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

如您所见,范围对象是一个记住其范围的对象,可以多次使用(即使在其上进行迭代),而不仅仅是一次生成器。

The other answers explained it well already, but I’d like to offer another experiment illustrating the nature of range objects:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.


回答 6

这是关于一个偷懒的办法来评估和一些额外的优化range。直到实际使用时才需要计算范围内的值,或者由于额外的优化甚至不需要进一步计算。

顺便说一句,您的整数不是那么大,请考虑 sys.maxsize

sys.maxsize in range(sys.maxsize) 相当快

由于优化-比较给定的整数和范围的最小值和最大值很容易。

但:

Decimal(sys.maxsize) in range(sys.maxsize) 很慢

(在这种情况下, range,因此,如果python收到意外的Decimal,则python将比较所有数字)

您应该了解实现细节,但不应依赖它,因为将来可能会改变。

It’s all about a lazy approach to the evaluation and some extra optimization of range. Values in ranges don’t need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not such big, consider sys.maxsize

sys.maxsize in range(sys.maxsize) is pretty fast

due to optimization – it’s easy to compare given integer just with min and max of range.

but:

Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.

(in this case, there is no optimization in range, so if python receives unexpected Decimal, python will compare all numbers)

You should be aware of an implementation detail but should not be relied upon, because this may change in the future.


回答 7

TL; DR

传回的物件range()实际上是range对象。该对象实现了迭代器接口,因此您可以按顺序迭代其值,就像生成器,列表或元组一样。

但是,它实现了__contains__接口,该接口实际上是当对象出现在in操作员右侧时调用的接口。该__contains__()方法返回a bool左侧项目是否in在对象中。由于range对象知道其边界和步幅,因此在O(1)中非常容易实现。

TL;DR

The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface which is actually what gets called when an object appears on the right hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand-side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).


回答 8

  1. 由于优化,将给定的整数与最小和最大范围进行比较非常容易。
  2. 在Python3 中range()函数之所以如此之快,是因为这里我们对边界使用数学推理,而不是范围对象的直接迭代。
  3. 所以在这里解释逻辑:
    • 检查数字是否在开始和停止之间。
    • 检查步长精度值是否不超过我们的数字。
  4. 例如,997在range(4,1000,3)内是因为:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.

  1. Due to optimization, it is very easy to compare given integers just with min and max range.
  2. The reason that range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
  3. So for explaining the logic here:
    • Check whether the number is between the start and stop.
    • Check whether the step precision value doesn’t go over our number.
  4. Take an example, 997 is in range(4, 1000, 3) because:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.


回答 9

尝试x-1 in (i for i in range(x))使用较大的x值,该值使用生成器理解来避免调用range.__contains__优化。

Try x-1 in (i for i in range(x)) for large x values, which uses a generator comprehension to avoid invoking the range.__contains__ optimisation.


字符串格式:%与.format

问题:字符串格式:%与.format

Python 2.6引入的str.format()方法与现有%运算符的语法略有不同。哪个更好,什么情况下适合?

  1. 以下使用每种方法并具有相同的结果,那么有什么区别?

    #!/usr/bin/python
    sub1 = "python string!"
    sub2 = "an arg"
    
    a = "i am a %s" % sub1
    b = "i am a {0}".format(sub1)
    
    c = "with %(kwarg)s!" % {'kwarg':sub2}
    d = "with {kwarg}!".format(kwarg=sub2)
    
    print a    # "i am a python string!"
    print b    # "i am a python string!"
    print c    # "with an arg!"
    print d    # "with an arg!"
    
  2. 此外,何时在Python中进行字符串格式化?例如,如果我的日志记录级别设置为HIGH,那么执行以下%操作是否还会对我有所帮助?如果是这样,有办法避免这种情况吗?

    log.debug("some debug info: %s" % some_info)

Python 2.6 introduced the str.format() method with a slightly different syntax from the existing % operator. Which is better and for what situations?

  1. The following uses each method and has the same outcome, so what is the difference?

    #!/usr/bin/python
    sub1 = "python string!"
    sub2 = "an arg"
    
    a = "i am a %s" % sub1
    b = "i am a {0}".format(sub1)
    
    c = "with %(kwarg)s!" % {'kwarg':sub2}
    d = "with {kwarg}!".format(kwarg=sub2)
    
    print a    # "i am a python string!"
    print b    # "i am a python string!"
    print c    # "with an arg!"
    print d    # "with an arg!"
    
  2. Furthermore when does string formatting occur in Python? For example, if my logging level is set to HIGH will I still take a hit for performing the following % operation? And if so, is there a way to avoid this?

    log.debug("some debug info: %s" % some_info)
    

回答 0

要回答您的第一个问题… .format在许多方面似乎都更加复杂。令人烦恼的%是它如何可以采用变量或元组。您会认为以下各项将始终有效:

"hi there %s" % name

但是,如果name碰巧(1, 2, 3),它将抛出一个TypeError。为了确保它始终打印,您需要执行

"hi there %s" % (name,)   # supply the single argument as a single-item tuple

真丑。.format没有那些问题。同样在您给出的第二个示例中,该.format示例看起来更加简洁。

为什么不使用它?

  • 不知道(我在阅读本文之前)
  • 必须与Python 2.5兼容

为了回答您的第二个问题,字符串格式化与其他任何操作都同时发生-计算字符串格式化表达式时。而且,Python并不是一种惰性语言,它会在调用函数之前先对表达式求值,因此在您的log.debug示例中,表达式"some debug info: %s"%some_info将首先求值,例如"some debug info: roflcopters are active",然后将该字符串传递给log.debug()

To answer your first question… .format just seems more sophisticated in many ways. An annoying thing about % is also how it can either take a variable or a tuple. You’d think the following would always work:

"hi there %s" % name

yet, if name happens to be (1, 2, 3), it will throw a TypeError. To guarantee that it always prints, you’d need to do

"hi there %s" % (name,)   # supply the single argument as a single-item tuple

which is just ugly. .format doesn’t have those issues. Also in the second example you gave, the .format example is much cleaner looking.

Why would you not use it?

  • not knowing about it (me before reading this)
  • having to be compatible with Python 2.5

To answer your second question, string formatting happens at the same time as any other operation – when the string formatting expression is evaluated. And Python, not being a lazy language, evaluates expressions before calling functions, so in your log.debug example, the expression "some debug info: %s"%some_infowill first evaluate to, e.g. "some debug info: roflcopters are active", then that string will be passed to log.debug().


回答 1

afaik,模运算符(%)无法做到的事情:

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

结果

12 22222 45 22222 103 22222 6 22222

很有用。

另一点:format()作为函数,可以用作其他函数的参数:

li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)   

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8,  minutes=20)

gen =(once_upon_a_time +x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

结果是:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00

Something that the modulo operator ( % ) can’t do, afaik:

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

result

12 22222 45 22222 103 22222 6 22222

Very useful.

Another point: format(), being a function, can be used as an argument in other functions:

li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)   

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8,  minutes=20)

gen =(once_upon_a_time +x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

Results in:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00

回答 2

假设您正在使用Python的logging模块,则可以将字符串格式参数作为参数传递给.debug()方法,而不必自己进行格式设置:

log.debug("some debug info: %s", some_info)

除非记录器实际记录某些内容,否则可以避免进行格式化。

Assuming you’re using Python’s logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself:

log.debug("some debug info: %s", some_info)

which avoids doing the formatting unless the logger actually logs something.


回答 3

从Python 3.6(2016)开始,您可以使用f字符串替换变量:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

注意f"前缀。如果您在Python 3.5或更早版本中尝试此操作,则会看到一个SyntaxError

参见https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings

As of Python 3.6 (2016) you can use f-strings to substitute variables:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

Note the f" prefix. If you try this in Python 3.5 or earlier, you’ll get a SyntaxError.

See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings


回答 4

PEP 3101提议%用Python 3中新的高级字符串格式替换运算符,这将是默认格式。

PEP 3101 proposes the replacement of the % operator with the new, advanced string formatting in Python 3, where it would be the default.


回答 5

但是请小心,刚才我在尝试%.format现有代码替换所有内容时发现了一个问题:'{}'.format(unicode_string)将尝试对unicode_string进行编码,并且可能会失败。

只需查看以下Python交互式会话日志即可:

Python 2.7.2 (default, Aug 27 2012, 19:52:55) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'

s只是一个字符串(在Python3中称为“字节数组”),并且u是Unicode字符串(在Python3中称为“字符串”):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

当将Unicode对象作为%运算符的参数时,即使原始字符串不是Unicode,它也会产生一个Unicode字符串:

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

但该.format函数将引发“ UnicodeEncodeError”:

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

并且仅当原始字符串为Unicode时,它才可以与Unicode参数一起使用。

; '{}'.format(u'i')
'i'

或者参数字符串可以转换为字符串(所谓的“字节数组”)

But please be careful, just now I’ve discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.

Just look at this Python interactive session log:

Python 2.7.2 (default, Aug 27 2012, 19:52:55) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'

s is just a string (called ‘byte array’ in Python3) and u is a Unicode string (called ‘string’ in Python3):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

When you give a Unicode object as a parameter to % operator it will produce a Unicode string even if the original string wasn’t Unicode:

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

but the .format function will raise “UnicodeEncodeError”:

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

and it will work with a Unicode argument fine only if the original string was Unicode.

; '{}'.format(u'i')
'i'

or if argument string can be converted to a string (so called ‘byte array’)


回答 6

的另一个优点.format(我没有在答案中看到):它可以具有对象属性。

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:         

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

或者,作为关键字参数:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

%据我所知,这是不可能的。

Yet another advantage of .format (which I don’t see in the answers): it can take object properties.

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:         

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

Or, as a keyword argument:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

This is not possible with % as far as I can tell.


回答 7

%format我的测试提供更好的性能。

测试代码:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

结果:

> format: 0.470329046249
> %: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

结果

> format: 0.5864730989560485
> %: 0.013593495357781649

它在Python2中看起来很小,而在Python3中%则比快得多format

感谢@Chris Cogdon提供示例代码。

编辑1:

2019年7月在Python 3.7.2中再次测试。

结果:

> format: 0.86600608
> %: 0.630180146

没有太大的区别。我想Python正在逐步完善。

编辑2:

在有人在评论中提到python 3的f字符串后,我在python 3.7.2下对以下代码进行了测试:

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))

结果:

format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943

看来f-string仍然比慢,%但比慢format

% gives better performance than format from my test.

Test code:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

Result:

> format: 0.470329046249
> %: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

Result

> format: 0.5864730989560485
> %: 0.013593495357781649

It looks in Python2, the difference is small whereas in Python3, % is much faster than format.

Thanks @Chris Cogdon for the sample code.

Edit 1:

Tested again in Python 3.7.2 in July 2019.

Result:

> format: 0.86600608
> %: 0.630180146

There is not much difference. I guess Python is improving gradually.

Edit 2:

After someone mentioned python 3’s f-string in comment, I did a test for the following code under python 3.7.2 :

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))

Result:

format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943

It seems f-string is still slower than % but better than format.


回答 8

正如我今天发现的那样,通过格式化字符串的旧方法%不支持Decimalpython的用于十进制定点和浮点算术的模块。

示例(使用Python 3.3.5):

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

输出:

0.00000000000000000000000312375239000000009907464850 0.00000000000000000000000312312239239000000000000000000

当然可能有解决方法,但是您仍然可以考虑立即使用该format()方法。

As I discovered today, the old way of formatting strings via % doesn’t support Decimal, Python’s module for decimal fixed point and floating point arithmetic, out of the box.

Example (using Python 3.3.5):

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

Output:

0.00000000000000000000000312375239000000009907464850 0.00000000000000000000000312375239000000000000000000

There surely might be work-arounds but you still might consider using the format() method right away.


回答 9

如果您的python> = 3.6,则F字符串格式的文字是您的新朋友。

它更简单,更干净,性能更好。

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

If your python >= 3.6, F-string formatted literal is your new friend.

It’s more simple, clean, and better performance.

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

回答 10

附带说明,您不必为了提高性能而在日志记录中使用新样式格式。您可以将任何实现了magic方法的对象传递给logging.debuglogging.info等等__str__。当日志记录模块决定必须发出您的消息对象(无论它是什么)时,它将str(message_object)在发出消息之前先进行调用。因此,您可以执行以下操作:

import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

所有这些都在Python 3文档(https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles)中进行了描述。但是,它也可以在Python 2.6中使用(https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages)。

使用该技术的优点之一是它允许使用惰性值,例如expensive_func上面的函数,这是事实,除了格式风格不可知。这为Python文档中的建议提供了更优雅的替代方法:https : //docs.python.org/2.6/library/logging.html#optimization

As a side note, you don’t have to take a performance hit to use new style formatting with logging. You can pass any object to logging.debug, logging.info, etc. that implements the __str__ magic method. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this:

import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).

One of the advantages of using this technique, other than the fact that it’s formatting-style agnostic, is that it allows for lazy values e.g. the function expensive_func above. This provides a more elegant alternative to the advice being given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.


回答 11

一种%可能有用的情况是格式化正则表达式时。例如,

'{type_names} [a-z]{2}'.format(type_names='triangle|square')

加薪IndexError。在这种情况下,您可以使用:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}

这样可以避免将正则表达式写为'{type_names} [a-z]{{2}}'。当您有两个正则表达式时,这很有用,其中一个正则表达式单独使用而没有格式,但是两个正则表达式的连接都已格式化。

One situation where % may help is when you are formatting regex expressions. For example,

'{type_names} [a-z]{2}'.format(type_names='triangle|square')

raises IndexError. In this situation, you can use:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}

This avoids writing the regex as '{type_names} [a-z]{{2}}'. This can be useful when you have two regexes, where one is used alone without format, but the concatenation of both is formatted.


回答 12

我要补充一点,从3.6版开始,我们可以像下面这样使用fstrings

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

哪个给

我叫约翰·史密斯

一切都转换为字符串

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

结果:

mylist = [‘foo’,’bar’]

您可以像其他格式一样传递函数

print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

举个例子

您好,这是日期:16/04/2018

I would add that since version 3.6, we can use fstrings like the following

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

Which give

My name is john smith

Everything is converted to strings

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

Result:

mylist = [‘foo’, ‘bar’]

you can pass function, like in others formats method

print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

Giving for example

Hello, here is the date : 16/04/2018


回答 13

对于python版本> = 3.6(请参阅PEP 498

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'

For python version >= 3.6 (see PEP 498)

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'

回答 14

Python 3.6.7比较:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),) 
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()

输出:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----

Python 3.6.7 comparative:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),) 
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()

Output:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----

回答 15

但是有一件事是,如果您嵌套了花括号,则不能使用格式但%可以使用。

例:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>> 

But one thing is that also if you have nested curly-braces, won’t work for format but % will work.

Example:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>> 

如何配置Python脚本?

问题:如何配置Python脚本?

欧拉计画和其他编码竞赛经常有最长的运行时间,或者人们吹嘘他们的特定解决方案的运行速度。使用Python时,有时这些方法有些繁琐-即向中添加计时代码__main__

剖析Python程序需要花费多长时间的好方法是什么?

Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey – i.e., adding timing code to __main__.

What is a good way to profile how long a Python program takes to run?


回答 0

Python包含一个名为cProfile的探查器。它不仅给出了总的运行时间,还分别对每个函数进行了计时,并告诉您每个函数被调用了多少次,从而使您轻松确定应该在哪里进行优化。

您可以从代码内部或解释器中调用它,如下所示:

import cProfile
cProfile.run('foo()')

更有用的是,您可以在运行脚本时调用cProfile:

python -m cProfile myscript.py

为了使其更容易,我制作了一个名为“ profile.bat”的批处理文件:

python -m cProfile %1

所以我要做的就是运行:

profile euler048.py

我得到这个:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}

编辑:更新了指向PyCon 2013的视频资源的链接,标题为 Python Profiling
Also via YouTube

Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.

You can call it from within your code, or from the interpreter, like this:

import cProfile
cProfile.run('foo()')

Even more usefully, you can invoke the cProfile when running a script:

python -m cProfile myscript.py

To make it even easier, I made a little batch file called ‘profile.bat’:

python -m cProfile %1

So all I have to do is run:

profile euler048.py

And I get this:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}

EDIT: Updated link to a good video resource from PyCon 2013 titled Python Profiling
Also via YouTube.


回答 1

前一阵子,我pycallgraph从您的Python代码生成了可视化效果。编辑:我已经更新了该示例以使其可用于本文撰写时的最新版本3.3。

pip install pycallgraph安装GraphViz之后,您可以从命令行运行它:

pycallgraph graphviz -- ./mypythonscript.py

或者,您可以分析代码的特定部分:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    code_to_profile()

这些都将生成pycallgraph.png类似于下图的文件:

在此处输入图片说明

A while ago I made pycallgraph which generates a visualisation from your Python code. Edit: I’ve updated the example to work with 3.3, the latest release as of this writing.

After a pip install pycallgraph and installing GraphViz you can run it from the command line:

pycallgraph graphviz -- ./mypythonscript.py

Or, you can profile particular parts of your code:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    code_to_profile()

Either of these will generate a pycallgraph.png file similar to the image below:

enter image description here


回答 2

值得指出的是,使用探查器仅在主线程上有效(默认情况下),如果使用其他线程,则不会从其他线程获得任何信息。这可能有点麻烦,因为在探查器文档中完全没有提及。

如果您还想分析线程,则需要查看文档中的threading.setprofile()函数

您也可以创建自己的threading.Thread子类来做到这一点:

class ProfiledThread(threading.Thread):
    # Overrides threading.Thread.run()
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

并使用ProfiledThread该类而不是标准类。它可能会给您带来更大的灵活性,但是我不确定是否值得,特别是如果您使用的是不使用您的类的第三方代码。

It’s worth pointing out that using the profiler only works (by default) on the main thread, and you won’t get any information from other threads if you use them. This can be a bit of a gotcha as it is completely unmentioned in the profiler documentation.

If you also want to profile threads, you’ll want to look at the threading.setprofile() function in the docs.

You could also create your own threading.Thread subclass to do it:

class ProfiledThread(threading.Thread):
    # Overrides threading.Thread.run()
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

and use that ProfiledThread class instead of the standard one. It might give you more flexibility, but I’m not sure it’s worth it, especially if you are using third-party code which wouldn’t use your class.


回答 3

python Wiki是用于分析资源的好页面:http : //wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

就像python docs一样:http : //docs.python.org/library/profile.html

如Chris Lawlor所示,cProfile是一个很棒的工具,可以轻松地用于打印到屏幕上:

python -m cProfile -s time mine.py <args>

或提交:

python -m cProfile -o output.file mine.py <args>

PS>如果您使用的是Ubuntu,请确保安装python-profile

apt-get install python-profiler 

如果输出到文件,则可以使用以下工具获得不错的可视化效果

PyCallGraph:用于创建调用图图像的工具
安装:

 pip install pycallgraph

跑:

 pycallgraph mine.py args

视图:

 gimp pycallgraph.png

您可以使用任何喜欢的方式查看png文件,我使用的是gimp
不幸的是我经常得到

点:对于cairo-renderer位图,图形太大。按0.257079缩放以适应

这使我的图像变小了。所以我通常创建svg文件:

pycallgraph -f svg -o pycallgraph.svg mine.py <args>

PS>确保安装graphviz(提供点程序):

pip install graphviz

通过@maxy / @quodlibetor使用gprof2dot进行替代绘图:

pip install gprof2dot
python -m cProfile -o profile.pstats mine.py
gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg

The python wiki is a great page for profiling resources: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

as is the python docs: http://docs.python.org/library/profile.html

as shown by Chris Lawlor cProfile is a great tool and can easily be used to print to the screen:

python -m cProfile -s time mine.py <args>

or to file:

python -m cProfile -o output.file mine.py <args>

PS> If you are using Ubuntu, make sure to install python-profile

apt-get install python-profiler 

If you output to file you can get nice visualizations using the following tools

PyCallGraph : a tool to create call graph images
install:

 pip install pycallgraph

run:

 pycallgraph mine.py args

view:

 gimp pycallgraph.png

You can use whatever you like to view the png file, I used gimp
Unfortunately I often get

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.257079 to fit

which makes my images unusably small. So I generally create svg files:

pycallgraph -f svg -o pycallgraph.svg mine.py <args>

PS> make sure to install graphviz (which provides the dot program):

pip install graphviz

Alternative Graphing using gprof2dot via @maxy / @quodlibetor :

pip install gprof2dot
python -m cProfile -o profile.pstats mine.py
gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg

回答 4

@Maxy对这个答案的评论为我提供了足够的帮助,我认为它应该得到自己的答案:我已经有cProfile生成的.pstats文件,并且我不想用pycallgraph重新运行,所以我使用了gprof2dot,并且很漂亮svgs:

$ sudo apt-get install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot
$ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
$ cd $PROJECT_DIR
$ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

和布莱姆!

它使用点(pycallgraph使用相同的东西),因此输出看起来类似。我的印象是,尽管gprof2dot丢失的信息更少:

gprof2dot示例输出

@Maxy’s comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn’t want to re-run things with pycallgraph, so I used gprof2dot, and got pretty svgs:

$ sudo apt-get install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot
$ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
$ cd $PROJECT_DIR
$ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

and BLAM!

It uses dot (the same thing that pycallgraph uses) so output looks similar. I get the impression that gprof2dot loses less information though:

gprof2dot example output


回答 5

在研究此主题时,我遇到了一个名为SnakeViz的便捷工具。SnakeViz是基于Web的配置文件可视化工具。这是非常容易安装和使用。我通常使用的方法是使用生成统计文件,%prun然后在SnakeViz中进行分析。

所使用的主要可视化技术是如下所示的森伯斯特图,其中,函数调用的层次结构被安排为弧形层,并且时间信息以其角宽进行编码。

最好的事情是您可以与图表进行交互。例如,要放大,可以单击圆弧,然后将圆弧及其后代放大为新的旭日形以显示更多详细信息。

在此处输入图片说明

I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do analysis in SnakeViz.

The main viz technique used is Sunburst chart as shown below, in which the hierarchy of function calls is arranged as layers of arcs and time info encoded in their angular widths.

The best thing is you can interact with the chart. For example, to zoom in one can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.

enter image description here


回答 6

最简单最快的方式找到所有的时间是怎么回事。

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

在浏览器中绘制饼图。最大的一块是问题功能。很简单。

Simplest and quickest way to find where all the time is going.

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

Draws a pie chart in a browser. Biggest piece is the problem function. Very simple.


回答 7

我认为这cProfile对于概要分析非常有用,而kcachegrind对于可视化结果则非常有用。该pyprof2calltree文件转换手柄之间英寸

python -m cProfile -o script.profile script.py
pyprof2calltree -i script.profile -o script.calltree
kcachegrind script.calltree

要安装必需的工具(至少在Ubuntu上):

apt-get install kcachegrind
pip install pyprof2calltree

结果:

结果截图

I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.

python -m cProfile -o script.profile script.py
pyprof2calltree -i script.profile -o script.calltree
kcachegrind script.calltree

To install the required tools (on Ubuntu, at least):

apt-get install kcachegrind
pip install pyprof2calltree

The result:

Screenshot of the result


回答 8

同样值得一提的是GUI cProfile转储查看器RunSnakeRun。它允许您排序和选择,从而放大程序的相关部分。图片中矩形的大小与所花费的时间成比例。如果将鼠标悬停在矩形上,它将突出显示表格中以及地图上所有位置的调用。当您双击一个矩形时,它将放大该部分。它将显示谁调用了该部分以及该部分调用了什么。

描述性信息非常有帮助。它显示了该位的代码,在处理内置库调用时可能会有所帮助。它告诉您要查找代码的文件和行。

还想指出,OP表示“概要分析”,但看来他的意思是“定时”。请记住,配置文件后,程序运行速度会变慢。

在此处输入图片说明

Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture is proportional to the time taken. If you mouse over a rectangle it highlights that call in the table and everywhere on the map. When you double-click on a rectangle it zooms in on that portion. It will show you who calls that portion and what that portion calls.

The descriptive information is very helpful. It shows you the code for that bit which can be helpful when you are dealing with built-in library calls. It tells you what file and what line to find the code.

Also want to point at that the OP said ‘profiling’ but it appears he meant ‘timing’. Keep in mind programs will run slower when profiled.

enter image description here


回答 9

一个不错的分析模块是line_profiler(使用脚本kernprof.py调用)。可以在这里下载。

我的理解是cProfile仅提供有关每个功能花费的总时间的信息。因此,单独的代码行不会计时。这是科学计算中的一个问题,因为单行通常会花费很多时间。另外,正如我记得的那样,cProfile并没有抓住我花在numpy.dot上的时间。

A nice profiling module is the line_profiler (called using the script kernprof.py). It can be downloaded here.

My understanding is that cProfile only gives information about total time spent in each function. So individual lines of code are not timed. This is an issue in scientific computing since often one single line can take a lot of time. Also, as I remember, cProfile didn’t catch the time I was spending in say numpy.dot.


回答 10

我最近创建了金枪鱼,用于可视化Python运行时和导入配置文件。这可能会有所帮助。

在此处输入图片说明

用安装

pip install tuna

创建运行时配置文件

python3 -m cProfile -o program.prof yourfile.py

或导入配置文件(需要Python 3.7+)

python3 -X importprofile yourfile.py 2> import.log

然后在文件上运行金枪鱼

tuna program.prof

I recently created tuna for visualizing Python runtime and import profiles; this may be helpful here.

enter image description here

Install with

pip install tuna

Create a runtime profile

python3 -m cProfile -o program.prof yourfile.py

or an import profile (Python 3.7+ required)

python3 -X importprofile yourfile.py 2> import.log

Then just run tuna on the file

tuna program.prof

回答 11

pprofile

line_profiler(已在此处展示)也受到了启发 pprofile,它被描述为:

线粒度,线程感知的确定性和统计纯Python探查器

它提供的行粒度为line_profiler,它是纯Python,可以用作独立命令或模块,甚至可以生成可以使用轻松分析的callgrind格式文件[k|q]cachegrind

vprof

还有vprof,一个Python包,描述为:

为各种Python程序特征(例如运行时间和内存使用情况)提供丰富的交互式可视化。

热图

pprofile

line_profiler (already presented here) also inspired pprofile, which is described as:

Line-granularity, thread-aware deterministic and statistic pure-python profiler

It provides line-granularity as line_profiler, is pure Python, can be used as a standalone command or a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.

vprof

There is also vprof, a Python package described as:

[…] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.

heatmap


回答 12

有很多不错的答案,但是他们要么使用命令行,要么使用某些外部程序来对结果进行概要分析和/或排序。

我真的很想念我可以在IDE(eclipse-PyDev)中使用的某些方式,而无需触摸命令行或安装任何东西。就是这样

不使用命令行进行分析

def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

有关更多信息,请参阅文档或其他答案。

There’s a lot of great answers but they either use command line or some external program for profiling and/or sorting the results.

I really missed some way I could use in my IDE (eclipse-PyDev) without touching the command line or installing anything. So here it is.

Profiling without command line

def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

See docs or other answers for more info.


回答 13

在Joe Shaw回答了多线程代码无法按预期工作的回答之后,我发现runcallcProfile 中的方法只是在做,self.enable()并且self.disable()在配置函数调用周围进行调用,因此您可以自己进行操作,并在中间使用任何想要的代码对现有代码的干扰最小。

Following Joe Shaw’s answer about multi-threaded code not to work as expected, I figured that the runcall method in cProfile is merely doing self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and have whatever code you want in-between with minimal interference with existing code.


回答 14

在Virtaal的资料中,有一个非常有用的类和装饰器,可以使分析(即使对于特定的方法/函数)也非常容易。然后可以在KCacheGrind中非常舒适地查看输出。

In Virtaal’s source there’s a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.


回答 15

cProfile非常适合快速分析,但是大多数情况下它以错误结束。函数runctx通过正确初始化环境和变量来解决此问题,希望它对某人有用:

import cProfile
cProfile.runctx('foo()', None, locals())

cProfile is great for quick profiling but most of the time it was ending for me with the errors. Function runctx solves this problem by initializing correctly the environment and variables, hope it can be useful for someone:

import cProfile
cProfile.runctx('foo()', None, locals())

回答 16

如果要制作累积分析器,则意味着连续运行该函数几次,并观察结果的总和。

您可以使用以下cumulative_profiler装饰器:

它是特定于python> = 3.6的python,但是您可以删除nonlocal它,以便在较旧版本上运行。

import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func =  func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()

        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row

            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # incase you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator

分析功能 baz

import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1

baz()

baz 跑了5次并打印了这个:

         20 function calls in 15.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   15.003    1.500   15.003    1.500 {built-in method time.sleep}
        5    0.000    0.000   15.003    3.001 <ipython-input-9-c89afe010372>:3(baz)
        5    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

指定次数

@cumulative_profiler(3)
def baz():
    ...

If you want to make a cumulative profiler, meaning to run the function several times in a row and watch the sum of the results.

you can use this cumulative_profiler decorator:

it’s python >= 3.6 specific, but you can remove nonlocal for it work on older versions.

import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func =  func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()

        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row

            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # incase you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator

Example

profiling the function baz

import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1

baz()

baz ran 5 times and printed this:

         20 function calls in 15.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   15.003    1.500   15.003    1.500 {built-in method time.sleep}
        5    0.000    0.000   15.003    3.001 <ipython-input-9-c89afe010372>:3(baz)
        5    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

specifying the amount of times

@cumulative_profiler(3)
def baz():
    ...

回答 17

仅终端机(也是最简单的)解决方案,如果所有这些精美的UI无法安装或运行:完全
忽略cProfile并替换为pyinstrument,它将在执行后立即收集并显示调用树。

安装:

$ pip install pyinstrument

配置文件和显示结果:

$ python -m pyinstrument ./prog.py

适用于python2和3。

[编辑]仅用于分析部分代码的API文档可在此处找到。

The terminal-only (and simplest) solution, in case all those fancy UI’s fail to install or to run:
ignore cProfile completely and replace it with pyinstrument, that will collect and display the tree of calls right after execution.

Install:

$ pip install pyinstrument

Profile and display result:

$ python -m pyinstrument ./prog.py

Works with python2 and 3.

[EDIT] The documentation of the API, for profiling only a part of the code, can be found here.


回答 18

我的方式是使用yappi(https://github.com/sumerc/yappi)。与RPC服务器结合使用时特别有用,在RPC服务器中(甚至仅用于调试),您注册方法以启动,停止和打印性能分析信息,例如:

@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space

        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)

然后,当程序工作时,您可以随时通过调用startProfilerRPC方法来启动事件探查器,并通过调用printProfiler(或修改rpc方法以将其返回给调用者)将概要分析信息转储到日志文件中,并获得以下输出:

2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
name                                                                                                                                      ncall     ttot    tsub
2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
C:\Python27\lib\sched.py.run:80                                                                                                           22        0.11    0.05
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293                                                22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515                                                    22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66                                       1         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.date_time_string:464                                                                                    1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243     4         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537                                                                          1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4         0.0     0.0
<string>.__new__:8                                                                                                                        220       0.0     0.0
C:\Python27\lib\socket.py.close:276                                                                                                       4         0.0     0.0
C:\Python27\lib\threading.py.__init__:558                                                                                                 1         0.0     0.0
<string>.__new__:8                                                                                                                        4         0.0     0.0
C:\Python27\lib\threading.py.notify:372                                                                                                   1         0.0     0.0
C:\Python27\lib\rfc822.py.getheader:285                                                                                                   4         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301                                                                                  1         0.0     0.0
C:\Python27\lib\xmlrpclib.py.end:816                                                                                                      3         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467                                                                                         1         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460                                                                               1         0.0     0.0
C:\Python27\lib\SocketServer.py.close_request:475                                                                                         1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066               4         0.0     0.0 

它对于短脚本可能不是很有用,但有助于优化服务器类型的进程,尤其是考虑到该printProfiler方法可以随时间多次调用以概要分析和比较例如不同的程序使用情况时,尤其如此。

在较新版本的yappi中,以下代码将起作用:

@staticmethod
def printProfile():
    yappi.get_func_stats().print_all()

My way is to use yappi (https://github.com/sumerc/yappi). It’s especially useful combined with an RPC server where (even just for debugging) you register method to start, stop and print profiling information, e.g. in this way:

@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space

        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)

Then when your program work you can start profiler at any time by calling the startProfiler RPC method and dump profiling information to a log file by calling printProfiler (or modify the rpc method to return it to the caller) and get such output:

2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
name                                                                                                                                      ncall     ttot    tsub
2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
C:\Python27\lib\sched.py.run:80                                                                                                           22        0.11    0.05
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293                                                22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515                                                    22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66                                       1         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.date_time_string:464                                                                                    1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243     4         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537                                                                          1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4         0.0     0.0
<string>.__new__:8                                                                                                                        220       0.0     0.0
C:\Python27\lib\socket.py.close:276                                                                                                       4         0.0     0.0
C:\Python27\lib\threading.py.__init__:558                                                                                                 1         0.0     0.0
<string>.__new__:8                                                                                                                        4         0.0     0.0
C:\Python27\lib\threading.py.notify:372                                                                                                   1         0.0     0.0
C:\Python27\lib\rfc822.py.getheader:285                                                                                                   4         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301                                                                                  1         0.0     0.0
C:\Python27\lib\xmlrpclib.py.end:816                                                                                                      3         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467                                                                                         1         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460                                                                               1         0.0     0.0
C:\Python27\lib\SocketServer.py.close_request:475                                                                                         1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066               4         0.0     0.0 

It may not be very useful for short scripts but helps to optimize server-type processes especially given the printProfiler method can be called multiple times over time to profile and compare e.g. different program usage scenarios.

In newer versions of yappi, the following code will work:

@staticmethod
def printProfile():
    yappi.get_func_stats().print_all()

回答 19

PyVmMonitor是处理Python中性能分析的新工具:http ://www.pyvmmonitor.com/

它具有一些独特的功能,例如

  • 将探查器附加到正在运行的(CPython)程序
  • 通过Yappi集成进行按需配置
  • 在其他计算机上配置文件
  • 多进程支持(多处理,django …)
  • 实时采样/ CPU视图(具有时间范围选择)
  • 通过cProfile / profile集成进行确定性分析
  • 分析现有的PStats结果
  • 打开DOT文件
  • 程序化API访问
  • 按方法或行对样本进行分组
  • PyDev集成
  • PyCharm整合

注意:它是商业性的,但对开源免费。

A new tool to handle profiling in Python is PyVmMonitor: http://www.pyvmmonitor.com/

It has some unique features such as

  • Attach profiler to a running (CPython) program
  • On demand profiling with Yappi integration
  • Profile on a different machine
  • Multiple processes support (multiprocessing, django…)
  • Live sampling/CPU view (with time range selection)
  • Deterministic profiling through cProfile/profile integration
  • Analyze existing PStats results
  • Open DOT files
  • Programatic API access
  • Group samples by method or line
  • PyDev integration
  • PyCharm integration

Note: it’s commercial, but free for open source.


回答 20

gprof2dot_magic

魔术函数,用于gprof2dot在JupyterLab或Jupyter Notebook中将任何Python语句配置为DOT图。

在此处输入图片说明

GitHub回购:https : //github.com/mattijn/gprof2dot_magic

安装

确保您拥有Python软件包gprof2dot_magic

pip install gprof2dot_magic

它的依赖关系gprof2dotgraphviz也将被安装

用法

要启用魔术功能,请先加载gprof2dot_magic模块

%load_ext gprof2dot_magic

然后将任何行语句配置为DOT图,如下所示:

%gprof2dot print('hello world')

在此处输入图片说明

gprof2dot_magic

Magic function for gprof2dot to profile any Python statement as a DOT graph in JupyterLab or Jupyter Notebook.

enter image description here

GitHub repo: https://github.com/mattijn/gprof2dot_magic

installation

Make sure you’ve the Python package gprof2dot_magic.

pip install gprof2dot_magic

Its dependencies gprof2dot and graphviz will be installed as well

usage

To enable the magic function, first load the gprof2dot_magic module

%load_ext gprof2dot_magic

and then profile any line statement as a DOT graph as such:

%gprof2dot print('hello world')

enter image description here


回答 21

是否曾经想知道python脚本到底在做什么?输入检查外壳。通过Inspect Shell,您可以在不中断正在运行的脚本的情况下打印/更改全局变量并运行函数。现在具有自动完成和命令历史记录(仅在Linux上)。

Inspect Shell不是pdb样式的调试器。

https://github.com/amoffat/Inspect-Shell

您可以使用它(和您的手表)。

Ever want to know what the hell that python script is doing? Enter the Inspect Shell. Inspect Shell lets you print/alter globals and run functions without interrupting the running script. Now with auto-complete and command history (only on linux).

Inspect Shell is not a pdb-style debugger.

https://github.com/amoffat/Inspect-Shell

You could use that (and your wristwatch).


回答 22

要添加到https://stackoverflow.com/a/582337/1070617

我编写了此模块,该模块使您可以使用cProfile并轻松查看其输出。此处更多内容:https//github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

另请参阅:http: //ymichael.com/2014/03/08/profiling-python-with-cprofile.html,了解如何理解收集到的统计信息。

To add on to https://stackoverflow.com/a/582337/1070617,

I wrote this module that allows you to use cProfile and view its output easily. More here: https://github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

Also see: http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html on how to make sense of the collected statistics.


回答 23

这将取决于您希望从分析中看到什么。可以通过(bash)给出简单的时间指标。

time python python_prog.py

甚至’/ usr / bin / time’也可以使用’–verbose’标志输出详细的指标。

要检查每个函数给出的时间指标并更好地了解在函数上花费了多少时间,可以在python中使用内置的cProfile。

进入性能,时间等更详细的指标并不是唯一的指标。您可以担心内存,线程等问题。
分析选项:
1. line_profiler是另一个分析器,通常用于逐行找出时序度量。
2. memory_profiler是用于分析内存使用情况的工具。
3. 堆(来自项目Guppy)描述如何使用堆中的对象。

这些是我倾向于使用的一些常见的东西。但是,如果您想了解更多信息,请尝试阅读本书。 这是一本关于性能入门的不错的书。您可以转到使用Cython和JIT(即时)编译的python的高级主题。

It would depend on what you want to see out of profiling. Simple time metrics can be given by (bash).

time python python_prog.py

Even ‘/usr/bin/time’ can output detailed metrics by using ‘–verbose’ flag.

To check time metrics given by each function and to better understand how much time is spent on functions, you can use the inbuilt cProfile in python.

Going into more detailed metrics like performance, time is not the only metric. You can worry about memory, threads etc.
Profiling options:
1. line_profiler is another profiler used commonly to find out timing metrics line-by-line.
2. memory_profiler is a tool to profile memory usage.
3. heapy (from project Guppy) Profile how objects in the heap are used.

These are some of the common ones I tend to use. But if you want to find out more, try reading this book It is a pretty good book on starting out with performance in mind. You can move onto advanced topics on using Cython and JIT(Just-in-time) compiled python.


回答 24

使用austin之类的统计分析器,不需要任何检测,这意味着您可以轻松地从Python应用程序中分析数据

austin python3 my_script.py

原始输出不是很有用,但是您可以将其通过管道传递到flamegraph.pl 以获取该数据的火焰图表示,从而可以细分所花费的时间(以毫秒为单位的实时时间)。

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

With a statistical profiler like austin, no instrumentation is required, meaning that you can get profiling data out of a Python application simply with

austin python3 my_script.py

The raw output isn’t very useful, but you can pipe that to flamegraph.pl to get a flame graph representation of that data that gives you a breakdown of where the time (measured in microseconds of real time) is being spent.

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

回答 25

还有一个名为的统计分析器statprof。它是一个采样探查器,因此它为您的代码增加了最小的开销,并提供了基于行(不仅仅基于函数)的时序。它更适合诸如游戏之类的软实时应用程序,但精度可能不如cProfile。

pypi中版本有点旧,因此可以pip通过指定git仓库来安装它:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

您可以像这样运行它:

import statprof

with statprof.profile():
    my_questionable_function()

另请参阅https://stackoverflow.com/a/10333592/320036

There’s also a statistical profiler called statprof. It’s a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It’s more suited to soft real-time applications like games, but may be have less precision than cProfile.

The version in pypi is a bit old, so can install it with pip by specifying the git repository:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

You can run it like this:

import statprof

with statprof.profile():
    my_questionable_function()

See also https://stackoverflow.com/a/10333592/320036


回答 26

我刚刚从pypref_time中开发了自己的探查器:

https://github.com/modaresimr/auto_profiler

通过添加装饰器,它将显示一棵耗时的功能树

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

示例输出


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
       └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
           └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
           └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

I just developed my own profiler inspired from pypref_time:

https://github.com/modaresimr/auto_profiler

By adding a decorator it will show a tree of time-consuming functions

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

Example

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

Example Output


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
│   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
│       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
    │   └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
    │       └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
        │   └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

回答 27

当我不在服务器上时,我使用 lsprofcalltree.py并像这样运行我的程序:

python lsprofcalltree.py -o callgrind.1 test.py

然后,我可以使用任何与callgrind兼容的软件打开报告,例如qcachegrind

When i’m not root on the server, I use lsprofcalltree.py and run my program like this:

python lsprofcalltree.py -o callgrind.1 test.py

Then I can open the report with any callgrind-compatible software, like qcachegrind


回答 28

用于在IPython笔记本上快速获取代码段的配置文件统计信息。可以将line_profiler和memory_profiler直接嵌入他们的笔记本中。

得到它!

!pip install line_profiler
!pip install memory_profiler

加载它!

%load_ext line_profiler
%load_ext memory_profiler

用它!


%时间

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

给出:

  • CPU时间:CPU级执行时间
  • 系统时间:系统级执行时间
  • 总计:CPU时间+系统时间
  • 挂钟时间:挂钟时间

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • 在循环(n)次中,在给定的运行次数(r)中给出最佳时间。
  • 输出有关系统缓存的详细信息:
    • 当代码片段多次执行时,系统会缓存一些操作,而不会再次执行它们,这可能会影响概要文件报告的准确性。

%修剪

%prun -s cumulative 'Code to profile' 

给出:

  • 函数调用数(ncalls)
  • 每个函数调用都有条目(不同)
  • 每次通话所花费的时间(每次通话)
  • 直到该函数调用为止的时间(cumtime)
  • 称为etc的func /模块的名称…

累积资料


%mit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

给出:

  • 内存使用情况

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

给出:

  • 逐行统计

LineProfile

For getting quick profile stats for your code snippets on an IPython notebook. One can embed line_profiler and memory_profiler straight into their notebooks.

Get it!

!pip install line_profiler
!pip install memory_profiler

Load it!

%load_ext line_profiler
%load_ext memory_profiler

Use it!


%time

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

Gives:

  • CPU times: CPU level execution time
  • sys times: system level execution time
  • total: CPU time + system time
  • Wall time: Wall Clock Time

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • Gives best time out of given number of runs(r) in looping (n) times.
  • Outputs details on system caching:
    • When code snippets are executed multiple times, system caches a few opearations and doesn’t execute them again that may hamper the accuracy of the profile reports.

%prun

%prun -s cumulative 'Code to profile' 

Gives:

  • number of function calls(ncalls)
  • has entries per function call(distinct)
  • time taken per call(percall)
  • time elapsed till that function call(cumtime)
  • name of the func/module called etc…

Cumulative profile


%memit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

Gives:

  • Memory usage

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

Gives:

  • Line wise stats

LineProfile


如何在Python中测量经过时间?

问题:如何在Python中测量经过时间?

我想要的是开始在我的代码中的某个地方开始计时,然后获取经过的时间,以衡量执行少量功能所花费的时间。我认为我使用的timeit模块错误,但是文档对我来说却很混乱。

import timeit

start = timeit.timeit()
print("hello")
end = timeit.timeit()
print(end - start)

What I want is to start counting time somewhere in my code and then get the passed time, to measure the time it took to execute few function. I think I’m using the timeit module wrong, but the docs are just confusing for me.

import timeit

start = timeit.timeit()
print("hello")
end = timeit.timeit()
print(end - start)

回答 0

如果您只想测量两点之间经过的挂钟时间,则可以使用 time.time()

import time

start = time.time()
print("hello")
end = time.time()
print(end - start)

这给出了执行时间(以秒为单位)。

自3.3起,另一个选择可能是使用perf_counterprocess_time,具体取决于您的要求。在3.3之前,建议使用time.clock(感谢Amber)。但是,目前不推荐使用:

在Unix上,以秒为单位返回当前处理器时间,以浮点数表示。精度,实际上是“处理器时间”含义的确切定义,取决于同名C函数的精度。

在Windows上,基于Win32函数,此函数返回自第一次调用此函数以来经过的时间(以秒为单位),以浮点数表示。 QueryPerformanceCounter()。分辨率通常优于一微秒。

从版本3.3开始不推荐使用:此功能的行为取决于平台:根据需要,使用perf_counter()process_time()相反,以具有明确定义的行为。

If you just want to measure the elapsed wall-clock time between two points, you could use time.time():

import time

start = time.time()
print("hello")
end = time.time()
print(end - start)

This gives the execution time in seconds.

Another option since 3.3 might be to use perf_counter or process_time, depending on your requirements. Before 3.3 it was recommended to use time.clock (thanks Amber). However, it is currently deprecated:

On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of “processor time”, depends on that of the C function of the same name.

On Windows, this function returns wall-clock seconds elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter(). The resolution is typically better than one microsecond.

Deprecated since version 3.3: The behaviour of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well defined behaviour.


回答 1

使用timeit.default_timer代替timeit.timeit。前者会自动在您的平台和Python版本上提供最佳时钟:

from timeit import default_timer as timer

start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

根据操作系统,将timeit.default_timer分配给time.time()或time.clock()。在Python 3.3+ 上,在所有平台上,default_timertime.perf_counter()。参见Python-time.clock()与time.time()-准确性?

也可以看看:

Use timeit.default_timer instead of timeit.timeit. The former provides the best clock available on your platform and version of Python automatically:

from timeit import default_timer as timer

start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

timeit.default_timer is assigned to time.time() or time.clock() depending on OS. On Python 3.3+ default_timer is time.perf_counter() on all platforms. See Python – time.clock() vs. time.time() – accuracy?

See also:


回答 2

仅限Python 3:

由于从Python 3.3开始不推荐使用 time.clock(),因此您将希望time.perf_counter()用于系统范围的计时或time.process_time()进程范围的计时,就像您以前使用的方式一样time.clock()

import time

t = time.process_time()
#do some stuff
elapsed_time = time.process_time() - t

新功能process_time将不包括睡眠期间经过的时间。

Python 3 only:

Since time.clock() is deprecated as of Python 3.3, you will want to use time.perf_counter() for system-wide timing, or time.process_time() for process-wide timing, just the way you used to use time.clock():

import time

t = time.process_time()
#do some stuff
elapsed_time = time.process_time() - t

The new function process_time will not include time elapsed during sleep.


回答 3

有了您想计时的功能,

test.py:

def foo(): 
    # print "hello"   
    return "hello"

最简单的使用方法timeit是从命令行调用它:

% python -mtimeit -s'import test' 'test.foo()'
1000000 loops, best of 3: 0.254 usec per loop

请勿尝试使用time.timetime.clock(天真)比较功能的速度。他们可能会产生误导性的结果

PS。不要将打印语句放在您希望计时的函数中;否则,测量的时间将取决于终端速度

Given a function you’d like to time,

test.py:

def foo(): 
    # print "hello"   
    return "hello"

the easiest way to use timeit is to call it from the command line:

% python -mtimeit -s'import test' 'test.foo()'
1000000 loops, best of 3: 0.254 usec per loop

Do not try to use time.time or time.clock (naively) to compare the speed of functions. They can give misleading results.

PS. Do not put print statements in a function you wish to time; otherwise the time measured will depend on the speed of the terminal.


回答 4

使用上下文管理器执行此操作很有趣,该上下文管理器会自动记住进入with块时的开始时间,然后冻结块退出时的结束时间。只需一点技巧,您甚至可以通过相同的上下文管理器功能在块内获得运行时间计数。

核心库没有这个(但应该这样做)。放置到位后,您可以执行以下操作:

with elapsed_timer() as elapsed:
    # some lengthy code
    print( "midpoint at %.2f seconds" % elapsed() )  # time so far
    # other lengthy code

print( "all done at %.2f seconds" % elapsed() )

这是足以完成此任务的contextmanager代码:

from contextlib import contextmanager
from timeit import default_timer

@contextmanager
def elapsed_timer():
    start = default_timer()
    elapser = lambda: default_timer() - start
    yield lambda: elapser()
    end = default_timer()
    elapser = lambda: end-start

还有一些可运行的演示代码:

import time

with elapsed_timer() as elapsed:
    time.sleep(1)
    print(elapsed())
    time.sleep(2)
    print(elapsed())
    time.sleep(3)

请注意,根据该函数的设计,的退出值将elapsed()在块退出时冻结,并且进一步的调用将返回相同的持续时间(在此玩具示例中约为6秒)。

It’s fun to do this with a context-manager that automatically remembers the start time upon entry to a with block, then freezes the end time on block exit. With a little trickery, you can even get a running elapsed-time tally inside the block from the same context-manager function.

The core library doesn’t have this (but probably ought to). Once in place, you can do things like:

with elapsed_timer() as elapsed:
    # some lengthy code
    print( "midpoint at %.2f seconds" % elapsed() )  # time so far
    # other lengthy code

print( "all done at %.2f seconds" % elapsed() )

Here’s contextmanager code sufficient to do the trick:

from contextlib import contextmanager
from timeit import default_timer

@contextmanager
def elapsed_timer():
    start = default_timer()
    elapser = lambda: default_timer() - start
    yield lambda: elapser()
    end = default_timer()
    elapser = lambda: end-start

And some runnable demo code:

import time

with elapsed_timer() as elapsed:
    time.sleep(1)
    print(elapsed())
    time.sleep(2)
    print(elapsed())
    time.sleep(3)

Note that by design of this function, the return value of elapsed() is frozen on block exit, and further calls return the same duration (of about 6 seconds in this toy example).


回答 5

以秒为单位的测量时间

from timeit import default_timer as timer
from datetime import timedelta

start = timer()
end = timer()
print(timedelta(seconds=end-start))

输出

0:00:01.946339

Measuring time in seconds:

from timeit import default_timer as timer
from datetime import timedelta

start = timer()
end = timer()
print(timedelta(seconds=end-start))

Output:

0:00:01.946339

回答 6

我喜欢这个。timeitdoc太混乱了。

from datetime import datetime 

start_time = datetime.now() 

# INSERT YOUR CODE 

time_elapsed = datetime.now() - start_time 

print('Time elapsed (hh:mm:ss.ms) {}'.format(time_elapsed))

请注意,这里没有任何格式,我只是写到hh:mm:ss打印输出中,以便可以解释time_elapsed

I prefer this. timeit doc is far too confusing.

from datetime import datetime 

start_time = datetime.now() 

# INSERT YOUR CODE 

time_elapsed = datetime.now() - start_time 

print('Time elapsed (hh:mm:ss.ms) {}'.format(time_elapsed))

Note, that there isn’t any formatting going on here, I just wrote hh:mm:ss into the printout so one can interpret time_elapsed


回答 7

这是执行此操作的另一种方法:

>> from pytictoc import TicToc
>> t = TicToc() # create TicToc instance
>> t.tic() # Start timer
>> # do something
>> t.toc() # Print elapsed time
Elapsed time is 2.612231 seconds.

与传统方式比较:

>> from time import time
>> t1 = time()
>> # do something
>> t2 = time()
>> elapsed = t2 - t1
>> print('Elapsed time is %f seconds.' % elapsed)
Elapsed time is 2.612231 seconds.

安装:

pip install pytictoc

有关更多详细信息,请参阅PyPi页面

Here’s another way to do this:

>> from pytictoc import TicToc
>> t = TicToc() # create TicToc instance
>> t.tic() # Start timer
>> # do something
>> t.toc() # Print elapsed time
Elapsed time is 2.612231 seconds.

Comparing with traditional way:

>> from time import time
>> t1 = time()
>> # do something
>> t2 = time()
>> elapsed = t2 - t1
>> print('Elapsed time is %f seconds.' % elapsed)
Elapsed time is 2.612231 seconds.

Installation:

pip install pytictoc

Refer to the PyPi page for more details.


回答 8

这是我在这里经过许多不错的回答以及其他几篇文章后的发现。

首先,如果您在timeit和之间进行辩论time.time,则timeit有两个优点:

  1. timeit 选择操作系统和Python版本上可用的最佳计时器。
  2. timeit 禁用垃圾收集,但是,这不是您可能想要或不想要的东西。

现在的问题是,timeit使用起来并不是那么简单,因为它需要设置,并且当您进行大量导入时,情况变得很糟。理想情况下,您只需要一个装饰器或使用with块并测量时间。不幸的是,对此没有内置的功能,因此您有两个选择:

选项1:使用时间预算库

timebudget是一个多功能的,非常简单的库,你可以在一行代码只使用PIP后安装。

@timebudget  # Record how long this function takes
def my_method():
    # my code

选项2:直接使用代码模块

我在下面创建了小实用程序模块。

# utils.py
from functools import wraps
import gc
import timeit

def MeasureTime(f, no_print=False, disable_gc=False):
    @wraps(f)
    def _wrapper(*args, **kwargs):
        gcold = gc.isenabled()
        if disable_gc:
            gc.disable()
        start_time = timeit.default_timer()
        try:
            result = f(*args, **kwargs)
        finally:
            elapsed = timeit.default_timer() - start_time
            if disable_gc and gcold:
                gc.enable()
            if not no_print:
                print('"{}": {}s'.format(f.__name__, elapsed))
        return result
    return _wrapper

class MeasureBlockTime:
    def __init__(self,name="(block)", no_print=False, disable_gc=False):
        self.name = name
        self.no_print = no_print
        self.disable_gc = disable_gc
    def __enter__(self):
        self.gcold = gc.isenabled()
        if self.disable_gc:
            gc.disable()
        self.start_time = timeit.default_timer()
    def __exit__(self,ty,val,tb):
        self.elapsed = timeit.default_timer() - self.start_time
        if self.disable_gc and self.gcold:
            gc.enable()
        if not self.no_print:
            print('Function "{}": {}s'.format(self.name, self.elapsed))
        return False #re-raise any exceptions

现在,您只需在其前面放置装饰器即可计时任何功能:

import utils

@utils.MeasureTime
def MyBigFunc():
    #do something time consuming
    for i in range(10000):
        print(i)

如果要计时部分代码,只需将其放在代码with块中:

import utils

#somewhere in my code

with utils.MeasureBlockTime("MyBlock"):
    #do something time consuming
    for i in range(10000):
        print(i)

# rest of my code

优点:

有几个半支持的版本,所以我想指出一些重点:

  1. 出于前面所述的原因,请使用timeit中的timer代替time.time。
  2. 如果需要,可以在计时期间禁用GC。
  3. 装饰器接受带有已命名或未命名参数的函数。
  4. 能够按块定时禁用打印(先使用with utils.MeasureBlockTime() as t,然后再使用t.elapsed)。
  5. 能够为块定时保持启用gc的能力。

Here are my findings after going through many good answers here as well as a few other articles.

First, if you are debating between timeit and time.time, the timeit has two advantages:

  1. timeit selects the best timer available on your OS and Python version.
  2. timeit disables garbage collection, however, this is not something you may or may not want.

Now the problem is that timeit is not that simple to use because it needs setup and things get ugly when you have a bunch of imports. Ideally, you just want a decorator or use with block and measure time. Unfortunately, there is nothing built-in available for this so you have two options:

Option 1: Use timebudget library

The timebudget is a versatile and very simple library that you can use just in one line of code after pip install.

@timebudget  # Record how long this function takes
def my_method():
    # my code

Option 2: Use code module directly

I created below little utility module.

# utils.py
from functools import wraps
import gc
import timeit

def MeasureTime(f, no_print=False, disable_gc=False):
    @wraps(f)
    def _wrapper(*args, **kwargs):
        gcold = gc.isenabled()
        if disable_gc:
            gc.disable()
        start_time = timeit.default_timer()
        try:
            result = f(*args, **kwargs)
        finally:
            elapsed = timeit.default_timer() - start_time
            if disable_gc and gcold:
                gc.enable()
            if not no_print:
                print('"{}": {}s'.format(f.__name__, elapsed))
        return result
    return _wrapper

class MeasureBlockTime:
    def __init__(self,name="(block)", no_print=False, disable_gc=False):
        self.name = name
        self.no_print = no_print
        self.disable_gc = disable_gc
    def __enter__(self):
        self.gcold = gc.isenabled()
        if self.disable_gc:
            gc.disable()
        self.start_time = timeit.default_timer()
    def __exit__(self,ty,val,tb):
        self.elapsed = timeit.default_timer() - self.start_time
        if self.disable_gc and self.gcold:
            gc.enable()
        if not self.no_print:
            print('Function "{}": {}s'.format(self.name, self.elapsed))
        return False #re-raise any exceptions

Now you can time any function just by putting a decorator in front of it:

import utils

@utils.MeasureTime
def MyBigFunc():
    #do something time consuming
    for i in range(10000):
        print(i)

If you want to time portion of code then just put it inside with block:

import utils

#somewhere in my code

with utils.MeasureBlockTime("MyBlock"):
    #do something time consuming
    for i in range(10000):
        print(i)

# rest of my code

Advantages:

There are several half-backed versions floating around so I want to point out few highlights:

  1. Use timer from timeit instead of time.time for reasons described earlier.
  2. You can disable GC during timing if you want.
  3. Decorator accepts functions with named or unnamed params.
  4. Ability to disable printing in block timing (use with utils.MeasureBlockTime() as t and then t.elapsed).
  5. Ability to keep gc enabled for block timing.

回答 9

使用 time.time度量执行可以为您提供命令的整体执行时间,包括计算机上其他进程花费的运行时间。这是用户注意到的时间,但是如果您要比较不同的代码段/算法/函数/ …,则效果不佳。

有关更多信息timeit

如果您想对配置文件有更深入的了解:

更新:去年我大量使用了http://pythonhosted.org/line_profiler/,发现它非常有用,建议使用它代替Pythons profile模块。

Using time.time to measure execution gives you the overall execution time of your commands including running time spent by other processes on your computer. It is the time the user notices, but is not good if you want to compare different code snippets / algorithms / functions / …

More information on timeit:

If you want a deeper insight into profiling:

Update: I used http://pythonhosted.org/line_profiler/ a lot during the last year and find it very helpfull and recommend to use it instead of Pythons profile module.


回答 10

这是一个微小的计时器类,它返回“ hh:mm:ss”字符串:

class Timer:
  def __init__(self):
    self.start = time.time()

  def restart(self):
    self.start = time.time()

  def get_time_hhmmss(self):
    end = time.time()
    m, s = divmod(end - self.start, 60)
    h, m = divmod(m, 60)
    time_str = "%02d:%02d:%02d" % (h, m, s)
    return time_str

用法:

# Start timer
my_timer = Timer()

# ... do something

# Get time string:
time_hhmmss = my_timer.get_time_hhmmss()
print("Time elapsed: %s" % time_hhmmss )

# ... use the timer again
my_timer.restart()

# ... do something

# Get time:
time_hhmmss = my_timer.get_time_hhmmss()

# ... etc

Here is a tiny timer class that returns “hh:mm:ss” string:

class Timer:
  def __init__(self):
    self.start = time.time()

  def restart(self):
    self.start = time.time()

  def get_time_hhmmss(self):
    end = time.time()
    m, s = divmod(end - self.start, 60)
    h, m = divmod(m, 60)
    time_str = "%02d:%02d:%02d" % (h, m, s)
    return time_str

Usage:

# Start timer
my_timer = Timer()

# ... do something

# Get time string:
time_hhmmss = my_timer.get_time_hhmmss()
print("Time elapsed: %s" % time_hhmmss )

# ... use the timer again
my_timer.restart()

# ... do something

# Get time:
time_hhmmss = my_timer.get_time_hhmmss()

# ... etc

回答 11

python cProfile和pstats模块为测量某些功能所经过的时间提供了强大的支持,而无需在现有功能周围添加任何代码。

例如,如果您有python脚本timeFunctions.py:

import time

def hello():
    print "Hello :)"
    time.sleep(0.1)

def thankyou():
    print "Thank you!"
    time.sleep(0.05)

for idx in range(10):
    hello()

for idx in range(100):
    thankyou()

要运行事件探查器并为文件生成统计信息,您可以运行:

python -m cProfile -o timeStats.profile timeFunctions.py

这样做是使用cProfile模块来分析timeFunctions.py中的所有功能,并在timeStats.profile文件中收集统计信息。请注意,我们不必向现有模块(timeFunctions.py)添加任何代码,并且可以使用任何模块来完成此操作。

拥有stats文件后,您可以按以下方式运行pstats模块:

python -m pstats timeStats.profile

这将运行交互式统计浏览器,从而为您提供许多不错的功能。对于您的特定用例,您可以只检查功能的统计信息。在我们的示例中,检查这两个功能的统计信息显示以下内容:

Welcome to the profile statistics browser.
timeStats.profile% stats hello
<timestamp>    timeStats.profile

         224 function calls in 6.014 seconds

   Random listing order was used
   List reduced from 6 to 1 due to restriction <'hello'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    0.000    0.000    1.001    0.100 timeFunctions.py:3(hello)

timeStats.profile% stats thankyou
<timestamp>    timeStats.profile

         224 function calls in 6.014 seconds

   Random listing order was used
   List reduced from 6 to 1 due to restriction <'thankyou'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.002    0.000    5.012    0.050 timeFunctions.py:7(thankyou)

这个虚拟的例子并没有做太多,但是让您知道可以做什么。关于这种方法的最好之处在于,我不必编辑任何现有代码即可获得这些数字,并且显然可以帮助进行性能分析。

The python cProfile and pstats modules offer great support for measuring time elapsed in certain functions without having to add any code around the existing functions.

For example if you have a python script timeFunctions.py:

import time

def hello():
    print "Hello :)"
    time.sleep(0.1)

def thankyou():
    print "Thank you!"
    time.sleep(0.05)

for idx in range(10):
    hello()

for idx in range(100):
    thankyou()

To run the profiler and generate stats for the file you can just run:

python -m cProfile -o timeStats.profile timeFunctions.py

What this is doing is using the cProfile module to profile all functions in timeFunctions.py and collecting the stats in the timeStats.profile file. Note that we did not have to add any code to existing module (timeFunctions.py) and this can be done with any module.

Once you have the stats file you can run the pstats module as follows:

python -m pstats timeStats.profile

This runs the interactive statistics browser which gives you a lot of nice functionality. For your particular use case you can just check the stats for your function. In our example checking stats for both functions shows us the following:

Welcome to the profile statistics browser.
timeStats.profile% stats hello
<timestamp>    timeStats.profile

         224 function calls in 6.014 seconds

   Random listing order was used
   List reduced from 6 to 1 due to restriction <'hello'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    0.000    0.000    1.001    0.100 timeFunctions.py:3(hello)

timeStats.profile% stats thankyou
<timestamp>    timeStats.profile

         224 function calls in 6.014 seconds

   Random listing order was used
   List reduced from 6 to 1 due to restriction <'thankyou'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.002    0.000    5.012    0.050 timeFunctions.py:7(thankyou)

The dummy example does not do much but give you an idea of what can be done. The best part about this approach is that I dont have to edit any of my existing code to get these numbers and obviously help with profiling.


回答 12

这是另一个用于计时代码的上下文管理器-

用法:

from benchmark import benchmark

with benchmark("Test 1+1"):
    1+1
=>
Test 1+1 : 1.41e-06 seconds

或者,如果您需要时间值

with benchmark("Test 1+1") as b:
    1+1
print(b.time)
=>
Test 1+1 : 7.05e-07 seconds
7.05233786763e-07

Benchmark.py

from timeit import default_timer as timer

class benchmark(object):

    def __init__(self, msg, fmt="%0.3g"):
        self.msg = msg
        self.fmt = fmt

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, *args):
        t = timer() - self.start
        print(("%s : " + self.fmt + " seconds") % (self.msg, t))
        self.time = t

改编自http://dabeaz.blogspot.fr/2010/02/context-manager-for-timing-benchmarks.html

Here’s another context manager for timing code –

Usage:

from benchmark import benchmark

with benchmark("Test 1+1"):
    1+1
=>
Test 1+1 : 1.41e-06 seconds

or, if you need the time value

with benchmark("Test 1+1") as b:
    1+1
print(b.time)
=>
Test 1+1 : 7.05e-07 seconds
7.05233786763e-07

benchmark.py:

from timeit import default_timer as timer

class benchmark(object):

    def __init__(self, msg, fmt="%0.3g"):
        self.msg = msg
        self.fmt = fmt

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, *args):
        t = timer() - self.start
        print(("%s : " + self.fmt + " seconds") % (self.msg, t))
        self.time = t

Adapted from http://dabeaz.blogspot.fr/2010/02/context-manager-for-timing-benchmarks.html


回答 13

使用探查器模块。它提供了非常详细的配置文件。

import profile
profile.run('main()')

它输出类似:

          5 function calls in 0.047 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(exec)
        1    0.047    0.047    0.047    0.047 :0(setprofile)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.047    0.047 profile:0(main())
        1    0.000    0.000    0.000    0.000 two_sum.py:2(twoSum)

我发现它非常有用。

Use profiler module. It gives a very detailed profile.

import profile
profile.run('main()')

it outputs something like:

          5 function calls in 0.047 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(exec)
        1    0.047    0.047    0.047    0.047 :0(setprofile)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.047    0.047 profile:0(main())
        1    0.000    0.000    0.000    0.000 two_sum.py:2(twoSum)

I’ve found it very informative.


回答 14

我喜欢它简单(python 3):

from timeit import timeit

timeit(lambda: print("hello"))

单个执行的输出为微秒

2.430883963010274

说明:timeit 默认执行匿名函数一百万次,结果以秒为单位。因此,单次执行的结果是相同的数量,但平均为微秒


对于缓慢的操作添加较低数量的迭代,或者你可能会永远等待:

import time

timeit(lambda: time.sleep(1.5), number=1)

输出总是在数目迭代:

1.5015795179999714

I like it simple (python 3):

from timeit import timeit

timeit(lambda: print("hello"))

Output is microseconds for a single execution:

2.430883963010274

Explanation: timeit executes the anonymous function 1 million times by default and the result is given in seconds. Therefore the result for 1 single execution is the same amount but in microseconds on average.


For slow operations add a lower number of iterations or you could be waiting forever:

import time

timeit(lambda: time.sleep(1.5), number=1)

Output is always in seconds for the total number of iterations:

1.5015795179999714

回答 15

(仅对于Ipython)可以使用%timeit来衡量平均处理时间:

def foo():
    print "hello"

然后:

%timeit foo()

结果是这样的:

10000 loops, best of 3: 27 µs per loop

(With Ipython only) you can use %timeit to measure average processing time:

def foo():
    print "hello"

and then:

%timeit foo()

the result is something like:

10000 loops, best of 3: 27 µs per loop

回答 16

使用timeit的另一种方法:

from timeit import timeit

def func():
    return 1 + 1

time = timeit(func, number=1)
print(time)

One more way to use timeit:

from timeit import timeit

def func():
    return 1 + 1

time = timeit(func, number=1)
print(time)

回答 17

在python3上:

from time import sleep, perf_counter as pc
t0 = pc()
sleep(1)
print(pc()-t0)

优雅而短暂。

on python3:

from time import sleep, perf_counter as pc
t0 = pc()
sleep(1)
print(pc()-t0)

elegant and short.


回答 18

有点超级后来的反应,但也许对某人有用。我认为这是一种超级干净的方法。

import time

def timed(fun, *args):
    s = time.time()
    r = fun(*args)
    print('{} execution took {} seconds.'.format(fun.__name__, time.time()-s))
    return(r)

timed(print, "Hello")

请记住,“ print”是Python 3中的功能,而不是Python 2.7中的功能。但是,它可以与任何其他功能一起使用。干杯!

Kind of a super later response, but maybe it serves a purpose for someone. This is a way to do it which I think is super clean.

import time

def timed(fun, *args):
    s = time.time()
    r = fun(*args)
    print('{} execution took {} seconds.'.format(fun.__name__, time.time()-s))
    return(r)

timed(print, "Hello")

Keep in mind that “print” is a function in Python 3 and not Python 2.7. However, it works with any other function. Cheers!


回答 19

您可以使用timeit。

这是一个有关如何使用Python REPL测试带参数的naive_func的示例:

>>> import timeit                                                                                         

>>> def naive_func(x):                                                                                    
...     a = 0                                                                                             
...     for i in range(a):                                                                                
...         a += i                                                                                        
...     return a                                                                                          

>>> def wrapper(func, *args, **kwargs):                                                                   
...     def wrapper():                                                                                    
...         return func(*args, **kwargs)                                                                  
...     return wrapper                                                                                    

>>> wrapped = wrapper(naive_func, 1_000)                                                                  

>>> timeit.timeit(wrapped, number=1_000_000)                                                              
0.4458435332577161  

如果function没有任何参数,则不需要包装函数。

You can use timeit.

Here is an example on how to test naive_func that takes parameter using Python REPL:

>>> import timeit                                                                                         

>>> def naive_func(x):                                                                                    
...     a = 0                                                                                             
...     for i in range(a):                                                                                
...         a += i                                                                                        
...     return a                                                                                          

>>> def wrapper(func, *args, **kwargs):                                                                   
...     def wrapper():                                                                                    
...         return func(*args, **kwargs)                                                                  
...     return wrapper                                                                                    

>>> wrapped = wrapper(naive_func, 1_000)                                                                  

>>> timeit.timeit(wrapped, number=1_000_000)                                                              
0.4458435332577161  

You don’t need wrapper function if function doesn’t have any parameters.


回答 20

我们还可以将时间转换为人类可以理解的时间。

import time, datetime

start = time.clock()

def num_multi1(max):
    result = 0
    for num in range(0, 1000):
        if (num % 3 == 0 or num % 5 == 0):
            result += num

    print "Sum is %d " % result

num_multi1(1000)

end = time.clock()
value = end - start
timestamp = datetime.datetime.fromtimestamp(value)
print timestamp.strftime('%Y-%m-%d %H:%M:%S')

We can also convert time into human-readable time.

import time, datetime

start = time.clock()

def num_multi1(max):
    result = 0
    for num in range(0, 1000):
        if (num % 3 == 0 or num % 5 == 0):
            result += num

    print "Sum is %d " % result

num_multi1(1000)

end = time.clock()
value = end - start
timestamp = datetime.datetime.fromtimestamp(value)
print timestamp.strftime('%Y-%m-%d %H:%M:%S')

回答 21

我为此创建了一个库,如果您想测量一个函数,可以像这样做


from pythonbenchmark import compare, measure
import time

a,b,c,d,e = 10,10,10,10,10
something = [a,b,c,d,e]

@measure
def myFunction(something):
    time.sleep(0.4)

@measure
def myOptimizedFunction(something):
    time.sleep(0.2)

myFunction(input)
myOptimizedFunction(input)

https://github.com/Karlheinzniebuhr/pythonbenchmark

I made a library for this, if you want to measure a function you can just do it like this


from pythonbenchmark import compare, measure
import time

a,b,c,d,e = 10,10,10,10,10
something = [a,b,c,d,e]

@measure
def myFunction(something):
    time.sleep(0.4)

@measure
def myOptimizedFunction(something):
    time.sleep(0.2)

myFunction(input)
myOptimizedFunction(input)

https://github.com/Karlheinzniebuhr/pythonbenchmark


回答 22

要了解每个函数的递归调用,请执行以下操作:

%load_ext snakeviz
%%snakeviz

它只需在Jupyter笔记本中使用这两行代码,并生成一个漂亮的交互式图表。例如:

在此处输入图片说明

这是代码。再次,以2开头的%行是使用snakeviz所需的仅有的额外代码行:

# !pip install snakeviz
%load_ext snakeviz
import glob
import hashlib

%%snakeviz

files = glob.glob('*.txt')
def print_files_hashed(files):
    for file in files:
        with open(file) as f:
            print(hashlib.md5(f.read().encode('utf-8')).hexdigest())
print_files_hashed(files)

在笔记本外运行snakeviz也似乎是可能的。在snakeviz网站上有更多信息。

To get insight on every function calls recursively, do:

%load_ext snakeviz
%%snakeviz

It just takes those 2 lines of code in a Jupyter notebook, and it generates a nice interactive diagram. For example:

enter image description here

Here is the code. Again, the 2 lines starting with % are the only extra lines of code needed to use snakeviz:

# !pip install snakeviz
%load_ext snakeviz
import glob
import hashlib

%%snakeviz

files = glob.glob('*.txt')
def print_files_hashed(files):
    for file in files:
        with open(file) as f:
            print(hashlib.md5(f.read().encode('utf-8')).hexdigest())
print_files_hashed(files)

It also seems possible to run snakeviz outside notebooks. More info on the snakeviz website.


回答 23

import time

def getElapsedTime(startTime, units):
    elapsedInSeconds = time.time() - startTime
    if units == 'sec':
        return elapsedInSeconds
    if units == 'min':
        return elapsedInSeconds/60
    if units == 'hour':
        return elapsedInSeconds/(60*60)
import time

def getElapsedTime(startTime, units):
    elapsedInSeconds = time.time() - startTime
    if units == 'sec':
        return elapsedInSeconds
    if units == 'min':
        return elapsedInSeconds/60
    if units == 'hour':
        return elapsedInSeconds/(60*60)

回答 24

这种独特的基于类的方法提供了可打印的字符串表示形式,可自定义的舍入功能,并且可以方便地访问作为字符串或浮点数的经过时间。它是使用Python 3.7开发的。

import datetime
import timeit


class Timer:
    """Measure time used."""
    # Ref: https://stackoverflow.com/a/57931660/

    def __init__(self, round_ndigits: int = 0):
        self._round_ndigits = round_ndigits
        self._start_time = timeit.default_timer()

    def __call__(self) -> float:
        return timeit.default_timer() - self._start_time

    def __str__(self) -> str:
        return str(datetime.timedelta(seconds=round(self(), self._round_ndigits)))

用法:

# Setup timer
>>> timer = Timer()

# Access as a string
>>> print(f'Time elapsed is {timer}.')
Time elapsed is 0:00:03.
>>> print(f'Time elapsed is {timer}.')
Time elapsed is 0:00:04.

# Access as a float
>>> timer()
6.841332235
>>> timer()
7.970274425

This unique class-based approach offers a printable string representation, customizable rounding, and convenient access to the elapsed time as a string or a float. It was developed with Python 3.7.

import datetime
import timeit


class Timer:
    """Measure time used."""
    # Ref: https://stackoverflow.com/a/57931660/

    def __init__(self, round_ndigits: int = 0):
        self._round_ndigits = round_ndigits
        self._start_time = timeit.default_timer()

    def __call__(self) -> float:
        return timeit.default_timer() - self._start_time

    def __str__(self) -> str:
        return str(datetime.timedelta(seconds=round(self(), self._round_ndigits)))

Usage:

# Setup timer
>>> timer = Timer()

# Access as a string
>>> print(f'Time elapsed is {timer}.')
Time elapsed is 0:00:03.
>>> print(f'Time elapsed is {timer}.')
Time elapsed is 0:00:04.

# Access as a float
>>> timer()
6.841332235
>>> timer()
7.970274425

回答 25

测量小代码段的执行时间。

时间单位以秒为单位,以浮点数表示

import timeit
t = timeit.Timer('li = list(map(lambda x:x*2,[1,2,3,4,5]))')
t.timeit()
t.repeat()
>[1.2934070999999676, 1.3335035000000062, 1.422568500000125]

repeat()方法可以方便地多次调用timeit()并返回结果列表。

repeat(repeat=3

有了这个清单,我们可以花很多时间。

默认情况下,timeit()在计时期间临时关闭垃圾收集。time.Timer()解决了这个问题。

优点:

timeit.Timer()使独立计时更具可比性。gc可能是被测功能性能的重要组成部分。如果是这样,可以将gc(垃圾收集器)作为设置字符串中的第一条语句重新启用。例如:

timeit.Timer('li = list(map(lambda x:x*2,[1,2,3,4,5]))',setup='gc.enable()')

Python文档

Measure execution time of small code snippets.

Unit of time: measured in seconds as a float

import timeit
t = timeit.Timer('li = list(map(lambda x:x*2,[1,2,3,4,5]))')
t.timeit()
t.repeat()
>[1.2934070999999676, 1.3335035000000062, 1.422568500000125]

The repeat() method is a convenience to call timeit() multiple times and return a list of results.

repeat(repeat=3)¶

With this list we can take a mean of all times.

By default, timeit() temporarily turns off garbage collection during the timing. time.Timer() solves this problem.

Pros:

timeit.Timer() makes independent timings more comparable. The gc may be an important component of the performance of the function being measured. If so, gc(garbage collector) can be re-enabled as the first statement in the setup string. For example:

timeit.Timer('li = list(map(lambda x:x*2,[1,2,3,4,5]))',setup='gc.enable()')

Source Python Docs!


回答 26

如果您希望能够方便地计时功能,可以使用一个简单的装饰器:

def timing_decorator(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        original_return_val = func(*args, **kwargs)
        end = time.time()
        print("time elapsed in ", func.__name__, ": ", end - start, sep='')
        return original_return_val

    return wrapper

您可以在想要计时的函数上使用它,如下所示:

@timing_decorator
def function_to_time():
    time.sleep(1)

然后function_to_time,在您每次调用时,它都会打印花费了多长时间以及函数的名称。

If you want to be able to time functions conveniently, you can use a simple decorator:

def timing_decorator(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        original_return_val = func(*args, **kwargs)
        end = time.time()
        print("time elapsed in ", func.__name__, ": ", end - start, sep='')
        return original_return_val

    return wrapper

You can use it on a function that you want to time like this:

@timing_decorator
def function_to_time():
    time.sleep(1)

Then any time you call function_to_time, it will print how long it took and the name of the function being timed.


回答 27

基于https://stackoverflow.com/a/30024601/5095636提供的contextmanager解决方案,以下是lambda免费版本,因为flake8根据E731警告使用lambda :

from contextlib import contextmanager
from timeit import default_timer

@contextmanager
def elapsed_timer():
    start_time = default_timer()

    class _Timer():
      start = start_time
      end = default_timer()
      duration = end - start

    yield _Timer

    end_time = default_timer()
    _Timer.end = end_time
    _Timer.duration = end_time - start_time

测试:

from time import sleep

with elapsed_timer() as t:
    print("start:", t.start)
    sleep(1)
    print("end:", t.end)

t.start
t.end
t.duration

based on the contextmanager solution given by https://stackoverflow.com/a/30024601/5095636, hereunder the lambda free version, as flake8 warns on the usage of lambda as per E731:

from contextlib import contextmanager
from timeit import default_timer

@contextmanager
def elapsed_timer():
    start_time = default_timer()

    class _Timer():
      start = start_time
      end = default_timer()
      duration = end - start

    yield _Timer

    end_time = default_timer()
    _Timer.end = end_time
    _Timer.duration = end_time - start_time

test:

from time import sleep

with elapsed_timer() as t:
    print("start:", t.start)
    sleep(1)
    print("end:", t.end)

t.start
t.end
t.duration

回答 28

我能想到的唯一方法是使用time.time()

import time
start = time.time()
sleep(5) #just to give it some delay to show it working
finish = time.time()
elapsed = finish - start
print(elapsed)

希望对您有所帮助。

The only way I can think of is using time.time().

import time
start = time.time()
sleep(5) #just to give it some delay to show it working
finish = time.time()
elapsed = finish - start
print(elapsed)

Hope that will help.


回答 29

timeit模块非常适合计时一小段Python代码。它至少可以以三种形式使用:

1-作为命令行模块

python2 -m timeit 'for i in xrange(10): oct(i)' 

2-对于短代码,请将其作为参数传递。

import timeit
timeit.Timer('for i in xrange(10): oct(i)').timeit()

3-对于更长的代码为:

import timeit
code_to_test = """
a = range(100000)
b = []
for i in a:
    b.append(i*2)
"""
elapsed_time = timeit.timeit(code_to_test, number=100)/100
print(elapsed_time)

The timeit module is good for timing a small piece of Python code. It can be used at least in three forms:

1- As a command-line module

python2 -m timeit 'for i in xrange(10): oct(i)' 

2- For a short code, pass it as arguments.

import timeit
timeit.Timer('for i in xrange(10): oct(i)').timeit()

3- For longer code as:

import timeit
code_to_test = """
a = range(100000)
b = []
for i in a:
    b.append(i*2)
"""
elapsed_time = timeit.timeit(code_to_test, number=100)/100
print(elapsed_time)