Question: How can I time a code segment with Python's timeit to test performance?
I have a Python script which works just as it should, but I need to record the execution time. I've read that I should use timeit, but I can't seem to get it to work.
My Python script looks like this:
import sys
import getopt
import timeit
import random
import os
import re
import ibm_db
import time
from string import maketrans
myfile = open("results_update.txt", "a")
for r in range(100):
    rannumber = random.randint(0, 100)
    update = "update TABLE set val = %i where MyCount >= '2010' and MyCount < '2012' and number = '250'" % rannumber
    #print rannumber
    conn = ibm_db.pconnect("dsn=myDB","usrname","secretPWD")
for r in range(5):
    print "Run %s\n" % r
    ibm_db.execute(query_stmt)
    query_stmt = ibm_db.prepare(conn, update)
myfile.close()
ibm_db.close(conn)
What I need is the time it takes to execute the query, written to the file results_update.txt. The purpose is to test an update statement for my database with different indexes and tuning mechanisms.
Answer 0
You can use time.time() or time.clock() before and after the block you want to time.
import time
t0 = time.time()
# code_block: the code you want to time goes here
t1 = time.time()
total = t1-t0
This method is not as exact as timeit (it does not average several runs), but it is straightforward.
time.time() (on Windows and Linux) and time.clock() (on Linux) are not precise enough for fast functions (you get total = 0). In that case, or if you want to average the time elapsed over several runs, you have to call the function multiple times manually (as I think you already do in your example code, and as timeit does automatically when you set its number argument):
import time

def myfast():
    pass  # the code to time goes here

n = 10000
t0 = time.time()
for i in range(n):
    myfast()
t1 = time.time()
total_n = t1-t0
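On Windows, as Corey stated in the comment, time.clock() has much higher precision (microsecond instead of second) and is preferred over time.time(). Note that time.clock() was deprecated in Python 3.3 and removed in Python 3.8; on current versions, time.perf_counter() is the replacement.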
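For newer Python versions, the same manual approach can be sketched with time.perf_counter(). This is only an illustration: time_block and the body of myfast are made-up names and workload, not part of the original answer.

```python
import time

def time_block(func, repeats=10000):
    """Return the average wall-clock seconds per call of func over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / repeats

def myfast():
    # Hypothetical stand-in for the fast code being timed.
    return sum(range(100))

average = time_block(myfast)
print("average per call: {:.3e} s".format(average))
```

Dividing the total by the number of calls gives a usable figure even when a single call is too fast for the clock to resolve.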
Answer 1
If you are profiling your code and can use IPython, it has the magic function %timeit. %%timeit operates on cells.
In [2]: %timeit cos(3.14)
10000000 loops, best of 3: 160 ns per loop
In [3]: %%timeit
   ...: cos(3.14)
   ...: x = 2 + 3
   ...:
10000000 loops, best of 3: 196 ns per loop
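Outside IPython, a similar "best of N" figure can be approximated with timeit.repeat from the standard library. This is a rough sketch, not what %timeit does internally; the loop and repeat counts are arbitrary choices.

```python
import math
import timeit

# Run the statement in 5 batches of 100000 loops each (both counts are arbitrary).
loops = 100000
timings = timeit.repeat("cos(3.14)", globals={"cos": math.cos}, repeat=5, number=loops)

# Report the fastest batch, scaled down to the time for a single loop.
best_per_loop = min(timings) / loops
print("{} loops, best of {}: {:.1f} ns per loop".format(
    loops, len(timings), best_per_loop * 1e9))
```

Taking the minimum rather than the mean follows the advice in the timeit documentation: slower batches usually reflect interference from other processes rather than the code itself.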
Answer 2
Quite apart from the timing, the code you show is simply incorrect: you open 100 connections (completely ignoring all but the last one), and then when you make the first execute call you pass it a variable, query_stmt, which you only initialize after the execute call.
First, make your code correct, without worrying about timing yet: i.e. write a function that makes or receives a connection and performs 100 or 500 or whatever number of updates on that connection, then closes the connection. Once your code works correctly, that is the right point at which to think about using timeit on it!
Specifically, if the function you want to time is a parameter-less one called foobar, you can use timeit.timeit (2.6 or later; it's more complicated in 2.5 and before). The setup string imports foobar so that the timed statement can see it:
import timeit
timeit.timeit('foobar()', setup='from __main__ import foobar', number=1000)
You'd better specify the number of runs, because the default, a million, may be high for your use case (leading to spending a lot of time in this code ;-).
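As a rough illustration of that structure, here is a sketch that opens one connection, performs the updates inside a function, and times only that function. It uses the standard library's sqlite3 as a stand-in because ibm_db may not be installed; the table t, its columns, and the name run_updates are invented for the example.

```python
import random
import sqlite3
import timeit

def run_updates(conn, count):
    """Perform `count` updates on an already-open connection."""
    cur = conn.cursor()
    for _ in range(count):
        value = random.randint(0, 100)
        # Bind parameters instead of formatting values into the SQL string.
        cur.execute("UPDATE t SET val = ? WHERE number = ?", (value, 250))
    conn.commit()

# Open the connection once, outside the timed code (sqlite3 stands in for ibm_db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, number INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(0, n) for n in range(1000)])
conn.commit()

# timeit accepts a zero-argument callable; here: 5 runs of 100 updates each.
total = timeit.timeit(lambda: run_updates(conn, 100), number=5)
print("5 runs of 100 updates took {:.4f} s".format(total))

conn.close()
```

The value in total is a plain float of seconds, so it can be appended to a file such as results_update.txt with an ordinary write call.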
Answer 3
Focus on one specific thing. Disk I/O is slow, so I’d take that out of the test if all you are going to tweak is the database query.
And if you need to time your database execution, look for database tools instead, like asking for the query plan, and note that performance varies not only with the exact query and what indexes you have, but also with the data load (how much data you have stored).
That said, you can simply put your code in a function and run that function with timeit.timeit():
import timeit

def function_to_repeat():
    # ...
    pass

duration = timeit.timeit(function_to_repeat, number=1000)
This disables garbage collection, repeatedly calls function_to_repeat(), and times the total duration of those calls using timeit.default_timer(), which is the most accurate available clock for your specific platform.
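If garbage collection is part of what you want to measure, the timeit documentation suggests re-enabling it in the setup statement. A minimal sketch of both variants, with a made-up allocation-heavy workload standing in for the real function:

```python
import time
import timeit

def function_to_repeat():
    # Hypothetical workload that allocates many small objects.
    return [[i] for i in range(1000)]

# Default behaviour: garbage collection is switched off while timing.
without_gc = timeit.timeit(function_to_repeat, number=200)

# Re-enable the collector in setup so its cost is included in the measurement.
with_gc = timeit.timeit(function_to_repeat, setup="import gc; gc.enable()", number=200)

print("GC disabled: {:.4f} s, GC enabled: {:.4f} s".format(without_gc, with_gc))
# On Python 3.3+, default_timer is an alias for time.perf_counter.
print("default_timer is perf_counter:", timeit.default_timer is time.perf_counter)
```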
You should move setup code out of the repeated function; for example, you should connect to the database first, then time only the queries. Use the setup argument to either import or create those dependencies, and pass them into your function:
def function_to_repeat(var1, var2):
    # ...
    pass

duration = timeit.timeit(
    'function_to_repeat(var1, var2)',
    'from __main__ import function_to_repeat, var1, var2',
    number=1000)
This grabs the globals function_to_repeat, var1 and var2 from your script and passes them to the function on each repetition.
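On Python 3.5 and later, timeit.timeit also accepts a globals argument, which can replace the from __main__ import line. A small sketch, assuming a trivial function body and example values for var1 and var2:

```python
import timeit

def function_to_repeat(var1, var2):
    # Placeholder body; the real work would go here.
    return var1 * var2

# Names the timed statement needs, supplied explicitly instead of via __main__.
namespace = {"function_to_repeat": function_to_repeat, "var1": 6, "var2": 7}

duration = timeit.timeit("function_to_repeat(var1, var2)", globals=namespace, number=1000)
print("1000 calls took {:.6f} s".format(duration))
```

Passing globals=globals() instead of a hand-built dictionary exposes everything defined in the current module, which is convenient in short scripts.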
Answer 4
I see the question has already been answered, but I still want to add my 2 cents.
I faced a similar scenario in which I had to test the execution times of several approaches, so I wrote a small script that calls timeit on all the functions written in it.
The script is also available as a GitHub gist.
Hope it helps you and others.
# Save this file as timeit_demo.py so the generated commands can import it.
from random import random
import types

def list_without_comprehension():
    l = []
    for i in range(1000):
        l.append(int(random()*100 % 100))
    return l

def list_with_comprehension():
    # 1K random numbers between 0 to 100
    l = [int(random()*100 % 100) for _ in range(1000)]
    return l

# operations on list_without_comprehension
def sort_list_without_comprehension():
    list_without_comprehension().sort()

def reverse_sort_list_without_comprehension():
    list_without_comprehension().sort(reverse=True)

def sorted_list_without_comprehension():
    sorted(list_without_comprehension())

# operations on list_with_comprehension
def sort_list_with_comprehension():
    list_with_comprehension().sort()

def reverse_sort_list_with_comprehension():
    list_with_comprehension().sort(reverse=True)

def sorted_list_with_comprehension():
    sorted(list_with_comprehension())

def main():
    # Collect the names of all functions defined in this module except main.
    objs = globals()
    funcs = []
    f = open("timeit_demo.sh", "w+")
    for objname in objs:
        if objname != 'main' and type(objs[objname]) == types.FunctionType:
            funcs.append(objname)
    funcs.sort()
    # Write one "python -m timeit" command per function into a shell script.
    for func in funcs:
        f.write('''echo "Timing: %(funcname)s"
python -m timeit "import timeit_demo; timeit_demo.%(funcname)s();"\n\n
echo "------------------------------------------------------------"
''' % dict(
                funcname = func,
                )
            )
    f.close()

if __name__ == "__main__":
    main()
    from os import system
    #Works only for *nix platforms
    system("/bin/bash timeit_demo.sh")
    #un-comment below for windows
    #system("cmd timeit_demo.sh")
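If generating a shell script is more than you need, the same comparison can be run in-process by handing each function to timeit.timeit. This is a simplified sketch with only two of the functions; the candidates list and the count of 200 calls are arbitrary choices for illustration.

```python
import timeit
from random import random

def list_without_comprehension():
    result = []
    for _ in range(1000):
        result.append(int(random() * 100))
    return result

def list_with_comprehension():
    return [int(random() * 100) for _ in range(1000)]

# Functions to compare; extend this list with whatever needs timing.
candidates = [list_without_comprehension, list_with_comprehension]

results = {}
for func in candidates:
    # timeit accepts a zero-argument callable directly.
    results[func.__name__] = timeit.timeit(func, number=200)

for name, seconds in sorted(results.items()):
    print("{:<30} {:.4f} s for 200 calls".format(name, seconds))
```

This avoids spawning a new interpreter per function, at the price of all timings sharing one process.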
Answer 5
Here's a simple wrapper for Steven's answer. This function doesn't do repeated runs or averaging; it just saves you from having to repeat the timing code everywhere :)
import time

def time_func(func, *args):  # *args can take 0 or more arguments
    '''Print the wall time it takes to execute the given function.'''
    start_time = time.time()
    func(*args)
    end_time = time.time()
    print("it took this long to run: {}".format(end_time-start_time))
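A variation on the same idea is a context manager, which times any block rather than a single function call. This is only a sketch: the name timed and its sink parameter are invented here, and it uses time.perf_counter() instead of time.time().

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink=print):
    """Report the wall time spent inside the with-block to `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the timing is always reported.
        elapsed = time.perf_counter() - start
        sink("{} took {:.6f} s".format(label, elapsed))

# Collect the message in a list instead of printing it directly.
messages = []
with timed("summing", sink=messages.append):
    total = sum(range(100000))

print(messages[0])
```

Like the wrapper above, this measures a single run and does no averaging.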
Answer 6
The testing suite doesn't make an attempt at using the imported timeit, so it's hard to tell what the intent was. Nonetheless, this is a canonical answer, so a complete example of timeit seems in order, elaborating on Martijn's answer.
The docs for timeit offer many examples and flags worth checking out. The basic usage on the command line is:
$ python -mtimeit "all(True for _ in range(1000))"
2000 loops, best of 5: 161 usec per loop
$ python -mtimeit "all([True for _ in range(1000)])"
2000 loops, best of 5: 116 usec per loop
Run with -h to see all options. Python MOTW has a great section on timeit that shows how to run modules via import and multiline code strings from the command line.
In script form, I typically use it like this:
import argparse
import copy
import dis
import inspect
import random
import sys
import timeit

def test_slice(L):
    L[:]

def test_copy(L):
    L.copy()

def test_deepcopy(L):
    copy.deepcopy(L)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--n", type=int, default=10 ** 5)
    parser.add_argument("--trials", type=int, default=100)
    parser.add_argument("--dis", action="store_true")
    args = parser.parse_args()
    n = args.n
    trials = args.trials
    namespace = dict(L = random.sample(range(n), k=n))
    funcs_to_test = [x for x in locals().values()
                     if callable(x) and x.__module__ == __name__]
    print(f"{'-' * 30}\nn = {n}, {trials} trials\n{'-' * 30}\n")

    for func in funcs_to_test:
        fname = func.__name__
        fargs = ", ".join(inspect.signature(func).parameters)
        stmt = f"{fname}({fargs})"
        setup = f"from __main__ import {fname}"
        time = timeit.timeit(stmt, setup, number=trials, globals=namespace)
        print(inspect.getsource(globals().get(fname)))

        if args.dis:
            dis.dis(globals().get(fname))

        print(f"time (s) => {time}\n{'-' * 30}\n")
You can pretty easily drop in the functions and arguments you need. Use caution with impure functions and take care of state: a function that mutates its input changes what later trials measure.
Sample output:
$ python benchmark.py --n 10000
------------------------------
n = 10000, 100 trials
------------------------------
def test_slice(L):
    L[:]
time (s) => 0.015502399999999972
------------------------------
def test_copy(L):
    L.copy()
time (s) => 0.01651419999999998
------------------------------
def test_deepcopy(L):
    copy.deepcopy(L)
time (s) => 2.136012
------------------------------
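To make the point about state concrete, here is a small sketch of the pitfall with an in-place sort. The list size and repeat counts are arbitrary. It relies on timeit running the setup statement once per timeit call, so timeit.repeat with number=1 gives every measurement a fresh copy.

```python
import random
import timeit

data = random.sample(range(10000), k=10000)
namespace = {"data": data}

# Pitfall: setup runs once, so after the first of the 50 calls the list is
# already sorted and the remaining 49 calls time sorting of sorted input.
stale = timeit.timeit("L.sort()", setup="L = list(data)", globals=namespace, number=50)

# With number=1, setup runs before every single measurement, so each of the
# 50 timings sorts a fresh, unsorted copy.
fresh = timeit.repeat("L.sort()", setup="L = list(data)", globals=namespace,
                      repeat=50, number=1)

print("50 sorts, state reused:    {:.4f} s".format(stale))
print("50 sorts, fresh each time: {:.4f} s".format(sum(fresh)))
```

The second figure is usually noticeably larger, because sorting already-sorted data is close to the best case for Python's sort.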