python中最有效的字符串连接方法是什么?

问题:python中最有效的字符串连接方法是什么?

有没有在Python任何有效的质量字符串连接方法(如StringBuilder的 C#或StringBuffer的在Java中)?我在这里找到以下方法:

  • 简单串联使用 +
  • 使用字符串列表和join方法
  • UserStringMutableString模块使用
  • 使用字符数组和array模块
  • cStringIOStringIO模块使用

但是您的专家使用或建议了什么,为什么?

[ 这里的一个相关问题 ]

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found following methods here:

  • Simple concatenation using +
  • Using string list and join method
  • Using UserString from MutableString module
  • Using character array and the array module
  • Using cStringIO from StringIO module

But what do you experts use or suggest, and why?

[A related question here]


回答 0

您可能对此感兴趣:Guido 的优化轶事。尽管还应该记住这是一篇老文章,并且早于诸如此类的内容的存在''.join(尽管我猜string.joinfields大致相同)

鉴于此,如果您可以将问题塞入该array模块,则该模块可能是最快的。但是''.join可能足够快,并且具有惯用的好处,因此其他Python程序员更容易理解。

最后,优化的黄金法则:除非您知道自己需要进行优化,否则不要进行优化,而要进行衡量而不是猜测。

您可以使用该timeit模块测量不同的方法。这样可以告诉您哪个最快,而不是互联网上的随机陌生人进行猜测。

You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering also that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same)

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic and thus easier for other python programmers to understand.

Finally, the golden rule of optimization: don’t optimize unless you know you need to, and measure rather than guessing.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.


回答 1

''.join(sequenceofstrings) 通常是最有效的方法-最简单,最快。

''.join(sequenceofstrings) is what usually works best — simplest and fastest.


回答 2

Python 3.6改变了使用文字字符串插值对已知组件进行字符串连接的游戏。

根据mkoistinen的答案给出测试用例,有字符串

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

竞争者是

  • f'http://{domain}/{lang}/{path}'0.151微秒

  • 'http://%s/%s/%s' % (domain, lang, path) -0.321微秒

  • 'http://' + domain + '/' + lang + '/' + path -0.356微秒

  • ''.join(('http://', domain, '/', lang, '/', path))0.249微秒(请注意,构建一个定长元组比构建一个定长列表更快)。

因此,目前最短和最漂亮的代码也是最快的。

在Python 3.6的Alpha版本中,f''字符串的实现是最慢的-实际上,生成的字节代码几乎等同于''.join()带有不必要调用的情况,str.__format__而没有参数的调用则只会返回self不变。这些效率低下问题已在3.6决赛之前解决。

速度可以与Python 2最快的方法(+在我的计算机上串联)形成对比。而这需要0.203 μs的8位字符串,0.259微秒如果字符串所有的Unicode。

Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.

Given the test case from mkoistinen’s answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders are

  • f'http://{domain}/{lang}/{path}'0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) – 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path – 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path))0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus currently the shortest and the most beautiful code possible is also fastest.

In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible – actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.

The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.


回答 3

这取决于您在做什么。

在Python 2.5之后,使用+运算符进行字符串连接非常快。如果您只是串联几个值,则使用+运算符最有效:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

但是,如果将一个字符串放入一个循环中,则最好使用列表连接方法:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

…但是请注意,在差异变得明显之前,您必须将相对大量的字符串放在一起。

It depends on what you’re doing.

After Python 2.5, string concatenation with the + operator is pretty fast. If you’re just concatenating a couple of values, using the + operator works best:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

However, if you’re putting together a string in a loop, you’re better off using the list joining method:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

…but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.


回答 4

按照约翰·福伊(John Fouhy)的回答,除非必须这样做,否则不要进行优化,但是,如果您在这里问这个问题,可能正是因为您必须这样做。就我而言,我需要从字符串变量中组合一些URL……要快。我注意到(到目前为止)似乎没有人在考虑使用字符串格式方法,所以我认为我会尝试这样做,并且主要出于温和的兴趣,我认为我会把字符串插值运算符扔在那里,以获得更好的度量。老实说,我不认为这两个都会叠加成直接的’+’操作或”.join()。但猜猜怎么了?在我的Python 2.7.5系统上,字符串插值运算符将它们全部规则化,而string.format()的性能最差:

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

结果:

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

如果我使用较短的域和较短的路径,则插值仍然胜出。但是,更长的字符串之间的区别更加明显。

现在,我有了一个不错的测试脚本,我也在Python 2.6、3.3和3.4下进行了测试,这是结果。在Python 2.6中,加号运算符是最快的!在Python 3上,join胜出。注意:这些测试在我的系统上是非常可重复的。因此,“ plus”在2.6上总是更快,“ intp”在2.7上总是更快,而“ join”在Python 3.x上总是更快。

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

学过的知识:

  • 有时,我的假设是完全错误的。
  • 针对系统环境进行测试。您将在生产中运行。
  • 字符串插值还没有结束!

tl; dr:

  • 如果使用2.6,请使用+运算符。
  • 如果您使用的是2.7,请使用’%’运算符。
  • 如果您使用的是3.x,请使用”.join()。

As per John Fouhy’s answer, don’t optimize unless you have to, but if you’re here and asking this question, it may be precisely because you have to. In my case, I needed assemble some URLs from string variables… fast. I noticed no one (so far) seems to be considering the string format method, so I thought I’d try that and, mostly for mild interest, I thought I’d toss the string interpolation operator in there for good measuer. To be honest, I didn’t think either of these would stack up to a direct ‘+’ operation or a ”.join(). But guess what? On my Python 2.7.5 system, the string interpolation operator rules them all and string.format() is the worst performer:

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

The results:

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

If I use a shorter domain and shorter path, interpolation still wins out. The difference is more pronounced, though, with longer strings.

Now that I had a nice test script, I also tested under Python 2.6, 3.3 and 3.4, here’s the results. In Python 2.6, the plus operator is the fastest! On Python 3, join wins out. Note: these tests are very repeatable on my system. So, ‘plus’ is always faster on 2.6, ‘intp’ is always faster on 2.7 and ‘join’ is always faster on Python 3.x.

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

Lesson learned:

  • Sometimes, my assumptions are dead wrong.
  • Test against the system env. you’ll be running in production.
  • String interpolation isn’t dead yet!

tl;dr:

  • If you using 2.6, use the + operator.
  • if you’re using 2.7 use the ‘%’ operator.
  • if you’re using 3.x use ”.join().

回答 5

它在很大程度上取决于每个新串联后新字符串的相对大小。对于+运算符,对于每个串联,都会创建一个新字符串。如果中间字符串相对较长,则+由于存储新的中间字符串而变得越来越慢。

考虑这种情况:

from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t

#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t

#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t

#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t

结果

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

在1&2的情况下,我们添加了一个大字符串,join()的执行速度提高了约10倍。在情况3&4中,我们添加一个小字符串,并且’+’的执行速度稍快

it pretty much depends on the relative sizes of the new string after every new concatenation. With the + operator, for every concatenation a new string is made. If the intermediary strings are relatively long, the + becomes increasingly slower because the new intermediary string is being stored.

Consider this case:

from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t

#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t

#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t

#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t

Results

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

In the case of 1&2, we add a large string, and join() performs about 10 times faster. In case 3&4, we add a small string, and ‘+’ performs slightly faster


回答 6

我遇到了一种情况,我需要一个未知大小的可附加字符串。这些是基准测试结果(python 2.7.3):

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

这似乎表明“ + =”是最快的。skymind链接的结果有些过时。

(我意识到第二个示例还不完整,最终列表将需要加入。但是,这确实表明,仅准备列表所花费的时间比字符串concat要长。)

I ran into a situation where I needed to have an appendable string of unknown size. These are the benchmark results (python 2.7.3):

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

This seems to show that ‘+=’ is the fastest. The results from the skymind link are a bit out of date.

(I realize that the second example is not complete, the final list would need to be joined. This does show, however, that simply preparing the list takes longer than the string concat.)


回答 7

一年后,让我们用python 3.4.3测试mkoistinen的答案:

  • 加0.963564149000(速度为95.83%)
  • 加入0.923408469000(速度为100.00%)
  • 表格1.501130934000(速度为61.51%)
  • intp 1.019677452000(速度为90.56%)

没有改变。加入仍然是最快的方法。就可读性而言,可以说intp是最佳选择,但是您可能仍想使用intp。

One Year later, let’s test mkoistinen’s answer with python 3.4.3:

  • plus 0.963564149000 (95.83% as fast)
  • join 0.923408469000 (100.00% as fast)
  • form 1.501130934000 (61.51% as fast)
  • intp 1.019677452000 (90.56% as fast)

Nothing changed. Join is still the fastest method. With intp being arguably the best choice in terms of readability you might want to use intp nevertheless.


回答 8

受到@JasonBaker基准测试的启发,下面是一个比较10个"abcdefghijklmnopqrstuvxyz"字符串的简单示例,它显示了.join()更快的速度。即使变量有微小增加:

链状

>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

加入

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

Inspired by @JasonBaker’s benchmarks, here’s a simple one comparing 10 "abcdefghijklmnopqrstuvxyz" strings, showing that .join() is faster; even with this tiny increase in variables:

Catenation

>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

Join

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

回答 9

对于一部分短字符串(即2个或3个不超过几个字符的字符串),加号的速度仍然更快。在Python 2和3中使用mkoistinen的出色脚本:

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

因此,当您的代码执行大量单独的小串联时,如果速度至关重要,则plus是首选方法。

For a small set of short strings (i.e. 2 or 3 strings of no more than a few characters), plus is still way faster. Using mkoistinen’s wonderful script in Python 2 and 3:

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

So when your code is doing a huge number of separate small concatenations, plus is the preferred way if speed is crucial.


回答 10

可能“ Python 3.6中的新f字符串”是连接字符串的最有效方法。

使用%s

>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

使用.format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

使用f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

资料来源:https : //realpython.com/python-f-strings/

Probably “new f-strings in Python 3.6” is the most efficient way of concatenating strings.

Using %s

>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

Using .format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

Using f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

Source: https://realpython.com/python-f-strings/