python中最有效的字符串连接方法是什么？

Question 1

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found following methods here:

Simple concatenation using +
Using string list and join method
Using UserString from MutableString module
Using character array and the array module
Using cStringIO from StringIO module

But what do you experts use or suggest, and why?

[A related question here]

Question 2

You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering also that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same)

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic and thus easier for other python programmers to understand.

Finally, the golden rule of optimization: don’t optimize unless you know you need to, and measure rather than guessing.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.

Question 3

''.join(sequenceofstrings) is what usually works best — simplest and fastest.

Question 4

Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.

Given the test case from mkoistinen’s answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders are

f'http://{domain}/{lang}/{path}' – 0.151 µs
'http://%s/%s/%s' % (domain, lang, path) – 0.321 µs
'http://' + domain + '/' + lang + '/' + path – 0.356 µs
''.join(('http://', domain, '/', lang, '/', path)) – 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus currently the shortest and the most beautiful code possible is also fastest.

In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible – actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.

The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.

Question 5

It depends on what you’re doing.

After Python 2.5, string concatenation with the + operator is pretty fast. If you’re just concatenating a couple of values, using the + operator works best:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

However, if you’re putting together a string in a loop, you’re better off using the list joining method:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

…but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.

Question 6

As per John Fouhy’s answer, don’t optimize unless you have to, but if you’re here and asking this question, it may be precisely because you have to. In my case, I needed assemble some URLs from string variables… fast. I noticed no one (so far) seems to be considering the string format method, so I thought I’d try that and, mostly for mild interest, I thought I’d toss the string interpolation operator in there for good measuer. To be honest, I didn’t think either of these would stack up to a direct ‘+’ operation or a ”.join(). But guess what? On my Python 2.7.5 system, the string interpolation operator rules them all and string.format() is the worst performer:

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

The results:

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

If I use a shorter domain and shorter path, interpolation still wins out. The difference is more pronounced, though, with longer strings.

Now that I had a nice test script, I also tested under Python 2.6, 3.3 and 3.4, here’s the results. In Python 2.6, the plus operator is the fastest! On Python 3, join wins out. Note: these tests are very repeatable on my system. So, ‘plus’ is always faster on 2.6, ‘intp’ is always faster on 2.7 and ‘join’ is always faster on Python 3.x.

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

Lesson learned:

Sometimes, my assumptions are dead wrong.
Test against the system env. you’ll be running in production.
String interpolation isn’t dead yet!

tl;dr:

If you using 2.6, use the + operator.
if you’re using 2.7 use the ‘%’ operator.
if you’re using 3.x use ”.join().

Question 7

it pretty much depends on the relative sizes of the new string after every new concatenation. With the + operator, for every concatenation a new string is made. If the intermediary strings are relatively long, the + becomes increasingly slower because the new intermediary string is being stored.

Consider this case:

from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t

#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t

#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t

#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t

Results

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

In the case of 1&2, we add a large string, and join() performs about 10 times faster. In case 3&4, we add a small string, and ‘+’ performs slightly faster

Question 8

I ran into a situation where I needed to have an appendable string of unknown size. These are the benchmark results (python 2.7.3):

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

This seems to show that ‘+=’ is the fastest. The results from the skymind link are a bit out of date.

(I realize that the second example is not complete, the final list would need to be joined. This does show, however, that simply preparing the list takes longer than the string concat.)

Question 9

One Year later, let’s test mkoistinen’s answer with python 3.4.3:

plus 0.963564149000 (95.83% as fast)
join 0.923408469000 (100.00% as fast)
form 1.501130934000 (61.51% as fast)
intp 1.019677452000 (90.56% as fast)

Nothing changed. Join is still the fastest method. With intp being arguably the best choice in terms of readability you might want to use intp nevertheless.

Question 10

Inspired by @JasonBaker’s benchmarks, here’s a simple one comparing 10 "abcdefghijklmnopqrstuvxyz" strings, showing that .join() is faster; even with this tiny increase in variables:

Catenation

>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

Join

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

Question 11

For a small set of short strings (i.e. 2 or 3 strings of no more than a few characters), plus is still way faster. Using mkoistinen’s wonderful script in Python 2 and 3:

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

So when your code is doing a huge number of separate small concatenations, plus is the preferred way if speed is crucial.

Question 12

Probably “new f-strings in Python 3.6” is the most efficient way of concatenating strings.

Using %s

>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

Using .format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

Using f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

Source: https://realpython.com/python-f-strings/

python中最有效的字符串连接方法是什么？

问题：python中最有效的字符串连接方法是什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

链状

加入

Catenation

Join

回答 9

回答 10

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

熊猫数据框中的随机行选择

如何将熊猫数据框的索引转换为列？

如何初始化基类（超级）？

使用Python获取文件的最后n行，类似于tail

什么是“一流”物品？

NameError：名称“ reduce”未在Python中定义

python中最有效的字符串连接方法是什么？

问题：python中最有效的字符串连接方法是什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

链状

加入

Catenation

Join

回答 9

回答 10

相关文章

排行榜展示

文章展示