How do I append one string to another in Python?

Question: How do I append one string to another in Python?

I want an efficient way to append one string to another in Python, other than the following:

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

Is there any good built-in method to use?


Answer 0

If you only have one reference to a string and you concatenate another string to the end, CPython now special-cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

e.g.

s = ""
for i in range(n):
    s+=str(i)

used to be O(n^2), but now it is O(n).

From the source (bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}

It’s easy enough to verify empirically.

$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
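
The commands above use Python 2's xrange. A rough Python 3 equivalent of the same measurement can be sketched with the timeit module. The helper name time_concat and the chosen sizes are assumptions made for illustration, and unlike the commands above this version resets s on every timed run. The absolute numbers will differ by machine and interpreter.

```python
import timeit


def time_concat(n, repeat=3, number=10):
    """Return the best per-run time in seconds for n appends with +=.

    time_concat is a hypothetical helper, not part of the original answer.
    """
    stmt = "s = ''\nfor i in range(n):\n    s += 'a'"
    timer = timeit.Timer(stmt, globals={"n": n})
    # repeat() returns total times for `number` runs; keep the best one.
    return min(timer.repeat(repeat=repeat, number=number)) / number


for n in (10, 100, 1000, 10000, 100000):
    print(f"n={n:>7}: {time_concat(n) * 1e6:12.2f} usec per run")
```

If the time per run grows roughly tenfold each time n grows tenfold, the loop is behaving linearly on that interpreter.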

It’s important to note, however, that this optimisation isn’t part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical test on PyPy or Jython, for example, might show the older O(n**2) performance.

$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly for larger strings.


Answer 1

Don’t prematurely optimize. If you have no reason to believe there’s a speed bottleneck caused by string concatenations then just stick with + and +=:

s  = 'foo'
s += 'bar'
s += 'baz'

That said, if you’re aiming for something like Java’s StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:

l = []
l.append('foo')
l.append('bar')
l.append('baz')

s = ''.join(l)
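
As a minimal sketch of the two styles side by side, the functions below build the same string once with a list plus str.join and once with +=. The function names are made up for this illustration.

```python
def build_with_join(n):
    """Collect the pieces in a list, then join them once at the end."""
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)


def build_with_plus(n):
    """Append each piece to the running string with +=."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


print(build_with_join(5))
print(build_with_plus(5))
```

Both return the same result. The join version does not rely on any interpreter-specific optimisation of +=.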

Answer 2

str1 = "Hello"
str2 = "World"
newstr = " ".join((str1, str2))

That joins str1 and str2 with a space as the separator. You can also do "".join((str1, str2, ...)). str.join() takes a single iterable, so you have to put the strings in a list or a tuple.

That’s about as efficient as it gets for a builtin method.
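
A short sketch of the point about iterables: str.join accepts exactly one argument, so passing the strings separately raises TypeError. The variable names words, spaced, glued and error are only for this illustration.

```python
str1 = "Hello"
str2 = "World"

words = [str1, str2]          # a list or a tuple both work
spaced = " ".join(words)      # separator is a single space
glued = "".join(words)        # empty separator, plain concatenation

error = None
try:
    "".join(str1, str2)       # wrong: two arguments instead of one iterable
except TypeError as exc:
    error = exc

print(spaced)
print(glued)
print(type(error).__name__)
```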


Answer 3

Don’t.

That is, for most cases you are better off generating the whole string in one go rather than appending to an existing string.

For example, don’t do: obj1.name + ":" + str(obj1.count)

Instead: use "%s:%d" % (obj1.name, obj1.count)

That will be easier to read and more efficient.
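
To make the comparison concrete, here is a small sketch. The Item class is a hypothetical stand-in for whatever type obj1 has in the answer, and the str.format line is an extra alternative that the answer does not mention.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical object with the two attributes used above."""
    name: str
    count: int


obj1 = Item("apples", 3)

concatenated = obj1.name + ":" + str(obj1.count)     # piecewise concatenation
percent = "%s:%d" % (obj1.name, obj1.count)          # one formatting step
formatted = "{}:{}".format(obj1.name, obj1.count)    # str.format alternative

print(concatenated, percent, formatted)
```

Note that %d requires a number, so this form only works when count is numeric.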


Answer 4

Python 3.6 gives us f-strings, which are a delight:

var1 = "foo"
var2 = "bar"
var3 = f"{var1}{var2}"
print(var3)                       # prints foobar

You can do almost anything inside the curly braces:

print(f"1 + 1 == {1 + 1}")        # prints 1 + 1 == 2
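
A few more f-string forms, as a sketch: method calls and format specifications also work inside the braces. All variable names here are invented for the example.

```python
name = "foo"
count = 3
ratio = 2 / 3

label = f"{name.upper()}:{count:03d}"    # method call plus zero-padded integer
rounded = f"{ratio:.2f}"                 # two decimal places
pairs = ",".join(f"{i}:{i * i}" for i in range(3))  # f-strings feeding join

print(label)
print(rounded)
print(pairs)
```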

Answer 5

If you need to do many append operations to build a large string, you can use StringIO or cStringIO (io.StringIO in Python 3). The interface is like a file, i.e., you write to it to append text.

If you’re just appending two strings then just use +.
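
A minimal sketch of the file-like approach, assuming Python 3, where the class lives in the io module. The loop contents are arbitrary.

```python
import io

buf = io.StringIO()
for i in range(5):
    buf.write(str(i))     # each write appends to the in-memory buffer
    buf.write(",")

result = buf.getvalue()   # retrieve everything written so far as one str
buf.close()

print(result)
```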


Answer 6

It really depends on your application. If you’re looping through hundreds of words and want to append them all into a list, .join() is better. But if you’re putting together a long sentence, you’re better off using +=.


Answer 7

Basically, no difference. The only consistent trend is that Python seems to be getting slower with every version… :(


List

%%timeit
x = []
for i in range(100000000):  # xrange on Python 2.7
    x.append('a')
x = ''.join(x)

Python 2.7

1 loop, best of 3: 7.34 s per loop

Python 3.4

1 loop, best of 3: 7.99 s per loop

Python 3.5

1 loop, best of 3: 8.48 s per loop

Python 3.6

1 loop, best of 3: 9.93 s per loop


String

%%timeit
x = ''
for i in range(100000000):  # xrange on Python 2.7
    x += 'a'

Python 2.7

1 loop, best of 3: 7.41 s per loop

Python 3.4

1 loop, best of 3: 9.08 s per loop

Python 3.5

1 loop, best of 3: 8.82 s per loop

Python 3.6

1 loop, best of 3: 9.24 s per loop
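
The %%timeit cells above need IPython or Jupyter. A plain-Python sketch of the same comparison follows. It assumes a much smaller N so that it finishes quickly, and the function names are invented for this example. Results will vary by machine and Python version.

```python
import timeit

N = 100_000  # assumption: far smaller than the 100,000,000 used above


def with_list():
    """Append to a list, then join once."""
    x = []
    for i in range(N):
        x.append('a')
    return ''.join(x)


def with_str():
    """Append to the string directly with +=."""
    x = ''
    for i in range(N):
        x += 'a'
    return x


# timeit.repeat accepts a callable; take the best of 3 single runs.
list_time = min(timeit.repeat(with_list, number=1, repeat=3))
str_time = min(timeit.repeat(with_str, number=1, repeat=3))

print(f"list + join: {list_time:.4f} s")
print(f"str +=     : {str_time:.4f} s")
```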


Answer 8

Append strings with the __add__ method:

str1 = "Hello"   # avoid naming a variable str, which shadows the built-in type
str2 = " World"
st = str1.__add__(str2)
print(st)

Output

Hello World

Answer 9

a='foo'
b='baaz'

a.__add__(b)

out: 'foobaaz'