Python string class like StringBuilder in C#?

Question: Python string class like StringBuilder in C#?

Is there some string class in Python like StringBuilder in C#?


Answer 0

There is no one-to-one equivalent. For a really good article, see Efficient String Concatenation in Python:

Building long strings in the Python programming language can sometimes result in very slow running code. In this article I investigate the computational performance of various string concatenation methods.


Answer 1

I used Oliver Crow's code (from the link given by Andrew Hare) and adapted it slightly for Python 2.7.3, using the timeit package. I ran it on my personal computer: a Lenovo T61 with 6GB RAM running Debian GNU/Linux 6.0.6 (squeeze).

Here is the result for 10,000 iterations:

method1:  0.0538418292999 secs
process size 4800 kb
method2:  0.22602891922 secs
process size 4960 kb
method3:  0.0605459213257 secs
process size 4980 kb
method4:  0.0544030666351 secs
process size 5536 kb
method5:  0.0551080703735 secs
process size 5272 kb
method6:  0.0542731285095 secs
process size 5512 kb

and for 5,000,000 iterations (method 2 was skipped because it ran far too slowly, seemingly forever):

method1:  5.88603997231 secs
process size 37976 kb
method3:  8.40748500824 secs
process size 38024 kb
method4:  7.96380496025 secs
process size 321968 kb
method5:  8.03666186333 secs
process size 71720 kb
method6:  6.68192911148 secs
process size 38240 kb

It is quite obvious that the Python developers have done a great job of optimizing string concatenation, and as Hoare said: “premature optimization is the root of all evil” :-)


Answer 2

Relying on compiler optimizations is fragile. The benchmarks linked in the accepted answer and numbers given by Antoine-tran are not to be trusted. Andrew Hare makes the mistake of including a call to repr in his methods. That slows all the methods equally but obscures the real penalty in constructing the string.

Use join. It’s very fast and more robust.

$ ipython3
Python 3.5.1 (default, Mar  2 2016, 03:38:02) 
IPython 4.1.2 -- An enhanced Interactive Python.

In [1]: values = [str(num) for num in range(int(1e3))]

In [2]: %%timeit
   ...: ''.join(values)
   ...: 
100000 loops, best of 3: 7.37 µs per loop

In [3]: %%timeit
   ...: result = ''
   ...: for value in values:
   ...:     result += value
   ...: 
10000 loops, best of 3: 82.8 µs per loop

In [4]: import io

In [5]: %%timeit
   ...: writer = io.StringIO()
   ...: for value in values:
   ...:     writer.write(value)
   ...: writer.getvalue()
   ...: 
10000 loops, best of 3: 81.8 µs per loop
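
For readers without IPython, the same comparison can be sketched as a plain script using the standard timeit module. This is only an illustration: the function names (with_join, with_plus_equals, with_stringio) and the repeat count are assumptions of this sketch, not part of the session above, and the timings will vary by machine and Python version.

```python
import io
import timeit

# Same input as in the session above: 1000 short strings.
values = [str(num) for num in range(1000)]


def with_join():
    # Build the result in a single pass with str.join.
    return ''.join(values)


def with_plus_equals():
    # Repeated in-place concatenation.
    result = ''
    for value in values:
        result += value
    return result


def with_stringio():
    # Write every piece into an in-memory text buffer.
    writer = io.StringIO()
    for value in values:
        writer.write(value)
    return writer.getvalue()


if __name__ == '__main__':
    # The repeat count (1000) is an arbitrary choice for this sketch.
    for func in (with_join, with_plus_equals, with_stringio):
        seconds = timeit.timeit(func, number=1000)
        print('{:<18} {:.4f} s for 1000 calls'.format(func.__name__, seconds))
```

All three functions return the same string, so only the timings differ.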

Answer 3

Python has several things that fulfill similar purposes:

  • One common way to build large strings from pieces is to grow a list of strings and join it when you are done. This is a frequently-used Python idiom.
    • To build strings incorporating data with formatting, you would do the formatting separately.
  • For insertion and deletion at a character level, you would keep a list of length-one strings. (To make this from a string, you’d call list(your_string). You could also use a UserString.MutableString for this.)
  • (c)StringIO.StringIO is useful for things that would otherwise take a file, but less so for general string building.
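
The first three points above can be sketched roughly as follows. The variable names (pieces, report, chars, edited) and the sample data are made up for illustration.

```python
# Idiom 1: grow a list of pieces, then join once when done.
pieces = []
for number in range(5):
    # Formatting is done separately, one piece at a time.
    pieces.append('item {}'.format(number))
report = ', '.join(pieces)

# Idiom 2: keep a list of length-one strings for character-level edits.
chars = list('Hello World')
chars.insert(5, ',')      # insert a comma after 'Hello'
del chars[-5:]            # delete the last five characters ('World')
chars.extend('Python')    # extend() appends each character separately
edited = ''.join(chars)

print(report)
print(edited)
```

In both cases the final string is only materialized once, by the closing join.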

Answer 4

Using method 5 from above (the pseudo file), we can get very good performance and flexibility:

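from cStringIO import StringIO  # Python 2 only: the C implementation of StringIO

class StringBuilder:
    def __init__(self):
        # In-memory file-like buffer that accumulates the appended pieces
        self._file_str = StringIO()

    def Append(self, text):
        # Parameter renamed from `str` so it no longer shadows the built-in type
        self._file_str.write(text)

    def __str__(self):
        # Return everything written so far as a single string
        return self._file_str.getvalue()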

Now use it:

sb = StringBuilder()

sb.Append("Hello\n")
sb.Append("World")

print sb
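
The code above targets Python 2 (cStringIO and the print statement). A rough Python 3 equivalent, assuming io.StringIO as the replacement for cStringIO, might look like the sketch below. The lowercase append method and its chaining behavior are choices made for this sketch, not part of the original answer.

```python
import io


class StringBuilder:
    """Hypothetical Python 3 variant of the wrapper above."""

    def __init__(self):
        # io.StringIO is the Python 3 in-memory text buffer.
        self._buffer = io.StringIO()

    def append(self, text):
        # Write the piece to the buffer; returning self allows chaining.
        self._buffer.write(text)
        return self

    def __str__(self):
        # Return everything written so far as a single string.
        return self._buffer.getvalue()


sb = StringBuilder()
sb.append("Hello\n").append("World")
print(sb)
```

Whether such a wrapper is worth having over a plain list and ''.join depends on the workload, so it is best measured rather than assumed.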

Answer 5

You can try StringIO or cStringIO.


Answer 6

There is no explicit analogue. I think you are expected to use string concatenation (likely optimized, as said before) or a third-party class. I doubt that third-party classes are much more efficient: lists in Python are dynamically typed, so I assume there is no fast char[] to use as a buffer. StringBuilder-like classes are not a premature optimization. They exist because of an innate feature of strings in many languages (immutability), which allows many optimizations (for example, referencing the same buffer for slices/substrings). StringBuilder/StringBuffer/stringstream-like classes work a lot faster than concatenating strings (which produces many small temporary objects that still need allocation and garbage collection), and even faster than printf-like string formatting tools, because they avoid the overhead of interpreting the format pattern, which adds up over many format calls.


Answer 7

If you are here looking for a fast string concatenation method in Python, you do not need a special StringBuilder class. Simple concatenation works just as well, without the performance penalty seen in C#.

resultString = ""

resultString += "Append 1"
resultString += "Append 2"

See Antoine-tran’s answer for performance results.