Question: String concatenation vs. string substitution in Python

In Python, the where and when of using string concatenation versus string substitution eludes me. As the string concatenation has seen large boosts in performance, is this (becoming more) a stylistic decision rather than a practical one?

For a concrete example, how should one handle construction of flexible URIs:

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_sub(q_num):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, q_num)

def so_question_uri_cat(q_num):
    return DOMAIN + QUESTIONS + '/' + str(q_num)

Edit: There have also been suggestions about joining a list of strings and for using named substitution. These are variants on the central theme, which is, which way is the Right Way to do it at which time? Thanks for the responses!


Answer 0

Concatenation is (significantly) faster according to my machine. But stylistically, I’m willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there’s no need to even ask the question… there’s no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048

Answer 1

Don’t forget about named substitution:

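def so_question_uri_namedsub(q_num):
    # locals() would only contain q_num here, so pass the names explicitly.
    values = {'domain': DOMAIN, 'questions': QUESTIONS, 'q_num': q_num}
    return "%(domain)s%(questions)s/%(q_num)d" % values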

Answer 2

Be wary of concatenating strings in a loop! The cost of string concatenation is proportional to the length of the result. Looping leads you straight to the land of N-squared. Some languages will optimize concatenation to the most recently allocated string, but it’s risky to count on the compiler to optimize your quadratic algorithm down to linear. Best to use the primitive (join?) that takes an entire list of strings, does a single allocation, and concatenates them all in one go.
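A minimal sketch of the difference, assuming a hypothetical list of pieces called `parts`; the function names are only illustrative. CPython can sometimes grow the string in place for `+=`, so the quadratic cost may not show up in every run, but that is an implementation detail rather than something the language promises.

```python
def build_with_concat(parts):
    # Each += may copy everything accumulated so far, so the total
    # work can grow quadratically with the length of the result.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # str.join works out the final size, allocates once,
    # and copies each piece exactly once.
    return "".join(parts)


parts = [str(i) for i in range(5)]
print(build_with_concat(parts))
print(build_with_join(parts))
```

Both functions return the same string; only the amount of copying differs as `parts` gets long.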


Answer 3

“As the string concatenation has seen large boosts in performance…”

If performance matters, this is good to know.

However, performance problems I’ve seen have never come down to string operations. I’ve generally gotten in trouble with I/O, sorting and O(n²) operations being the bottlenecks.

Until string operations are the performance limiters, I’ll stick with things that are obvious. Mostly, that’s substitution when it’s one line or less, concatenation when it makes sense, and a template tool (like Mako) when it’s large.


Answer 4

What you want to concatenate/interpolate and how you want to format the result should drive your decision.

  • String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn’t do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.

  • Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, 100.0 * current / total) would be much less readable in concatenation form.

  • Concatenation is useful when you don’t have a fixed number of items to string-ize.
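To make the last two bullets concrete, here is a small sketch; the values of `current`, `total` and `items` are made up for illustration, and the percentage is computed as `100.0 * current / total`.

```python
current, total = 3, 8

# Interpolation: precision and the literal % sign live in one template.
by_interpolation = "%d of %d (%.2f%%)" % (current, total, 100.0 * current / total)

# Concatenation: every number has to be converted and rounded by hand,
# and trailing zeros are not padded.
by_concatenation = (str(current) + " of " + str(total) +
                    " (" + str(round(100.0 * current / total, 2)) + "%)")

# A variable number of items: joining handles any length of list.
items = [1, 2, 3]
joined = ", ".join(str(x) for x in items)

print(by_interpolation)
print(by_concatenation)
print(joined)
```

The concatenated version prints `37.5%` where the template prints `37.50%`, which is the kind of formatting detail that gets awkward to reproduce by hand.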

Also, know that Python 2.6 introduces a new version of string interpolation, called string templating:

def so_question_uri_template(q_num):
    return "{domain}{questions}/{num}".format(domain=DOMAIN,
                                              questions=QUESTIONS,
                                              num=q_num)

String templating is slated to eventually replace %-interpolation, but that won’t happen for quite a while, I think.


Answer 5

I was just testing the speed of different string concatenation/substitution methods out of curiosity. A Google search on the subject brought me here. I thought I would post my test results in the hope that they might help someone decide.

    import timeit
    def percent_():
            return "test %s, with number %s" % (1,2)

    def format_():
            return "test {}, with number {}".format(1,2)

    def format2_():
            return "test {1}, with number {0}".format(2,1)

    def concat_():
            return "test " + str(1) + ", with number " + str(2)

    def dotimers(func_list):
            # runs a single test for all functions in the list
            for func in func_list:
                    tmr = timeit.Timer(func)
                    res = tmr.timeit()
                    print("test " + func.__name__ + ": " + str(res))

    def runtests(func_list, runs=5):
            # runs multiple tests for all functions in the list
            for i in range(runs):
                    print("----------- TEST #" + str(i + 1))
                    dotimers(func_list)

…After running runtests((percent_, format_, format2_, concat_), runs=5), I found that the % method was about twice as fast as the others on these small strings. The concat method was always the slowest (barely). There were very tiny differences when switching the positions in the format() method, but switching positions was always at least .01 slower than the regular format method.

Sample of test results:

    test concat_()  : 0.62  (0.61 to 0.63)
    test format_()  : 0.56  (consistently 0.56)
    test format2_() : 0.58  (0.57 to 0.59)
    test percent_() : 0.34  (0.33 to 0.35)

I ran these because I do use string concatenation in my scripts, and I was wondering what the cost was. I ran them in different orders to make sure nothing was interfering, or getting better performance from being first or last. On a side note, I threw some longer string generators into those functions, like "%s" + ("a" * 1024), and regular concat was almost 3 times as fast (1.1 vs 2.8) as using the format and % methods. I guess it depends on the strings and what you are trying to achieve. If performance really matters, it might be better to try different things and test them. I tend to choose readability over speed unless speed becomes a problem, but that’s just me. SO didn’t like my copy/paste; I had to put 8 spaces on everything to make it look right. I usually use 4.
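For anyone who wants to try this on their own machine, here is a compact sketch of the same kind of measurement. The helper `best_time` and its default `number` and `repeat` values are my own choices, not part of the test above, and the timings will differ by machine and Python version.

```python
import timeit


def percent_():
    return "test %s, with number %s" % (1, 2)


def format_():
    return "test {}, with number {}".format(1, 2)


def concat_():
    return "test " + str(1) + ", with number " + str(2)


def best_time(func, number=100000, repeat=3):
    # timeit.repeat returns one total per repetition; the minimum is
    # the one least disturbed by other activity on the machine.
    return min(timeit.repeat(func, number=number, repeat=repeat))


for func in (percent_, format_, concat_):
    print("%-10s %.4f s" % (func.__name__, best_time(func)))
```

Taking the minimum of several repetitions tends to give steadier numbers than a single run.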


Answer 6

Remember, stylistic decisions are practical decisions, if you ever plan on maintaining or debugging your code :-) There’s a famous quote from Knuth (possibly quoting Hoare?): “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

As long as you’re careful not to (say) turn an O(n) task into an O(n²) task, I would go with whichever you find easiest to understand.


Answer 7

I use substitution wherever I can. I only use concatenation if I’m building a string up in, say, a for-loop.


Answer 8

Actually, the correct thing to do in this case (building paths) is to use os.path.join, not string concatenation or interpolation.
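One caveat for URLs specifically: os.path.join uses the separator of the platform it runs on, and a component that starts with a slash discards everything before it. The sketch below swaps in posixpath.join, which always uses '/'. That substitution and the function name `so_question_uri_join` are mine, not part of the suggestion above.

```python
import posixpath

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'


def so_question_uri_join(q_num):
    # posixpath.join always joins with '/', unlike os.path.join on Windows.
    # A component starting with '/' would discard DOMAIN, so the
    # leading slash of QUESTIONS is stripped first.
    return posixpath.join(DOMAIN, QUESTIONS.lstrip('/'), str(q_num))


print(so_question_uri_join(1000))
```

This keeps the join-style call while avoiding backslashes on Windows.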

