Python 实用宝典

Question 1

In python say you have

s = "string"
i = 0
print s+i

will give you error so you write

print s+str(i)

to not get error.

I think this is quite a clumsy way to handle int and string concatenation. Even Java does not need explicit casting to String to do this sort of concatenation. Is there a better way to do this sort of concatenation i.e without explicit casting in Python?

Question 2

Modern string formatting:

"{} and {}".format("string", 1)

Question 3

No string formatting:

>> print 'Foo',0
Foo 0

Question 4

String formatting, using the new-style .format() method (with the defaults .format() provides):

 '{}{}'.format(s, i)

Or the older, but “still sticking around”, %-formatting:

 '%s%d' %(s, i)

In both examples above there’s no space between the two items concatenated. If space is needed, it can simply be added in the format strings.

These provide a lot of control and flexibility about how to concatenate items, the space between them etc. For details about format specifications see this.

Question 5

Python is an interesting language in that while there is usually one (or two) “obvious” ways to accomplish any given task, flexibility still exists.

s = "string"
i = 0

print (s + repr(i))

The above code snippet is written in Python3 syntax but the parentheses after print were always allowed (optional) until version 3 made them mandatory.

Hope this helps.

Caitlin

Question 6

format() method can be used to concatenate string and integer

print(s+"{}".format(i))

Question 7

if you only want to print yo can do this:

print(s , i)

Question 8

in python 3.6 and newer, you can format it just like this:

new_string = f'{s} {i}'
print(new_string)

or just:

print(f'{s} {i}')

Question 9

A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time. (Recent versions of CPython can apparently optimize this in some cases, but other implementations can’t, so programmers are discouraged from relying on this.) ''.join is the right way to do this.

However, I’ve heard it said (including here on Stack Overflow) that you should never, ever use + for string concatenation, but instead always use ''.join or a format string. I don’t understand why this is the case if you’re only concatenating two strings. If my understanding is correct, it shouldn’t take quadratic time, and I think a + b is cleaner and more readable than either ''.join((a, b)) or '%s%s' % (a, b).

Is it good practice to use + to concatenate two strings? Or is there a problem I’m not aware of?

Question 10

There is nothing wrong in concatenating two strings with +. Indeed it’s easier to read than ''.join([a, b]).

You are right though that concatenating more than 2 strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However this has not to do with using a loop. Even a + b + c + ... is O(n^2), the reason being that each concatenation produces a new string.

CPython2.4 and above try to mitigate that, but it’s still advisable to use join when concatenating more than 2 strings.

Question 11

Plus operator is perfectly fine solution to concatenate two Python strings. But if you keep adding more than two strings (n > 25) , you might want to think something else.

''.join([a, b, c]) trick is a performance optimization.

Question 12

The assumption that one should never, ever use + for string concatenation, but instead always use ”.join may be a myth. It is true that using + creates unnecessary temporary copies of immutable string object but the other not oft quoted fact is that calling join in a loop would generally add the overhead of function call. Lets take your example.

Create two lists, one from the linked SO question and another a bigger fabricated

>>> myl1 = ['A','B','C','D','E','F']
>>> myl2=[chr(random.randint(65,90)) for i in range(0,10000)]

Lets create two functions, UseJoin and UsePlus to use the respective join and + functionality.

>>> def UsePlus():
    return [myl[i] + myl[i + 1] for i in range(0,len(myl), 2)]

>>> def UseJoin():
    [''.join((myl[i],myl[i + 1])) for i in range(0,len(myl), 2)]

Lets run timeit with the first list

>>> myl=myl1
>>> t1=timeit.Timer("UsePlus()","from __main__ import UsePlus")
>>> t2=timeit.Timer("UseJoin()","from __main__ import UseJoin")
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
2.48 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
2.61 usec/pass
>>>

They have almost the same runtime.

Lets use cProfile

>>> myl=myl2
>>> cProfile.run("UsePlus()")
         5 function calls in 0.001 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}


>>> cProfile.run("UseJoin()")
         5005 function calls in 0.029 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     5000    0.014    0.000    0.014    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {range}

And it looks that using Join, results in unnecessary function calls which could add to the overhead.

Now coming back to the question. Should one discourage the use of + over join in all cases?

I believe no, things should be taken into consideration

Length of the String in Question
No of Concatenation Operation.

And off-course in a development pre-mature optimization is evil.

Question 13

When working with multiple people, it’s sometimes difficult to know exactly what’s happening. Using a format string instead of concatenation can avoid one particular annoyance that’s happened a whole ton of times to us:

Say, a function requires an argument, and you write it expecting to get a string:

In [1]: def foo(zeta):
   ...:     print 'bar: ' + zeta

In [2]: foo('bang')
bar: bang

So, this function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up-to-speed on the internals, and may not know that the function expects a string. And so they may end up with this:

In [3]: foo(23)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

/home/izkata/<ipython console> in foo(zeta)

TypeError: cannot concatenate 'str' and 'int' objects

There would be no problem if you just used a format string:

In [1]: def foo(zeta):
   ...:     print 'bar: %s' % zeta
   ...:     
   ...:     

In [2]: foo('bang')
bar: bang

In [3]: foo(23)
bar: 23

The same is true for all types of objects that define __str__, which may be passed in as well:

In [1]: from datetime import date

In [2]: zeta = date(2012, 4, 15)

In [3]: print 'bar: ' + zeta
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

TypeError: cannot concatenate 'str' and 'datetime.date' objects

In [4]: print 'bar: %s' % zeta
bar: 2012-04-15

So yes: If you can use a format string do it and take advantage of what Python has to offer.

Question 14

I have done a quick test:

import sys

str = e = "a xxxxxxxxxx very xxxxxxxxxx long xxxxxxxxxx string xxxxxxxxxx\n"

for i in range(int(sys.argv[1])):
    str = str + e

and timed it:

mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  8000000
8000000 times

real    0m2.165s
user    0m1.620s
sys     0m0.540s
mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  16000000
16000000 times

real    0m4.360s
user    0m3.480s
sys     0m0.870s

There is apparently an optimisation for the a = a + b case. It does not exhibit O(n^2) time as one might suspect.

So at least in terms of performance, using + is fine.

Question 15

According to Python docs, using str.join() will give you performance consistence across various implementations of Python. Although CPython optimizes away the quadratic behavior of s = s + t, other Python implementations may not.

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

Sequence Types in Python docs (see the foot note [6])

Question 16

I use the following with python 3.8

string4 = f'{string1}{string2}{string3}'

Question 17

”.join([a, b]) is better solution than +.

Because Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such)

form a += b or a = a + b is fragile even in CPython and isn’t present at all in implementations that don’t use refcounting (reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource)

https://www.python.org/dev/peps/pep-0008/#programming-recommendations

Question 18

In Python, the where and when of using string concatenation versus string substitution eludes me. As the string concatenation has seen large boosts in performance, is this (becoming more) a stylistic decision rather than a practical one?

For a concrete example, how should one handle construction of flexible URIs:

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_sub(q_num):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, q_num)

def so_question_uri_cat(q_num):
    return DOMAIN + QUESTIONS + '/' + str(q_num)

Edit: There have also been suggestions about joining a list of strings and for using named substitution. These are variants on the central theme, which is, which way is the Right Way to do it at which time? Thanks for the responses!

Question 19

Concatenation is (significantly) faster according to my machine. But stylistically, I’m willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there’s no need to even ask the question… there’s no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048

Question 20

Don’t forget about named substitution:

def so_question_uri_namedsub(q_num):
    return "%(domain)s%(questions)s/%(q_num)d" % locals()

Question 21

Be wary of concatenating strings in a loop! The cost of string concatenation is proportional to the length of the result. Looping leads you straight to the land of N-squared. Some languages will optimize concatenation to the most recently allocated string, but it’s risky to count on the compiler to optimize your quadratic algorithm down to linear. Best to use the primitive (join?) that takes an entire list of strings, does a single allocation, and concatenates them all in one go.

Question 22

“As the string concatenation has seen large boosts in performance…”

If performance matters, this is good to know.

However, performance problems I’ve seen have never come down to string operations. I’ve generally gotten in trouble with I/O, sorting and O(n²) operations being the bottlenecks.

Until string operations are the performance limiters, I’ll stick with things that are obvious. Mostly, that’s substitution when it’s one line or less, concatenation when it makes sense, and a template tool (like Mako) when it’s large.

Question 23

What you want to concatenate/interpolate and how you want to format the result should drive your decision.

String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn’t do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.
Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, total/current) would be much less readable in concatenation form.
Concatenation is useful when you don’t have a fixed number of items to string-ize.

Also, know that Python 2.6 introduces a new version of string interpolation, called string templating:

def so_question_uri_template(q_num):
    return "{domain}/{questions}/{num}".format(domain=DOMAIN,
                                               questions=QUESTIONS,
                                               num=q_num)

String templating is slated to eventually replace %-interpolation, but that won’t happen for quite a while, I think.

Question 24

I was just testing the speed of different string concatenation/substitution methods out of curiosity. A google search on the subject brought me here. I thought I would post my test results in the hope that it might help someone decide.

    import timeit
    def percent_():
            return "test %s, with number %s" % (1,2)

    def format_():
            return "test {}, with number {}".format(1,2)

    def format2_():
            return "test {1}, with number {0}".format(2,1)

    def concat_():
            return "test " + str(1) + ", with number " + str(2)

    def dotimers(func_list):
            # runs a single test for all functions in the list
            for func in func_list:
                    tmr = timeit.Timer(func)
                    res = tmr.timeit()
                    print "test " + func.func_name + ": " + str(res)

    def runtests(func_list, runs=5):
            # runs multiple tests for all functions in the list
            for i in range(runs):
                    print "----------- TEST #" + str(i + 1)
                    dotimers(func_list)

…After running runtests((percent_, format_, format2_, concat_), runs=5), I found that the % method was about twice as fast as the others on these small strings. The concat method was always the slowest (barely). There were very tiny differences when switching the positions in the format() method, but switching positions was always at least .01 slower than the regular format method.

Sample of test results:

    test concat_()  : 0.62  (0.61 to 0.63)
    test format_()  : 0.56  (consistently 0.56)
    test format2_() : 0.58  (0.57 to 0.59)
    test percent_() : 0.34  (0.33 to 0.35)

I ran these because I do use string concatenation in my scripts, and I was wondering what the cost was. I ran them in different orders to make sure nothing was interfering, or getting better performance being first or last. On a side note, I threw in some longer string generators into those functions like "%s" + ("a" * 1024) and regular concat was almost 3 times as fast (1.1 vs 2.8) as using the format and % methods. I guess it depends on the strings, and what you are trying to achieve. If performance really matters, it might be better to try different things and test them. I tend to choose readability over speed, unless speed becomes a problem, but thats just me. SO didn’t like my copy/paste, i had to put 8 spaces on everything to make it look right. I usually use 4.

Question 25

Remember, stylistic decisions are practical decisions, if you ever plan on maintaining or debugging your code :-) There’s a famous quote from Knuth (possibly quoting Hoare?): “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

As long as you’re careful not to (say) turn a O(n) task into an O(n²) task, I would go with whichever you find easiest to understand..

Question 26

I use substitution wherever I can. I only use concatenation if I’m building a string up in say a for-loop.

Question 27

Actually the correct thing to do, in this case (building paths) is to use os.path.join. Not string concatenation or interpolation

Python 实用宝典

标签归档：string-concatenation

在python中连接字符串和整数

问题：在python中连接字符串和整数

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

有什么理由不使用’+’连接两个字符串吗？

问题：有什么理由不使用’+’连接两个字符串吗？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

Python中的字符串串联与字符串替换

问题：Python中的字符串串联与字符串替换

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

有趣好用的Python教程