问题:插入两个字符串的最pythonic方式
将两个字符串网格化的最Python方式是什么?
例如:
输入:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
输出:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
What’s the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
回答 0
对我来说,最pythonic *的方式是以下代码,它几乎做同样的事情,但是使用+
运算符来连接每个字符串中的各个字符:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
它也比使用两个join()
调用更快:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
存在更快的方法,但是它们常常使代码模糊。
注:如果两个输入字符串是不相同的长度,则较长的一个将被截断,zip
停在较短字符串的结尾迭代。在这种情况下,zip
应该使用模块中的zip_longest
(izip_longest
在Python 2中)而不是一个itertools
来确保两个字符串都已用尽。
*引用Python之禅:可读性很重要。
Pythonic = 对我而言可读性;i + j
至少对于我的眼睛来说,更容易从视觉上进行解析。
For me, the most pythonic* way is the following which pretty much does the same thing but uses the +
operator for concatenating the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join()
calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip
stops iterating at the end of the shorter string. In this case instead of zip
one should use zip_longest
(izip_longest
in Python 2) from the itertools
module to ensure that both strings are fully exhausted.
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j
is just visually parsed more easily, at least for my eyes.
回答 1
更快的选择
其他方式:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
输出:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
速度
看起来更快:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
比迄今为止最快的解决方案:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
同样对于较大的字符串:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1。
不同长度字符串的变化
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
较短的一个确定长度(zip()
等效)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
输出:
AaBbCcDdEeFfGgHhIiJjKkLl
更长的长度决定长度(itertools.zip_longest(fillvalue='')
等效)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
输出:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip()
equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='')
equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
回答 2
与join()
和zip()
。
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
With join()
and zip()
.
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
回答 3
在Python 2上,到目前为止,做事的最快方法是小字符串列表切片的速度大约是3倍,长字符串列表切片的速度大约是30倍。
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
但是,这在Python 3上不起作用。您可以实现类似
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
但是到那时,您已经失去了对小型字符串进行列表切片所获得的收益(对于长字符串而言,它的速度仍然是20倍),并且这甚至还不适用于非ASCII字符。
FWIW,如果您要在大量字符串上执行此操作并且需要每个周期,并且由于某种原因必须使用Python字符串…以下是操作方法:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
特殊情况下,较小类型的外壳也将有所帮助。FWIW,这只是长字符串列表切片速度的3倍,而小字符串则慢 4到5倍。
无论哪种方式,我都喜欢join
解决方案,但是由于在其他地方提到了时间安排,我认为我也应该加入。
On Python 2, by far the faster way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn’t work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you’ve already lost the gains over list slicing for small strings (it’s still 20x the speed for long strings) and this doesn’t even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings… here’s how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
Either way I prefer the join
solutions, but since timings were mentioned elsewhere I thought I might as well join in.
回答 4
如果您想要最快的方法,可以将itertools与结合使用operator.add
:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
但是合并起来izip
又chain.from_iterable
更快了
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
chain(*
和之间也存在实质性差异
chain.from_iterable(...
。
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
没有像join那样的生成器,传递一个总是慢一些,因为python首先会使用内容来建立一个列表,因为它会对数据进行两次传递,一次传递所需的大小,一次传递实际的大小使用生成器无法实现的联接:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
另外,如果您使用不同长度的字符串,并且不想丢失数据,则可以使用izip_longest:
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
对于python 3,它称为 zip_longest
但是对于python2来说,veedrac的建议是迄今为止最快的:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
If you want the fastest way, you can combine itertools with operator.add
:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
But combining izip
and chain.from_iterable
is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between
chain(*
and chain.from_iterable(...
.
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no such thing as a generator with join, passing one is always going to be slower as python will first build a list using the content because it does two passes over the data, one to figure out the size needed and one to actually do the join which would not be possible using a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
Also if you have different length strings and you don’t want to lose data you can use izip_longest :
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For python 3 it is called zip_longest
But for python2, veedrac’s suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
回答 5
您也可以使用map
和执行此操作operator.add
:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
输出:
'AaAaAaAaAa'
map的作用是,它从第一个可迭代对象获取每个元素,u
并从第二个可迭代对象获取第一个元素,l
并应用作为第一个参数提供的函数add
。然后加入只是加入他们。
You could also do this using map
and operator.add
:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is it takes every element from the first iterable u
and the first elements from the second iterable l
and applies the function supplied as the first argument add
. Then join just joins them.
回答 6
吉姆的答案很好,但是,如果您不介意几次导入,这是我最喜欢的选择:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
Jim’s answer is great, but here’s my favorite option, if you don’t mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
回答 7
这些建议很多都假设字符串长度相等。也许涵盖了所有合理的用例,但至少对我来说,您似乎也想适应长度不同的字符串。还是我是唯一认为网格应该像这样工作的人:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
一种方法是:
def mesh(a,b):
minlen = min(len(a),len(b))
return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accomodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a,b):
minlen = min(len(a),len(b))
return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
回答 8
我喜欢使用两个for
s,变量名可以提示/提醒正在发生的事情:
"".join(char for pair in zip(u,l) for char in pair)
I like using two for
s, the variable names can give a hint/reminder to what is going on:
"".join(char for pair in zip(u,l) for char in pair)
回答 9
只是添加另一种更基本的方法:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
Just to add another, more basic approach:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
回答 10
有点不讲究Python而不考虑这里的double-list-comprehension答案,用O(1)来处理n个字符串:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
all_strings
您要插入的字符串的列表在哪里。就您而言,all_strings = [u, l]
。完整的使用示例如下所示:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
喜欢许多答案,最快吗?可能不是,但是简单而灵活。另外,在没有增加太多复杂性的情况下,这比公认的答案要快一些(通常,在python中字符串添加有点慢):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Feels a bit un-pythonic not to consider the double-list-comprehension answer here, to handle n string with O(1) effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings
is a list of the strings you want to interleave. In your case, all_strings = [u, l]
. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
回答 11
可能比当前领先的解决方案更快,更短:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
快速策略是在C级别上尽可能多地做。相同的zip_longest()修复了不均匀的字符串,它会与chain()来自同一个模块,所以在这里不能给我太多点!
我提出的其他解决方案:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can’t ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
回答 12
你可以用1iteration_utilities.roundrobin
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
或ManyIterables
同一包中的类:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1这来自我编写的第三方库iteration_utilities
。
You could use iteration_utilities.roundrobin
1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables
class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities
.
回答 13
我将使用zip()来获得一种可读且简单的方法:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
I would use zip() to get a readable and easy way:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'