替换字符串中多个字符的最佳方法？

Question 1

I need to replace some characters as follows: & ➔ \&, # ➔ \#, …

I coded as follows, but I guess there should be some better way. Any hints?

strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

Question 2

Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

a) 1000000 loops, best of 3: 1.47 μs per loop
b) 1000000 loops, best of 3: 1.51 μs per loop
c) 100000 loops, best of 3: 12.3 μs per loop
d) 100000 loops, best of 3: 12 μs per loop
e) 100000 loops, best of 3: 3.27 μs per loop
f) 1000000 loops, best of 3: 0.817 μs per loop
g) 100000 loops, best of 3: 3.64 μs per loop
h) 1000000 loops, best of 3: 0.927 μs per loop
i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions:

def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"

Replacing 17 characters

Here’s similar code to do the same but with more characters to escape (\`*_{}>#+-.!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

Here’s the results for the same input string abc&def#ghi:

a) 100000 loops, best of 3: 6.72 μs per loop
b) 100000 loops, best of 3: 2.64 μs per loop
c) 100000 loops, best of 3: 11.9 μs per loop
d) 100000 loops, best of 3: 4.92 μs per loop
e) 100000 loops, best of 3: 2.96 μs per loop
f) 100000 loops, best of 3: 4.29 μs per loop
g) 100000 loops, best of 3: 4.68 μs per loop
h) 100000 loops, best of 3: 4.73 μs per loop
i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

a) 100000 loops, best of 3: 7.59 μs per loop
b) 100000 loops, best of 3: 6.54 μs per loop
c) 100000 loops, best of 3: 16.9 μs per loop
d) 100000 loops, best of 3: 7.29 μs per loop
e) 100000 loops, best of 3: 12.2 μs per loop
f) 100000 loops, best of 3: 5.38 μs per loop
g) 10000 loops, best of 3: 21.7 μs per loop
h) 100000 loops, best of 3: 5.7 μs per loop
i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

With the shorter input:

ab) 100000 loops, best of 3: 7.05 μs per loop
ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

ab) 100000 loops, best of 3: 7.71 μs per loop
ba) 100000 loops, best of 3: 6.08 μs per loop

So I’m going to use ba for readability and speed.

Addendum

Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let’s test them against two more variants:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input  ║  ab  │ ab_with_check │  ba  │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │    4.22       │ 3.45 │    8.01          │
│ Py3, short ║ 5.54 │    1.34       │ 1.46 │    5.34          │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long  ║ 9.3  │    7.15       │ 6.85 │    8.55          │
│ Py3, long  ║ 7.43 │    4.38       │ 4.41 │    7.02          │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

We can conclude that:

Those with the check are up to 4x faster than those without the check
ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2
However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There’s not a huge difference between the slowest on Python 3 and fastest on Python 2!

Question 3

>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...   if ch in string:
...      string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi

Question 4

Simply chain the replace functions like this

strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi

If the replacements are going to be more in number, you can do this in this generic way

strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi

Question 5

Here is a python3 method using str.translate and str.maketrans:

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))

The printed string is abc\&def\#ghi.

Question 6

Are you always going to prepend a backslash? If so, try

import re
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)

It may not be the most efficient method but I think it is the easiest.

Question 7

Late to the party, but I lost a lot of time with this issue until I found my answer.

Short and sweet, translate is superior to replace. If you’re more interested in funcionality over time optimization, do not use replace.

Also use translate if you don’t know if the set of characters to be replaced overlaps the set of characters used to replace.

Case in point:

Using replace you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but it will return in fact "4444".

Translation seems to perform what OP originally desired.

Question 8

You may consider writing a generic escape function:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1

This way you can make your function configurable with a list of character that should be escaped.

Question 9

FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I’m aware of this).

As a somewhat ridiculous but interesting exercise, wanted to see if I could use python functional programming to replace multiple chars. I’m pretty sure this does NOT beat just calling replace() twice. And if performance was an issue, you could easily beat this in rust, C, julia, perl, java, javascript and maybe even awk. It uses an external ‘helpers’ package called pytoolz, accelerated via cython (cytoolz, it’s a pypi package).

from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))

I’m not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.

Question 10

Using reduce which is available in python2.7 and python3.* you can easily replace mutiple substrings in a clean and pythonic way.

# Lets define a helper method to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]), 
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
    print(cleaned_str) # "abc\&def\#ghi"

In python2.7 you don’t have to import reduce but in python3.* you have to import it from the functools module.

Question 11

Maybe a simple loop for chars to replace:

a = '&#'

to_replace = ['&', '#']

for char in to_replace:
    a = a.replace(char, "\\"+char)

print(a)

>>> \&\#

Question 12

How about this?

def replace_all(dict, str):
    for key in dict:
        str = str.replace(key, dict[key])
    return str

then

print(replace_all({"&":"\&", "#":"\#"}, "&#"))

output

\&\#

替换字符串中多个字符的最佳方法？

问题：替换字符串中多个字符的最佳方法？

回答 0

替换两个字符

替换17个字符

附录

Replacing two characters

Replacing 17 characters

Addendum

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

评估字符串中的数学表达式

读取二进制文件并遍历每个字节

Docker“错误：在分配给网络的默认设置中找不到可用的，不重叠的IPv4地址池”

在iPython Notebook中进行调试的正确方法是什么？

Python vs Bash-在性能方面，每个任务都胜过其他任务？

如何将文件加载到python控制台中？

替换字符串中多个字符的最佳方法？

问题：替换字符串中多个字符的最佳方法？

回答 0

替换两个字符

替换17个字符

附录

Replacing two characters

Replacing 17 characters

Addendum

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

相关文章

排行榜展示

文章展示