Question: Removing a list of characters from a string

I want to remove characters from a string in Python:

string.replace(',', '').replace("!", '').replace(":", '').replace(";", '')...

But I have many characters to remove. I thought about using a list:

list = [',', '!', '.', ';'...]

But how can I use the list to replace the characters in the string?


Answer 0

If you're using Python 2 and your inputs are byte strings (str, not unicode), the fastest method by far is str.translate:

>>> chars_to_remove = ['.', '!', '?']
>>> subj = 'A.B!C?'
>>> subj.translate(None, ''.join(chars_to_remove))
'ABC'

Otherwise, there are the following options to consider:

A. Iterate the subject char by char, omit unwanted characters and join the resulting list:

>>> sc = set(chars_to_remove)
>>> ''.join([c for c in subj if c not in sc])
'ABC'

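(Note that the generator version ''.join(c for c ...) will be less efficient: in CPython, join has to collect the generator's items into a list anyway before it can build the result).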

B. Create a regular expression on the fly and re.sub with an empty string:

>>> import re
>>> rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
>>> re.sub(rx, '', subj)
'ABC'

(re.escape ensures that characters like ^ or ] won't break the regular expression).
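
To illustrate why the escaping matters, here is a minimal sketch in which the characters to remove are themselves special inside a character class. The sample input is made up for illustration; the function is the same remove_chars_re used in the test code further down.

```python
import re

def remove_chars_re(subj, chars):
    # re.escape neutralizes characters that are special inside [...],
    # such as ^ ] - and the backslash itself
    pattern = '[' + re.escape(''.join(chars)) + ']'
    return re.sub(pattern, '', subj)

# Illustrative input: every removed character is a regex metacharacter
print(remove_chars_re('a^b]c-d\\e', ['^', ']', '-', '\\']))  # abcde
```

Without re.escape, the same list would produce the pattern [^]-\], which either fails to compile or means something entirely different.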

C. Use the mapping variant of translate:

>>> chars_to_remove = [u'δ', u'Γ', u'ж']
>>> subj = u'AжBδCΓ'
>>> dd = {ord(c):None for c in chars_to_remove}
>>> subj.translate(dd)
u'ABC'

Full testing code and timings:

#coding=utf8

import re

def remove_chars_iter(subj, chars):
    sc = set(chars)
    return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_re_unicode(subj, chars):
    return re.sub(u'(?u)[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate_bytes(subj, chars):
    return subj.translate(None, ''.join(chars))

def remove_chars_translate_unicode(subj, chars):
    d = {ord(c):None for c in chars}
    return subj.translate(d)

import timeit, sys

def profile(f):
    assert f(subj, chars_to_remove) == test
    t = timeit.timeit(lambda: f(subj, chars_to_remove), number=1000)
    print ('{0:.3f} {1}'.format(t, f.__name__))

print (sys.version)
PYTHON2 = sys.version_info[0] == 2

print ('\n"plain" string:\n')

chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)
profile(remove_chars_re)

if PYTHON2:
    profile(remove_chars_translate_bytes)
else:
    profile(remove_chars_translate_unicode)

print ('\nunicode string:\n')

if PYTHON2:
    chars_to_remove = [u'δ', u'Γ', u'ж']
    subj = u'AжBδCΓ'
else:
    chars_to_remove = ['δ', 'Γ', 'ж']
    subj = 'AжBδCΓ'

subj = subj * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)

if PYTHON2:
    profile(remove_chars_re_unicode)
else:
    profile(remove_chars_re)

profile(remove_chars_translate_unicode)

Results:

2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]

"plain" string:

0.637 remove_chars_iter
0.649 remove_chars_re
0.010 remove_chars_translate_bytes

unicode string:

0.866 remove_chars_iter
0.680 remove_chars_re_unicode
1.373 remove_chars_translate_unicode

---

3.4.2 (v3.4.2:ab2c023a9432, Oct  5 2014, 20:42:22) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

"plain" string:

0.512 remove_chars_iter
0.574 remove_chars_re
0.765 remove_chars_translate_unicode

unicode string:

0.817 remove_chars_iter
0.686 remove_chars_re
0.876 remove_chars_translate_unicode

(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time).


Answer 1

In Python 2, you can use str.translate() on a byte string:

s.translate(None, ",!.;")

Example:

>>> s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
>>> s.translate(None, ",!.;")
'asjofdjkdjasooiokodkjodsdkps'
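
The two-argument call above only works on Python 2 byte strings; on Python 3 it raises a TypeError because str.translate takes a single table argument. As a rough sketch, a version-agnostic wrapper could look like the following. The name strip_chars is hypothetical and not part of the original answer.

```python
import sys

def strip_chars(s, unwanted):
    # Hypothetical helper: picks the translate form that fits the interpreter
    if sys.version_info[0] == 2 and isinstance(s, str):
        # Python 2 byte string: two-argument form, as in the answer above
        return s.translate(None, unwanted)
    # Python 3 str (or Python 2 unicode): one-argument mapping form
    return s.translate(dict((ord(c), None) for c in unwanted))

s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
print(strip_chars(s, ",!.;"))  # asjofdjkdjasooiokodkjodsdkps
```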

Answer 2

You can use the translate method.

s.translate(None, '!.;,')

Answer 3

''.join(c for c in myString if c not in badTokens)

Answer 4

If you are using Python 3 and looking for the translate solution: the method has changed and now takes one parameter instead of two.

That parameter is a table (it can be a dictionary) where each key is the Unicode ordinal (int) of the character to find and the value is the replacement (a Unicode ordinal, a string to map the key to, or None to delete the character).

Here is a usage example:

>>> chars = [',', '!', '.', ';']
>>> s = "This is, my! str,ing."
>>> s.translate({ord(x): '' for x in chars})
'This is my string'
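
To make the table semantics described above more concrete, here is a small sketch (Python 3 only) that mixes the kinds of values a table can hold. The sample sentence and the particular replacements are made up for illustration.

```python
# Keys are Unicode ordinals; values decide what happens to that character
table = {
    ord(','): None,      # None deletes the character
    ord('!'): '',        # an empty string also deletes it
    ord('.'): '...',     # a string may be longer than one character
    ord(';'): ord(':'),  # an ordinal maps to a single replacement character
}

result = "Wait, stop! Now; go.".translate(table)
print(result)  # Wait stop Now: go...
```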

Answer 5

Another approach using regex:

''.join(re.split(r'[.;!?,]', s))
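
As a quick sanity check, splitting on the character class and joining the pieces gives the same result as substituting the class with an empty string. The sketch below assumes a made-up sample string and a precompiled pattern named UNWANTED.

```python
import re

# Precompile once if the same set of characters is removed repeatedly
UNWANTED = re.compile(r'[.;!?,]')

s = "Hello, world! How are you?"
via_split = ''.join(UNWANTED.split(s))  # the approach from this answer
via_sub = UNWANTED.sub('', s)           # the equivalent re.sub call

print(via_split)             # Hello world How are you
print(via_split == via_sub)  # True
```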

Answer 6

Why not a simple loop?

for i in replace_list:
    string = string.replace(i, '')

Also, avoid naming lists 'list'. It shadows the built-in list type.
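
Wrapped in a function, the loop might look like the sketch below. The name remove_all and the sample inputs are illustrative only. One thing the loop can do that translate cannot: since str.replace works on substrings, the entries may be longer than one character.

```python
def remove_all(text, replace_list):
    # Apply one str.replace per unwanted item; each call returns a new string
    for item in replace_list:
        text = text.replace(item, '')
    return text

print(remove_all("a, b! c.", [',', '!', '.', ';']))  # a b c
print(remove_all("foo<br>bar", ['<br>']))            # foobar
```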


Answer 7

You could use something like this:

def replace_all(text, dic):
  # dic maps each substring to its replacement; items() works in Python 2 and 3
  for i, j in dic.items():
    text = text.replace(i, j)
  return text

This code is not my own; it comes from here. It's a great article that discusses this in depth.


Answer 8

A related and interesting topic is removing accents from a Unicode string, converting each character to its unaccented base character:

What is the best way to remove accents in a python unicode string?

Code extract from that topic:

import unicodedata

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
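
A short usage sketch for the function above, with a made-up sample word pair. NFKD normalization splits each accented letter into a base letter plus a combining mark, and the filter then drops the marks.

```python
import unicodedata

def remove_accents(input_str):
    # Decompose accented characters, then drop the combining marks
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return ''.join(c for c in nfkd_form if not unicodedata.combining(c))

# \u00e9 is e with acute accent, \u00ef is i with diaeresis
print(remove_accents('caf\u00e9 na\u00efve'))  # cafe naive
```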

Answer 9

Perhaps a more modern and functional way to achieve what you wish (Python 2, where filter on a str returns a str):

>>> subj = 'A.B!C?'
>>> unwanted = set([',', '!', '.', ';', '?'])
>>> filter(lambda x: x not in unwanted, subj)
'ABC'

Note that for this particular purpose it is overkill, but once you need more complex conditions, filter comes in handy.
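
In Python 3, filter returns a lazy iterator rather than a string, so the result has to be joined explicitly. A minimal sketch using the same sample data (the variable name unwanted is an illustrative choice):

```python
subj = 'A.B!C?'
unwanted = {',', '!', '.', ';', '?'}

# filter yields the kept characters one at a time; join rebuilds the string
result = ''.join(filter(lambda ch: ch not in unwanted, subj))
print(result)  # ABC
```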


Answer 10

Simple way:

import re
s = 'this is string !    >><< (foo---> bar) @-tuna-#   sandwich-%-is-$-* good'

# condense runs of whitespace into a single space
s = ' '.join(s.split())

# replace each space with a dash
s = s.replace(" ", "-")

# remove any character that matches the regex character class
s = re.sub('[!@#$%^&*()_+<>]', '', s)
print(s)

Output:

this-is-string---foo----bar--tuna--sandwich--is---good


Answer 11

How about this one-liner (in Python 3, reduce must be imported from functools):

from functools import reduce
reduce(lambda x,y : x.replace(y,"") ,[',', '!', '.', ';'],";Test , ,  !Stri!ng ..")

Answer 12

I think this is simple enough and will do:

chars = [",", "!", ";", ":"]  # the list goes on...

theString = "dlkaj;lkdjf'adklfaj;lsd'fa'dfj;alkdjf"  # an example string
newString = ""  # the string free of unwanted characters
for i in range(len(theString)):
    if theString[i] in chars:
        newString += ""  # concatenate an empty string
    else:
        newString += theString[i]

This is one way to do it. But if you are tired of maintaining a list of characters to remove, you can instead use the ordinal of each character as you iterate over the string. The ordinal is the character's ASCII value. The ASCII value of the character 0 is 48 and that of lowercase z is 122, so:

theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"
newString = ""
for i in range(len(theString)):
    if ord(theString[i]) < 48 or ord(theString[i]) > 122:  # ord() => ASCII value
        newString += ""
    else:
        newString += theString[i]

Note that the range 48 to 122 also contains some punctuation (for example : ; < = > ? @ [ \ ] ^ _ `), so those characters are kept.
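
If the goal is to keep only ASCII letters and digits, an explicit whitelist avoids relying on a contiguous ordinal range. The following is only a sketch of that alternative, not the answer's own method; the names ALLOWED and keep_alnum are hypothetical.

```python
import string

# Whitelist of characters to keep: a-z, A-Z and 0-9
ALLOWED = set(string.ascii_letters + string.digits)

def keep_alnum(text):
    # Keep a character only if it is in the whitelist
    return ''.join(ch for ch in text if ch in ALLOWED)

theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"
newString = keep_alnum(theString)
print(newString)
```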

Answer 13

These days I am diving into Scheme, and now I think I am good at recursion and eval. Hahaha. Just sharing some new ways.

First, eval it:

print(eval('string%s' % (''.join(['.replace("%s","")' % i for i in replace_list]))))

Second, recurse it:

def repn(string, replace_list):
    if replace_list == []:
        return string
    else:
        # note: pop() consumes the caller's list
        return repn(string.replace(replace_list.pop(), ""), replace_list)

print(repn(string, replace_list))

Hey, don't downvote. I just want to share some new ideas.


Answer 14

I am thinking about a solution for this. First I would turn the input string into a list. Then I would replace the unwanted items in the list. Finally, using join, I would return the list as a string. The code could look like this:

def the_replacer(text):
    test = []
    for m in range(len(text)):
        test.append(text[m])
        if test[m]==','\
        or test[m]=='!'\
        or test[m]=='.'\
        or test[m]=='\''\
        or test[m]==';':
    #....
            test[m]=''
    return ''.join(test)

This would remove any of those characters from the string. What do you think about that?


Answer 15

Here is a more_itertools approach:

import more_itertools as mit


s = "A.B!C?D_E@F#"
blacklist = ".!?_@#"

"".join(mit.flatten(mit.split_at(s, pred=lambda x: x in set(blacklist))))
# 'ABCDEF'

Here we split upon items found in the blacklist, flatten the results and join the string.


Answer 16

Python 3, single-line list comprehension implementation.

from string import ascii_lowercase # 'abcdefghijklmnopqrstuvwxyz'
def remove_chars(input_string, removable):
  return ''.join([c for c in input_string if c not in removable])

print(remove_chars(input_string="Stack Overflow", removable=ascii_lowercase))
# prints: S O

Answer 17

Remove the characters *%,&@! from the string below (Python 3):

s = "this is my string,  and i will * remove * these ** %% "
new_string = s.translate(s.maketrans('','','*%,&@!'))
print(new_string)

# output: this is my string  and i will  remove  these  
