删除字符串中的字符列表

Question 1

I want to remove characters in a string in python:

string.replace(',', '').replace("!", '').replace(":", '').replace(";", '')...

But I have many characters I have to remove. I thought about a list

list = [',', '!', '.', ';'...]

But how can I use the list to replace the characters in the string?

Question 2

If you’re using python2 and your inputs are strings (not unicodes), the absolutely best method is str.translate:

>>> chars_to_remove = ['.', '!', '?']
>>> subj = 'A.B!C?'
>>> subj.translate(None, ''.join(chars_to_remove))
'ABC'

Otherwise, there are following options to consider:

A. Iterate the subject char by char, omit unwanted characters and join the resulting list:

>>> sc = set(chars_to_remove)
>>> ''.join([c for c in subj if c not in sc])
'ABC'

(Note that the generator version ''.join(c for c ...) will be less efficient).

B. Create a regular expression on the fly and re.sub with an empty string:

>>> import re
>>> rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
>>> re.sub(rx, '', subj)
'ABC'

( ensures that characters like ^ or ] won’t break the regular expression).

C. Use the mapping variant of translate:

>>> chars_to_remove = [u'δ', u'Γ', u'ж']
>>> subj = u'AжBδCΓ'
>>> dd = {ord(c):None for c in chars_to_remove}
>>> subj.translate(dd)
u'ABC'

Full testing code and timings:

#coding=utf8

import re

def remove_chars_iter(subj, chars):
    sc = set(chars)
    return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_re_unicode(subj, chars):
    return re.sub(u'(?u)[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate_bytes(subj, chars):
    return subj.translate(None, ''.join(chars))

def remove_chars_translate_unicode(subj, chars):
    d = {ord(c):None for c in chars}
    return subj.translate(d)

import timeit, sys

def profile(f):
    assert f(subj, chars_to_remove) == test
    t = timeit.timeit(lambda: f(subj, chars_to_remove), number=1000)
    print ('{0:.3f} {1}'.format(t, f.__name__))

print (sys.version)
PYTHON2 = sys.version_info[0] == 2

print ('\n"plain" string:\n')

chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)
profile(remove_chars_re)

if PYTHON2:
    profile(remove_chars_translate_bytes)
else:
    profile(remove_chars_translate_unicode)

print ('\nunicode string:\n')

if PYTHON2:
    chars_to_remove = [u'δ', u'Γ', u'ж']
    subj = u'AжBδCΓ'
else:
    chars_to_remove = ['δ', 'Γ', 'ж']
    subj = 'AжBδCΓ'

subj = subj * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)

if PYTHON2:
    profile(remove_chars_re_unicode)
else:
    profile(remove_chars_re)

profile(remove_chars_translate_unicode)

Results:

2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]

"plain" string:

0.637 remove_chars_iter
0.649 remove_chars_re
0.010 remove_chars_translate_bytes

unicode string:

0.866 remove_chars_iter
0.680 remove_chars_re_unicode
1.373 remove_chars_translate_unicode

---

3.4.2 (v3.4.2:ab2c023a9432, Oct  5 2014, 20:42:22) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

"plain" string:

0.512 remove_chars_iter
0.574 remove_chars_re
0.765 remove_chars_translate_unicode

unicode string:

0.817 remove_chars_iter
0.686 remove_chars_re
0.876 remove_chars_translate_unicode

(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time).

Question 3

You can use str.translate():

s.translate(None, ",!.;")

Example:

>>> s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
>>> s.translate(None, ",!.;")
'asjofdjkdjasooiokodkjodsdkps'

Question 4

You can use the translate method.

s.translate(None, '!.;,')

Question 5

''.join(c for c in myString if not c in badTokens)

Question 6

If you are using python3 and looking for the translate solution – the function was changed and now takes 1 parameter instead of 2.

That parameter is a table (can be dictionary) where each key is the Unicode ordinal (int) of the character to find and the value is the replacement (can be either a Unicode ordinal or a string to map the key to).

Here is a usage example:

>>> list = [',', '!', '.', ';']
>>> s = "This is, my! str,ing."
>>> s.translate({ord(x): '' for x in list})
'This is my string'

Question 7

Another approach using regex:

''.join(re.split(r'[.;!?,]', s))

Question 8

Why not a simple loop?

for i in replace_list:
    string = string.replace(i, '')

Also, avoid naming lists ‘list’. It overrides the built-in function list.

Question 9

you could use something like this

def replace_all(text, dic):
  for i, j in dic.iteritems():
    text = text.replace(i, j)
  return text

This code is not my own and comes from here its a great article and dicusses in depth doing this

Question 10

Also an interesting topic on removal UTF-8 accent form a string converting char to their standard non-accentuated char:

What is the best way to remove accents in a python unicode string?

code extract from the topic:

import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])

Question 11

Perhaps a more modern and functional way to achieve what you wish:

>>> subj = 'A.B!C?'
>>> list = set([',', '!', '.', ';', '?'])
>>> filter(lambda x: x not in list, subj)
'ABC'

please note that for this particular purpose it’s quite an overkill, but once you need more complex conditions, filter comes handy

Question 12

simple way,

import re
str = 'this is string !    >><< (foo---> bar) @-tuna-#   sandwich-%-is-$-* good'

// condense multiple empty spaces into 1
str = ' '.join(str.split()

// replace empty space with dash
str = str.replace(" ","-")

// take out any char that matches regex
str = re.sub('[!@#$%^&*()_+<>]', '', str)

output:

this-is-string--foo----bar--tuna---sandwich--is---good

Question 13

How about this – a one liner.

reduce(lambda x,y : x.replace(y,"") ,[',', '!', '.', ';'],";Test , ,  !Stri!ng ..")

Question 14

i think this is simple enough and will do!

list = [",",",","!",";",":"] #the list goes on.....

theString = "dlkaj;lkdjf'adklfaj;lsd'fa'dfj;alkdjf" #is an example string;
newString="" #the unwanted character free string
for i in range(len(TheString)):
    if theString[i] in list:
        newString += "" #concatenate an empty string.
    else:
        newString += theString[i]

this is one way to do it. But if you are tired of keeping a list of characters that you want to remove, you can actually do it by using the order number of the strings you iterate through. the order number is the ascii value of that character. the ascii number for 0 as a char is 48 and the ascii number for lower case z is 122 so:

theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"
newString = ""
for i in range(len(theString)):
     if ord(theString[i]) < 48 or ord(theString[i]) > 122: #ord() => ascii num.
         newString += ""
     else:
        newString += theString[i]

Question 15

These days I am diving into scheme, and now I think am good at recursing and eval. HAHAHA. Just share some new ways:

first ,eval it

print eval('string%s' % (''.join(['.replace("%s","")'%i for i in replace_list])))

second , recurse it

def repn(string,replace_list):
    if replace_list==[]:
        return string
    else:
        return repn(string.replace(replace_list.pop(),""),replace_list)

print repn(string,replace_list)

Hey ,don’t downvote. I am just want to share some new idea.

Question 16

I am thinking about a solution for this. First I would make the string input as a list. Then I would replace the items of list. Then through using join command, I will return list as a string. The code can be like this:

def the_replacer(text):
    test = []    
    for m in range(len(text)):
        test.append(text[m])
        if test[m]==','\
        or test[m]=='!'\
        or test[m]=='.'\
        or test[m]=='\''\
        or test[m]==';':
    #....
            test[n]=''
    return ''.join(test)

This would remove anything from the string. What do you think about that?

Question 17

Here is a more_itertools approach:

import more_itertools as mit


s = "A.B!C?D_E@F#"
blacklist = ".!?_@#"

"".join(mit.flatten(mit.split_at(s, pred=lambda x: x in set(blacklist))))
# 'ABCDEF'

Here we split upon items found in the blacklist, flatten the results and join the string.

Question 18

Python 3, single line list comprehension implementation.

from string import ascii_lowercase # 'abcdefghijklmnopqrstuvwxyz'
def remove_chars(input_string, removable):
  return ''.join([_ for _ in input_string if _ not in removable])

print(remove_chars(input_string="Stack Overflow", removable=ascii_lowercase))
>>> 'S O'

Question 19

Remove *%,&@! from below string:

s = "this is my string,  and i will * remove * these ** %% "
new_string = s.translate(s.maketrans('','','*%,&@!'))
print(new_string)

# output: this is my string  and i will  remove  these

删除字符串中的字符列表

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

了解Keras LSTM

Manim-用于解释数学视频的动画引擎

如何在一次通过中检查多个键是否在字典中？

将日期时间格式化为字符串（以毫秒为单位）

在python中，为什么要使用日志记录而不是print？

pip或pip3为Python 3安装软件包？

删除字符串中的字符列表

问题：删除字符串中的字符列表

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

回答 17

相关文章

排行榜展示

文章展示