S.isalnum() -> bool
Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.
If you insist on using a regex, the other solutions will do fine. However, note that if it can be done without a regular expression, that’s the best way to go about it.
After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with timeit against two of the example strings:
string1 = 'Special $#! characters spaces 888323'
string2 = 'how much for the maple syrup? $20.99? That s ricidulous!!!'
Example 1
''.join(e for e in string if e.isalnum())
string1 – Result: 10.7061979771
string2 – Result: 7.78372597694
Example 2
import re
re.sub('[^A-Za-z0-9]+', '', string)
string1 – Result: 7.10785102844
string2 – Result: 4.12814903259
Example 3
import re
re.sub(r'\W+', '', string)
string1 – Result: 3.11899876595
string2 – Result: 2.78014397621
Each result above is the lowest value returned by timeit's repeat(3, 2000000), i.e. the best of three runs of 2,000,000 calls each.
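Example 3 ran roughly 3x faster than Example 1 (about 3.4x on string1 and 2.8x on string2). Note that it is not strictly equivalent, though: \w also matches the underscore and, in Python 3, non-ASCII letters and digits, so \W+ keeps characters such as '_' and 'é'.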
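A minimal sketch of how a comparison like this could be reproduced with the standard timeit module. The function names example1, example2, example3 and best_time are hypothetical, and number is reduced from 2,000,000 to 10,000 (an assumption, so the script finishes quickly). Absolute times depend on the machine and Python version, so only the relative ordering is worth comparing.

```python
import re
import timeit

string1 = 'Special $#! characters spaces 888323'
string2 = 'how much for the maple syrup? $20.99? That s ricidulous!!!'

# Hypothetical wrappers around the three approaches above
def example1(s):
    return ''.join(e for e in s if e.isalnum())

def example2(s):
    return re.sub(r'[^A-Za-z0-9]+', '', s)

def example3(s):
    return re.sub(r'\W+', '', s)

def best_time(func, s, repeat=3, number=10000):
    """Return the lowest total time (seconds) of `repeat` runs of `number` calls."""
    return min(timeit.repeat(lambda: func(s), repeat=repeat, number=number))

if __name__ == '__main__':
    for name, func in [('Example 1', example1), ('Example 2', example2), ('Example 3', example3)]:
        for label, s in [('string1', string1), ('string2', string2)]:
            print(f'{name} {label}: {best_time(func, s):.4f}')
```

Taking the minimum of the repeats, as in the original measurement, reduces the influence of other processes competing for the CPU.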
Answer 4
Python 2.*
I think filter(str.isalnum, string) works well:
In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'
Python 3.*
In Python 3, filter() returns an iterator (a filter object) rather than a string as above, so you have to join its output back into a string with ''.join().
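A minimal Python 3 sketch of the filter approach; the function name strip_non_alnum is hypothetical. Because str.isalnum is Unicode-aware, accented letters such as 'é' are kept as well.

```python
def strip_non_alnum(s: str) -> str:
    # filter() yields only the characters for which str.isalnum() is True;
    # ''.join() turns that iterator back into a string.
    return ''.join(filter(str.isalnum, s))

print(strip_non_alnum('string with special chars like !,#$% etcs.'))
# -> stringwithspecialcharslikeetcs
```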
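#!/usr/bin/python
import re

strs = "how much for the maple syrup? $20.99? That's ricidulous!!!"
print(strs)
# Remove ? $ . ! (inside [] a | is a literal character, so no | is needed)
nstr = re.sub(r'[?$.!]', r'', strs)
print(nstr)
# Remove anything that is not a letter, a digit or a space
nestr = re.sub(r'[^a-zA-Z0-9 ]', r'', nstr)
print(nestr)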
Unlike the other regex answers, I would exclude every character that is not what I want, instead of explicitly enumerating what I don’t want.
For example, if I want only characters from ‘a to z’ (upper and lower case) and numbers, I would exclude everything else:
import re
s = re.sub(r"[^a-zA-Z0-9]","",s)
This means “replace every character that is not a digit or a letter in the range ‘a to z’ or ‘A to Z’ with an empty string”.
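The negation comes from the special character ^ placed first inside the brackets, as in [^...]. At the start of the regex itself, outside any brackets, ^ means something else: it anchors the match to the beginning of the string.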
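A small sketch contrasting the two meanings of ^; the sample string "abc-123" is made up for illustration.

```python
import re

s = "abc-123"

# ^ right after '[' negates the class: remove everything that is NOT a-z
negated = re.sub(r"[^a-z]", "", s)
print(negated)   # abc

# ^ outside a class is an anchor: remove one a-z character at the start only
anchored = re.sub(r"^[a-z]", "", s)
print(anchored)  # bc-123
```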
Extra tip: if you also need to lowercase the result, lowercase first; the regex then gets simpler (and possibly a little faster), since there are no uppercase letters left to match.
import re
s = re.sub(r"[^a-z0-9]","",s.lower())
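A quick sketch, using a made-up sample string, checking that lowercasing first and then using the smaller class gives the same result as filtering with the full class and lowercasing afterwards. This holds for ASCII input; for some non-ASCII characters str.lower() can produce ASCII letters (e.g. 'İ' lowercases to 'i' plus a combining dot), so the two orders may differ there.

```python
import re

s = "Hello, World! 2024"

lower_first = re.sub(r"[^a-z0-9]", "", s.lower())
filter_first = re.sub(r"[^a-zA-Z0-9]", "", s).lower()

print(lower_first)                  # helloworld2024
print(lower_first == filter_first)  # True
```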
Answer 7
Assuming you want to use a regex and you want/need Unicode-cognisant 2.x code that is 2to3-ready:
>>> import re
>>> rx = re.compile(u'[\W_]+', re.UNICODE)
>>> data = u''.join(unichr(i) for i in range(256))
>>> rx.sub(u'', data)
u'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb2 [snip] \xfe\xff'
>>>
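For reference, a Python 3 sketch of the same idea. In Python 3, str patterns are Unicode-aware by default, so the u'' prefix and the re.UNICODE flag are no longer needed, and chr replaces unichr. The [\W_] class also strips the underscore, because \w itself matches '_'.

```python
import re

# [\W_]+ = one or more characters that are non-word characters or '_'
rx = re.compile(r'[\W_]+')

data = ''.join(chr(i) for i in range(256))
cleaned = rx.sub('', data)

# The ASCII part: digits, then upper- and lowercase letters
print(cleaned[:62])  # 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```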
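The most generic approach is to use the Unicode ‘categories’ from the unicodedata module, which classify every single character. For example, the following code keeps only characters in a chosen set of categories (uppercase and lowercase letters, decimal digits and spaces) and replaces everything else with a space: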
import unicodedata

# Strip out unwanted characters (based on the Unicode database categorization,
# see http://www.sql-und-xml.de/unicode-database/#kategorien)
# Lu = uppercase letter, Ll = lowercase letter, Nd = decimal digit, Zs = space separator
PRINTABLE = set(('Lu', 'Ll', 'Nd', 'Zs'))

def filter_non_printable(s):
    result = []
    for c in s:
        # Keep characters in an allowed category; replace everything else with a space
        result.append(c if unicodedata.category(c) in PRINTABLE else u' ')
    return u''.join(result)
See the URL above for all the related categories. You can, of course, also filter by the punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po).
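A minimal sketch of filtering by the punctuation categories, i.e. every general category whose name starts with 'P'; the function name strip_punctuation is hypothetical. Currency signs such as '$' are symbols (category Sc), not punctuation, so they are kept.

```python
import unicodedata

def strip_punctuation(s: str) -> str:
    # Keep every character whose Unicode general category is not P*
    return ''.join(c for c in s if not unicodedata.category(c).startswith('P'))

print(strip_punctuation('¿Qué? Price: $5!'))  # Qué Price $5
```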
Answer 10
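string.punctuation contains the following characters:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
You can use the translate and maketrans functions to map the punctuation characters to None, which deletes them: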
import string
'This, is. A test!'.translate(str.maketrans('','', string.punctuation))
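A small sketch wrapping the same call in a reusable function; the name remove_punctuation is hypothetical. Building the translation table once at module level avoids recreating it on every call. Note that string.punctuation only covers ASCII punctuation, so characters such as '¿' are left alone.

```python
import string

# Maps every ASCII punctuation character to None, i.e. deletes it
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation(s: str) -> str:
    return s.translate(_PUNCT_TABLE)

print(remove_punctuation('This, is. A test!'))  # This is A test
```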
import re

my_string = """Strings are amongst the most popular data types in Python. We can create the strings by enclosing characters in quotes. Python treats single quotes the
same as double quotes."""

# Assumption: `text` (never defined in the original) is the lowercased list of words
text = my_string.lower().split()

# Count the word python whether or not it ends with ',' or '.'
for count, word in enumerate(text):
    if word.endswith((".", ",")):
        # Strip one trailing '.' or ',' from a purely alphabetic word
        text[count] = re.sub(r"^([a-z]+)[.,]$", r"\1", word)

print("The count of Python : ", text.count("python"))
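A shorter alternative sketch, not from the original answer: strip trailing '.' and ',' from each lowercased word and compare it with 'python' directly, without mutating a list or using a regex.

```python
my_string = """Strings are amongst the most popular data types in Python. We can create the strings by enclosing characters in quotes. Python treats single quotes the
same as double quotes."""

# rstrip('.,') removes any run of trailing '.' or ',' characters
count = sum(1 for word in my_string.lower().split() if word.rstrip('.,') == 'python')
print("The count of Python : ", count)  # count is 2
```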
Answer 13
abc = "askhnl#$%askdjalsdk"
# replace() removes only the exact substring "#$%", in that order
ddd = abc.replace("#$%", "")
print(ddd)
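If the goal is to remove each of those characters wherever it appears, not just the exact sequence "#$%", a minimal sketch assuming that intent; the helper name remove_chars is hypothetical.

```python
def remove_chars(s: str, chars: str) -> str:
    # Remove every occurrence of each character in `chars`, in any order or position
    for ch in chars:
        s = s.replace(ch, "")
    return s

print(remove_chars("askhnl#$%askdjalsdk", "#$%"))  # askhnlaskdjalsdk
print(remove_chars("ask%hnl#askdj$alsdk", "#$%"))  # askhnlaskdjalsdk
```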