Question: Remove all non-numeric characters from a string in Python
How do we remove all non-numeric characters from a string in Python?
Answer 0
>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
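If the same substitution runs many times, one option is to compile the pattern once and wrap it in a small helper. The following is a minimal sketch, not part of the original answer; the names `_NON_DIGITS` and `strip_non_digits` are made up for illustration.

```python
import re

# Compile once so repeated calls reuse the same pattern object.
# [^0-9] matches any character that is not an ASCII digit.
_NON_DIGITS = re.compile(r"[^0-9]")


def strip_non_digits(text):
    """Return text with every character outside ASCII 0-9 removed."""
    return _NON_DIGITS.sub("", text)


print(strip_non_digits("sdkjh987978asd098as0980a98sd"))  # 987978098098098
```

Note that `[^0-9]` and `\D` are not quite interchangeable for Python 3 str patterns: `\D` also keeps non-ASCII decimal digits unless the `re.ASCII` flag is set.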
Answer 1
Not sure if this is the most efficient way, but:
>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'
The ''.join part combines all the resulting characters with nothing in between. The rest is a generator expression, where (as you can probably guess) we keep only the characters for which isdigit is true.
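One thing to be aware of in Python 3: str.isdigit is also true for some non-ASCII characters, such as superscript digits and digits from other scripts. If only 0 through 9 should survive, an explicit range check is one way to do it. This is a small sketch with a hypothetical helper name, `keep_ascii_digits`.

```python
def keep_ascii_digits(text):
    """Keep only the characters '0' through '9'."""
    return "".join(c for c in text if "0" <= c <= "9")


# Superscript two (U+00B2) and Arabic-Indic three (U+0663) both pass isdigit().
sample = "x\u00b25\u0663"
print("".join(c for c in sample if c.isdigit()) == "\u00b25\u0663")  # True
print(keep_ascii_digits(sample))  # 5
```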
Answer 2
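This works for both str and unicode objects in Python 2, and for both str and bytes in Python 3. Iterating over bytes in Python 3 yields integers, so that case is filtered by byte value rather than with isdigit:
# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)
# python ≥3.0
def only_numerics(seq):
    if isinstance(seq, bytes):
        # each item is an int here, so test membership in the ASCII digits
        return bytes(b for b in seq if b in b'0123456789')
    return ''.join(filter(str.isdigit, seq))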
Answer 3
Just to add another option to the mix, there are several useful constants in the string module. While more useful in other cases, they can be used here.
>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
There are several constants in the module, including:
ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
hexdigits (0123456789abcdefABCDEF)
If you use these constants heavily, it can be worthwhile to convert them to a frozenset. That enables O(1) lookups rather than O(n), where n is the length of the original constant string.
>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
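Whether the frozenset pays off for a constant as short as digits is worth measuring rather than assuming. The sketch below is one way to compare the two lookups with timeit; the sample size, repeat count, and function names are arbitrary choices for illustration, and no timings are claimed here.

```python
import timeit
from string import digits

digit_set = frozenset(digits)
sample = "abc123def456" * 100  # arbitrary test input


def filter_with_str():
    # membership test scans the 10-character string
    return "".join(c for c in sample if c in digits)


def filter_with_set():
    # membership test is a hash lookup
    return "".join(c for c in sample if c in digit_set)


print("str lookup:      ", timeit.timeit(filter_with_str, number=20))
print("frozenset lookup:", timeit.timeit(filter_with_set, number=20))
```

Both functions return the same string, so the only difference being timed is the membership test.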
Answer 4
@Ned Batchelder and @newacct provided the right answer, but …
Just in case your string contains commas (,) and a decimal point (.):
>>> import re
>>> re.sub(r"[^\d.]", "", "$1,999,888.77")
'1999888.77'
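For money-like values it may be preferable to turn the cleaned string into a Decimal rather than leave it as text. This is a minimal sketch under the assumption that the input contains at most one decimal point; `parse_amount` is a hypothetical name, and an input such as "1.2.3" would raise decimal.InvalidOperation.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Strip everything except ASCII digits and '.', then parse as Decimal."""
    cleaned = re.sub(r"[^0-9.]", "", text)
    return Decimal(cleaned)


print(parse_amount("$1,999,888.77"))  # 1999888.77
```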
Answer 5
The fastest approach, if you need to perform more than one or two such removal operations (or even just one, but on a very long string!), is to rely on the translate method of strings, even though it does need some prep. The sessions below are Python 2 (xrange, string.maketrans, u'' literals):
>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
The translate method works differently on Unicode strings than on byte strings, and is maybe a tad simpler to use, btw:
>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
...
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'
You might want to use a mapping class rather than an actual dict, especially if your Unicode string may contain characters with very high ord values (that would make the dict excessively large ;-). For example:
>>> class keeponly(object):
...     def __init__(self, keep):
...         self.keep = set(ord(c) for c in keep)
...     def __getitem__(self, key):
...         if key in self.keep:
...             return key
...         return None
...
>>> s.translate(keeponly(string.digits))
u'123456'
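The translate approach carries over to Python 3, though the details differ: str.translate takes a single mapping argument, and bytes.translate accepts None as the table plus a set of bytes to delete. The following is a sketch of both, not the answer author's code; `KeepOnly`, `digits_only`, and `non_digit_bytes` are illustrative names.

```python
import string


class KeepOnly:
    """Mapping for str.translate: keep listed characters, delete the rest."""

    def __init__(self, keep):
        self.keep = {ord(c) for c in keep}

    def __getitem__(self, codepoint):
        # Returning the code point keeps the character; None deletes it.
        return codepoint if codepoint in self.keep else None


digits_only = KeepOnly(string.digits)
print("abc123def456".translate(digits_only))  # 123456

# For bytes, build the set of bytes to delete once and pass None as the table.
ascii_digits = string.digits.encode("ascii")
non_digit_bytes = bytes(b for b in range(256) if b not in ascii_digits)
print(b"abc123def456".translate(None, non_digit_bytes))  # b'123456'
```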
Answer 6
Many right answers, but in case you want the result directly as a float, without using regex:
>>> x = '$123.45M'
>>> float(''.join(c for c in x if (c.isdigit() or c == '.')))
123.45
You can swap the point for a comma depending on your needs.
If you know your number is an integer, use this instead:
>>> x = '$1123'
>>> int(''.join(c for c in x if c.isdigit()))
1123
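Swapping the point for a comma needs one extra step, because float itself only accepts a point. A minimal sketch, assuming the separator appears at most once and that signs and suffixes such as "M" can be discarded; `extract_float` and its `decimal_sep` parameter are hypothetical.

```python
def extract_float(text, decimal_sep="."):
    """Keep ASCII digits and one kind of decimal separator, then parse."""
    kept = "".join(c for c in text if c in "0123456789" or c == decimal_sep)
    # float() expects a point, so normalize the separator before parsing.
    return float(kept.replace(decimal_sep, "."))


print(extract_float("$123.45M"))                     # 123.45
print(extract_float("123,45 EUR", decimal_sep=","))  # 123.45
```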