Question: How do I remove all characters except digits from a string in Python?
How can I remove all characters except numbers from a string?
Answer 0
In Python 2.*, by far the fastest approach is the .translate method:
>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>
string.maketrans makes a translation table (a string of length 256), which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make ;-). .translate applies the translation table (which here is irrelevant, since all essentially means identity) AND deletes characters present in the second argument, which is the key part.
.translate works very differently on Unicode strings (and on strings in Python 3; I do wish questions specified which major release of Python is of interest!). There it is not quite this simple and not quite this fast, though still quite usable.
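For readers on Python 3, the two-argument "delete these characters" form described above still exists on bytes objects. The following is only a sketch under the assumption that the input is pure ASCII (so it can round-trip through bytes); the variable names are illustrative and not from the original answer.

```python
import string

# Assumption: the text is ASCII, so encoding to bytes and back is lossless.
x = 'aaa12333bb445bb54b5b52'

# Build the set of bytes to delete: every byte value except the ASCII digits.
digits = string.digits.encode('ascii')
non_digits = bytes(range(256)).translate(None, digits)

# table=None means "no remapping"; the second argument lists bytes to drop.
result = x.encode('ascii').translate(None, non_digits).decode('ascii')
print(result)  # 1233344554552
```

Whether the encode/decode round trip pays off will depend on the data and the Python version, so it is worth timing before relying on it.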
Back to 2.*, the performance difference is impressive…:
$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop
Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach…:
$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop
is about 50% slower than RE, so the .translate approach beats it by over an order of magnitude.
In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here’s a convenient way to express this for deletion of “everything but” a few characters:
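import string

class Del:
    def __init__(self, keep=string.digits):
        # Map the ordinal of every character to keep onto itself.
        self.comp = dict((ord(c), c) for c in keep)
    def __getitem__(self, k):
        # Any other ordinal yields None, which tells translate to delete it.
        return self.comp.get(k)

DD = Del()
x = 'aaa12333bb445bb54b5b52'
x.translate(DD)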
also emits '1233344554552'. However, putting this in xx.py we have…:
$ python3.1 -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop
…which shows that the performance advantage disappears for this kind of “deletion” task, and becomes a performance decrease.
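The timings above come from specific machines and Python versions, so they are best re-measured locally. Below is a minimal in-process harness using the standard timeit module; the candidates dictionary, the loop count, and the simplified Del class are assumptions made for illustration, and no particular ranking is implied.

```python
import re
import string
import timeit

x = "aaa12333bb445bb54b5b52"
keep = {ord(c): c for c in string.digits}

class Del:
    """Mapping that returns None (delete) for every ordinal not in keep."""
    def __getitem__(self, k):
        return keep.get(k)

DD = Del()

# Hypothetical set of candidates; each returns the digits-only string.
candidates = {
    "re.sub": lambda: re.sub(r"\D", "", x),
    "join": lambda: "".join(c for c in x if c.isdigit()),
    "translate": lambda: x.translate(DD),
}

loops = 20_000
for name, func in candidates.items():
    # Take the best of 3 runs, as the command-line timeit tool does.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    print(f"{name:10s} {best / loops * 1e6:.2f} usec per loop")
```

The printed numbers will differ from run to run; only the relative order on your own system is meaningful.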
Answer 1
Use re.sub, like so:
>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'
\D matches any non-digit character, so the code above essentially replaces every non-digit character with the empty string.
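One detail worth knowing on Python 3: for str patterns, \d (and therefore \D) is Unicode-aware by default, so digits from other scripts count as digits and are kept. The sketch below illustrates this with the re.ASCII flag; the sample string is made up for the example.

```python
import re

# Three Arabic-Indic digits (U+0661..U+0663) mixed with ASCII text and digits.
s = "abc\u0661\u0662\u0663def45"

# Default (Unicode) mode: \D leaves every Unicode decimal digit in place.
unicode_mode = re.sub(r"\D", "", s)
print(len(unicode_mode))  # 5

# re.ASCII narrows \d to [0-9], so \D also strips the non-ASCII digits.
ascii_mode = re.sub(r"\D", "", s, flags=re.ASCII)
print(ascii_mode)  # 45
```

If the result is going to be stored or compared as plain 0-9 text, the re.ASCII variant (or an explicit [^0-9] class) is probably the safer choice.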
Or you can use filter, like so (in Python 2):
>>> filter(str.isdigit, 'aas30dsa20')
'3020'
In Python 3, filter returns an iterator rather than a string, so join the result instead:
>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'
Answer 2
s=''.join(i for i in s if i.isdigit())
Another generator variant.
Answer 3
You can use filter:
filter(lambda x: x.isdigit(), "dasdasd2313dsa")
On Python 3 you have to join the result (kinda ugly :( ):
''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))
Answer 4
Along the lines of bayer’s answer:
''.join(i for i in s if i.isdigit())
Answer 5
You can easily do it using a regex:
>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'
Answer 6
x.translate(None, string.digits)
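will delete all digits from the string. To delete letters and keep the digits, do this (Python 2 only: string.letters and this two-argument form of str.translate do not exist in Python 3):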
x.translate(None, string.letters)
Answer 7
The OP mentions in the comments that he wants to keep the decimal point. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep, e.g.:
>>> re.sub(r"[^0123456789\.]", "", "poo123.4and5fish")
'123.45'
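If the cleaned text is meant to become a number, the same character class can feed float() directly. This is a rough sketch: extract_number is a hypothetical helper name, and it assumes the input contains at most one decimal point (otherwise float raises ValueError).

```python
import re

def extract_number(text):
    """Strip everything except ASCII digits and '.', then parse as float.

    Assumes at most one '.' survives; float() raises ValueError otherwise.
    """
    cleaned = re.sub(r"[^0-9.]", "", text)
    return float(cleaned)

print(extract_number("poo123.4and5fish"))  # 123.45
```

Inputs with thousands separators such as "£70,000" also work here, because the comma is simply removed before parsing.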
Answer 8
A fast version for Python 3:
# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)
def keeper(keep):
    # Unknown ordinals default to None, which translate treats as "delete".
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table
digit_keeper = keeper(string.digits)
Here’s a performance comparison vs. regex:
$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop
So it’s a little bit more than 3 times faster than regex, for me. It’s also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here’s that version on my same system, for comparison:
$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
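One property of the defaultdict table is that it stores a new entry for every missing ordinal it is asked about, so it grows as new characters are seen. A possible variant, sketched below, is a dict subclass whose __missing__ returns None without storing anything. Keeper is a hypothetical name, and no claim is made here about how its speed compares with the defaultdict version.

```python
import string

class Keeper(dict):
    """Translation table: kept characters map to themselves, others to None."""

    def __init__(self, keep):
        super().__init__((ord(c), c) for c in keep)

    def __missing__(self, key):
        # Invoked by dict lookup for absent keys; None means "delete".
        # Nothing is stored, so the table keeps its original size.
        return None

digit_keeper = Keeper(string.digits)
x = "aaa12333bb445bb54b5b52"
print(x.translate(digit_keeper))  # 1233344554552
print(len(digit_keeper))          # 10
```

This may matter mainly for long-running processes that translate text containing many distinct characters.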
Answer 9
Use a generator expression:
>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")
Answer 10
Ugly but works:
>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>
Answer 11
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.48 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 2.02 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.37 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 1.97 usec per loop
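I observed that joining the result of re.findall is faster than re.sub. Note that the findall pattern above, [a-z]+, collects the letters (and runs on a slightly different input string), so the two commands do not produce the same output.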
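To apply the findall-and-join idea to the original task of keeping digits, the pattern would presumably be \d+ rather than [a-z]+. A short sketch that checks both approaches agree on the same input:

```python
import re

x = "aaa12333bb445bb54b5b52"

# Delete every non-digit character.
by_sub = re.sub(r"\D", "", x)

# Collect each run of digits and glue the runs together.
by_findall = "".join(re.findall(r"\d+", x))

print(by_sub, by_findall)  # 1233344554552 1233344554552
```

Which of the two is faster is best measured on your own data, since the timings above were taken on different operations.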
Answer 12
You can read each character. If it is a digit, include it in the answer. The str.isdigit() method is a way to know whether a character is a digit.
your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'
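Note that on Python 3, str.isdigit() is also true for some non-ASCII characters, such as superscript digits. If only the characters 0-9 are wanted, one option is to test membership in string.digits instead. The helper name keep_ascii_digits and the sample string below are illustrative assumptions.

```python
import string

def keep_ascii_digits(text):
    """Keep only the characters 0-9, ignoring other Unicode digit forms."""
    return "".join(c for c in text if c in string.digits)

sample = "a1\u00b2b2"  # \u00b2 is SUPERSCRIPT TWO

# isdigit() accepts the superscript, so it survives the filter.
print("".join(c for c in sample if c.isdigit()))  # 1²2

# The membership test keeps only plain ASCII digits.
print(keep_ascii_digits(sample))  # 12
```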
Answer 13
Not a one-liner, but very simple:
buffer = ""
some_str = "aas30dsa20"
for char in some_str:
    # Keep only the digit characters.
    if char.isdigit():
        buffer += char
print(buffer)  # 3020
Answer 14
I used this. 'letters' should contain all the letters that you want to get rid of:
Output = Input.translate({ord(i): None for i in 'letters'})
Example:
Input = "I would like 20 dollars for that suit"
Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxzy'})
print(Output)
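Output (the uppercase I and the spaces are not in the deletion set, so they are kept; the printed line also ends with four spaces):
I   20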
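The deletion table above can also be built with str.maketrans, whose third argument lists characters to map to None. This is a sketch that assumes uppercase letters and spaces should be removed as well, which goes slightly beyond the lowercase alphabet used in the example.

```python
import string

text = "I would like 20 dollars for that suit"

# Third argument of str.maketrans: every character listed maps to None.
table = str.maketrans("", "", string.ascii_letters + " ")

print(text.translate(table))  # 20
```

Unlike the "keep only these" tables in the earlier answers, this form deletes a known set of characters, so anything not listed (punctuation, for instance) stays in the result.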