Tag archive: string

Concatenate items in a list into a string

Question: Concatenate items in a list into a string

Is there a simpler way to concatenate string items in a list into a single string? Can I use the str.join() function?

E.g. this is the input ['this','is','a','sentence'] and this is the desired output this-is-a-sentence

sentence = ['this','is','a','sentence']
sent_str = ""
for i in sentence:
    sent_str += str(i) + "-"
sent_str = sent_str[:-1]
print sent_str

Answer 0

Use join:

>>> sentence = ['this','is','a','sentence']
>>> '-'.join(sentence)
'this-is-a-sentence'

Answer 1

A more generic way to convert Python lists to strings is:

>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910
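
The same idea can be sketched with a visible separator. This is only an illustration: the helper name join_items is hypothetical and does not come from the answers here. It converts each item with str() first, so lists that mix numbers and strings can be joined:

```python
def join_items(items, sep="-"):
    """Join any iterable of items into one string, converting each with str()."""
    # str.join only accepts strings, so non-string items are converted first.
    return sep.join(str(item) for item in items)


print(join_items(['this', 'is', 'a', 'sentence']))
print(join_items([1, 2.5, 'three'], sep=", "))
```

Using a generator expression instead of map(str, ...) is purely a style choice; both feed join one string at a time.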

Answer 2

It's very useful for beginners to know why join is a string method.

It looks strange at first, but it turns out to be very useful.

The result of join is always a string, but the object being joined can be of many types (generators, lists, tuples, etc.).

.join is also faster than repeated concatenation with +, because it allocates memory for the result only once.

Once you are used to it, it's very comfortable, and you can do tricks like this to add parentheses:

>>> ",".join("12345").join(("(", ")"))
'(1,2,3,4,5)'

>>> brackets = ["(", ")"]
>>> ",".join("12345").join(brackets)
'(1,2,3,4,5)'
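
To illustrate the point that join accepts any iterable of strings, here is a small sketch (the variable names are made up for this example). It joins a tuple and a generator, and shows that join refuses items that are not strings:

```python
words = ("this", "is", "a", "sentence")      # a tuple
upper = (w.upper() for w in words)           # a generator

print("-".join(words))
print("-".join(upper))

# join raises TypeError when an item is not a string.
try:
    "-".join([1, 2, 3])
except TypeError as exc:
    print("TypeError:", exc)
```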

Answer 3

Although @Burhan Khalid's answer is good, I think it's more understandable like this (Python 2 only):

from string import join

sentence = ['this','is','a','sentence']

join(sentence, "-")

The second argument to join() is optional and defaults to " " (a single space).

EDIT: This function was removed in Python 3.


Answer 4

We can specify the separator used to join the strings. Instead of '-', we can use ' ':

sentence = ['this','is','a','sentence']
s = " ".join(sentence)
print(s)

Answer 5

We can also use Python’s reduce function:

from functools import reduce

sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

Answer 6

def eggs(someParameter):
    # Replace the last item so the joined result reads like a sentence.
    del someParameter[3]
    someParameter.insert(3, ' and cats.')


spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam = ','.join(spam)
print(spam)

Remove specific characters from a string in Python

Question: Remove specific characters from a string in Python

I’m trying to remove specific characters from a string using Python. This is the code I’m using right now. Unfortunately it appears to do nothing to the string.

for char in line:
    if char in " ?.!/;:":
        line.replace(char,'')

How do I do this properly?


Answer 0

Strings in Python are immutable (can’t be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.

Also, the way you are doing it is going to be kind of slow, relatively. It’s also likely to be a bit confusing to experienced pythonators, who will see a doubly-nested structure and think for a moment that something more complicated is going on.

Starting with Python 2.6 (and in later Python 2.x versions*), you can instead use str.translate (but read on for Python 3 differences):

line = line.translate(None, '!@#$')

or regular expression replacement with re.sub

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.

In Python 3, strings are Unicode. You’ll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it’s noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can’t pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string you would call something like

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))

* for compatibility with earlier Pythons, you can create a “null” translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.
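
For Python 3, the str.maketrans variant above can be wrapped in a small helper. This is a minimal sketch; the function name remove_chars and the sample sentence are assumptions made for illustration:

```python
def remove_chars(text, unwanted):
    """Return a copy of text with every character in unwanted deleted (Python 3)."""
    # The third argument of str.maketrans lists characters to map to None.
    table = str.maketrans('', '', unwanted)
    return text.translate(table)


line = "Hello, world! How are you?"
print(remove_chars(line, " ?.!/;:"))
```

If the same set of characters is removed from many strings, building the table once and reusing it avoids repeating that work.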


Answer 1

Am I missing the point here, or is it just the following:

string = "ab1cd1ef"
string = string.replace("1", "")

print(string)
# result: "abcdef"

Put it in a loop:

a = "a!b@c#d$"
b = "!@#$"
for char in b:
    a = a.replace(char, "")

print(a)
# result: "abcd"

Answer 2

>>> line = "abc#@!?efg12;:?"
>>> ''.join(c for c in line if c not in '?:!/;')
'abc#@efg12'

Answer 3

Easy peasy with an re.sub regular expression as of Python 3.5:

re.sub(r'\ |\?|\.|\!|\/|\;|\:', '', line)

Example

>>> import re
>>> line = 'Q: Do I write ;/.??? No!!!'
>>> re.sub(r'\ |\?|\.|\!|\/|\;|\:', '', line)
'QDoIwriteNo'

Explanation

In regular expressions (regex), | is a logical OR, and \ escapes spaces and special characters that might otherwise be interpreted as regex commands. The pattern is written as a raw string (r'...') so that Python passes the backslashes through to re unchanged. sub stands for substitution, in this case with the empty string ''.


Answer 4

For the inverse requirement of only allowing certain characters in a string, you can use a regular expression with a complemented character set such as [^ABCabc]. For example, to remove everything except ASCII letters, digits, and the hyphen:

>>> import string
>>> import re
>>>
>>> phrase = '  There were "nine" (9) chick-peas in my pocket!!!      '
>>> allow = string.ascii_letters + string.digits + '-'
>>> re.sub('[^%s]' % allow, '', phrase)
'Therewerenine9chick-peasinmypocket'

From the Python regular expression documentation:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.
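
Because some characters (such as ], ^, - and \) have a special meaning inside a set, building the pattern from an arbitrary string of allowed characters can go wrong. The sketch below guards against that with re.escape; the helper name keep_only is hypothetical, and it assumes allowed is not empty:

```python
import re
import string


def keep_only(text, allowed):
    """Delete every character of text that is not listed in allowed."""
    # re.escape protects characters such as ], ^, - and \ inside the set.
    pattern = '[^%s]' % re.escape(allowed)
    return re.sub(pattern, '', text)


phrase = '  There were "nine" (9) chick-peas in my pocket!!!      '
print(keep_only(phrase, string.ascii_letters + string.digits + '-'))
print(keep_only('a]b^c-d', ']^-'))
```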


Answer 5

The asker almost had it. Like most things in Python, the answer is simpler than you think.

>>> line = "H E?.LL!/;O:: "  
>>> for char in ' ?.!/;:':  
...  line = line.replace(char,'')  
...
>>> print(line)
HELLO

You don’t have to do the nested if/for loop thing, but you DO need to check each character individually.


Answer 6

line = line.translate(None, " ?.!/;:")  # Python 2 only; Python 3's str.translate takes a single table argument

Answer 7

>>> s = 'a1b2c3'
>>> ''.join(c for c in s if c not in '123')
'abc'

Answer 8

Strings are immutable in Python. The replace method returns a new string after the replacement. Try:

for char in line:
    if char in " ?.!/;:":
        line = line.replace(char,'')

Answer 9

I was surprised that no one had yet recommended using the builtin filter function.

    import operator
    import string # only for the example you could use a custom string

    s = "1212edjaq"

Say we want to filter out everything that isn’t a number. Using the filter builtin method “…is equivalent to the generator expression (item for item in iterable if function(item))” [Python 3 Builtins: Filter]

    sList = list(s)
    intsList = list(string.digits)
    obj = filter(lambda x: operator.contains(intsList, x), sList)

In Python 3 this returns

    >>  <filter object @ hex>

To get a printed string,

    nums = "".join(list(obj))
    print(nums)
    >> "1212"

I am not sure how filter ranks in terms of efficiency but it is a good thing to know how to use when doing list comprehensions and such.

UPDATE

Logically, since filter works, you could also use a list comprehension, and from what I have read it is supposed to be more efficient because lambdas are the Wall Street hedge fund managers of the programming function world. Another plus is that it is a one-liner that doesn't require any imports. For example, using the same string s defined above,

      num = "".join([i for i in s if i.isdigit()])

That’s it. The return will be a string of all the characters that are digits in the original string.

If you have a specific list of acceptable/unacceptable characters you need only adjust the ‘if’ part of the list comprehension.

      target_chars = "".join([i for i in s if i in some_list]) 

or alternatively,

      target_chars = "".join([i for i in s if i not in some_list])

Answer 10

Using filter, you'd just need one line (in Python 2, where filter returns a string when given a string):

line = filter(lambda char: char not in " ?.!/;:", line)

This treats the string as an iterable and keeps every character for which the lambda returns True:

>>> help(filter)
Help on built-in function filter in module __builtin__:

filter(...)
    filter(function or None, sequence) -> list, tuple, or string

    Return those items of sequence for which function(item) is true.  If
    function is None, return the items that are true.  If sequence is a tuple
    or string, return the same type, else return a list.
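
The help text above describes Python 2. In Python 3, filter returns a lazy iterator rather than a string, so a rough equivalent (a sketch, reusing the sample string from an earlier answer) joins the surviving characters back together:

```python
line = "H E?.LL!/;O:: "

# In Python 3, filter returns a lazy iterator, so the characters are joined back.
cleaned = ''.join(filter(lambda char: char not in " ?.!/;:", line))
print(cleaned)
```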

Answer 11

Here are some possible ways to achieve this task:

def attempt1(string):
    return "".join([v for v in string if v not in ("a", "e", "i", "o", "u")])


def attempt2(string):
    for v in ("a", "e", "i", "o", "u"):
        string = string.replace(v, "")
    return string


def attempt3(string):
    import re
    for v in ("a", "e", "i", "o", "u"):
        string = re.sub(v, "", string)
    return string


def attempt4(string):
    return string.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")


for attempt in [attempt1, attempt2, attempt3, attempt4]:
    print(attempt("murcielago"))

PS: Instead of " ?.!/;:", the examples use the vowels... and yeah, "murcielago" is the Spanish word for bat... a funny word, as it contains all the vowels :)

PS2: If you're interested in performance, you could measure these attempts with some simple code like:

import timeit


K = 1000000
for i in range(1,5):
    t = timeit.Timer(
        f"attempt{i}('murcielago')",
        setup=f"from __main__ import attempt{i}"
    ).repeat(1, K)
    print(f"attempt{i}",min(t))

On my machine you'd get:

attempt1 2.2334518376057244
attempt2 1.8806643818474513
attempt3 7.214925774955572
attempt4 1.7271184513757465

So it seems attempt4 is the fastest one for this particular input.
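
The str.translate approach from the other answers could be added to the comparison as a fifth candidate. This is only a sketch: the names attempt5 and VOWEL_TABLE are made up here, and no timing is claimed for it, since that would have to be measured on your own machine:

```python
# Build the table once, outside the function, so repeated calls reuse it.
VOWEL_TABLE = str.maketrans('', '', 'aeiou')


def attempt5(string):
    # Delete every vowel in a single pass over the string.
    return string.translate(VOWEL_TABLE)


print(attempt5("murcielago"))
```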


Answer 12

Here's my Python 2/3 compatible version, needed because the translate API has changed.

def remove(str_, chars):
    """Removes each char in `chars` from `str_`.

    Args:
        str_: String to remove characters from
        chars: String of to-be removed characters

    Returns:
        A copy of str_ with `chars` removed

    Example:
            remove("What?!?: darn;", " ?.!:;") => 'Whatdarn'
    """
    try:
        # Python2.x
        return str_.translate(None, chars)
    except TypeError:
        # Python 3.x
        table = {ord(char): None for char in chars}
        return str_.translate(table)

Answer 13

#!/usr/bin/python
import re

strs = "how^ much for{} the maple syrup? $20.99? That's[] ricidulous!!!"
print(strs)
# Replace a chosen set of characters with a space; any character can be added to the class.
# Inside [...] the characters are listed directly, with no | separators.
nstr = re.sub(r'[?$.!ab]', ' ', strs)
print(nstr)
# Remove every remaining character that is not a letter, digit, or space.
nestr = re.sub(r'[^a-zA-Z0-9 ]', '', nstr)
print(nestr)

Answer 14

How about this:

def text_cleanup(text):
    new = ""
    for i in text:
        if i not in " ?.!/;:":
            new += i
    return new

Answer 15

You can also use a function to substitute several kinds of regular expressions or other patterns, driven by a list. With that, you can mix regular expressions, character classes, and really basic text patterns. It's really useful when you need to substitute a lot of elements, such as HTML ones.

*NB: works with Python 3.x

import re  # Regular expression library


def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x

line = "<title>My example: <strong>A text %very% $clean!!</strong></title>"
print("Uncleaned: ", line)

# Get rid of html elements
html_elements = ["<title>", "</title>", "<strong>", "</strong>"]
line = string_cleanup(line, html_elements)
print("1st clean: ", line)

# Get rid of special characters
special_chars = ["[!@#$]", "%"]
line = string_cleanup(line, special_chars)
print("2nd clean: ", line)

The function string_cleanup takes your string x and your list notwanted as arguments. Each item in that list is treated as a pattern, and every match of it in the string is removed.

The output:

Uncleaned:  <title>My example: <strong>A text %very% $clean!!</strong></title>
1st clean:  My example: A text %very% $clean!!
2nd clean:  My example: A text very clean

Answer 16

The method I'd use probably isn't the most efficient, but it is massively simple. I can remove multiple characters at different positions all at once, using slicing and formatting. Here's an example:

words = "things"
removed = "%s%s" % (words[:3], words[-1:])

This will result in ‘removed’ holding the word ‘this’.

Formatting can be very helpful for printing variables midway through a print string. It can insert any data type using a % followed by a conversion type; all data types can use %s, integers can use %d, and floats (aka decimals) can use %f.

Slicing can be used for intricate control over strings. When I put words[:3], it selects all the characters from the beginning of the string (the colon is before the number, which means 'from the beginning to') up to, but not including, the character at index 3. Because Python starts counting at 0, that is the first three characters. Then, when I put words[-1:], it means from the last character to the end (the colon is behind the number). Putting -1 makes Python count from the end of the string rather than from the start, and counting from the end begins at -1, not 0. So words[-1:] is simply the last character of the string.

So, by cutting off the characters before the character I want to remove and the characters after and sandwiching them together, I can remove the unwanted character. Think of it like a sausage. In the middle it’s dirty, so I want to get rid of it. I simply cut off the two ends I want then put them together without the unwanted part in the middle.

If I want to remove multiple consecutive characters, I simply shift the numbers around in the [] (slicing part). Or if I want to remove multiple characters from different positions, I can simply sandwich together multiple slices at once.

Examples:

words = "control"
removed = "%s%s" % (words[:2], words[-2:])

removed equals ‘cool’.

words = "impacts"
removed = "%s%s%s" % (words[1], words[3:5], words[-1])

removed equals ‘macs’.

In this case, [3:5] means the characters from position 3 up to, but not including, position 5, i.e. positions 3 and 4.

Remember, Python starts counting at 0, so you will need to as well.
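
The slicing rules described in this answer can be checked with a short sketch. The helper remove_span is a hypothetical name introduced only for this illustration; it glues together the part before the unwanted span and the part after it:

```python
def remove_span(text, start, stop):
    """Return text without the characters at indices start .. stop-1."""
    return text[:start] + text[stop:]


words = "things"
print(words[:3])                  # characters at indices 0, 1 and 2
print(words[-1:])                 # the last character
print(remove_span(words, 3, 5))   # drops the characters at indices 3 and 4
```

Plain concatenation with + is used here instead of "%s%s" formatting; for two or three pieces the result is the same.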


Answer 17

Try this one:

def rm_char(original_str, need2rm):
    ''' Remove characters in "need2rm" from "original_str" '''
    return original_str.translate(str.maketrans('','',need2rm))

This method works well in Python 3.5.2.


Answer 18

You could use the re module's regular expression replacement. Using a negated character class ([^...]) lets you pick exactly what you want to keep from your string.

    import re
    text = "This is absurd!"
    text = re.sub("[^a-zA-Z]","",text) # Keeps only letters
    print(text)

The output is "Thisisabsurd". Only the characters listed after the ^ symbol are kept.


Answer 19

The string method replace does not modify the original string. It leaves the original alone and returns a modified copy.

What you want is something like: line = line.replace(char,'')

def replace_all(line):
    for char in line:
        if char in " ?.!/;:":
            line = line.replace(char,'')
    return line

However, creating a new string each and every time that a character is removed is very inefficient. I recommend the following instead:

def replace_all(line, baddies):
    """
    The following is documentation on how to use the function,
    without reference to the implementation details:

    For implementation notes, please see comments beginning with `#`
    in the source file.

    [*crickets chirp*]

    """

    is_bad = lambda ch, baddies=baddies: ch in baddies
    filter_baddies = lambda ch, *, is_bad=is_bad: "" if is_bad(ch) else ch
    mahp = replace_all.map(filter_baddies, line)
    return replace_all.join('', mahp)

    # -------------------------------------------------
    # WHY `baddies=baddies`?!?
    #     `is_bad=is_bad`
    # -------------------------------------------------
    # Default arguments to a lambda function are evaluated
    # at the same time as when a lambda function is
    # **defined**.
    #
    # global variables of a lambda function
    # are evaluated when the lambda function is
    # **called**
    #
    # The following prints "as yellow as snow"
    #
    #     fleece_color = "white"
    #     little_lamb = lambda end: "as " + fleece_color + end
    #
    #     # sometime later...
    #
    #     fleece_color = "yellow"
    #     print(little_lamb(" as snow"))
    # --------------------------------------------------
replace_all.map = map
replace_all.join = str.join
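
The note about default arguments in the comments above can be tried out directly. In this sketch (the names late and early are made up for the illustration), one lambda reads the global when it is called, while the other captures the value as a default when it is defined:

```python
fleece_color = "white"

# The global name is looked up each time the lambda is called.
late = lambda end: "as " + fleece_color + end
# The default value is captured once, when the lambda is defined.
early = lambda end, color=fleece_color: "as " + color + end

fleece_color = "yellow"

print(late(" as snow"))
print(early(" as snow"))
```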

Answer 20

Here is one that does not use regular expressions:

ipstring = "text with symbols!@#$^&*( ends here"
opstring = ''
for i in ipstring:
    # Keep letters, digits, and spaces; drop everything else.
    if i.isalnum() or i == ' ':
        opstring += i
print(opstring)

Answer 21

In Python 3.5, for example, to remove all the digits from a file name:

import os
os.rename(file_name, file_name.translate({ord(c): None for c in '0123456789'}))


Answer 22

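You can use a set. In this fragment from inside a function, the characters 1, 0, I and O are excluded from the allowed alphabet before passlen random characters are picked:

    import random
    import string
    charlist = list(set(string.digits+string.ascii_uppercase) - set('10IO'))
    return ''.join([random.SystemRandom().choice(charlist) for _ in range(passlen)])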

Answer 23

Recursive split: s is the string, chars are the characters to remove.

def strip(s, chars):
    # Base case: an empty or one-character string is kept or dropped whole.
    if len(s) <= 1:
        return "" if s in chars else s
    mid = len(s) // 2
    return strip(s[:mid], chars) + strip(s[mid:], chars)

Example:

print(strip("Hello!", "lo"))    # He!

Answer 24

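# For each file in a directory, remove the digits from its file name
import os
import re

directory = r"D:\Dev\Python"
for file_name in os.listdir(directory):
    new_name = re.sub(r'\d+', '', file_name)
    # Join with the directory so the rename does not depend on the current working directory.
    os.rename(os.path.join(directory, file_name), os.path.join(directory, new_name))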

Answer 25

The approach below also works:

line = "a,b,c,d,e"
alpha = list(line)
while ',' in alpha:
    alpha.remove(',')
finalString = ''.join(alpha)
print(finalString)

output: abcde


Answer 26

>>> # Character stripping
>>> a = '?abcd1234!!'
>>> a.lstrip('?')
'abcd1234!!'
>>> a.strip('?!')
'abcd1234'
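
Note that strip and lstrip only remove characters from the ends of a string. A short sketch (the sample string is made up for this illustration) shows the difference from replace, which also removes characters in the middle:

```python
a = '?ab?cd!!'

# strip only trims the ends, so the inner '?' survives.
print(a.strip('?!'))
# replace removes every occurrence, wherever it is.
print(a.replace('?', '').replace('!', ''))
```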

How to convert a string representation of a list into a list?

Question: How to convert a string representation of a list into a list?

I was wondering what the simplest way is to convert a string list like the following to a list:

x = u'[ "A","B","C" , " D"]'

Even if the user puts spaces between the commas and spaces inside the quotes, I need to handle that as well and convert it to:

x = ["A", "B", "C", "D"] 

in Python.

I know I can strip spaces with strip() and split() using the split operator and check for non alphabets. But the code was getting very kludgy. Is there a quick function that I’m not aware of?


Answer 0

>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

ast.literal_eval:

With ast.literal_eval, you can safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
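
As a small illustration of the "safely" part, the sketch below shows literal_eval accepting a list literal and refusing a string that contains a function call. The helper name parse_list is hypothetical and not part of the ast module.

```python
import ast

def parse_list(text):
    """Parse a stringified list of literals; return None if it is not a plain list literal."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval refuses anything that is not a pure literal
        return None
    return value if isinstance(value, list) else None

good = parse_list('[ "A","B","C" , " D"]')
bad = parse_list('__import__("os").getcwd()')
print(good)
print(bad)
```

Unlike eval, the second call never runs the expression; it is rejected before anything is executed.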


Answer 1

The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.

>>> import json
>>> x = u'[ "A","B","C" , " D"]'
>>> json.loads(x)
[u'A', u'B', u'C', u' D']

Similarly

>>> x = u'[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
[u'A', u'B', u'C', {u'D': u'E'}]
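
One thing to keep in mind is that JSON only allows double-quoted strings, so a list written with single quotes is not valid input for json.loads. A minimal sketch in Python 3, where the helper name loads_or_none is an assumption made for this example:

```python
import json

def loads_or_none(text):
    """Return the parsed JSON value, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

double_quoted = loads_or_none('[ "A","B","C" , {"D":"E"}]')
single_quoted = loads_or_none("[ 'A','B','C' ]")
print(double_quoted)
print(single_quoted)
```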

Answer 2

eval is dangerous: you shouldn’t execute user input.

If you have Python 2.6 or newer, use ast instead of eval:

>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
['A', 'B', 'C', ' D']

Once you have that, strip the strings.

If you’re on an older version of Python, you can get very close to what you want with a simple regular expression:

>>> import re
>>> x='[  "A",  " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']

This isn’t as good as the ast solution; for example, it doesn’t correctly handle escaped quotes in strings. But it’s simple, doesn’t involve a dangerous eval, and might be good enough for your purpose if you’re on an older Python without ast.


Answer 3

import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]

Answer 4

There is a quick solution:

x = eval('[ "A","B","C" , " D"]')

Unwanted whitespaces in the list elements may be removed in this way:

x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]

Answer 5

Inspired by some of the answers above that work with base Python packages, I compared the performance of a few (using Python 3.7.3):

Method 1: ast

import ast
list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195

Method 2: json

import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424

Method 3: no import

list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502

I was disappointed to see that the method I considered the least readable was also the fastest. There are trade-offs to consider when choosing the most readable option: for the kinds of workloads I use Python for, I usually value readability over a slightly faster option, but as usual, it depends.


Answer 6

If it’s only a one dimensional list, this can be done without importing anything:

>>> x = u'[ "A","B","C" , " D"]'
>>> ls = x.strip('[]').replace('"', '').replace(' ', '').split(',')
>>> ls
['A', 'B', 'C', 'D']

Answer 7

Assuming that all your inputs are lists and that the double quotes in the input actually don’t matter, this can be done with a simple regexp replace. It is a bit perl-y but works like a charm. Note also that the output is now a list of unicode strings, you didn’t specify that you needed that, but it seems to make sense given unicode input.

import re
x = u'[ "A","B","C" , " D"]'
junkers = re.compile(r'[\[" \]]')
result = junkers.sub('', x).split(',')
print(result)
# ['A', 'B', 'C', 'D'] on Python 3 ([u'A', u'B', u'C', u'D'] on Python 2)

The junkers variable contains a compiled regexp (for speed) of all the characters we don’t want; using [ and ] as literal characters requires some backslash trickery. re.sub replaces all these characters with nothing, and we split the resulting string at the commas.

Note that this also removes spaces from inside entries: u'["oh no"]' becomes [u'ohno']. If this is not what you want, the regexp needs to be souped up a bit.


Answer 8

If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).

>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print qsList.parseString(x).asList()
[u'A', u'B', u'C', u'D']

If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar, like this one on the pyparsing wiki, which will handle tuples, lists, ints, floats, and quoted strings. This works with Python versions back to 2.4.


Answer 9

To further complete @Ryan’s answer using json, one very convenient function to convert unicode is the one posted here: https://stackoverflow.com/a/13105359/7599285

Example with double or single quotes:

>>> print byteify(json.loads(u'[ "A","B","C" , " D"]'))
['A', 'B', 'C', ' D']
>>> print byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"')))
['A', 'B', 'C', ' D']

Answer 10

I would like to provide a more intuitive pattern-based solution using regex. The function below takes as input a stringified list containing arbitrary strings.

Stepwise explanation: first remove all whitespace, brackets and value separators (provided they are not part of the values you want to extract; otherwise the regex needs to be more complex). Then split the cleaned string on single or double quotes and keep the non-empty values (or the odd-indexed values, whichever you prefer).

import re

def parse_strlist(sl):
    # Remove brackets, commas and whitespace
    clean = re.sub(r"[\[\],\s]", "", sl)
    # Split on single or double quotes
    splitted = re.split(r"['\"]", clean)
    # Keep only the non-empty pieces
    values_only = [s for s in splitted if s != '']
    return values_only

Test sample: "['21',"foo" '6', '0', " A"]"


Answer 11

And with pure Python, without importing any libraries:

[x for x in  x.split('[')[1].split(']')[0].split('"')[1:-1] if x not in[',',' , ',', ']]

Answer 12

You may run into this problem while dealing with scraped data stored as a Pandas DataFrame.

This solution works like a charm if the list of values is present as text.

def textToList(hashtags):
    return hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')

hashtags = "[ 'A','B','C' , ' D']"
hashtags = textToList(hashtags)

Output: ['A', 'B', 'C', 'D']

No external library required.


Answer 13

So, following all the answers I decided to time the most common methods:

from time import time
import ast
import json
import re


my_str = str(list(range(19)))
print(my_str)

reps = 100000

start = time()
for i in range(0, reps):
    re.findall("\w+", my_str)
print("Regex method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    json.loads(my_str)
print("json method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    ast.literal_eval(my_str)
print("ast method:\t\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)



    regex method:    6.391477584838867e-07
    json method:     2.535374164581299e-06
    ast method:      2.4425282478332518e-05
    strip method:    4.983267784118653e-06

So in the end regex wins!


Answer 14

You can save yourself the .strip() call by slicing off the first and last characters of the list’s string representation (see the third line below):

>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring
['1', '2', '3', '4', '5', "'baloney'", "'alfalfa'"]
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
... 
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>

How would you make a comma-separated string from a list of strings?

Question: How would you make a comma-separated string from a list of strings?

What would be your preferred way to concatenate strings from a sequence such that between every two consecutive pairs a comma is added. That is, how do you map, for instance, ['a', 'b', 'c'] to 'a,b,c'? (The cases ['s'] and [] should be mapped to 's' and '', respectively.)

I usually end up using something like ''.join(map(lambda x: x+',',l))[:-1], but also feeling somewhat unsatisfied.


Answer 0

my_list = ['a', 'b', 'c', 'd']
my_string = ','.join(my_list)
# 'a,b,c,d'

This won’t work if the list contains integers.

If the list contains non-string types (such as integers, floats, bools, None), do this instead:

my_string = ','.join(map(str, my_list))
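
To make the difference concrete, here is a short sketch (the sample list is made up for illustration) showing that a plain join raises TypeError on non-string items, while the map(str, ...) form converts each item first:

```python
my_list = ['a', 1, 2.5, None]

try:
    ','.join(my_list)
    joined_directly = True
except TypeError:
    # str.join only accepts strings
    joined_directly = False

my_string = ','.join(map(str, my_list))
print(joined_directly)
print(my_string)
```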

Answer 1

Why the map/lambda magic? Doesn’t this work?

>>> foo = ['a', 'b', 'c']
>>> print(','.join(foo))
a,b,c
>>> print(','.join([]))

>>> print(','.join(['a']))
a

If there are numbers in the list, you could use a list comprehension:

>>> ','.join([str(x) for x in foo])

or a generator expression:

>>> ','.join(str(x) for x in foo)

Answer 2

",".join(l) will not work for all cases. I’d suggest using the csv module with StringIO

import StringIO
import csv

l = ['list','of','["""crazy"quotes"and\'',123,'other things']

line = StringIO.StringIO()
writer = csv.writer(line)
writer.writerow(l)
csvcontent = line.getvalue()
# 'list,of,"[""""""crazy""quotes""and\'",123,other things\r\n'

Answer 3

Here is an alternative solution in Python 3.0 which allows non-string list items:

>>> alist = ['a', 1, (2, 'b')]
  • a standard way

    >>> ", ".join(map(str, alist))
    "a, 1, (2, 'b')"
    
  • the alternative solution

    >>> import io
    >>> s = io.StringIO()
    >>> print(*alist, file=s, sep=', ', end='')
    >>> s.getvalue()
    "a, 1, (2, 'b')"
    

NOTE: The space after comma is intentional.


Answer 4

Don’t you just want:

",".join(l)

Obviously it gets more complicated if you need to quote/escape commas etc in the values. In that case I would suggest looking at the csv module in the standard library:

https://docs.python.org/library/csv.html
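
As a rough Python 3 sketch of what the csv module buys you here, the helper below (to_csv_line is a hypothetical name, not something csv provides) writes one row into an in-memory buffer, so values containing commas or quotes are escaped according to CSV rules:

```python
import csv
import io

def to_csv_line(values):
    """Join values into one CSV line, quoting fields where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    # writerow appends a line terminator; drop it to get just the joined line
    return buffer.getvalue().rstrip('\r\n')

line = to_csv_line(['a', 'b,c', 'say "hi"', 1])
print(line)

# Reading the line back recovers the original fields (as strings)
print(next(csv.reader([line])))
```

A plain ",".join would have produced a line in which the comma inside 'b,c' is indistinguishable from a separator.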


Answer 5

@Peter Hoffmann

Using generator expressions has the benefit of also producing an iterator but saves importing itertools. Furthermore, list comprehensions are generally preferred to map, thus, I’d expect generator expressions to be preferred to imap.

>>> l = [1, "foo", 4 ,"bar"]
>>> ",".join(str(bit) for bit in l)
'1,foo,4,bar' 

Answer 6

>>> my_list = ['A', '', '', 'D', 'E',]
>>> ",".join([str(i) for i in my_list if i])
'A,D,E'

my_list may contain any type of variable. This avoids the result 'A,,,D,E'.


Answer 7

l=['a', 1, 'b', 2]

print(str(l)[1:-1])

Output: 'a', 1, 'b', 2

Answer 8

@jmanning2k using a list comprehension has the downside of creating a new temporary list. The better solution would be to use itertools.imap (Python 2 only), which returns an iterator:

from itertools import imap
l = [1, "foo", 4 ,"bar"]
",".join(imap(str, l))

Answer 9

Here is an example with a list of lists:

>>> myList = [['Apple'],['Orange']]
>>> myList = ','.join(map(str, [i[0] for i in myList])) 
>>> print "Output:", myList
Output: Apple,Orange

More Accurate:-

>>> myList = [['Apple'],['Orange']]
>>> myList = ','.join(map(str, [type(i) == list and i[0] for i in myList])) 
>>> print "Output:", myList
Output: Apple,Orange

Example 2:-

myList = ['Apple','Orange']
myList = ','.join(map(str, myList)) 
print "Output:", myList
Output: Apple,Orange

Answer 10

I would say the csv library is the only sensible option here, as it was built to cope with all csv use cases such as commas in a string, etc.

To output a list l to a .csv file:

import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(l)  # this will output l as a single row.  

It is also possible to use writer.writerows(iterable) to output multiple rows to csv.

This example is compatible with Python 3, as the other answer here used StringIO which is Python 2.


Answer 11

Unless I’m missing something, ','.join(foo) should do what you’re asking for.

>>> ','.join([''])
''
>>> ','.join(['s'])
's'
>>> ','.join(['a','b','c'])
'a,b,c'

(edit: and as jmanning2k points out,

','.join([str(x) for x in foo])

is safer and quite Pythonic, though the resulting string will be difficult to parse if the elements can contain commas — at that point, you need the full power of the csv module, as Douglas points out in his answer.)


Answer 12

My two cents. I like simple one-line code in Python:

>>> from itertools import imap, ifilter
>>> l = ['a', '', 'b', 1, None]
>>> ','.join(imap(str, ifilter(lambda x: x, l)))
'a,b,1'
>>> m = ['a', '', None]
>>> ','.join(imap(str, ifilter(lambda x: x, m)))
'a'

It’s pythonic, works for strings, numbers, None and empty string. It’s short and satisfies the requirements. If the list is not going to contain numbers, we can use this simpler variation:

>>> ','.join(ifilter(lambda x: x, l))

Also, this solution doesn’t create a new list but uses an iterator, as @Peter Hoffmann pointed out (thanks).


How to get the position of a character in Python?

Question: How to get the position of a character in Python?

How can I get the position of a character inside a string in python?


Answer 0

There are two string methods for this, find() and index(). The difference between the two is what happens when the search string isn’t found. find() returns -1 and index() raises ValueError.

Using find()

>>> myString = 'Position of a character'
>>> myString.find('s')
2
>>> myString.find('x')
-1

Using index()

>>> myString = 'Position of a character'
>>> myString.index('s')
2
>>> myString.index('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

From the Python manual

string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.

And:

string.index(s, sub[, start[, end]])
Like find() but raise ValueError when the substring is not found.
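
The optional start argument described above also makes it possible to walk through every occurrence, not just the first. The sketch below is one way to do that; find_all is a hypothetical helper name, and it deliberately skips overlapping matches.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub in text."""
    if not sub:
        raise ValueError("sub must not be empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we found
        start = text.find(sub, start + len(sub))
    return positions

print(find_all('Position of a character', 'a'))
```

Because find() returns -1 instead of raising, it works naturally as the loop condition; the same loop written with index() would need a try/except around each call.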


Answer 1

Just for the sake of completeness, if you need to find all positions of a character in a string, you can do the following:

s = 'shak#spea#e'
c = '#'
print([pos for pos, char in enumerate(s) if char == c])

which will print [4, 9]


Answer 2

>>> s="mystring"
>>> s.index("r")
4
>>> s.find("r")
4

The “long-winded” way:

>>> for i,c in enumerate(s):
...   if "r"==c: print(i)
...
4

To get the substring:

>>> s="mystring"
>>> s[4:10]
'ring'

Answer 3

Just for completeness: if you want to find the extension in a file name in order to check it, you need to find the last ‘.’; in this case use rfind:

path = 'toto.titi.tata..xls'
path.find('.')
4
path.rfind('.')
15

In my case, I use the following, which works for any complete file name that contains a dot:

filename_without_extension = complete_name[:complete_name.rfind('.')]
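
One edge case worth guarding against: when the name has no dot at all, rfind() returns -1, and slicing with [:-1] silently drops the last character. A minimal sketch of a guarded version follows (strip_extension is a hypothetical helper name); the standard library's os.path.splitext is shown alongside as a different, ready-made technique.

```python
import os.path

def strip_extension(name):
    """Drop the last extension from a file name, if there is one."""
    dot = name.rfind('.')
    # rfind() returns -1 when there is no dot; name[:-1] would cut the last character
    return name if dot == -1 else name[:dot]

print(strip_extension('toto.titi.tata..xls'))
print(strip_extension('README'))

# Standard-library alternative: splitext returns a (root, extension) pair
print(os.path.splitext('toto.titi.tata..xls'))
```

The two approaches are not identical in every case (for example, they treat names that start with a dot differently), so it is worth checking which behaviour you need.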

Answer 4

What happens when the string contains a duplicate character? In my experience with index(), a duplicate gives you back the same index, that of the first occurrence.

For example:

s = 'abccde'
for c in s:
    print('%s, %d' % (c, s.index(c)))

would print:

a, 0
b, 1
c, 2
c, 2
d, 4
e, 5

In that case you can do something like this:

for i, character in enumerate(s):
    # i is the position of the character in the string
    print('%s, %d' % (character, i))

Answer 5

string.find(character)  
string.index(character)  

Perhaps you’d like to have a look at the documentation to find out what the difference between the two is.


Answer 6

A character might appear multiple times in a string. For example, in the string sentence, the positions of e are 1, 4, 7 (because indexing starts from zero). But both find() and index() return only the first position of a character. This can be solved as follows:

def charposition(string, char):
    pos = [] #list to store positions for each 'char' in 'string'
    for n in range(len(string)):
        if string[n] == char:
            pos.append(n)
    return pos

s = "sentence"
print(charposition(s, 'e')) 

#Output: [1, 4, 7]

Answer 7

more_itertools.locate is a third-party tool that finds all indices of items that satisfy a condition.

Here we find all index locations of the letter "i".

import more_itertools as mit


s = "supercalifragilisticexpialidocious"
list(mit.locate(s, lambda x: x == "i"))
# [8, 13, 15, 18, 23, 26, 30]

Answer 8

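A solution with numpy for quick access to all indexes:

import numpy as np

my_string = 'ABCABC'  # example input, assumed for illustration
string_array = np.array(list(my_string))
# np.where returns a tuple; the index array is char_indexes[0]
char_indexes = np.where(string_array == 'C')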

Iterating each character in a string using Python

Question: Iterating each character in a string using Python

In C++, I can iterate over an std::string like this:

std::string str = "Hello World!";

for (int i = 0; i < str.length(); ++i)
{
    std::cout << str[i] << std::endl;
}

How do I iterate over a string in Python?


Answer 0

As Johannes pointed out,

for c in "string":
    #do something with c

You can iterate pretty much anything in python using the for loop construct,

for example, open("file.txt") returns a file object (and opens the file), iterating over it iterates over lines in that file

with open(filename) as f:
    for line in f:
        # do something with line

If that seems like magic, well it kinda is, but the idea behind it is really simple.

There’s a simple iterator protocol that can be applied to any kind of object to make the for loop work on it.

Simply implement an iterator that defines a next() method (named __next__() in Python 3), and implement an __iter__ method on a class to make it iterable. (The __iter__ method should, of course, return an iterator object, that is, an object that defines next().)

See official documentation
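
The protocol described above can be sketched in a few lines of Python 3. The Countdown class is a made-up example, not something from the standard library; it acts as its own iterator by returning self from __iter__.

```python
class Countdown:
    """Iterable that counts down from start to 1 (hypothetical example class)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # In Python 2 this method would be named next()
        if self.current <= 0:
            # Signals the for loop that there is nothing left
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)
```

The for loop calls iter() on the object once and then next() repeatedly until StopIteration is raised, which is exactly what happens behind the scenes when looping over a string or a file.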


Answer 1

If you need access to the index as you iterate through the string, use enumerate():

>>> for i, c in enumerate('test'):
...     print(i, c)
... 
0 t
1 e
2 s
3 t

Answer 2

Even easier:

for c in "test":
    print(c)

Answer 3

Just to make the answer more comprehensive: the C way of iterating over a string can be applied in Python too, if you really want to force a square peg into a round hole.

s = "test"
i = 0
while i < len(s):
    print(s[i])
    i += 1

But then again, why do that when strings are inherently iterable?

for c in s:
    print(c)

Answer 4

You can also do something like this, using a for loop over the indexes:

# suppose you have a variable called name
name = "Mr.Suryaa"
for index in range(len(name)):
    print(name[index])  # just like C and C++

The output is each character on its own line:

M
r
.
S
u
r
y
a
a

However, since a string is itself a sequence, you can iterate over name directly:

for e in name:
    print(e)

This produces the same result, looks better, and works with any iterable, such as a list, a tuple, or a dictionary (iterating over a dictionary yields its keys).

We have used two built-in functions (BIFs, as they are known in the Python community):

1) range() is used to create the indexes. For example,

for i in range(5):
    print(i)

produces 0, 1, 2, 3, 4.

2) len() is used to find the length of the given string.


Answer 5

If you would like to use a more functional approach to iterating over a string (perhaps to transform it somehow), you can split the string into characters, apply a function to each one, then join the resulting list of characters back into a string.

A string is inherently a sequence of characters, hence ‘map’ will iterate over the string (the second argument), applying the function (the first argument) to each character.

For example, here I use a simple lambda approach since all I want to do is a trivial modification to the character: here, to increment each character value:

>>> ''.join(map(lambda x: chr(ord(x)+1), "HAL"))
'IBM'

or more generally:

>>> ''.join(map(my_function, my_string))

where my_function takes a char value and returns a char value.
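
As an illustration of the general form, here is a sketch with a named function in place of the lambda. The function shift_letter is hypothetical and not part of the original answer; it shifts ASCII letters by one position and leaves every other character untouched.

```python
def shift_letter(ch):
    """Hypothetical per-character function: shift ASCII letters forward by one, wrapping z to a."""
    if 'a' <= ch <= 'z':
        base = ord('a')
    elif 'A' <= ch <= 'Z':
        base = ord('A')
    else:
        return ch  # digits, spaces and punctuation are left unchanged
    return chr((ord(ch) - base + 1) % 26 + base)

# map applies shift_letter to every character; join glues the results together
shifted = ''.join(map(shift_letter, "HAL 9000"))

# An equivalent generator expression, often preferred over map
shifted_genexp = ''.join(shift_letter(ch) for ch in "HAL 9000")

print(shifted)
```

Both forms build the same string; which one reads better is largely a matter of taste.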


Answer 6

Several answers here use range. In Python 2, xrange is generally better because it returns a lazy sequence object rather than a fully instantiated list, which matters where memory or iterables of widely varying lengths can be an issue. In Python 3, xrange no longer exists and range itself is lazy.
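
A minimal sketch of that laziness, assuming Python 3, where the built-in range plays the role xrange had in Python 2. The exact size reported by sys.getsizeof depends on the interpreter, so no specific figure is claimed here.

```python
import sys

big = range(10**9)  # no billion-element list is built

# The range object stays small regardless of how many values it represents
size_in_bytes = sys.getsizeof(big)
print(size_in_bytes)

# It still supports len(), indexing and membership tests like a sequence
print(len(big), big[-1], 999 in big)
```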


Answer 7

If you ever need to get the next character of a string using __next__(), remember to create a string iterator and iterate over that, not over the original string (a str has no __next__() method).

In this example, when I find a [ character I keep consuming the following characters until I find ], so I need to use __next__.

A plain for loop over the string wouldn’t help here:

myString = "'string' 4 '['RP0', 'LC0']' '[3, 4]' '[3, '4']'"
processedInput = ""
word_iterator = myString.__iter__()
for idx, char in enumerate(word_iterator):
    if char == "'":
        continue

    processedInput+=char

    if char == '[':
        next_char=word_iterator.__next__()
        while(next_char != "]"):
          processedInput+=next_char
          next_char=word_iterator.__next__()
        else:
          processedInput+=next_char

How can I fill out a Python string with spaces?

Question: How can I fill out a Python string with spaces?

I want to fill out a string with spaces. I know that the following works for zeros:

>>> print  "'%06d'"%4
'000004'

But what should I do when I want this?:

'hi    '

Of course I can measure the string length and do str+" "*leftover, but I’d like the shortest way.


Answer 0

You can do this with str.ljust(width[, fillchar]):

Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than or equal to len(s).

>>> 'hi'.ljust(10)
'hi        '
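
A short sketch of ljust() next to its siblings rjust() and center(), including the optional fill character and the case where the width is too small to pad. The variable names are illustrative only.

```python
word = 'hi'

left = word.ljust(6)          # pad on the right with spaces
right = word.rjust(6, '.')    # pad on the left with a custom fill character
middle = word.center(6, '*')  # pad on both sides
unchanged = word.ljust(1)     # width shorter than the string: returned as-is

print(repr(left), repr(right), repr(middle), repr(unchanged))
```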

Answer 1

For a flexible method that works even when formatting complicated strings, you should probably use the string-formatting mini-language, using either the str.format() method

>>> '{0: <16} StackOverflow!'.format('Hi')  # Python >=2.6
'Hi               StackOverflow!'

or f-strings

>>> f'{"Hi": <16} StackOverflow!'  # Python >= 3.6
'Hi               StackOverflow!'

Answer 2

The new(ish) string format method lets you do some fun stuff with nested keyword arguments. The simplest case:

>>> '{message: <16}'.format(message='Hi')
'Hi              '

If you want to pass in 16 as a variable:

>>> '{message: <{width}}'.format(message='Hi', width=16)
'Hi              '

If you want to pass in variables for the whole kit and kaboodle:

'{message:{fill}{align}{width}}'.format(
   message='Hi',
   fill=' ',
   align='<',
   width=16,
)

Which results in (you guessed it):

'Hi              '
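
The nested-field technique can be wrapped in a small helper. The function pad below is hypothetical (it is not part of the original answer); it only forwards its arguments into the nested replacement fields shown above.

```python
def pad(text, width, align='<', fill=' '):
    """Hypothetical helper: pad text to width using nested replacement fields."""
    return '{text:{fill}{align}{width}}'.format(
        text=text, fill=fill, align=align, width=width)

print(repr(pad('Hi', 6)))            # left-aligned, space-filled
print(repr(pad('Hi', 6, '>', '.')))  # right-aligned, dot-filled
print(repr(pad('Hi', 6, '^', '*')))  # centered, star-filled
```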

Answer 3

You can try this (Python 2 print syntax; the - flag left-aligns within a width of 100):

print "'%-100s'" % 'hi'

Answer 4

The correct way of doing this would be to use Python’s format syntax as described in the official documentation.

For this case it would simply be:
'{:10}'.format('hi')
which outputs:
'hi        '

Explanation:

format_spec ::=  [[fill]align][sign][#][0][width][,][.precision][type]
fill        ::=  <any character>
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  integer
precision   ::=  integer
type        ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

Pretty much all you need to know is there ^.

Update: as of Python 3.6 it’s even more convenient with literal string interpolation!

foo = 'foobar'
print(f'{foo:10} is great!')
# foobar     is great!
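
One detail of the format specification worth seeing in action: when no align character is given, strings are left-aligned but numbers are right-aligned. A small sketch with illustrative variable names:

```python
text_padded = '{:6}'.format('hi')    # strings are left-aligned by default
number_padded = '{:6}'.format(42)    # numbers are right-aligned by default
number_left = '{:<6}'.format(42)     # an explicit '<' forces left alignment
zero_padded = '{:06d}'.format(42)    # a '0' before the width pads with zeros

for value in (text_padded, number_padded, number_left, zero_padded):
    print(repr(value))
```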

Answer 5

Use str.ljust():

>>> 'Hi'.ljust(6)
'Hi    '

You should also consider str.zfill(), str.rjust() and str.center() for string formatting. These can be chained, and ljust(), rjust() and center() accept an optional fill character, thus:

>>> ('3'.zfill(8) + 'blind'.rjust(8) + 'mice'.ljust(8, '.')).center(40)
'        00000003   blindmice....        '

These string formatting operations have the advantage of working in Python v2 and v3.

Take a look at pydoc str sometime: there’s a wealth of good stuff in there.


Answer 6

As of Python 3.6 you can just do

>>> strng = 'hi'
>>> f'{strng: <10}'

with literal string interpolation.

Or, if your padding size is in a variable, like this (thanks @Matt M.!):

>>> to_pad = 10
>>> f'{strng: <{to_pad}}'

Answer 7

You can also center your string:

'{0: ^20}'.format('nice')

Answer 8

Use Python 2.7’s mini formatting for strings:

'{0: <8}'.format('123')

This left-aligns the value and pads it to 8 characters with the space character.


Answer 9

Just remove the 0 and it will pad with spaces instead (on the left, since %d right-aligns by default):

>>> print  "'%6d'"%4

Answer 10

Wouldn’t it be more pythonic to use slicing?

For example, to pad a string with spaces on the right until it’s 10 characters long:

>>> x = "string"    
>>> (x + " " * 10)[:10]   
'string    '

To pad it with spaces on the left until it’s 15 characters long:

>>> (" " * 15 + x)[-15:]
'         string'

It requires knowing how long you want to pad to, of course, but it doesn’t require measuring the length of the string you’re starting with. Note that a string longer than the target width is truncated by the slice.
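
One difference from str.ljust() is easy to miss: slicing forces the result to the exact width, so longer strings lose characters. A minimal sketch comparing the two, with illustrative variable names:

```python
width = 10
short = "string"
long_text = "a much longer string"

sliced_short = (short + " " * width)[:width]     # padded to exactly 10 characters
sliced_long = (long_text + " " * width)[:width]  # cut down to exactly 10 characters
ljust_long = long_text.ljust(width)              # ljust never shortens its input

print(repr(sliced_short))
print(repr(sliced_long))
print(repr(ljust_long))
```

Whether the truncation is a bug or a feature depends on whether the output must have a fixed width.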


Answer 11

A nice trick to use in place of the various print formats:

(1) Pad with spaces to the right:

('hi' + '        ')[:8]

(2) Pad with leading zeros on the left:

('0000' + str(2))[-4:]

Answer 12

You could do it with a list comprehension as a one-liner. Note that joining the 9 single-space items with a space separator yields 17 spaces (9 items plus 8 separators):

"hello" + " ".join([" " for x in range(1,10)])
output --> 'hello                 '

Remove the final character from a string

Question: Remove the final character from a string

Let’s say my string is 10 characters long.

How do I remove the last character?

If my string is "abcdefghij" (I do not want to replace the 'j' character, since my string may contain multiple 'j' characters) I only want the last character gone. Regardless of what it is or how many times it occurs, I need to remove the last character from my string.


Answer 0

Simple:

st =  "abcdefghij"
st = st[:-1]

There is also another way that shows how it is done with steps:

list1 = "abcdefghij"
list2 = list(list1)
print(list2)
list3 = list2[:-1]
print(list3)

This is also a way with user input:

list1 = input ("Enter :")
list2 = list(list1)
print(list2)
list3 = list2[:-1]
print(list3)

To remove the last word of the input instead:

list1 = input("Enter :")
list2 = list1.split()
print(list2)
list3 = list2[:-1]
print(list3)

Answer 1

What you are trying to do is an extension of string slicing in Python:

Say all strings are of length 10 and the last character is to be removed:

>>> st[:9]
'abcdefghi'

To remove last N characters:

>>> N = 3
>>> st[:-N]
'abcdefg'
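
One edge case of st[:-N]: when N is 0 the slice becomes st[:0], which is the empty string rather than the whole string. A minimal sketch of a guard, using a hypothetical helper name drop_last:

```python
def drop_last(s, n=1):
    """Hypothetical helper: return s without its last n characters."""
    if n <= 0:
        return s  # s[:-0] would be s[:0], i.e. an empty string
    return s[:-n]

st = "abcdefghij"
print(drop_last(st))      # last character removed
print(drop_last(st, 3))   # last three characters removed
print(drop_last(st, 0))   # nothing removed
```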

Convert a Unicode string to a string in Python (containing extra symbols)

Question: Convert a Unicode string to a string in Python (containing extra symbols)

How do you convert a Unicode string (containing extra characters like £ $, etc.) into a Python string?


Answer 0

>>> import unicodedata
>>> title = u"Klüft skräms inför på fédéral électoral große"
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
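
The snippet above shows Python 2 output. A sketch of the same idea under Python 3, where encode() returns bytes and an extra decode() is needed to get a str back (the variable names decomposed and ascii_title are illustrative):

```python
import unicodedata

title = "Klüft skräms inför på fédéral électoral große"

# NFKD splits accented letters into a base letter plus combining marks
decomposed = unicodedata.normalize('NFKD', title)

# encode() drops what ASCII cannot represent; decode() turns the bytes back into str
ascii_title = decomposed.encode('ascii', 'ignore').decode('ascii')

print(ascii_title)
```

Characters with no decomposition, such as ß, are dropped entirely, which is why "große" comes out as "groe".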

Answer 1

You can use encode to ASCII if you don’t need to translate the non-ASCII characters:

>>> a=u"aaaàçççñññ"
>>> type(a)
<type 'unicode'>
>>> a.encode('ascii','ignore')
'aaa'
>>> a.encode('ascii','replace')
'aaa???????'
>>>

Answer 2

>>> text=u'abcd'
>>> str(text)
'abcd'

This works if the string contains only ASCII characters.


Answer 3

If you have a Unicode string, and you want to write this to a file, or other serialised form, you must first encode it into a particular representation that can be stored. There are several common Unicode encodings, such as UTF-16 (uses two bytes for most Unicode characters) or UTF-8 (1-4 bytes / codepoint depending on the character), etc. To convert that string into a particular encoding, you can use:

>>> s= u'£10'
>>> s.encode('utf8')
'\xc2\xa310'
>>> s.encode('utf16')
'\xff\xfe\xa3\x001\x000\x00'

This raw string of bytes can be written to a file. However, note that when reading it back, you must know what encoding it is in and decode it using that same encoding.

When writing to files, you can get rid of this manual encode/decode process by using the codecs module. So, to open a file that encodes all Unicode strings into UTF-8, use:

import codecs
f = codecs.open('path/to/file.txt','w','utf8')
f.write(my_unicode_string)  # Stored on disk as UTF-8

Do note that anything else that is using these files must understand what encoding the file is in if they want to read them. If you are the only one doing the reading/writing this isn’t a problem, otherwise make sure that you write in a form understandable by whatever else uses the files.

In Python 3, this form of file access is the default, and the built-in open function will take an encoding parameter and always translate to/from Unicode strings (the default string object in Python 3) for files opened in text mode.
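
A minimal Python 3 sketch of that round trip, writing to a throwaway temporary directory (the file name example.txt is arbitrary). Reading the file back in binary mode shows the UTF-8 bytes that were actually stored.

```python
import os
import tempfile

text = '£10'
path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # throwaway location

# Text mode with an explicit encoding: str in, encoded bytes on disk
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Binary mode shows the stored bytes without any decoding
with open(path, 'rb') as f:
    raw = f.read()

# Text mode with the same encoding decodes them back into a str
with open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(raw, round_trip)
```

Passing encoding explicitly avoids depending on the platform default.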


Answer 4

Here is an example:

>>> u = u'€€€'
>>> s = u.encode('utf8')
>>> s
'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'

Answer 5

Well, if you’re willing and ready to switch to Python 3 (which you may not be, due to the backwards incompatibility with some Python 2 code), you don’t have to do any converting; all text in Python 3 is represented with Unicode strings, which also means the u'<text>' prefix is no longer needed. You also have what are, in effect, strings of bytes, which are used to represent data (which may be an encoded string).

http://docs.python.org/3.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

(Of course, if you’re currently using Python 3, then the problem is likely something to do with how you’re attempting to save the text to a file.)


Answer 6

Here is some example code:

import unicodedata    
raw_text = u"here $%6757 dfgdfg"
convert_text = unicodedata.normalize('NFKD', raw_text).encode('ascii','ignore')

Answer 7

The file contains a unicode-escaped string:

\"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0437\\u0430\\u0446\\u0438\\u044f .....\",

What worked for me:

 f = open("56ad62-json.log", encoding="utf-8")
 qq=f.readline() 

 print(qq)                          
 {"log":\"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0440\\u0438\\u0437\\u0430\\u0446\\u0438\\u044f \\u043f\\u043e\\u043b\\u044c\\u0437\\u043e\\u0432\\u0430\\u0442\\u0435\\u043b\\u044f\"}

(qq.encode().decode("unicode-escape").encode().decode("unicode-escape")) 
# '{"log":"message": "Авторизация пользователя"}\n'

Answer 8

None of the answers worked for my case, where I had a string variable containing Unicode escape sequences, and none of the encode/decode approaches explained here did the job.

If I run this in a terminal

echo "no me llama mucho la atenci\u00f3n"

or

python3
>>> print("no me llama mucho la atenci\u00f3n")

The output is correct:

output: no me llama mucho la atención

But scripts that loaded this string into a variable didn’t work.

This is what worked in my case, in case it helps anybody:

import json

# raw string: the variable holds a literal backslash-u escape sequence
string_to_convert = r"no me llama mucho la atenci\u00f3n"
print(json.dumps(json.loads('"%s"' % string_to_convert), ensure_ascii=False))
# output: "no me llama mucho la atención"
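
A sketch of two ways to decode a literal \uXXXX sequence held in a variable, assuming Python 3 and pure-ASCII input. The variable names are illustrative, and the unicode_escape route is only shown for ASCII text, since it can mangle input that already contains non-ASCII characters.

```python
import json

# A raw string keeps the backslash, so this holds the six characters \u00f3
escaped = r"no me llama mucho la atenci\u00f3n"

# Route 1: treat the text as the body of a JSON string literal
via_json = json.loads('"%s"' % escaped)

# Route 2: the unicode_escape codec (used here only because the input is pure ASCII)
via_codec = escaped.encode('ascii').decode('unicode_escape')

print(via_json)
print(via_codec)
```

The JSON route fails if the text itself contains unescaped double quotes, so neither approach is universal.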

How can I check if a string represents an int, without using try/except?

Question: How can I check if a string represents an int, without using try/except?

Is there any way to tell whether a string represents an integer (e.g., '3', '-17' but not '3.14' or 'asfasfas') without using a try/except mechanism?

is_int('3.14') = False
is_int('-7')   = True

Answer 0

If you’re really just annoyed at using try/excepts all over the place, please just write a helper function:

def RepresentsInt(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

>>> print RepresentsInt("+123")
True
>>> print RepresentsInt("10.0")
False

It’s going to be WAY more code to exactly cover all the strings that Python considers integers. I say just be pythonic on this one.
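
To give a feel for why matching int() exactly takes so much code, here is a Python 3 rendering of the helper (renamed represents_int for this sketch) probed with a few inputs. The sample lists are illustrative, and the underscore case assumes Python 3.6 or later.

```python
def represents_int(s):
    """Return True if int(s) succeeds for the string s."""
    try:
        int(s)
    except ValueError:
        return False
    return True

# Strings int() accepts: surrounding whitespace, signs, digit-group
# underscores (Python 3.6+) and non-ASCII decimal digits
accepted = [' 42 ', '+7', '-0', '1_000', '٤٢']

# Strings int() rejects in base 10
rejected = ['4.2', '', '1__0', '0x1A', 'four']

for s in accepted + rejected:
    print(repr(s), represents_int(s))
```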


Answer 1

With positive integers you could use .isdigit:

>>> '16'.isdigit()
True

It doesn’t work with negative integers, though. You could try the following:

>>> s = '-17'
>>> s.startswith('-') and s[1:].isdigit()
True

It won’t work with the '16.0' format, which is similar to int casting in this sense.

edit:

def check_int(s):
    # the "s and" guard avoids an IndexError on an empty string
    if s and s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()
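
A caveat with str.isdigit(): it returns True for some characters that int() cannot parse, such as the superscript two. A hedged sketch of a stricter check that accepts only an optional sign followed by ASCII digits; the name is_int_str is hypothetical, and this check is deliberately narrower than what int() accepts (no surrounding whitespace, no underscores).

```python
def is_int_str(s):
    """Hypothetical stricter check: optional sign followed by ASCII digits only."""
    if s[:1] in ('-', '+'):  # s[:1] is '' for an empty string, so no IndexError
        s = s[1:]
    return s != '' and all(c in '0123456789' for c in s)

superscript_two = '\u00b2'
print(superscript_two.isdigit())    # isdigit() accepts the superscript digit
print(is_int_str(superscript_two))  # the stricter check does not

for candidate in ('3', '-17', '+5', '3.14', '', '-'):
    print(repr(candidate), is_int_str(candidate))
```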

Answer 2

You know, I’ve found (and I’ve tested this over and over) that try/except does not perform all that well, for whatever reason. I frequently try several ways of doing things, and I don’t think I’ve ever found a method that uses try/except to perform the best of those tested, in fact it seems to me those methods have usually come out close to the worst, if not the worst. Not in every case, but in many cases. I know a lot of people say it’s the “Pythonic” way, but that’s one area where I part ways with them. To me, it’s neither very performant nor very elegant, so, I tend to only use it for error trapping and reporting.

I was going to gripe that PHP, perl, ruby, C, and even the freaking shell have simple functions for testing a string for integer-hood, but due diligence in verifying those assumptions tripped me up! Apparently this lack is a common sickness.

Here’s a quick and dirty edit of Bruno’s post:

import sys, time, re

g_intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")

testvals = [
    # integers
    0, 1, -1, 1.0, -1.0,
    '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0', '06',
    # non-integers
    'abc 123',
    1.1, -1.1, '1.1', '-1.1', '+1.1',
    '1.1.1', '1.1.0', '1.0.1', '1.0.0',
    '1.0.', '1..0', '1..',
    '0.0.', '0..0', '0..',
    'one', object(), (1,2,3), [1,2,3], {'one':'two'},
    # with spaces
    ' 0 ', ' 0.', ' .0','.01 '
]

def isInt_try(v):
    try:     i = int(v)
    except:  return False
    return True

def isInt_str(v):
    v = str(v).strip()
    return v=='0' or (v if v.find('..') > -1 else v.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

def isInt_re(v):
    import re
    if not hasattr(isInt_re, 'intRegex'):
        isInt_re.intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
    return isInt_re.intRegex.match(str(v).strip()) is not None

def isInt_re2(v):
    return g_intRegex.match(str(v).strip()) is not None

def check_int(s):
    s = str(s)
    if s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()    


def timeFunc(func, times):
    t1 = time.time()
    for n in range(times):
        for v in testvals: 
            r = func(v)
    t2 = time.time()
    return t2 - t1

def testFuncs(funcs):
    for func in funcs:
        sys.stdout.write( "\t%s\t|" % func.__name__)
    print()
    for v in testvals:
        if type(v) == type(''):
            sys.stdout.write("'%s'" % v)
        else:
            sys.stdout.write("%s" % str(v))
        for func in funcs:
            sys.stdout.write( "\t\t%s\t|" % func(v))
        sys.stdout.write("\r\n") 

if __name__ == '__main__':
    print()
    print("tests..")
    testFuncs((isInt_try, isInt_str, isInt_re, isInt_re2, check_int))
    print()

    print("timings..")
    print("isInt_try:   %6.4f" % timeFunc(isInt_try, 10000))
    print("isInt_str:   %6.4f" % timeFunc(isInt_str, 10000)) 
    print("isInt_re:    %6.4f" % timeFunc(isInt_re, 10000))
    print("isInt_re2:   %6.4f" % timeFunc(isInt_re2, 10000))
    print("check_int:   %6.4f" % timeFunc(check_int, 10000))

Here are the performance comparison results:

timings..
isInt_try:   0.6426
isInt_str:   0.7382
isInt_re:    1.1156
isInt_re2:   0.5344
check_int:   0.3452

A C method could scan it Once Through, and be done. A C method that scans the string once through would be the Right Thing to do, I think.

EDIT:

I’ve updated the code above to work in Python 3.5, and to include the check_int function from the currently most voted up answer, and to use the current most popular regex that I can find for testing for integer-hood. This regex rejects strings like ‘abc 123’. I’ve added ‘abc 123’ as a test value.

It is Very Interesting to me to note, at this point, that NONE of the functions tested, including the try method, the popular check_int function, and the most popular regex for testing for integer-hood, return the correct answers for all of the test values (well, depending on what you think the correct answers are; see the test results below).

The built-in int() function silently truncates the fractional part of a floating point number and returns the integer part before the decimal, unless the floating point number is first converted to a string.

The check_int() function returns false for values like 0.0 and 1.0 (which technically are integers) and returns true for values like ’06’.

Here are the current (Python 3.5) test results:

                  isInt_try |       isInt_str       |       isInt_re        |       isInt_re2       |   check_int   |
    0               True    |               True    |               True    |               True    |       True    |
    1               True    |               True    |               True    |               True    |       True    |
    -1              True    |               True    |               True    |               True    |       True    |
    1.0             True    |               True    |               False   |               False   |       False   |
    -1.0            True    |               True    |               False   |               False   |       False   |
    '0'             True    |               True    |               True    |               True    |       True    |
    '0.'            False   |               True    |               False   |               False   |       False   |
    '0.0'           False   |               True    |               False   |               False   |       False   |
    '1'             True    |               True    |               True    |               True    |       True    |
    '-1'            True    |               True    |               True    |               True    |       True    |
    '+1'            True    |               True    |               True    |               True    |       True    |
    '1.0'           False   |               True    |               False   |               False   |       False   |
    '-1.0'          False   |               True    |               False   |               False   |       False   |
    '+1.0'          False   |               True    |               False   |               False   |       False   |
    '06'            True    |               True    |               False   |               False   |       True    |
    'abc 123'       False   |               False   |               False   |               False   |       False   |
    1.1             True    |               False   |               False   |               False   |       False   |
    -1.1            True    |               False   |               False   |               False   |       False   |
    '1.1'           False   |               False   |               False   |               False   |       False   |
    '-1.1'          False   |               False   |               False   |               False   |       False   |
    '+1.1'          False   |               False   |               False   |               False   |       False   |
    '1.1.1'         False   |               False   |               False   |               False   |       False   |
    '1.1.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.1'         False   |               False   |               False   |               False   |       False   |
    '1.0.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.'          False   |               False   |               False   |               False   |       False   |
    '1..0'          False   |               False   |               False   |               False   |       False   |
    '1..'           False   |               False   |               False   |               False   |       False   |
    '0.0.'          False   |               False   |               False   |               False   |       False   |
    '0..0'          False   |               False   |               False   |               False   |       False   |
    '0..'           False   |               False   |               False   |               False   |       False   |
    'one'           False   |               False   |               False   |               False   |       False   |
    <obj..>         False   |               False   |               False   |               False   |       False   |
    (1, 2, 3)       False   |               False   |               False   |               False   |       False   |
    [1, 2, 3]       False   |               False   |               False   |               False   |       False   |
    {'one': 'two'}  False   |               False   |               False   |               False   |       False   |
    ' 0 '           True    |               True    |               True    |               True    |       False   |
    ' 0.'           False   |               True    |               False   |               False   |       False   |
    ' .0'           False   |               False   |               False   |               False   |       False   |
    '.01 '          False   |               False   |               False   |               False   |       False   |

Just now I tried adding this function:

def isInt_float(s):
    try:
        return float(str(s)).is_integer()
    except:
        return False

It performs almost as well as check_int (0.3486), and it returns True for values like 1.0, 0.0, +1.0, 0., .0 and so on. But it also returns True for '06', so pick your poison, I guess.


Answer 3

str.isdigit() should do the trick.

Examples:

str.isdigit("23") ## True
str.isdigit("abc") ## False
str.isdigit("23.4") ## False

EDIT: As @BuzzMoschetti pointed out, this way will fail for minus number (e.g, “-23”). In case your input_num can be less than 0, use re.sub(regex_search,regex_replace,contents) before applying str.isdigit(). For example:

import re
input_num = "-23"
input_num = re.sub("^-", "", input_num) ## "^" indicates to remove the first "-" only
str.isdigit(input_num) ## True

Answer 4

Use a regular expression:

import re
def RepresentsInt(s):
    return re.match(r"[-+]?\d+$", s) is not None

If you must accept decimal fractions also:

def RepresentsInt(s):
    return re.match(r"[-+]?\d+(\.0*)?$", s) is not None

For improved performance if you’re doing this often, compile the regular expression only once using re.compile().
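As a rough sketch of the re.compile() suggestion above, the pattern can be compiled once at module level and reused on every call. The names _INT_RE and represents_int are illustrative, not part of the original answer. re.fullmatch (Python 3.4+) is used here instead of a trailing $, because $ also matches just before a final newline.

```python
import re

# Compiled once at import time; _INT_RE is an illustrative name.
_INT_RE = re.compile(r"[-+]?\d+")

def represents_int(s):
    # fullmatch requires the entire string to match the pattern,
    # so neither '^' nor '$' is needed.
    return _INT_RE.fullmatch(s) is not None
```

With this version, represents_int("12\n") is rejected, whereas re.match with a trailing $ would accept it.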


Answer 5

The proper RegEx solution would combine the ideas of Greg Hewgill and Nowell, but not use a global variable. You can accomplish this by attaching an attribute to the method. Also, I know that it is frowned upon to put imports in a method, but what I’m going for is a “lazy module” effect like http://peak.telecommunity.com/DevCenter/Importing#lazy-imports

edit: My favorite technique so far is to use exclusively methods of the String object.

#!/usr/bin/env python

# Uses exclusively methods of the String object
def isInteger(i):
    i = str(i)
    return i=='0' or (i if i.find('..') > -1 else i.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

# Uses re module for regex
def isIntegre(i):
    import re
    if not hasattr(isIntegre, '_re'):
        print("I compile only once. Remove this line when you are confident in that.")
        isIntegre._re = re.compile(r"[-+]?\d+(\.0*)?$")
    return isIntegre._re.match(str(i)) is not None

# When executed directly run Unit Tests
if __name__ == '__main__':
    for obj in [
                # integers
                0, 1, -1, 1.0, -1.0,
                '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0',
                # non-integers
                1.1, -1.1, '1.1', '-1.1', '+1.1',
                '1.1.1', '1.1.0', '1.0.1', '1.0.0',
                '1.0.', '1..0', '1..',
                '0.0.', '0..0', '0..',
                'one', object(), (1,2,3), [1,2,3], {'one':'two'}
            ]:
        # Notice the integre uses 're' (intended to be humorous)
        integer = ('an integer' if isInteger(obj) else 'NOT an integer')
        integre = ('an integre' if isIntegre(obj) else 'NOT an integre')
        # Make strings look like strings in the output
        if isinstance(obj, str):
            obj = ("'%s'" % (obj,))
        print("%30s is %14s is %14s" % (obj, integer, integre))

And for the less adventurous members of the class, here is the output:

I compile only once. Remove this line when you are confident in that.
                             0 is     an integer is     an integre
                             1 is     an integer is     an integre
                            -1 is     an integer is     an integre
                           1.0 is     an integer is     an integre
                          -1.0 is     an integer is     an integre
                           '0' is     an integer is     an integre
                          '0.' is     an integer is     an integre
                         '0.0' is     an integer is     an integre
                           '1' is     an integer is     an integre
                          '-1' is     an integer is     an integre
                          '+1' is     an integer is     an integre
                         '1.0' is     an integer is     an integre
                        '-1.0' is     an integer is     an integre
                        '+1.0' is     an integer is     an integre
                           1.1 is NOT an integer is NOT an integre
                          -1.1 is NOT an integer is NOT an integre
                         '1.1' is NOT an integer is NOT an integre
                        '-1.1' is NOT an integer is NOT an integre
                        '+1.1' is NOT an integer is NOT an integre
                       '1.1.1' is NOT an integer is NOT an integre
                       '1.1.0' is NOT an integer is NOT an integre
                       '1.0.1' is NOT an integer is NOT an integre
                       '1.0.0' is NOT an integer is NOT an integre
                        '1.0.' is NOT an integer is NOT an integre
                        '1..0' is NOT an integer is NOT an integre
                         '1..' is NOT an integer is NOT an integre
                        '0.0.' is NOT an integer is NOT an integre
                        '0..0' is NOT an integer is NOT an integre
                         '0..' is NOT an integer is NOT an integre
                         'one' is NOT an integer is NOT an integre
<object object at 0x103b7d0a0> is NOT an integer is NOT an integre
                     (1, 2, 3) is NOT an integer is NOT an integre
                     [1, 2, 3] is NOT an integer is NOT an integre
                {'one': 'two'} is NOT an integer is NOT an integre

Answer 6

>>> "+7".lstrip("-+").isdigit()
True
>>> "-7".lstrip("-+").isdigit()
True
>>> "7".lstrip("-+").isdigit()
True
>>> "13.4".lstrip("-+").isdigit()
False

So your function would be:

def is_int(val):
   # Drop at most one leading sign; indexing val[1] would raise IndexError on "7"
   digits = val[1:] if val[:1] in ("-", "+") else val
   return digits.isdigit()

Answer 7

Greg Hewgill’s approach was missing a few components: the leading “^” to match only at the start of the string, and compiling the regex beforehand. But this approach will allow you to avoid a try: except:

import re
INT_RE = re.compile(r"^[-]?\d+$")
def RepresentsInt(s):
    return INT_RE.match(str(s)) is not None

I would be interested to know why you are trying to avoid try: except.


Answer 8

I have to do this all the time, and I have a mild but admittedly irrational aversion to using the try/except pattern. I use this:

all([xi in '1234567890' for xi in x])

It doesn’t accommodate negative numbers, so you could strip out one minus sign (if any), and then check if the result comprises digits from 0-9:

all([xi in '1234567890' for xi in x.replace('-', '', 1)])

You could also pass x to str() if you’re not sure the input is a string:

all([xi in '1234567890' for xi in str(x).replace('-', '', 1)])

There are at least two (edge?) cases where this falls apart:

  1. It doesn’t work for various scientific and/or exponential notations (e.g. 1.2E3, 10^3, etc.); both will return False. I don’t think other answers accommodated this either. Note that in Python itself, 1E2 is a float literal, whereas ^ is the bitwise XOR operator rather than exponentiation, so type(1E2) gives <class 'float'> while type(10^2) gives <class 'int'> (10^2 evaluates to 8).
  2. An empty string input gives True.

So it won’t work for every possible input, but if you can exclude scientific notation, exponential notation, and empty strings, it’s an OK one-line check that returns False if x is not an integer and True if x is an integer.

I don’t know if it’s pythonic, but it’s one line, and it’s relatively clear what the code does.
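A minimal sketch of how the one-liner could be wrapped to close the empty-string gap noted above. The function name is_int_chars is hypothetical. It also removes the minus sign only when it is the first character, since replace('-', '', 1) would remove a hyphen anywhere in the string.

```python
def is_int_chars(x):
    s = str(x)
    # Drop a single leading minus sign, if present.
    if s.startswith('-'):
        s = s[1:]
    # all() over an empty string is True, so reject '' explicitly.
    return s != '' and all(c in '0123456789' for c in s)
```

This keeps the character-membership idea of the original and still avoids try/except.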


Answer 9

I think

s.startswith('-') and s[1:].isdigit()

would be better rewritten as:

s.replace('-', '').isdigit()

because s[1:] also creates a new string. (Note that replace removes every '-', so a string such as '1-2' would also pass.)

But a much better solution is:

s.lstrip('+-').isdigit()

Answer 10

I really liked Shavais’ post, but I added one more test case (and the built-in isdigit() function):

def isInt_loop(v):
    v = str(v).strip()
    # swapping '0123456789' for '9876543210' makes nominal difference (might have because '1' is toward the beginning of the string)
    numbers = '0123456789'
    for i in v:
        if i not in numbers:
            return False
    return True

def isInt_Digit(v):
    v = str(v).strip()
    return v.isdigit()

and they consistently beat the times of the rest by a significant margin:

timings..
isInt_try:   0.4628
isInt_str:   0.3556
isInt_re:    0.4889
isInt_re2:   0.2726
isInt_loop:   0.1842
isInt_Digit:   0.1577

using stock Python 2.7:

$ python --version
Python 2.7.10

Both of the functions I added (isInt_loop and isInt_Digit) pass exactly the same test cases (they both accept only unsigned integers), but I thought that people could be more clever in modifying the string implementation (isInt_loop) than the built-in isdigit() function, so I included it even though there’s a slight difference in execution time. (Both methods beat everything else by a lot, but they don’t handle the extra characters: “.”, “+”, “-”.)

Also, I did find it interesting to note that the regex (isInt_re2 method) beat the string comparison in the same test that was performed by Shavais in 2012 (currently 2018). Maybe the regex libraries have been improved?
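The timing harness itself is not shown in this answer. The following is only a sketch of how such a comparison could be run with the standard timeit module. The names time_func and is_int_digit and the sample values are assumptions for illustration, and absolute numbers will differ by machine and Python version.

```python
import timeit

def is_int_digit(v):
    # Same logic as isInt_Digit above.
    return str(v).strip().isdigit()

def time_func(func, times=10000):
    # Hypothetical harness: total seconds needed to run func over a
    # small sample of inputs, repeated `times` times.
    sample = ['0', '1', '-1', '1.0', '06', 'abc 123', ' 0 ']
    return timeit.timeit(lambda: [func(v) for v in sample], number=times)
```

Calling print("is_int_digit: %6.4f" % time_func(is_int_digit)) prints a line in the same format as the timings above.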


Answer 11

This is probably the most straightforward and pythonic way to approach it in my opinion. I didn’t see this solution and it’s basically the same as the regex one, but without the regex.

def is_int(test):
    import string
    return not (set(test) - set(string.digits))
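One caveat with the set difference is that an empty string has no characters outside string.digits, so it is reported as an integer. A minimal sketch of a variant that rejects it follows. The names _DIGITS and is_unsigned_int are hypothetical, and like the original it accepts unsigned ASCII digits only.

```python
import string

# Built once; frozenset makes the intent (read-only lookup set) explicit.
_DIGITS = frozenset(string.digits)

def is_unsigned_int(text):
    # A non-empty string whose characters are all ASCII digits.
    return bool(text) and set(text) <= _DIGITS
```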

Answer 12

Here is a function that parses without raising errors. It handles the obvious cases and returns None on failure (it handles up to 2000 ‘-/+’ signs by default on CPython!):

#!/usr/bin/env python

def get_int(number):
    splits = number.split('.')
    if len(splits) > 2:
        # too many splits
        return None
    if len(splits) == 2 and splits[1]:
        # handle decimal part recursively :-)
        if get_int(splits[1]) != 0:
            return None

    int_part = splits[0].lstrip("+")
    if int_part.startswith('-'):
        # handle minus sign recursively :-)
        inner = get_int(int_part[1:])
        return None if inner is None else -inner
    # return None (not False) when the integer part is not all digits
    return int(int_part) if int_part.isdigit() else None

Some tests:

tests = ["0", "0.0", "0.1", "1", "1.1", "1.0", "-1", "-1.1", "-1.0", "-0", "--0", "---3", '.3', '--3.', "+13", "+-1.00", "--+123", "-0.000"]

for t in tests:
    print("get_int(%s) = %s" % (t, get_int(str(t))))

Results:

get_int(0) = 0
get_int(0.0) = 0
get_int(0.1) = None
get_int(1) = 1
get_int(1.1) = None
get_int(1.0) = 1
get_int(-1) = -1
get_int(-1.1) = None
get_int(-1.0) = -1
get_int(-0) = 0
get_int(--0) = 0
get_int(---3) = -3
get_int(.3) = None
get_int(--3.) = 3
get_int(+13) = 13
get_int(+-1.00) = -1
get_int(--+123) = 123
get_int(-0.000) = 0

For your needs you can use:

def int_predicate(number):
     return get_int(number) is not None

Answer 13

I suggest the following:

import ast

def is_int(s):
    return isinstance(ast.literal_eval(s), int)

From the docs:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

I should note that this will raise a ValueError exception when called against anything that does not constitute a Python literal. Since the question asked for a solution without try/except, I have a Kobayashi-Maru type solution for that:

from ast import literal_eval
from contextlib import suppress

def is_int(s):
    with suppress(ValueError):
        return isinstance(literal_eval(s), int)
    return False

¯\_(ツ)_/¯
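Two details are worth keeping in mind with this approach: ast.literal_eval raises SyntaxError (not ValueError) for input that is not valid Python syntax, such as '1 2', and bool is a subclass of int, so 'True' would pass an isinstance(..., int) check. A hedged sketch that covers both follows. The name is_int_literal is illustrative.

```python
from ast import literal_eval
from contextlib import suppress

def is_int_literal(s):
    # Suppress both failure modes of literal_eval on bad input.
    with suppress(ValueError, SyntaxError):
        value = literal_eval(s)
        # Exclude True/False, which are instances of int.
        return isinstance(value, int) and not isinstance(value, bool)
    return False
```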


Answer 14

I have one possibility that doesn’t use int at all, and should not raise an exception unless the string does not represent a number:

float(number)==float(number)//1

It should work for any kind of string that float accepts, positive, negative, engineering notation…
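As a sketch, the expression can be wrapped in a function that also covers the case where float() rejects the input. The name is_whole_number is hypothetical. This version uses try/except only around the conversion, which is a departure from the exception-free spirit of the question.

```python
def is_whole_number(number):
    try:
        value = float(number)
    except (TypeError, ValueError):
        # float() rejects non-numeric strings and unsupported types.
        return False
    # A whole number equals its own floor.
    return value == value // 1
```

Because the check goes through float, strings such as '1e3' and '2.0' count as whole numbers here.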


Answer 15

I guess the question is related to speed, since try/except has a time penalty:

 test data

First, I created a list of 200 strings, 100 failing strings and 100 numeric strings.

import numpy as np  # needed for np.array below
from random import shuffle
numbers = [u'+1'] * 100
nonumbers = [u'1abc'] * 100
testlist = numbers + nonumbers
shuffle(testlist)
testlist = np.array(testlist)

 numpy solution (only works with arrays and unicode)

np.core.defchararray.isnumeric can also work with unicode strings, e.g. np.core.defchararray.isnumeric(u'+12'), but it returns an array. So it’s a good solution if you have to do thousands of conversions and have missing or non-numeric data. Note that it applies str.isnumeric element-wise, so a signed string such as u'+12' yields False.

import numpy as np
%timeit np.core.defchararray.isnumeric(testlist)
10000 loops, best of 3: 27.9 µs per loop # 200 numbers per loop

try/except

def check_num(s):
  try:
    int(s)
    return True
  except:
    return False

def check_list(l):
  return [check_num(e) for e in l]

%timeit check_list(testlist)
1000 loops, best of 3: 217 µs per loop # 200 numbers per loop

It seems that the numpy solution is much faster.


Answer 16

If you want to accept lower-ascii digits only, here are tests to do so:

Python 3.7+: (u.isdecimal() and u.isascii())

Python <= 3.6: (u.isdecimal() and u == str(int(u)))

Other answers suggest using .isdigit() or .isdecimal() but these both include some upper-unicode characters such as '٢' (u'\u0662'):

u = u'\u0662'     # '٢'
u.isdigit()       # True
u.isdecimal()     # True
u.isascii()       # False (Python 3.7+ only)
u == str(int(u))  # False
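A minimal sketch that packages the Python 3.7+ test as a function and, as an added assumption beyond the original answer, allows one optional leading sign. The name is_ascii_int is hypothetical.

```python
def is_ascii_int(text):
    # Drop at most one leading '+' or '-' (an addition to the original test).
    digits = text[1:] if text[:1] in ("+", "-") else text
    # isascii() requires Python 3.7+; isdecimal() is False for ''.
    return digits.isascii() and digits.isdecimal()
```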

Answer 17

Uh.. Try this:

def int_check(a):
    if int(a) == a:
        return True
    else:
        return False

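This works for numeric values (for example, int_check(2.0) returns True), as long as you don’t pass a string that’s not a number. Note that a numeric string such as '2' returns False, because int('2') == '2' compares an int with a str.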

Also (I forgot to include the number-check part), there is a method that checks whether a string consists of digits: str.isdigit(). Here’s an example:

a = "2"
a.isdigit()

Calling a.isdigit() here returns True. Note that a must be a string; an int such as 2 has no isdigit() method.