标签归档:string

如何编写符合PEP8的很长的字符串并防止E501

问题:如何编写符合PEP8的很长的字符串并防止E501

正如PEP8建议将python程序的规则保持在80列以下,我该如何使用长字符串来遵守该规则,即

s = "this is my really, really, really, really, really, really, really long string that I'd like to shorten."

我如何将其扩展到以下行,即

s = "this is my really, really, really, really, really, really" + 
    "really long string that I'd like to shorten."

As PEP8 suggests keeping below the 80 column rule for your python program, how can I abide to that with long strings, i.e.

s = "this is my really, really, really, really, really, really, really long string that I'd like to shorten."

How would I go about expanding this to the following line, i.e.

s = "this is my really, really, really, really, really, really" + 
    "really long string that I'd like to shorten."

回答 0

隐式串联可能是最干净的解决方案:

s = "this is my really, really, really, really, really, really," \
    " really long string that I'd like to shorten."

编辑在反思我同意,托德的建议,请使用括号,而不是续行是所有他给出的理由更好。我唯一的犹豫是将带括号的字符串与元组混淆是相对容易的。

Implicit concatenation might be the cleanest solution:

s = "this is my really, really, really, really, really, really," \
    " really long string that I'd like to shorten."

Edit On reflection I agree that Todd’s suggestion to use brackets rather than line continuation is better for all the reasons he gives. The only hesitation I have is that it’s relatively easy to confuse bracketed strings with tuples.


回答 1

另外,由于相邻的字符串常量是自动连接的,因此您也可以这样编码:

s = ("this is my really, really, really, really, really, really, "  
     "really long string that I'd like to shorten.")

请注意没有加号,我在示例格式之后添加了额外的逗号和空格。

我个人不喜欢反斜杠,并且我记得在某处读到它的使用实际上已被弃用,而倾向于这种更明确的形式。记住“显式胜于隐式”。

我认为反斜杠不太清楚,用处也不大,因为这实际上是在换行符的转义。如果有必要在行末尾添加注释。可以使用串联的字符串常量来做到这一点:

s = ("this is my really, really, really, really, really, really, " # comments ok
     "really long string that I'd like to shorten.")

我使用Google搜索“ python行长”来返回PEP8链接作为第一个结果,同时还链接到另一个有关该主题的StackOverflow帖子:“ 为什么Python PEP-8应该指定最大行长为79个字符?

另一个很好的搜索词组是“ python行连续”。

Also, because neighboring string constants are automatically concatenated, you can code it like this too:

s = ("this is my really, really, really, really, really, really, "  
     "really long string that I'd like to shorten.")

Note no plus sign, and I added the extra comma and space that follows the formatting of your example.

Personally I don’t like the backslashes, and I recall reading somewhere that its use is actually deprecated in favor of this form which is more explicit. Remember “Explicit is better than implicit.”

I consider the backslash to be less clear and less useful because this is actually escaping the newline character. It’s not possible to put a line end comment after it if one should be necessary. It is possible to do this with concatenated string constants:

s = ("this is my really, really, really, really, really, really, " # comments ok
     "really long string that I'd like to shorten.")

I used a Google search of “python line length” which returns the PEP8 link as the first result, but also links to another good StackOverflow post on this topic: “Why should Python PEP-8 specify a maximum line length of 79 characters?

Another good search phrase would be “python line continuation”.


回答 2

我认为您的问题中最重要的词是“建议”。

编码标准很有趣。通常,他们提供的指南在编写时有很好的基础(例如,大多数终端无法在一行上显示> 80个字符),但是随着时间的推移,它们在功能上已过时,但仍然严格遵守。我想您在这里需要做的是权衡“打破”该特定建议与代码的可读性和可维护性的相对优点。

抱歉,这不能直接回答您的问题。

I think the most important word in your question was “suggests”.

Coding standards are funny things. Often the guidance they provide has a really good basis when it was written (e.g. most terminals being unable to show > 80 characters on a line), but over time they become functionally obsolete, but still rigidly adhered to. I guess what you need to do here is weigh up the relative merits of “breaking” that particular suggestion against the readability and mainatinability of your code.

Sorry this doesn’t directly answer your question.


回答 3

您丢失了一个空格,并且可能需要换行符,即。一个\

s = "this is my really, really, really, really, really, really" +  \
    " really long string that I'd like to shorten."

甚至:

s = "this is my really, really, really, really, really, really"  \
    " really long string that I'd like to shorten."

Parens也可以代替行继续,但是您可能会冒险有人认为您打算创建一个元组而忘记了逗号。举个例子:

s = ("this is my really, really, really, really, really, really"
    " really long string that I'd like to shorten.")

与:

s = ("this is my really, really, really, really, really, really",
    " really long string that I'd like to shorten.")

使用Python的动态类型,代码可能会以任何一种方式运行,但是会产生与您不想要的结果相同的错误结果。

You lost a space, and you probably need a line continuation character, ie. a \.

s = "this is my really, really, really, really, really, really" +  \
    " really long string that I'd like to shorten."

or even:

s = "this is my really, really, really, really, really, really"  \
    " really long string that I'd like to shorten."

Parens would also work instead of the line continuation, but you risk someone thinking you intended to have a tuple and had just forgotten a comma. Take for instance:

s = ("this is my really, really, really, really, really, really"
    " really long string that I'd like to shorten.")

versus:

s = ("this is my really, really, really, really, really, really",
    " really long string that I'd like to shorten.")

With Python’s dynamic typing, the code may run either way, but produce incorrect results with the one you didn’t intend.


回答 4

反斜杠:

s = "this is my really, really, really, really, really, really" +  \
    "really long string that I'd like to shorten."

或包裹在括号中:

s = ("this is my really, really, really, really, really, really" + 
    "really long string that I'd like to shorten.")

Backslash:

s = "this is my really, really, really, really, really, really" +  \
    "really long string that I'd like to shorten."

or wrap in parens:

s = ("this is my really, really, really, really, really, really" + 
    "really long string that I'd like to shorten.")

回答 5

这些都是很好的答案,但是我找不到能帮助我编辑“隐式连接”字符串的编辑器插件,因此我编写了一个程序包使它更容易使用。

在pip(安装段落)上,如果有人在徘徊这个旧线程想要将其签出。格式化html格式的多行字符串(压缩空格,为新段落添加两个换行符,不必担心行之间的空格)。

from paragraphs import par


class SuddenDeathError(Exception):
    def __init__(self, cause: str) -> None:
        self.cause = cause

    def __str__(self):
        return par(
            f""" Y - e - e - e - es, Lord love you! Why should she die of
            {self.cause}? She come through diphtheria right enough the year
            before. I saw her with my own eyes. Fairly blue with it, she
            was. They all thought she was dead; but my father he kept ladling
            gin down her throat till she came to so sudden that she bit the bowl
            off the spoon. 

            What call would a woman with that strength in her have to die of
            {self.cause}? What become of her new straw hat that should have
            come to me? Somebody pinched it; and what I say is, them as pinched
            it done her in."""
        )


raise SuddenDeathError("influenza")

变成…

__main__.SuddenDeathError: Y - e - e - e - es, Lord love you! Why should she die of influenza? She come through diphtheria right enough the year before. I saw her with my own eyes. Fairly blue with it, she was. They all thought she was dead; but my father he kept ladling gin down her throat till she came to so sudden that she bit the bowl off the spoon.

What call would a woman with that strength in her have to die of influenza? What become of her new straw hat that should have come to me? Somebody pinched it; and what I say is, them as pinched it done her in.

一切都轻松地与(Vim)’gq’对齐

These are all great answers, but I couldn’t find an editor plugin that would help me with editing “implicitly concatenated” strings, so I wrote a package to make it easier on me.

On pip (install paragraphs) if whoever’s wandering this old thread would like to check it out. Formats multi-line strings the way html does (compress whitespace, two newlines for a new paragraph, no worries about spaces between lines).

from paragraphs import par


class SuddenDeathError(Exception):
    def __init__(self, cause: str) -> None:
        self.cause = cause

    def __str__(self):
        return par(
            f""" Y - e - e - e - es, Lord love you! Why should she die of
            {self.cause}? She come through diphtheria right enough the year
            before. I saw her with my own eyes. Fairly blue with it, she
            was. They all thought she was dead; but my father he kept ladling
            gin down her throat till she came to so sudden that she bit the bowl
            off the spoon. 

            What call would a woman with that strength in her have to die of
            {self.cause}? What become of her new straw hat that should have
            come to me? Somebody pinched it; and what I say is, them as pinched
            it done her in."""
        )


raise SuddenDeathError("influenza")

becomes …

__main__.SuddenDeathError: Y - e - e - e - es, Lord love you! Why should she die of influenza? She come through diphtheria right enough the year before. I saw her with my own eyes. Fairly blue with it, she was. They all thought she was dead; but my father he kept ladling gin down her throat till she came to so sudden that she bit the bowl off the spoon.

What call would a woman with that strength in her have to die of influenza? What become of her new straw hat that should have come to me? Somebody pinched it; and what I say is, them as pinched it done her in.

Everything lines up easily with (Vim) ‘gq’


回答 6

使用a \可以将语句扩展到多行:

s = "this is my really, really, really, really, really, really" + \
"really long string that I'd like to shorten."

应该管用。

With a \ you can expand statements to multiple lines:

s = "this is my really, really, really, really, really, really" + \
"really long string that I'd like to shorten."

should work.


回答 7

我倾向于使用一些此处未提及的方法来指定大字符串,但这是针对非常特定的场景的。YMMV …

  • 多行文本,通常带有格式化标记(不是您所要的,但仍然有用):

    error_message = '''
    I generally like to see how my helpful, sometimes multi-line error
    messages will look against the left border.
    '''.strip()
  • 通过您喜欢的任何字符串插值方法逐段增加变量:

    var = 'This is the start of a very,'
    var = f'{var} very long string which could'
    var = f'{var} contain a ridiculous number'
    var = f'{var} of words.'
  • 从文件中读取。PEP-8不会限制文件中字符串的长度;只是您的代码行。:)

  • 使用蛮力或您的编辑器使用换行符将字符串拆分为可换行,然后删除所有换行符。(类似于我列出的第一种技术):

    foo = '''
    agreatbigstringthatyoudonotwanttohaveanyne
    wlinesinbutforsomereasonyouneedtospecifyit
    verbatimintheactualcodejustlikethis
    '''.replace('\n', '')

I tend to use a couple of methods not mentioned here for specifying large strings, but these are for very specific scenarios. YMMV…

  • Multi-line blobs of text, often with formatted tokens (not quite what you were asking, but still useful):

    error_message = '''
    I generally like to see how my helpful, sometimes multi-line error
    messages will look against the left border.
    '''.strip()
    
  • Grow the variable piece-by-piece through whatever string interpolation method you prefer:

    var = 'This is the start of a very,'
    var = f'{var} very long string which could'
    var = f'{var} contain a ridiculous number'
    var = f'{var} of words.'
    
  • Read it from a file. PEP-8 doesn’t limit the length of strings in a file; just the lines of your code. :)

  • Use brute-force or your editor to split the string into managaeble lines using newlines, and then remove all newlines. (Similar to the first technique I listed):

    foo = '''
    agreatbigstringthatyoudonotwanttohaveanyne
    wlinesinbutforsomereasonyouneedtospecifyit
    verbatimintheactualcodejustlikethis
    '''.replace('\n', '')
    

回答 8

可用选项:

  • 反斜杠"foo" \ "bar"
  • 加号后跟反斜杠"foo" + \ "bar"
  • 括号
    • ("foo" "bar")
    • 加号的括号("foo" + "bar")
    • PEP8,E502:括号之间的反斜杠是多余的

避免

避免用逗号括起来:("foo", "bar")定义一个元组。


>>> s = "a" \
... "b"
>>> s
'ab'
>>> type(s)
<class 'str'>
>>> s = "a" + \
... "b"
>>> s
'ab'
>>> type(s)
<class 'str'>
>>> s = ("a"
... "b")
>>> type(s)
<class 'str'>
>>> print(s)
ab
>>> s = ("a",
... "b")
>>> type(s)
<class 'tuple'>
>>> s = ("a" + 
... "b")
>>> type(s)
<class 'str'>
>>> print(s)
ab
>>> 

Available options:

  • backslash: "foo" \ "bar"
  • plus sign followed by backslash: "foo" + \ "bar"
  • brackets:
    • ("foo" "bar")
    • brackets with plus sign: ("foo" + "bar")
    • PEP8, E502: the backslash is redundant between brackets

Avoid

Avoid brackets with comma: ("foo", "bar") which defines a tuple.


>>> s = "a" \
... "b"
>>> s
'ab'
>>> type(s)
<class 'str'>
>>> s = "a" + \
... "b"
>>> s
'ab'
>>> type(s)
<class 'str'>
>>> s = ("a"
... "b")
>>> type(s)
<class 'str'>
>>> print(s)
ab
>>> s = ("a",
... "b")
>>> type(s)
<class 'tuple'>
>>> s = ("a" + 
... "b")
>>> type(s)
<class 'str'>
>>> print(s)
ab
>>> 

回答 9

如果您必须插入一个长字符串文字并希望flake8关闭,则可以使用它的关闭指令。例如,在测试例程中,我定义了一些伪造的CSV输入。我发现将其分割成行的更多行会造成极大的混乱,因此我决定添加# noqa: E501以下内容:

csv_test_content = """"STATION","DATE","SOURCE","LATITUDE","LONGITUDE","ELEVATION","NAME","REPORT_TYPE","CALL_SIGN","QUALITY_CONTROL","WND","CIG","VIS","TMP","DEW","SLP","AA1","AA2","AY1","AY2","GF1","MW1","REM"
"94733099999","2019-01-03T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","050,1,N,0010,1","22000,1,9,N","025000,1,9,9","+0260,1","+0210,1","99999,9","24,0000,9,1",,"0,1,02,1","0,1,02,1","01,99,1,99,9,99,9,99999,9,99,9,99,9","01,1","SYN05294733 11/75 10502 10260 20210 60004 70100 333 70000="
"94733099999","2019-01-04T04:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","090,1,N,0021,1","22000,1,9,N","025000,1,9,9","+0378,1","+0172,1","99999,9","06,0000,9,1",,"0,1,02,1","0,1,02,1","03,99,1,99,9,99,9,99999,9,99,9,99,9","03,1","SYN04294733 11/75 30904 10378 20172 60001 70300="
"94733099999","2019-01-04T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","290,1,N,0057,1","99999,9,9,N","020000,1,9,9","+0339,1","+0201,1","99999,9","24,0000,9,1",,"0,1,02,1","0,1,02,1",,"02,1","SYN05294733 11970 02911 10339 20201 60004 70200 333 70000="
"94733099999","2019-01-05T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","200,1,N,0026,1","99999,9,9,N","000100,1,9,9","+0209,1","+0193,1","99999,9","24,0004,3,1",,"1,1,02,1","1,1,02,1","08,99,1,99,9,99,9,99999,9,99,9,99,9","51,1","SYN05294733 11/01 82005 10209 20193 69944 75111 333 70004="
"94733099999","2019-01-08T04:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","070,1,N,0026,1","22000,1,9,N","025000,1,9,9","+0344,1","+0213,1","99999,9","06,0000,9,1",,"2,1,02,1","2,1,02,1","04,99,1,99,9,99,9,99999,9,99,9,99,9","02,1","SYN04294733 11/75 40705 10344 20213 60001 70222="
"""  # noqa: E501

If you must insert a long string literal and want flake8 to shut up, you can use it’s shutting up directives. For example, in a testing routine I defined some fake CSV input. I found that splitting it over more lines that it had rows would be mightily confusing, so I decided to add a # noqa: E501 as follows:

csv_test_content = """"STATION","DATE","SOURCE","LATITUDE","LONGITUDE","ELEVATION","NAME","REPORT_TYPE","CALL_SIGN","QUALITY_CONTROL","WND","CIG","VIS","TMP","DEW","SLP","AA1","AA2","AY1","AY2","GF1","MW1","REM"
"94733099999","2019-01-03T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","050,1,N,0010,1","22000,1,9,N","025000,1,9,9","+0260,1","+0210,1","99999,9","24,0000,9,1",,"0,1,02,1","0,1,02,1","01,99,1,99,9,99,9,99999,9,99,9,99,9","01,1","SYN05294733 11/75 10502 10260 20210 60004 70100 333 70000="
"94733099999","2019-01-04T04:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","090,1,N,0021,1","22000,1,9,N","025000,1,9,9","+0378,1","+0172,1","99999,9","06,0000,9,1",,"0,1,02,1","0,1,02,1","03,99,1,99,9,99,9,99999,9,99,9,99,9","03,1","SYN04294733 11/75 30904 10378 20172 60001 70300="
"94733099999","2019-01-04T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","290,1,N,0057,1","99999,9,9,N","020000,1,9,9","+0339,1","+0201,1","99999,9","24,0000,9,1",,"0,1,02,1","0,1,02,1",,"02,1","SYN05294733 11970 02911 10339 20201 60004 70200 333 70000="
"94733099999","2019-01-05T22:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","200,1,N,0026,1","99999,9,9,N","000100,1,9,9","+0209,1","+0193,1","99999,9","24,0004,3,1",,"1,1,02,1","1,1,02,1","08,99,1,99,9,99,9,99999,9,99,9,99,9","51,1","SYN05294733 11/01 82005 10209 20193 69944 75111 333 70004="
"94733099999","2019-01-08T04:00:00","4","-32.5833333","151.1666666","45.0","SINGLETON STP, AS","FM-12","99999","V020","070,1,N,0026,1","22000,1,9,N","025000,1,9,9","+0344,1","+0213,1","99999,9","06,0000,9,1",,"2,1,02,1","2,1,02,1","04,99,1,99,9,99,9,99999,9,99,9,99,9","02,1","SYN04294733 11/75 40705 10344 20213 60001 70222="
"""  # noqa: E501

回答 10

我过去使用过textwrap.dedent。这有点麻烦,所以我现在更喜欢连续行,但是如果您真的想要块缩进,我认为这很棒。

示例代码(其中修剪将去除带有切片的第一个“ \ n”):

import textwrap as tw
x = """\
       This is a yet another test.
       This is only a test"""
print(tw.dedent(x))

说明:

dedent根据换行之前第一行文本中的空格来计算缩进。如果您想对其进行调整,则可以使用该re模块轻松地重新实现它。

此方法有局限性,因为很长的行可能仍然比您想要的更长,在这种情况下,其他将字符串连接起来的方法更合适。

I’ve used textwrap.dedent in the past. It’s a little cumbersome so I prefer line continuations now but if you really want the block indent, I think this is great.

Example Code (where the trim is to get rid of the first ‘\n’ with a slice):

import textwrap as tw
x = """\
       This is a yet another test.
       This is only a test"""
print(tw.dedent(x))

Explanation:

dedent calculates the indentation based on the white space in the first line of text before a new line. If you wanted to tweak it, you could easily reimplement it using the re module.

This method has limitations in that very long lines may still be longer than you want in which case other methods that concatenate strings is more suitable.


在Python字符串的最后一个分隔符上分割?

问题:在Python字符串的最后一个分隔符上分割?

对于在字符串中最后一次出现定界符时拆分字符串的建议Python惯用法是什么?例:

# instead of regular split
>> s = "a,b,c,d"
>> s.split(",")
>> ['a', 'b', 'c', 'd']

# ..split only on last occurrence of ',' in string:
>>> s.mysplit(s, -1)
>>> ['a,b,c', 'd']

mysplit接受第二个参数,即要分割的分隔符的出现。像常规列表索引一样,-1表示末尾的末尾。如何才能做到这一点?

What’s the recommended Python idiom for splitting a string on the last occurrence of the delimiter in the string? example:

# instead of regular split
>> s = "a,b,c,d"
>> s.split(",")
>> ['a', 'b', 'c', 'd']

# ..split only on last occurrence of ',' in string:
>>> s.mysplit(s, -1)
>>> ['a,b,c', 'd']

mysplit takes a second argument that is the occurrence of the delimiter to be split. Like in regular list indexing, -1 means the last from the end. How can this be done?


回答 0

使用.rsplit().rpartition()代替:

s.rsplit(',', 1)
s.rpartition(',')

str.rsplit()可让您指定拆分次数,而str.rpartition()仅拆分一次,但始终返回固定数量的元素(前缀,定界符和后缀),并且对于单个拆分情况而言更快。

演示:

>>> s = "a,b,c,d"
>>> s.rsplit(',', 1)
['a,b,c', 'd']
>>> s.rsplit(',', 2)
['a,b', 'c', 'd']
>>> s.rpartition(',')
('a,b,c', ',', 'd')

两种方法都从字符串的右侧开始拆分;通过str.rsplit()将最大值作为第二个参数,您可以仅分割最右边的出现。

Use .rsplit() or .rpartition() instead:

s.rsplit(',', 1)
s.rpartition(',')

str.rsplit() lets you specify how many times to split, while str.rpartition() only splits once but always returns a fixed number of elements (prefix, delimiter & postfix) and is faster for the single split case.

Demo:

>>> s = "a,b,c,d"
>>> s.rsplit(',', 1)
['a,b,c', 'd']
>>> s.rsplit(',', 2)
['a,b', 'c', 'd']
>>> s.rpartition(',')
('a,b,c', ',', 'd')

Both methods start splitting from the right-hand-side of the string; by giving str.rsplit() a maximum as the second argument, you get to split just the right-hand-most occurrences.


回答 1

您可以使用rsplit

string.rsplit('delimeter',1)[1]

从反向获取字符串。

You can use rsplit

string.rsplit('delimeter',1)[1]

To get the string from reverse.


回答 2

我只是为了好玩而做

    >>> s = 'a,b,c,d'
    >>> [item[::-1] for item in s[::-1].split(',', 1)][::-1]
    ['a,b,c', 'd']

警告:请参阅下面的第一个评论,此答案可能会出错。

I just did this for fun

    >>> s = 'a,b,c,d'
    >>> [item[::-1] for item in s[::-1].split(',', 1)][::-1]
    ['a,b,c', 'd']

Caution: Refer to the first comment in below where this answer can go wrong.


替换字符串中多个字符的最佳方法?

问题:替换字符串中多个字符的最佳方法?

我需要替换一些字符,如下所示:&\&#\#,…

我编码如下,但是我想应该有一些更好的方法。有什么提示吗?

strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

I need to replace some characters as follows: &\&, #\#, …

I coded as follows, but I guess there should be some better way. Any hints?

strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

回答 0

替换两个字符

我给当前答案中的所有方法加上了一个额外的时间。

使用输入字符串abc&def#ghi并替换&-> \&和#-> \#,最快的方法是将替换链接在一起,如下所示:text.replace('&', '\&').replace('#', '\#')

每个功能的时间:

  • a)1000000次循环,每循环3:1.47μs最佳
  • b)1000000次循环,每循环3:1.51μs最佳
  • c)100000个循环,每个循环最好为3:12.3μs
  • d)100000个循环,每个循环最好为3:12μs
  • e)100000个循环,每个循环最好为3:3.27μs
  • f)1000000次循环,最好为3:每个循环0.817μs
  • g)100000个循环,每个循环3:3.64μs最佳
  • h)1000000次循环,每循环最好3:0.927μs
  • i)1000000次循环,最佳3:每个循环0.814μs

功能如下:

def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

像这样计时:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"

替换17个字符

下面是类似的代码,可完成相同的操作,但转义的字符更多(\`* _ {}>#+-。!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

这是相同输入字符串的结果abc&def#ghi

  • a)100000个循环,每个循环最好3:6.72μs
  • b)100000个循环,每个循环最好3:2.64μs
  • c)100000个循环,每个循环最好3:11.9μs
  • d)100000个循环,每个循环的最佳时间为3:4.92μs
  • e)100000个循环,每个循环最好为3:2.96μs
  • f)100000个循环,每个循环最好为3:4.29μs
  • g)100000个循环,每个循环最好为3:4.68μs
  • h)100000次循环,每循环3:4.73μs最佳
  • i)100000个循环,每个循环最好为3:4.24μs

并使用更长的输入字符串(## *Something* and [another] thing in a longer sentence with {more} things to replace$):

  • a)100000个循环,每个循环的最佳时间为3:7.59μs
  • b)100000个循环,每个循环最好为3:6.54μs
  • c)100000个循环,每个循环最好3:16.9μs
  • d)100000个循环,每个循环最好3.:7.29μs
  • e)100000个循环,每个循环最好为3:12.2μs
  • f)100000个循环,每个循环最好为3:3.38μs
  • g)10000个循环,每个循环最好3:21.7μs
  • h)100000个循环,最佳3:每个循环5.7μs
  • i)100000个循环,每个循环中最好为3:5.13μs

添加几个变体:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

输入较短时:

  • ab)100000个循环,每个循环中最好为3:7.05μs
  • ba)100000个循环,最佳3:每个循环2.4μs

输入较长时:

  • ab)100000个循环,每个循环最好为3:3.71μs
  • ba)100000次循环,每循环最好3:6.08μs

因此,我将使用ba其可读性和速度。

附录

在注释中出现提示时,ab和之间的区别baif c in text:检查。让我们针对另外两个变体进行测试:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

在Python 2.7.14和3.6.3上以及与先前设置不同的计算机上,每个循环的时间以μs为单位。

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
 Py, input    ab   ab_with_check   ba   ba_without_check 
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
 Py2, short  8.81     4.22        3.45     8.01          
 Py3, short  5.54     1.34        1.46     5.34          
├────────────╫──────┼───────────────┼──────┼──────────────────┤
 Py2, long   9.3      7.15        6.85     8.55          
 Py3, long   7.43     4.38        4.41     7.02          
└────────────╨──────┴───────────────┴──────┴──────────────────┘

我们可以得出以下结论:

  • 有支票的人的速度比没有支票的人快4倍

  • ab_with_check在Python 3上稍有领先,但是ba(带检查)在Python 2上有更大的领先优势

  • 但是,这里最大的教训是Python 3比Python 2快3倍!Python 3上最慢的速度与Python 2上最快的速度之间并没有太大的区别!

Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

  • a) 1000000 loops, best of 3: 1.47 μs per loop
  • b) 1000000 loops, best of 3: 1.51 μs per loop
  • c) 100000 loops, best of 3: 12.3 μs per loop
  • d) 100000 loops, best of 3: 12 μs per loop
  • e) 100000 loops, best of 3: 3.27 μs per loop
  • f) 1000000 loops, best of 3: 0.817 μs per loop
  • g) 100000 loops, best of 3: 3.64 μs per loop
  • h) 1000000 loops, best of 3: 0.927 μs per loop
  • i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions:

def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"

Replacing 17 characters

Here’s similar code to do the same but with more characters to escape (\`*_{}>#+-.!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

Here’s the results for the same input string abc&def#ghi:

  • a) 100000 loops, best of 3: 6.72 μs per loop
  • b) 100000 loops, best of 3: 2.64 μs per loop
  • c) 100000 loops, best of 3: 11.9 μs per loop
  • d) 100000 loops, best of 3: 4.92 μs per loop
  • e) 100000 loops, best of 3: 2.96 μs per loop
  • f) 100000 loops, best of 3: 4.29 μs per loop
  • g) 100000 loops, best of 3: 4.68 μs per loop
  • h) 100000 loops, best of 3: 4.73 μs per loop
  • i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

  • a) 100000 loops, best of 3: 7.59 μs per loop
  • b) 100000 loops, best of 3: 6.54 μs per loop
  • c) 100000 loops, best of 3: 16.9 μs per loop
  • d) 100000 loops, best of 3: 7.29 μs per loop
  • e) 100000 loops, best of 3: 12.2 μs per loop
  • f) 100000 loops, best of 3: 5.38 μs per loop
  • g) 10000 loops, best of 3: 21.7 μs per loop
  • h) 100000 loops, best of 3: 5.7 μs per loop
  • i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

With the shorter input:

  • ab) 100000 loops, best of 3: 7.05 μs per loop
  • ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

  • ab) 100000 loops, best of 3: 7.71 μs per loop
  • ba) 100000 loops, best of 3: 6.08 μs per loop

So I’m going to use ba for readability and speed.

Addendum

Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let’s test them against two more variants:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input  ║  ab  │ ab_with_check │  ba  │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │    4.22       │ 3.45 │    8.01          │
│ Py3, short ║ 5.54 │    1.34       │ 1.46 │    5.34          │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long  ║ 9.3  │    7.15       │ 6.85 │    8.55          │
│ Py3, long  ║ 7.43 │    4.38       │ 4.41 │    7.02          │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

We can conclude that:

  • Those with the check are up to 4x faster than those without the check

  • ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2

  • However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There’s not a huge difference between the slowest on Python 3 and fastest on Python 2!


回答 1

>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...   if ch in string:
...      string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi
>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...   if ch in string:
...      string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi

回答 2

replace像这样简单地链接功能

strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi

如果替代品的数量会更多,您可以采用这种通用方式

strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi

Simply chain the replace functions like this

strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi

If the replacements are going to be more in number, you can do this in this generic way

strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi

回答 3

这是使用str.translate和的python3方法str.maketrans

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))

打印的字符串是abc\&def\#ghi

Here is a python3 method using str.translate and str.maketrans:

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))

The printed string is abc\&def\#ghi.


回答 4

您是否总是要加一个反斜杠?如果是这样,请尝试

import re
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)

它可能不是最有效的方法,但我认为这是最简单的方法。

Are you always going to prepend a backslash? If so, try

import re
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)

It may not be the most efficient method but I think it is the easiest.


回答 5

晚会晚了,但是我在这个问题上浪费了很多时间,直到找到答案。

简短而甜美,translate优于replace。如果您对随时间推移进行的功能优化更感兴趣,请不要使用replace

还可以使用translate,如果你不知道,如果重叠设置的用于替换的字符将被替换的字符集。

例子:

使用replace您会天真地希望代码片段"1234".replace("1", "2").replace("2", "3").replace("3", "4")返回"2344",但实际上它将返回"4444"

翻译似乎可以执行OP最初想要的功能。

Late to the party, but I lost a lot of time with this issue until I found my answer.

Short and sweet, translate is superior to replace. If you’re more interested in funcionality over time optimization, do not use replace.

Also use translate if you don’t know if the set of characters to be replaced overlaps the set of characters used to replace.

Case in point:

Using replace you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but it will return in fact "4444".

Translation seems to perform what OP originally desired.


回答 6

您可以考虑编写通用的转义函数:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1

这样,您就可以使用应转义的字符列表来配置函数。

You may consider writing a generic escape function:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1

This way you can make your function configurable with a list of character that should be escaped.


回答 7

仅供参考,这对OP几乎没有用处,甚至没有用,但是对其他读者可能有用(请不要投票,我知道这一点)。

作为一个有点荒谬但有趣的练习,我想看看我是否可以使用python函数式编程来替换多个字符。我很确定这不会打败两次调用replace()。如果性能是一个问题,您可以轻松地在rust,C,julia,perl,java,javascript和awk中击败它。它使用一个名为pytoolz的外部“帮助程序”包,该包通过cython加速(cytoolz,这是一个pypi包)。

from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))

我什至不解释这一点,因为没有人会费心使用它来完成多次替换。但是,我在执行此操作时感到有些成就,并认为这可能会激发其他读者或赢得代码混淆竞赛。

FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I’m aware of this).

As a somewhat ridiculous but interesting exercise, wanted to see if I could use python functional programming to replace multiple chars. I’m pretty sure this does NOT beat just calling replace() twice. And if performance was an issue, you could easily beat this in rust, C, julia, perl, java, javascript and maybe even awk. It uses an external ‘helpers’ package called pytoolz, accelerated via cython (cytoolz, it’s a pypi package).

from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))

I’m not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.


回答 8

使用reduce可以在python2.7和python3。*中使用,您可以轻松地以干净的pythonic方式替换多个子字符串。

# Lets define a helper method to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]), 
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
    print(cleaned_str) # "abc\&def\#ghi"

在python2.7中,您不必导入reduce,但在python3。*中,您必须从functools模块中导入它。

Using reduce which is available in python2.7 and python3.* you can easily replace mutiple substrings in a clean and pythonic way.

# Lets define a helper method to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]), 
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
    print(cleaned_str) # "abc\&def\#ghi"

In python2.7 you don’t have to import reduce but in python3.* you have to import it from the functools module.


回答 9

也许是char替换的简单循环:

a = '&#'

to_replace = ['&', '#']

for char in to_replace:
    a = a.replace(char, "\\"+char)

print(a)

>>> \&\#

Maybe a simple loop for chars to replace:

a = '&#'

to_replace = ['&', '#']

for char in to_replace:
    a = a.replace(char, "\\"+char)

print(a)

>>> \&\#

回答 10

这个怎么样?

def replace_all(dict, str):
    for key in dict:
        str = str.replace(key, dict[key])
    return str

然后

print(replace_all({"&":"\&", "#":"\#"}, "&#"))

输出

\&\#

类似于答案

How about this?

def replace_all(dict, str):
    for key in dict:
        str = str.replace(key, dict[key])
    return str

then

print(replace_all({"&":"\&", "#":"\#"}, "&#"))

output

\&\#

similar to answer


回答 11

>>> a = '&#'
>>> print a.replace('&', r'\&')
\&#
>>> print a.replace('#', r'\#')
&\#
>>> 

您希望使用“原始”字符串(由替换字符串作为前缀的“ r”表示),因为原始字符串不会特别对待反斜杠。

>>> a = '&#'
>>> print a.replace('&', r'\&')
\&#
>>> print a.replace('#', r'\#')
&\#
>>> 

You want to use a ‘raw’ string (denoted by the ‘r’ prefixing the replacement string), since raw strings to not treat the backslash specially.


用python中的分隔符分割字符串

问题:用python中的分隔符分割字符串

如何__在定界符哪里分割此字符串

MATCHES__STRING

得到['MATCHES', 'STRING']?的输出

How to split this string where __ is the delimiter

MATCHES__STRING

To get an output of ['MATCHES', 'STRING']?


回答 0

您可以使用以下str.split功能:string.split('__')

>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']

You can use the str.split function: string.split('__')

>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']

回答 1

您可能对csv模块感兴趣,该模块是为逗号分隔的文件设计的,但可以轻松修改以使用自定义定界符。

import csv
csv.register_dialect( "myDialect", delimiter = "__", <other-options> )
lines = [ "MATCHES__STRING" ]

for row in csv.reader( lines ):
    ...

You may be interested in the csv module, which is designed for comma-separated files but can be easily modified to use a custom delimiter.

import csv
csv.register_dialect( "myDialect", delimiter = "__", <other-options> )
lines = [ "MATCHES__STRING" ]

for row in csv.reader( lines ):
    ...

回答 2

如果字符串中有两个或多个(在下面的示例中有三个)元素,则可以使用逗号分隔以下各项:

date, time, event_name = ev.get_text(separator='@').split("@")

在这行代码之后,三个变量将具有变量ev的三个部分的值

因此,如果变量ev包含此字符串,并且我们应用分隔符’@’:

Sa.,23.März@ 19:00 @ Klavier + Orchester:SPEZIAL

然后,在拆分操作之后,变量

  • 日期的值将为“ Sa.,23.März”
  • 时间的值为“ 19:00”
  • event_name的值将为“ Klavier + Orchester:SPEZIAL”

When you have two or more (in the example below there’re three) elements in the string, then you can use comma to separate these items:

date, time, event_name = ev.get_text(separator='@').split("@")

After this line of code, the three variables will have values from three parts of the variable ev

So, if the variable ev contains this string and we apply separator ‘@’:

Sa., 23. März@19:00@Klavier + Orchester: SPEZIAL

Then, after split operation the variable

  • date will have value “Sa., 23. März”
  • time will have value “19:00”
  • event_name will have value “Klavier + Orchester: SPEZIAL”

Python CSV字符串到数组

问题:Python CSV字符串到数组

有人知道一个简单的库或函数来解析csv编码的字符串并将其转换为数组或字典吗?

我不认为我想要内置的csv模块,因为在所有示例中,我看到的都是文件路径,而不是字符串。

Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?

I don’t think I want the built in csv module because in all the examples I’ve seen that takes filepaths, not strings.


回答 0

您可以使用将字符串转换为文件对象io.StringIO,然后将其传递给csv模块:

from io import StringIO
import csv

scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
gęś,zółty,wąż,idzie,wąską,dróżką,
"""

f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))

带有split()换行符的简单版本:

reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))

或者,您可以split()使用\n分隔符将此字符串简单地分成几行,然后将split()每一行变成值,但是这种方式您必须知道引号,因此csv首选使用module。

Python 2上,您必须导入StringIO

from StringIO import StringIO

代替。

You can convert a string to a file object using io.StringIO and then pass that to the csv module:

from io import StringIO
import csv

scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
gęś,zółty,wąż,idzie,wąską,dróżką,
"""

f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))

simpler version with split() on newlines:

reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))

Or you can simply split() this string into lines using \n as separator, and then split() each line into values, but this way you must be aware of quoting, so using csv module is preferred.

On Python 2 you have to import StringIO as

from StringIO import StringIO

instead.


回答 1

简单-csv模块也可以使用列表:

>>> a=["1,2,3","4,5,6"]  # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]

Simple – the csv module works with lists, too:

>>> a=["1,2,3","4,5,6"]  # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]

回答 2

csv.reader() https://docs.python.org/2/library/csv.html的官方文档 非常有帮助,它说

文件对象和列表对象都适合

import csv

text = """1,2,3
a,b,c
d,e,f"""

lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))

The official doc for csv.reader() https://docs.python.org/2/library/csv.html is very helpful, which says

file objects and list objects are both suitable

import csv

text = """1,2,3
a,b,c
d,e,f"""

lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))

回答 3

>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']

解析CSV文件:

f = open(file.csv, "r")
lines = f.read().split("\n") # "\r\n" if needed

for line in lines:
    if line != "": # add other needed checks to skip titles
        cols = line.split(",")
        print cols
>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']

To parse a CSV file:

f = open(file.csv, "r")
lines = f.read().split("\n") # "\r\n" if needed

for line in lines:
    if line != "": # add other needed checks to skip titles
        cols = line.split(",")
        print cols

回答 4

正如其他人已经指出的那样,Python包含一个用于读取和写入CSV文件的模块。只要输入字符保持在ASCII限制内,它就可以很好地工作。如果您要处理其他编码,则需要做更多的工作。

csv模块Python文档实现了csv.reader的扩展,该扩展使用相同的接口,但可以处理其他编码并返回unicode字符串。只需复制并粘贴文档中的代码即可。之后,您可以像这样处理CSV文件:

with open("some.csv", "rb") as csvFile: 
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print row

As others have already pointed out, Python includes a module to read and write CSV files. It works pretty well as long as the input characters stay within ASCII limits. In case you want to process other encodings, more work is needed.

The Python documentation for the csv module implements an extension of csv.reader, which uses the same interface but can handle other encodings and returns unicode strings. Just copy and paste the code from the documentation. After that, you can process a CSV file like this:

with open("some.csv", "rb") as csvFile: 
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print row

回答 5

根据文档:

尽管该模块不直接支持解析字符串,但可以轻松实现:

import csv
for row in csv.reader(['one,two,three']):
    print row

只需将您的字符串转换为单个元素列表即可。

当这个例子在文档中明确时,导入StringIO对我来说似乎有点多余。

Per the documentation:

And while the module doesn’t directly support parsing strings, it can easily be done:

import csv
for row in csv.reader(['one,two,three']):
    print row

Just turn your string into a single element list.

Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.


回答 6

https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader

csvfile可以是任何支持迭代器协议的对象,并且每次调用其next()方法时都会返回一个字符串

因此,一个StringIO.StringIO()str.splitlines()甚至一个生成器都很好。

https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called

Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.


回答 7

这是一个替代解决方案:

>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]

这是文档

Here’s an alternative solution:

>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]

Here’s the documentation


回答 8

使用此功能将csv加载到列表中

import csv

csvfile = open(myfile, 'r')
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
print my_list
>>>[['1st_line', '0'],
    ['2nd_line', '0']]

Use this to have a csv loaded into a list

import csv

csvfile = open(myfile, 'r')
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
print my_list
>>>[['1st_line', '0'],
    ['2nd_line', '0']]

回答 9

Panda是功能强大且智能的库,可使用Python读取CSV

这里有一个简单的例子,我有example.zip文件,其中有四个文件。

EXAMPLE.zip
 -- example1.csv
 -- example1.txt
 -- example2.csv
 -- example2.txt

from zipfile import ZipFile
import pandas as pd


filepath = 'EXAMPLE.zip'
file_prefix = filepath[:-4].lower()

zipfile = ZipFile(filepath)
target_file = ''.join([file_prefix, '/', file_prefix, 1 , '.csv'])

df = pd.read_csv(zipfile.open(target_file))

print(df.head()) # print first five row of csv
print(df[COL_NAME]) # fetch the col_name data

有了数据后,您就可以操纵播放列表或其他格式。

Panda is quite powerful and smart library reading CSV in Python

A simple example here, I have example.zip file with four files in it.

EXAMPLE.zip
 -- example1.csv
 -- example1.txt
 -- example2.csv
 -- example2.txt

from zipfile import ZipFile
import pandas as pd


filepath = 'EXAMPLE.zip'
file_prefix = filepath[:-4].lower()

zipfile = ZipFile(filepath)
target_file = ''.join([file_prefix, '/', file_prefix, 1 , '.csv'])

df = pd.read_csv(zipfile.open(target_file))

print(df.head()) # print first five row of csv
print(df[COL_NAME]) # fetch the col_name data

Once you have data you can manipulate to play with a list or other formats.


检查字符串是否包含数字

问题:检查字符串是否包含数字

我发现的大多数问题都偏向于他们正在寻找数字中的字母的事实,而我正在寻找我想成为无数字符串的数字。我需要输入一个字符串,并检查它是否包含任何数字以及是否确实拒绝它。

isdigit()True当所有字符均为数字时,该函数才返回。我只想看看用户是否输入了一个数字,如一个句子之类的"I own 1 dog"

有任何想法吗?

Most of the questions I’ve found are biased on the fact they’re looking for letters in their numbers, whereas I’m looking for numbers in what I’d like to be a numberless string. I need to enter a string and check to see if it contains any numbers and if it does reject it.

The function isdigit() only returns True if ALL of the characters are numbers. I just want to see if the user has entered a number so a sentence like "I own 1 dog" or something.

Any ideas?


回答 0

您可以像这样使用any函数和str.isdigit函数

>>> def hasNumbers(inputString):
...     return any(char.isdigit() for char in inputString)
... 
>>> hasNumbers("I own 1 dog")
True
>>> hasNumbers("I own no dog")
False

另外,您可以使用正则表达式,如下所示

>>> import re
>>> def hasNumbers(inputString):
...     return bool(re.search(r'\d', inputString))
... 
>>> hasNumbers("I own 1 dog")
True
>>> hasNumbers("I own no dog")
False

You can use any function, with the str.isdigit function, like this

>>> def hasNumbers(inputString):
...     return any(char.isdigit() for char in inputString)
... 
>>> hasNumbers("I own 1 dog")
True
>>> hasNumbers("I own no dog")
False

Alternatively you can use a Regular Expression, like this

>>> import re
>>> def hasNumbers(inputString):
...     return bool(re.search(r'\d', inputString))
... 
>>> hasNumbers("I own 1 dog")
True
>>> hasNumbers("I own no dog")
False

回答 1

您可以使用组合anystr.isdigit

def num_there(s):
    return any(i.isdigit() for i in s)

True如果字符串中存在数字,则该函数将返回,否则返回False

演示:

>>> king = 'I shall have 3 cakes'
>>> num_there(king)
True
>>> servant = 'I do not have any cakes'
>>> num_there(servant)
False

You can use a combination of any and str.isdigit:

def num_there(s):
    return any(i.isdigit() for i in s)

The function will return True if a digit exists in the string, otherwise False.

Demo:

>>> king = 'I shall have 3 cakes'
>>> num_there(king)
True
>>> servant = 'I do not have any cakes'
>>> num_there(servant)
False

回答 2

str.isalpha() 

参考:https : //docs.python.org/2/library/stdtypes.html#str.isalpha

如果字符串中的所有字符均为字母并且至少包含一个字符,则返回true,否则返回false。

use

str.isalpha() 

Ref: https://docs.python.org/2/library/stdtypes.html#str.isalpha

Return true if all characters in the string are alphabetic and there is at least one character, false otherwise.


回答 3

https://docs.python.org/2/library/re.html

您最好使用正则表达式。它要快得多。

import re

def f1(string):
    return any(i.isdigit() for i in string)


def f2(string):
    return re.search('\d', string)


# if you compile the regex string first, it's even faster
RE_D = re.compile('\d')
def f3(string):
    return RE_D.search(string)

# Output from iPython
# In [18]: %timeit  f1('assdfgag123')
# 1000000 loops, best of 3: 1.18 µs per loop

# In [19]: %timeit  f2('assdfgag123')
# 1000000 loops, best of 3: 923 ns per loop

# In [20]: %timeit  f3('assdfgag123')
# 1000000 loops, best of 3: 384 ns per loop

https://docs.python.org/2/library/re.html

You should better use regular expression. It’s much faster.

import re

def f1(string):
    return any(i.isdigit() for i in string)


def f2(string):
    return re.search('\d', string)


# if you compile the regex string first, it's even faster
RE_D = re.compile('\d')
def f3(string):
    return RE_D.search(string)

# Output from iPython
# In [18]: %timeit  f1('assdfgag123')
# 1000000 loops, best of 3: 1.18 µs per loop

# In [19]: %timeit  f2('assdfgag123')
# 1000000 loops, best of 3: 923 ns per loop

# In [20]: %timeit  f3('assdfgag123')
# 1000000 loops, best of 3: 384 ns per loop

回答 4

您可以在字符串中的每个字符上应用isdigit()函数。或者您可以使用正则表达式。

我也找到了如何在Python的字符串中找到一个数字?以非常适合的方式返回数字。以下解决方案来自该问题的答案。

number = re.search(r'\d+', yourString).group()

或者:

number = filter(str.isdigit, yourString)

有关更多信息,请查看正则表达式文档:http : //docs.python.org/2/library/re.html

编辑:这将返回实际数字,而不是布尔值,因此以上答案对于您的情况更为正确

第一种方法将返回第一个数字和后续的连续数字。因此1.56将被返回为1。10,000将被返回为10。0207-100-1000将被返回为0207。

第二种方法不起作用。

要提取所有数字,点和逗号,并且不丢失非连续数字,请使用:

re.sub('[^\d.,]' , '', yourString)

You could apply the function isdigit() on every character in the String. Or you could use regular expressions.

Also I found How do I find one number in a string in Python? with very suitable ways to return numbers. The solution below is from the answer in that question.

number = re.search(r'\d+', yourString).group()

Alternatively:

number = filter(str.isdigit, yourString)

For further Information take a look at the regex docu: http://docs.python.org/2/library/re.html

Edit: This Returns the actual numbers, not a boolean value, so the answers above are more correct for your case

The first method will return the first digit and subsequent consecutive digits. Thus 1.56 will be returned as 1. 10,000 will be returned as 10. 0207-100-1000 will be returned as 0207.

The second method does not work.

To extract all digits, dots and commas, and not lose non-consecutive digits, use:

re.sub('[^\d.,]' , '', yourString)

回答 5

您可以使用NLTK方法。

这将在文本中找到“ 1”和“ One”:

import nltk 

def existence_of_numeric_data(text):
    text=nltk.word_tokenize(text)
    pos = nltk.pos_tag(text)
    count = 0
    for i in range(len(pos)):
        word , pos_tag = pos[i]
        if pos_tag == 'CD':
            return True
    return False

existence_of_numeric_data('We are going out. Just five you and me.')

You can use NLTK method for it.

This will find both ‘1’ and ‘One’ in the text:

import nltk 

def existence_of_numeric_data(text):
    text=nltk.word_tokenize(text)
    pos = nltk.pos_tag(text)
    count = 0
    for i in range(len(pos)):
        word , pos_tag = pos[i]
        if pos_tag == 'CD':
            return True
    return False

existence_of_numeric_data('We are going out. Just five you and me.')

回答 6

您可以按照以下步骤完成此操作:

if a_string.isdigit(): do_this() else: do_that()

https://docs.python.org/2/library/stdtypes.html#str.isdigit

使用.isdigit()还意味着在需要使用列表理解的情况下不必使用异常处理(try / except)(在列表理解内部不可能进行try / except)。

You can accomplish this as follows:

if a_string.isdigit(): do_this() else: do_that()

https://docs.python.org/2/library/stdtypes.html#str.isdigit

Using .isdigit() also means not having to resort to exception handling (try/except) in cases where you need to use list comprehension (try/except is not possible inside a list comprehension).


回答 7

您可以使用带有计数的范围来检查数字在字符串中出现的次数,方法是对照范围进行检查:

def count_digit(a):
    sum = 0
    for i in range(10):
        sum += a.count(str(i))
    return sum

ans = count_digit("apple3rh5")
print(ans)

#This print 2

You can use range with count to check how many times a number appears in the string by checking it against the range:

def count_digit(a):
    sum = 0
    for i in range(10):
        sum += a.count(str(i))
    return sum

ans = count_digit("apple3rh5")
print(ans)

#This print 2

回答 8

我很惊讶没有人提到anyand的组合map

def contains_digit(s):
    isdigit = str.isdigit
    return any(map(isdigit,s))

在python 3中,它可能最快(因为正则表达式除外)是因为它不包含任何循环(并且对该函数进行别名避免了在中查找它str)。

不要在python 2中使用它作为map返回a list,这会中断any短路

I’m surprised that no-one mentionned this combination of any and map:

def contains_digit(s):
    isdigit = str.isdigit
    return any(map(isdigit,s))

in python 3 it’s probably the fastest there (except maybe for regexes) is because it doesn’t contain any loop (and aliasing the function avoids looking it up in str).

Don’t use that in python 2 as map returns a list, which breaks any short-circuiting


回答 9

这个如何?

import string

def containsNumber(line):
    res = False
    try:
        for val in line.split():
            if (float(val.strip(string.punctuation))):
                res = True
                break
    except ValueError:
        pass
    return res

containsNumber('234.12 a22') # returns True
containsNumber('234.12L a22') # returns False
containsNumber('234.12, a22') # returns True

What about this one?

import string

def containsNumber(line):
    res = False
    try:
        for val in line.split():
            if (float(val.strip(string.punctuation))):
                res = True
                break
    except ValueError:
        pass
    return res

containsNumber('234.12 a22') # returns True
containsNumber('234.12L a22') # returns False
containsNumber('234.12, a22') # returns True

回答 10

我将使@zyxue答案更加明确:

RE_D = re.compile('\d')

def has_digits(string):
    res = RE_D.search(string)
    return res is not None

has_digits('asdf1')
Out: True

has_digits('asdf')
Out: False

这是@zyxue在答案中提出的解决方案中最快的基准解决方案。

I’ll make the @zyxue answer a bit more explicit:

RE_D = re.compile('\d')

def has_digits(string):
    res = RE_D.search(string)
    return res is not None

has_digits('asdf1')
Out: True

has_digits('asdf')
Out: False

which is the solution with the fastest benchmark from the solutions that @zyxue proposed on the answer.


回答 11

更简单的解决方法是

s = '1dfss3sw235fsf7s'
count = 0
temp = list(s)
for item in temp:
    if(item.isdigit()):
        count = count + 1
    else:
        pass
print count

Simpler way to solve is as

s = '1dfss3sw235fsf7s'
count = 0
temp = list(s)
for item in temp:
    if(item.isdigit()):
        count = count + 1
    else:
        pass
print count

回答 12

import string
import random
n = 10

p = ''

while (string.ascii_uppercase not in p) and (string.ascii_lowercase not in p) and (string.digits not in p):
    for _ in range(n):
        state = random.randint(0, 2)
        if state == 0:
            p = p + chr(random.randint(97, 122))
        elif state == 1:
            p = p + chr(random.randint(65, 90))
        else:
            p = p + str(random.randint(0, 9))
    break
print(p)

此代码生成大小为n的序列,该序列至少包含一个大写字母,一个小写字母和一个数字。通过使用while循环,我们保证了该事件。

import string
import random
n = 10

p = ''

while (string.ascii_uppercase not in p) and (string.ascii_lowercase not in p) and (string.digits not in p):
    for _ in range(n):
        state = random.randint(0, 2)
        if state == 0:
            p = p + chr(random.randint(97, 122))
        elif state == 1:
            p = p + chr(random.randint(65, 90))
        else:
            p = p + str(random.randint(0, 9))
    break
print(p)

This code generates a sequence with size n which at least contain an uppercase, lowercase, and a digit. By using the while loop, we have guaranteed this event.


回答 13

any并且ord可以被组合以达到目的如下所示。

>>> def hasDigits(s):
...     return any( 48 <= ord(char) <= 57 for char in s)
...
>>> hasDigits('as1')
True
>>> hasDigits('as')
False
>>> hasDigits('as9')
True
>>> hasDigits('as_')
False
>>> hasDigits('1as')
True
>>>

关于此实现的几点。

  1. any 更好,因为它像C语言中的短路表达式一样工作,并且会在确定字符串后立即返回结果,例如,即使不比较字符串’a1bbbbbbc’,’b’和’c’。

  2. ord更好,因为它提供了更大的灵活性,例如仅在“ 0”和“ 5”之间或任何其他范围内的校验数字。例如,如果要编写数字的十六进制表示形式的验证器,则希望字符串的字母仅在“ A”至“ F”范围内。

any and ord can be combined to serve the purpose as shown below.

>>> def hasDigits(s):
...     return any( 48 <= ord(char) <= 57 for char in s)
...
>>> hasDigits('as1')
True
>>> hasDigits('as')
False
>>> hasDigits('as9')
True
>>> hasDigits('as_')
False
>>> hasDigits('1as')
True
>>>

A couple of points about this implementation.

  1. any is better because it works like short circuit expression in C Language and will return result as soon as it can be determined i.e. in case of string ‘a1bbbbbbc’ ‘b’s and ‘c’s won’t even be compared.

  2. ord is better because it provides more flexibility like check numbers only between ‘0’ and ‘5’ or any other range. For example if you were to write a validator for Hexadecimal representation of numbers you would want string to have alphabets in the range ‘A’ to ‘F’ only.


回答 14

alp_num = [x for x in string.split() if x.isalnum() and re.search(r'\d',x) and 
re.search(r'[a-z]',x)]

print(alp_num)

这将返回其中包含字母和数字的所有字符串。isalpha()返回包含所有数字或所有字符的字符串。

alp_num = [x for x in string.split() if x.isalnum() and re.search(r'\d',x) and 
re.search(r'[a-z]',x)]

print(alp_num)

This returns all the string that has both alphabets and numbers in it. isalpha() returns the string with all digits or all characters.


回答 15

这可能不是Python中最好的方法,但是作为Haskeller,这种lambda / map方法对我来说非常有意义,而且非常简短:

anydigit = lambda x: any(map(str.isdigit, x))

当然不需要命名。命名它可以像一样使用anydigit("abc123"),感觉就像我在寻找什么!

This probably isn’t the best approach in Python, but as a Haskeller this lambda/map approach made perfect sense to me and is very short:

anydigit = lambda x: any(map(str.isdigit, x))

Doesn’t need to be named of course. Named it could be used like anydigit("abc123"), which feels like what I was looking for!


TypeError:并非在字符串格式化python期间转换所有参数

问题:TypeError:并非在字符串格式化python期间转换所有参数

该程序应采用两个名称,如果它们的长度相同,则应检查它们是否相同。如果是相同的单词,则将显示“名称相同”。如果它们的长度相同但字母不同,则会显示“名称不同但长度相同”。我有问题的部分在底部的4行中。

#!/usr/bin/env python
# Enter your code for "What's In (The Length Of) A Name?" here.
name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
    if len(name1) > len(name2):
        print ("'{0}' is longer than '{1}'"% name1, name2)
    elif len(name1) < len(name2):
        print ("'{0}'is longer than '{1}'"% name2, name1)

当我运行此代码时,它显示:

Traceback (most recent call last):
  File "program.py", line 13, in <module>
    print ("'{0}' is longer than '{1}'"% name1, name2)
TypeError: not all arguments converted during string formatting

任何建议,高度赞赏。

The program is supposed to take in two names, and if they are the same length it should check if they are the same word. If it’s the same word it will print “The names are the same”. If they are the same length but with different letters it will print “The names are different but the same length”. The part I’m having a problem with is in the bottom 4 lines.

#!/usr/bin/env python
# Enter your code for "What's In (The Length Of) A Name?" here.
name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
    if len(name1) > len(name2):
        print ("'{0}' is longer than '{1}'"% name1, name2)
    elif len(name1) < len(name2):
        print ("'{0}'is longer than '{1}'"% name2, name1)

When I run this code it displays:

Traceback (most recent call last):
  File "program.py", line 13, in <module>
    print ("'{0}' is longer than '{1}'"% name1, name2)
TypeError: not all arguments converted during string formatting

Any suggestions are highly appreciated.


回答 0

您正在混合使用不同的格式功能。

旧式%格式化使用%代码进行格式化:

'It will cost $%d dollars.' % 95

新型{}格式使用{}代码和.format方法

'It will cost ${0} dollars.'.format(95)

请注意,使用旧格式时,必须使用元组指定多个参数:

'%d days and %d nights' % (40, 40)

就您而言,由于您使用的是{}格式说明符,请使用.format

"'{0}' is longer than '{1}'".format(name1, name2)

You’re mixing different format functions.

The old-style % formatting uses % codes for formatting:

'It will cost $%d dollars.' % 95

The new-style {} formatting uses {} codes and the .format method

'It will cost ${0} dollars.'.format(95)

Note that with old-style formatting, you have to specify multiple arguments using a tuple:

'%d days and %d nights' % (40, 40)

In your case, since you’re using {} format specifiers, use .format:

"'{0}' is longer than '{1}'".format(name1, name2)

回答 1

错误在于您的字符串格式。

使用’%’运算符使用传统字符串格式的正确方法是使用printf样式的格式字符串(此处的Python文档:http : //docs.python.org/2/library/string.html#format-字符串语法):

"'%s' is longer than '%s'" % (name1, name2)

但是,’%’运算符将来可能会被弃用。新的PEP 3101做事方式是这样的:

"'{0}' is longer than '{1}'".format(name1, name2)

The error is in your string formatting.

The correct way to use traditional string formatting using the ‘%’ operator is to use a printf-style format string (Python documentation for this here: http://docs.python.org/2/library/string.html#format-string-syntax):

"'%s' is longer than '%s'" % (name1, name2)

However, the ‘%’ operator will probably be deprecated in the future. The new PEP 3101 way of doing things is like this:

"'{0}' is longer than '{1}'".format(name1, name2)

回答 2

对我来说,此错误是在我试图将元组传递给字符串格式方法时引起的。

我从这个问题/答案中找到了解决方案

从链接中复制并粘贴正确答案(“不做我的工作”)

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

用感兴趣的元组作为唯一项(即(tutuple)部分)制作单例元组是这里的关键。

For me, This error was caused when I was attempting to pass in a tuple into the string format method.

I found the solution from this question/answer

Copying and pasting the correct answer from the link (NOT MY WORK):

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

Making a singleton tuple with the tuple of interest as the only item, i.e. the (thetuple,) part, is the key bit here.


回答 3

就我而言,这是因为我只需要一个%s,我就缺少值输入。

In my case, it’s because I need only a single %s, i missing values input.


回答 4

除了其他两个答案,我认为缩进在最后两个条件下也是不正确的。条件是一个名字要长于另一个名字,并且它们必须以“ elif”开头且没有缩进。如果将其放在第一个条件下(通过从边缘开始给它四个缩进),则最终会导致矛盾,因为名称的长度不能同时相等且不同。

    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("{0} is longer than {1}".format(name1, name2))

In addition to the other two answers, I think the indentations are also incorrect in the last two conditions. The conditions are that one name is longer than the other and they need to start with ‘elif’ and with no indentations. If you put it within the first condition (by giving it four indentations from the margin), it ends up being contradictory because the lengths of the names cannot be equal and different at the same time.

    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("{0} is longer than {1}".format(name1, name2))

回答 5

其他一些答案中也指出了很多问题。

  1. 正如nneonneo所指出的,您正在混合使用不同的字符串格式化方法。
  2. 正如GuyP指出的那样,您的缩进也已关闭。

我既提供了.format的示例,也提供了将元组传递给%s的参数说明符的方法。在这两种情况下,缩进均已固定,因此长度匹配时大于/小于检查范围。也将随后的if语句更改为elif,以便仅在先前的同一级别语句为False时才运行。

使用.format进行字符串格式化

name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("{0} is longer than {1}".format(name1, name2))
elif len(name1) < len(name2):
    print ("{0} is longer than {1}".format(name2, name1))

使用%s和元组进行字符串格式化

name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("%s is longer than %s" % (name1, name2))
elif len(name1) < len(name2):
    print ("%s is longer than %s" % (name2, name1))

There is a combination of issues as pointed out in a few of the other answers.

  1. As pointed out by nneonneo you are mixing different String Formatting methods.
  2. As pointed out by GuyP your indentation is off as well.

I’ve provided both the example of .format as well as passing tuples to the argument specifier of %s. In both cases the indentation has been fixed so greater/less than checks are outside of when length matches. Also changed subsequent if statements to elif’s so they only run if the prior same level statement was False.

String Formatting with .format

name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("{0} is longer than {1}".format(name1, name2))
elif len(name1) < len(name2):
    print ("{0} is longer than {1}".format(name2, name1))

String Formatting with %s and a tuple

name1 = input("Enter name 1: ")
name2 = input("Enter name 2: ")
len(name1)
len(name2)
if len(name1) == len(name2):
    if name1 == name2:
        print ("The names are the same")
    else:
        print ("The names are different, but are the same length")
elif len(name1) > len(name2):
    print ("%s is longer than %s" % (name1, name2))
elif len(name1) < len(name2):
    print ("%s is longer than %s" % (name2, name1))

回答 6

在python 3.7及更高版本中,有一种新的简便方法。语法如下:

name = "Eric"
age = 74
f"Hello, {name}. You are {age}."

输出:

Hello, Eric. You are 74.

In python 3.7 and above there is a new and easy way. here is the syntax:

name = "Eric"
age = 74
f"Hello, {name}. You are {age}."

OutPut:

Hello, Eric. You are 74.

回答 7

我也遇到错误

_mysql_exceptions.ProgrammingError: not all arguments converted during string formatting 

但是列表参数工作良好。

我使用mysqlclient python lib。lib似乎不接受元组args。像这样传递列表参数['arg1', 'arg2'] 将起作用。

I encounter the error as well,

_mysql_exceptions.ProgrammingError: not all arguments converted during string formatting 

But list args work well.

I use mysqlclient python lib. The lib looks like not to accept tuple args. To pass list args like ['arg1', 'arg2'] will work.


回答 8

Django原始SQL查询视图

"SELECT * FROM VendorReport_vehicledamage WHERE requestdate BETWEEN '{0}' AND '{1}'".format(date_from, date_to)

models.py

class VehicleDamage(models.Model):
    requestdate = models.DateTimeField("requestdate")
    vendor_name = models.CharField("vendor_name", max_length=50)
    class Meta:
        managed=False

views.py

def location_damageReports(request):
    #static date for testing
    date_from = '2019-11-01'
    date_to = '2019-21-01'
    vehicle_damage_reports = VehicleDamage.objects.raw("SELECT * FROM VendorReport_vehicledamage WHERE requestdate BETWEEN '{0}' AND '{1}'".format(date_from, date_to))
    damage_report = DashboardDamageReportSerializer(vehicle_damage_reports, many=True)
    data={"data": damage_report.data}
    return HttpResponse(json.dumps(data), content_type="application/json")

注意:使用python 3.5和Django 1.11

django raw sql query in view

"SELECT * FROM VendorReport_vehicledamage WHERE requestdate BETWEEN '{0}' AND '{1}'".format(date_from, date_to)

models.py

class VehicleDamage(models.Model):
    requestdate = models.DateTimeField("requestdate")
    vendor_name = models.CharField("vendor_name", max_length=50)
    class Meta:
        managed=False

views.py

def location_damageReports(request):
    #static date for testing
    date_from = '2019-11-01'
    date_to = '2019-21-01'
    vehicle_damage_reports = VehicleDamage.objects.raw("SELECT * FROM VendorReport_vehicledamage WHERE requestdate BETWEEN '{0}' AND '{1}'".format(date_from, date_to))
    damage_report = DashboardDamageReportSerializer(vehicle_damage_reports, many=True)
    data={"data": damage_report.data}
    return HttpResponse(json.dumps(data), content_type="application/json")

Note: using python 3.5 and django 1.11


回答 9

对我来说,因为我在一个打印调用中存储了许多值,所以解决方案是创建一个单独的变量以将数据存储为元组,然后调用打印函数。

x = (f"{id}", f"{name}", f"{age}")
print(x) 

For me, as I was storing many values within a single print call, the solution was to create a separate variable to store the data as a tuple and then call the print function.

x = (f"{id}", f"{name}", f"{age}")
print(x) 

如何检查平面清单中是否存在重复项?

问题:如何检查平面清单中是否存在重复项?

例如,给定列表['one', 'two', 'one'],算法应返回True,而给定['one', 'two', 'three']应返回False

For example, given the list ['one', 'two', 'one'], the algorithm should return True, whereas given ['one', 'two', 'three'] it should return False.


回答 0

使用set()删除重复,如果所有的值是可哈希

>>> your_list = ['one', 'two', 'one']
>>> len(your_list) != len(set(your_list))
True

Use set() to remove duplicates if all values are hashable:

>>> your_list = ['one', 'two', 'one']
>>> len(your_list) != len(set(your_list))
True

回答 1

仅推荐用于名单:

any(thelist.count(x) > 1 for x in thelist)

难道不是一个长长的清单上使用-它可以采取的时间与列表中的项目的数量!

对于带有可哈希项(字符串,数字和&c)的较长列表:

def anydup(thelist):
  seen = set()
  for x in thelist:
    if x in seen: return True
    seen.add(x)
  return False

如果您的项目不可散列(子列表,字典等),则它会变得更毛茸茸,尽管如果它们至少具有可比性,仍然可能会获得O(N logN)。但是您需要了解或测试项目的特性(是否可哈希,是否可比较)才能获得最佳性能-哈希值为O(N),非哈希值可比较项为O(N log N),否则它下降到O(N平方),对此无能为力:-(。

Recommended for short lists only:

any(thelist.count(x) > 1 for x in thelist)

Do not use on a long list — it can take time proportional to the square of the number of items in the list!

For longer lists with hashable items (strings, numbers, &c):

def anydup(thelist):
  seen = set()
  for x in thelist:
    if x in seen: return True
    seen.add(x)
  return False

If your items are not hashable (sublists, dicts, etc) it gets hairier, though it may still be possible to get O(N logN) if they’re at least comparable. But you need to know or test the characteristics of the items (hashable or not, comparable or not) to get the best performance you can — O(N) for hashables, O(N log N) for non-hashable comparables, otherwise it’s down to O(N squared) and there’s nothing one can do about it:-(.


回答 2

这很古老,但是这里的答案使我得出了一个略有不同的解决方案。如果您想滥用理解力,则可以通过这种方式进行短路。

xs = [1, 2, 1]
s = set()
any(x in s or s.add(x) for x in xs)
# You can use a similar approach to actually retrieve the duplicates.
s = set()
duplicates = set(x for x in xs if x in s or s.add(x))

This is old, but the answers here led me to a slightly different solution. If you are up for abusing comprehensions, you can get short-circuiting this way.

xs = [1, 2, 1]
s = set()
any(x in s or s.add(x) for x in xs)
# You can use a similar approach to actually retrieve the duplicates.
s = set()
duplicates = set(x for x in xs if x in s or s.add(x))

回答 3

如果您喜欢函数式编程风格,那么这里是一个有用的函数,可以使用doctest进行自我记录和测试。

def decompose(a_list):
    """Turns a list into a set of all elements and a set of duplicated elements.

    Returns a pair of sets. The first one contains elements
    that are found at least once in the list. The second one
    contains elements that appear more than once.

    >>> decompose([1,2,3,5,3,2,6])
    (set([1, 2, 3, 5, 6]), set([2, 3]))
    """
    return reduce(
        lambda (u, d), o : (u.union([o]), d.union(u.intersection([o]))),
        a_list,
        (set(), set()))

if __name__ == "__main__":
    import doctest
    doctest.testmod()

在这里,您可以通过检查返回的对中的第二个元素是否为空来测试唯一性:

def is_set(l):
    """Test if there is no duplicate element in l.

    >>> is_set([1,2,3])
    True
    >>> is_set([1,2,1])
    False
    >>> is_set([])
    True
    """
    return not decompose(l)[1]

请注意,这是无效的,因为您正在显式构造分解。但是,在使用reduce的过程中,您可以得出相当于5(但效率稍低)的答案5:

def is_set(l):
    try:
        def func(s, o):
            if o in s:
                raise Exception
            return s.union([o])
        reduce(func, l, set())
        return True
    except:
        return False

If you are fond of functional programming style, here is a useful function, self-documented and tested code using doctest.

def decompose(a_list):
    """Turns a list into a set of all elements and a set of duplicated elements.

    Returns a pair of sets. The first one contains elements
    that are found at least once in the list. The second one
    contains elements that appear more than once.

    >>> decompose([1,2,3,5,3,2,6])
    (set([1, 2, 3, 5, 6]), set([2, 3]))
    """
    return reduce(
        lambda (u, d), o : (u.union([o]), d.union(u.intersection([o]))),
        a_list,
        (set(), set()))

if __name__ == "__main__":
    import doctest
    doctest.testmod()

From there you can test unicity by checking whether the second element of the returned pair is empty:

def is_set(l):
    """Test if there is no duplicate element in l.

    >>> is_set([1,2,3])
    True
    >>> is_set([1,2,1])
    False
    >>> is_set([])
    True
    """
    return not decompose(l)[1]

Note that this is not efficient since you are explicitly constructing the decomposition. But along the line of using reduce, you can come up to something equivalent (but slightly less efficient) to answer 5:

def is_set(l):
    try:
        def func(s, o):
            if o in s:
                raise Exception
            return s.union([o])
        reduce(func, l, set())
        return True
    except:
        return False

回答 4

我认为比较此处介绍的不同解决方案的时间会很有用。为此,我使用了自己的库simple_benchmark

在此处输入图片说明

因此,确实对于这种情况,Denis Otkidach的解决方案是最快的。

其中一些方法还显示出更陡峭的曲线,这些方法随元素数量呈二次方缩放(Alex Martellis第一解决方案,wjandrea和Xavier Decorets解决方案)。值得一提的是,Keiku提供的熊猫解决方案具有很大的常数。但是对于较大的列表,它几乎赶上了其他解决方案。

并且如果副本在第一位置。这对于查看哪些解决方案短路很有用:

在此处输入图片说明

这里有几种方法不会短路:Kaiku,Frank,Xavier_Decoret(第一个解决方案),Turn,Alex Martelli(第一个解决方案)以及Denis Otkidach提出的方法(在无重复情况下最快)。

我在自己的库中包含了一个函数: iteration_utilities.all_distinct在没有重复的情况下,它可以与最快的解决方案竞争,并且可以在开始重复的情况下以恒定的时间执行(尽管速度不是最快)。

基准测试代码:

from collections import Counter
from functools import reduce

import pandas as pd
from simple_benchmark import BenchmarkBuilder
from iteration_utilities import all_distinct

b = BenchmarkBuilder()

@b.add_function()
def Keiku(l):
    return pd.Series(l).duplicated().sum() > 0

@b.add_function()
def Frank(num_list):
    unique = []
    dupes = []
    for i in num_list:
        if i not in unique:
            unique.append(i)
        else:
            dupes.append(i)
    if len(dupes) != 0:
        return False
    else:
        return True

@b.add_function()
def wjandrea(iterable):
    seen = []
    for x in iterable:
        if x in seen:
            return True
        seen.append(x)
    return False

@b.add_function()
def user(iterable):
    clean_elements_set = set()
    clean_elements_set_add = clean_elements_set.add

    for possible_duplicate_element in iterable:

        if possible_duplicate_element in clean_elements_set:
            return True

        else:
            clean_elements_set_add( possible_duplicate_element )

    return False

@b.add_function()
def Turn(l):
    return Counter(l).most_common()[0][1] > 1

def getDupes(l):
    seen = set()
    seen_add = seen.add
    for x in l:
        if x in seen or seen_add(x):
            yield x

@b.add_function()          
def F1Rumors(l):
    try:
        if next(getDupes(l)): return True    # Found a dupe
    except StopIteration:
        pass
    return False

def decompose(a_list):
    return reduce(
        lambda u, o : (u[0].union([o]), u[1].union(u[0].intersection([o]))),
        a_list,
        (set(), set()))

@b.add_function()
def Xavier_Decoret_1(l):
    return not decompose(l)[1]

@b.add_function()
def Xavier_Decoret_2(l):
    try:
        def func(s, o):
            if o in s:
                raise Exception
            return s.union([o])
        reduce(func, l, set())
        return True
    except:
        return False

@b.add_function()
def pyrospade(xs):
    s = set()
    return any(x in s or s.add(x) for x in xs)

@b.add_function()
def Alex_Martelli_1(thelist):
    return any(thelist.count(x) > 1 for x in thelist)

@b.add_function()
def Alex_Martelli_2(thelist):
    seen = set()
    for x in thelist:
        if x in seen: return True
        seen.add(x)
    return False

@b.add_function()
def Denis_Otkidach(your_list):
    return len(your_list) != len(set(your_list))

@b.add_function()
def MSeifert04(l):
    return not all_distinct(l)

对于参数:


# No duplicate run
@b.add_arguments('list size')
def arguments():
    for exp in range(2, 14):
        size = 2**exp
        yield size, list(range(size))

# Duplicate at beginning run
@b.add_arguments('list size')
def arguments():
    for exp in range(2, 14):
        size = 2**exp
        yield size, [0, *list(range(size)]

# Running and plotting
r = b.run()
r.plot()

I thought it would be useful to compare the timings of the different solutions presented here. For this I used my own library simple_benchmark:

enter image description here

So indeed for this case the solution from Denis Otkidach is fastest.

Some of the approaches also exhibit a much steeper curve, these are the approaches that scale quadratic with the number of elements (Alex Martellis first solution, wjandrea and both of Xavier Decorets solutions). Also important to mention is that the pandas solution from Keiku has a very big constant factor. But for larger lists it almost catches up with the other solutions.

And in case the duplicate is at the first position. This is useful to see which solutions are short-circuiting:

enter image description here

Here several approaches don’t short-circuit: Kaiku, Frank, Xavier_Decoret (first solution), Turn, Alex Martelli (first solution) and the approach presented by Denis Otkidach (which was fastest in the no-duplicate case).

I included a function from my own library here: iteration_utilities.all_distinct which can compete with the fastest solution in the no-duplicates case and performs in constant-time for the duplicate-at-begin case (although not as fastest).

The code for the benchmark:

from collections import Counter
from functools import reduce

import pandas as pd
from simple_benchmark import BenchmarkBuilder
from iteration_utilities import all_distinct

b = BenchmarkBuilder()

@b.add_function()
def Keiku(l):
    return pd.Series(l).duplicated().sum() > 0

@b.add_function()
def Frank(num_list):
    unique = []
    dupes = []
    for i in num_list:
        if i not in unique:
            unique.append(i)
        else:
            dupes.append(i)
    if len(dupes) != 0:
        return False
    else:
        return True

@b.add_function()
def wjandrea(iterable):
    seen = []
    for x in iterable:
        if x in seen:
            return True
        seen.append(x)
    return False

@b.add_function()
def user(iterable):
    clean_elements_set = set()
    clean_elements_set_add = clean_elements_set.add

    for possible_duplicate_element in iterable:

        if possible_duplicate_element in clean_elements_set:
            return True

        else:
            clean_elements_set_add( possible_duplicate_element )

    return False

@b.add_function()
def Turn(l):
    return Counter(l).most_common()[0][1] > 1

def getDupes(l):
    seen = set()
    seen_add = seen.add
    for x in l:
        if x in seen or seen_add(x):
            yield x

@b.add_function()          
def F1Rumors(l):
    try:
        if next(getDupes(l)): return True    # Found a dupe
    except StopIteration:
        pass
    return False

def decompose(a_list):
    return reduce(
        lambda u, o : (u[0].union([o]), u[1].union(u[0].intersection([o]))),
        a_list,
        (set(), set()))

@b.add_function()
def Xavier_Decoret_1(l):
    return not decompose(l)[1]

@b.add_function()
def Xavier_Decoret_2(l):
    try:
        def func(s, o):
            if o in s:
                raise Exception
            return s.union([o])
        reduce(func, l, set())
        return True
    except:
        return False

@b.add_function()
def pyrospade(xs):
    s = set()
    return any(x in s or s.add(x) for x in xs)

@b.add_function()
def Alex_Martelli_1(thelist):
    return any(thelist.count(x) > 1 for x in thelist)

@b.add_function()
def Alex_Martelli_2(thelist):
    seen = set()
    for x in thelist:
        if x in seen: return True
        seen.add(x)
    return False

@b.add_function()
def Denis_Otkidach(your_list):
    return len(your_list) != len(set(your_list))

@b.add_function()
def MSeifert04(l):
    return not all_distinct(l)

And for the arguments:


# No duplicate run
@b.add_arguments('list size')
def arguments():
    for exp in range(2, 14):
        size = 2**exp
        yield size, list(range(size))

# Duplicate at beginning run
@b.add_arguments('list size')
def arguments():
    for exp in range(2, 14):
        size = 2**exp
        yield size, [0, *list(range(size)]

# Running and plotting
r = b.run()
r.plot()

回答 5

我最近回答了一个相关问题,以建立所有重复项,使用生成器在列表中。它的优势在于,如果仅用于建立“是否存在重复项”,则只需要获取第一项,其余项就可以忽略,这是最终的捷径。

这是一个有趣的基于集合的方法,我直接从moooeeeep改编而成

def getDupes(l):
    seen = set()
    seen_add = seen.add
    for x in l:
        if x in seen or seen_add(x):
            yield x

因此,将有一份完整的受骗名单list(getDupes(etc))。为了简单地测试“是否”存在欺骗,应将其包装如下:

def hasDupes(l):
    try:
        if getDupes(l).next(): return True    # Found a dupe
    except StopIteration:
        pass
    return False

这样可以很好地扩展规模,并在列表中的任何位置都提供一致的操作时间-我测试了最多100万个条目的列表。如果您对数据有所了解,特别是在上半年可能会出现重复数据,或者是其他使您歪曲要求的事情(例如需要获取实际的重复数据),那么有几个真正可以替代的重复数据定位器可能跑赢大市。我推荐的两个是…

基于简单dict的方法,可读性强:

def getDupes(c):
    d = {}
    for i in c:
        if i in d:
            if d[i]:
                yield i
                d[i] = False
        else:
            d[i] = True

利用排序列表上的itertools(本质上是ifilter / izip / tee),如果您要获取所有重复对象,则效率非常高,尽管获取第一个对象并没有那么快:

def getDupes(c):
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in itertools.ifilter(lambda x: x[0]==x[1], itertools.izip(a, b)):
        if k != r:
            yield k
            r = k

这些是我尝试使用完整重复列表的方法中表现最好的,第一个重复出现在从开始到中间的1m元素列表中的任何位置。令人惊讶的是,排序步骤增加了很少的开销。您的里程可能会有所不同,但这是我的具体计时结果:

Finding FIRST duplicate, single dupe places "n" elements in to 1m element array

Test set len change :        50 -  . . . . .  -- 0.002
Test in dict        :        50 -  . . . . .  -- 0.002
Test in set         :        50 -  . . . . .  -- 0.002
Test sort/adjacent  :        50 -  . . . . .  -- 0.023
Test sort/groupby   :        50 -  . . . . .  -- 0.026
Test sort/zip       :        50 -  . . . . .  -- 1.102
Test sort/izip      :        50 -  . . . . .  -- 0.035
Test sort/tee/izip  :        50 -  . . . . .  -- 0.024
Test moooeeeep      :        50 -  . . . . .  -- 0.001 *
Test iter*/sorted   :        50 -  . . . . .  -- 0.027

Test set len change :      5000 -  . . . . .  -- 0.017
Test in dict        :      5000 -  . . . . .  -- 0.003 *
Test in set         :      5000 -  . . . . .  -- 0.004
Test sort/adjacent  :      5000 -  . . . . .  -- 0.031
Test sort/groupby   :      5000 -  . . . . .  -- 0.035
Test sort/zip       :      5000 -  . . . . .  -- 1.080
Test sort/izip      :      5000 -  . . . . .  -- 0.043
Test sort/tee/izip  :      5000 -  . . . . .  -- 0.031
Test moooeeeep      :      5000 -  . . . . .  -- 0.003 *
Test iter*/sorted   :      5000 -  . . . . .  -- 0.031

Test set len change :     50000 -  . . . . .  -- 0.035
Test in dict        :     50000 -  . . . . .  -- 0.023
Test in set         :     50000 -  . . . . .  -- 0.023
Test sort/adjacent  :     50000 -  . . . . .  -- 0.036
Test sort/groupby   :     50000 -  . . . . .  -- 0.134
Test sort/zip       :     50000 -  . . . . .  -- 1.121
Test sort/izip      :     50000 -  . . . . .  -- 0.054
Test sort/tee/izip  :     50000 -  . . . . .  -- 0.045
Test moooeeeep      :     50000 -  . . . . .  -- 0.019 *
Test iter*/sorted   :     50000 -  . . . . .  -- 0.055

Test set len change :    500000 -  . . . . .  -- 0.249
Test in dict        :    500000 -  . . . . .  -- 0.145
Test in set         :    500000 -  . . . . .  -- 0.165
Test sort/adjacent  :    500000 -  . . . . .  -- 0.139
Test sort/groupby   :    500000 -  . . . . .  -- 1.138
Test sort/zip       :    500000 -  . . . . .  -- 1.159
Test sort/izip      :    500000 -  . . . . .  -- 0.126
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.120 *
Test moooeeeep      :    500000 -  . . . . .  -- 0.131
Test iter*/sorted   :    500000 -  . . . . .  -- 0.157

I recently answered a related question to establish all the duplicates in a list, using a generator. It has the advantage that if used just to establish ‘if there is a duplicate’ then you just need to get the first item and the rest can be ignored, which is the ultimate shortcut.

This is an interesting set based approach I adapted straight from moooeeeep:

def getDupes(l):
    seen = set()
    seen_add = seen.add
    for x in l:
        if x in seen or seen_add(x):
            yield x

Accordingly, a full list of dupes would be list(getDupes(etc)). To simply test “if” there is a dupe, it should be wrapped as follows:

def hasDupes(l):
    try:
        if getDupes(l).next(): return True    # Found a dupe
    except StopIteration:
        pass
    return False

This scales well and provides consistent operating times wherever the dupe is in the list — I tested with lists of up to 1m entries. If you know something about the data, specifically, that dupes are likely to show up in the first half, or other things that let you skew your requirements, like needing to get the actual dupes, then there are a couple of really alternative dupe locators that might outperform. The two I recommend are…

Simple dict based approach, very readable:

def getDupes(c):
    d = {}
    for i in c:
        if i in d:
            if d[i]:
                yield i
                d[i] = False
        else:
            d[i] = True

Leverage itertools (essentially an ifilter/izip/tee) on the sorted list, very efficient if you are getting all the dupes though not as quick to get just the first:

def getDupes(c):
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in itertools.ifilter(lambda x: x[0]==x[1], itertools.izip(a, b)):
        if k != r:
            yield k
            r = k

These were the top performers from the approaches I tried for the full dupe list, with the first dupe occurring anywhere in a 1m element list from the start to the middle. It was surprising how little overhead the sort step added. Your mileage may vary, but here are my specific timed results:

Finding FIRST duplicate, single dupe places "n" elements in to 1m element array

Test set len change :        50 -  . . . . .  -- 0.002
Test in dict        :        50 -  . . . . .  -- 0.002
Test in set         :        50 -  . . . . .  -- 0.002
Test sort/adjacent  :        50 -  . . . . .  -- 0.023
Test sort/groupby   :        50 -  . . . . .  -- 0.026
Test sort/zip       :        50 -  . . . . .  -- 1.102
Test sort/izip      :        50 -  . . . . .  -- 0.035
Test sort/tee/izip  :        50 -  . . . . .  -- 0.024
Test moooeeeep      :        50 -  . . . . .  -- 0.001 *
Test iter*/sorted   :        50 -  . . . . .  -- 0.027

Test set len change :      5000 -  . . . . .  -- 0.017
Test in dict        :      5000 -  . . . . .  -- 0.003 *
Test in set         :      5000 -  . . . . .  -- 0.004
Test sort/adjacent  :      5000 -  . . . . .  -- 0.031
Test sort/groupby   :      5000 -  . . . . .  -- 0.035
Test sort/zip       :      5000 -  . . . . .  -- 1.080
Test sort/izip      :      5000 -  . . . . .  -- 0.043
Test sort/tee/izip  :      5000 -  . . . . .  -- 0.031
Test moooeeeep      :      5000 -  . . . . .  -- 0.003 *
Test iter*/sorted   :      5000 -  . . . . .  -- 0.031

Test set len change :     50000 -  . . . . .  -- 0.035
Test in dict        :     50000 -  . . . . .  -- 0.023
Test in set         :     50000 -  . . . . .  -- 0.023
Test sort/adjacent  :     50000 -  . . . . .  -- 0.036
Test sort/groupby   :     50000 -  . . . . .  -- 0.134
Test sort/zip       :     50000 -  . . . . .  -- 1.121
Test sort/izip      :     50000 -  . . . . .  -- 0.054
Test sort/tee/izip  :     50000 -  . . . . .  -- 0.045
Test moooeeeep      :     50000 -  . . . . .  -- 0.019 *
Test iter*/sorted   :     50000 -  . . . . .  -- 0.055

Test set len change :    500000 -  . . . . .  -- 0.249
Test in dict        :    500000 -  . . . . .  -- 0.145
Test in set         :    500000 -  . . . . .  -- 0.165
Test sort/adjacent  :    500000 -  . . . . .  -- 0.139
Test sort/groupby   :    500000 -  . . . . .  -- 1.138
Test sort/zip       :    500000 -  . . . . .  -- 1.159
Test sort/izip      :    500000 -  . . . . .  -- 0.126
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.120 *
Test moooeeeep      :    500000 -  . . . . .  -- 0.131
Test iter*/sorted   :    500000 -  . . . . .  -- 0.157

回答 6

另一种简洁的方法是使用Counter

要确定原始列表中是否有重复项:

from collections import Counter

def has_dupes(l):
    # second element of the tuple has number of repetitions
    return Counter(l).most_common()[0][1] > 1

或获取具有重复项的列表:

def get_dupes(l):
    return [k for k, v in Counter(l).items() if v > 1]

Another way of doing this succinctly is with Counter.

To just determine if there are any duplicates in the original list:

from collections import Counter

def has_dupes(l):
    # second element of the tuple has number of repetitions
    return Counter(l).most_common()[0][1] > 1

Or to get a list of items that have duplicates:

def get_dupes(l):
    return [k for k, v in Counter(l).items() if v > 1]

回答 7

my_list = ['one', 'two', 'one']

duplicates = []

for value in my_list:
  if my_list.count(value) > 1:
    if value not in duplicates:
      duplicates.append(value)

print(duplicates) //["one"]
my_list = ['one', 'two', 'one']

duplicates = []

for value in my_list:
  if my_list.count(value) > 1:
    if value not in duplicates:
      duplicates.append(value)

print(duplicates) //["one"]

回答 8

我发现这是最好的性能,因为它会在发现第一个重复项时使操作短路,因此该算法具有时间和空间复杂度O(n),其中n是列表的长度:

def has_duplicated_elements(iterable):
    """ Given an `iterable`, return True if there are duplicated entries. """
    clean_elements_set = set()
    clean_elements_set_add = clean_elements_set.add

    for possible_duplicate_element in iterable:

        if possible_duplicate_element in clean_elements_set:
            return True

        else:
            clean_elements_set_add( possible_duplicate_element )

    return False

I found this to do the best performance because it short-circuit the operation when the first duplicated it found, then this algorithm has time and space complexity O(n) where n is the list’s length:

def has_duplicated_elements(iterable):
    """ Given an `iterable`, return True if there are duplicated entries. """
    clean_elements_set = set()
    clean_elements_set_add = clean_elements_set.add

    for possible_duplicate_element in iterable:

        if possible_duplicate_element in clean_elements_set:
            return True

        else:
            clean_elements_set_add( possible_duplicate_element )

    return False

回答 9

我真的不知道幕后花絮是什么,所以我只想保持简单。

def dupes(num_list):
    unique = []
    dupes = []
    for i in num_list:
        if i not in unique:
            unique.append(i)
        else:
            dupes.append(i)
    if len(dupes) != 0:
        return False
    else:
        return True

I dont really know what set does behind the scenes, so I just like to keep it simple.

def dupes(num_list):
    unique = []
    dupes = []
    for i in num_list:
        if i not in unique:
            unique.append(i)
        else:
            dupes.append(i)
    if len(dupes) != 0:
        return False
    else:
        return True

回答 10

一个更简单的解决方案如下。只需使用pandas .duplicated()方法检查对/错,然后求和。另请参阅 pandas.Series.duplicated — pandas 0.24.1文档

import pandas as pd

def has_duplicated(l):
    return pd.Series(l).duplicated().sum() > 0

print(has_duplicated(['one', 'two', 'one']))
# True
print(has_duplicated(['one', 'two', 'three']))
# False

A more simple solution is as follows. Just check True/False with pandas .duplicated() method and then take sum. Please also see pandas.Series.duplicated — pandas 0.24.1 documentation

import pandas as pd

def has_duplicated(l):
    return pd.Series(l).duplicated().sum() > 0

print(has_duplicated(['one', 'two', 'one']))
# True
print(has_duplicated(['one', 'two', 'three']))
# False

回答 11

如果列表包含不可散列的项目,则可以使用Alex Martelli的解决方案,但使用列表而不是集合,尽管对于较大的输入而言,速度较慢:O(N ^ 2)。

def has_duplicates(iterable):
    seen = []
    for x in iterable:
        if x in seen:
            return True
        seen.append(x)
    return False

If the list contains unhashable items, you can use Alex Martelli’s solution but with a list instead of a set, though it’s slower for larger inputs: O(N^2).

def has_duplicates(iterable):
    seen = []
    for x in iterable:
        if x in seen:
            return True
        seen.append(x)
    return False

回答 12

为了简单起见,我使用了pyrospade的方法,并在不区分大小写的Windows注册表的简短列表中对其进行了一些修改。

如果将原始PATH值字符串拆分为单独的路径,则可以使用以下命令删除所有“空”路径(空字符串或仅包含空格的字符串):

PATH_nonulls = [s for s in PATH if s.strip()]

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

原始PATH同时具有“空”条目和重复条目,以进行测试:

[list]  Root paths in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH[list]  Root paths in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment
  1  C:\Python37\
  2
  3
  4  C:\Python37\Scripts\
  5  c:\python37\
  6  C:\Program Files\ImageMagick-7.0.8-Q8
  7  C:\Program Files (x86)\poppler\bin
  8  D:\DATA\Sounds
  9  C:\Program Files (x86)\GnuWin32\bin
 10  C:\Program Files (x86)\Intel\iCLS Client\
 11  C:\Program Files\Intel\iCLS Client\
 12  D:\DATA\CCMD\FF
 13  D:\DATA\CCMD
 14  D:\DATA\UTIL
 15  C:\
 16  D:\DATA\UHELP
 17  %SystemRoot%\system32
 18
 19
 20  D:\DATA\CCMD\FF%SystemRoot%
 21  D:\DATA\Sounds
 22  %SystemRoot%\System32\Wbem
 23  D:\DATA\CCMD\FF
 24
 25
 26  c:\
 27  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\
 28

空路径已被删除,但仍具有重复项,例如(1,3)和(13,20):

    [list]  Null paths removed from HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH
  1  C:\Python37\
  2  C:\Python37\Scripts\
  3  c:\python37\
  4  C:\Program Files\ImageMagick-7.0.8-Q8
  5  C:\Program Files (x86)\poppler\bin
  6  D:\DATA\Sounds
  7  C:\Program Files (x86)\GnuWin32\bin
  8  C:\Program Files (x86)\Intel\iCLS Client\
  9  C:\Program Files\Intel\iCLS Client\
 10  D:\DATA\CCMD\FF
 11  D:\DATA\CCMD
 12  D:\DATA\UTIL
 13  C:\
 14  D:\DATA\UHELP
 15  %SystemRoot%\system32
 16  D:\DATA\CCMD\FF%SystemRoot%
 17  D:\DATA\Sounds
 18  %SystemRoot%\System32\Wbem
 19  D:\DATA\CCMD\FF
 20  c:\
 21  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\

最后,这些假人已被删除:

[list]  Massaged path list from in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH
  1  C:\Python37\
  2  C:\Python37\Scripts\
  3  C:\Program Files\ImageMagick-7.0.8-Q8
  4  C:\Program Files (x86)\poppler\bin
  5  D:\DATA\Sounds
  6  C:\Program Files (x86)\GnuWin32\bin
  7  C:\Program Files (x86)\Intel\iCLS Client\
  8  C:\Program Files\Intel\iCLS Client\
  9  D:\DATA\CCMD\FF
 10  D:\DATA\CCMD
 11  D:\DATA\UTIL
 12  C:\
 13  D:\DATA\UHELP
 14  %SystemRoot%\system32
 15  D:\DATA\CCMD\FF%SystemRoot%
 16  %SystemRoot%\System32\Wbem
 17  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\

I used pyrospade’s approach, for its simplicity, and modified that slightly on a short list made from the case-insensitive Windows registry.

If the raw PATH value string is split into individual paths all ‘null’ paths (empty or whitespace-only strings) can be removed by using:

PATH_nonulls = [s for s in PATH if s.strip()]

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

The original PATH has both ‘null’ entries and duplicates for testing purposes:

[list]  Root paths in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH[list]  Root paths in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment
  1  C:\Python37\
  2
  3
  4  C:\Python37\Scripts\
  5  c:\python37\
  6  C:\Program Files\ImageMagick-7.0.8-Q8
  7  C:\Program Files (x86)\poppler\bin
  8  D:\DATA\Sounds
  9  C:\Program Files (x86)\GnuWin32\bin
 10  C:\Program Files (x86)\Intel\iCLS Client\
 11  C:\Program Files\Intel\iCLS Client\
 12  D:\DATA\CCMD\FF
 13  D:\DATA\CCMD
 14  D:\DATA\UTIL
 15  C:\
 16  D:\DATA\UHELP
 17  %SystemRoot%\system32
 18
 19
 20  D:\DATA\CCMD\FF%SystemRoot%
 21  D:\DATA\Sounds
 22  %SystemRoot%\System32\Wbem
 23  D:\DATA\CCMD\FF
 24
 25
 26  c:\
 27  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\
 28

Null paths have been removed, but still has duplicates, e.g., (1, 3) and (13, 20):

    [list]  Null paths removed from HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH
  1  C:\Python37\
  2  C:\Python37\Scripts\
  3  c:\python37\
  4  C:\Program Files\ImageMagick-7.0.8-Q8
  5  C:\Program Files (x86)\poppler\bin
  6  D:\DATA\Sounds
  7  C:\Program Files (x86)\GnuWin32\bin
  8  C:\Program Files (x86)\Intel\iCLS Client\
  9  C:\Program Files\Intel\iCLS Client\
 10  D:\DATA\CCMD\FF
 11  D:\DATA\CCMD
 12  D:\DATA\UTIL
 13  C:\
 14  D:\DATA\UHELP
 15  %SystemRoot%\system32
 16  D:\DATA\CCMD\FF%SystemRoot%
 17  D:\DATA\Sounds
 18  %SystemRoot%\System32\Wbem
 19  D:\DATA\CCMD\FF
 20  c:\
 21  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\

And finally, the dupes have been removed:

[list]  Massaged path list from in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment:PATH
  1  C:\Python37\
  2  C:\Python37\Scripts\
  3  C:\Program Files\ImageMagick-7.0.8-Q8
  4  C:\Program Files (x86)\poppler\bin
  5  D:\DATA\Sounds
  6  C:\Program Files (x86)\GnuWin32\bin
  7  C:\Program Files (x86)\Intel\iCLS Client\
  8  C:\Program Files\Intel\iCLS Client\
  9  D:\DATA\CCMD\FF
 10  D:\DATA\CCMD
 11  D:\DATA\UTIL
 12  C:\
 13  D:\DATA\UHELP
 14  %SystemRoot%\system32
 15  D:\DATA\CCMD\FF%SystemRoot%
 16  %SystemRoot%\System32\Wbem
 17  %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\

回答 13

def check_duplicates(my_list):
    seen = {}
    for item in my_list:
        if seen.get(item):
            return True
        seen[item] = True
    return False
def check_duplicates(my_list):
    seen = {}
    for item in my_list:
        if seen.get(item):
            return True
        seen[item] = True
    return False

如何检查字符串中的特定字符?[关闭]

问题:如何检查字符串中的特定字符?[关闭]

如何使用Python 2检查字符串中是否包含几个特定字符?

例如,给定以下字符串:

罪犯偷走了1,000,000美元的珠宝。

如何检测它是否带有美元符号(“ $”),逗号(“,”)和数字?

How can I check if a string has several specific characters in it using Python 2?

For example, given the following string:

The criminals stole $1,000,000 in jewels.

How do I detect if it has dollar signs (“$”), commas (“,”), and numbers?


回答 0

假设您的字符串是s

'$' in s        # found
'$' not in s    # not found

# original answer given, but less Pythonic than the above...
s.find('$')==-1 # not found
s.find('$')!=-1 # found

对于其他字符,依此类推。

… 要么

pattern = re.compile(r'\d\$,')
if pattern.findall(s):
    print('Found')
else
    print('Not found')

… 要么

chars = set('0123456789$,')
if any((c in chars) for c in s):
    print('Found')
else:
    print('Not Found')

[编辑:添加'$' in s答案]

Assuming your string is s:

'$' in s        # found
'$' not in s    # not found

# original answer given, but less Pythonic than the above...
s.find('$')==-1 # not found
s.find('$')!=-1 # found

And so on for other characters.

… or

pattern = re.compile(r'\d\$,')
if pattern.findall(s):
    print('Found')
else
    print('Not found')

… or

chars = set('0123456789$,')
if any((c in chars) for c in s):
    print('Found')
else:
    print('Not Found')

[Edit: added the '$' in s answers]


回答 1

用户Jochen Ritzel在评论用户dappawit对此问题的答案时说了这一点。它应该工作:

('1' in var) and ('2' in var) and ('3' in var) ...

“ 1”,“ 2”等应替换为您要查找的字符。

有关字符串的一些信息,请参见Python 2.7文档中的此页面,包括有关使用in运算符进行子字符串测试的信息。

更新:这与我上面的建议做的工作相同,重复次数更少:

# When looking for single characters, this checks for any of the characters...
# ...since strings are collections of characters
any(i in '<string>' for i in '123')
# any(i in 'a' for i in '123') -> False
# any(i in 'b3' for i in '123') -> True

# And when looking for subsrings
any(i in '<string>' for i in ('11','22','33'))
# any(i in 'hello' for i in ('18','36','613')) -> False
# any(i in '613 mitzvahs' for i in ('18','36','613')) ->True

user Jochen Ritzel said this in a comment to an answer to this question from user dappawit. It should work:

('1' in var) and ('2' in var) and ('3' in var) ...

‘1’, ‘2’, etc. should be replaced with the characters you are looking for.

See this page in the Python 2.7 documentation for some information on strings, including about using the in operator for substring tests.

Update: This does the same job as my above suggestion with less repetition:

# When looking for single characters, this checks for any of the characters...
# ...since strings are collections of characters
any(i in '<string>' for i in '123')
# any(i in 'a' for i in '123') -> False
# any(i in 'b3' for i in '123') -> True

# And when looking for subsrings
any(i in '<string>' for i in ('11','22','33'))
# any(i in 'hello' for i in ('18','36','613')) -> False
# any(i in '613 mitzvahs' for i in ('18','36','613')) ->True

回答 2

快速比较时间,以回应Abbafei的帖子:

import timeit

def func1():
    phrase = 'Lucky Dog'
    return any(i in 'LD' for i in phrase)

def func2():
    phrase = 'Lucky Dog'
    if ('L' in phrase) or ('D' in phrase):
        return True
    else:
        return False

if __name__ == '__main__': 
    func1_time = timeit.timeit(func1, number=100000)
    func2_time = timeit.timeit(func2, number=100000)
    print('Func1 Time: {0}\nFunc2 Time: {1}'.format(func1_time, func2_time))

输出:

Func1 Time: 0.0737484362111
Func2 Time: 0.0125144964371

因此,任何代码都更紧凑,而条件代码则更快。


编辑: TL; DR-对于长字符串,if-then 仍然比任何字符串都快得多!

我决定根据评论中提出的一些有效点来比较长随机字符串的计时:

# Tested in Python 2.7.14

import timeit
from string import ascii_letters
from random import choice

def create_random_string(length=1000):
    random_list = [choice(ascii_letters) for x in range(length)]
    return ''.join(random_list)

def function_using_any(phrase):
    return any(i in 'LD' for i in phrase)

def function_using_if_then(phrase):
    if ('L' in phrase) or ('D' in phrase):
        return True
    else:
        return False

if __name__ == '__main__':
    random_string = create_random_string(length=2000)
    func1_time = timeit.timeit(stmt="function_using_any(random_string)",
                               setup="from __main__ import function_using_any, random_string",
                               number=200000)
    func2_time = timeit.timeit(stmt="function_using_if_then(random_string)",
                               setup="from __main__ import function_using_if_then, random_string",
                               number=200000)
    print('Time for function using any: {0}\nTime for function using if-then: {1}'.format(func1_time, func2_time))

输出:

Time for function using any: 0.1342546
Time for function using if-then: 0.0201827

如果-那么几乎比任何一个都快一个数量级!

Quick comparison of timings in response to the post by Abbafei:

import timeit

def func1():
    phrase = 'Lucky Dog'
    return any(i in 'LD' for i in phrase)

def func2():
    phrase = 'Lucky Dog'
    if ('L' in phrase) or ('D' in phrase):
        return True
    else:
        return False

if __name__ == '__main__': 
    func1_time = timeit.timeit(func1, number=100000)
    func2_time = timeit.timeit(func2, number=100000)
    print('Func1 Time: {0}\nFunc2 Time: {1}'.format(func1_time, func2_time))

Output:

Func1 Time: 0.0737484362111
Func2 Time: 0.0125144964371

So the code is more compact with any, but faster with the conditional.


EDIT : TL;DR — For long strings, if-then is still much faster than any!

I decided to compare the timing for a long random string based on some of the valid points raised in the comments:

# Tested in Python 2.7.14

import timeit
from string import ascii_letters
from random import choice

def create_random_string(length=1000):
    random_list = [choice(ascii_letters) for x in range(length)]
    return ''.join(random_list)

def function_using_any(phrase):
    return any(i in 'LD' for i in phrase)

def function_using_if_then(phrase):
    if ('L' in phrase) or ('D' in phrase):
        return True
    else:
        return False

if __name__ == '__main__':
    random_string = create_random_string(length=2000)
    func1_time = timeit.timeit(stmt="function_using_any(random_string)",
                               setup="from __main__ import function_using_any, random_string",
                               number=200000)
    func2_time = timeit.timeit(stmt="function_using_if_then(random_string)",
                               setup="from __main__ import function_using_if_then, random_string",
                               number=200000)
    print('Time for function using any: {0}\nTime for function using if-then: {1}'.format(func1_time, func2_time))

Output:

Time for function using any: 0.1342546
Time for function using if-then: 0.0201827

If-then is almost an order of magnitude faster than any!


回答 3

这将测试字符串是否由某些组合或数字,美元符号和逗号组成。那是您要找的东西吗?

汇入

s1 ='测试字符串'
s2 ='1234,12345 $'

regex = re.compile('[0-9,$] + $')

如果(regex.match(s1)):
   打印“ s1匹配”
其他:
   打印“ s1不匹配”

如果(regex.match(s2)):
   打印“ s2匹配”
其他:
   打印“ s2不匹配”

This will test if strings are made up of some combination or digits, the dollar sign, and a commas. Is that what you’re looking for?

import re

s1 = 'Testing string'
s2 = '1234,12345$'

regex = re.compile('[0-9,$]+$')

if ( regex.match(s1) ):
   print "s1 matched"
else:
   print "s1 didn't match"

if ( regex.match(s2) ):
   print "s2 matched"
else:
   print "s2 didn't match"

回答 4

s=input("Enter any character:")   
if s.isalnum():   
   print("Alpha Numeric Character")   
   if s.isalpha():   
       print("Alphabet character")   
       if s.islower():   
         print("Lower case alphabet character")   
       else:   
         print("Upper case alphabet character")   
   else:   
     print("it is a digit")   
elif s.isspace():   
    print("It is space character")   

否则:
print(“非空间特殊字符”)

s=input("Enter any character:")   
if s.isalnum():   
   print("Alpha Numeric Character")   
   if s.isalpha():   
       print("Alphabet character")   
       if s.islower():   
         print("Lower case alphabet character")   
       else:   
         print("Upper case alphabet character")   
   else:   
     print("it is a digit")   
elif s.isspace():   
    print("It is space character")   

else:
print(“Non Space Special Character”)


检查字符串是否可以在Python中转换为float

问题:检查字符串是否可以在Python中转换为float

我有一些运行在字符串列表中的Python代码,并在可能的情况下将它们转换为整数或浮点数。对整数执行此操作非常简单

if element.isdigit():
  newelement = int(element)

浮点数比较困难。现在,我正在使用partition('.')分割字符串并检查以确保一侧或两侧都是数字。

partition = element.partition('.')
if (partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit()) 
    or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit()) 
    or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''):
  newelement = float(element)

这是可行的,但是显然,如果使用if语句有点让人头疼。我考虑的另一种解决方案是将转换仅包装在try / catch块中,然后查看转换是否成功,如本问题所述

还有其他想法吗?关于分区和尝试/捕获方法的相对优点的看法?

I’ve got some Python code that runs through a list of strings and converts them to integers or floating point numbers if possible. Doing this for integers is pretty easy

if element.isdigit():
  newelement = int(element)

Floating point numbers are more difficult. Right now I’m using partition('.') to split the string and checking to make sure that one or both sides are digits.

partition = element.partition('.')
if (partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit()) 
    or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit()) 
    or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''):
  newelement = float(element)

This works, but obviously the if statement for that is a bit of a bear. The other solution I considered is to just wrap the conversion in a try/catch block and see if it succeeds, as described in this question.

Anyone have any other ideas? Opinions on the relative merits of the partition and try/catch approaches?


回答 0

我会用..

try:
    float(element)
except ValueError:
    print "Not a float"

..它很简单,并且可以正常工作

另一个选择是正则表达式:

import re
if re.match(r'^-?\d+(?:\.\d+)?$', element) is None:
    print "Not float"

I would just use..

try:
    float(element)
except ValueError:
    print "Not a float"

..it’s simple, and it works

Another option would be a regular expression:

import re
if re.match(r'^-?\d+(?:\.\d+)?$', element) is None:
    print "Not float"

回答 1

检查浮点数的Python方法:

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

不要被隐藏在浮船上的妖精所咬!做单元测试!

什么是浮动货币,哪些不是浮动货币,可能会让您感到惊讶:

Command to parse                        Is it a float?  Comment
--------------------------------------  --------------- ------------
print(isfloat(""))                      False
print(isfloat("1234567"))               True 
print(isfloat("NaN"))                   True            nan is also float
print(isfloat("NaNananana BATMAN"))     False
print(isfloat("123.456"))               True
print(isfloat("123.E4"))                True
print(isfloat(".1"))                    True
print(isfloat("1,234"))                 False
print(isfloat("NULL"))                  False           case insensitive
print(isfloat(",1"))                    False           
print(isfloat("123.EE4"))               False           
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777"))              True            This is same as Inf
print(isfloat("-iNF"))                  True
print(isfloat("1.797693e+308"))         True
print(isfloat("infinity"))              True
print(isfloat("infinity and BEYOND"))   False
print(isfloat("12.34.56"))              False           Two dots not allowed.
print(isfloat("#56"))                   False
print(isfloat("56%"))                   False
print(isfloat("0E0"))                   True
print(isfloat("x86E0"))                 False
print(isfloat("86-5"))                  False
print(isfloat("True"))                  False           Boolean is not a float.   
print(isfloat(True))                    True            Boolean is a float
print(isfloat("+1e1^5"))                False
print(isfloat("+1e1"))                  True
print(isfloat("+1e1.3"))                False
print(isfloat("+1.3P1"))                False
print(isfloat("-+1"))                   False
print(isfloat("(1)"))                   False           brackets not interpreted

Python method to check for float:

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

Don’t get bit by the goblins hiding in the float boat! DO UNIT TESTING!

What is, and is not a float may surprise you:

Command to parse                        Is it a float?  Comment
--------------------------------------  --------------- ------------
print(isfloat(""))                      False
print(isfloat("1234567"))               True 
print(isfloat("NaN"))                   True            nan is also float
print(isfloat("NaNananana BATMAN"))     False
print(isfloat("123.456"))               True
print(isfloat("123.E4"))                True
print(isfloat(".1"))                    True
print(isfloat("1,234"))                 False
print(isfloat("NULL"))                  False           case insensitive
print(isfloat(",1"))                    False           
print(isfloat("123.EE4"))               False           
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777"))              True            This is same as Inf
print(isfloat("-iNF"))                  True
print(isfloat("1.797693e+308"))         True
print(isfloat("infinity"))              True
print(isfloat("infinity and BEYOND"))   False
print(isfloat("12.34.56"))              False           Two dots not allowed.
print(isfloat("#56"))                   False
print(isfloat("56%"))                   False
print(isfloat("0E0"))                   True
print(isfloat("x86E0"))                 False
print(isfloat("86-5"))                  False
print(isfloat("True"))                  False           Boolean is not a float.   
print(isfloat(True))                    True            Boolean is a float
print(isfloat("+1e1^5"))                False
print(isfloat("+1e1"))                  True
print(isfloat("+1e1.3"))                False
print(isfloat("+1.3P1"))                False
print(isfloat("-+1"))                   False
print(isfloat("(1)"))                   False           brackets not interpreted

回答 2

'1.43'.replace('.','',1).isdigit()

true仅当存在一个或没有“。”时返回。在数字字符串中。

'1.4.3'.replace('.','',1).isdigit()

将返回 false

'1.ww'.replace('.','',1).isdigit()

将返回 false

'1.43'.replace('.','',1).isdigit()

which will return true only if there is one or no ‘.’ in the string of digits.

'1.4.3'.replace('.','',1).isdigit()

will return false

'1.ww'.replace('.','',1).isdigit()

will return false


回答 3

TL; DR

  • 如果您输入的大部分内容都是可以转换为浮点数的字符串,try: except:方法是最好的本机Python方法。
  • 如果您的输入大部分是不能输入的字符串转换为浮点数的,则正则表达式或partition方法会更好。
  • 如果您1)不确定您的输入或需要更快的速度,并且2)不介意并可以安装第三方C扩展名,则fastnumbers效果很好。

通过第三方模块可以使用另一种称为fastnumbers的方法(公开,我是作者)。它提供了一个称为isfloat的功能。我在这个答案中采用了Jacob Gabrielson概述的单元测试示例,但是添加了该fastnumbers.isfloat方法。我还应该注意,Jacob的示例对正则表达式选项不公道,因为该示例中的大多数时间都因为点运算符而花费在全局查找中try: except:


def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match
def is_float_re(str):
    return True if _float_regexp(str) else False

def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True
    else:
        return False

from fastnumbers import isfloat


if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit()
            print 're happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit()
            print 'try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit()

        def test_fn_perf(self):
            print
            print 'fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit()
            print 'fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit()


        def test_part_perf(self):
            print
            print 'part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit()
            print 'part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit()

    unittest.main()

在我的机器上,输出为:

fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s

OK

如您所见,regex实际上并不像最初看起来的那样糟糕,并且如果您确实对速度有需求,则此fastnumbers方法相当不错。

TL;DR:

  • If your input is mostly strings that can be converted to floats, the try: except: method is the best native Python method.
  • If your input is mostly strings that cannot be converted to floats, regular expressions or the partition method will be better.
  • If you are 1) unsure of your input or need more speed and 2) don’t mind and can install a third-party C-extension, fastnumbers works very well.

There is another method available via a third-party module called fastnumbers (disclosure, I am the author); it provides a function called isfloat. I have taken the unittest example outlined by Jacob Gabrielson in this answer, but added the fastnumbers.isfloat method. I should also note that Jacob’s example did not do justice to the regex option because most of the time in that example was spent in global lookups because of the dot operator… I have modified that function to give a fairer comparison to try: except:.


def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match
def is_float_re(str):
    return True if _float_regexp(str) else False

def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True
    else:
        return False

from fastnumbers import isfloat


if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit()
            print 're happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit()
            print 'try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit()

        def test_fn_perf(self):
            print
            print 'fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit()
            print 'fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit()


        def test_part_perf(self):
            print
            print 'part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit()
            print 'part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit()

    unittest.main()

On my machine, the output is:

fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s

OK

As you can see, regex is actually not as bad as it originally seemed, and if you have a real need for speed, the fastnumbers method is quite good.


回答 4

如果您关心性能(我不建议您这样做),则只要您不期望太多,基于尝试的方法就是明显的赢家(与基于分区的方法或regexp方法相比)无效的字符串,在这种情况下,它可能会变慢(可能是由于异常处理的开销)。

再一次,我不建议您关心性能,只是给您数据,以防您每秒进行100亿次这样的操作。同样,基于分区的代码不能处理至少一个有效的字符串。

$ ./floatstr.py
F..
分区悲伤:3.1102449894
分区快乐:2.09208488464
..
难过:7.76906108856
重新开心:7.09421992302
..
尝试悲伤:12.1525540352
尝试快乐:1.44165301323
。
================================================== ====================
失败:test_partition(__main __。ConvertTests)
-------------------------------------------------- --------------------
追溯(最近一次通话):
  在test_partition中,文件“ ./floatstr.py”,第48行
    self.failUnless(is_float_partition(“ 20e2”))
断言错误

-------------------------------------------------- --------------------
在33.670秒内进行了8次测试

失败(失败= 1)

以下是代码(Python 2.6,来自John Gietzen的答案的正则表达式):

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
    return re.match(_float_regexp, str)


def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and pa\
rtition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True

if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):
        def test_re(self):
            self.failUnless(is_float_re("20e2"))

        def test_try(self):
            self.failUnless(is_float_try("20e2"))

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
            print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
            print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()

        def test_partition_perf(self):
            print
            print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
            print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()

        def test_partition(self):
            self.failUnless(is_float_partition("20e2"))

        def test_partition2(self):
            self.failUnless(is_float_partition(".2"))

        def test_partition3(self):
            self.failIf(is_float_partition("1234x.2"))

    unittest.main()

If you cared about performance (and I’m not suggesting you should), the try-based approach is the clear winner (compared with your partition-based approach or the regexp approach), as long as you don’t expect a lot of invalid strings, in which case it’s potentially slower (presumably due to the cost of exception handling).

Again, I’m not suggesting you care about performance, just giving you the data in case you’re doing this 10 billion times a second, or something. Also, the partition-based code doesn’t handle at least one valid string.

$ ./floatstr.py
F..
partition sad: 3.1102449894
partition happy: 2.09208488464
..
re sad: 7.76906108856
re happy: 7.09421992302
..
try sad: 12.1525540352
try happy: 1.44165301323
.
======================================================================
FAIL: test_partition (__main__.ConvertTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./floatstr.py", line 48, in test_partition
    self.failUnless(is_float_partition("20e2"))
AssertionError

----------------------------------------------------------------------
Ran 8 tests in 33.670s

FAILED (failures=1)

Here’s the code (Python 2.6, regexp taken from John Gietzen’s answer):

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
    return re.match(_float_regexp, str)


def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and pa\
rtition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True

if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):
        def test_re(self):
            self.failUnless(is_float_re("20e2"))

        def test_try(self):
            self.failUnless(is_float_try("20e2"))

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
            print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
            print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()

        def test_partition_perf(self):
            print
            print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
            print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()

        def test_partition(self):
            self.failUnless(is_float_partition("20e2"))

        def test_partition2(self):
            self.failUnless(is_float_partition(".2"))

        def test_partition3(self):
            self.failIf(is_float_partition("1234x.2"))

    unittest.main()

回答 5

仅出于多样性,这是另一种方法。

>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False

编辑:我确信它不会容纳所有的float情况,尽管特别是在有指数的时候。为了解决它看起来像这样。这将返回True,只有val是浮点数,而对于int则返回False,但性能可能不如regex。

>>> def isfloat(val):
...     return all([ [any([i.isnumeric(), i in ['.','e']]) for i in val],  len(val.split('.')) == 2] )
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False

Just for variety here is another method to do it.

>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False

Edit: Im sure it will not hold up to all cases of float though especially when there is an exponent. To solve that it looks like this. This will return True only val is a float and False for int but is probably less performant than regex.

>>> def isfloat(val):
...     return all([ [any([i.isnumeric(), i in ['.','e']]) for i in val],  len(val.split('.')) == 2] )
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False

回答 6

此正则表达式将检查科学的浮点数:

^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$

但是,我相信您最好的选择是尝试使用解析器。

This regex will check for scientific floating point numbers:

^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$

However, I believe that your best bet is to use the parser in a try.


回答 7

如果您不必担心数字的科学表达式或其他表达式,而只使用可能是带或不带句点的数字的字符串,则:

功能

def is_float(s):
    result = False
    if s.count(".") == 1:
        if s.replace(".", "").isdigit():
            result = True
    return result

Lambda版本

is_float = lambda x: x.replace('.','',1).isdigit() and "." in x

if is_float(some_string):
    some_string = float(some_string)
elif some_string.isdigit():
    some_string = int(some_string)
else:
    print "Does not convert to int or float."

这样,您就不会意外将应为int的内容转换为float。

If you don’t need to worry about scientific or other expressions of numbers and are only working with strings that could be numbers with or without a period:

Function

def is_float(s):
    result = False
    if s.count(".") == 1:
        if s.replace(".", "").isdigit():
            result = True
    return result

Lambda version

is_float = lambda x: x.replace('.','',1).isdigit() and "." in x

Example

if is_float(some_string):
    some_string = float(some_string)
elif some_string.isdigit():
    some_string = int(some_string)
else:
    print "Does not convert to int or float."

This way you aren’t accidentally converting what should be an int, into a float.


回答 8

函数的简化版本, is_digit(str)在大多数情况下就足够了(不考虑指数符号“ NaN”值):

def is_digit(str):
    return str.lstrip('-').replace('.', '').isdigit()

Simplified version of the function is_digit(str), which suffices in most cases (doesn’t consider exponential notation and “NaN” value):

def is_digit(str):
    return str.lstrip('-').replace('.', '').isdigit()

回答 9

我使用了已经提到的函数,但是很快我注意到,字符串“ Nan”,“ Inf”及其变体被视为数字。因此,我建议您对函数进行改进,使其在这些输入类型上返回false,并且不会使“ 1e3”变体失败:

def is_float(text):
    # check for nan/infinity etc.
    if text.isalpha():
        return False
    try:
        float(text)
        return True
    except ValueError:
        return False

I used the function already mentioned, but soon I notice that strings as “Nan”, “Inf” and it’s variation are considered as number. So I propose you improved version of the function, that will return false on those type of input and will not fail “1e3” variants:

def is_float(text):
    # check for nan/infinity etc.
    if text.isalpha():
        return False
    try:
        float(text)
        return True
    except ValueError:
        return False

回答 10

尝试转换为浮点数。如果有错误,请打印ValueError异常。

try:
    x = float('1.23')
    print('val=',x)
    y = float('abc')
    print('val=',y)
except ValueError as err:
    print('floatErr;',err)

输出:

val= 1.23
floatErr: could not convert string to float: 'abc'

Try to convert to float. If there is an error, print the ValueError exception.

try:
    x = float('1.23')
    print('val=',x)
    y = float('abc')
    print('val=',y)
except ValueError as err:
    print('floatErr;',err)

Output:

val= 1.23
floatErr: could not convert string to float: 'abc'

回答 11

将字典作为参数传递时,它将转换可以转换为float的字符串,并留下其他字符串

def covertDict_float(data):
        for i in data:
            if data[i].split(".")[0].isdigit():
                try:
                    data[i] = float(data[i])
                except:
                    continue
        return data

Passing dictionary as argument it will convert strings which can be converted to float and will leave others

def covertDict_float(data):
        for i in data:
            if data[i].split(".")[0].isdigit():
                try:
                    data[i] = float(data[i])
                except:
                    continue
        return data

回答 12

我一直在寻找一些类似的代码,但是看起来使用try / excepts是最好的方法。这是我正在使用的代码。如果输入无效,则包括重试功能。我需要检查输入是否大于0,是否将其转换为浮点型。

def cleanInput(question,retry=False): 
    inputValue = input("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n{}".format(question)) if retry else input(question)
    try:
        if float(inputValue) <= 0 : raise ValueError()
        else : return(float(inputValue))
    except ValueError : return(cleanInput(question,retry=True))


willbefloat = cleanInput("Give me the number: ")

I was looking for some similar code, but it looks like using try/excepts is the best way. Here is the code I’m using. It includes a retry function if the input is invalid. I needed to check if the input was greater than 0 and if so convert it to a float.

def cleanInput(question,retry=False): 
    inputValue = input("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n{}".format(question)) if retry else input(question)
    try:
        if float(inputValue) <= 0 : raise ValueError()
        else : return(float(inputValue))
    except ValueError : return(cleanInput(question,retry=True))


willbefloat = cleanInput("Give me the number: ")

回答 13

def try_parse_float(item):
  result = None
  try:
    float(item)
  except:
    pass
  else:
    result = float(item)
  return result
def try_parse_float(item):
  result = None
  try:
    float(item)
  except:
    pass
  else:
    result = float(item)
  return result

回答 14

我尝试了上述一些简单的选项,并使用了一个围绕转换为浮点数的try测试,发现大多数答复中都存在问题。

简单测试(沿以上答案行):

entry = ttk.Entry(self, validate='key')
entry['validatecommand'] = (entry.register(_test_num), '%P')

def _test_num(P):
    try: 
        float(P)
        return True
    except ValueError:
        return False

问题出现在以下情况:

  • 输入“-”以开始一个负数:

然后float('-'),您正在尝试失败

  • 您输入一个数字,但然后尝试删除所有数字

然后float(''),您正在尝试同样失败的尝试

我的快速解决方案是:

def _test_num(P):
    if P == '' or P == '-': return True
    try: 
        float(P)
        return True
    except ValueError:
        return False

I tried some of the above simple options, using a try test around converting to a float, and found that there is a problem in most of the replies.

Simple test (along the lines of above answers):

entry = ttk.Entry(self, validate='key')
entry['validatecommand'] = (entry.register(_test_num), '%P')

def _test_num(P):
    try: 
        float(P)
        return True
    except ValueError:
        return False

The problem comes when:

  • You enter ‘-‘ to start a negative number:

You are then trying float('-') which fails

  • You enter a number, but then try to delete all the digits

You are then trying float('') which likewise also fails

The quick solution I had is:

def _test_num(P):
    if P == '' or P == '-': return True
    try: 
        float(P)
        return True
    except ValueError:
        return False

回答 15

str(strval).isdigit()

似乎很简单。

处理以字符串或int或float形式存储的值

str(strval).isdigit()

seems to be simple.

Handles values stored in as a string or int or float