Tag archive: string

How to delete a character from a string using Python

Question: How to delete a character from a string using Python

There is a string, for example, EXAMPLE.

How can I remove the middle character, i.e., M from it? I don’t need the code. I want to know:

  • Do strings in Python end in any special character?
  • Which is a better way – shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?

Answer 0

In Python, strings are immutable, so you have to create a new string. You have a few options of how to create the new string. If you want to remove the ‘M’ wherever it appears:

newstr = oldstr.replace("M", "")

If you want to remove the central character:

midlen = len(oldstr) // 2   # integer division; in Python 3 a plain / returns a float
newstr = oldstr[:midlen] + oldstr[midlen+1:]

You asked if strings end with a special character. No, you are thinking like a C programmer. In Python, strings are stored with their length, so any byte value, including \0, can appear in a string.
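A minimal sketch of the two points above, assuming Python 3. The helper name remove_middle is made up for illustration. It shows that slicing builds a new string and that a NUL character is ordinary data rather than a terminator.

```python
def remove_middle(text):
    """Return a new string without the middle character of text."""
    mid = len(text) // 2          # integer division gives a valid index
    return text[:mid] + text[mid + 1:]

original = "EXAMPLE"
shortened = remove_middle(original)   # original itself is left unchanged

# A NUL character counts toward the length like any other character.
with_nul = "AB\0CD"
print(shortened, len(with_nul))
```

Because strings are immutable, original still holds "EXAMPLE" after the call; only the returned value differs.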


Answer 1

This is probably the best way:

original = "EXAMPLE"
removed = original.replace("M", "")

Don’t worry about shifting characters and such. Most Python code takes place on a much higher level of abstraction.


Answer 2

To remove the character at a specific position:

s = s[:pos] + s[(pos+1):]

To remove every occurrence of a specific character:

s = s.replace('M','')

Answer 3

Strings are immutable. But you can convert them to a list, which is mutable, and then convert the list back to a string after you’ve changed it.

s = "this is a string"

l = list(s)  # convert to list

l[1] = ""    # "delete" letter h (the item actually still exists but is empty)
l[1:2] = []  # really delete letter h (the item is actually removed from the list)
del(l[1])    # another way to delete it

p = l.index("a")  # find position of the letter "a"
del(l[p])         # delete it

s = "".join(l)  # convert back to string

You can also create a new string, as others have shown, by taking everything except the character you want from the existing string.
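The list round trip described above can be wrapped in a small helper. This is only a sketch: remove_at is a hypothetical name, not something from the answer.

```python
def remove_at(text, index):
    """Return text without the character at index, using a temporary list."""
    chars = list(text)   # lists are mutable, strings are not
    del chars[index]     # removes exactly one item; IndexError if out of range
    return "".join(chars)

result = remove_at("this is a string", 1)
print(result)
```

For a single deletion, slicing (text[:index] + text[index + 1:]) gives the same result without the intermediate list. The list form is mainly useful when several edits are made before joining.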


Answer 4

How can I remove the middle character, i.e., M from it?

You can’t, because strings in Python are immutable.

Do strings in Python end in any special character?

No. They are similar to lists of characters; the length of the list defines the length of the string, and no character acts as a terminator.

Which is a better way – shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?

You cannot modify the existing string, so you must create a new one containing everything except the middle character.


Answer 5

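Use the translate() method (in Python 2 this was s.translate(None, 'M'); in Python 3, build the deletion table with str.maketrans):

>>> s = 'EXAMPLE'
>>> s.translate(str.maketrans('', '', 'M'))
'EXAPLE'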

Answer 6

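UserString.MutableString

Mutable way (Python 2 only; UserString.MutableString was deprecated in Python 2.6 and removed in Python 3):

import UserString

s = UserString.MutableString("EXAMPLE")

# Delete 'M'
del s[3]

# Convert it back to an immutable str:
s = str(s)

>>> type(s)
<type 'str'>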

Answer 7

import random

card = random.choice(cards)
cardsLeft = cards.replace(card, '', 1)

How to remove one character from a string: here is an example in which a stack of cards is represented as the characters of a string, cards. One of them is drawn with random.choice(), which picks a random character from the string. A new string, cardsLeft, then holds the remaining cards: replace() is called with a count of 1 as its last argument, so only one occurrence of the drawn card is replaced by the empty string.


Answer 8

def kill_char(string, n): # n = position of which character you want to remove
    begin = string[:n]    # from beginning to n (n not included)
    end = string[n+1:]    # n+1 through end of string
    return begin + end
print(kill_char("EXAMPLE", 3))  # "M" removed

I have seen this somewhere here.


Answer 9

Here’s what I did to slice out the “M”:

s = 'EXAMPLE'
s1 = s[:s.index('M')] + s[s.index('M')+1:]

Answer 10

If you want to delete/ignore characters in a string, and, for instance, you have this string,

“[11:L:0]”

from a web API response or something like that, like a CSV file, let’s say you are using requests

import requests
udid = 123456
url = 'http://webservices.yourserver.com/action/id-' + str(udid)  # str() is needed to concatenate an int
s = requests.Session()
s.verify = False
resp = s.get(url, stream=True)
content = resp.content

loop and get rid of unwanted chars:

for line in resp.iter_lines():
  line = line.decode("utf-8")   # iter_lines() yields bytes in Python 3
  line = line.replace("[", "")
  line = line.replace("]", "")
  line = line.replace('"', "")

Optional split, and you will be able to read values individually:

listofvalues = line.split(':')

Now accessing each value is easier:

print(listofvalues[0])
print(listofvalues[1])
print(listofvalues[2])

This will print

11

L

0


Answer 11

To delete a char or a sub-string once (only the first occurrence):

main_string = main_string.replace(sub_str, replace_with, 1)

NOTE: Here 1 can be replaced with any int: the number of occurrences you want to replace.
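A short illustration of the count argument of str.replace. The sample word is arbitrary.

```python
text = "banana"

first_only = text.replace("a", "", 1)   # remove only the first "a"
first_two = text.replace("a", "", 2)    # remove the first two
all_of_them = text.replace("a", "")     # no count: remove every "a"

print(first_only, first_two, all_of_them)
```

Occurrences are counted from the left, so the count form cannot target the last occurrence directly.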


Answer 12

You can simply use list comprehension.

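Assume that you have the string "my name is" and you want to remove the character m. Use the following code (note the != comparison; is not tests identity rather than equality and is unreliable for strings):

"".join([x for x in "my name is" if x != 'm'])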

Answer 13

from random import randint


def shuffle_word(word):
    newWord=""
    for i in range(0,len(word)):
        pos=randint(0,len(word)-1)
        newWord += word[pos]
        word = word[:pos]+word[pos+1:]
    return newWord

word = "Sarajevo"
print(shuffle_word(word))

Answer 14

Another way is with a function.

Below is a way to remove all vowels from a string, just by calling the function (Python 3 form; in Python 2 this was s.translate(None, "aeiouAEIOU")):

def disemvowel(s):
    return s.translate(str.maketrans("", "", "aeiouAEIOU"))
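In Python 3, str.translate takes a single translation table, and the characters to delete are passed as the third argument of str.maketrans. A small sketch, where remove_chars is a hypothetical helper name:

```python
def remove_chars(text, unwanted):
    """Return text with every character listed in unwanted removed."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans("", "", unwanted)
    return text.translate(table)

print(remove_chars("EXAMPLE", "M"))
print(remove_chars("Hello, World", "aeiouAEIOU"))
```

If the same set of characters is removed from many strings, the table can be built once and reused.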

Answer 15

Strings are immutable in Python so both your options mean the same thing basically.


Proper indentation for Python multiline strings

Question: Proper indentation for Python multiline strings

What is the proper indentation for Python multiline strings within a function?

    def method():
        string = """line one
line two
line three"""

or

    def method():
        string = """line one
        line two
        line three"""

or something else?

It looks kind of weird to have the string hanging outside the function in the first example.


Answer 0

You probably want to line up with the """

def foo():
    string = """line one
             line two
             line three"""

Since the newlines and spaces are included in the string itself, you will have to postprocess it. If you don’t want to do that and you have a whole lot of text, you might want to store it separately in a text file. If a text file does not work well for your application and you don’t want to postprocess, I’d probably go with

def foo():
    string = ("this is an "
              "implicitly joined "
              "string")

If you want to postprocess a multiline string to trim out the parts you don’t need, you should consider the textwrap module or the technique for postprocessing docstrings presented in PEP 257:

import sys

def trim(docstring):
    if not docstring:
        return ''
    # Convert tabs to spaces (following the normal Python rules)
    # and split into a list of lines:
    lines = docstring.expandtabs().splitlines()
    # Determine minimum indentation (first line doesn't count):
    indent = sys.maxsize  # sys.maxint in Python 2
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # Remove indentation (first line is special):
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Strip off trailing and leading blank lines:
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    # Return a single string:
    return '\n'.join(trimmed)

Answer 1

The textwrap.dedent function allows one to start with correct indentation in the source, and then strip it from the text before use.

The trade-off, as noted by some others, is that this is an extra function call on the literal; take this into account when deciding where to place these literals in your code.

import textwrap

def frobnicate(param):
    """ Frobnicate the scrognate param.

        The Weebly-Ruckford algorithm is employed to frobnicate
        the scrognate to within an inch of its life.

        """
    prepare_the_comfy_chair(param)
    log_message = textwrap.dedent("""\
            Prepare to frobnicate:
            Here it comes...
                Any moment now.
            And: Frobnicate!""")
    weebly(param, log_message)
    ruckford(param)

The trailing \ in the log message literal is to ensure that line break isn’t in the literal; that way, the literal doesn’t start with a blank line, and instead starts with the next full line.

The return value from textwrap.dedent is the input string with all common leading whitespace indentation removed on each line of the string. So the above log_message value will be:

Prepare to frobnicate:
Here it comes...
    Any moment now.
And: Frobnicate!
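The behaviour described above can be checked with a reduced, self-contained version of the example. build_message is a made-up name, and the undefined helpers from the original are left out.

```python
import textwrap

def build_message():
    # The backslash after the opening quotes suppresses the first newline.
    return textwrap.dedent("""\
        Prepare to frobnicate:
        Here it comes...
            Any moment now.
        And: Frobnicate!""")

message = build_message()
print(message)
```

Only the whitespace common to all lines is removed, so the extra four spaces before "Any moment now." survive.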

Answer 2

Use inspect.cleandoc like so:

import inspect

def method():
    string = inspect.cleandoc("""
        line one
        line two
        line three""")

Relative indentation will be maintained as expected. As commented below, if you want to keep preceding empty lines, use textwrap.dedent. However that also keeps the first line break.

Note: It’s good practice to indent logical blocks of code under its related context to clarify the structure. E.g. the multi-line string belonging to the variable string.
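To make the difference between the two functions concrete, here is a small comparison on the same literal. The sample text is arbitrary.

```python
import inspect
import textwrap

raw = """
    line one
      line two
    line three
    """

cleaned = inspect.cleandoc(raw)   # strips indentation and blank first/last lines
dedented = textwrap.dedent(raw)   # strips indentation, keeps the line breaks

print(repr(cleaned))
print(repr(dedented))
```

Both keep the relative indentation of "line two". Only textwrap.dedent keeps the leading and trailing line breaks.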


Answer 3

One option which seems to be missing from the other answers (only mentioned deep down in a comment by naxa) is the following:

def foo():
    string = ("line one\n"          # Add \n in the string
              "line two"  "\n"      # Add "\n" after the string
              "line three\n")

This will allow proper aligning, join the lines implicitly, and still keep the line shift which, for me, is one of the reasons why I would like to use multiline strings anyway.

It doesn’t require any postprocessing, but you need to manually add the \n at any given place that you want the line to end. Either inline or as a separate string after. The latter is easier to copy-paste in.
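A quick check, with made-up function names, that adjacent string literals with explicit \n produce the same value as the equivalent triple-quoted string:

```python
def joined():
    # Adjacent literals inside parentheses are concatenated at compile time.
    text = ("line one\n"
            "line two\n"
            "line three\n")
    return text

def triple_quoted():
    text = """line one
line two
line three
"""
    return text

print(joined() == triple_quoted())
```

The concatenated form keeps the source indented with the function body at the cost of typing each \n by hand.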


Answer 4

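Some more options. In IPython with pylab enabled, dedent is already in the namespace. I checked, and it comes from matplotlib. It can also be imported with the following (older matplotlib releases only; matplotlib.cbook.dedent was later deprecated and removed):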

from matplotlib.cbook import dedent

The documentation states that it is faster than the textwrap equivalent, and in my quick tests in IPython it was indeed about 3 times faster on average. It also discards any leading blank lines, which lets you be flexible in how you construct the string:

"""
line 1 of string
line 2 of string
"""

"""\
line 1 of string
line 2 of string
"""

"""line 1 of string
line 2 of string
"""

Using the matplotlib dedent on these three examples gives the same sensible result. The textwrap dedent function leaves a leading blank line with the first example.

The obvious disadvantage is that textwrap is in the standard library, while matplotlib is an external module.

Some trade-offs here… the dedent functions make your code more readable where the strings are defined, but require processing later to get the string into a usable format. In docstrings it is obvious that you should use correct indentation, as most uses of the docstring will do the required processing.

When I need a long string in my code, I use the following admittedly ugly approach, where I let the long string drop out of the enclosing indentation. It definitely fails on “Beautiful is better than ugly.”, but one could argue that it is simpler and more explicit than the dedent alternative.

def example():
    long_string = '''\
Lorem ipsum dolor sit amet, consectetur adipisicing
elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip.\
'''
    return long_string

print(example())

Answer 5

If you want a quick and easy solution that saves you from typing newlines, you could opt for a list instead, e.g.:

def func(*args, **kwargs):
    string = '\n'.join([
        'first line of very long string and',
        'second line of the same long thing and',
        'third line of ...',
        'and so on...',
        ])
    print(string)
    return

Answer 6

I prefer

    def method():
        string = \
"""\
line one
line two
line three\
"""

or

    def method():
        string = """\
line one
line two
line three\
"""

Answer 7

My two cents, escape the end of line to get the indents:

def foo():
    return "{}\n"\
           "freq: {}\n"\
           "temp: {}\n".format( time, freq, temp )

Answer 8

I came here looking for a simple one-liner to remove/correct the indentation level of the docstring for printing, without making it look untidy, for example by making it “hang outside the function” within the script.

Here’s what I ended up doing:

def myfunction():

    """
    line 1 of docstring
    line 2 of docstring
    line 3 of docstring"""

# str.replace is a method in Python 3; the string module function no longer exists
print(myfunction.__doc__.replace('\n\t', '\n')[1:])

Obviously, if you’re indenting with spaces (e.g. 4) rather than the tab key, use something like this instead:

print(myfunction.__doc__.replace('\n    ', '\n')[1:])

And you don’t need to remove the first character if you like your docstrings to look like this instead:

    """line 1 of docstring
    line 2 of docstring
    line 3 of docstring"""

print(myfunction.__doc__.replace('\n\t', '\n'))

Answer 9

The first option is the good one – with indentation included. It is in python style – provides readability for the code.

To display it properly:

print(string.lstrip())

Answer 10

It depends on how you want the text to display. If you want it all to be left-aligned then either format it as in the first snippet or iterate through the lines left-trimming all the space.
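The second suggestion, left-trimming every line, could look like the sketch below. left_align is a hypothetical helper name.

```python
def left_align(text):
    """Strip leading whitespace from every line of text."""
    return "\n".join(line.lstrip() for line in text.splitlines())

def method():
    string = """line one
        line two
        line three"""
    return left_align(string)

print(method())
```

Note that this flattens everything to the left margin, so any intentional relative indentation inside the text is lost as well.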


Answer 11

For strings, you can simply post-process the string. For docstrings, you need to post-process the function instead. Here is a solution for both that is still readable.

class Lstrip(object):
    def __rsub__(self, other):
        import re
        return re.sub(r'^\n', '', re.sub(r'\n$', '', re.sub(r'\n\s+', '\n', other)))

msg = '''
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
      tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
      veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
      commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
      velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
      cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
      est laborum.
      ''' - Lstrip()

print(msg)

def lstrip_docstring(func):
    func.__doc__ = func.__doc__ - Lstrip()
    return func

@lstrip_docstring
def foo():
    '''
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
    tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
    veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
    commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
    velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
    cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
    est laborum.
    '''
    pass


print(foo.__doc__)

Answer 12

I had a similar issue: the code became really unreadable with multiline strings, so I came up with something like

print("""aaaa
"""   """bbb
""")

Yes, at first it may look terrible, but the embedded syntax was quite complex, and adding something at the end (like '\n"') was not a solution.


Answer 13

You can use this function trim_indent.

import re


def trim_indent(s: str):
    s = re.sub(r'^\n+', '', s)
    s = re.sub(r'\n+$', '', s)
    spaces = re.findall(r'^ +', s, flags=re.MULTILINE)
    if len(spaces) > 0 and len(re.findall(r'^[^\s]', s, flags=re.MULTILINE)) == 0:
        s = re.sub(r'^%s' % (min(spaces)), '', s, flags=re.MULTILINE)
    return s


print(trim_indent("""


        line one
            line two
                line three
            line two
        line one


"""))

Result:

"""
line one
    line two
        line three
    line two
line one
"""

Select by partial string from a pandas DataFrame

Question: Select by partial string from a pandas DataFrame

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column?

In other words, a function or lambda function that would do something like

re.search(pattern, cell_in_question) 

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can’t seem to find a way to do the same with a partial string match say 'hello'.

Would someone be able to point me in the right direction?


Answer 0

Based on github issue #620, it looks like you’ll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.


Answer 1

I tried the proposed solution above:

df[df["A"].str.contains("Hello|Britain")]

and got an error:

ValueError: cannot mask with array containing NA / NaN values

You can transform NA values into False, like this:

df[df["A"].str.contains("Hello|Britain", na=False)]

Answer 2

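How do I select by partial string from a pandas DataFrame?

This post is meant for readers who want to

  • search for a substring in a string column (the simplest case)
  • search for multiple substrings (similar to isin)
  • match a whole word from text (e.g., “blue” should match “the sky is blue” but not “bluejay”)
  • match multiple whole words
  • understand the reason behind “ValueError: cannot index with vector containing NA / NaN values”

…and would like to know more about which methods should be preferred over others.

(P.S.: I’ve seen a lot of questions on similar topics, so I thought it would be good to leave this here.)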


Basic Substring Search

# setup
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['foo', 'foobar', 'bar', 'baz']})
df1

      col
0     foo
1  foobar
2     bar
3     baz

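str.contains can be used to perform either substring searches or regex-based searches. The search defaults to regex-based unless you explicitly disable it.

Here is an example of regex-based search,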

# find rows in `df1` which contain "foo" followed by something
df1[df1['col'].str.contains(r'foo(?!$)')]

      col
1  foobar

Sometimes regex search is not required, so specify regex=False to disable it.

#select all rows containing "foo"
df1[df1['col'].str.contains('foo', regex=False)]
# same as df1[df1['col'].str.contains('foo')] but faster.

      col
0     foo
1  foobar

Performance-wise, regex search is slower than substring search:

df2 = pd.concat([df1] * 1000, ignore_index=True)

%timeit df2[df2['col'].str.contains('foo')]
%timeit df2[df2['col'].str.contains('foo', regex=False)]

6.31 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.8 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoid using regex-based search if you don’t need it.

Addressing ValueErrors
Sometimes, performing a substring search and filtering on the result will lead to

ValueError: cannot index with vector containing NA / NaN values

This is usually because of mixed data or NaNs in your object column,

s = pd.Series(['foo', 'foobar', np.nan, 'bar', 'baz', 123])
s.str.contains('foo|bar')

0     True
1     True
2      NaN
3     True
4    False
5      NaN
dtype: object


s[s.str.contains('foo|bar')]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)

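Anything that is not a string cannot have string methods applied to it, so the result is (naturally) NaN. In this case, specify na=False to ignore non-string data,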

s.str.contains('foo|bar', na=False)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Multiple Substring Search

This is most easily achieved through a regex search using the regex OR pipe.

# Slightly modified example.
df4 = pd.DataFrame({'col': ['foo abc', 'foobar xyz', 'bar32', 'baz 45']})
df4

          col
0     foo abc
1  foobar xyz
2       bar32
3      baz 45

df4[df4['col'].str.contains(r'foo|baz')]

          col
0     foo abc
1  foobar xyz
3      baz 45

You can also create a list of terms, then join them:

terms = ['foo', 'baz']
df4[df4['col'].str.contains('|'.join(terms))]

          col
0     foo abc
1  foobar xyz
3      baz 45

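Sometimes, it is wise to escape your terms in case they contain characters that can be interpreted as regex metacharacters. If your terms contain any of the following characters…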

. ^ $ * + ? { } [ ] \ | ( )

then you’ll need to use re.escape to escape them:

import re
df4[df4['col'].str.contains('|'.join(map(re.escape, terms)))]

          col
0     foo abc
1  foobar xyz
3      baz 45

re.escape has the effect of escaping the special characters so they are treated literally.

re.escape(r'.foo^')
# '\\.foo\\^'

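Matching Entire Word(s)

By default, the substring search looks for the specified substring/pattern regardless of whether it is a full word or not. To match only full words, we need to use regular expressions here; in particular, our pattern needs to specify word boundaries (\b).

For example,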

df3 = pd.DataFrame({'col': ['the sky is blue', 'bluejay by the window']})
df3

                     col
0        the sky is blue
1  bluejay by the window

Now consider,

df3[df3['col'].str.contains('blue')]

                     col
0        the sky is blue
1  bluejay by the window

versus

df3[df3['col'].str.contains(r'\bblue\b')]

               col
0  the sky is blue

Multiple Whole Word Search

Similar to the above, except that we add a word boundary (\b) to the joined pattern.

p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df4[df4['col'].str.contains(p)]

       col
0  foo abc
3   baz 45

where p looks like this,

p
# '\\b(?:foo|baz)\\b'
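The pattern construction can be tried without pandas. The sketch below applies the same pattern to a plain list holding the same sample values, using only the standard re module.

```python
import re

terms = ["foo", "baz"]   # same illustrative terms as above
# Escape each term, join with |, and require a word boundary on both sides.
pattern = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, terms))))

rows = ["foo abc", "foobar xyz", "bar32", "baz 45"]
matches = [row for row in rows if pattern.search(row)]
print(matches)
```

"foobar xyz" is rejected because there is no word boundary between "foo" and "bar".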

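A Great Alternative: Use List Comprehensions!

Because you can! And you should! They are usually a little faster than string methods, because string methods are hard to vectorize and usually have loopy implementations.

Instead of,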

df1[df1['col'].str.contains('foo', regex=False)]

Use the in operator inside a list comprehension,

df1[['foo' in x for x in df1['col']]]

      col
0     foo
1  foobar

Instead of,

regex_pattern = r'foo(?!$)'
df1[df1['col'].str.contains(regex_pattern)]

use re.compile (to cache your regex) + Pattern.search inside a list comprehension,

p = re.compile(regex_pattern, flags=re.IGNORECASE)
df1[[bool(p.search(x)) for x in df1['col']]]

      col
1  foobar

If “col” has NaNs, then instead of

df1[df1['col'].str.contains(regex_pattern, na=False)]

use,

def try_search(p, x):
    try:
        return bool(p.search(x))
    except TypeError:
        return False

p = re.compile(regex_pattern)
df1[[try_search(p, x) for x in df1['col']]]

      col
1  foobar
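The TypeError fallback can be seen in isolation on a plain Python list. The mixed sample values below are made up for illustration.

```python
import re

def try_search(pattern, value):
    """Return True if pattern matches value; non-strings count as no match."""
    try:
        return bool(pattern.search(value))
    except TypeError:        # raised for NaN, None, numbers, ...
        return False

p = re.compile(r"foo(?!$)")
values = ["foo", "foobar", float("nan"), 123, None]
mask = [try_search(p, v) for v in values]
print(mask)
```

The resulting list of booleans has one entry per value, which is what makes it usable as a row mask.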

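More Options for Partial String Matching: np.char.find, np.vectorize, DataFrame.query

In addition to str.contains and list comprehensions, you can also use the following alternatives.

np.char.find
Supports substring searches only (read: no regex).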

df4[np.char.find(df4['col'].values.astype(str), 'foo') > -1]

          col
0     foo abc
1  foobar xyz

np.vectorize
这是一个循环的包装器,但是比大多数pandas str方法要少。

f = np.vectorize(lambda haystack, needle: needle in haystack)
f(df1['col'], 'foo')
# array([ True,  True, False, False])

df1[f(df1['col'], 'foo')]

       col
0  foo abc
1   foobar

正则表达式解决方案可能:

regex_pattern = r'foo(?!$)'
p = re.compile(regex_pattern)
f = np.vectorize(lambda x: pd.notna(x) and bool(p.search(x)))
df1[f(df1['col'])]

      col
1  foobar

DataFrame.query
通过python引擎支持字符串方法。这没有提供明显的性能优势,但是对于了解是否需要动态生成查询很有用。

df1.query('col.str.contains("foo")', engine='python')

      col
0     foo
1  foobar

有关更多信息queryeval方法系列,请参见使用pd.eval()在大熊猫中进行动态表达评估。


推荐用法

  1. (第一) str.contains,因为它简单易用,可以处理NaN和混合数据
  2. 列出其性能的理解(特别是如果您的数据是纯字符串)
  3. np.vectorize
  4. (持续) df.query

How do I select by partial string from a pandas DataFrame?

This post is meant for readers who want to

  • search for a substring in a string column (the simplest case)
  • search for multiple substrings (similar to isin)
  • match a whole word from text (e.g., “blue” should match “the sky is blue” but not “bluejay”)
  • match multiple whole words
  • Understand the reason behind “ValueError: cannot index with vector containing NA / NaN values”

…and would like to know more about what methods should be preferred over others.

(P.S.: I’ve seen a lot of questions on similar topics, I thought it would be good to leave this here.)


Basic Substring Search

# setup
df1 = pd.DataFrame({'col': ['foo', 'foobar', 'bar', 'baz']})
df1

      col
0     foo
1  foobar
2     bar
3     baz

str.contains can be used to perform either substring searches or regex based search. The search defaults to regex-based unless you explicitly disable it.

Here is an example of regex-based search,

# find rows in `df1` which contain "foo" followed by something
df1[df1['col'].str.contains(r'foo(?!$)')]

      col
1  foobar

Sometimes regex search is not required, so specify regex=False to disable it.

#select all rows containing "foo"
df1[df1['col'].str.contains('foo', regex=False)]
# same as df1[df1['col'].str.contains('foo')] but faster.

      col
0     foo
1  foobar

Performance wise, regex search is slower than substring search:

df2 = pd.concat([df1] * 1000, ignore_index=True)

%timeit df2[df2['col'].str.contains('foo')]
%timeit df2[df2['col'].str.contains('foo', regex=False)]

6.31 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.8 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoid using regex-based search if you don’t need it.

Addressing ValueErrors
Sometimes, performing a substring search and filtering on the result will result in

ValueError: cannot index with vector containing NA / NaN values

This is usually because of mixed data or NaNs in your object column,

s = pd.Series(['foo', 'foobar', np.nan, 'bar', 'baz', 123])
s.str.contains('foo|bar')

0     True
1     True
2      NaN
3     True
4    False
5      NaN
dtype: object


s[s.str.contains('foo|bar')]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)

Anything that is not a string cannot have string methods applied on it, so the result is NaN (naturally). In this case, specify na=False to ignore non-string data,

s.str.contains('foo|bar', na=False)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Multiple Substring Search

This is most easily achieved through a regex search using the regex OR pipe.

# Slightly modified example.
df4 = pd.DataFrame({'col': ['foo abc', 'foobar xyz', 'bar32', 'baz 45']})
df4

          col
0     foo abc
1  foobar xyz
2       bar32
3      baz 45

df4[df4['col'].str.contains(r'foo|baz')]

          col
0     foo abc
1  foobar xyz
3      baz 45

You can also create a list of terms, then join them:

terms = ['foo', 'baz']
df4[df4['col'].str.contains('|'.join(terms))]

          col
0     foo abc
1  foobar xyz
3      baz 45

Sometimes, it is wise to escape your terms in case they have characters that can be interpreted as regex metacharacters. If your terms contain any of the following characters…

. ^ $ * + ? { } [ ] \ | ( )

Then, you’ll need to use re.escape to escape them:

import re
df4[df4['col'].str.contains('|'.join(map(re.escape, terms)))]

          col
0     foo abc
1  foobar xyz
3      baz 45

re.escape has the effect of escaping the special characters so they’re treated literally.

re.escape(r'.foo^')
# '\\.foo\\^'

Matching Entire Word(s)

By default, the substring search looks for the specified substring/pattern regardless of whether it is a full word or not. To match only full words, we need to use regular expressions here; in particular, our pattern needs to specify word boundaries (\b).

For example,

df3 = pd.DataFrame({'col': ['the sky is blue', 'bluejay by the window']})
df3

                     col
0        the sky is blue
1  bluejay by the window

Now consider,

df3[df3['col'].str.contains('blue')]

                     col
0        the sky is blue
1  bluejay by the window

versus

df3[df3['col'].str.contains(r'\bblue\b')]

               col
0  the sky is blue

Multiple Whole Word Search

Similar to the above, except we add a word boundary (\b) to the joined pattern.

p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df4[df4['col'].str.contains(p)]

       col
0  foo abc
3   baz 45

Where p looks like this,

p
# '\\b(?:foo|baz)\\b'
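
As a rough illustration of how the joined, escaped pattern behaves, here is a minimal sketch that uses only the standard library instead of pandas. The helper name whole_word_pattern and the sample rows are assumptions made for this example; the same pattern string could be passed to str.contains.

```python
import re

def whole_word_pattern(terms):
    """Build a regex matching any of `terms` as a whole word (hypothetical helper)."""
    # Escape each term so characters such as '.' or '+' are matched literally.
    escaped = '|'.join(map(re.escape, terms))
    # Wrap the alternation in a non-capturing group between word boundaries.
    return r'\b(?:{})\b'.format(escaped)

pattern = whole_word_pattern(['foo', 'baz'])
rows = ['foo abc', 'foobar xyz', 'bar32', 'baz 45']
# Keep only the rows in which some term appears as a complete word.
matches = [row for row in rows if re.search(pattern, row)]
print(matches)
```

Note that \b only marks a transition between word and non-word characters, so terms that begin or end with punctuation may need a different boundary test.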

A Great Alternative: Use List Comprehensions!

Because you can! And you should! They are usually a little bit faster than string methods, because string methods are hard to vectorise and usually have loopy implementations.

Instead of,

df1[df1['col'].str.contains('foo', regex=False)]

Use the in operator inside a list comp,

df1[['foo' in x for x in df1['col']]]

      col
0     foo
1  foobar

Instead of,

regex_pattern = r'foo(?!$)'
df1[df1['col'].str.contains(regex_pattern)]

Use re.compile (to cache your regex) + Pattern.search inside a list comp,

p = re.compile(regex_pattern, flags=re.IGNORECASE)
df1[[bool(p.search(x)) for x in df1['col']]]

      col
1  foobar

If “col” has NaNs, then instead of

df1[df1['col'].str.contains(regex_pattern, na=False)]

Use,

def try_search(p, x):
    try:
        return bool(p.search(x))
    except TypeError:
        return False

p = re.compile(regex_pattern)
df1[[try_search(p, x) for x in df1['col']]]

      col
1  foobar

More Options for Partial String Matching: np.char.find, np.vectorize, DataFrame.query.

In addition to str.contains and list comprehensions, you can also use the following alternatives.

np.char.find
Supports substring searches (read: no regex) only.

df4[np.char.find(df4['col'].values.astype(str), 'foo') > -1]

          col
0     foo abc
1  foobar xyz

np.vectorize
This is a wrapper around a loop, but with less overhead than most pandas str methods.

f = np.vectorize(lambda haystack, needle: needle in haystack)
f(df1['col'], 'foo')
# array([ True,  True, False, False])

df1[f(df1['col'], 'foo')]

      col
0     foo
1  foobar

Regex solutions possible:

regex_pattern = r'foo(?!$)'
p = re.compile(regex_pattern)
f = np.vectorize(lambda x: pd.notna(x) and bool(p.search(x)))
df1[f(df1['col'])]

      col
1  foobar

DataFrame.query
Supports string methods through the python engine. This offers no visible performance benefits, but is nonetheless useful to know if you need to dynamically generate your queries.

df1.query('col.str.contains("foo")', engine='python')

      col
0     foo
1  foobar

More information on query and eval family of methods can be found at Dynamic Expression Evaluation in pandas using pd.eval().


Recommended Usage Precedence

  1. (First) str.contains, for its simplicity and ease of handling NaNs and mixed data
  2. List comprehensions, for their performance (especially if your data is purely strings)
  3. np.vectorize
  4. (Last) df.query

Answer 3

If anyone wonders how to perform a related problem: “Select column by partial string”

Use:

df.filter(like='hello')  # select columns which contain the word hello

And to select rows by partial string matching, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)  

Answer 4

Quick note: if you want to do selection based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]

Answer 5

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in the apply to pass elements to the lambda function row by row, as opposed to column by column.


Answer 6

Here’s what I ended up doing for partial string matches. If anyone has a more efficient way of doing this please let me know.

import re
import pandas as pd

def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = pd.DataFrame()
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    for idx, record in df[colName].items():

        if re.search(regex, record):
            newdf = pd.concat([df[df[colName] == record], newdf], ignore_index=True)

    return newdf

Answer 7

Using contains didn’t work well for my string with special characters (contains treats its pattern as a regex by default). find worked, though.

df[df['A'].str.find("hello") != -1]

Answer 8

Earlier answers already accomplish what was asked, but I would like to show the most general way:

df.filter(regex=".*STRING_YOU_LOOK_FOR.*")

This way you get the columns you are looking for, however their names are written.

(Obviously, you have to write the proper regex for each case.)


Answer 9

Maybe you want to search for some text in all columns of the Pandas dataframe, and not just in the subset of them. In this case, the following code will help.

df[df.apply(lambda row: row.astype(str).str.contains('String To Find').any(), axis=1)]

Warning. This method is relatively slow, albeit convenient.


Answer 10

Should you need to do a case insensitive search for a string in a pandas dataframe column:

df[df['A'].str.contains("hello", case=False)]

How to convert strings to integers in Python?

Question: How to convert strings to integers in Python?

I have a tuple of tuples from a MySQL query like this:

T1 = (('13', '17', '18', '21', '32'),
      ('07', '11', '13', '14', '28'),
      ('01', '05', '06', '08', '15', '16'))

I’d like to convert all the string elements into integers and put them back into a list of lists:

T2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

I tried to achieve it with eval but didn’t get any decent result yet.


Answer 0

int() is the Python standard built-in function to convert a string into an integer value. You call it with a string containing a number as the argument, and it returns the number converted to an integer:

print (int("1") + 1)

The above prints 2.

If you know the structure of your list, T1 (that it simply contains lists, only one level), you could do this in Python 2:

T2 = [map(int, x) for x in T1]

In Python 3:

T2 = [list(map(int, x)) for x in T1]

Answer 1

You can do this with a list comprehension:

T2 = [[int(column) for column in row] for row in T1]

The inner list comprehension ([int(column) for column in row]) builds a list of ints from a sequence of int-able objects, like decimal strings, in row. The outer list comprehension ([... for row in T1]) builds a list of the results of the inner list comprehension applied to each item in T1.

The code snippet will fail if any of the rows contain objects that can’t be converted by int. You’ll need a smarter function if you want to process rows containing non-decimal strings.

If you know the structure of the rows, you can replace the inner list comprehension with a call to a function of the row. For example:

T2 = [parse_a_row_of_T1(row) for row in T1]
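
A row function of that kind might look like the sketch below. The name parse_row, the sample data, and the decision to skip bad values are all assumptions made for illustration; raising an error or substituting a default are equally reasonable choices.

```python
def parse_row(row):
    """Convert the decimal strings in `row` to ints, skipping anything else (hypothetical helper)."""
    result = []
    for item in row:
        try:
            result.append(int(item))
        except (ValueError, TypeError):
            # int() could not convert this item, so leave it out of the result.
            continue
    return result

# Sample data with two values that int() cannot convert.
T1 = (('13', '17', 'n/a'), ('07', None, '28'))
T2 = [parse_row(row) for row in T1]
print(T2)
```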

Answer 2

I would prefer using only list comprehensions:

[[int(y) for y in x] for x in T1]

Answer 3

Instead of putting int( ), put float( ) which will let you use decimals along with integers.
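
A small sketch of the difference, using made-up sample values: float() accepts decimal strings that int() rejects.

```python
values = ['13', '4.7', '-2.5']

# float() parses both integer-looking and decimal strings.
as_floats = [float(v) for v in values]
print(as_floats)

# int() refuses a string that contains a decimal point.
int_failed = False
try:
    int('4.7')
except ValueError:
    int_failed = True
print(int_failed)
```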


Answer 4

I agree with everyone’s answers so far, but the problem is that they will crash if not all of your values are integers.

If you wanted to exclude non-integers then

T1 = (('13', '17', '18', '21', '32'),
      ('07', '11', '13', '14', '28'),
      ('01', '05', '06', '08', '15', '16'))
new_list = list(list(int(a) for a in b if a.isdigit()) for b in T1)

This yields only actual digits. The reason I don’t use direct list comprehensions is that, in Python 2, list comprehensions leak their internal variables.


Answer 5

T3 = []

for i in range(0, len(T1)):
    T3.append([])
    for j in range(0, len(T1[i])):
        b = int(T1[i][j])
        T3[i].append(b)

print(T3)

Answer 6

Try this.

x = "1"

x is a string because it has quotes around it, but it has a number in it.

x = int(x)

Since x has the number 1 in it, I can turn it into an integer.

To see if a string is a number, you can do this.

def is_number(var):
    try:
        # int() raises ValueError for strings that are not integer literals.
        int(var)
        return True
    except ValueError:
        return False

x = "1"

y = "test"

x_test = is_number(x)

print(x_test)

It should print to IDLE True because x is a number.

y_test = is_number(y)

print(y_test)

It should print to IDLE False because y is not a number.


Answer 7

Using list comprehensions:

t2 = [list(map(int, l)) for l in t1]

回答 8

在Python 3.5.1中,这些工作如下:

c = input('Enter number:')
print (int(float(c)))
print (round(float(c)))

Enter number:  4.7
4
5

乔治。

In Python 3.5.1 things like these work:

c = input('Enter number:')
print (int(float(c)))
print (round(float(c)))

and

Enter number:  4.7
4
5

George.
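
The two calls behave differently, which the following sketch makes explicit. The hard-coded string stands in for the value returned by input(), so the example runs without user interaction.

```python
c = '4.7'  # stands in for the value returned by input()

truncated = int(float(c))  # drops the fractional part
rounded = round(float(c))  # rounds to the nearest integer
print(truncated, rounded)

# int() truncates toward zero, so a negative value moves up, not down.
print(int(float('-4.7')))

# In Python 3, round() sends exact halves to the nearest even integer.
print(round(2.5), round(3.5))
```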


Answer 9

See this function

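def parse_int(s):
    # Avoid eval(): it would execute arbitrary code found in the input.
    for convert in (int, float):
        try:
            return int(convert(s))
        except (ValueError, TypeError, OverflowError):
            continue
    return None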

Then

val = parse_int('10')  # Return 10
val = parse_int('0')  # Return 0
val = parse_int('10.5')  # Return 10
val = parse_int('0.0')  # Return 0
val = parse_int('Ten')  # Return None

You can also check

if val is None:  # True if input value can not be converted
    pass  # Note: Don't use 'if not val:'

Answer 10

Yet another functional solution for Python 2:

from functools import partial

map(partial(map, int), T1)

Python 3 will be a little bit messy though:

list(map(list, map(partial(map, int), T1)))

we can fix this with a wrapper

def oldmap(f, iterable):
    return list(map(f, iterable))

oldmap(partial(oldmap, int), T1)

Answer 11

If it’s only a tuple of tuples, something like rows=[map(int, row) for row in rows] will do the trick. (There’s a list comprehension and a call to map(f, lst), which is equal to [f(a) for a in lst], in there.)

Eval is not what you want to do, in case there’s something like __import__("os").unlink("importantsystemfile") in your database for some reason. Always validate your input (if with nothing else, the exception int() will raise if you have bad input).
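
As a minimal sketch of relying on int() for validation: the helper name to_int_or_none and the sample payload are assumptions made for this example. The payload is never executed, because int() only parses integer literals.

```python
def to_int_or_none(text):
    """Return int(text), or None when text is not a valid integer literal (hypothetical helper)."""
    try:
        return int(text)
    except (ValueError, TypeError):
        return None

# A string that eval() would execute; int() simply rejects it.
malicious = '__import__("os").getcwd()'

print(to_int_or_none('13'))
print(to_int_or_none(malicious))
```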


Answer 12

You can do something like this:

T1 = (('13', '17', '18', '21', '32'),  
     ('07', '11', '13', '14', '28'),  
     ('01', '05', '06', '08', '15', '16'))  
new_list = list(list(int(a) for a in b if a.isdigit()) for b in T1)  
print(new_list)  

Answer 13

I want to share an available option that doesn’t seem to be mentioned here yet:

numpy.random.permutation(x)

Will generate a random permutation of array x. Not exactly what you asked for, but it is a potential solution to similar questions.


How to extract numbers from a string in Python?

Question: How to extract numbers from a string in Python?

I would like to extract all the numbers contained in a string. Which is better suited for the purpose, regular expressions or the isdigit() method?

Example:

line = "hello 12 hi 89"

Result:

[12, 89]

Answer 0

If you want to extract only positive integers, try the following:

>>> str = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in str.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example for three reasons. First, you don’t need another module; secondly, it’s more readable because you don’t need to parse the regex mini-language; and third, it is faster (and thus likely more pythonic):

python -m timeit -s "str = 'h3110 23 cat 444.4 rabbit 11 2 dog' * 1000" "[s for s in str.split() if s.isdigit()]"
100 loops, best of 3: 2.84 msec per loop

python -m timeit -s "import re" "str = 'h3110 23 cat 444.4 rabbit 11 2 dog' * 1000" "re.findall('\\b\\d+\\b', str)"
100 loops, best of 3: 5.66 msec per loop

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can’t accept these limitations, slim’s answer below will do the trick.


Answer 1

I’d use a regexp:

>>> import re
>>> re.findall(r'\d+', 'hello 42 I\'m a 32 string 30')
['42', '32', '30']

This would also match 42 from bla42bla. If you only want numbers delimited by word boundaries (space, period, comma), you can use \b:

>>> re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string 30')
['42', '32', '30']

To end up with a list of numbers instead of a list of strings:

>>> [int(s) for s in re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string 30')]
[42, 32, 30]

Answer 2

This is more than a bit late, but you can extend the regex to account for scientific notation too.

import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
      ('hello X42 I\'m a Y-32.35 string Z30',
       ['42', '-32.35', '30']),
      ('he33llo 42 I\'m a 32 string -30', 
       ['33', '42', '32', '-30']),
      ('h3110 23 cat 444.4 rabbit 11 2 dog', 
       ['3110', '23', '444.4', '11', '2']),
      ('hello 12 hi 89', 
       ['12', '89']),
      ('4', 
       ['4']),
      ('I like 74,600 commas not,500', 
       ['74,600', '500']),
      ('I like bad math 1+2=.001', 
       ['1', '+2', '.001'])]

for s, r in ss:
    rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)

Gives all good!

Additionally, you can look at the AWS Glue built-in regex


Answer 3

I’m assuming you want floats not just integers so I’d do something like this:

l = []
for t in s.split():
    try:
        l.append(float(t))
    except ValueError:
        pass

Note that some of the other solutions posted here don’t work with negative numbers:

>>> re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string -30')
['42', '32', '30']

>>> '-3'.isdigit()
False

Answer 4

If you know there will be only one number in the string, e.g. ‘hello 12 hi’, you can try filter.

For example:

In [1]: int(''.join(filter(str.isdigit, '200 grams')))
Out[1]: 200
In [2]: int(''.join(filter(str.isdigit, 'Counters: 55')))
Out[2]: 55
In [3]: int(''.join(filter(str.isdigit, 'more than 23 times')))
Out[3]: 23

But be careful:

In [4]: int(''.join(filter(str.isdigit, '200 grams 5')))
Out[4]: 2005

Answer 5

# extract numbers from garbage string:
s = '12//n,_@#$%3.14kjlw0xdadfackvj1.6e-19&*ghn334'
newstr = ''.join((ch if ch in '0123456789.-e' else ' ') for ch in s)
listOfNumbers = [float(i) for i in newstr.split()]
print(listOfNumbers)

[12.0, 3.14, 0.0, 1.6e-19, 334.0]

Answer 6

I was looking for a way to strip formatting masks from strings, specifically Brazilian phone numbers. This post did not answer that directly, but it inspired me. This is my solution:

>>> phone_number = '+55(11)8715-9877'
>>> ''.join([n for n in phone_number if n.isdigit()])
'551187159877'

Answer 7

Using a regex, as below, is the way:

import re

lines = "hello 12 hi 89"
output = []
# repl_str = re.compile(r'\d+.?\d*')
repl_str = re.compile(r'^\d+$')
# t = r'\d+.?\d*'
line = lines.split()
for word in line:
    match = re.search(repl_str, word)
    if match:
        output.append(float(match.group()))
print(output)

With findall:

re.findall(r'\d+', "hello 12 hi 89")

['12', '89']

re.findall(r'\b\d+\b', "hello 12 hi 89 33F AC 777")

['12', '89', '777']

Answer 8

import re

line2 = "hello 12 hi 89"
temp1 = re.findall(r'\d+', line2)  # find every run of digits with a regular expression
res2 = list(map(int, temp1))       # convert the matched strings to integers
print(res2)

You can find all the integers in the string by matching runs of digits with re.findall.

In the second step, create a list res2 holding the numbers found in the string, converted to int.

Hope this helps.

Regards, Diwakar Sharma


Answer 9

This answer also handles the case where the number in the string is a float:

def get_first_nbr_from_str(input_str):
    '''
    :param input_str: strings that contains digit and words
    :return: the number extracted from the input_str
    demo:
    'ab324.23.123xyz': 324.23
    '.5abc44': 0.5
    '''
    if not input_str or not isinstance(input_str, str):
        return 0
    out_number = ''
    for ele in input_str:
        if (ele == '.' and '.' not in out_number) or ele.isdigit():
            out_number += ele
        elif out_number:
            break
    return float(out_number)

Answer 10

I am amazed to see that no one has yet mentioned the usage of itertools.groupby as an alternative to achieve this.

You may use itertools.groupby() along with str.isdigit() in order to extract numbers from string as:

from itertools import groupby
my_str = "hello 12 hi 89"

l = [int(''.join(i)) for is_digit, i in groupby(my_str, str.isdigit) if is_digit]

The value held by l will be:

[12, 89]

PS: This is just for illustration, to show that groupby can also be used to achieve this. It is not the recommended solution. If you want to achieve this, you should use fmark’s accepted answer, which is based on a list comprehension with str.isdigit as the filter.


Answer 11

I am just adding this answer because no one added one using Exception handling and because this also works for floats

a = []
line = "abcd 1234 efgh 56.78 ij"
for word in line.split():
    try:
        a.append(float(word))
    except ValueError:
        pass
print(a)

Output :

[1234.0, 56.78]

Answer 12

To catch different patterns it is helpful to query with different patterns.

Setup all the patterns that catch different number patterns of interest:

(finds commas) 12,300 or 12,300.00

'[\d]+[.,\d]+'

(finds floats) 0.123 or .123

'[\d]*[.][\d]+'

(finds integers) 123

'[\d]+'

Combine them with the pipe ( | ) into a single pattern with several alternatives.

(Note: Put the more complex patterns first; otherwise the simpler patterns will match fragments of a number before the complex pattern can capture it whole.)

import re

p = r'[\d]+[.,\d]+|[\d]*[.][\d]+|[\d]+'

Below, we first confirm with re.search() that the pattern is present, then iterate over the matches returned by re.finditer(). Finally, we print each match using bracket notation (catch[0]) to get the matched text from the match object.

s = 'he33llo 42 I\'m a 32 string 30 444.4 12,001'

if re.search(p, s) is not None:
    for catch in re.finditer(p, s):
        print(catch[0]) # catch is a match object

Returns:

33
42
32
30
444.4
12,001
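The note about ordering the alternatives can be checked with a small experiment. The sketch below uses a shortened test string and re.findall instead of re.finditer; the variable names are illustrative only.

```python
import re

text = "444.4 12,001"

# Specific alternatives first: each number is captured whole.
specific_first = r"\d+[.,\d]+|\d*[.]\d+|\d+"
print(re.findall(specific_first, text))  # ['444.4', '12,001']

# Simplest alternative first: it wins wherever a digit starts, so numbers are split.
simple_first = r"\d+|\d+[.,\d]+|\d*[.]\d+"
print(re.findall(simple_first, text))  # ['444', '.4', '12', '001']
```

The regex engine tries alternatives left to right at each position and takes the first one that succeeds, which is why the order matters.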

回答 13

由于这些都不涉及我需要查找的excel和word docs中的真实财务数字,因此这里是我的变体。它处理整数,浮点数,负数,货币数字(因为它不会在拆分时回复),并且可以选择删除小数部分并仅返回整数或返回所有内容。

它还处理印第安拉克斯数字系统,其中逗号不规则出现,而不是每3个数字分开。

它不处理科学计数法,否则预算中括号内的负数将显示为正数。

它还不会提取日期。有更好的方法来查找字符串中的日期。

import re

def find_numbers(string, ints=True):
    # Optional leading minus, digits with optional commas, optional decimal part
    numexp = re.compile(r'-?\d[\d,]*\.?\d*')
    numbers = [x.replace(',', '') for x in numexp.findall(string)]
    if ints:
        # Drop the decimal part and return ints
        return [int(x.split('.')[0]) for x in numbers]
    return numbers

Since none of these dealt with the real-world financial numbers in Excel and Word documents that I needed to find, here is my variation. It handles ints, floats, negative numbers, and currency numbers (because it doesn’t rely on split), and has the option to drop the decimal part and return just ints, or to return everything.

It also handles the Indian lakh numbering system, where commas appear irregularly rather than after every three digits.

It does not handle scientific notation, and negative numbers written in parentheses (as in budgets) will appear positive.

It also does not extract dates. There are better ways for finding dates in strings.

import re

def find_numbers(string, ints=True):
    # Optional leading minus, digits with optional commas, optional decimal part
    numexp = re.compile(r'-?\d[\d,]*\.?\d*')
    numbers = [x.replace(',', '') for x in numexp.findall(string)]
    if ints:
        # Drop the decimal part and return ints
        return [int(x.split('.')[0]) for x in numbers]
    return numbers

回答 14

@jmnas,我很喜欢您的回答,但没有找到浮点数。我正在处理一个脚本,以解析要输入CNC铣床的代码,并且需要查找可以是整数或浮点数的X和Y尺寸,因此我将代码修改为以下内容。查找具有正值和负值的int,float。仍然找不到十六进制格式的值,但是您可以在num_char元组中添加“ x”和“ A”至“ F” ,我认为它将解析“ 0x23AC”之类的内容。

s = 'hello X42 I\'m a Y-32.35 string Z30'
xy = ("X", "Y")
num_char = (".", "+", "-")

l = []

tokens = s.split()
for token in tokens:

    if token.startswith(xy):
        num = ""
        for char in token:
            # print(char)
            if char.isdigit() or (char in num_char):
                num = num + char

        try:
            l.append(float(num))
        except ValueError:
            pass

print(l)

@jmnas, I liked your answer, but it didn’t find floats. I’m working on a script to parse code going to a CNC mill and needed to find both X and Y dimensions that can be integers or floats, so I adapted your code to the following. This finds int, float with positive and negative vals. Still doesn’t find hex formatted values but you could add “x” and “A” through “F” to the num_char tuple and I think it would parse things like ‘0x23AC’.

s = 'hello X42 I\'m a Y-32.35 string Z30'
xy = ("X", "Y")
num_char = (".", "+", "-")

l = []

tokens = s.split()
for token in tokens:

    if token.startswith(xy):
        num = ""
        for char in token:
            # print(char)
            if char.isdigit() or (char in num_char):
                num = num + char

        try:
            l.append(float(num))
        except ValueError:
            pass

print(l)

回答 15

我发现的最佳选择如下。它将提取一个数字并可以消除任何类型的字符。

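def extract_nbr(input_str):
    if not input_str:
        return 0

    # Keep only the digit characters, dropping everything else
    out_number = ''.join(ele for ele in input_str if ele.isdigit())
    # Guard against input that contains no digits at all
    return float(out_number) if out_number else 0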

The best option I found is below. It will extract a number and can eliminate any type of char.

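def extract_nbr(input_str):
    if not input_str:
        return 0

    # Keep only the digit characters, dropping everything else
    out_number = ''.join(ele for ele in input_str if ele.isdigit())
    # Guard against input that contains no digits at all
    return float(out_number) if out_number else 0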

回答 16

对于电话号码,您只需在正则表达式中使用\ D排除所有非数字字符:

import re

phone_number = '(619) 459-3635'
phone_number = re.sub(r"\D", "", phone_number)
print(phone_number)

For phone numbers you can simply exclude all non-digit characters with \D in regex:

import re

phone_number = '(619) 459-3635'
phone_number = re.sub(r"\D", "", phone_number)
print(phone_number)

检查字符串是否以XXXX开头

问题:检查字符串是否以XXXX开头

我想知道如何检查Python中字符串是否以“ hello”开头。

在Bash中,我通常这样做:

if [[ "$string" =~ ^hello ]]; then
 do something here
fi

如何在Python中实现相同的目标?

I would like to know how to check whether a string starts with “hello” in Python.

In Bash I usually do:

if [[ "$string" =~ ^hello ]]; then
 do something here
fi

How do I achieve the same in Python?


回答 0

aString = "hello world"
aString.startswith("hello")

有关的更多信息startswith

aString = "hello world"
aString.startswith("hello")

More info about startswith.


回答 1

RanRag已经回答了您的特定问题。

但是,更一般地说,您在做什么

if [[ "$string" =~ ^hello ]]

正则表达式匹配。要在Python中执行相同的操作,您可以执行以下操作:

import re
if re.match(r'^hello', somestring):
    # do stuff

显然,在这种情况下somestring.startswith('hello')更好。

RanRag has already answered it for your specific question.

However, more generally, what you are doing with

if [[ "$string" =~ ^hello ]]

is a regex match. To do the same in Python, you would do:

import re
if re.match(r'^hello', somestring):
    # do stuff

Obviously, in this case, somestring.startswith('hello') is better.
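A side-by-side sketch of the two approaches, using an example value for somestring (the value is an assumption for illustration):

```python
import re

somestring = "hello world"

# Plain prefix test: no regular-expression machinery involved.
print(somestring.startswith("hello"))              # True

# re.match only matches at the start of the string, so the ^ anchor is optional here.
print(re.match(r"hello", somestring) is not None)  # True
print(re.match(r"world", somestring) is not None)  # False
```

The regex form only pays off when the prefix is a real pattern (character classes, alternation, and so on) rather than a fixed string.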


回答 2

如果您想将多个单词与魔术单词匹配,则可以将单词匹配为元组:

>>> magicWord = 'zzzTest'
>>> magicWord.startswith(('zzz', 'yyy', 'rrr'))
True

注意startswithstr or a tuple of str

请参阅文档

In case you want to match multiple words to your magic word you can pass the words to match as a tuple:

>>> magicWord = 'zzzTest'
>>> magicWord.startswith(('zzz', 'yyy', 'rrr'))
True

Note: startswith takes str or a tuple of str

See the docs.
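One practical consequence of the note above: if the prefixes are held in a list, it has to be converted to a tuple first. A minimal sketch, with hypothetical variable names:

```python
prefixes = ["zzz", "yyy", "rrr"]   # hypothetical list of accepted prefixes
magic_word = "zzzTest"

try:
    magic_word.startswith(prefixes)            # a list is rejected
except TypeError as exc:
    print("TypeError:", exc)

print(magic_word.startswith(tuple(prefixes)))  # True
```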


回答 3

也可以这样

import re

regex=re.compile('^hello')

## THIS WAY YOU CAN CHECK FOR MULTIPLE STRINGS
## LIKE
## regex=re.compile('^hello|^john|^world')

if re.match(regex, somestring):
    print("Yes")

It can also be done this way:

import re

regex=re.compile('^hello')

## THIS WAY YOU CAN CHECK FOR MULTIPLE STRINGS
## LIKE
## regex=re.compile('^hello|^john|^world')

if re.match(regex, somestring):
    print("Yes")

Python中的字母范围

问题:Python中的字母范围

而不是像这样列出字母字符:

alpha = ['a', 'b', 'c', 'd'.........'z']

有什么办法可以将它分组到某个范围之内?例如,对于数字,可以使用进行分组range()

range(1, 10)

Instead of making a list of alphabet characters like this:

alpha = ['a', 'b', 'c', 'd'.........'z']

is there any way that we can group it to a range or something? For example, for numbers it can be grouped using range():

range(1, 10)

回答 0

>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

如果您确实需要列表:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

并做到这一点 range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

其他有用的string模块功能:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'

>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

If you really need a list:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

And to do it with range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Other helpful string module features:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'

回答 1

[chr(i) for i in range(ord('a'),ord('z')+1)]

回答 2

在Python 2.7和3中,您可以使用以下代码:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

正如@Zaz所说: string.lowercase已弃用,不再在Python 3中string.ascii_lowercase工作,但在两者中都工作

In Python 2.7 and 3 you can use this:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

As @Zaz says: string.lowercase is deprecated and no longer works in Python 3, but string.ascii_lowercase works in both.


回答 3

这是一个简单的字母范围实现:

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

演示版

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

Here is a simple letter-range implementation:

Code

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

Demo

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

回答 4

如果要查找letters[1:10]与R 相等的值,则可以使用:

 import string
 list(string.ascii_lowercase[0:10])

If you are looking for an equivalent of letters[1:10] from R, you can use:

 import string
 list(string.ascii_lowercase[0:10])
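The index shift comes from R using 1-based positions with both ends included, while Python slices are 0-based and exclude the stop index. A small sketch that wraps the conversion; r_letters is a hypothetical helper, not part of the string module:

```python
import string

def r_letters(first, last):
    """Mimic R's letters[first:last]: 1-based positions, both ends included."""
    return list(string.ascii_lowercase[first - 1:last])

print(r_letters(1, 10))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```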

回答 5

使用内置的range函数在python中打印大写和小写字母

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets()
lowerCaseAlphabets()

Print the uppercase and lowercase alphabets in Python using the built-in range function:

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets()
lowerCaseAlphabets()

回答 6

这是我能找出的最简单方法:

#!/usr/bin/python3
for i in range(97, 123):
    print("{:c}".format(i), end='')

因此,从97到122是与’a’和’z’等效的ASCII码。请注意,小写字母和放置123的需要,因为将不包括在内)。

在打印功能中,请确保设置{:c}(字符)格式,在这种情况下,我们希望它一起打印所有内容,甚至不让最后一行换行,这样end=''就可以了。

结果是这样的: abcdefghijklmnopqrstuvwxyz

This is the easiest way I can figure out:

#!/usr/bin/python3
for i in range(97, 123):
    print("{:c}".format(i), end='')

So, 97 to 122 are the ASCII codes for ‘a’ to ‘z’ (note the lowercase). The range needs 123 as its upper bound because the stop value is not included.

In the print call, use the {:c} (character) format. Here we want everything printed together, without a newline after each character, so end='' does the job.

The result is this: abcdefghijklmnopqrstuvwxyz


如何对字符串列表进行排序?

问题:如何对字符串列表进行排序?

在Python中创建按字母顺序排序的列表的最佳方法是什么?

What is the best way of creating an alphabetically sorted list in Python?


回答 0

基本答案:

mylist = ["b", "C", "A"]
mylist.sort()

这会修改您的原始列表(即就地排序)。要获得列表的排序副本,而无需更改原始副本,请使用以下sorted()函数:

for x in sorted(mylist):
    print(x)

但是,上面的示例有些天真,因为它们没有考虑区域设置,而是执行区分大小写的排序。您可以利用可选参数key指定自定义排序顺序(使用的替代方法cmp是不推荐使用的解决方案,因为它必须多次评估- key每个元素仅计算一次)。

因此,要根据当前语言环境进行排序,并考虑到特定于语言的规则(这cmp_to_key是functools的帮助函数):

sorted(mylist, key=cmp_to_key(locale.strcoll))

最后,如果需要,您可以指定自定义语言环境进行排序:

import locale
from functools import cmp_to_key
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') # vary depending on your lang/locale
assert sorted((u'Ab', u'ad', u'aa'),
  key=cmp_to_key(locale.strcoll)) == [u'aa', u'Ab', u'ad']

最后要注意的是:您将看到使用该lower()方法的不区分大小写的排序示例-这些是不正确的,因为它们仅适用于ASCII字符子集。对于任何非英语数据,这两个错误:

# this is incorrect!
mylist.sort(key=lambda x: x.lower())
# alternative notation, a bit faster, but still wrong
mylist.sort(key=str.lower)

Basic answer:

mylist = ["b", "C", "A"]
mylist.sort()

This modifies your original list (i.e. sorts in-place). To get a sorted copy of the list, without changing the original, use the sorted() function:

for x in sorted(mylist):
    print(x)

However, the examples above are a bit naive, because they don’t take locale into account, and perform a case-sensitive sorting. You can take advantage of the optional parameter key to specify custom sorting order (the alternative, using cmp, is a deprecated solution, as it has to be evaluated multiple times – key is only computed once per element).

So, to sort according to the current locale, taking language-specific rules into account (cmp_to_key is a helper function from functools):

sorted(mylist, key=cmp_to_key(locale.strcoll))

And finally, if you need, you can specify a custom locale for sorting:

import locale
from functools import cmp_to_key
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') # vary depending on your lang/locale
assert sorted((u'Ab', u'ad', u'aa'),
  key=cmp_to_key(locale.strcoll)) == [u'aa', u'Ab', u'ad']

Last note: you will see examples of case-insensitive sorting which use the lower() method – those are incorrect, because they work only for the ASCII subset of characters. Those two are wrong for any non-English data:

# this is incorrect!
mylist.sort(key=lambda x: x.lower())
# alternative notation, a bit faster, but still wrong
mylist.sort(key=str.lower)
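To make the difference between the default order and a case-insensitive key concrete, here is a small sketch. It uses str.casefold, which is not mentioned in the answer; that is a substitution on my part (a Python 3 method intended for caseless comparison), and it is still not locale-aware collation.

```python
words = ["b", "C", "A"]

print(sorted(words))                    # ['A', 'C', 'b']  (code-point order, uppercase first)
print(sorted(words, key=str.casefold))  # ['A', 'b', 'C']  (case-insensitive)

# casefold normalizes some characters that lower() leaves alone:
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```

For language-specific ordering, the locale-based key shown above (or a collation library) is still needed.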

回答 1

还值得注意的sorted()功能:

for x in sorted(mylist):
    print(x)

这将返回列表的新排序版本,而不更改原始列表。

It is also worth noting the sorted() function:

for x in sorted(mylist):
    print(x)

This returns a new, sorted version of a list without changing the original list.
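The difference between the two calls can be seen directly. A minimal sketch, reusing the sample list from the first answer:

```python
mylist = ["b", "C", "A"]

copy_sorted = sorted(mylist)   # new list; mylist is untouched
print(copy_sorted)             # ['A', 'C', 'b']
print(mylist)                  # ['b', 'C', 'A']

result = mylist.sort()         # sorts in place and returns None
print(result)                  # None
print(mylist)                  # ['A', 'C', 'b']
```

A common mistake is writing mylist = mylist.sort(), which replaces the list with None.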


回答 2

list.sort()

真的就是这么简单:)

list.sort()

It really is that simple :)


回答 3

字符串排序的正确方法是:

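import locale
from functools import cmp_to_key

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') # vary depending on your lang/locale
# The cmp argument exists only in Python 2; key=cmp_to_key(...) works in 2.7 and 3
assert sorted((u'Ab', u'ad', u'aa'), key=cmp_to_key(locale.strcoll)) == [u'aa', u'Ab', u'ad']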

# Without using locale.strcoll you get:
assert sorted((u'Ab', u'ad', u'aa')) == [u'Ab', u'aa', u'ad']

前面的示例mylist.sort(key=lambda x: x.lower())对于仅ASCII上下文适用。

The proper way to sort strings is:

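import locale
from functools import cmp_to_key

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') # vary depending on your lang/locale
# The cmp argument exists only in Python 2; key=cmp_to_key(...) works in 2.7 and 3
assert sorted((u'Ab', u'ad', u'aa'), key=cmp_to_key(locale.strcoll)) == [u'aa', u'Ab', u'ad']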

# Without using locale.strcoll you get:
assert sorted((u'Ab', u'ad', u'aa')) == [u'Ab', u'aa', u'ad']

The previous example of mylist.sort(key=lambda x: x.lower()) will work fine for ASCII-only contexts.


回答 4

请在Python3中使用sorted()函数

items = ["love", "like", "play", "cool", "my"]
sorted(items)

Use the sorted() function in Python 3:

items = ["love", "like", "play", "cool", "my"]
sorted(items)

回答 5

但是,这如何处理特定于语言的排序规则?是否考虑到语言环境?

不,list.sort()是通用排序功能。如果要根据Unicode规则进行排序,则必须定义一个自定义的排序键函数。您可以尝试使用pyuca模块,但我不知道它的完整性。

But how does this handle language specific sorting rules? Does it take locale into account?

No, list.sort() is a generic sorting function. If you want to sort according to the Unicode rules, you’ll have to define a custom sort key function. You can try using the pyuca module, but I don’t know how complete it is.


回答 6

这是一个老问题,但是如果您想在不进行设置的情况下进行 locale.LC_ALL感知区域设置的排序,则可以按照此答案的建议使用PyICU库

import icu # PyICU

def sorted_strings(strings, locale=None):
    if locale is None:
       return sorted(strings)
    collator = icu.Collator.createInstance(icu.Locale(locale))
    return sorted(strings, key=collator.getSortKey)

然后用例如:

new_list = sorted_strings(list_of_strings, "de_DE.utf8")

这对我有用,而无需安装任何语言环境或更改其他系统设置。

(这已经在上面的评论中建议,但是我想让它更加突出,因为我一开始就很想念它。)

Old question, but if you want to do locale-aware sorting without setting locale.LC_ALL you can do so by using the PyICU library as suggested by this answer:

import icu # PyICU

def sorted_strings(strings, locale=None):
    if locale is None:
       return sorted(strings)
    collator = icu.Collator.createInstance(icu.Locale(locale))
    return sorted(strings, key=collator.getSortKey)

Then call with e.g.:

new_list = sorted_strings(list_of_strings, "de_DE.utf8")

This worked for me without installing any locales or changing other system settings.

(This was already suggested in a comment above, but I wanted to give it more prominence, because I missed it myself at first.)


回答 7

假设 s = "ZWzaAd"

要在字符串上方排序,简单的解决方案将是在字符串下方。

print(''.join(sorted(s)))

Suppose s = "ZWzaAd"

To sort the string above, a simple solution is:

print(''.join(sorted(s)))

回答 8

或许:

names = ['Jasmine', 'Alberto', 'Ross', 'dig-dog']
print ("The solution for this is about this names being sorted:",sorted(names, key=lambda name:name.lower()))

Or maybe:

names = ['Jasmine', 'Alberto', 'Ross', 'dig-dog']
print ("The solution for this is about this names being sorted:",sorted(names, key=lambda name:name.lower()))

回答 9

l =['abc' , 'cd' , 'xy' , 'ba' , 'dc']
l.sort()
print(l)

结果

['abc', 'ba', 'cd', 'dc', 'xy']

l =['abc' , 'cd' , 'xy' , 'ba' , 'dc']
l.sort()
print(l)

Result

['abc', 'ba', 'cd', 'dc', 'xy']


回答 10

很简单:https://trinket.io/library/trinkets/5db81676e4

scores = '54 - Alice,35 - Bob,27 - Carol,27 - Chuck,05 - Craig,30 - Dan,27 - Erin,77 - Eve,14 - Fay,20 - Frank,48 - Grace,61 - Heidi,03 - Judy,28 - Mallory,05 - Olivia,44 - Oscar,34 - Peggy,30 - Sybil,82 - Trent,75 - Trudy,92 - Victor,37 - Walter'

scores = scores.split(',')
for x in sorted(scores):
    print(x)

It is simple: https://trinket.io/library/trinkets/5db81676e4

scores = '54 - Alice,35 - Bob,27 - Carol,27 - Chuck,05 - Craig,30 - Dan,27 - Erin,77 - Eve,14 - Fay,20 - Frank,48 - Grace,61 - Heidi,03 - Judy,28 - Mallory,05 - Olivia,44 - Oscar,34 - Peggy,30 - Sybil,82 - Trent,75 - Trudy,92 - Victor,37 - Walter'

scores = scores.split(',')
for x in sorted(scores):
    print(x)
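Note that sorted() compares these entries as strings. That gives the expected order here only because every score is zero-padded to two digits. If the scores could have different widths, a numeric key is safer. A minimal sketch with a shortened sample; score_of is a hypothetical helper:

```python
scores = "54 - Alice,35 - Bob,27 - Carol,05 - Craig,92 - Victor"  # shortened sample
entries = scores.split(",")

def score_of(entry):
    """Return the integer score at the start of an entry such as '54 - Alice'."""
    return int(entry.split(" - ")[0])

# Highest score first
for entry in sorted(entries, key=score_of, reverse=True):
    print(entry)
```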


将字符列表转换为字符串

问题:将字符列表转换为字符串

如果我有一个字符列表:

a = ['a','b','c','d']

如何将其转换为单个字符串?

a = 'abcd'

If I have a list of chars:

a = ['a','b','c','d']

How do I convert it into a single string?

a = 'abcd'

回答 0

使用join空字符串的方法将所有字符串以及中间的空字符串连接在一起,如下所示:

>>> a = ['a', 'b', 'c', 'd']
>>> ''.join(a)
'abcd'

Use the join method of the empty string to join all of the strings together with the empty string in between, like so:

>>> a = ['a', 'b', 'c', 'd']
>>> ''.join(a)
'abcd'

回答 1

这可以在许多流行的语言(例如JavaScript和Ruby)中使用,为什么不能在Python中使用?

>>> ['a', 'b', 'c'].join('')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'

奇怪的是,在Python中,join方法在str类上:

# this is the Python way
"".join(['a','b','c','d'])

为什么对象中join的方法list不像JavaScript或其他流行的脚本语言那样?这是Python社区如何思考的一个示例。由于join返回的是字符串,因此应将其放置在字符串类中,而不是列表类中,因此该str.join(list)方法意味着:使用str分隔符将列表连接到新字符串中(本例中str为空字符串)。

过了一段时间,我莫名其妙地爱上了这种思维方式。我可以抱怨Python设计中的很多事情,但不能抱怨它的连贯性。

This works in many popular languages like JavaScript and Ruby, why not in Python?

>>> ['a', 'b', 'c'].join('')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'

Strangely enough, in Python the join method is on the str class:

# this is the Python way
"".join(['a','b','c','d'])

Why is join not a method of the list object, as in JavaScript and other popular scripting languages? It is one example of how the Python community thinks. Since join returns a string, it is placed on the string class rather than the list class, so str.join(list) means: join the list into a new string using str as the separator (in this case str is an empty string).

Somehow I got to love this way of thinking after a while. I can complain about a lot of things in Python design, but not about its coherence.
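One practical upside of join living on str is that it accepts any iterable of strings, not just lists. A short sketch of that behavior:

```python
chars = ['a', 'b', 'c', 'd']

print(''.join(chars))                     # abcd
print('-'.join(chars))                    # a-b-c-d
print(''.join(c.upper() for c in chars))  # ABCD  (a generator works too)

try:
    ''.join([1, 2, 3])                    # non-string items are rejected
except TypeError as exc:
    print("TypeError:", exc)
```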


回答 2

如果您的Python解释器较旧(例如,1.5.2在某些较旧的Linux发行版中很常见),则您可能无法join()在任何旧的字符串对象上将其用作方法,而需要使用字符串模块。例:

a = ['a', 'b', 'c', 'd']

try:
    b = ''.join(a)

except AttributeError:
    import string
    b = string.join(a, '')

字符串b将为'abcd'

If your Python interpreter is old (1.5.2, for example, which is common on some older Linux distributions), you may not have join() available as a method on any old string object, and you will instead need to use the string module. Example:

a = ['a', 'b', 'c', 'd']

try:
    b = ''.join(a)

except AttributeError:
    import string
    b = string.join(a, '')

The string b will be 'abcd'.


回答 3

这可能是最快的方法:

>>> from array import array
>>> a = ['a','b','c','d']
>>> array('B', map(ord,a)).tobytes().decode('ascii')  # tostring() was removed in Python 3.9
'abcd'

This may be the fastest way:

>>> from array import array
>>> a = ['a','b','c','d']
>>> array('B', map(ord,a)).tobytes().decode('ascii')  # tostring() was removed in Python 3.9
'abcd'

回答 4

减少功能也起作用

from functools import reduce  # reduce is no longer a builtin in Python 3
import operator
h=['a','b','c','d']
reduce(operator.add, h)  # 'abcd'

The reduce function also works

from functools import reduce  # reduce is no longer a builtin in Python 3
import operator
h=['a','b','c','d']
reduce(operator.add, h)  # 'abcd'

回答 5

如果列表包含数字,则可以map()与结合使用join()

例如:

>>> arr = [3, 30, 34, 5, 9]
>>> ''.join(map(str, arr))
'3303459'

If the list contains numbers, you can use map() with join().

For example:

>>> arr = [3, 30, 34, 5, 9]
>>> ''.join(map(str, arr))
'3303459'

回答 6

h = ['a','b','c','d','e','f']
g = ''
for f in h:
    g = g + f

>>> g
'abcdef'

回答 7

除了str.join这是最自然的方式,一种可能性是使用io.StringIO和滥用一次writelines编写所有元素:

import io

a = ['a','b','c','d']

out = io.StringIO()
out.writelines(a)
print(out.getvalue())

印刷品:

abcd

当将此方法与生成器函数或不是a tuple或a 的可迭代器一起使用时list,它将保存临时创建的列表,该列表join确实可以一次性分配正确的大小(并且1个字符的字符串列表在内存方面非常昂贵) )。

如果您的内存不足,并且输入的对象是惰性求值,则此方法是最佳解决方案。

Besides str.join, which is the most natural way, another possibility is to use io.StringIO and abuse writelines to write all elements in one go:

import io

a = ['a','b','c','d']

out = io.StringIO()
out.writelines(a)
print(out.getvalue())

prints:

abcd

When using this approach with a generator function or an iterable which isn’t a tuple or a list, it saves the temporary list creation that join does to allocate the right size in one go (and a list of 1-character strings is very expensive memory-wise).

If you’re low on memory and your input is a lazily evaluated object, this approach is the best solution.
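A sketch of the lazy-input case described above. The letters generator is a hypothetical stand-in for whatever lazily evaluated source you have:

```python
import io

def letters():
    """Hypothetical lazy source that yields one character at a time."""
    for code in range(ord('a'), ord('e')):
        yield chr(code)

out = io.StringIO()
out.writelines(letters())   # consumes the generator item by item
print(out.getvalue())       # abcd
```

Whether this actually saves memory in a given program is worth measuring; the sketch only shows that writelines accepts a generator directly.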


回答 8

您也可以operator.concat()这样使用:

>>> from operator import concat
>>> a = ['a', 'b', 'c', 'd']
>>> reduce(concat, a)
'abcd'

如果您使用的是Python 3,则需要先添加:

>>> from functools import reduce

由于内置函数reduce()已从Python 3中删除,现在位于中functools.reduce()

You could also use operator.concat() like this:

>>> from operator import concat
>>> a = ['a', 'b', 'c', 'd']
>>> reduce(concat, a)
'abcd'

If you’re using Python 3 you need to prepend:

>>> from functools import reduce

since the builtin reduce() has been removed from Python 3 and now lives in functools.reduce().


使用Python将JSON字符串转换为dict

问题:使用Python将JSON字符串转换为dict

我对Python中的JSON感到有些困惑。在我看来,这就像是一本字典,因此我正在尝试这样做:

{
    "glossary":
    {
        "title": "example glossary",
        "GlossDiv":
        {
            "title": "S",
            "GlossList":
            {
                "GlossEntry":
                {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef":
                    {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

但是当我这样做时print dict(json),它会给出一个错误。

如何将该字符串转换为结构,然后调用json["title"]以获得“示例词汇表”?

I’m a little bit confused with JSON in Python. To me, it seems like a dictionary, and for that reason I’m trying to do that:

{
    "glossary":
    {
        "title": "example glossary",
        "GlossDiv":
        {
            "title": "S",
            "GlossList":
            {
                "GlossEntry":
                {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef":
                    {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

But when I do print dict(json), it gives an error.

How can I transform this string into a structure and then call json["title"] to obtain “example glossary”?


回答 0

json.loads()

import json

d = json.loads(j)  # j is the JSON text as a str
print(d['glossary']['title'])

json.loads()

import json

d = json.loads(j)  # j is the JSON text as a str
print(d['glossary']['title'])
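A fuller sketch of the same idea, using a shortened version of the glossary document from the question. The name j follows the answer above, and the JSON text is assumed to be held in a str:

```python
import json

# Shortened version of the glossary document from the question, held as a str.
j = '''
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "GlossDef": {
                        "GlossSeeAlso": ["GML", "XML"]
                    }
                }
            }
        }
    }
}
'''

d = json.loads(j)                 # JSON object -> dict, JSON array -> list
print(type(d).__name__)           # dict
print(d['glossary']['title'])     # example glossary

entry = d['glossary']['GlossDiv']['GlossList']['GlossEntry']
print(entry['GlossDef']['GlossSeeAlso'])  # ['GML', 'XML']
```

Note that the title sits under the "glossary" key, so json["title"] as written in the question would raise a KeyError on the parsed dict.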

回答 1

当我开始使用json时,我很困惑,无法解决一段时间,但最终我得到了想要的东西。
这是简单的解决方案

import json
m = {'id': 2, 'name': 'hussain'}
n = json.dumps(m)
o = json.loads(n)
print(o['id'], o['name'])

When I started using json, I was confused and unable to figure it out for some time, but I finally got what I wanted.
Here is the simple solution:

import json
m = {'id': 2, 'name': 'hussain'}
n = json.dumps(m)
o = json.loads(n)
print(o['id'], o['name'])

回答 2

使用simplejson或cjson进行加速

import simplejson as json

json.loads(obj)

# or, using the cjson module (import cjson):

cjson.decode(obj)

Use simplejson or cjson for speedups:

import simplejson as json

json.loads(obj)

# or, using the cjson module (import cjson):

cjson.decode(obj)

回答 3

如果您信任数据源,则可以用于eval将字符串转换为字典:

eval(your_json_format_string)

例:

>>> x = "{'a' : 1, 'b' : True, 'c' : 'C'}"
>>> y = eval(x)

>>> print x
{'a' : 1, 'b' : True, 'c' : 'C'}
>>> print y
{'a': 1, 'c': 'C', 'b': True}

>>> print type(x), type(y)
<type 'str'> <type 'dict'>

>>> print y['a'], type(y['a'])
1 <type 'int'>

>>> print y['b'], type(y['b'])
True <type 'bool'>

>>> print y['c'], type(y['c'])
C <type 'str'>

If you trust the data source, you can use eval to convert your string into a dictionary:

eval(your_json_format_string)

Example:

>>> x = "{'a' : 1, 'b' : True, 'c' : 'C'}"
>>> y = eval(x)

>>> print x
{'a' : 1, 'b' : True, 'c' : 'C'}
>>> print y
{'a': 1, 'c': 'C', 'b': True}

>>> print type(x), type(y)
<type 'str'> <type 'dict'>

>>> print y['a'], type(y['a'])
1 <type 'int'>

>>> print y['b'], type(y['b'])
True <type 'bool'>

>>> print y['c'], type(y['c'])
C <type 'str'>
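Two caveats are worth illustrating. The string in the example above is a Python literal, not JSON, and eval will run whatever code the string contains. The sketch below compares eval with two alternatives that are not part of this answer: ast.literal_eval (standard library, accepts only literals) and json.loads. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```python
import ast
import json

python_literal = "{'a': 1, 'b': True, 'c': 'C'}"   # Python syntax, not valid JSON
json_text = '{"a": 1, "b": true, "c": null}'        # valid JSON

# ast.literal_eval only accepts literals, so it cannot run arbitrary code.
print(ast.literal_eval(python_literal))   # {'a': 1, 'b': True, 'c': 'C'}

# json.loads maps true/false/null to True/False/None.
print(json.loads(json_text))              # {'a': 1, 'b': True, 'c': None}

# eval fails on real JSON because true and null are not Python names.
try:
    eval(json_text)
except NameError as exc:
    print("NameError:", exc)
```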