Tag archive: string

How to find out if a Python object is a string?

Question: How to find out if a Python object is a string?

How can I check if a Python object is a string (either regular or Unicode)?


Answer 0

Python 2

Use isinstance(obj, basestring) for an object-to-test obj.

Docs.


Answer 1

Python 2

To check if an object o is a string type or a subclass of a string type:

isinstance(o, basestring)

because both str and unicode are subclasses of basestring.

To check if the type of o is exactly str:

type(o) is str

To check if o is an instance of str or any subclass of str:

isinstance(o, str)

The above also works for Unicode strings if you replace str with unicode.
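
The difference between the exact-type check and the isinstance check only shows up with subclasses. Here is a minimal sketch, written for Python 3 (where str is the only text type); the subclass Tag and both helper names are made up for illustration:

```python
class Tag(str):
    """A hypothetical str subclass, used only for this illustration."""


def is_exactly_str(o):
    # True only when the type is str itself; subclasses are rejected.
    return type(o) is str


def is_str_like(o):
    # True for str and for any subclass of str.
    return isinstance(o, str)


plain = "hello"
tagged = Tag("hello")

print(is_exactly_str(plain), is_str_like(plain))
print(is_exactly_str(tagged), is_str_like(tagged))
```

In most code the isinstance form is the one you want, since a subclass of str can be used anywhere a str can.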

However, you may not need to do explicit type checking at all. “Duck typing” may fit your needs. See http://docs.python.org/glossary.html#term-duck-typing.

See also What’s the canonical way to check for type in python?


Answer 2

Python 3

In Python 3.x basestring is not available anymore, as str is the sole string type (with the semantics of Python 2.x’s unicode).

So the check in Python 3.x is just:

isinstance(obj_to_test, str)

This follows the fix of the official 2to3 conversion tool: converting basestring to str.
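
A short sketch of what that check accepts in Python 3. Note that bytes is a separate type there; whether it should count as a "string" depends on your use case, so the second helper is only an assumption about what some callers may want. Both helper names are hypothetical:

```python
def is_text(obj_to_test):
    # In Python 3, str is the only text type.
    return isinstance(obj_to_test, str)


def is_text_or_bytes(obj_to_test):
    # isinstance accepts a tuple of types when bytes should count too.
    return isinstance(obj_to_test, (str, bytes))


print(is_text("abc"))
print(is_text(b"abc"))
print(is_text_or_bytes(b"abc"))
```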


Answer 3

Python 2 and 3

(cross-compatible)

If you want to check with no regard for Python version (2.x vs 3.x), use six (PyPI) and its string_types attribute:

import six

if isinstance(obj, six.string_types):
    print('obj is a string!')

Within six (a very light-weight single-file module), it’s simply doing this:

import sys
PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str
else:
    string_types = basestring
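
If adding a dependency is not an option, the same idea can be sketched without six. This is an alternative under stated assumptions, not how six itself is written: it relies on the name basestring raising NameError on Python 3, and the helper name is_string is hypothetical.

```python
try:
    # Python 2: basestring is the common base class of str and unicode.
    string_types = (basestring,)  # noqa: F821
except NameError:
    # Python 3: basestring no longer exists; str is the only text type.
    string_types = (str,)


def is_string(obj):
    # isinstance accepts a tuple of types as its second argument.
    return isinstance(obj, string_types)


print(is_string("abc"))
print(is_string(123))
```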

Answer 4

I find this more Pythonic:

if type(aObject) is str:
    #do your stuff here
    pass

Since type objects are singletons, the is operator can be used to compare the type of the object with str.


Answer 5

If one wants to stay away from explicit type-checking (and there are good reasons to stay away from it), probably the safest part of the string protocol to check is:

str(maybe_string) == maybe_string

It won’t iterate through an iterable or iterator, it won’t call a list-of-strings a string and it correctly detects a stringlike as a string.

Of course there are drawbacks. For example, str(maybe_string) may be a heavy calculation. As so often, the answer is it depends.

EDIT: As @Tcll points out in the comments, the question actually asks for a way to detect both unicode strings and bytestrings. On Python 2 this answer will fail with an exception for unicode strings that contain non-ASCII characters, and on Python 3 it will return False for all bytestrings.
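
A minimal Python 3 sketch of how this check behaves on a few inputs, including the bytestring case mentioned in the edit. The function name looks_like_string is hypothetical:

```python
def looks_like_string(maybe_string):
    # The check from this answer: converting to str must be a no-op.
    return str(maybe_string) == maybe_string


print(looks_like_string("abc"))       # a real str
print(looks_like_string(["a", "b"]))  # a list of strings
print(looks_like_string(b"abc"))      # a bytestring
```

On Python 3, str(b"abc") produces the text "b'abc'", which is why the bytestring comparison comes out False.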


Answer 6

In order to check if your variable is something you could go like:

s = 'Hello World'
if isinstance(s, str):
    # do something here
    pass

isinstance returns a boolean, True or False, so you can branch accordingly. To find out which type to test against, first call type(s); it reports the type as str, which you can then pass to isinstance.


Answer 7

I might deal with this in the duck-typing style, like others mention. How do I know a string is really a string? Well, obviously by converting it to a string!

def myfunc(word):
    word = unicode(word)
    ...

If the argument is already a str or unicode object, word will hold its value unmodified. If the object passed implements a __unicode__ method, that is used to get its unicode representation. If the object passed cannot be used as a string, the unicode builtin raises an exception.


Answer 8

isinstance(your_object, basestring)

will be True if your object is indeed a string type. Note that str is the name of a built-in type, not a reserved word.

My apologies: the correct answer uses basestring rather than str so that it also covers unicode strings, as noted above by one of the other responders.


Answer 9

This evening I ran into a situation in which I thought I was going to have to check against the str type, but it turned out I did not.

My approach to solving the problem will probably work in many situations, so I offer it below in case others reading this question are interested (Python 3 only).

# NOTE: fields is an object that COULD be any number of things, including:
# - a single string-like object
# - a string-like object that needs to be converted to a sequence of 
# string-like objects at some separator, sep
# - a sequence of string-like objects
def getfields(*fields, sep=' ', validator=lambda f: True):
    '''Take a field sequence definition and yield from a validated
     field sequence. Accepts a string, a string with separators, 
     or a sequence of strings'''
    if fields:
        try:
            # single unpack in the case of a single argument
            fieldseq, = fields
            try:
                # convert to string sequence if string
                fieldseq = fieldseq.split(sep)
            except AttributeError:
                # not a string; assume other iterable
                pass
        except ValueError:
            # not a single argument and not a string
            fieldseq = fields
        invalid_fields = [field for field in fieldseq if not validator(field)]
        if invalid_fields:
            raise ValueError('One or more field names is invalid:\n'
                             '{!r}'.format(invalid_fields))
    else:
        raise ValueError('No fields were provided')
    try:
        yield from fieldseq
    except TypeError as e:
        raise ValueError('Single field argument must be a string '
                         'or an iterable') from e

Some tests:

from . import getfields

def test_getfields_novalidation():
    result = ['a', 'b']
    assert list(getfields('a b')) == result
    assert list(getfields('a,b', sep=',')) == result
    assert list(getfields('a', 'b')) == result
    assert list(getfields(['a', 'b'])) == result

Answer 10

It's simple: use the following code (we assume the object in question is obj):

if type(obj) == str:
    print('It is a string')
else:
    print('It is not a string.')

Answer 11

You can test it by concatenating with an empty string:

def is_string(s):
  try:
    s += ''
  except:
    return False
  return True

Edit:

Correcting my answer after comments pointed out that it wrongly accepts lists (s += '' succeeds on a list):

def is_string(s):
  return isinstance(s, basestring)

Answer 12

For a nice duck-typing approach for string-likes that has the bonus of working with both Python 2.x and 3.x:

def is_string(obj):
    try:
        obj + ''
        return True
    except TypeError:
        return False

wisefish was close with the duck-typing before he switched to the isinstance approach, except that += has a different meaning for lists than + does.
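
The difference between += and + on lists can be seen directly. In this sketch (Python 3; both helper names are hypothetical) the += version accepts a list because list += '' extends the list with the zero characters of the empty string, while list + '' raises TypeError:

```python
def plus_check(obj):
    # The approach from this answer: plain concatenation.
    try:
        obj + ''
        return True
    except TypeError:
        return False


def iadd_check(obj):
    # The earlier approach: in-place concatenation.
    try:
        obj += ''
    except TypeError:
        return False
    return True


print(plus_check('abc'), iadd_check('abc'))
print(plus_check(['a']), iadd_check(['a']))
```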


Answer 13

if type(varA) == str or type(varB) == str:
    print('string involved')

From edX, online course MITx: 6.00.1x Introduction to Computer Science and Programming Using Python


Is there a simple way to remove multiple spaces in a string?

Question: Is there a simple way to remove multiple spaces in a string?

Suppose this string:

The   fox jumped   over    the log.

Turning into:

The fox jumped over the log.

What is the simplest way (1-2 lines) to achieve this, without splitting and going into lists?


Answer 0

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

Answer 1

foo is your string:

" ".join(foo.split())

Be warned, though, that this removes “all whitespace characters (space, tab, newline, return, formfeed)” (thanks to hhsaffar, see comments). I.e., "this is \t a test\n" will effectively end up as "this is a test".
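
To make the warning concrete, here is a small sketch comparing split-then-join with a regex that only targets literal spaces. The variable names are illustrative:

```python
import re

foo = "this  is \t a   test\n"

# split() with no arguments splits on any run of whitespace,
# so the tab and the trailing newline disappear as well.
joined = " ".join(foo.split())

# A pattern of literal spaces leaves tabs and newlines alone.
spaces_only = re.sub(" +", " ", foo)

print(repr(joined))
print(repr(spaces_only))
```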


Answer 2

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since a space before a comma is listed as a pet peeve in PEP 8, as mentioned by user Martin Thoma in the comments.


Answer 3

Using regexes with "\s" or a plain string.split() will also remove other whitespace, such as newlines, carriage returns, and tabs. Unless that is desired, these examples handle only runs of multiple spaces.

I used 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get realistic time tests and used random-length extra spaces throughout:

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

A one-liner join would essentially strip any leading/trailing spaces; proper_join instead preserves a leading/trailing space (but only ONE ;-).

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}', ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

NOTE: The “while version” made a copy of the original_string, as I believe once modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added this string copy to the other two so that the times showed the difference only in the logic. Keep in mind that the main stmt on timeit instances will only be executed once; the original way I did this, the while loop worked on the same label, original_string, thus the second run, there would be nothing to do. The way it’s set up now, calling a function, using two different labels, that isn’t a problem. I’ve added assert statements to all the workers to verify we change something every iteration (for those who may be dubious). E.g., change to this and it breaks:

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)

test_string = 'The   fox jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001066 |   0.001260 |   0.001128 |   0.001092
     re_replace_test |   0.003074 |   0.003941 |   0.003357 |   0.003349
    proper_join_test |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001025 |   0.001079 |   0.001052 |   0.001051
     re_replace_test |   0.003213 |   0.004512 |   0.003656 |   0.003504
    proper_join_test |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001350 |   0.002302 |   0.001639 |   0.001357
     re_replace_test |   0.006797 |   0.008107 |   0.007319 |   0.007440
    proper_join_test |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001444 |   0.001490 |   0.001460 |   0.001459
     re_replace_test |   0.011771 |   0.012598 |   0.012082 |   0.011910
    proper_join_test |   0.003741 |   0.005933 |   0.004341 |   0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.342602 |   0.387803 |   0.359319 |   0.356284
     re_replace_test |   0.337571 |   0.359821 |   0.348876 |   0.348006
    proper_join_test |   0.381654 |   0.395349 |   0.388304 |   0.388193    

Python 2.7.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.227471 |   0.268340 |   0.240884 |   0.236776
     re_replace_test |   0.301516 |   0.325730 |   0.308626 |   0.307852
    proper_join_test |   0.358766 |   0.383736 |   0.370958 |   0.371866    

Python 3.2.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.438480 |   0.463380 |   0.447953 |   0.446646
     re_replace_test |   0.463729 |   0.490947 |   0.472496 |   0.468778
    proper_join_test |   0.397022 |   0.427817 |   0.406612 |   0.402053    

Python 3.3.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.284495 |   0.294025 |   0.288735 |   0.289153
     re_replace_test |   0.501351 |   0.525673 |   0.511347 |   0.508467
    proper_join_test |   0.422011 |   0.448736 |   0.436196 |   0.440318

For the trivial string, it would seem that a while-loop is the fastest, followed by the Pythonic string-split/join, and regex pulling up the rear.

For non-trivial strings, seems there’s a bit more to consider. 32-bit 2.7? It’s regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the “proper” join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it’s always best to remember the mantra:

  1. Make It Work
  2. Make It Right
  3. Make It Fast

IANAL, YMMV, Caveat Emptor!
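
For anyone wanting to repeat the measurement on their own machine, the setup described in this answer can be sketched roughly as follows. This is a reconstruction, not the exact script behind the tables: the sample text is a placeholder, and timing a lambda adds a small call overhead that the original stmt-based test did not have.

```python
import re
import statistics
import timeit


def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')
    return string


def re_replace(string):
    return re.sub(r' {2,}', ' ', string)


def proper_join(string):
    split_string = string.split(' ')
    # Keep a single leading/trailing space if the input had any.
    beg = ' ' if not split_string[0] else ''
    end = ' ' if not split_string[-1] else ''
    return beg + ' '.join(item for item in split_string if item) + end


# Placeholder input; substitute your own text to get meaningful numbers.
sample = 'The   fox jumped   over    the log.  ' * 50

for func in (while_replace, re_replace, proper_join):
    # repeat=7, number=1000 mirrors the repeat(7, 1000) call in this answer.
    times = timeit.repeat(lambda: func(sample), repeat=7, number=1000)
    print('{:>14} | {:.6f} | {:.6f} | {:.6f} | {:.6f}'.format(
        func.__name__, min(times), max(times),
        statistics.mean(times), statistics.median(times)))
```

The printed columns are minimum, maximum, average, and median, in the same order as the tables above. Absolute numbers will differ by machine and Python version.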


Answer 4

I have to agree with Paul McGuire’s comment. To me,

' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux and Python 2.5) show the split-then-join to be almost five times faster than doing the “re.sub(…)”, and still three times faster if you precompile the regex once and do the operation multiple times. And it is by any measure easier to understand — much more Pythonic.


Answer 5

Similar to the previous solutions, but more specific: replace two or more consecutive whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'

Answer 6

A simple solution:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

Answer 7

You can also use the string splitting technique in a Pandas DataFrame without needing to use .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is on one line:

df['message'] = (df['message'].str.split()).str.join(' ')

Answer 8

import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')

This replaces every run of tabs, newlines, and spaces with a single space.


Answer 9

I have tried the following method, and it even works with an extreme case like:

str1='          I   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression it can be done as:

re.sub(r'\s+', ' ', str1)

Although some additional processing has to be done in order to remove the leading and trailing spaces.
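
One way to do that extra step is to strip the result afterwards. This is a sketch of one option, not necessarily what the author had in mind:

```python
import re

str1 = '          I   live    on    earth           '

# re.sub alone collapses each run to one space but keeps one at each end.
collapsed = re.sub(r'\s+', ' ', str1)

# Stripping afterwards gives the same result as split-then-join.
stripped = collapsed.strip()

print(repr(collapsed))
print(repr(stripped))
```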


Answer 10

This also seems to work:

while "  " in s:
    s = s.replace("  ", " ")

Where the variable s represents your string.


Answer 11

In some cases it’s desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You’d use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character, followed by one or more occurrences of that character. Now, all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'

Answer 12

Another alternative:

>>> import re
>>> s = 'this is a            string with    multiple spaces and    tabs'
>>> s = re.sub('[ \t]+', ' ', s)
>>> print(s)
this is a string with multiple spaces and tabs

Answer 13

One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None,sentence.split(' ')))

Explanation:

  1. Split the entire string into a list.
  2. Filter empty elements from the list.
  3. Rejoin the remaining elements* with a single space

*The remaining elements should be words or words with punctuation, etc. I did not test this extensively, but this should be a good starting point. All the best!


Answer 14

Solution for Python developers:

import re

text1 = 'Python      Exercises    Are   Challenging Exercises'
print("Original string: ", text1)
print("Without extra spaces: ", re.sub(' +', ' ', text1))

Output:

Original string:  Python      Exercises    Are   Challenging Exercises
Without extra spaces:  Python Exercises Are Challenging Exercises


Answer 15

def unPretty(S):
   # Given a dictionary, JSON, list, float, int, or even a string...
   # return a string stripped of CR, LF replaced by space, with multiple spaces reduced to one.
   return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())

Answer 16

The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

The short circuiting makes it slightly faster than pythonlarry’s comprehensive answer. Go for this if you’re after efficiency and are strictly looking to weed out extra whitespaces of the single space variety.


Answer 17

Quite surprising: no one has posted this simple function, which should be much faster than all the other posted solutions. Here it goes:

def compactSpaces(s):
    out = ""
    for c in s:
        # out[-1:] is "" while out is empty, so a leading space no longer raises IndexError
        if c != " " or out[-1:] != " ":
            out += c
    return out

回答 18

如果您要处理的是空格,则在None上分割将不会在返回值中包含空字符串。

5.6.1。字符串方法,str.split()

If it’s whitespace you’re dealing with, splitting on None will not include an empty string in the returned value.

5.6.1. String Methods, str.split()
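
A short sketch of the difference between splitting on None (the default) and splitting on an explicit space. The sample text is made up for illustration.

```python
text = '  a   b \t c\n'

# With no argument (sep=None), runs of any whitespace act as one separator
# and leading/trailing whitespace produces no empty strings.
print(text.split())

# With an explicit ' ' separator, every single space is a boundary,
# so empty strings show up between consecutive spaces.
print(text.split(' '))

# Joining the result of split() collapses all whitespace runs to one space.
collapsed = ' '.join(text.split())
print(collapsed)
```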


回答 19

string = 'This is a             string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

结果

This is a string full of spaces and taps

string = 'This is a             string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

Results:

This is a string full of spaces and taps


回答 20

要删除空格,请考虑单词之间的前导,尾随和多余的空格,请使用:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

第一个or处理前导空白,第二个or处理字符串前导空白的开始,最后一个处理尾随空白。

为了使用证明,此链接将为您提供测试。

https://regex101.com/r/meBYli/4

这将与re.split函数一起使用。

To remove white space, considering leading, trailing and extra white space in between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first or deals with leading white space, the second or deals with start of string leading white space, and the last one deals with trailing white space.

For proof of use, this link will provide you with a test.

https://regex101.com/r/meBYli/4

This is to be used with the re.split function.


回答 21

我有我在大学中使用过的简单方法。

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    line = line.replace("  ", " ")
    end -= 1

这会将每个双倍空格替换为一个空格并将执行1000次。这意味着您可以有2000个额外空间,并且仍然可以使用。:)

I have my simple method which I have used in college.

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    line = line.replace("  ", " ")
    end -= 1

This replaces every double space with a single space and repeats the pass 1000 times. Each pass roughly halves every run of spaces, so that is far more passes than any realistic input needs. :)


回答 22

我有一个不分裂的简单方法:

a = "Lorem   Ipsum Darum     Diesrum!"
while True:
    count = a.find("  ")
    if count >= 0:
        a = a.replace("  ", " ")
        count = a.find("  ")
        continue
    else:
        break

print(a)

I’ve got a simple method without splitting:

a = "Lorem   Ipsum Darum     Diesrum!"
while True:
    count = a.find("  ")
    if count >= 0:  # find() returns -1 when absent; 0 is a valid match position
        a = a.replace("  ", " ")
    else:
        break

print(a)

回答 23

import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
  # trims all white spaces
print('Remove all space:',re.sub(r"\s+", "", Text), sep='') 
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='') 
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')  
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace more than one white space in the string with one white space
print('Remove more than one space:',re.sub(' +', ' ',Text), sep='') 

结果:

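Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar
Remove leading space:You can select below trims for removing white space!!   BR Aliakbar
Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar
Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar
Remove more than one space: You can select below trims for removing white space!! BR Aliakbar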

import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
  # trims all white spaces
print('Remove all space:',re.sub(r"\s+", "", Text), sep='') 
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='') 
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')  
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace more than one white space in the string with one white space
print('Remove more than one space:',re.sub(' +', ' ',Text), sep='') 

Result:

Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar
Remove leading space:You can select below trims for removing white space!!   BR Aliakbar
Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar
Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar
Remove more than one space: You can select below trims for removing white space!! BR Aliakbar


回答 24

我没有在其他示例中读很多书,但是我刚刚创建了用于合并多个连续空格字符的方法。

它不使用任何库,尽管就脚本长度而言比较长,但它不是一个复杂的实现:

def spaceMatcher(command):
    """
    Function defined to consolidate multiple consecutive space characters
    in a string to a single space
    """
    new_command = ""
    # Flags whether the last character kept was a space
    previous_was_space = False
    for char in command:
        if char == " ":
            # Keep only the first space of each run
            if not previous_was_space:
                new_command += char
            previous_was_space = True
        else:
            new_command += char
            previous_was_space = False
    return new_command

command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

I haven’t read a lot into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and whilst it is relatively long in terms of script length, it is not a complex implementation:

def spaceMatcher(command):
    """
    Function defined to consolidate multiple consecutive space characters
    in a string to a single space
    """
    new_command = ""
    # Flags whether the last character kept was a space
    previous_was_space = False
    for char in command:
        if char == " ":
            # Keep only the first space of each run
            if not previous_was_space:
                new_command += char
            previous_was_space = True
        else:
            new_command += char
            previous_was_space = False
    return new_command

command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

如何将文本文件读入字符串变量并删除换行符?

问题:如何将文本文件读入字符串变量并删除换行符?

我使用以下代码段在python中读取文件:

with open ("data.txt", "r") as myfile:
    data=myfile.readlines()

输入文件为:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN
GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

当我打印数据时

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN\n', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

如我所见,数据是list形式形式的。我如何使其成为字符串?而且我怎么删除"\n""["以及"]"从中字符?

I use the following code segment to read a file in python:

with open ("data.txt", "r") as myfile:
    data=myfile.readlines()

Input file is:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN
GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

and when I print data I get

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN\n', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

As I see it, data is in list form. How do I make it a string? And how do I remove the "\n", "[", and "]" characters from it?


回答 0

您可以使用:

with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')

You could use:

with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')

回答 1

使用read(),而不是readline()

with open('data.txt', 'r') as myfile:
  data = myfile.read()

Use read(), not readlines():

with open('data.txt', 'r') as myfile:
  data = myfile.read()

回答 2

您可以在一行中读取文件:

str = open('very_Important.txt', 'r').read()

请注意,这不会显式关闭文件。

当文件作为垃圾回收的一部分退出时,CPython将关闭文件。

但是其他python实现不会。要编写可移植的代码,最好with显式使用或关闭文件。做空并不总是更好。参见https://stackoverflow.com/a/7396043/362951

You can read from a file in one line:

text = open('very_Important.txt', 'r').read()

Please note that this does not close the file explicitly.

CPython will close the file when the file object is garbage-collected.

Other Python implementations may not do so promptly. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
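
A minimal sketch of the portable form, using a throwaway temporary file so that it is self-contained. The file contents are made up for illustration.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('line one\nline two\n')

with open(path, 'r') as f:
    text = f.read()

# The with block has already closed the file, on any Python implementation.
print(f.closed)
print(repr(text))

os.remove(path)
```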


回答 3

要将所有行连接到字符串中并删除新行,我通常使用:

with open('t.txt') as f:
  s = " ".join([x.strip() for x in f]) 

To join all lines into a string and remove new lines I normally use :

with open('t.txt') as f:
  s = " ".join([x.strip() for x in f]) 

回答 4

在Python 3.5或更高版本中,可以使用pathlib将文本文件的内容复制到一个变量中在一行中关闭该文件

from pathlib import Path
txt = Path('data.txt').read_text()

然后您可以使用str.replace删除换行符:

txt = txt.replace('\n', '')

In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:

from pathlib import Path
txt = Path('data.txt').read_text()

and then you can use str.replace to remove the newlines:

txt = txt.replace('\n', '')
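
As a self-contained sketch of the same idea, the snippet below writes a small sample file into a temporary directory first. The sample contents are an assumption for illustration. It uses splitlines() instead of replace(), which also copes with '\r\n' line endings.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / 'data.txt'
    sample.write_text('LLKK\nGGHH\n')

    # read_text() opens the file, reads it and closes it again.
    txt = sample.read_text()
    joined = ''.join(txt.splitlines())

print(repr(txt))
print(joined)
```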

回答 5

with open("data.txt") as myfile:
    data="".join(line.rstrip() for line in myfile)

join()将加入一个字符串列表,而不带参数的rstrip()将从字符串末尾修剪空白,包括换行符。

with open("data.txt") as myfile:
    data="".join(line.rstrip() for line in myfile)

join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.


回答 6

这可以使用read()方法完成:

text_as_string = open('Your_Text_File.txt', 'r').read()

或者由于默认模式本身是“ r”(读取),因此只需使用,

text_as_string = open('Your_Text_File.txt').read()

This can be done using the read() method :

text_as_string = open('Your_Text_File.txt', 'r').read()

Or as the default mode itself is ‘r’ (read) so simply use,

text_as_string = open('Your_Text_File.txt').read()

回答 7

我已经摆弄了一段时间,并且更喜欢与read结合使用rstrip。如果不使用rstrip("\n"),Python会在字符串末尾添加换行符,这在大多数情况下不是很有用。

with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print file_content

I have fiddled around with this for a while and prefer to use read in combination with rstrip. Without rstrip("\n"), the string keeps the newline at the end of the file, which in most cases is not very useful.

with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)

回答 8

很难确切地知道您要做什么,但是这样的事情应该可以帮助您入门:

with open ("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])

It’s hard to tell exactly what you’re after, but something like this should get you started:

with open ("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])

回答 9

我很惊讶没有人提及splitlines()

with open ("data.txt", "r") as myfile:
    data = myfile.read().splitlines()

data现在,变量是一个列表,在打印时如下所示:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

请注意,没有换行符(\n)。

那时,这听起来像是要将行打印回控制台,您可以使用for循环来实现:

for line in data:
    print line

I’m surprised nobody mentioned splitlines() yet.

with open ("data.txt", "r") as myfile:
    data = myfile.read().splitlines()

Variable data is now a list that looks like this when printed:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

Note there are no newlines (\n).

At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:

for line in data:
    print(line)
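
A brief sketch of how splitlines() differs from split('\n'), using a made-up string in place of the file contents.

```python
content = 'first\nsecond\n'

# splitlines() drops the line endings and adds no empty trailing element.
print(content.splitlines())

# split('\n') leaves an empty string after the final newline.
print(content.split('\n'))

# splitlines() also recognises Windows-style '\r\n' endings.
print('a\r\nb'.splitlines())
```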

回答 10

您还可以删除每行并连接成最终字符串。

myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
    data = data + line.strip();

这也可以解决。

You can also strip each line and concatenate into a final string.

with open("data.txt", "r") as myfile:
    lines = myfile.readlines()
data = ""
for line in lines:
    data = data + line.strip()

This would also work out just fine.


回答 11

您可以将其压缩为两行代码!!!!

content = open('filepath','r').read().replace('\n',' ')
print(content)

如果您的文件显示为:

hello how are you?
who are you?
blank blank

python输出

hello how are you? who are you? blank blank

You can compress this into two lines of code:

content = open('filepath','r').read().replace('\n',' ')
print(content)

if your file reads:

hello how are you?
who are you?
blank blank

python output

hello how are you? who are you? blank blank

回答 12

这是一个可复制粘贴的单行解决方案,它也关闭了文件对象:

_ = open('data.txt', 'r'); data = _.read(); _.close()

This is a one line, copy-pasteable solution that also closes the file object:

_ = open('data.txt', 'r'); data = _.read(); _.close()

回答 13

f = open('data.txt', 'r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line

f.close()


print(string)

回答 14

python3:如果您对方括号语法不陌生,请使用Google“列表注释”。

 with open('data.txt') as f:
     lines = [ line.strip( ) for line in list(f) ]

python3: Google "list comprehension" if the square bracket syntax is new to you.

 with open('data.txt') as f:
     lines = [ line.strip( ) for line in list(f) ]

回答 15

你有试过吗?

x = "yourfilename.txt"
y = open(x, 'r').read()

print(y)

Have you tried this?

x = "yourfilename.txt"
y = open(x, 'r').read()

print(y)

回答 16

我认为没有人解决您问题的[]部分。当您将每一行读入变量时,由于在用\替换\ n之前有多行,所以最终创建了一个列表。如果您有一个x变量,并通过以下方式将其打印出来

X

或打印(x)

或str(x)

您将看到带有括号的整个列表。如果您调用(排序数组)的每个元素

x [0]则省略括号。如果您使用str()函数,您将只会看到数据,而不会看到“”。str(x [0])

I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '', you ended up creating a list. If you have a variable x and print it out just by

x

or print(x)

or str(x)

you will see the entire list with the brackets. If you index into the list, as in

x[0], then the brackets are omitted. If you print the element, as in print(x[0]), you will see just the data without the quotes either.
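
A small sketch of the point above, using a shortened stand-in for the list that readlines() returns. The variable names are illustrative only.

```python
lines = ['LLKKKK\n', 'GGGGHH']

# Printing the list shows its repr: brackets, quotes and escaped newlines.
print(lines)

# Stripping each element and joining gives one plain string.
data = ''.join(line.rstrip('\n') for line in lines)
print(data)
print(type(data).__name__)
```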


回答 17

也许您可以尝试一下?我在程序中使用它。

Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()

Maybe you could try this? I use this in my programs.

Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()

回答 18

正则表达式也适用:

import re
with open("depression.txt") as f:
     l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]

print (l)

['I', 'feel', 'empty', 'and', 'dead', 'inside']

Regular expression works too:

import re
with open("depression.txt") as f:
     l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]

print (l)

['I', 'feel', 'empty', 'and', 'dead', 'inside']


回答 19

要使用Python删除换行符,您可以使用replace字符串函数。

本示例删除所有3种换行符:

my_string = open('lala.json').read()
print(my_string)

my_string = my_string.replace("\r","").replace("\n","")
print(my_string)

示例文件为:

{
  "lala": "lulu",
  "foo": "bar"
}

您可以使用以下重播方案进行尝试:

https://repl.it/repls/AnnualJointHardware


To remove line breaks using Python you can use replace function of a string.

This example removes all 3 types of line breaks:

my_string = open('lala.json').read()
print(my_string)

my_string = my_string.replace("\r","").replace("\n","")
print(my_string)

Example file is:

{
  "lala": "lulu",
  "foo": "bar"
}

You can try it using this repl.it session:

https://repl.it/repls/AnnualJointHardware



回答 20

这有效:将文件更改为:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

然后:

file = open("file.txt")
line = file.read()
words = line.split()

这将创建一个列表words,该列表等于:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

那摆脱了“ \ n”。要回答括号中的问题,只需执行以下操作:

for word in words: # Assuming words is the list above
    print word # Prints each word in file on a different line

要么:

print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space

返回:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

This works: Change your file to:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

Then:

file = open("file.txt")
line = file.read()
words = line.split()

This creates a list named words that equals:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

That got rid of the “\n”. To answer the part about the brackets getting in your way, just do this:

for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a different line

Or:

print(words[0] + ",", words[1])  # "+" joins the strings with no space in between
# The comma between the print arguments inserts a space

This returns:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

回答 21

with open(player_name, 'r') as myfile:
 data=myfile.readline()
 list=data.split(" ")
 word=list[0]

此代码将帮助您阅读第一行,然后使用list and split选项可以转换以空格分隔的第一行单词以存储在列表中。

比起您可以轻松访问任何单词,甚至将其存储在字符串中而言。

您也可以使用for循环执行相同的操作。

with open(player_name, 'r') as myfile:
    data = myfile.readline()
    words = data.split(" ")
    word = words[0]

This code reads the first line and then uses split to turn the space-separated words of that line into a list.

Then you can easily access any word, or even store it in a string.

You can also do the same thing with using a for loop.


回答 22

with open("myfile.txt", "r") as file:
    lines = file.readlines()
text = ''                                     # accumulator for the joined lines

for line in lines:
    text += line.rstrip('\n') + ' '

print(text)

回答 23

尝试以下方法:

with open('data.txt', 'r') as myfile:
    data = myfile.read()

    sentences = data.split('\\n')
    for sentence in sentences:
        print(sentence)

注意:它不会删除\n。仅用于查看文本,好像没有\n

Try the following:

with open('data.txt', 'r') as myfile:
    data = myfile.read()

    sentences = data.split('\\n')
    for sentence in sentences:
        print(sentence)

Caution: It does not remove the \n. It is just for viewing the text as if there were no \n


计算字符串中字符的出现次数

问题:计算字符串中字符的出现次数

计算字符串中字符出现次数的最简单方法是什么?

例如计算'a'出现在其中的次数'Mary had a little lamb'

What’s the simplest way to count the number of occurrences of a character in a string?

e.g. count the number of times 'a' appears in 'Mary had a little lamb'


回答 0

str.count(sub [,start [,end]])

返回sub范围中的子字符串不重叠的次数[start, end]。可选参数startend并按片表示法解释。

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4
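
A short sketch of the optional start and end arguments and of what "non-overlapping" means in practice. The variable names are illustrative.

```python
sentence = 'Mary had a little lamb'

total = sentence.count('a')
# Only look at the slice sentence[5:], i.e. 'had a little lamb'.
from_index_5 = sentence.count('a', 5)
# Only look at sentence[0:8], i.e. 'Mary had'.
first_eight = sentence.count('a', 0, 8)
# Matches do not overlap: 'aaaa' contains 'aa' twice, not three times.
non_overlapping = 'aaaa'.count('aa')

print(total, from_index_5, first_eight, non_overlapping)
```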

回答 1

您可以使用count()

>>> 'Mary had a little lamb'.count('a')
4

You can use count() :

>>> 'Mary had a little lamb'.count('a')
4

回答 2

正如其他答案所说,使用字符串方法count()可能是最简单的方法,但是如果您经常这样做,请查看collections.Counter

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print(counter['a'])

As other answers said, using the string method count() is probably the simplest, but if you’re doing this frequently, check out collections.Counter:

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print(counter['a'])

回答 3

正则表达式可能吗?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

Regular expressions maybe?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

回答 4

myString.count('a');

更多信息在这里

myString.count('a')

more info here


回答 5

Python-3.x:

"aabc".count("a")

str.count(sub [,start [,end]])

返回子字符串sub在[start,end]范围内不重叠的次数。可选参数start和end解释为切片表示法。

Python-3.x:

"aabc".count("a")

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.


回答 6

str.count(a)是计算字符串中单个字符的最佳解决方案。但是,如果您需要计算更多的字符,则必须读取整个字符串与要计算的字符一样多的次数。

这项工作的更好方法是:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

因此,您将拥有一个dict,它返回字符串中每个字母(0如果不存在)的出现次数。

>>>chars['a']
4
>>>chars['x']
0

对于不区分大小写的计数器,您可以通过子类化来覆盖mutator和accessor方法defaultdict(基类的方法是只读的):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0

str.count(a) is the best solution to count a single character in a string. But if you need to count more characters you would have to read the whole string as many times as characters you want to count.

A better approach for this job would be:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

So you’ll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn’t present.

>>>chars['a']
4
>>>chars['x']
0

For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class’ ones are read-only):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0
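
As an alternative sketch (not the approach used in this answer), a case-insensitive count can also be had without subclassing by lower-casing the text before handing it to collections.Counter.

```python
from collections import Counter

text = 'Mary had a little lamb'

# Lower-casing first folds 'M' and 'm' into the same key.
chars = Counter(text.lower())

print(chars['a'])
print(chars['m'])
# Counter returns 0 for missing keys without inserting them.
print(chars['x'])
```

The trade-off is that lookups must use lower-case keys, whereas the subclass above accepts either case.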

回答 7

这个简单而直接的功能可能会有所帮助:

def check_freq(x):
    freq = {}
    for c in x:
       freq[c] = x.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

This easy and straightforward function might help:

def check_freq(x):
    freq = {}
    for c in x:
        freq[c] = x.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

回答 8

如果要区分大小写(当然还有正则表达式的全部功能),则正则表达式非常有用。

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

请注意,正则表达式版本的运行时间大约是其十倍,这仅在my_string非常长或代码处于深循环内时才可能是一个问题。

Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.


回答 9

a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print(key, a.count(key))

回答 10

text = "count a character occurrence"

chars = list(text)
print(chars)
uniq = set(chars)
print(uniq)

for key in uniq:
    print(key, text.count(key))

回答 11

另一种方式来获得所有的字符数不使用Counter()count和正则表达式

counts_dict = {}
for c in list(sentence):
  if c not in counts_dict:
    counts_dict[c] = 0
  counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

An alternative way to get all the character counts without using Counter(), count or regex:

sentence = 'Mary had a little lamb'
counts_dict = {}
for c in sentence:
    if c not in counts_dict:
        counts_dict[c] = 0
    counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

回答 12

count绝对是计算字符串中字符出现次数的最简洁,最有效的方法,但是我尝试使用解决方案lambda,例如:

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这将导致:

4

同样,这样做还有一个好处,如果该句子是包含与上述相同字符的子字符串列表,则由于使用,这也会给出正确的结果in。看一看 :

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这也导致:

4

当然,这仅在检查单个字符的出现(例如'a'在这种特殊情况下)时才起作用。

count is definitely the most concise and efficient way of counting the occurrence of a character in a string but I tried to come up with a solution using lambda, something like this :

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This will result in :

4

One more advantage: if sentence is a list of substrings containing the same characters as above, this also gives the correct result because of the use of in, provided no substring contains 'a' more than once. Have a look:

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This also results in :

4

But of course this will work only when checking for occurrences of a single character, such as 'a' in this particular case.


回答 13

“不使用count来查找想要的字符串中的字符”方法。

import re

def count(s, ch):

   pass

def main():

   s = raw_input ("Enter strings what you like, for example, 'welcome': ")  

   ch = raw_input ("Enter you want count characters, but best result to find one character: " )

   print ( len (re.findall ( ch, s ) ) )

main()

A method for counting a character in a string without using count:

import re

def main():
    s = input("Enter any string you like, for example, 'welcome': ")
    ch = input("Enter the character to count (works best with a single character): ")
    # re.escape keeps characters such as '.' or '*' from being treated as regex syntax
    print(len(re.findall(re.escape(ch), s)))

main()

回答 14

我是熊猫图书馆的粉丝,尤其是value_counts()方法。您可以使用它来计算字符串中每个字符的出现:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

回答 15

spam = 'have a nice day'
var = 'd'


def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found
print('count %s is: %s ' % (var, count(spam, var)))

回答 16

Python 3

有两种方法可以实现此目的:

1)内置函数count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))

2)不使用功能

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

Python 3

There are two ways to achieve this:

1) With the built-in string method count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))

2) Without using a function

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

回答 17

仅此恕我直言-您可以添加上限或下限方法

def count_letter_in_str(string,letter):
    return string.count(letter)

No more than this IMHO – you can add the upper or lower methods

def count_letter_in_str(string,letter):
    return string.count(letter)

在Python 3中将字符串转换为字节的最佳方法?

问题:在Python 3中将字符串转换为字节的最佳方法?

TypeError的答案中可以看出,有两种不同的方法可以将字符串转换为字节:’str’不支持缓冲区接口

以下哪种方法更好或更Pythonic?还是仅仅是个人喜好问题?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: ‘str’ does not support the buffer interface

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

回答 0

如果您查看的文档bytes,它将指向bytearray

bytearray([源[,编码[,错误]]])

返回一个新的字节数组。字节数组类型是一个可变的整数序列,范围为0 <= x <256。它具有可变序列类型中介绍的大多数可变序列的常用方法,以及字节类型具有的大多数方法,请参见字节和。字节数组方法。

可选的source参数可以通过几种不同的方式用于初始化数组:

如果是字符串,则还必须提供编码(以及可选的错误)参数;然后,bytearray()使用str.encode()将字符串转换为字节。

如果它是整数,则数组将具有该大小,并将使用空字节初始化。

如果它是符合缓冲区接口的对象,则该对象的只读缓冲区将用于初始化bytes数组。

如果是可迭代的,则它必须是0 <= x <256范围内的整数的可迭代对象,这些整数用作数组的初始内容。

没有参数,将创建大小为0的数组。

所以 bytes除了编码字符串以外,还可以做更多的事情。这是Pythonic的用法,它允许您使用有意义的任何类型的源参数来调用构造函数。

对于编码字符串,我认为它some_string.encode(encoding)比使用构造函数更具Pythonic性,因为它是最易于记录的文档-“使用此字符串并使用此编码对其进行编码”比bytes(some_string, encoding) – -当您使用构造函数。

编辑:我检查了Python源。如果您将unicode字符串传递给bytes使用CPython,它将调用PyUnicode_AsEncodedString,它是encode; 的实现。因此,如果您自称,则只是跳过了间接级别encode

另外,请参见Serdalis的评论- unicode_string.encode(encoding)也是Python 风格的,因为它的反函数为byte_string.decode(encoding),对称性很好。

If you look at the docs for bytes, it points you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It’s Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self documenting — “take this string and encode it with this encoding” is clearer than bytes(some_string, encoding) — there is no explicit verb when you use the constructor.

Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you’re just skipping a level of indirection if you call encode yourself.

Also, see Serdalis’ comment — unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.
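
A quick sketch showing that the two spellings from the question give the same result, and that the bytes constructor accepts other kinds of source as well. The sample string is made up for illustration.

```python
some_string = 'naïve café'

via_method = some_string.encode('utf-8')
via_constructor = bytes(some_string, 'utf-8')

# Both spellings produce the same bytes object.
print(via_method == via_constructor)

# decode() is the inverse of encode().
print(via_method.decode('utf-8') == some_string)

# The constructor also accepts non-string sources, e.g. an iterable of ints.
print(bytes([72, 105]))
```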


回答 1

比想像的要容易:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

It's easier than you might think:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

回答 2

绝对最好的办法既不是2,但第3位。自Python 3.0以来,第一个参数默认默认值。因此,最好的方法是encode 'utf-8'

b = mystring.encode()

这也将更快,因为默认参数的结果不是"utf-8"C代码中的字符串,而是NULL,它的检查快得多!

以下是一些时间安排:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

尽管发出警告,但重复运行后时间仍然非常稳定-偏差仅为〜2%。


encode()不带参数使用不兼容Python 2,因为在Python 2中,默认字符编码为ASCII

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The absolutely best way is neither of the 2, but the 3rd. The first parameter to encode defaults to 'utf-8' ever since Python 3.0. Thus the best way is

b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but NULL, which is much faster to check!

Here be some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning the times were very stable after repeated runs – the deviation was just ~2 per cent.


Using encode() without an argument is not Python 2 compatible, as in Python 2 the default character encoding is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
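
For comparison, here is a small sketch of the same call on Python 3, where the argument-free form is expected to behave exactly like the explicit UTF-8 form. It assumes a Python 3 interpreter; the variable names are illustrative only.

```python
text = "\u00e4\u00f6\u00e4"  # the string 'äöä' written with escapes

# On Python 3 the encoding parameter defaults to UTF-8.
default_encoded = text.encode()
explicit_encoded = text.encode("utf-8")
identical = default_encoded == explicit_encoded
```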

字符串文字前的’b’字符做什么?

问题:字符串文字前的’b’字符做什么?

显然,以下是有效的语法:

my_string = b'The string'

我想知道:

  1. 这是什么b字在前面的字符串是什么意思?
  2. 使用它有什么作用?
  3. 在什么情况下可以使用它?

我在SO上找到了一个相关的问题,但是这个问题是关于PHP的,它指出b用来表示字符串是二进制的,与Unicode相反,Unicode是使PHP <6版本兼容的代码所必需的,当迁移到PHP 6时。我认为这不适用于Python。

我确实在Python站点上找到了有关使用相同语法的字符将字符串指定为Unicode的文档u。不幸的是,它在该文档的任何地方都没有提到b字符。

而且,只是出于好奇,有没有比多符号bu是做其他事情?

Apparently, the following is the valid syntax:

my_string = b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but that question is about PHP. It states that the b is used to indicate that the string is binary, as opposed to Unicode, which was needed for code written for PHP < 6 to stay compatible when migrating to PHP 6. I don’t think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn’t mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?


回答 0

引用Python 2.x文档

在Python 2中,前缀’b’或’B’被忽略;它表示文字应在Python 3中变成字节文字(例如,当代码自动由2to3转换时)。前缀“ u”或“ b”后可以带有前缀“ r”。

Python 3中的文件状态:

字节字面量始终以“ b”或“ B”为前缀;它们产生字节类型的实例而不是str类型。它们只能包含ASCII字符;数值等于或大于128的字节必须用转义符表示。

To quote the Python 2.x documentation:

A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
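
The ASCII-only rule quoted above can be checked with a short sketch like the following (Python 3 assumed). The offending literal is compiled from a string so that the example file itself stays valid.

```python
# ASCII characters may appear directly in a bytes literal.
ascii_bytes = b"abc"

# Byte values of 128 or above have to be written as escapes.
euro_utf8 = b"\xe2\x82\xac"

# A non-ASCII character inside a bytes literal is rejected by the compiler.
try:
    compile("b'\u20ac'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
```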


回答 1

Python 3.x明确区分了以下类型:

  • str= '...'文字= Unicode字符序列(UTF-16或UTF-32,取决于Python的编译方式)
  • bytes= b'...'文字=八位字节序列(0到255之间的整数)

如果你熟悉Java或C#,想到strStringbytes作为byte[]。如果您熟悉SQL,请认为stras NVARCHARbytesas BINARYBLOB。如果你熟悉Windows注册表,想到strREG_SZbytes作为REG_BINARY。如果您熟悉C(++),请忘记学习的所有知识char和字符串,因为CHARACTER不是BYTE。这个想法早已过时。

您可以使用str,当你想要表达的文字。

print('שלום עולם')

您可以使用bytes,当你想表示相同结构的低级别的二进制数据。

NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]

您可以编码一个str到一个bytes对象。

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

您可以将a解码bytesstr

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

但是您不能随意混合使用这两种类型。

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

这种b'...'表示法有些令人困惑,因为它允许使用ASCII字符而不是十六进制数字指定字节0x01-0x7F。

>>> b'A' == b'\x41'
True

但是我必须强调,字符不是字节

>>> 'A' == b'A'
False

在Python 2.x中

Python 3.0之前的版本在文本和二进制数据之间缺乏这种区别。相反,有:

  • unicode= u'...'文字= Unicode字符序列= 3.xstr
  • str= '...'文字=混杂字节/字符的序列
    • 通常为文本,以某种未指定的编码进行编码。
    • 而且还用来表示二进制数据,如struct.pack输出。

为了简化从2.x到3.x的过渡,b'...'将原义语法反向移植到Python 2.6,以便区分二进制字符串(应bytes在3.x中)和文本字符串(应str在3中) 。X)。该b前缀在2.x中不执行任何操作,但告诉2to3脚本不要在3.x中将其转换为Unicode字符串。

因此,是的,b'...'Python中的文字具有与PHP中相同的目的。

另外,出于好奇,还有比b和u更多的符号可以执行其他操作吗?

r前缀创建原始字符串(例如,r'\t'是反斜杠+ t,而不是一个选项卡),和三引号'''...'''"""..."""允许多行字符串文字。

Python 3.x makes a clear distinction between the types:

  • str = '...' literals = a sequence of Unicode characters (stored internally as UTF-16 or UTF-32 depending on how Python was compiled, up to Python 3.2; from Python 3.3 on, the internal width is chosen per string)
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)

If you’re familiar with Java or C#, think of str as String and bytes as byte[]. If you’re familiar with SQL, think of str as NVARCHAR and bytes as BINARY or BLOB. If you’re familiar with the Windows registry, think of str as REG_SZ and bytes as REG_BINARY. If you’re familiar with C(++), then forget everything you’ve learned about char and strings, because A CHARACTER IS NOT A BYTE. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

But you can’t freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
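
A quick sketch of these prefixes and quoting styles, assuming Python 3 (the names are only for illustration):

```python
raw = r"\t"            # two characters: a backslash and the letter t
cooked = "\t"          # one character: a tab
multi = """line one
line two"""            # triple quotes allow a literal newline
raw_bytes = br"\x00"   # b and r combined: four bytes, no escape processing
```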


回答 2

b表示字节字符串。

字节是实际数据。字符串是一种抽象。

如果您有多个字符的字符串对象并且使用了一个字符,则该字符串将是一个字符串,并且根据编码的不同,大小可能会超过1个字节。

如果使用1个字节和一个字节字符串,则您将获得0-255之间的单个8位值,并且如果由于编码而导致的那些字符大于1个字节,则它可能不表示完整的字符。

TBH我将使用字符串,除非我有一些特定的低级原因要使用字节。

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and you took a single character, it would still be a string, and it might be more than 1 byte in size depending on the encoding.

If you took 1 byte from a byte string, you’d get a single 8-bit value from 0-255, and it might not represent a complete character if the encoding uses more than 1 byte for that character.

TBH I’d use strings unless I had some specific low level reason to use bytes.
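
The difference between taking one character and taking one byte could be sketched like this, assuming Python 3 and UTF-8 as the encoding:

```python
text = "\u20acuro"             # the string '€uro'
data = text.encode("utf-8")

first_char = text[0]           # a one-character str
first_byte = data[0]           # an int in range(256) on Python 3
first_slice = data[:1]         # a bytes object of length 1

# One byte on its own is not a complete UTF-8 character here.
try:
    first_slice.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```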


回答 3

从服务器端,如果我们发送任何响应,它将以字节类型的形式发送,因此它将在客户端中显示为 b'Response from server'

为了摆脱,b'....'只需使用以下代码:

服务器文件:

stri="Response from server"    
c.send(stri.encode())

客户端文件:

print(s.recv(1024).decode())

然后它将打印 Response from server

From server side, if we send any response, it will be sent in the form of byte type, so it will appear in the client as b'Response from server'

To get rid of the b'....', simply use the code below:

Server file:

stri="Response from server"    
c.send(stri.encode())

Client file:

print(s.recv(1024).decode())

then it will print Response from server
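
A self-contained way to try this without writing separate server and client files is sketched below. It uses socket.socketpair() as a stand-in for the real connection, which is an assumption made for the illustration and not part of the answer above.

```python
import socket

# Two connected sockets play the roles of server and client.
server_side, client_side = socket.socketpair()
try:
    server_side.sendall("Response from server".encode())
    raw = client_side.recv(1024)   # bytes: shown as b'Response from server'
    message = raw.decode()         # str: shown as Response from server
finally:
    server_side.close()
    client_side.close()
```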


回答 4

这是一个示例,其中缺少bTypeError在Python 3.x中引发异常

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

添加b前缀将解决此问题。

Here’s an example where the absence of b would throw a TypeError exception in Python 3.x

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

Adding a b prefix would fix the problem.
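
The same behaviour can be reproduced without touching the disk. In this sketch io.BytesIO stands in for a file opened with "wb"; that substitution is an assumption made to keep the example self-contained.

```python
import io

binary_file = io.BytesIO()   # behaves like a file opened in binary mode

try:
    binary_file.write("Hello Python!")   # a str is rejected
    str_accepted = True
except TypeError:
    str_accepted = False

binary_file.write(b"Hello Python!")                  # a bytes literal works
binary_file.write("Hello Python!".encode("utf-8"))   # so does encoding first
contents = binary_file.getvalue()
```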


回答 5

它将其转换为bytes文字(或str在2.x中),并且对于2.6+有效。

r前缀导致反斜杠需要“不解释”(不被忽略,差异确实物质)。

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be “uninterpreted” (not ignored, and the difference does matter).


回答 6

除了其他人所说的以外,请注意unicode中的单个字符可以由多个字节组成

unicode的工作方式是采用旧的ASCII格式(7位代码,看起来像0xxx xxxx),并添加了多字节序列,其中所有字节均以1(1xxx xxxx)开头,以表示ASCII以外的字符,以便Unicode 向后-与ASCII 兼容

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3

In addition to what others have said, note that a single character in Unicode can consist of multiple bytes once it is encoded.

UTF-8, the most common Unicode encoding, took the old ASCII format (7-bit codes that look like 0xxx xxxx) and added multi-byte sequences in which every byte starts with 1 (1xxx xxxx) to represent characters beyond ASCII, so that it is backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
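
To see how the byte count grows for characters further from ASCII, something like the following can be run on Python 3. The sample characters are chosen only for illustration.

```python
samples = ["A", "\u00d6", "\u20ac", "\U0001F600"]   # A, Ö, €, and an emoji
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Every byte of a multi-byte UTF-8 sequence has its high bit set.
high_bits = [byte >= 0x80 for byte in "\u00d6".encode("utf-8")]
```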

回答 7

您可以使用JSON将其转换为字典

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}


烧瓶:

这是烧瓶的一个例子。在终端行上运行此命令:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

在flask / routes.py中

@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data) # --> b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)

{'key': 'value'}

You can use JSON to convert it to a dictionary

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}


FLASK:

This is an example using Flask. Run this in a Python session:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

In flask/routes.py

@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data) # --> b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)

{'key': 'value'}
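
As a small sketch of what happens with the b'...' payload: json.loads accepts bytes directly on Python 3.6 and later, and decoding first gives the same result on older 3.x versions. The variable names here are illustrative.

```python
import json

data = b'{"key": "value"}'

from_bytes = json.loads(data)                  # Python 3.6+ accepts bytes
from_text = json.loads(data.decode("utf-8"))   # explicit decode also works
```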


将字典的字符串表示形式转换为字典?

问题:将字典的字符串表示形式转换为字典?

如何将a的str表示形式(dict例如以下字符串)转换为a dict

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

我宁愿不使用eval。我还能使用什么?

这样做的主要原因是他写的我的同事类之一,将所有输入都转换为字符串。我不打算去修改他的类,以解决这个问题。

How can I convert the str representation of a dict, such as the following string, into a dict?

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

I prefer not to use eval. What else can I use?

The main reason for this is that one of my coworker’s classes converts all input into strings. I’m not in the mood to go and modify his classes to deal with this issue.


回答 0

从Python 2.6开始,您可以使用内置的ast.literal_eval

>>> import ast
>>> ast.literal_eval("{'muffin' : 'lolz', 'foo' : 'kitty'}")
{'muffin': 'lolz', 'foo': 'kitty'}

这比使用更为安全eval。正如其文档所说:

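>>> help(ast.literal_eval)
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.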

例如:

>>> eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 208, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 206, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: 'mongo'
>>> ast.literal_eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 68, in literal_eval
    return _convert(node_or_string)
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 67, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Starting in Python 2.6, you can use ast.literal_eval from the standard library:

>>> import ast
>>> ast.literal_eval("{'muffin' : 'lolz', 'foo' : 'kitty'}")
{'muffin': 'lolz', 'foo': 'kitty'}

This is safer than using eval. As its own docs say:

>>> help(ast.literal_eval)
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.

For example:

>>> eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 208, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 206, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: 'mongo'
>>> ast.literal_eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 68, in literal_eval
    return _convert(node_or_string)
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 67, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
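
In practice the refusal can be caught and handled. The following is a minimal sketch on Python 3; parse_literal is a hypothetical helper name, not something from the ast module.

```python
import ast

def parse_literal(text):
    """Return the parsed literal, or None if text is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

parsed = parse_literal("{'muffin' : 'lolz', 'foo' : 'kitty'}")
refused = parse_literal("shutil.rmtree('mongo')")   # a call, so it is rejected
```

Nothing in the rejected string is ever executed; the function call is turned down while the expression is being checked.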

回答 1

https://docs.python.org/3.8/library/json.html

JSON可以解决此问题,尽管其解码器希望在键和值周围使用双引号。如果您不介意更换骇客…

import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}

请注意,如果将单引号作为键或值的一部分,则由于字符替换不当而导致此操作失败。仅当您对评估解决方案强烈反对时,才建议使用此解决方案。

有关JSON单引号的更多信息:jQuery.parseJSON由于JSON中的单引号已转义而引发“无效JSON”错误

https://docs.python.org/3.8/library/json.html

JSON can solve this problem though its decoder wants double quotes around keys and values. If you don’t mind a replace hack…

import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}

NOTE that if you have single quotes as a part of your keys or values this will fail due to improper character replacement. This solution is only recommended if you have a strong aversion to the eval solution.

More about json single quote: jQuery.parseJSON throws “Invalid JSON” error due to escaped single quote in JSON
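
The failure mode described in the note can be reproduced with a value that contains an apostrophe. This sketch assumes Python 3; the sample data is invented for the illustration.

```python
import ast
import json

s = "{'owner': \"O'Brien\"}"   # repr-style dict whose value has an apostrophe

# The replace hack turns the apostrophe into a stray double quote.
try:
    json.loads(s.replace("'", '"'))
    hack_worked = True
except json.JSONDecodeError:
    hack_worked = False

# ast.literal_eval parses the original string without any rewriting.
via_ast = ast.literal_eval(s)
```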


回答 2

使用json.loads

>>> import json
>>> h = '{"foo":"bar", "foo2":"bar2"}'
>>> d = json.loads(h)
>>> d
{u'foo': u'bar', u'foo2': u'bar2'}
>>> type(d)
<type 'dict'>

using json.loads:

>>> import json
>>> h = '{"foo":"bar", "foo2":"bar2"}'
>>> d = json.loads(h)
>>> d
{u'foo': u'bar', u'foo2': u'bar2'}
>>> type(d)
<type 'dict'>

回答 3

以OP为例:

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

我们可以使用Yaml处理字符串中的这种非标准json:

>>> import yaml
>>> s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> s
"{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> yaml.safe_load(s)  # recent PyYAML requires a Loader for yaml.load
{'muffin': 'lolz', 'foo': 'kitty'}

To OP’s example:

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

We can use Yaml to deal with this kind of non-standard json in string:

>>> import yaml
>>> s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> s
"{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> yaml.safe_load(s)  # recent PyYAML requires a Loader for yaml.load
{'muffin': 'lolz', 'foo': 'kitty'}

回答 4

如果始终可以信任该字符串,则可以使用eval(或literal_eval按建议使用;无论该字符串是什么都是安全的。)否则,您需要一个解析器。如果JSON解析器(例如simplejson)仅存储符合JSON方案的内容,则该解析器将起作用。

If the string can always be trusted, you could use eval (or use literal_eval as suggested; it’s safe no matter what the string is.) Otherwise you need a parser. A JSON parser (such as simplejson) would work if he only ever stores content that fits with the JSON scheme.


回答 5

使用json。该ast库消耗大量内存,并且速度较慢。我有一个过程需要读取156Mb的文本文件。Ast转换字典需要5分钟的延迟,json而使用内存减少60%则需要1分钟!

Use json. The ast library consumes a lot of memory and is slower. I have a process that needs to read a 156 MB text file. With ast, converting it to a dictionary took 5 minutes; with json it took 1 minute and used 60% less memory!


回答 6

总结一下:

import ast, yaml, json, timeit

descs=['short string','long string']
strings=['{"809001":2,"848545":2,"565828":1}','{"2979":1,"30581":1,"7296":1,"127256":1,"18803":2,"41619":1,"41312":1,"16837":1,"7253":1,"70075":1,"3453":1,"4126":1,"23599":1,"11465":3,"19172":1,"4019":1,"4775":1,"64225":1,"3235":2,"15593":1,"7528":1,"176840":1,"40022":1,"152854":1,"9878":1,"16156":1,"6512":1,"4138":1,"11090":1,"12259":1,"4934":1,"65581":1,"9747":2,"18290":1,"107981":1,"459762":1,"23177":1,"23246":1,"3591":1,"3671":1,"5767":1,"3930":1,"89507":2,"19293":1,"92797":1,"32444":2,"70089":1,"46549":1,"30988":1,"4613":1,"14042":1,"26298":1,"222972":1,"2982":1,"3932":1,"11134":1,"3084":1,"6516":1,"486617":1,"14475":2,"2127":1,"51359":1,"2662":1,"4121":1,"53848":2,"552967":1,"204081":1,"5675":2,"32433":1,"92448":1}']
funcs=[json.loads,eval,ast.literal_eval,yaml.load]

for  desc,string in zip(descs,strings):
    print('***',desc,'***')
    print('')
    for  func in funcs:
        print(func.__module__+' '+func.__name__+':')
        %timeit func(string)        
    print('')

结果:

*** short string ***

json loads:
4.47 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
builtins eval:
24.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ast literal_eval:
30.4 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
yaml load:
504 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

*** long string ***

json loads:
29.6 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
builtins eval:
219 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ast literal_eval:
331 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yaml load:
9.02 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

结论:更喜欢json.loads

To summarize:

import ast, yaml, json, timeit

descs=['short string','long string']
strings=['{"809001":2,"848545":2,"565828":1}','{"2979":1,"30581":1,"7296":1,"127256":1,"18803":2,"41619":1,"41312":1,"16837":1,"7253":1,"70075":1,"3453":1,"4126":1,"23599":1,"11465":3,"19172":1,"4019":1,"4775":1,"64225":1,"3235":2,"15593":1,"7528":1,"176840":1,"40022":1,"152854":1,"9878":1,"16156":1,"6512":1,"4138":1,"11090":1,"12259":1,"4934":1,"65581":1,"9747":2,"18290":1,"107981":1,"459762":1,"23177":1,"23246":1,"3591":1,"3671":1,"5767":1,"3930":1,"89507":2,"19293":1,"92797":1,"32444":2,"70089":1,"46549":1,"30988":1,"4613":1,"14042":1,"26298":1,"222972":1,"2982":1,"3932":1,"11134":1,"3084":1,"6516":1,"486617":1,"14475":2,"2127":1,"51359":1,"2662":1,"4121":1,"53848":2,"552967":1,"204081":1,"5675":2,"32433":1,"92448":1}']
funcs=[json.loads,eval,ast.literal_eval,yaml.load]

for  desc,string in zip(descs,strings):
    print('***',desc,'***')
    print('')
    for  func in funcs:
        print(func.__module__+' '+func.__name__+':')
        %timeit func(string)        
    print('')

Results:

*** short string ***

json loads:
4.47 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
builtins eval:
24.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ast literal_eval:
30.4 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
yaml load:
504 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

*** long string ***

json loads:
29.6 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
builtins eval:
219 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ast literal_eval:
331 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yaml load:
9.02 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion: prefer json.loads


回答 7

string = "{'server1':'value','server2':'value'}"

# Remove the surrounding { and }
s = string.replace("{", "")
finalstring = s.replace("}", "")

# Split on "," to get the key/value pairs
pairs = finalstring.split(",")

dictionary = {}
for pair in pairs:
    # Split each pair into its key and its value
    keyvalue = pair.split(":")

    # Strip the surrounding quotes from the key and the value
    key = keyvalue[0].strip('\'')
    key = key.replace("\"", "")
    dictionary[key] = keyvalue[1].strip('"\'')

print(dictionary)

回答 8

没有使用任何库:

dict_format_string = "{'1':'one', '2' : 'two'}"
d = {}
elems  = filter(str.isalnum,dict_format_string.split("'"))
values = elems[1::2]
keys   = elems[0::2]
d.update(zip(keys,values))

注意:由于已进行硬编码,split("'")因此仅适用于“单引号”数据的字符串。

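Without using any libraries:

dict_format_string = "{'1':'one', '2' : 'two'}"
d = {}
# list() is needed on Python 3, where filter() returns an iterator
elems  = list(filter(str.isalnum, dict_format_string.split("'")))
values = elems[1::2]
keys   = elems[0::2]
d.update(zip(keys, values))

NOTE: Because split("'") is hardcoded, this works only for strings where the data is single-quoted, and because of str.isalnum only when keys and values are purely alphanumeric.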


如何检查变量的类型是否为字符串?

问题:如何检查变量的类型是否为字符串?

有没有一种方法可以检查python中变量的类型是否为string,例如:

isinstance(x,int);

对于整数值?

Is there a way to check if the type of a variable in python is a string, like:

isinstance(x,int);

for integer values?


回答 0

在Python 2.x中,您可以

isinstance(s, basestring)

basestring抽象的超类strunicode。它可用于测试对象是否是str或的实例unicode


在Python 3.x中,正确的测试是

isinstance(s, str)

bytes在Python 3中,该类不被视为字符串类型。

In Python 2.x, you would do

isinstance(s, basestring)

basestring is the abstract superclass of str and unicode. It can be used to test whether an object is an instance of str or unicode.


In Python 3.x, the correct test is

isinstance(s, str)

The bytes class isn’t considered a string type in Python 3.
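
A short sketch of the Python 3 test, showing that bytes does not pass it (the dictionary keys are just labels for the illustration):

```python
checks = {
    "text": isinstance("abc", str),     # a real string
    "bytes": isinstance(b"abc", str),   # bytes is a separate type
    "int": isinstance(42, str),
}
```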


回答 1

我知道这是一个古老的话题,但是作为第一个显示在google上的话题,鉴于我没有找到满意的答案,因此我将其留在此处以供将来参考:

第六个是Python 2和3兼容性库,它已经解决了这个问题。然后,您可以执行以下操作:

import six

if isinstance(value, six.string_types):
    pass # It's a string !!

检查代码,您会发现:

import sys

PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str,
else:
    string_types = basestring,

I know this is an old topic, but being the first one shown on google and given that I don’t find any of the answers satisfactory, I’ll leave this here for future reference:

six is a Python 2 and 3 compatibility library which already covers this issue. You can then do something like this:

import six

if isinstance(value, six.string_types):
    pass # It's a string !!

Inspecting the code, this is what you find:

import sys

PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str,
else:
    string_types = basestring,
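
The trailing commas above make string_types a tuple, which isinstance accepts directly. Here is a sketch of the same idea without installing six; the local string_types below is a stand-in for six.string_types on Python 3, not the library itself.

```python
# Stand-in for six.string_types on Python 3: a one-element tuple.
string_types = (str,)
text_or_bytes = (str, bytes)

is_text = isinstance("abc", string_types)
bytes_is_text = isinstance(b"abc", string_types)
bytes_is_either = isinstance(b"abc", text_or_bytes)
```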

回答 2

在Python 3.x或Python 2.7.6中

if type(x) == str:

In Python 3.x or Python 2.7.6

if type(x) == str:

回答 3

你可以做:

var = 1
if type(var) == int:
   print('your variable is an integer')

要么:

var2 = 'this is variable #2'
if type(var2) == str:
    print('your variable is a string')
else:
    print('your variable IS NOT a string')

希望这可以帮助!

you can do:

var = 1
if type(var) == int:
   print('your variable is an integer')

or:

var2 = 'this is variable #2'
if type(var2) == str:
    print('your variable is a string')
else:
    print('your variable IS NOT a string')

hope this helps!


回答 4

如果要检查的内容多于整数和字符串,则类型模块也存在。 http://docs.python.org/library/types.html

The types module also exists if you are checking for more than ints and strings. http://docs.python.org/library/types.html


回答 5

根据以下更好的答案进行编辑。记下3个答案,找出基弦的凉爽。

旧答案:当心unicode字符串,您可以从多个地方获得unicode字符串,包括Windows中的所有COM调用。

if isinstance(target, str) or isinstance(target, unicode):

Edit based on better answer below. Go down about 3 answers and find out about the coolness of basestring.

Old answer: Watch out for unicode strings, which you can get from several places, including all COM calls in Windows.

if isinstance(target, str) or isinstance(target, unicode):

回答 6

由于basestring未在Python3中定义,因此此小技巧可能有助于使代码兼容:

try: # check whether python knows about 'basestring'
   basestring
except NameError: # no, it doesn't (it's Python3); use 'str' instead
   basestring=str

之后,您可以在Python2和Python3上运行以下测试

isinstance(myvar, basestring)

since basestring isn’t defined in Python3, this little trick might help to make the code compatible:

try: # check whether python knows about 'basestring'
   basestring
except NameError: # no, it doesn't (it's Python3); use 'str' instead
   basestring=str

after that you can run the following test on both Python2 and Python3

isinstance(myvar, basestring)
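
A variant of the same trick that avoids rebinding the name basestring is sketched below. string_base and is_string are names invented for this example.

```python
try:
    string_base = basestring   # Python 2: common base of str and unicode
except NameError:
    string_base = str          # Python 3: str is the only text type

def is_string(value):
    """Return True if value is a text string on either major version."""
    return isinstance(value, string_base)
```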

回答 7

Python 2/3包括unicode

from __future__ import unicode_literals
from builtins import str  #  pip install future
isinstance('asdf', str)   #  True
isinstance(u'asdf', str)  #  True

http://python-future.org/overview.html

Python 2 / 3 including unicode

from __future__ import unicode_literals
from builtins import str  #  pip install future
isinstance('asdf', str)   #  True
isinstance(u'asdf', str)  #  True

http://python-future.org/overview.html


回答 8

我还要注意,如果要检查变量的类型是否为特定类型,可以将变量的类型与已知对象的类型进行比较。

对于字符串,您可以使用此

type(s) == type('')

Also, I want to note that if you want to check whether the type of a variable is a specific kind, you can compare the type of the variable to the type of a known object.

For string you can use this

type(s) == type('')

回答 9

其他人在这里提供了很多好的建议,但是我看不到一个很好的跨平台摘要。对于任何Python程序来说,以下内容都是不错的选择:

def isstring(s):
    # if we use Python 3
    if (sys.version_info[0] >= 3):
        return isinstance(s, str)
    # we use Python 2
    return isinstance(s, basestring)

在此函数中,我们用于isinstance(object, classinfo)查看输入是str在Python 3中还是basestring在Python 2中。

Lots of good suggestions provided by others here, but I don’t see a good cross-platform summary. The following should be a good drop in for any Python program:

import sys

def isstring(s):
    # if we use Python 3
    if sys.version_info[0] >= 3:
        return isinstance(s, str)
    # we use Python 2
    return isinstance(s, basestring)

In this function, we use isinstance(object, classinfo) to see if our input is a str in Python 3 or a basestring in Python 2.


回答 10

不使用basestring的Python 2替代方法:

isinstance(s, (str, unicode))

但由于unicode未定义(在Python 3中),因此在Python 3中仍然无法使用。

Alternative way for Python 2, without using basestring:

isinstance(s, (str, unicode))

But still won’t work in Python 3 since unicode isn’t defined (in Python 3).


回答 11

所以,

您可以使用很多选项来检查变量是否为字符串:

a = "my string"
type(a) == str # first 
a.__class__ == str # second
isinstance(a, str) # third
str(a) == a # forth
type(a) == type('') # fifth

此命令是有目的的。

So,

You have plenty of options to check whether your variable is string or not:

a = "my string"
type(a) == str # first 
a.__class__ == str # second
isinstance(a, str) # third
str(a) == a # fourth
type(a) == type('') # fifth

This order is intentional.


回答 12

a = '1000' # also tested for 'abc100', 'a100bc', '100abc'

isinstance(a, str) or isinstance(a, unicode)

返回True

type(a) in [str, unicode]

返回True

a = '1000' # also tested for 'abc100', 'a100bc', '100abc'

isinstance(a, str) or isinstance(a, unicode)

returns True

type(a) in [str, unicode]

returns True


回答 13

这是我对同时支持Python 2和Python 3以及这些要求的回答:

  • 用最少的Py2兼容代码以Py3代码编写。
  • 稍后删除Py2兼容代码而不会受到干扰。即仅旨在删除,不修改Py3代码。
  • 避免使用 six或类似的compat模块,因为它们倾向于隐藏试图实现的目标。
  • 面向未来的潜在Py4。

import sys
PY2 = sys.version_info.major == 2

# Check if string (lenient for byte-strings on Py2):
isinstance('abc', basestring if PY2 else str)

# Check if strictly a string (unicode-string):
isinstance('abc', unicode if PY2 else str)

# Check if either string (unicode-string) or byte-string:
isinstance('abc', basestring if PY2 else (str, bytes))

# Check for byte-string (Py3 and Py2.7):
isinstance('abc', bytes)

Here is my answer to support both Python 2 and Python 3 along with these requirements:

  • Written in Py3 code with minimal Py2 compat code.
  • Remove Py2 compat code later without disruption. I.e. aim for deletion only, no modification to Py3 code.
  • Avoid using six or similar compat module as they tend to hide away what is trying to be achieved.
  • Future-proof for a potential Py4.

import sys
PY2 = sys.version_info.major == 2

# Check if string (lenient for byte-strings on Py2):
isinstance('abc', basestring if PY2 else str)

# Check if strictly a string (unicode-string):
isinstance('abc', unicode if PY2 else str)

# Check if either string (unicode-string) or byte-string:
isinstance('abc', basestring if PY2 else (str, bytes))

# Check for byte-string (Py3 and Py2.7):
isinstance('abc', bytes)

回答 14

如果您不想依赖外部库,那么这对于Python 2.7+和Python 3(http://ideone.com/uB4Kdc)都适用:

# your code goes here
s = ["test"];
#s = "test";
isString = False;

if(isinstance(s, str)):
    isString = True;
try:
    if(isinstance(s, basestring)):
        isString = True;
except NameError:
    pass;

if(isString):
    print("String");
else:
    print("Not String");

If you do not want to depend on external libs, this works both for Python 2.7+ and Python 3 (http://ideone.com/uB4Kdc):

s = ["test"]
# s = "test"
is_string = False

if isinstance(s, str):
    is_string = True
try:
    if isinstance(s, basestring):
        is_string = True
except NameError:
    pass

if is_string:
    print("String")
else:
    print("Not String")

回答 15

您可以简单地使用isinstance函数来确保输入数据的格式为stringunicode。以下示例将帮助您轻松理解。

>>> isinstance('my string', str)
True
>>> isinstance(12, str)
False
>>> isinstance('my string', unicode)
False
>>> isinstance(u'my string',  unicode)
True

You can simply use the isinstance function to check that the input data is of type str or unicode. The examples below (run on Python 2) should make this easy to understand.

>>> isinstance('my string', str)
True
>>> isinstance(12, str)
False
>>> isinstance('my string', unicode)
False
>>> isinstance(u'my string',  unicode)
True

回答 16

s = '123'
issubclass(s.__class__, str)

回答 17

这是我的方法:

if type(x) == type(str()):

This is how I do it:

if type(x) == type(str()):

回答 18

我见过:

hasattr(s, 'endswith') 

I’ve seen:

hasattr(s, 'endswith') 

回答 19

>>> thing = 'foo'
>>> type(thing).__name__ == 'str' or type(thing).__name__ == 'unicode'
True

在Python中将十六进制字符串转换为int

问题:在Python中将十六进制字符串转换为int

如何在Python中将十六进制字符串转换为int?

我可能将其命名为“ 0xffff”或“ ffff”。

How do I convert a hex string to an int in Python?

I may have it as “0xffff” or just “ffff“.


Answer 0

Without the 0x prefix, you need to specify the base explicitly, otherwise there’s no way to tell:

x = int("deadbeef", 16)

With the 0x prefix, Python can distinguish hex and decimal automatically.

>>> print(int("0xdeadbeef", 0))
3735928559
>>> print(int("10", 0))
10

(You must specify 0 as the base in order to invoke this prefix-guessing behavior; omitting the second parameter means to assume base-10.)
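As a small illustration of the prefix-guessing behavior, the sketch below runs a few prefixed literals through base 0. The sample strings are made up for this example; only the 0xdeadbeef value comes from the answer above.

```python
# int(text, 0) picks the base from the literal prefix, as Python source would.
samples = ["0xdeadbeef", "0o17", "0b101", "10"]
values = [int(text, 0) for text in samples]
print(values)  # [3735928559, 15, 5, 10]

# Without a prefix, bare hex digits are not a valid base-0 literal.
try:
    int("deadbeef", 0)
except ValueError as exc:
    print("rejected:", exc)
```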


Answer 1

int(hexString, 16) does the trick, and works with and without the 0x prefix:

>>> int("a", 16)
10
>>> int("0xa",16)
10

Answer 2

For any given string s:

int(s, 16)

Answer 3

Convert hex string to int in Python

I may have it as "0xffff" or just "ffff".

To convert a string to an int, pass the string to int along with the base you are converting from.

Both strings will suffice for conversion in this way:

>>> string_1 = "0xffff"
>>> string_2 = "ffff"
>>> int(string_1, 16)
65535
>>> int(string_2, 16)
65535

Letting int infer

If you pass 0 as the base, int will infer the base from the prefix in the string.

>>> int(string_1, 0)
65535

Without the hexadecimal prefix, 0x, int does not have enough information with which to guess:

>>> int(string_2, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 0: 'ffff'

Literals

If you’re typing into source code or an interpreter, Python will make the conversion for you:

>>> integer = 0xffff
>>> integer
65535

This won’t work with ffff because Python will think you’re trying to write a legitimate Python name instead:

>>> integer = ffff
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ffff' is not defined

Python numbers start with a numeric character, while Python names cannot start with a numeric character.


Answer 4

Adding to Dan’s answer above: if you supply the int() function with a hex string, you will have to specify the base as 16 or it will not think you gave it a valid value. Specifying base 16 is unnecessary for hex numbers not contained in strings.

print(int(0xdeadbeef))  # valid

myHex = "0xdeadbeef"
print(int(myHex))  # invalid, raises ValueError
print(int(myHex, 16))  # valid
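If the input may not be valid hex at all, the ValueError can be caught. The following is a minimal sketch under that assumption; hex_to_int is a hypothetical helper name, and returning None for bad input is just one possible design choice.

```python
def hex_to_int(text):
    """Return the int for a hex string, or None if it is not valid hex.

    hex_to_int is a hypothetical helper used only for illustration.
    """
    try:
        return int(text, 16)  # base 16 accepts both "ff" and "0xff"
    except ValueError:
        return None


print(hex_to_int("0xdeadbeef"))  # 3735928559
print(hex_to_int("deadbeef"))    # 3735928559
print(hex_to_int("xyz"))         # None
```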

Answer 5

The worst way:

>>> def hex_to_int(x):
...     return eval("0x" + x)
...

>>> hex_to_int("c0ffee")
12648430

Please don’t do this!

Is using eval in Python a bad practice?


Answer 6

Or ast.literal_eval (this is safe, unlike eval):

ast.literal_eval("0xffff")

Demo:

>>> import ast
>>> ast.literal_eval("0xffff")
65535
>>> 
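Note that ast.literal_eval only understands Python literals, so a bare "ffff" (a name, not a literal) is rejected. A rough sketch of one way to accept both spellings follows; the helper name literal_hex_to_int and the prefix handling are assumptions made for this example, not something the answer prescribes.

```python
import ast


def literal_hex_to_int(text):
    """Parse a hex string with ast.literal_eval, adding 0x if it is missing."""
    text = text.strip()
    if not text.lower().startswith("0x"):
        text = "0x" + text
    # Invalid digits make literal_eval raise SyntaxError or ValueError.
    value = ast.literal_eval(text)
    if not isinstance(value, int):
        raise ValueError("not an integer literal: %r" % text)
    return value


print(literal_hex_to_int("0xffff"))  # 65535
print(literal_hex_to_int("ffff"))    # 65535
```

For plain hex strings, int(s, 16) is simpler; this form mainly shows what literal_eval does and does not accept.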

Answer 7

The '%x' % formatting operator also seems to work in assignment statements for me (assuming Python 3.0 and later).

Example

a = int('0x100', 16)
print(a)   #256
print('%x' % a) #100
b = a
print(b) #256
c = '%x' % a
print(c) #100
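For comparison, here is a short sketch of the reverse direction using three standard formatting routes, and a check that each result converts back to the same integer. The variable names are illustrative only.

```python
a = int('0x100', 16)         # 256

as_percent = '%x' % a        # '100'
as_format = format(a, 'x')   # '100'
as_hex = hex(a)              # '0x100', includes the prefix

# All three strings convert back to the same integer with base 16.
round_trip = [int(s, 16) for s in (as_percent, as_format, as_hex)]
print(round_trip)  # [256, 256, 256]
```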

Answer 8

If you are using the Python interpreter, you can just type 0x followed by your hex value and the interpreter will convert it for you automatically.

>>> 0xffff
65535

Answer 9

Handles hex, octal, binary, int, and float

Using the standard prefixes (i.e. 0x, 0b, 0, and 0o), this function converts any suitable string to a number. I answered this here: https://stackoverflow.com/a/58997070/2464381 but here is the needed function.

def to_number(n):
    """Convert a string holding a number to an int or a float.

    This covers float, decimal, hex, binary, and octal numbers.
    """
    text = str(n)
    try:
        return int(text, 0)
    except ValueError:
        try:
            # Python 3 doesn't accept "010" as a valid octal. You must use
            # the '0o' prefix, so retry with that prefix added.
            return int('0o' + text, 0)
        except ValueError:
            return float(text)

Answer 10

In Python 2.7, int('deadbeef',10) doesn't work, because 'deadbeef' is not a valid base-10 number.

The following works for me:

>>> a = int('deadbeef',16)
>>> float(a)
3735928559.0

Answer 11

With the '0x' prefix, you might also use the eval function.

For example:

>>> a = '0xff'
>>> eval(a)
255

Converting from a string to boolean in Python?

Question: Converting from a string to boolean in Python?

Does anyone know how to convert from a string to a boolean in Python? I found this link. But it doesn’t look like a proper way to do it, i.e. using built-in functionality, etc.

The reason I’m asking this is because I learned about int("string") from here. But when trying bool("string") it always returns True:

>>> bool("False")
True

Answer 0

Really, you just compare the string to whatever you expect to accept as representing true, so you can do this:

s == 'True'

Or to check against a whole bunch of values:

s.lower() in ['true', '1', 't', 'y', 'yes', 'yeah', 'yup', 'certainly', 'uh-huh']

Be cautious when using the following:

>>> bool("foo")
True
>>> bool("")
False

Empty strings evaluate to False, but everything else evaluates to True. So this should not be used for any kind of parsing purposes.
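The membership test can be packaged as a function. The sketch below is one possible shape, not the answer's own code: TRUE_STRINGS and is_true are hypothetical names, the accepted spellings are a subset of the list above, and stripping surrounding whitespace is an extra assumption.

```python
# Accepted spellings of "true"; extend or trim to suit your input.
TRUE_STRINGS = frozenset(['true', '1', 't', 'y', 'yes'])


def is_true(s):
    """Return True only if s is an accepted spelling of true.

    Case and surrounding whitespace are ignored; everything else,
    including the empty string, is treated as False.
    """
    return s.strip().lower() in TRUE_STRINGS


print(is_true("True"))   # True
print(is_true(" YES "))  # True
print(is_true("False"))  # False
print(is_true(""))       # False
```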


Answer 1

Use:

import distutils.util

bool(distutils.util.strtobool(some_string))

True values are y, yes, t, true, on and 1; false values are n, no, f, false, off and 0. Raises ValueError if val is anything else.

Be aware that distutils.util.strtobool() returns integer representations (1 or 0) and thus it needs to be wrapped with bool() to get Boolean values. Also note that distutils was deprecated in Python 3.10 and removed in Python 3.12, so this only works on older versions.


Answer 2

def str2bool(v):
  return v.lower() in ("yes", "true", "t", "1")

Then call it like so:

>>> str2bool("yes")
True
>>> str2bool("no")
False
>>> str2bool("stuff")
False
>>> str2bool("1")
True
>>> str2bool("0")
False

Handling true and false explicitly:

You could also make your function explicitly check against a True list of words and a False list of words. Then if it is in neither list, you could throw an exception.
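A minimal sketch of that explicit variant might look like the following. The name str2bool_strict and the contents of FALSE_WORDS are assumptions made for this example; only the true words come from the function above.

```python
TRUE_WORDS = ("yes", "true", "t", "1")
FALSE_WORDS = ("no", "false", "f", "0")  # assumed counterparts of TRUE_WORDS


def str2bool_strict(v):
    """Return True/False for a known word, raise ValueError otherwise."""
    word = v.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError("not a recognized boolean string: %r" % v)


print(str2bool_strict("yes"))  # True
print(str2bool_strict("0"))    # False
```

With this version, str2bool_strict("stuff") raises ValueError instead of silently returning False.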


Answer 3

The JSON parser is also useful in general for converting strings to reasonable Python types.

>>> import json
>>> json.loads("false".lower())
False
>>> json.loads("True".lower())
True
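Because json.loads happily returns numbers, strings, and lists as well, a stricter wrapper may be useful. This is only a sketch: json_to_bool is a hypothetical name, and rejecting every non-boolean result is an assumption about what the caller wants.

```python
import json


def json_to_bool(text):
    """Parse 'true'/'false' (any case) with the JSON parser; reject anything else."""
    try:
        value = json.loads(text.lower())
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError("not a JSON boolean: %r" % text)
    if not isinstance(value, bool):
        raise ValueError("not a JSON boolean: %r" % text)
    return value


print(json_to_bool("True"))   # True
print(json_to_bool("false"))  # False
```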

Answer 4

Starting with Python 2.6, there is now ast.literal_eval:

>>> import ast
>>> help(ast.literal_eval)
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.

Which seems to work, as long as you’re sure your strings are going to be either "True" or "False":

>>> ast.literal_eval("True")
True
>>> ast.literal_eval("False")
False
>>> ast.literal_eval("F")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 68, in literal_eval
    return _convert(node_or_string)
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 67, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
>>> ast.literal_eval("'False'")
'False'

I wouldn’t normally recommend this, but it is completely built-in and could be the right thing depending on your requirements.


Answer 5

If you know the string will be either "True" or "False", you could just use eval(s).

>>> eval("True")
True
>>> eval("False")
False

Only use this if you are sure of the contents of the string though, as it will throw an exception if the string does not contain valid Python, and will also execute code contained in the string.


Answer 6

This version keeps the semantics of constructors like int(value) and provides an easy way to define acceptable string values.

def to_bool(value):
    valid = {'true': True, 't': True, '1': True,
             'false': False, 'f': False, '0': False,
             }   

    if isinstance(value, bool):
        return value

    if not isinstance(value, basestring):
        raise ValueError('invalid literal for boolean. Not a string.')

    lower_value = value.lower()
    if lower_value in valid:
        return valid[lower_value]
    else:
        raise ValueError('invalid literal for boolean: "%s"' % value)


# Test cases
assert to_bool('true'), '"true" is True' 
assert to_bool('True'), '"True" is True' 
assert to_bool('TRue'), '"TRue" is True' 
assert to_bool('TRUE'), '"TRUE" is True' 
assert to_bool('T'), '"T" is True' 
assert to_bool('t'), '"t" is True' 
assert to_bool('1'), '"1" is True' 
assert to_bool(True), 'True is True' 
assert to_bool(u'true'), 'unicode "true" is True'

assert to_bool('false') is False, '"false" is False' 
assert to_bool('False') is False, '"False" is False' 
assert to_bool('FAlse') is False, '"FAlse" is False' 
assert to_bool('FALSE') is False, '"FALSE" is False' 
assert to_bool('F') is False, '"F" is False' 
assert to_bool('f') is False, '"f" is False' 
assert to_bool('0') is False, '"0" is False' 
assert to_bool(False) is False, 'False is False'
assert to_bool(u'false') is False, 'unicode "false" is False'

# Expect ValueError to be raised for each invalid parameter...
for bad in ('', 12, [], 'yes', 'FOObar'):
    try:
        to_bool(bad)
    except ValueError:
        pass
    else:
        raise AssertionError('%r should raise ValueError' % (bad,))

Answer 7

Here is my version. It checks against lists of both positive and negative values, raising an exception for unknown values. And it does not require a string; any type should do.

def to_bool(value):
    """
       Converts 'something' to boolean. Raises exception for invalid formats
           Possible True  values: 1, True, "1", "TRue", "yes", "y", "t"
           Possible False values: 0, False, None, [], {}, "", "0", "faLse", "no", "n", "f", 0.0, ...
    """
    if str(value).lower() in ("yes", "y", "true",  "t", "1"): return True
    if str(value).lower() in ("no",  "n", "false", "f", "0", "0.0", "", "none", "[]", "{}"): return False
    raise Exception('Invalid value for boolean conversion: ' + str(value))

Sample runs:

>>> to_bool(True)
True
>>> to_bool("tRUe")
True
>>> to_bool("1")
True
>>> to_bool(1)
True
>>> to_bool(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in to_bool
Exception: Invalid value for boolean conversion: 2
>>> to_bool([])
False
>>> to_bool({})
False
>>> to_bool(None)
False
>>> to_bool("Wasssaaaaa")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in to_bool
Exception: Invalid value for boolean conversion: Wasssaaaaa
>>>

Answer 8

You could always do something like

myString = "false"
val = (myString == "true")

The bit in parens would evaluate to False. This is just another way to do it without having to make an actual function call.


Answer 9

A cool, simple trick (based on what @Alan Marchiori posted), but using yaml:

import yaml

# safe_load parses plain YAML without constructing arbitrary Python objects
parsed = yaml.safe_load("true")
print(bool(parsed))

If this is too wide, it can be refined by testing the type result. If the yaml-returned type is a str, then it can’t be cast to any other type (that I can think of anyway), so you could handle that separately, or just let it be true.

I won’t make any guesses at speed, but since I am working with yaml data under Qt gui anyway, this has a nice symmetry.


Answer 10

I don’t agree with any solution here, as they are too permissive. This is not normally what you want when parsing a string.

So here is the solution I’m using:

def to_bool(bool_str):
    """Parse the string and return the boolean value encoded or raise an exception"""
    if isinstance(bool_str, basestring) and bool_str: 
        if bool_str.lower() in ['true', 't', '1']: return True
        elif bool_str.lower() in ['false', 'f', '0']: return False

    #if here we couldn't parse it
    raise ValueError("%s is no recognized as a boolean value" % bool_str)

And the results:

>>> [to_bool(v) for v in ['true','t','1','F','FALSE','0']]
[True, True, True, False, False, False]
>>> to_bool("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in to_bool
ValueError: '' is no recognized as a boolean value

Just to be clear, because it looks as if my answer offended somebody somehow:

The point is that you don’t want to test for only one value and assume the other. I don’t think you always want to map absolutely everything to the non-parsed value. That produces error-prone code.

So, if you know what you want, code it in.


Answer 11

A dict (really, a defaultdict) gives you a pretty easy way to do this trick:

from collections import defaultdict
bool_mapping = defaultdict(bool) # Will give you False for non-found values
for val in ['True', 'yes', ...]:
    bool_mapping[val] = True

print(bool_mapping['True']) # True
print(bool_mapping['kitten']) # False

It’s really easy to tailor this method to the exact conversion behavior you want — you can fill it with allowed Truthy and Falsy values and let it raise an exception (or return None) when a value isn’t found, or default to True, or default to False, or whatever you want.
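As one example of such tailoring, a plain dict with .get returns None for unknown words instead of defaulting to False. This is a sketch under assumptions: BOOL_MAPPING, lookup_bool, and the chosen words are all illustrative, and lowercasing the input is an added convenience.

```python
BOOL_MAPPING = {'true': True, 'yes': True, 'false': False, 'no': False}


def lookup_bool(text):
    """Return True/False for a known word, or None when the word is unknown."""
    return BOOL_MAPPING.get(text.lower())


print(lookup_bool('True'))    # True
print(lookup_bool('No'))      # False
print(lookup_bool('kitten'))  # None
```

Indexing with BOOL_MAPPING[text.lower()] instead of .get would raise KeyError for unknown words, which gives the exception-raising behavior.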


Answer 12

You probably already have a solution, but this is for others who are looking for a method to convert a value to a boolean using “standard” false values, including None, [], {}, and “”, in addition to false, no, and 0.

def toBoolean( val ):
    """ 
    Get the boolean value of the provided input.

        If the value is a boolean return the value.
        Otherwise check to see if the value is in 
        ["false", "f", "no", "n", "none", "0", "[]", "{}", "" ]
        and returns True if value is not in the list
    """

    if val is True or val is False:
        return val

    falseItems = ["false", "f", "no", "n", "none", "0", "[]", "{}", "" ]

    return not str( val ).strip().lower() in falseItems

Answer 13

You can simply use the built-in function eval():

a='True'
if a is True:
    print 'a is True, a type is', type(a)
else:
    print "a isn't True, a type is", type(a)
b = eval(a)
if b is True:
    print 'b is True, b type is', type(b)
else:
    print "b isn't True, b type is", type(b)

and the output:

a isn't True, a type is <type 'str'>
b is True, b type is <type 'bool'>

Answer 14

Yet another option

from ansible.module_utils.parsing.convert_bool import boolean
boolean('no')
# False
boolean('yEs')
# True
boolean('true')
# True

But in production if you don’t need ansible and all its dependencies, a good idea is to look at its source code and copy part of the logic that you need.


Answer 15

The usual rule for casting to a bool is that a few special literals (False, 0, 0.0, (), [], {}) are false and then everything else is true, so I recommend the following:

def boolify(val):
    if (isinstance(val, basestring) and bool(val)):
        return not val in ('False', '0', '0.0')
    else:
        return bool(val)

Answer 16

This is the version I wrote. Combines several of the other solutions into one.

def to_bool(value):
    """
    Converts 'something' to boolean. Raises exception if it gets a string it doesn't handle.
    Case is ignored for strings. These string values are handled:
      True: 'True', "1", "TRue", "yes", "y", "t"
      False: "", "0", "faLse", "no", "n", "f"
    Non-string values are passed to bool.
    """
    if type(value) == type(''):
        if value.lower() in ("yes", "y", "true",  "t", "1"):
            return True
        if value.lower() in ("no",  "n", "false", "f", "0", ""):
            return False
        raise Exception('Invalid value for boolean conversion: ' + value)
    return bool(value)

If it gets a string it expects specific values, otherwise raises an Exception. If it doesn’t get a string, just lets the bool constructor figure it out. Tested these cases:

test_cases = [
    ('true', True),
    ('t', True),
    ('yes', True),
    ('y', True),
    ('1', True),
    ('false', False),
    ('f', False),
    ('no', False),
    ('n', False),
    ('0', False),
    ('', False),
    (1, True),
    (0, False),
    (1.0, True),
    (0.0, False),
    ([], False),
    ({}, False),
    ((), False),
    ([1], True),
    ({1:2}, True),
    ((1,), True),
    (None, False),
    (object(), True),
    ]
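The answer does not show how the cases were run, so the following is a guess at a simple harness rather than the author's own. It repeats the function so the block runs on its own, uses isinstance(value, str) on the assumption of Python 3, and checks a shortened list of the cases.

```python
def to_bool(value):
    # Same logic as above, written with isinstance (assumes Python 3 str).
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in ("yes", "y", "true", "t", "1"):
            return True
        if lowered in ("no", "n", "false", "f", "0", ""):
            return False
        raise Exception('Invalid value for boolean conversion: ' + value)
    return bool(value)


# A shortened selection of the cases listed above.
test_cases = [('true', True), ('y', True), ('0', False), ('', False),
              (1, True), (0.0, False), ([], False), ((1,), True), (None, False)]

# Collect every case whose result differs from the expected value.
failures = [(value, expected) for value, expected in test_cases
            if to_bool(value) is not expected]
print(failures)  # []
```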

Answer 17

If you know that your input will be either “True” or “False” then why not use:

def bool_convert(s):
    return s == "True"

Answer 18

I use

# function
def toBool(x):
    return x in ("True","true",True)

# test cases
[[x, toBool(x)] for x in [True,"True","true",False,"False","false",None,1,0,-1,123]]
"""
Result:
[[True, True],
 ['True', True],
 ['true', True],
 [False, False],
 ['False', False],
 ['false', False],
 [None, False],
 [1, True],
 [0, False],
 [-1, False],
 [123, False]]
"""

Answer 19

I like to use the ternary operator for this, since it’s a bit more succinct for something that feels like it shouldn’t be more than 1 line.

True if myString=="True" else False

Answer 20

I realize this is an old post, but some of the solutions require quite a bit of code, here’s what I ended up using:

def str2bool(value):
    return {"True": True, "true": True}.get(value, False)

Answer 21

Use the str2bool package: pip install str2bool


Answer 22

If, like me, you just need a boolean from a variable that holds a string, you can use distutils, as mentioned earlier by @jzwiener. However, I could not import and use the module as he suggested.

Instead I ended up using it this way on Python 3.7:

distutils string to bool in python

from distutils import util # to handle str to bool conversion
enable_deletion = 'False'
enable_deletion = bool(util.strtobool(enable_deletion))

distutils is part of the Python standard library up to Python 3.11, so no installation is needed there. Which is great!👍 (It was deprecated in Python 3.10 and removed in Python 3.12.)


Answer 23

I would like to share my simple solution: use eval(). It converts the strings True and False to the proper boolean type, provided the string is exactly in title case (True or False, always with the first letter capitalized); otherwise the function raises an error.

e.g.

>>> eval('False')
False

>>> eval('True')
True

Of course, for a dynamic variable you can simply use .title() to format the boolean string.

>>> x = 'true'
>>> eval(x.title())
True

This will throw an error.

>>> eval('true')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'true' is not defined

>>> eval('false')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'false' is not defined

Answer 24

Here’s a hairy, built-in way to get many of the same answers. Note that although Python considers "" to be false and all other strings to be true, Tcl has a very different idea about things.

>>> import Tkinter
>>> tk = Tkinter.Tk()
>>> var = Tkinter.BooleanVar(tk)
>>> var.set("false")
>>> var.get()
False
>>> var.set("1")
>>> var.get()
True
>>> var.set("[exec 'rm -r /']")
>>> var.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/lib-tk/Tkinter.py", line 324, in get
    return self._tk.getboolean(self._tk.globalgetvar(self._name))
_tkinter.TclError: 0expected boolean value but got "[exec 'rm -r /']"
>>> 

A good thing about this is that it is fairly forgiving about the values you can use. It’s lazy about turning strings into values, and it’s hygienic about what it accepts and rejects (notice that if the above statement were given at a Tcl prompt, it would erase the user’s hard disk).

The bad thing is that it requires Tkinter to be available, which is usually but not universally true, and, more significantly, it requires a Tk instance to be created, which is comparatively heavy.

What is considered true or false depends on the behavior of Tcl_GetBoolean, which considers 0, false, no and off to be false and 1, true, yes and on to be true, case-insensitively. Any other string, including the empty string, causes an exception.
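If Tkinter is not available, the rule described above can be approximated in plain Python. This is only a sketch of the eight spellings listed in this answer, not a reimplementation of Tcl_GetBoolean; the name tcl_style_bool is hypothetical, and the error message merely imitates the one in the traceback.

```python
# Only the spellings named in the answer above are recognised here.
_TRUE_WORDS = {"1", "true", "yes", "on"}
_FALSE_WORDS = {"0", "false", "no", "off"}


def tcl_style_bool(text):
    """Map a string to a bool, rejecting anything outside the two word sets."""
    key = text.lower()  # comparison is case-insensitive
    if key in _TRUE_WORDS:
        return True
    if key in _FALSE_WORDS:
        return False
    # Everything else, including the empty string, is an error.
    raise ValueError('expected boolean value but got "%s"' % text)
```

Unlike bool(), this raises on the empty string instead of returning False, which matches the strictness described above.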


Answer 25

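def str2bool(value):
  # basestring exists only on Python 2; it covers both str and unicode.
  if isinstance(value, basestring) and value.lower() in ('0', 'false', 'no'):
    return False
  else:
    # Any other value falls back to ordinary truthiness.
    return bool(value)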

Idea: check whether the string is one you want evaluated as False; otherwise bool() returns True for any non-empty string.
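On Python 3, where basestring no longer exists, the same idea might look like the sketch below. Swapping basestring for str is an assumption about the target version; the list of false-like words is the one from this answer.

```python
def str2bool(value):
    # Strings that spell out "false" in some form are treated as False.
    if isinstance(value, str) and value.lower() in ("0", "false", "no"):
        return False
    # Everything else uses ordinary truthiness: "" is False, "abc" is True.
    return bool(value)


print(str2bool("No"))    # False
print(str2bool("yes"))   # True
print(str2bool(""))      # False
```

Note that any non-empty string outside the list, including a typo such as "flase", still comes back as True.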


Answer 26

Here’s something I threw together to evaluate the truthiness of a string:

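def as_bool(val):
    if val:
        try:
            # Numeric values and strings such as "0" or "00" count as false.
            if not int(val):
                val = False
        except (TypeError, ValueError, OverflowError):
            pass
        try:
            # "false" in any letter case counts as false.
            if val.lower() == "false":
                val = False
        except AttributeError:
            pass
    return bool(val)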

This gives more or less the same results as using eval, but it is safer.


Answer 27

I just had to do this, so I may be late to the party, but someone may find it useful:

def str_to_bool(input, default):
    """
    | Default | not_default_str | input   | result
    | T       |  "false"        | "true"  |  T
    | T       |  "false"        | "false" |  F
    | F       |  "true"         | "true"  |  T
    | F       |  "true"         | "false" |  F

    """
    if default:
        not_default_str = "false"
    else:
        not_default_str = "true"

    if input.lower() == not_default_str:
        return not default
    else:
        return default
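A short check of the table in the docstring. The function is restated in condensed form (with the parameter renamed to text, an illustrative choice) only so that the snippet runs on its own; the behaviour is meant to be the same as above.

```python
def str_to_bool(text, default):
    # The only string that can flip the result is the opposite of the default.
    not_default_str = "false" if default else "true"
    if text.lower() == not_default_str:
        return not default
    return default


print(str_to_bool("FALSE", True))   # False: matches the non-default string
print(str_to_bool("maybe", True))   # True: unrecognised input keeps the default
print(str_to_bool("True", False))   # True: matches the non-default string
print(str_to_bool("maybe", False))  # False: unrecognised input keeps the default
```

The design choice here is that unrecognised input never raises; it silently falls back to the default.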

Answer 28

If you have control over the entity that’s returning true/false, one option is to have it return 1/0 instead of true/false, then:

boolean_response = bool(int(response))

The extra cast to int handles responses from a network, which always arrive as strings.
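A small sketch of how that conversion behaves, assuming the response really is "1" or "0". The helper name flag_from_response is hypothetical.

```python
def flag_from_response(response):
    # int() parses the string first; bool() then maps 0 to False
    # and any other integer to True.
    return bool(int(response))


print(flag_from_response("1"))  # True
print(flag_from_response("0"))  # False

# A textual "true" is not an integer, so int() rejects it.
try:
    flag_from_response("true")
except ValueError as exc:
    print("rejected:", exc)
```

This only works if the sender sticks to integers; int() raises ValueError for anything else, which may be the behaviour you want for malformed responses.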


Answer 29

By using Python’s built-in eval() function and the .capitalize() method, you can convert any “true” / “false” string (regardless of initial capitalization) to a true Python boolean.

For example:

true_false = "trUE"
type(true_false)

# OUTPUT: <type 'str'>

true_false = eval(true_false.capitalize())
type(true_false)

# OUTPUT: <type 'bool'>
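If the string may come from an untrusted source, one variation on the same capitalize trick is to swap eval() for ast.literal_eval(), which only accepts Python literals. This is a sketch rather than part of the original answer; the helper name parse_bool_literal is hypothetical.

```python
import ast


def parse_bool_literal(text):
    """Turn "true"/"false" in any letter case into a Python bool."""
    try:
        # capitalize() yields "True" or "False"; literal_eval() refuses
        # names, calls and other expressions instead of executing them.
        value = ast.literal_eval(text.strip().capitalize())
    except (ValueError, SyntaxError):
        raise ValueError("not a boolean literal: %r" % (text,))
    if not isinstance(value, bool):
        # Literals such as "1" or "'x'" parse fine but are not booleans.
        raise ValueError("not a boolean literal: %r" % (text,))
    return value


print(parse_bool_literal("trUE"))   # True
print(parse_bool_literal("FALSE"))  # False
```

Inputs such as "__import__('os')" raise ValueError here rather than being evaluated.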