分类目录归档:知识问答

创建长的多行字符串的Pythonic方法

问题:创建长的多行字符串的Pythonic方法

我有一个很长的查询。我想在Python中将其分成几行。用JavaScript做到这一点的一种方法是使用几个句子,然后将它们与一个+运算符连接起来(我知道,这可能不是最有效的方法,但是我并不真正关心此阶段的性能,只是代码可读性) 。例:

var long_string = 'some text not important. just garbage to' +
                  'illustrate my example';

我尝试在Python中做类似的事情,但是没有用,所以我过去常常\拆分长字符串。但是,我不确定这是否是唯一/最佳/最佳的方法。看起来很尴尬。实际代码:

query = 'SELECT action.descr as "action", '\
    'role.id as role_id,'\
    'role.descr as role'\
    'FROM '\
    'public.role_action_def,'\
    'public.role,'\
    'public.record_def, '\
    'public.action'\
    'WHERE role.id = role_action_def.role_id AND'\
    'record_def.id = role_action_def.def_id AND'\
    'action.id = role_action_def.action_id AND'\
    'role_action_def.account_id = ' + account_id + ' AND'\
    'record_def.account_id=' + account_id + ' AND'\
    'def_id=' + def_id

I have a very long query. I would like to split it in several lines in Python. A way to do it in JavaScript would be using several sentences and joining them with a + operator (I know, maybe it’s not the most efficient way to do it, but I’m not really concerned about performance in this stage, just code readability). Example:

var long_string = 'some text not important. just garbage to' +
                  'illustrate my example';

I tried doing something similar in Python, but it didn’t work, so I used \ to split the long string. However, I’m not sure if this is the only/best/pythonicest way of doing it. It looks awkward. Actual code:

query = 'SELECT action.descr as "action", '\
    'role.id as role_id,'\
    'role.descr as role'\
    'FROM '\
    'public.role_action_def,'\
    'public.role,'\
    'public.record_def, '\
    'public.action'\
    'WHERE role.id = role_action_def.role_id AND'\
    'record_def.id = role_action_def.def_id AND'\
    'action.id = role_action_def.action_id AND'\
    'role_action_def.account_id = ' + account_id + ' AND'\
    'record_def.account_id=' + account_id + ' AND'\
    'def_id=' + def_id

回答 0

您在谈论多行字符串吗?容易,使用三引号将它们开始和结束。

s = """ this is a very
        long string if I had the
        energy to type more and more ..."""

您也可以使用单引号(当然在开始和结束时使用三个引号),并将结果字符串s与其他任何字符串一样对待。

注意:与任何字符串一样,引号和结尾引号之间的任何内容都将成为字符串的一部分,因此本示例中有一个前导空格(如@ root45所指出)。该字符串还将包含空格和换行符。

即:

' this is a very\n        long string if I had the\n        energy to type more and more ...'

最后,还可以像这样在Python中构造长行:

 s = ("this is a very"
      "long string too"
      "for sure ..."
     )

其中将包含任何额外的空格或换行符(这是一个有意的示例,显示了跳过空格的结果将导致什么):

'this is a verylong string toofor sure ...'

不需要逗号,只需将要连接的字符串放在一对括号中,并确保考虑到任何需要的空格和换行符。

Are you talking about multi-line strings? Easy, use triple quotes to start and end them.

s = """ this is a very
        long string if I had the
        energy to type more and more ..."""

You can use single quotes too (3 of them of course at start and end) and treat the resulting string s just like any other string.

NOTE: Just as with any string, anything between the starting and ending quotes becomes part of the string, so this example has a leading blank (as pointed out by @root45). This string will also contain both blanks and newlines.

I.e.,:

' this is a very\n        long string if I had the\n        energy to type more and more ...'

Finally, one can also construct long lines in Python like this:

 s = ("this is a very"
      "long string too"
      "for sure ..."
     )

which will not include any extra blanks or newlines (this is a deliberate example showing what the effect of skipping blanks will result in):

'this is a verylong string toofor sure ...'

No commas required, simply place the strings to be joined together into a pair of parenthesis and be sure to account for any needed blanks and newlines.


回答 1

如果您不希望使用多行字符串,而只需要一个长的单行字符串,则可以使用括号,只需确保在字符串段之间不包含逗号,那么它将是一个元组。

query = ('SELECT   action.descr as "action", '
         'role.id as role_id,'
         'role.descr as role'
         ' FROM '
         'public.role_action_def,'
         'public.role,'
         'public.record_def, '
         'public.action'
         ' WHERE role.id = role_action_def.role_id AND'
         ' record_def.id = role_action_def.def_id AND'
         ' action.id = role_action_def.action_id AND'
         ' role_action_def.account_id = '+account_id+' AND'
         ' record_def.account_id='+account_id+' AND'
         ' def_id='+def_id)

在您正在构造的SQL语句中,多行字符串也可以。但是,如果多行字符串将包含额外的空格将是一个问题,那么这将是实现所需内容的好方法。

If you don’t want a multiline string but just have a long single line string, you can use parentheses, just make sure you don’t include commas between the string segments, then it will be a tuple.

query = ('SELECT   action.descr as "action", '
         'role.id as role_id,'
         'role.descr as role'
         ' FROM '
         'public.role_action_def,'
         'public.role,'
         'public.record_def, '
         'public.action'
         ' WHERE role.id = role_action_def.role_id AND'
         ' record_def.id = role_action_def.def_id AND'
         ' action.id = role_action_def.action_id AND'
         ' role_action_def.account_id = '+account_id+' AND'
         ' record_def.account_id='+account_id+' AND'
         ' def_id='+def_id)

In a SQL statement like what you’re constructing, multiline strings would also be fine. But if the extra whitespace a multiline string would contain would be a problem, then this would be a good way to achieve what you want.


回答 2

打破行\对我的作品。这是一个例子:

longStr = "This is a very long string " \
        "that I wrote to help somebody " \
        "who had a question about " \
        "writing long strings in Python"

Breaking lines by \ works for me. Here is an example:

longStr = "This is a very long string " \
        "that I wrote to help somebody " \
        "who had a question about " \
        "writing long strings in Python"

回答 3

我发现自己对此很满意:

string = """This is a
very long string,
containing commas,
that I split up
for readability""".replace('\n',' ')

I found myself happy with this one:

string = """This is a
very long string,
containing commas,
that I split up
for readability""".replace('\n',' ')

回答 4

我发现在构建长字符串时,通常会执行诸如构建SQL查询之类的事情,在这种情况下,这是最好的:

query = ' '.join((  # note double parens, join() takes an iterable
    "SELECT foo",
    "FROM bar",
    "WHERE baz",
))

莱文的建议是好的,但可能容易出错:

query = (
    "SELECT foo"
    "FROM bar"
    "WHERE baz"
)

query == "SELECT fooFROM barWHERE baz"  # probably not what you want

I find that when building long strings, you are usually doing something like building an SQL query, in which case this is best:

query = ' '.join((  # note double parens, join() takes an iterable
    "SELECT foo",
    "FROM bar",
    "WHERE baz",
))

What Levon suggested is good, but might be vulnerable to mistakes:

query = (
    "SELECT foo"
    "FROM bar"
    "WHERE baz"
)

query == "SELECT fooFROM barWHERE baz"  # probably not what you want

回答 5

您还可以在使用“”符号时串联变量:

foo = '1234'

long_string = """fosdl a sdlfklaskdf as
as df ajsdfj asdfa sld
a sdf alsdfl alsdfl """ +  foo + """ aks
asdkfkasdk fak"""

编辑:找到了一种更好的方法,命名为params和.format():

body = """
<html>
<head>
</head>
<body>
    <p>Lorem ipsum.</p>
    <dl>
        <dt>Asdf:</dt>     <dd><a href="{link}">{name}</a></dd>
    </dl>
    </body>
</html>
""".format(
    link='http://www.asdf.com',
    name='Asdf',
)

print(body)

You can also concatenate variables in when using “”” notation:

foo = '1234'

long_string = """fosdl a sdlfklaskdf as
as df ajsdfj asdfa sld
a sdf alsdfl alsdfl """ +  foo + """ aks
asdkfkasdk fak"""

EDIT: Found a better way, with named params and .format():

body = """
<html>
<head>
</head>
<body>
    <p>Lorem ipsum.</p>
    <dl>
        <dt>Asdf:</dt>     <dd><a href="{link}">{name}</a></dd>
    </dl>
    </body>
</html>
""".format(
    link='http://www.asdf.com',
    name='Asdf',
)

print(body)

回答 6

此方法使用:

  • 只需一个反斜杠即可避免初始换行
  • 通过使用三引号引起来的字符串,几乎没有内部标点符号
  • 使用textwrap inspect模块去除局部缩进
  • account_iddef_id变量使用python 3.6格式的字符串插值(’f’)。

这种方式对我来说似乎是最pythonic的。

# import textwrap  # See update to answer below
import inspect

# query = textwrap.dedent(f'''\
query = inspect.cleandoc(f'''
    SELECT action.descr as "action", 
    role.id as role_id,
    role.descr as role
    FROM 
    public.role_action_def,
    public.role,
    public.record_def, 
    public.action
    WHERE role.id = role_action_def.role_id AND
    record_def.id = role_action_def.def_id AND
    action.id = role_action_def.action_id AND
    role_action_def.account_id = {account_id} AND
    record_def.account_id={account_id} AND
    def_id={def_id}'''
)

更新:1/29/2019合并@ShadowRanger的建议使用inspect.cleandoc代替textwrap.dedent

This approach uses:

  • just one backslash to avoid an initial linefeed
  • almost no internal punctuation by using a triple quoted string
  • strips away local indentation using the textwrap inspect module
  • uses python 3.6 formatted string interpolation (‘f’) for the account_id and def_id variables.

This way looks the most pythonic to me.

# import textwrap  # See update to answer below
import inspect

# query = textwrap.dedent(f'''\
query = inspect.cleandoc(f'''
    SELECT action.descr as "action", 
    role.id as role_id,
    role.descr as role
    FROM 
    public.role_action_def,
    public.role,
    public.record_def, 
    public.action
    WHERE role.id = role_action_def.role_id AND
    record_def.id = role_action_def.def_id AND
    action.id = role_action_def.action_id AND
    role_action_def.account_id = {account_id} AND
    record_def.account_id={account_id} AND
    def_id={def_id}'''
)

Update: 1/29/2019 Incorporate @ShadowRanger’s suggestion to use inspect.cleandoc instead of textwrap.dedent


回答 7

在Python> = 3.6中,您可以使用格式化字符串文字(f字符串)

query= f'''SELECT   action.descr as "action"
    role.id as role_id,
    role.descr as role
    FROM
    public.role_action_def,
    public.role,
    public.record_def,
    public.action
    WHERE role.id = role_action_def.role_id AND
    record_def.id = role_action_def.def_id AND
    action.id = role_action_def.action_id AND
    role_action_def.account_id = {account_id} AND
    record_def.account_id = {account_id} AND
    def_id = {def_id}'''

In Python >= 3.6 you can use Formatted string literals (f string)

query= f'''SELECT   action.descr as "action"
    role.id as role_id,
    role.descr as role
    FROM
    public.role_action_def,
    public.role,
    public.record_def,
    public.action
    WHERE role.id = role_action_def.role_id AND
    record_def.id = role_action_def.def_id AND
    action.id = role_action_def.action_id AND
    role_action_def.account_id = {account_id} AND
    record_def.account_id = {account_id} AND
    def_id = {def_id}'''

回答 8

例如:

sql = ("select field1, field2, field3, field4 "
       "from table "
       "where condition1={} "
       "and condition2={}").format(1, 2)

Output: 'select field1, field2, field3, field4 from table 
         where condition1=1 and condition2=2'

如果condition的值应该是字符串,则可以这样:

sql = ("select field1, field2, field3, field4 "
       "from table "
       "where condition1='{0}' "
       "and condition2='{1}'").format('2016-10-12', '2017-10-12')

Output: "select field1, field2, field3, field4 from table where
         condition1='2016-10-12' and condition2='2017-10-12'"

For example:

sql = ("select field1, field2, field3, field4 "
       "from table "
       "where condition1={} "
       "and condition2={}").format(1, 2)

Output: 'select field1, field2, field3, field4 from table 
         where condition1=1 and condition2=2'

if the value of condition should be a string, you can do like this:

sql = ("select field1, field2, field3, field4 "
       "from table "
       "where condition1='{0}' "
       "and condition2='{1}'").format('2016-10-12', '2017-10-12')

Output: "select field1, field2, field3, field4 from table where
         condition1='2016-10-12' and condition2='2017-10-12'"

回答 9

textwrap.dedent这里找到了长字符串的最佳选择:

def create_snippet():
    code_snippet = textwrap.dedent("""\
        int main(int argc, char* argv[]) {
            return 0;
        }
    """)
    do_something(code_snippet)

I find textwrap.dedent the best for long strings as described here:

def create_snippet():
    code_snippet = textwrap.dedent("""\
        int main(int argc, char* argv[]) {
            return 0;
        }
    """)
    do_something(code_snippet)

回答 10

其他人已经提到了括号方法,但是我想在括号中添加,允许内联注释。

对每个片段进行评论:

nursery_rhyme = (
    'Mary had a little lamb,'          # Comments are great!
    'its fleece was white as snow.'
    'And everywhere that Mary went,'
    'her sheep would surely go.'       # What a pesky sheep.
)

继续后不允许发表评论:

当使用反斜杠连续行(\)时,不允许注释。您会收到一个SyntaxError: unexpected character after line continuation character错误消息。

nursery_rhyme = 'Mary had a little lamb,' \  # These comments
    'its fleece was white as snow.'       \  # are invalid!
    'And everywhere that Mary went,'      \
    'her sheep would surely go.'
# => SyntaxError: unexpected character after line continuation character

对Regex字符串的更好注释:

根据https://docs.python.org/3/library/re.html#re.VERBOSE的示例,

a = re.compile(
    r'\d+'  # the integral part
    r'\.'   # the decimal point
    r'\d*'  # some fractional digits
)
# Using VERBOSE flag, IDE usually can't syntax highight the string comment.
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

Others have mentioned the parentheses method already, but I’d like to add that with parentheses, inline comments are allowed.

Comment on each fragment:

nursery_rhyme = (
    'Mary had a little lamb,'          # Comments are great!
    'its fleece was white as snow.'
    'And everywhere that Mary went,'
    'her sheep would surely go.'       # What a pesky sheep.
)

Comment not allowed after continuation:

When using backslash line continuations (\ ), comments are not allowed. You’ll receive a SyntaxError: unexpected character after line continuation character error.

nursery_rhyme = 'Mary had a little lamb,' \  # These comments
    'its fleece was white as snow.'       \  # are invalid!
    'And everywhere that Mary went,'      \
    'her sheep would surely go.'
# => SyntaxError: unexpected character after line continuation character

Better comments for Regex strings:

Based on the example from https://docs.python.org/3/library/re.html#re.VERBOSE,

a = re.compile(
    r'\d+'  # the integral part
    r'\.'   # the decimal point
    r'\d*'  # some fractional digits
)
# Using VERBOSE flag, IDE usually can't syntax highight the string comment.
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

回答 11

我个人发现以下是用Python编写原始SQL查询的最佳方式(简单,安全和Pythonic),尤其是在使用Python的sqlite3模块时

query = '''
    SELECT
        action.descr as action,
        role.id as role_id,
        role.descr as role
    FROM
        public.role_action_def,
        public.role,
        public.record_def,
        public.action
    WHERE
        role.id = role_action_def.role_id
        AND record_def.id = role_action_def.def_id
        AND action.id = role_action_def.action_id
        AND role_action_def.account_id = ?
        AND record_def.account_id = ?
        AND def_id = ?
'''
vars = (account_id, account_id, def_id)   # a tuple of query variables
cursor.execute(query, vars)   # using Python's sqlite3 module

优点

  • 简洁的代码(Pythonic!)
  • 防止SQL注入
  • 与Python 2和Python 3兼容(毕竟是Pythonic)
  • 无需字符串连接
  • 无需确保每行的最右字符是一个空格

缺点

  • 由于查询中的变量已被?占位符替换,因此?当查询中有很多变量时,要跟踪哪个变量将被哪个Python变量替换可能会有些困难。

I personally find the following to be the best (simple, safe and Pythonic) way to write raw SQL queries in Python, especially when using Python’s sqlite3 module:

query = '''
    SELECT
        action.descr as action,
        role.id as role_id,
        role.descr as role
    FROM
        public.role_action_def,
        public.role,
        public.record_def,
        public.action
    WHERE
        role.id = role_action_def.role_id
        AND record_def.id = role_action_def.def_id
        AND action.id = role_action_def.action_id
        AND role_action_def.account_id = ?
        AND record_def.account_id = ?
        AND def_id = ?
'''
vars = (account_id, account_id, def_id)   # a tuple of query variables
cursor.execute(query, vars)   # using Python's sqlite3 module

Pros

  • Neat and simple code (Pythonic!)
  • Safe from SQL injection
  • Compatible with both Python 2 and Python 3 (it’s Pythonic after all)
  • No string concatenation required
  • No need to ensure that the right-most character of each line is a space

Cons

  • Since variables in the query are replaced by the ? placeholder, it may become a little difficult to keep track of which ? is to be substituted by which Python variable when there are lots of them in the query.

回答 12

我通常使用这样的东西:

text = '''
    This string was typed to be a demo
    on how could we write a multi-line
    text in Python.
'''

如果要删除每行中令人讨厌的空格,可以执行以下操作:

text = '\n'.join(line.lstrip() for line in text.splitlines())

I usually use something like this:

text = '''
    This string was typed to be a demo
    on how could we write a multi-line
    text in Python.
'''

If you want to remove annoying blank spaces in each line, you could do as follows:

text = '\n'.join(line.lstrip() for line in text.splitlines())

回答 13

您的实际代码不起作用,在“行”末尾缺少空格(例如: role.descr as roleFROM...

多行字符串有三引号:

string = """line
  line2
  line3"""

它将包含换行符和多余的空格,但是对于SQL来说这不是问题。

Your actual code shouldn’t work, you are missing whitespaces at the end of “lines” (eg: role.descr as roleFROM...)

There is triplequotes for multiline string:

string = """line
  line2
  line3"""

It will contain the line breaks and extra spaces, but for SQL that’s not a problem.


回答 14

您还可以将sql语句放置在单独的文件中,action.sql然后使用以下命令将其加载到py文件中:

with open('action.sql') as f:
   query = f.read()

因此,sql语句将与python代码分开。如果sql语句中有需要从python填充的参数,则可以使用字符串格式(例如%s或{field})

You can also place the sql-statement in a seperate file action.sql and load it in the py file with

with open('action.sql') as f:
   query = f.read()

So the sql-statements will be separated from the python code. If there are parameters in the sql statement which needs to be filled from python, you can use string formating (like %s or {field})


回答 15

“Àla” Scala方式(但是我认为这是OQ要求的最Python方式):

description = """
            | The intention of this module is to provide a method to 
            | pass meta information in markdown_ header files for 
            | using it in jinja_ templates. 
            | 
            | Also, to provide a method to use markdown files as jinja 
            | templates. Maybe you prefer to see the code than 
            | to install it.""".replace('\n            | \n','\n').replace('            | ',' ')

如果您想要没有跳线的最终str,只需将其放在\n第二个替换的第一个参数的开头:

.replace('\n            | ',' ')`.

注意:“ …模板”之间的白线。和“还,…”在后面需要一个空格|

“À la” Scala way (but I think is the most pythonic way as OQ demands):

description = """
            | The intention of this module is to provide a method to 
            | pass meta information in markdown_ header files for 
            | using it in jinja_ templates. 
            | 
            | Also, to provide a method to use markdown files as jinja 
            | templates. Maybe you prefer to see the code than 
            | to install it.""".replace('\n            | \n','\n').replace('            | ',' ')

If you want final str without jump lines, just put \n at the start of the first argument of the second replace:

.replace('\n            | ',' ')`.

Note: the white line between “…templates.” and “Also, …” requires a whitespace after the |.


回答 16

tl; dr:使用"""\"""包装字符串,如

string = """\
This is a long string
spanning multiple lines.
"""

官方python文档中

字符串文字可以跨越多行。一种方法是使用三引号:“”“ …”“”或”’…”’。行尾会自动包含在字符串中,但是可以通过在行尾添加\来防止这种情况。下面的例子:

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

产生以下输出(请注意,不包括初始换行符):

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

tl;dr: Use """\ and """ to wrap the string, as in

string = """\
This is a long string
spanning multiple lines.
"""

From the official python documentation:

String literals can span multiple lines. One way is using triple-quotes: “””…””” or ”’…”’. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. The following example:

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

produces the following output (note that the initial newline is not included):

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

回答 17

嘿,尝试这种希望能起作用的方法,就像这种格式,它将像您已成功查询此属性一样,返回一条连续的行。

"message": f'you have successfully inquired about '
           f'{enquiring_property.title} Property owned by '
           f'{enquiring_property.client}'

Hey try something like this hope it works, like in this format it will return you a continuous line like you have successfully enquired about this property`

"message": f'you have successfully inquired about '
           f'{enquiring_property.title} Property owned by '
           f'{enquiring_property.client}'

回答 18

我使用递归函数来构建复杂的SQL查询。此技术通常可用于构建大型字符串,同时保持代码的可读性。

# Utility function to recursively resolve SQL statements.
# CAUTION: Use this function carefully, Pass correct SQL parameters {},
# TODO: This should never happen but check for infinite loops
def resolveSQL(sql_seed, sqlparams):
    sql = sql_seed % (sqlparams)
    if sql == sql_seed:
        return ' '.join([x.strip() for x in sql.split()])
    else:
        return resolveSQL(sql, sqlparams)

PS:看一下很棒的python-sqlparse库,可以根据需要漂亮地打印SQL查询。 http://sqlparse.readthedocs.org/en/latest/api/#sqlparse.format

I use a recursive function to build complex SQL Queries. This technique can generally be used to build large strings while maintaining code readability.

# Utility function to recursively resolve SQL statements.
# CAUTION: Use this function carefully, Pass correct SQL parameters {},
# TODO: This should never happen but check for infinite loops
def resolveSQL(sql_seed, sqlparams):
    sql = sql_seed % (sqlparams)
    if sql == sql_seed:
        return ' '.join([x.strip() for x in sql.split()])
    else:
        return resolveSQL(sql, sqlparams)

P.S: Have a look at the awesome python-sqlparse library to pretty print SQL queries if needed. http://sqlparse.readthedocs.org/en/latest/api/#sqlparse.format


回答 19

当代码(例如变量)缩进并且输出字符串应该是一个衬线(没有换行符)时,我认为另一种方法更易读:

def some_method():

    long_string = """
a presumptuous long string 
which looks a bit nicer 
in a text editor when
written over multiple lines
""".strip('\n').replace('\n', ' ')

    return long_string 

Another option that I think is more readable when the code (e.g variable) is indented and the output string should be a one liner (no newlines):

def some_method():

    long_string = """
a presumptuous long string 
which looks a bit nicer 
in a text editor when
written over multiple lines
""".strip('\n').replace('\n', ' ')

    return long_string 

回答 20

使用三引号。人们经常在程序开始时使用它们来创建文档字符串,以解释其目的以及与该文档创建相关的其他信息。人们还在功能中使用这些来解释功能的目的和应用。例:

'''
Filename: practice.py
File creator: me
File purpose: explain triple quotes
'''


def example():
    """This prints a string that occupies multiple lines!!"""
    print("""
    This
    is 
    a multi-line
    string!
    """)

Use triple quotation marks. People often use these to create docstrings at the start of programs to explain their purpose and other information relevant to its creation. People also use these in functions to explain the purpose and application of functions. Example:

'''
Filename: practice.py
File creator: me
File purpose: explain triple quotes
'''


def example():
    """This prints a string that occupies multiple lines!!"""
    print("""
    This
    is 
    a multi-line
    string!
    """)

回答 21

我喜欢这种方法,因为它具有阅读的特权。如果我们的弦长,那就没办法了!根据您所处的缩进级别,仍然限制为每行80个字符。。。嗯…无需赘述。我认为python样式指南仍然很模糊。我采用@Eero Aaltonen方法是因为它具有阅读和常识的特权。我知道样式指南应该对我们有帮助,而不会使我们的生活变得一团糟。谢谢!

class ClassName():
    def method_name():
        if condition_0:
            if condition_1:
                if condition_2:
                    some_variable_0 =\
"""
some_js_func_call(
    undefined, 
    {
        'some_attr_0': 'value_0', 
        'some_attr_1': 'value_1', 
        'some_attr_2': '""" + some_variable_1 + """'
    }, 
    undefined, 
    undefined, 
    true
)
"""

I like this approach because it privileges reading. In cases where we have long strings there is no way! Depending on the level of indentation you are in and still limited to 80 characters per line… Well… No need to say anything else. In my view the python style guides are still very vague. I took the @Eero Aaltonen approach because it privileges reading and common sense. I understand that style guides should help us and not make our lives a mess. Thanks!

class ClassName():
    def method_name():
        if condition_0:
            if condition_1:
                if condition_2:
                    some_variable_0 =\
"""
some_js_func_call(
    undefined, 
    {
        'some_attr_0': 'value_0', 
        'some_attr_1': 'value_1', 
        'some_attr_2': '""" + some_variable_1 + """'
    }, 
    undefined, 
    undefined, 
    true
)
"""

回答 22

官方python文档中

字符串文字可以跨越多行。一种方法是使用三引号:“”“ …”“”或”’…”’。行尾会自动包含在字符串中,但是可以通过在行尾添加\来防止这种情况。下面的例子:

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

产生以下输出(请注意,不包括初始换行符):

From the official python documentation:

String literals can span multiple lines. One way is using triple-quotes: “””…””” or ”’…”’. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. The following example:

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

produces the following output (note that the initial newline is not included):


回答 23

为了在字典中定义一个长字符串, 保留换行符,但省略空格,我最终在一个常量中定义字符串,如下所示:

LONG_STRING = \
"""
This is a long sting
that contains newlines.
The newlines are important.
"""

my_dict = {
   'foo': 'bar',
   'string': LONG_STRING
}

For defining a long string inside a dict, keeping the newlines but omitting the spaces, I ended up defining the string in a constant like this:

LONG_STRING = \
"""
This is a long sting
that contains newlines.
The newlines are important.
"""

my_dict = {
   'foo': 'bar',
   'string': LONG_STRING
}

回答 24

作为Python中长字符串的一种通用方法,您可以使用三引号splitjoin

_str = ' '.join('''Lorem ipsum dolor sit amet, consectetur adipiscing 
        elit, sed do eiusmod tempor incididunt ut labore et dolore 
        magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
        ullamco laboris nisi ut aliquip ex ea commodo.'''.split())

输出:

'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo.'

关于OP的与SQL查询有关的问题,下面的答案无视此构建SQL查询方法的正确性,并且仅关注以可读性和美观性方式构建长字符串,而没有其他导入。它还忽略了这带来的计算负荷。

使用三重引号,我们构建了一个长且可读的字符串,然后使用split()将该字符串分解为一个列表,从而去除了空格,然后将其与重新连接在一起' '.join()。最后,我们使用以下format()命令插入变量:

account_id = 123
def_id = 321

_str = '''
    SELECT action.descr AS "action", role.id AS role_id, role.descr AS role 
    FROM public.role_action_def, public.role, public.record_def, public.action
    WHERE role.id = role_action_def.role_id 
    AND record_def.id = role_action_def.def_id 
    AND' action.id = role_action_def.action_id 
    AND role_action_def.account_id = {} 
    AND record_def.account_id = {} 
    AND def_id = {}
    '''

query = ' '.join(_str.split()).format(account_id, account_id, def_id)

生成:

SELECT action.descr AS "action", role.id AS role_id, role.descr AS role FROM public.role_action_def, public.role, public.record_def, public.action WHERE role.id = role_action_def.role_id AND record_def.id = role_action_def.def_id AND\' action.id = role_action_def.action_id AND role_action_def.account_id = 123 AND record_def.account_id=123 AND def_id=321

编辑:这种方法不符合PEP8,但我有时发现它很有用

As a general approach to long strings in Python you can use triple quotes, split and join:

_str = ' '.join('''Lorem ipsum dolor sit amet, consectetur adipiscing 
        elit, sed do eiusmod tempor incididunt ut labore et dolore 
        magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
        ullamco laboris nisi ut aliquip ex ea commodo.'''.split())

Output:

'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo.'

With regard to OP’s question relating to a SQL query, the answer below disregards the correctness of this approach to building SQL queries and focuses only on building long strings in a readable and aesthetic way without additional imports. It also disregards the computational load this entails.

Using triple quotes we build a long and readable string which we then break up into a list using split() thereby stripping the whitespace and then join it back together with ' '.join(). Finally we insert the variables using the format() command:

account_id = 123
def_id = 321

_str = '''
    SELECT action.descr AS "action", role.id AS role_id, role.descr AS role 
    FROM public.role_action_def, public.role, public.record_def, public.action
    WHERE role.id = role_action_def.role_id 
    AND record_def.id = role_action_def.def_id 
    AND' action.id = role_action_def.action_id 
    AND role_action_def.account_id = {} 
    AND record_def.account_id = {} 
    AND def_id = {}
    '''

query = ' '.join(_str.split()).format(account_id, account_id, def_id)

Produces:

SELECT action.descr AS "action", role.id AS role_id, role.descr AS role FROM public.role_action_def, public.role, public.record_def, public.action WHERE role.id = role_action_def.role_id AND record_def.id = role_action_def.def_id AND\' action.id = role_action_def.action_id AND role_action_def.account_id = 123 AND record_def.account_id=123 AND def_id=321

Edit: This approach is not in line with PEP8 but I find it useful at times


回答 25

通常,我将listjoin用于多行注释/字符串。

lines = list()
lines.append('SELECT action.enter code here descr as "action", ')
lines.append('role.id as role_id,')
lines.append('role.descr as role')
lines.append('FROM ')
lines.append('public.role_action_def,')
lines.append('public.role,')
lines.append('public.record_def, ')
lines.append('public.action')
query = " ".join(lines)

您可以使用任何字符串来连接所有此列表元素,例如’ \n‘(换行符)或’ ,‘(逗号)或’ ‘(空格)

干杯..!!

Generally, I use list and join for multi-line comments/string.

lines = list()
lines.append('SELECT action.enter code here descr as "action", ')
lines.append('role.id as role_id,')
lines.append('role.descr as role')
lines.append('FROM ')
lines.append('public.role_action_def,')
lines.append('public.role,')
lines.append('public.record_def, ')
lines.append('public.action')
query = " ".join(lines)

you can use any string to join all this list element like ‘\n‘(newline) or ‘,‘(comma) or ‘‘(space)

Cheers..!!


在Python中从文件名提取扩展名

问题:在Python中从文件名提取扩展名

是否有从文件名中提取扩展名的功能?

Is there a function to extract the extension from a filename?


回答 0

是。使用os.path.splitext(请参阅Python 2.X文档Python 3.X文档):

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

与大多数手动字符串拆分尝试不同,os.path.splitext它将正确地/a/b.c/d视为没有扩展而不是具有extension .c/d,并且将被.bashrc视为没有扩展而不是具有extension .bashrc

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')

Yes. Use os.path.splitext(see Python 2.X documentation or Python 3.X documentation):

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')

回答 1

import os.path
extension = os.path.splitext(filename)[1]
import os.path
extension = os.path.splitext(filename)[1]

回答 2

3.4版的新功能。

import pathlib

print(pathlib.Path('yourPath.example').suffix) # '.example'

令人惊讶的是,还没有人提到它pathlibpathlib真是太棒了!

如果需要所有后缀(例如,如果有.tar.gz),.suffixes将返回它们的列表!

New in version 3.4.

import pathlib

print(pathlib.Path('yourPath.example').suffix) # '.example'

I’m surprised no one has mentioned pathlib yet, pathlib IS awesome!

If you need all the suffixes (eg if you have a .tar.gz), .suffixes will return a list of them!


回答 3

import os.path
extension = os.path.splitext(filename)[1][1:]

只获取扩展名的文本,不带点。

import os.path
extension = os.path.splitext(filename)[1][1:]

To get only the text of the extension, without the dot.


回答 4

一种选择可能是与点分开:

>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'

文件没有扩展名时没有错误:

>>> "filename".split(".")[-1]
'filename'

但您必须小心:

>>> "png".split(".")[-1]
'png'    # But file doesn't have an extension

One option may be splitting from dot:

>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'

No error when file doesn’t have an extension:

>>> "filename".split(".")[-1]
'filename'

But you must be careful:

>>> "png".split(".")[-1]
'png'    # But file doesn't have an extension

回答 5

值得在其中添加一个下标,这样您就不会怀疑自己为什么未在列表中显示JPG。

os.path.splitext(filename)[1][1:].strip().lower()

worth adding a lower in there so you don’t find yourself wondering why the JPG’s aren’t showing up in your list.

os.path.splitext(filename)[1][1:].strip().lower()

回答 6

上面的任何解决方案都可以,但是在linux上,我发现扩展字符串的末尾有换行符,这将阻止匹配成功。将strip()方法添加到末尾。例如:

import os.path
extension = os.path.splitext(filename)[1][1:].strip() 

Any of the solutions above work, but on linux I have found that there is a newline at the end of the extension string which will prevent matches from succeeding. Add the strip() method to the end. For example:

import os.path
extension = os.path.splitext(filename)[1][1:].strip() 

回答 7

随着splitext有与双扩展名的文件的问题(例如file.tar.gzfile.tar.bz2等..)

>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension 
'.gz'

但应为: .tar.gz

可能的解决方案在这里

With splitext there are problems with files with double extension (e.g. file.tar.gz, file.tar.bz2, etc..)

>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension 
'.gz'

but should be: .tar.gz

The possible solutions are here


回答 8

您可以在pathlib模块中找到一些很棒的东西(在python 3.x中可用)。

import pathlib
x = pathlib.PurePosixPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)

# Output 
'.txt'

You can find some great stuff in pathlib module (available in python 3.x).

import pathlib
x = pathlib.PurePosixPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)

# Output 
'.txt'

回答 9

尽管这是一个古老的话题,但是我想知道为什么在这种情况下为什么没有提到一个叫做rpartition的非常简单的python api:

要获取给定文件绝对路径的扩展名,只需键入:

filepath.rpartition('.')[-1]

例:

path = '/home/jersey/remote/data/test.csv'
print path.rpartition('.')[-1]

会给你:’csv’

Although it is an old topic, but i wonder why there is none mentioning a very simple api of python called rpartition in this case:

to get extension of a given file absolute path, you can simply type:

filepath.rpartition('.')[-1]

example:

path = '/home/jersey/remote/data/test.csv'
print path.rpartition('.')[-1]

will give you: ‘csv’


回答 10

只是join全部pathlib suffixes

>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'

Just join all pathlib suffixes.

>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'

回答 11

惊讶的是尚未提及:

import os
fn = '/some/path/a.tar.gz'

basename = os.path.basename(fn)  # os independent
Out[] a.tar.gz

base = basename.split('.')[0]
Out[] a

ext = '.'.join(basename.split('.')[1:])   # <-- main part

# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None
Out[] .tar.gz

优点:

  • 我可以想到的任何东西都能按预期工作
  • 没有模块
  • 没有正则表达式
  • 跨平台
  • 易于扩展(例如,没有扩展引号,仅扩展的最后一部分)

作为功​​能:

def get_extension(filename):
    basename = os.path.basename(filename)  # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None

Surprised this wasn’t mentioned yet:

import os
fn = '/some/path/a.tar.gz'

basename = os.path.basename(fn)  # os independent
Out[] a.tar.gz

base = basename.split('.')[0]
Out[] a

ext = '.'.join(basename.split('.')[1:])   # <-- main part

# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None
Out[] .tar.gz

Benefits:

  • Works as expected for anything I can think of
  • No modules
  • No regex
  • Cross-platform
  • Easily extendible (e.g. no leading dots for extension, only last part of extension)

As function:

def get_extension(filename):
    basename = os.path.basename(filename)  # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None

回答 12

您可以在split上使用filename

f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))

这不需要额外的库

You can use a split on a filename:

f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))

This does not require additional library


回答 13

filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]
filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]

回答 14

这是一种直接的字符串表示技术:我看到了很多解决方案,但我认为大多数都在考虑拆分。但是,拆分在每次出现“。”时都会执行。。您宁愿寻找的是分区。

string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]

This is a direct string representation techniques : I see a lot of solutions mentioned, but I think most are looking at split. Split however does it at every occurrence of “.” . What you would rather be looking for is partition.

string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]

回答 15

右拆分的另一种解决方案:

# to get extension only

s = 'test.ext'

if '.' in s: ext = s.rsplit('.', 1)[1]

# or, to get file name and extension

def split_filepath(s):
    """
    get filename and extension from filepath 
    filepath -> (filename, extension)
    """
    if not '.' in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])

Another solution with right split:

# to get extension only

s = 'test.ext'

if '.' in s: ext = s.rsplit('.', 1)[1]

# or, to get file name and extension

def split_filepath(s):
    """
    get filename and extension from filepath 
    filepath -> (filename, extension)
    """
    if not '.' in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])

回答 16

即使这个问题已经被回答,我也会在正则表达式中添加解决方案。

>>> import re
>>> file_suffix = ".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

Even this question is already answered I’d add the solution in Regex.

>>> import re
>>> file_suffix = ".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

回答 17

如果您喜欢正则表达式,则是真正的单线。而且即使您有其他“。”也没关系。在中间

import re

file_ext = re.search(r"\.([^.]+)$", filename).group(1)

结果请看这里:点击这里

A true one-liner, if you like regex. And it doesn’t matter even if you have additional “.” in the middle

import re

file_ext = re.search(r"\.([^.]+)$", filename).group(1)

See here for the result: Click Here


回答 18

这是在单行中同时获取文件名和扩展名的最简单方法

fName, ext = 'C:/folder name/Flower.jpeg'.split('/')[-1].split('.')

>>> print(fName)
Flower
>>> print(ext)
jpeg

与其他解决方案不同,您不需要为此导入任何软件包。

This is The Simplest Method to get both Filename & Extension in just a single line.

fName, ext = 'C:/folder name/Flower.jpeg'.split('/')[-1].split('.')

>>> print(fName)
Flower
>>> print(ext)
jpeg

Unlike other solutions, you don’t need to import any package for this.


回答 19

对于趣味性…只需将扩展名收集到字典中,然后将所有扩展名跟踪到文件夹中即可。然后,只需拉出所需的扩展名即可。

import os

search = {}

for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except:
        search[fe]=[f,]

extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)

For funsies… just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.

import os

search = {}

for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except:
        search[fe]=[f,]

extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)

回答 20

尝试这个:

files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']

for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext =  file.split(".")[-2]+"."+file.split(".")[-1]#3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5
  1. 获取列表中的所有文件名
  2. 分割文件名并检查倒数第二个扩展名,是否在pen_ext列表中?
  3. 如果是,则使用最后一个扩展名将其加入,并将其设置为文件的扩展名
  4. 如果不是,那么只需将最后一个扩展名作为文件的扩展名
  5. 然后检查一下

try this:

files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']

for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext =  file.split(".")[-2]+"."+file.split(".")[-1]#3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5
  1. get all file name inside the list
  2. splitting file name and check the penultimate extension, is it in the pen_ext list or not?
  3. if yes then join it with the last extension and set it as the file’s extension
  4. if not then just put the last extension as the file’s extension
  5. and then check it out

回答 21

# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs

import os.path

class LinkChecker:

    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""
# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs

import os.path

class LinkChecker:

    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""

回答 22

def NewFileName(fichier):
    cpt = 0
    fic , *ext =  fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier
def NewFileName(fichier):
    cpt = 0
    fic , *ext =  fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier

回答 23

name_only=file_name[:filename.index(".")

这将为您提供最常见的第一个“。”文件名。

name_only=file_name[:filename.index(".")

That will give you the file name up to the first “.”, which would be the most common.


UnicodeEncodeError:’ascii’编解码器无法在位置20编码字符u’\ xa0’:序数不在范围内(128)

问题:UnicodeEncodeError:’ascii’编解码器无法在位置20编码字符u’\ xa0’:序数不在范围内(128)

我在处理从不同网页(在不同站点上)获取的文本中的unicode字符时遇到问题。我正在使用BeautifulSoup。

问题是错误并非总是可重现的。它有时可以在某些页面上使用,有时它会通过抛出来发声UnicodeEncodeError。我已经尝试了几乎所有我能想到的东西,但是没有找到任何能正常工作而不抛出某种与Unicode相关的错误的东西。

导致问题的代码部分之一如下所示:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

这是运行上述代码段时在某些字符串上生成的堆栈跟踪:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

我怀疑这是因为某些页面(或更具体地说,来自某些站点的页面)可能已编码,而其他页面可能未编码。所有站点都位于英国,并提供供英国消费的数据-因此,与英语以外的其他任何形式的内部化或文字处理都没有问题。

是否有人对如何解决此问题有任何想法,以便我可以始终如一地解决此问题?

I’m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption – so there are no issues relating to internalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?


回答 0

您需要阅读Python Unicode HOWTO。这个错误是第一个例子

基本上,停止使用str从Unicode转换为编码的文本/字节。

相反,请正确使用.encode()编码字符串:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

或完全以unicode工作。

You need to read the Python Unicode HOWTO. This error is the very first example.

Basically, stop using str to convert from unicode to encoded text / bytes.

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

or work entirely in unicode.


回答 1

这是经典的python unicode痛点!考虑以下:

a = u'bats\u00E0'
print a
 => batsà

到目前为止一切都很好,但是如果我们调用str(a),让我们看看会发生什么:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

噢,蘸,那对任何人都不会有好处!要解决该错误,请使用.encode明确编码字节,并告诉python使用哪种编解码器:

a.encode('utf-8')
 => 'bats\xc3\xa0'
print a.encode('utf-8')
 => batsà

Voil \ u00E0!

问题是,当您调用str()时,python使用默认的字符编码来尝试对给定的字节进行编码,在您的情况下,有时表示为unicode字符。要解决此问题,您必须告诉python如何使用.encode(’whatever_unicode’)处理您给它的字符串。大多数时候,使用utf-8应该会很好。

有关此主题的出色论述,请参见Ned Batchelder在PyCon上的演讲:http : //nedbatchelder.com/text/unipain.html

This is a classic python unicode pain point! Consider the following:

a = u'bats\u00E0'
print a
 => batsà

All good so far, but if we call str(a), let’s see what happens:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

Oh dip, that’s not gonna do anyone any good! To fix the error, encode the bytes explicitly with .encode and tell python what codec to use:

a.encode('utf-8')
 => 'bats\xc3\xa0'
print a.encode('utf-8')
 => batsà

Voil\u00E0!

The issue is that when you call str(), python uses the default character encoding to try and encode the bytes you gave it, which in your case are sometimes representations of unicode characters. To fix the problem, you have to tell python how to deal with the string you give it by using .encode(‘whatever_unicode’). Most of the time, you should be fine using utf-8.

For an excellent exposition on this topic, see Ned Batchelder’s PyCon talk here: http://nedbatchelder.com/text/unipain.html


回答 2

我发现可以通过优雅的方法删除符号并继续按以下方式将字符串保留为字符串:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

重要的是要注意,使用ignore选项是危险的,因为它会悄悄地从使用它的代码中删除所有对unicode(和国际化)的支持,如下所示(转换unicode):

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'

I found elegant work around for me to remove symbols and continue to keep string as string in follows:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

It’s important to notice that using the ignore option is dangerous because it silently drops any unicode(and internationalization) support from the code that uses it, as seen here (convert unicode):

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'

回答 3

好吧,我尝试了一切,但并没有帮助,在谷歌搜索之后,我发现了以下内容并有所帮助。使用python 2.7。

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

well i tried everything but it did not help, after googling around i figured the following and it helped. python 2.7 is in use.

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

回答 4

导致甚至打印失败的一个细微问题是环境变量设置错误,例如。此处LC_ALL设置为“ C”。在Debian中,他们不鼓励设置它:Locale上的Debian Wiki

$ echo $LANG
en_US.utf8
$ echo $LC_ALL 
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà

A subtle problem causing even print to fail is having your environment variables set wrong, eg. here LC_ALL set to “C”. In Debian they discourage setting it: Debian wiki on Locale

$ echo $LANG
en_US.utf8
$ echo $LC_ALL 
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà

回答 5

对我来说,有效的是:

BeautifulSoup(html_text,from_encoding="utf-8")

希望这对某人有帮助。

For me, what worked was:

BeautifulSoup(html_text,from_encoding="utf-8")

Hope this helps someone.


回答 6

实际上,我发现在大多数情况下,仅去除那些字符会更加简单:

s = mystring.decode('ascii', 'ignore')

I’ve actually found that in most of my cases, just stripping out those characters is much simpler:

s = mystring.decode('ascii', 'ignore')

回答 7

问题是您正在尝试打印unicode字符,但是您的终端不支持它。

您可以尝试安装language-pack-en软件包来解决此问题:

sudo apt-get install language-pack-en

它为所有支持的软件包(包括Python)提供英语翻译数据更新。如有必要,请安装其他语言包(取决于您尝试打​​印的字符)。

在某些Linux发行版中,需要确保正确设置了默认的英语语言环境(因此unicode字符可以由shell / terminal处理)。有时,与手动配置相比,它更容易安装。

然后,在编写代码时,请确保在代码中使用正确的编码。

例如:

open(foo, encoding='utf-8')

如果仍然有问题,请仔细检查系统配置,例如:

  • 您的语言环境文件(/etc/default/locale),应包含例如

    LANG="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"

    要么:

    LC_ALL=C.UTF-8
    LANG=C.UTF-8
  • LANG/ LC_CTYPEin shell的值。

  • 通过以下方法检查您的shell支持的语言环境:

    locale -a | grep "UTF-8"

演示新VM中的问题和解决方案。

  1. 初始化和配置VM(例如使用vagrant):

    vagrant init ubuntu/trusty64; vagrant up; vagrant ssh

    请参阅:可用的Ubuntu盒

  2. 打印unicode字符(例如商标符号):

    $ python -c 'print(u"\u2122");'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
  3. 现在安装language-pack-en

    $ sudo apt-get -y install language-pack-en
    The following extra packages will be installed:
      language-pack-en-base
    Generating locales...
      en_GB.UTF-8... /usr/sbin/locale-gen: done
    Generation complete.
  4. 现在应该解决问题:

    $ python -c 'print(u"\u2122");'
    
  5. 否则,请尝试以下命令:

    $ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
    

The problem is that you’re trying to print a unicode character, but your terminal doesn’t support it.

You can try installing language-pack-en package to fix that:

sudo apt-get install language-pack-en

which provides English translation data updates for all supported packages (including Python). Install different language package if necessary (depending which characters you’re trying to print).

On some Linux distributions it’s required in order to make sure that the default English locales are set-up properly (so unicode characters can be handled by shell/terminal). Sometimes it’s easier to install it, than configuring it manually.

Then when writing the code, make sure you use the right encoding in your code.

For example:

open(foo, encoding='utf-8')

If you’ve still a problem, double check your system configuration, such as:

  • Your locale file (/etc/default/locale), which should have e.g.

    LANG="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"
    

    or:

    LC_ALL=C.UTF-8
    LANG=C.UTF-8
    
  • Value of LANG/LC_CTYPE in shell.

  • Check which locale your shell supports by:

    locale -a | grep "UTF-8"
    

Demonstrating the problem and solution in fresh VM.

  1. Initialize and provision the VM (e.g. using vagrant):

    vagrant init ubuntu/trusty64; vagrant up; vagrant ssh
    

    See: available Ubuntu boxes..

  2. Printing unicode characters (such as trade mark sign like ):

    $ python -c 'print(u"\u2122");'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
    
  3. Now installing language-pack-en:

    $ sudo apt-get -y install language-pack-en
    The following extra packages will be installed:
      language-pack-en-base
    Generating locales...
      en_GB.UTF-8... /usr/sbin/locale-gen: done
    Generation complete.
    
  4. Now problem should be solved:

    $ python -c 'print(u"\u2122");'
    ™
    
  5. Otherwise, try the following command:

    $ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
    ™
    

回答 8

在外壳中:

  1. 通过以下命令查找支持的UTF-8语言环境:

    locale -a | grep "UTF-8"
  2. 在运行脚本之前将其导出,例如:

    export LC_ALL=$(locale -a | grep UTF-8)

    或手动像:

    export LC_ALL=C.UTF-8
  3. 通过打印特殊字符进行测试,例如

    python -c 'print(u"\u2122");'

以上在Ubuntu中测试过。

In shell:

  1. Find supported UTF-8 locale by the following command:

    locale -a | grep "UTF-8"
    
  2. Export it, before running the script, e.g.:

    export LC_ALL=$(locale -a | grep UTF-8)
    

    or manually like:

    export LC_ALL=C.UTF-8
    
  3. Test it by printing special character, e.g. :

    python -c 'print(u"\u2122");'
    

Above tested in Ubuntu.


回答 9

在脚本开头的下面添加一行(或作为第二行):

# -*- coding: utf-8 -*-

那就是python源代码编码的定义。PEP 263中的更多信息。

Add line below at the beginning of your script ( or as second line):

# -*- coding: utf-8 -*-

That’s definition of python source code encoding. More info in PEP 263.


回答 10

这是其他一些所谓的“警惕”答案的重新表述。在某些情况下,尽管有人在这里提出抗议,但简单地扔掉麻烦的字符/字符串是一个很好的解决方案。

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

测试它:

if __name__ == '__main__': 
    print safeStr( 1 ) 
    print safeStr( "test" ) 
    print u'98\xb0'
    print safeStr( u'98\xb0' )

结果:

1
test
98°
98

建议:您可能想将此函数命名为toAscii?这是优先事项。

这是为Python 2编写的。 对于Python 3,我相信您将要使用bytes(obj,"ascii")而不是str(obj)。我尚未对此进行测试,但是我会在某个时候修改答案。

Here’s a rehashing of some other so-called “cop out” answers. There are situations in which simply throwing away the troublesome characters/strings is a good solution, despite the protests voiced here.

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

Testing it:

if __name__ == '__main__': 
    print safeStr( 1 ) 
    print safeStr( "test" ) 
    print u'98\xb0'
    print safeStr( u'98\xb0' )

Results:

1
test
98°
98

Suggestion: you might want to name this function to toAscii instead? That’s a matter of preference.

This was written for Python 2. For Python 3, I believe you’ll want to use bytes(obj,"ascii") rather than str(obj). I didn’t test this yet, but I will at some point and revise the answer.


回答 11

我总是将代码放在python文件的前两行中:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

I always put the code below in the first two lines of the python files:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

回答 12

这里可以找到简单的帮助程序功能。

def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')

Simple helper functions found here.

def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')

回答 13

只需添加到变量encode(’utf-8’)

agent_contact.encode('utf-8')

Just add to a variable encode(‘utf-8’)

agent_contact.encode('utf-8')

回答 14

请打开终端并执行以下命令:

export LC_ALL="en_US.UTF-8"

Please open terminal and fire the below command:

export LC_ALL="en_US.UTF-8"

回答 15

我只使用了以下内容:

import unicodedata
message = unicodedata.normalize("NFKD", message)

查看有关它的文档说明:

unicodedata.normalize(form,unistr)返回Unicode字符串unistr的普通形式form。格式的有效值为“ NFC”,“ NFKC”,“ NFD”和“ NFKD”。

Unicode标准基于规范对等和兼容性对等的定义,定义了Unicode字符串的各种规范化形式。在Unicode中,可以用各种方式表示几个字符。例如,字符U + 00C7(带有CEDILLA的拉丁文大写字母C)也可以表示为序列U + 0043(拉丁文的大写字母C)U + 0327(合并CEDILLA)。

对于每个字符,有两种规范形式:规范形式C和规范形式D。规范形式D(NFD)也称为规范分解,将每个字符转换为其分解形式。范式C(NFC)首先应用规范分解,然后再次组成预组合字符。

除了这两种形式,还有基于兼容性对等的两种其他常规形式。在Unicode中,支持某些字符,这些字符通常会与其他字符统一。例如,U + 2160(罗马数字ONE)与U + 0049(拉丁大写字母I)实际上是同一回事。但是,Unicode支持它与现有字符集(例如gb2312)兼容。

普通形式的KD(NFKD)将应用兼容性分解,即用所有等效字符替换它们的等效字符。范式KC(NFKC)首先应用兼容性分解,然后进行规范组合。

即使将两个unicode字符串归一化并在人类读者看来是相同的,但是如果一个字符串包含组合字符而另一个字符串没有组合,则它们可能不相等。

为我解决。简单容易。

I just used the following:

import unicodedata
message = unicodedata.normalize("NFKD", message)

Check what documentation says about it:

unicodedata.normalize(form, unistr) Return the normal form form for the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

In addition to these two forms, there are two additional normal forms based on compatibility equivalence. In Unicode, certain characters are supported which normally would be unified with other characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is supported in Unicode for compatibility with existing character sets (e.g. gb2312).

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.

Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn’t, they may not compare equal.

Solves it for me. Simple and easy.


回答 16

下面的解决方案为我工作,刚刚添加

u“字符串”

(将字符串表示为unicode)在我的字符串之前。

result_html = result.to_html(col_space=1, index=False, justify={'right'})

text = u"""
<html>
<body>
<p>
Hello all, <br>
<br>
Here's weekly summary report.  Let me know if you have any questions. <br>
<br>
Data Summary <br>
<br>
<br>
{0}
</p>
<p>Thanks,</p>
<p>Data Team</p>
</body></html>
""".format(result_html)

Below solution worked for me, Just added

u “String”

(representing the string as unicode) before my string.

result_html = result.to_html(col_space=1, index=False, justify={'right'})

text = u"""
<html>
<body>
<p>
Hello all, <br>
<br>
Here's weekly summary report.  Let me know if you have any questions. <br>
<br>
Data Summary <br>
<br>
<br>
{0}
</p>
<p>Thanks,</p>
<p>Data Team</p>
</body></html>
""".format(result_html)

回答 17

this这至少在Python 3中有效…

Python 3

有时错误在于环境变量中,因此

import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
... 
print(myText.encode('utf-8', errors='ignore'))

在编码中忽略错误的地方。

Alas this works in Python 3 at least…

Python 3

Sometimes the error is in the enviroment variables and enconding so

import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
... 
print(myText.encode('utf-8', errors='ignore'))

where errors are ignored in encoding.


回答 18

我只是遇到了这个问题,而Google带领我来到这里,因此,为了在这里添加一般的解决方案,这对我有用:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

阅读内德的演讲后,我有了这个主意。

不过,我并没有声称完全理解为什么这样做。因此,如果任何人都可以编辑此答案或发表评论以进行解释,我将不胜感激。

I just had this problem, and Google led me here, so just to add to the general solutions here, this is what worked for me:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

I had this idea after reading Ned’s presentation.

I don’t claim to fully understand why this works, though. So if anyone can edit this answer or put in a comment to explain, I’ll appreciate it.


回答 19

manage.py migrate在带有本地化夹具的Django中运行时,我们遇到了此错误。

我们的资料包含# -*- coding: utf-8 -*-声明,MySQL已为utf8正确配置,而Ubuntu在中具有适当的语言包和值/etc/default/locale

问题只是因为Django容器(我们使用docker)缺少 LANG env var。

在重新运行迁移之前设置LANGen_US.UTF-8并重新启动容器可以解决此问题。

We struck this error when running manage.py migrate in Django with localized fixtures.

Our source contained the # -*- coding: utf-8 -*- declaration, MySQL was correctly configured for utf8 and Ubuntu had the appropriate language pack and values in /etc/default/locale.

The issue was simply that the Django container (we use docker) was missing the LANG env var.

Setting LANG to en_US.UTF-8 and restarting the container before re-running migrations fixed the problem.


回答 20

这里的许多答案(例如,@ agf和@Andbdrew)已经解决了OP问题的最直接方面。

但是,我认为有一个微妙但重要的方面已被很大程度上忽略,这对于像我这样在尝试理解Python编码时最终落到这里的每个人都非常重要:Python 2 vs Python 3字符表示的管理截然不同。我觉得很多困惑与人们在不了解版本的情况下阅读Python编码有关。

我建议有兴趣了解OP问题根本原因的人首先阅读Spolsky对字符表示法和Unicode 介绍,然后转向Python 2和Python 3中的Unicode Batchelder

Many answers here (@agf and @Andbdrew for example) have already addressed the most immediate aspects of the OP question.

However, I think there is one subtle but important aspect that has been largely ignored and that matters dearly for everyone who like me ended up here while trying to make sense of encodings in Python: Python 2 vs Python 3 management of character representation is wildly different. I feel like a big chunk of confusion out there has to do with people reading about encodings in Python without being version aware.

I suggest anyone interested in understanding the root cause of OP problem to begin by reading Spolsky’s introduction to character representations and Unicode and then move to Batchelder on Unicode in Python 2 and Python 3.


回答 21

尽量避免将变量转换为str(variable)。有时,这可能会导致问题。

避免的简单提示:

try: 
    data=str(data)
except:
    data = data #Don't convert to String

上面的示例还将解决Encode错误。

Try to avoid conversion of variable to str(variable). Sometimes, It may cause the issue.

Simple tip to avoid :

try: 
    data=str(data)
except:
    data = data #Don't convert to String

The above example will solve Encode error also.


回答 22

如果您有类似的packet_data = "This is data"操作,请在初始化后立即在下一行执行此操作packet_data

unic = u''
packet_data = unic

If you have something like packet_data = "This is data" then do this on the next line, right after initializing packet_data:

unic = u''
packet_data = unic

回答 23

python 3.0及更高版本的更新。在python编辑器中尝试以下操作:

locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en
LC_ALL=en_US.UTF-8

这会将系统的默认语言环境编码设置为UTF-8格式。

有关更多信息,请参见PEP 538-将传统C语言环境强制为基于UTF-8的语言环境

Update for python 3.0 and later. Try the following in the python editor:

locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en
LC_ALL=en_US.UTF-8

This sets the system`s default locale encoding to the UTF-8 format.

More can be read here at PEP 538 — Coercing the legacy C locale to a UTF-8 based locale.


回答 24

我遇到了尝试将Unicode字符输出到stdout,但使用sys.stdout.write而不是print的问题(这样我也可以支持将输出输出到其他文件)。

从BeautifulSoup自己的文档中,我使用编解码器库解决了此问题:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)

I had this issue trying to output Unicode characters to stdout, but with sys.stdout.write, rather than print (so that I could support output to a different file as well).

From BeautifulSoup’s own documentation, I solved this with the codecs library:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)

回答 25

当使用Apache部署django项目时,经常会发生此问题。因为Apache在/ etc / sysconfig / httpd中设置环境变量LANG = C。只需打开文件并注释(或更改为您的样式)此设置即可。或使用WSGIDaemonProcess命令的lang选项,在这种情况下,您将能够为不同的虚拟主机设置不同的LANG环境变量。

This problem often happens when a django project deploys using Apache. Because Apache sets environment variable LANG=C in /etc/sysconfig/httpd. Just open the file and comment (or change to your flavior) this setting. Or use the lang option of the WSGIDaemonProcess command, in this case you will be able to set different LANG environment variable to different virtualhosts.


回答 26

推荐的解决方案对我不起作用,我可以忍受所有非ascii字符的转储,因此

s = s.encode('ascii',errors='ignore')

这给我留下了不会抛出错误的东西。

The recommended solution did not work for me, and I could live with dumping all non ascii characters, so

s = s.encode('ascii',errors='ignore')

which left me with something stripped that doesn’t throw errors.


回答 27

这将起作用:

 >>>print(unicodedata.normalize('NFD', re.sub("[\(\[].*?[\)\]]", "", "bats\xc3\xa0")).encode('ascii', 'ignore'))

输出:

>>>bats

This will work:

 >>>print(unicodedata.normalize('NFD', re.sub("[\(\[].*?[\)\]]", "", "bats\xc3\xa0")).encode('ascii', 'ignore'))

Output:

>>>bats

对象名称前的单下划线和双下划线是什么意思?

问题:对象名称前的单下划线和双下划线是什么意思?

有人可以解释一下在Python中对象名称前加下划线的确切含义,以及两者之间的区别吗?

此外,无论所讨论的对象是变量,函数,方法等,该含义是否都保持不变?

Can someone please explain the exact meaning of having leading underscores before an object’s name in Python, and the difference between both?

Also, does that meaning stay the same whether the object in question is a variable, a function, a method, etc.?


回答 0

单下划线

类中带有下划线的名称仅是为了向其他程序员表明该属性或方法旨在私有。但是,名称本身并没有做任何特别的事情。

引用PEP-8

_single_leading_underscore:“内部使用”指标较弱。例如from M import *,不导入名称以下划线开头的对象。

双下划线(名称改写)

Python文档

形式上的任何标识符__spam(至少两个前导下划线,至多一个尾随下划线)在文本上均替换为_classname__spam,其中classname是当前类名,其中已删除前导下划线。进行这种修饰时无需考虑标识符的语法位置,因此可用于定义类私有实例和类变量,方法,全局变量中存储的变量,甚至实例中存储的变量。在其他类的实例上对此类私有。

和来自同一页面的警告:

名称修饰旨在为类提供一种轻松的方法来定义“私有”实例变量和方法,而不必担心派生类定义的实例变量或类外部代码对实例变量的处理。请注意,修改规则主要是为了避免发生意外。坚定的灵魂仍然有可能访问或修改被视为私有的变量。

>>> class MyClass():
...     def __init__(self):
...             self.__superprivate = "Hello"
...             self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: myClass instance has no attribute '__superprivate'
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{'_MyClass__superprivate': 'Hello', '_semiprivate': ', world!'}

Single Underscore

Names, in a class, with a leading underscore are simply to indicate to other programmers that the attribute or method is intended to be private. However, nothing special is done with the name itself.

To quote PEP-8:

_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.

Double Underscore (Name Mangling)

From the Python docs:

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even variables stored in instances. private to this class on instances of other classes.

And a warning from the same page:

Name mangling is intended to give classes an easy way to define “private” instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private.

Example

>>> class MyClass():
...     def __init__(self):
...             self.__superprivate = "Hello"
...             self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: myClass instance has no attribute '__superprivate'
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{'_MyClass__superprivate': 'Hello', '_semiprivate': ', world!'}

回答 1

到目前为止,很好的答案,但缺少一些花絮。单个前导下划线并不仅仅是一个约定:如果使用from foobar import *,并且模块foobar未定义__all__列表,则从模块导入的名称将包括带有下划线的名称。可以说这主要是一个约定,因为这种情况是一个相当模糊的角落;-)。

领先的下划线惯例被广泛用于不只是私人的名字,而且对什么C ++称之为保护的-例如,都完全的方法名称由子类(甚至是那些覆盖被否决,因为在它们的基类raise NotImplementedError!-)通常是单引号下划线名称,用于指示使用该类(或子类)的实例进行编码时,这些方法并不旨在被直接调用。

例如,要创建一个与FIFO不同的排队规则的线程安全队列,可以导入Queue,将Queue.Queue子类化,并覆盖诸如_get_put;之类的方法。“客户端代码”从不调用那些(“挂钩”)方法,而是调用(和“组织”)公共方法,例如putget(称为模板方法设计模式),例如,有关基于视频的有趣演示,请参见此处。我关于这个话题的演讲,以及抄本的提要)。

编辑:演讲说明中的视频链接现在已断开。您可以在此处此处找到前两个视频。

Excellent answers so far but some tidbits are missing. A single leading underscore isn’t exactly just a convention: if you use from foobar import *, and module foobar does not define an __all__ list, the names imported from the module do not include those with a leading underscore. Let’s say it’s mostly a convention, since this case is a pretty obscure corner;-).

The leading-underscore convention is widely used not just for private names, but also for what C++ would call protected ones — for example, names of methods that are fully intended to be overridden by subclasses (even ones that have to be overridden since in the base class they raise NotImplementedError!-) are often single-leading-underscore names to indicate to code using instances of that class (or subclasses) that said methods are not meant to be called directly.

For example, to make a thread-safe queue with a different queueing discipline than FIFO, one imports Queue, subclasses Queue.Queue, and overrides such methods as _get and _put; “client code” never calls those (“hook”) methods, but rather the (“organizing”) public methods such as put and get (this is known as the Template Method design pattern — see e.g. here for an interesting presentation based on a video of a talk of mine on the subject, with the addition of synopses of the transcript).

Edit: The video links in the description of the talks are now broken. You can find the first two videos here and here.


回答 2

__foo__:这只是一个约定,这是Python系统使用不会与用户名冲突的名称的一种方式。

_foo:这只是一个约定,是程序员用来表明变量是私有的(无论在Python中是什么意思)的一种方式。

__foo:这具有实际含义:解释器将其替换为,_classname__foo以确保该名称不会与另一个类中的相似名称重叠。

在Python世界中,没有其他形式的下划线具有意义。

在这些约定中,类,变量,全局等之间没有区别。

__foo__: this is just a convention, a way for the Python system to use names that won’t conflict with user names.

_foo: this is just a convention, a way for the programmer to indicate that the variable is private (whatever that means in Python).

__foo: this has real meaning: the interpreter replaces this name with _classname__foo as a way to ensure that the name will not overlap with a similar name in another class.

No other form of underscores have meaning in the Python world.

There’s no difference between class, variable, global, etc in these conventions.


回答 3

._variable 是半私有的,仅用于约定

.__variable通常被错误地认为是超私有的,而其实际含义只是为了防止意外访问而使用 namemangle [1]

.__variable__ 通常为内置方法或变量保留

您仍然可以.__mangled根据需要访问变量。双下划线只是将变量命名或重命名为类似instance._className__mangled

例:

class Test(object):
    def __init__(self):
        self.__a = 'a'
        self._b = 'b'

>>> t = Test()
>>> t._b
'b'

t._b是可访问的,因为它仅被约定隐藏

>>> t.__a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__a'

找不到t .__ a,因为由于名称修改而不再存在

>>> t._Test__a
'a'

通过访问instance._className__variable而不只是双下划线名称,您可以访问隐藏值

._variable is semiprivate and meant just for convention

.__variable is often incorrectly considered superprivate, while it’s actual meaning is just to namemangle to prevent accidental access[1]

.__variable__ is typically reserved for builtin methods or variables

You can still access .__mangled variables if you desperately want to. The double underscores just namemangles, or renames, the variable to something like instance._className__mangled

Example:

class Test(object):
    def __init__(self):
        self.__a = 'a'
        self._b = 'b'

>>> t = Test()
>>> t._b
'b'

t._b is accessible because it is only hidden by convention

>>> t.__a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__a'

t.__a isn’t found because it no longer exists due to namemangling

>>> t._Test__a
'a'

By accessing instance._className__variable instead of just the double underscore name, you can access the hidden value


回答 4

开头下划线:

Python没有真正的私有方法。相反,在方法或属性名称的开头加下划线表示您不应访问此方法,因为它不是API的一部分。

class BaseForm(StrAndUnicode):

    def _get_errors(self):
        "Returns an ErrorDict for the data provided for the form"
        if self._errors is None:
            self.full_clean()
        return self._errors

    errors = property(_get_errors)

(此代码段摘自django源代码:django / forms / forms.py)。在这段代码中,它errors是一个公共属性,但是此属性调用的方法_get_errors是“私有”的,因此您不应访问它。

一开始有两个下划线:

这引起很多混乱。不应将其用于创建私有方法。应该使用它来避免您的方法被子类覆盖或意外访问。让我们来看一个例子:

class A(object):
    def __test(self):
        print "I'm a test method in class A"

    def test(self):
        self.__test()

a = A()
a.test()
# a.__test() # This fails with an AttributeError
a._A__test() # Works! We can access the mangled name directly!

输出:

$ python test.py
I'm test method in class A
I'm test method in class A

现在创建一个子类B并为__test方法进行自定义

class B(A):
    def __test(self):
        print "I'm test method in class B"

b = B()
b.test()

输出将是…。

$ python test.py
I'm test method in class A

如我们所见,A.test()没有像我们期望的那样调用B .__ test()方法。但实际上,这是__的正确行为。这两个称为__test()的方法会自动重命名(混合)为_A__test()和_B__test(),因此不会意外覆盖它们。当您创建以__开头的方法时,这意味着您不希望任何人都可以覆盖它,而只打算从其自己的类内部访问它。

开头和结尾有两个下划线:

当我们看到类似的方法时__this__,请勿调用它。这是python而不是您要调用的方法。让我们来看看:

>>> name = "test string"
>>> name.__len__()
11
>>> len(name)
11

>>> number = 10
>>> number.__add__(40)
50
>>> number + 50
60

总是有一个调用这些魔术方法的运算符或本机函数。有时,在特定情况下,这只是python的一个钩子调用。例如__init__()在创建对象之后创建对象时__new__()调用,以构建实例…

让我们举个例子…

class FalseCalculator(object):

    def __init__(self, number):
        self.number = number

    def __add__(self, number):
        return self.number - number

    def __sub__(self, number):
        return self.number + number

number = FalseCalculator(20)
print number + 10      # 10
print number - 20      # 40

有关更多详细信息,请参阅PEP-8指南。有关更多魔术方法,请参见此PDF

Single underscore at the beginning:

Python doesn’t have real private methods. Instead, one underscore at the start of a method or attribute name means you shouldn’t access this method, because it’s not part of the API.

class BaseForm(StrAndUnicode):

    def _get_errors(self):
        "Returns an ErrorDict for the data provided for the form"
        if self._errors is None:
            self.full_clean()
        return self._errors

    errors = property(_get_errors)

(This code snippet was taken from django source code: django/forms/forms.py). In this code, errors is a public property, but the method this property calls, _get_errors, is “private”, so you shouldn’t access it.

Two underscores at the beginning:

This causes a lot of confusion. It should not be used to create a private method. It should be used to avoid your method being overridden by a subclass or accessed accidentally. Let’s see an example:

class A(object):
    def __test(self):
        print "I'm a test method in class A"

    def test(self):
        self.__test()

a = A()
a.test()
# a.__test() # This fails with an AttributeError
a._A__test() # Works! We can access the mangled name directly!

Output:

$ python test.py
I'm test method in class A
I'm test method in class A

Now create a subclass B and do customization for __test method

class B(A):
    def __test(self):
        print "I'm test method in class B"

b = B()
b.test()

Output will be….

$ python test.py
I'm test method in class A

As we have seen, A.test() didn’t call B.__test() methods, as we might expect. But in fact, this is the correct behavior for __. The two methods called __test() are automatically renamed (mangled) to _A__test() and _B__test(), so they do not accidentally override. When you create a method starting with __ it means that you don’t want to anyone to be able to override it, and you only intend to access it from inside its own class.

Two underscores at the beginning and at the end:

When we see a method like __this__, don’t call it. This is a method which python is meant to call, not you. Let’s take a look:

>>> name = "test string"
>>> name.__len__()
11
>>> len(name)
11

>>> number = 10
>>> number.__add__(40)
50
>>> number + 50
60

There is always an operator or native function which calls these magic methods. Sometimes it’s just a hook python calls in specific situations. For example __init__() is called when the object is created after __new__() is called to build the instance…

Let’s take an example…

class FalseCalculator(object):

    def __init__(self, number):
        self.number = number

    def __add__(self, number):
        return self.number - number

    def __sub__(self, number):
        return self.number + number

number = FalseCalculator(20)
print number + 10      # 10
print number - 20      # 40

For more details, see the PEP-8 guide. For more magic methods, see this PDF.


回答 5

有时您看起来像是一个带下划线的元组,如下所示

def foo(bar):
    return _('my_' + bar)

在这种情况下,正在发生的情况是_()是本地化功能的别名,该功能基于文本在文本上进行操作以将其放入适当的语言等。例如,Sphinx执行此操作,您将在导入文件中找到

from sphinx.locale import l_, _

在sphinx.locale中,_()被分配为某些本地化功能的别名。

Sometimes you have what appears to be a tuple with a leading underscore as in

def foo(bar):
    return _('my_' + bar)

In this case, what’s going on is that _() is an alias for a localization function that operates on text to put it into the proper language, etc. based on the locale. For example, Sphinx does this, and you’ll find among the imports

from sphinx.locale import l_, _

and in sphinx.locale, _() is assigned as an alias of some localization function.


回答 6

既然有这么多人在谈论雷蒙德的演讲,我将写下他的话简化一下:

双下划线的目的不是关于隐私。目的是像这样完全使用它

class Circle(object):

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        p = self.__perimeter()
        r = p / math.pi / 2.0
        return math.pi * r ** 2.0

    def perimeter(self):
        return 2.0 * math.pi * self.radius

    __perimeter = perimeter  # local reference


class Tire(Circle):

    def perimeter(self):
        return Circle.perimeter(self) * 1.25

实际上,这是隐私的对立面,它与自由有关。它使您的子类可以随意覆盖任何一种方法而不会破坏其他方法

假设您没有保留perimeterin 的本地引用Circle。现在,派生类将Tire覆盖实现perimeter,而无需触摸areaTire(5).area()从理论上讲,当您调用时,它仍Circle.perimeter应用于计算,但实际上,它正在使用Tire.perimeter,这不是预期的行为。这就是为什么我们在Circle中需要本地引用。

但是为什么要__perimeter代替_perimeter呢?因为_perimeter仍然给派生类重写的机会:

class Tire(Circle):

    def perimeter(self):
        return Circle.perimeter(self) * 1.25

    _perimeter = perimeter

双下划线具有名称修饰,因此父类中的本地引用在派生类中被覆盖的可能性很小。因此“ 使子类可以自由地覆盖任何一种方法而不会破坏其他方法 ”。

如果您的类不会被继承,或者方法重写不会破坏任何内容,那么您根本不需要__double_leading_underscore

Since so many people are referring to Raymond’s talk, I’ll just make it a little easier by writing down what he said:

The intention of the double underscores was not about privacy. The intention was to use it exactly like this

class Circle(object):

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        p = self.__perimeter()
        r = p / math.pi / 2.0
        return math.pi * r ** 2.0

    def perimeter(self):
        return 2.0 * math.pi * self.radius

    __perimeter = perimeter  # local reference


class Tire(Circle):

    def perimeter(self):
        return Circle.perimeter(self) * 1.25

It’s actually the opposite of privacy, it’s all about freedom. It makes your subclasses free to override any one method without breaking the others.

Say you don’t keep a local reference of perimeter in Circle. Now, a derived class Tire overrides the implementation of perimeter, without touching area. When you call Tire(5).area(), in theory it should still be using Circle.perimeter for computation, but in reality it’s using Tire.perimeter, which is not the intended behavior. That’s why we need a local reference in Circle.

But why __perimeter instead of _perimeter? Because _perimeter still gives derived class the chance to override:

class Tire(Circle):

    def perimeter(self):
        return Circle.perimeter(self) * 1.25

    _perimeter = perimeter

Double underscores has name mangling, so there’s a very little chance that the local reference in parent class get override in derived class. thus “makes your subclasses free to override any one method without breaking the others“.

If your class won’t be inherited, or method overriding does not break anything, then you simply don’t need __double_leading_underscore.


回答 7

如果您真的想将变量设为只读,恕我直言,最好的方法是使用仅传递有getter的property()。使用property(),我们可以完全控制数据。

class PrivateVarC(object):

    def get_x(self):
        pass

    def set_x(self, val):
        pass

    rwvar = property(get_p, set_p)  

    ronly = property(get_p) 

我知道OP提出了一个稍有不同的问题,但是由于我发现了另一个问题,询问“如何设置私有变量”与此标记为重复,因此我想在此处添加此附加信息。

If one really wants to make a variable read-only, IMHO the best way would be to use property() with only getter passed to it. With property() we can have complete control over the data.

class PrivateVarC(object):

    def get_x(self):
        pass

    def set_x(self, val):
        pass

    rwvar = property(get_p, set_p)  

    ronly = property(get_p) 

I understand that OP asked a little different question but since I found another question asking for ‘how to set private variables’ marked duplicate with this one, I thought of adding this additional info here.


回答 8

进入https://dbader.org/blog/含义-underscores-in-python

  • 单个前导下划线(_var):表示名称的命名约定仅供内部使用。通常不由Python解释器强制执行(通配符导入除外),并且仅作为对程序员的提示。
  • 单个尾随下划线(var_):按惯例用于避免与Python关键字命名冲突。
  • 双引号下划线(__var):在类上下文中使用时触发名称修饰。由Python解释器实施。
  • 前后双下划线(__var__):表示Python语言定义的特殊方法。避免为您自己的属性使用此命名方案。
  • 单个下划线(_):有时用作临时或无关紧要的变量的名称(“无关紧要”)。另外:Python REPL中最后一个表达式的结果。

Accroding to https://dbader.org/blog/meaning-of-underscores-in-python

  • Single Leading Underscore(_var): Naming convention indicating a name is meant for internal use. Generally not enforced by the Python interpreter (except in wildcard imports) and meant as a hint to the programmer only.
  • Single Trailing Underscore(var_): Used by convention to avoid naming conflicts with Python keywords.
  • Double Leading Underscore(__var): Triggers name mangling when used in a class context. Enforced by the Python interpreter.
  • Double Leading and Trailing Underscore(__var__): Indicates special methods defined by the Python language. Avoid this naming scheme for your own attributes.
  • Single Underscore(_): Sometimes used as a name for temporary or insignificant variables (“don’t care”). Also: The result of the last expression in a Python REPL.

回答 9

单个下划线是惯例。名称是否以单个下划线开头,与解释器的观点没有区别。

双开头和结尾的下划线用于内置的方法,如__init____bool__

双下划线开头W / O尾随同行是惯例也是如此,但是,该类方法将被错位的解释。对于变量或基本函数名称,没有区别。

Single leading underscores is a convention. there is no difference from the interpreter’s point of view if whether names starts with a single underscore or not.

Double leading and trailing underscores are used for built-in methods, such as __init__, __bool__, etc.

Double leading underscores w/o trailing counterparts are a convention too, however, the class methods will be mangled by the interpreter. For variables or basic function names no difference exists.


回答 10

好的答案都正确。我提供了简单的示例以及简单的定义/含义。

含义:

some_variable–►任何人都可以看到这是公开的。

_some_variable–►任何人都可以看到这是公开的,但这是表示私有的惯例… 警告 Python不会执行任何强制执行。

__some_varaible–►Python将变量名替换为_classname__some_varaible(又名名称改写),它降低/隐藏了它的可见性,更像是私有变量。

坦白地说,根据Python文档

“只能从对象内部访问的“私有”实例变量在Python中不存在”

这个例子:

class A():
    here="abc"
    _here="_abc"
    __here="__abc"


aObject=A()
print(aObject.here) 
print(aObject._here)
# now if we try to print __here then it will fail because it's not public variable 
#print(aObject.__here)

Great answers and all are correct.I have provided simple example along with simple definition/meaning.

Meaning:

some_variable –► it’s public anyone can see this.

_some_variable –► it’s public anyone can see this but it’s a convention to indicate private…warning no enforcement is done by Python.

__some_varaible –► Python replaces the variable name with _classname__some_varaible (AKA name mangling) and it reduces/hides it’s visibility and be more like private variable.

Just to be honest here According to Python documentation

““Private” instance variables that cannot be accessed except from inside an object don’t exist in Python”

The example:

class A():
    here="abc"
    _here="_abc"
    __here="__abc"


aObject=A()
print(aObject.here) 
print(aObject._here)
# now if we try to print __here then it will fail because it's not public variable 
#print(aObject.__here)

回答 11

您的问题很好,不仅是方法。模块中的函数和对象通常也以一个下划线为前缀,并且可以以两个为前缀。

但是,例如,__double_underscore名称在模块中没有名称混杂。发生的情况是,如果您从模块(从module import *)全部导入,则不会导入以一个(或多个)下划线开头的名称,也不会在help(module)中显示名称。

Your question is good, it is not only about methods. Functions and objects in modules are commonly prefixed with one underscore as well, and can be prefixed by two.

But __double_underscore names are not name-mangled in modules, for example. What happens is that names beginning with one (or more) underscores are not imported if you import all from a module (from module import *), nor are the names shown in help(module).


回答 12

这是一个关于双下划线属性如何影响继承的类的简单说明性示例。因此,使用以下设置:

class parent(object):
    __default = "parent"
    def __init__(self, name=None):
        self.default = name or self.__default

    @property
    def default(self):
        return self.__default

    @default.setter
    def default(self, value):
        self.__default = value


class child(parent):
    __default = "child"

如果您随后在python REPL中创建一个子实例,则会看到以下内容

child_a = child()
child_a.default            # 'parent'
child_a._child__default    # 'child'
child_a._parent__default   # 'parent'

child_b = child("orphan")
## this will show 
child_b.default            # 'orphan'
child_a._child__default    # 'child'
child_a._parent__default   # 'orphan'

这对于某些人可能是显而易见的,但是在更复杂的环境中使我措手不及

Here is a simple illustrative example on how double underscore properties can affect an inherited class. So with the following setup:

class parent(object):
    __default = "parent"
    def __init__(self, name=None):
        self.default = name or self.__default

    @property
    def default(self):
        return self.__default

    @default.setter
    def default(self, value):
        self.__default = value


class child(parent):
    __default = "child"

if you then create a child instance in the python REPL, you will see the below

child_a = child()
child_a.default            # 'parent'
child_a._child__default    # 'child'
child_a._parent__default   # 'parent'

child_b = child("orphan")
## this will show 
child_b.default            # 'orphan'
child_a._child__default    # 'child'
child_a._parent__default   # 'orphan'

This may be obvious to some, but it caught me off guard in a much more complex environment


回答 13

Python中不存在只能从对象内部访问的“私有”实例变量。但是,大多数Python代码遵循一个约定:以下划线开头的名称(例如_spam)应被视为API的非公开部分(无论是函数,方法还是数据成员) 。它应被视为实现细节,如有更改,恕不另行通知。

参考 https://docs.python.org/2/tutorial/classes.html#private-variables-and-class-local-references

“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.

reference https://docs.python.org/2/tutorial/classes.html#private-variables-and-class-local-references


回答 14

得到_和__的事实非常容易。其他答案很好地表达了他们。使用情况很难确定。

这是我的看法:

_

应该用于表示某个功能不公开使用,例如API。这和导入限制使它的行为很像internal在c#中。

__

应该用来避免继承层次结构中的名称冲突以及避免后期绑定。就像C#中的private。

==>

如果您想指出某事不是公共用途,而是应该像protecteduse 一样_。如果您想指出某事不是公共用途,而是应该像privateuse 一样__

这也是我非常喜欢的报价:

问题在于,类的作者可能会合理地认为“此属性/方法名称应为私有的,只能从该类定义中访问”,并使用__private约定。但是稍后,该类的用户可能会创建合法需要访问该名称的子类。因此,要么必须修改超类(可能很难或不可能),要么子类代码必须使用手动修改的名称(充其量是丑陋和脆弱的)。

但是我的问题是,如果没有IDE在您重写方法时警告您,如果您意外地从基类中重写了一个方法,则发现错误可能会花费您一段时间。

Getting the facts of _ and __ is pretty easy; the other answers express them pretty well. The usage is much harder to determine.

This is how I see it:

_

Should be used to indicate that a function is not for public use as for example an API. This and the import restriction make it behave much like internal in c#.

__

Should be used to avoid name collision in the inheritace hirarchy and to avoid latebinding. Much like private in c#.

==>

If you want to indicate that something is not for public use, but it should act like protected use _. If you want to indicate that something is not for public use, but it should act like private use __.

This is also a quote that I like very much:

The problem is that the author of a class may legitimately think “this attribute/method name should be private, only accessible from within this class definition” and use the __private convention. But later on, a user of that class may make a subclass that legitimately needs access to that name. So either the superclass has to be modified (which may be difficult or impossible), or the subclass code has to use manually mangled names (which is ugly and fragile at best).

But the problem with that is in my opinion that if there’s no IDE that warns you when you override methods, finding the error might take you a while if you have accidentially overriden a method from a base-class.


在现代Python中声明自定义异常的正确方法?

问题:在现代Python中声明自定义异常的正确方法?

在现代Python中声明自定义异常类的正确方法是什么?我的主要目标是遵循其他异常类具有的任何标准,以便(例如)我捕获到异常中的任何工具都会打印出我包含在异常中的任何多余字符串。

“现代Python”是指可以在Python 2.5中运行但对于Python 2.6和Python 3. *是“正确”的方式。所谓“自定义”,是指一个Exception对象,该对象可以包含有关错误原因的其他数据:字符串,也可以是与该异常相关的其他任意对象。

我在Python 2.6.2中被以下弃用警告绊倒了:

>>> class MyError(Exception):
...     def __init__(self, message):
...         self.message = message
... 
>>> MyError("foo")
_sandbox.py:3: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6

BaseException对于名为的属性有特殊含义似乎很疯狂message。我从PEP-352收集到,该属性确实在2.5中有特殊含义,因此他们想弃用该属性,所以我想现在禁止使用该名称了(并且一个人)。啊。

我也模糊地意识到它Exception具有一些不可思议的参数args,但是我从未知道如何使用它。我也不确定这是前进的正确方法。我在网上发现的很多讨论都表明他们正在尝试消除Python 3中的args。

更新:有两个答案建议覆盖__init__,和__str__/ __unicode__/ __repr__。好像要打字很多,有必要吗?

What’s the proper way to declare custom exception classes in modern Python? My primary goal is to follow whatever standard other exception classes have, so that (for instance) any extra string I include in the exception is printed out by whatever tool caught the exception.

By “modern Python” I mean something that will run in Python 2.5 but be ‘correct’ for the Python 2.6 and Python 3.* way of doing things. And by “custom” I mean an Exception object that can include extra data about the cause of the error: a string, maybe also some other arbitrary object relevant to the exception.

I was tripped up by the following deprecation warning in Python 2.6.2:

>>> class MyError(Exception):
...     def __init__(self, message):
...         self.message = message
... 
>>> MyError("foo")
_sandbox.py:3: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6

It seems crazy that BaseException has a special meaning for attributes named message. I gather from PEP-352 that attribute did have a special meaning in 2.5 they’re trying to deprecate away, so I guess that name (and that one alone) is now forbidden? Ugh.

I’m also fuzzily aware that Exception has some magic parameter args, but I’ve never known how to use it. Nor am I sure it’s the right way to do things going forward; a lot of the discussion I found online suggested they were trying to do away with args in Python 3.

Update: two answers have suggested overriding __init__, and __str__/__unicode__/__repr__. That seems like a lot of typing, is it necessary?


回答 0

也许我错过了这个问题,但是为什么不呢?

class MyException(Exception):
    pass

编辑:要覆盖某些内容(或传递额外的args),请执行以下操作:

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super(ValidationError, self).__init__(message)

        # Now for your custom code...
        self.errors = errors

这样,您可以将错误消息的字典传递给第二个参数,并在以后使用 e.errors


Python 3更新:在Python 3+中,您可以使用以下更紧凑的用法super()

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super().__init__(message)

        # Now for your custom code...
        self.errors = errors

Maybe I missed the question, but why not:

class MyException(Exception):
    pass

Edit: to override something (or pass extra args), do this:

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super(ValidationError, self).__init__(message)

        # Now for your custom code...
        self.errors = errors

That way you could pass dict of error messages to the second param, and get to it later with e.errors


Python 3 Update: In Python 3+, you can use this slightly more compact use of super():

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super().__init__(message)

        # Now for your custom code...
        self.errors = errors

回答 1

随着现代Python的exceptions,你并不需要滥用.message,或覆盖.__str__().__repr__()或任何它。如果在引发异常时,您所希望的只是一条提示性消​​息,请执行以下操作:

class MyException(Exception):
    pass

raise MyException("My hovercraft is full of eels")

这将提供以结尾的回溯MyException: My hovercraft is full of eels

如果您希望从异常中获得更大的灵活性,则可以传递一个字典作为参数:

raise MyException({"message":"My hovercraft is full of animals", "animal":"eels"})

但是,要在一个except块中获得这些详细信息则要复杂一些。详细信息存储在args列表中的属性中。您将需要执行以下操作:

try:
    raise MyException({"message":"My hovercraft is full of animals", "animal":"eels"})
except MyException as e:
    details = e.args[0]
    print(details["animal"])

仍然可以将多个项目传递给异常并通过元组索引访问它们,但是强烈建议不要这样做(甚至是打算在不久后弃用)。如果确实需要多个信息,而上述方法不足以满足您的要求,则应Exception按照本教程中的描述进行子类化。

class MyError(Exception):
    def __init__(self, message, animal):
        self.message = message
        self.animal = animal
    def __str__(self):
        return self.message

With modern Python Exceptions, you don’t need to abuse .message, or override .__str__() or .__repr__() or any of it. If all you want is an informative message when your exception is raised, do this:

class MyException(Exception):
    pass

raise MyException("My hovercraft is full of eels")

That will give a traceback ending with MyException: My hovercraft is full of eels.

If you want more flexibility from the exception, you could pass a dictionary as the argument:

raise MyException({"message":"My hovercraft is full of animals", "animal":"eels"})

However, to get at those details in an except block is a bit more complicated. The details are stored in the args attribute, which is a list. You would need to do something like this:

try:
    raise MyException({"message":"My hovercraft is full of animals", "animal":"eels"})
except MyException as e:
    details = e.args[0]
    print(details["animal"])

It is still possible to pass in multiple items to the exception and access them via tuple indexes, but this is highly discouraged (and was even intended for deprecation a while back). If you do need more than a single piece of information and the above method is not sufficient for you, then you should subclass Exception as described in the tutorial.

class MyError(Exception):
    def __init__(self, message, animal):
        self.message = message
        self.animal = animal
    def __str__(self):
        return self.message

回答 2

“在现代Python中声明自定义异常的正确方法?”

很好,除非您的异常确实是更具体的异常的一种:

class MyException(Exception):
    pass

或者更好(也许是完美的),而不是pass提供一个文档字符串:

class MyException(Exception):
    """Raise for my specific kind of exception"""

子类化异常子类

来自文档

Exception

所有内置的,非系统退出的异常都派生自此类。所有用户定义的异常也应从此类派生。

这意味着,如果您的异常是一种更具体的异常,请将该异常归为子类,而不是泛型Exception(其结果将是您仍然Exception按照文档建议的方式派生)。另外,您至少可以提供一个文档字符串(并且不被迫使用pass关键字):

class MyAppValueError(ValueError):
    '''Raise when my specific value is wrong'''

设置可使用custom创建自己的属性__init__。避免将dict作为位置参数传递,以后您的代码用户将感谢您。如果您使用不推荐使用的message属性,则自行分配该属性将避免出现DeprecationWarning

class MyAppValueError(ValueError):
    '''Raise when a specific subset of values in context of app is wrong'''
    def __init__(self, message, foo, *args):
        self.message = message # without this you may get DeprecationWarning
        # Special attribute you desire with your Error, 
        # perhaps the value that caused the error?:
        self.foo = foo         
        # allow users initialize misc. arguments as any other builtin Error
        super(MyAppValueError, self).__init__(message, foo, *args) 

确实不需要自己编写__str____repr__。内置的非常好,您的合作继承可确保您使用它。

批判最佳答案

也许我错过了这个问题,但是为什么不呢?

class MyException(Exception):
    pass

同样,上面的问题是,要捕获它,您必须专门为其命名(如果在其他位置创建,则将其导入)或捕获Exception(但您可能不准备处理所有类型的Exception,并且您应该只捕获您准备处理的异常)。与以下内容类似的批评,但除此之外,这不是通过进行初始化的方式superDeprecationWarning如果您访问message属性,则会得到一个:

编辑:要覆盖某些内容(或传递额外的args),请执行以下操作:

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super(ValidationError, self).__init__(message)

        # Now for your custom code...
        self.errors = errors

这样,您可以将错误消息的字典传递给第二个参数,并在以后使用e.errors到达它。

它也需要传入两个参数(self。除外)。这是一个有趣的约束,将来的用户可能不会欣赏。

直截了当-它违反了Liskov的可替代性

我将演示两个错误:

>>> ValidationError('foo', 'bar', 'baz').message

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    ValidationError('foo', 'bar', 'baz').message
TypeError: __init__() takes exactly 3 arguments (4 given)

>>> ValidationError('foo', 'bar').message
__main__:1: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
'foo'

相比:

>>> MyAppValueError('foo', 'FOO', 'bar').message
'foo'

“Proper way to declare custom exceptions in modern Python?”

This is fine, unless your exception is really a type of a more specific exception:

class MyException(Exception):
    pass

Or better (maybe perfect), instead of pass give a docstring:

class MyException(Exception):
    """Raise for my specific kind of exception"""

Subclassing Exception Subclasses

From the docs

Exception

All built-in, non-system-exiting exceptions are derived from this class. All user-defined exceptions should also be derived from this class.

That means that if your exception is a type of a more specific exception, subclass that exception instead of the generic Exception (and the result will be that you still derive from Exception as the docs recommend). Also, you can at least provide a docstring (and not be forced to use the pass keyword):

class MyAppValueError(ValueError):
    '''Raise when my specific value is wrong'''

Set attributes you create yourself with a custom __init__. Avoid passing a dict as a positional argument, future users of your code will thank you. If you use the deprecated message attribute, assigning it yourself will avoid a DeprecationWarning:

class MyAppValueError(ValueError):
    '''Raise when a specific subset of values in context of app is wrong'''
    def __init__(self, message, foo, *args):
        self.message = message # without this you may get DeprecationWarning
        # Special attribute you desire with your Error, 
        # perhaps the value that caused the error?:
        self.foo = foo         
        # allow users initialize misc. arguments as any other builtin Error
        super(MyAppValueError, self).__init__(message, foo, *args) 

There’s really no need to write your own __str__ or __repr__. The builtin ones are very nice, and your cooperative inheritance ensures that you use it.

Critique of the top answer

Maybe I missed the question, but why not:

class MyException(Exception):
    pass

Again, the problem with the above is that in order to catch it, you’ll either have to name it specifically (importing it if created elsewhere) or catch Exception, (but you’re probably not prepared to handle all types of Exceptions, and you should only catch exceptions you are prepared to handle). Similar criticism to the below, but additionally that’s not the way to initialize via super, and you’ll get a DeprecationWarning if you access the message attribute:

Edit: to override something (or pass extra args), do this:

class ValidationError(Exception):
    def __init__(self, message, errors):

        # Call the base class constructor with the parameters it needs
        super(ValidationError, self).__init__(message)

        # Now for your custom code...
        self.errors = errors

That way you could pass dict of error messages to the second param, and get to it later with e.errors

It also requires exactly two arguments to be passed in (aside from the self.) No more, no less. That’s an interesting constraint that future users may not appreciate.

To be direct – it violates Liskov substitutability.

I’ll demonstrate both errors:

>>> ValidationError('foo', 'bar', 'baz').message

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    ValidationError('foo', 'bar', 'baz').message
TypeError: __init__() takes exactly 3 arguments (4 given)

>>> ValidationError('foo', 'bar').message
__main__:1: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
'foo'

Compared to:

>>> MyAppValueError('foo', 'FOO', 'bar').message
'foo'

回答 3

见异常缺省情况下是如何工作的,如果一个VS多个属性使用(回溯略):

>>> raise Exception('bad thing happened')
Exception: bad thing happened

>>> raise Exception('bad thing happened', 'code is broken')
Exception: ('bad thing happened', 'code is broken')

因此,您可能需要一种“ 异常模板 ”,以兼容的方式作为异常本身工作:

>>> nastyerr = NastyError('bad thing happened')
>>> raise nastyerr
NastyError: bad thing happened

>>> raise nastyerr()
NastyError: bad thing happened

>>> raise nastyerr('code is broken')
NastyError: ('bad thing happened', 'code is broken')

使用此子类可以轻松完成此操作

class ExceptionTemplate(Exception):
    def __call__(self, *args):
        return self.__class__(*(self.args + args))
# ...
class NastyError(ExceptionTemplate): pass

如果您不喜欢默认的类似元组的表示形式,只需将__str__方法添加到ExceptionTemplate类中,例如:

    # ...
    def __str__(self):
        return ': '.join(self.args)

然后你会

>>> raise nastyerr('code is broken')
NastyError: bad thing happened: code is broken

see how exceptions work by default if one vs more attributes are used (tracebacks omitted):

>>> raise Exception('bad thing happened')
Exception: bad thing happened

>>> raise Exception('bad thing happened', 'code is broken')
Exception: ('bad thing happened', 'code is broken')

so you might want to have a sort of “exception template“, working as an exception itself, in a compatible way:

>>> nastyerr = NastyError('bad thing happened')
>>> raise nastyerr
NastyError: bad thing happened

>>> raise nastyerr()
NastyError: bad thing happened

>>> raise nastyerr('code is broken')
NastyError: ('bad thing happened', 'code is broken')

this can be done easily with this subclass

class ExceptionTemplate(Exception):
    def __call__(self, *args):
        return self.__class__(*(self.args + args))
# ...
class NastyError(ExceptionTemplate): pass

and if you don’t like that default tuple-like representation, just add __str__ method to the ExceptionTemplate class, like:

    # ...
    def __str__(self):
        return ': '.join(self.args)

and you’ll have

>>> raise nastyerr('code is broken')
NastyError: bad thing happened: code is broken

回答 4

从Python 3.8(2018,https://docs.python.org/dev/whatsnew/3.8.html)开始,推荐的方法仍然是:

class CustomExceptionName(Exception):
    """Exception raised when very uncommon things happen"""
    pass

请不要忘记记录文档,为什么需要自定义异常!

如果需要,这是处理包含更多数据的异常的方法:

class CustomExceptionName(Exception):
    """Still an exception raised when uncommon things happen"""
    def __init__(self, message, payload=None):
        self.message = message
        self.payload = payload # you could add more args
    def __str__(self):
        return str(self.message) # __str__() obviously expects a string to be returned, so make sure not to send any other data types

并像这样获取它们:

try:
    raise CustomExceptionName("Very bad mistake.", "Forgot upgrading from Python 1")
except CustomExceptionName as error:
    print(str(error)) # Very bad mistake
    print("Detail: {}".format(error.payload)) # Detail: Forgot upgrading from Python 1

payload=None使它变得可腌很重要。转储之前,您必须调用error.__reduce__()。加载将按预期工作。

return如果您需要将大量数据传输到某些外部结构,则可能应该调查使用pythons 语句查找解决方案。对我来说,这似乎更清楚/更pythonic。高级异常在Java中大量使用,当使用框架并不得不捕获所有可能的错误时,有时会很烦人。

As of Python 3.8 (2018, https://docs.python.org/dev/whatsnew/3.8.html), the recommended method is still:

class CustomExceptionName(Exception):
    """Exception raised when very uncommon things happen"""
    pass

Please don’t forget to document, why a custom exception is neccessary!

If you need to, this is the way to go for exceptions with more data:

class CustomExceptionName(Exception):
    """Still an exception raised when uncommon things happen"""
    def __init__(self, message, payload=None):
        self.message = message
        self.payload = payload # you could add more args
    def __str__(self):
        return str(self.message) # __str__() obviously expects a string to be returned, so make sure not to send any other data types

and fetch them like:

try:
    raise CustomExceptionName("Very bad mistake.", "Forgot upgrading from Python 1")
except CustomExceptionName as error:
    print(str(error)) # Very bad mistake
    print("Detail: {}".format(error.payload)) # Detail: Forgot upgrading from Python 1

payload=None is important to make it pickle-able. Before dumping it, you have to call error.__reduce__(). Loading will work as expected.

You maybe should investigate in finding a solution using pythons return statement if you need much data to be transferred to some outer structure. This seems to be clearer/more pythonic to me. Advanced exceptions are heavily used in Java, which can sometimes be annoying, when using a framework and having to catch all possible errors.


回答 5

您应该重写__repr____unicode__方法,而不使用消息,构造异常时提供的参数将位于args异常对象的属性中。

You should override __repr__ or __unicode__ methods instead of using message, the args you provide when you construct the exception will be in the args attribute of the exception object.


回答 6

不,“消息”不是禁止的。只是过时了。您的应用程序可以正常使用消息。但是,您当然可以摆脱折旧错误。

当为应用程序创建自定义Exception类时,它们中的许多不仅仅从Exception继承子类,还从ValueError之类的其他子类继承子类。然后,您必须适应它们对变量的使用。

而且,如果您的应用程序中有很多异常,通常最好为所有异常都拥有一个通用的自定义基类,以便模块的用户可以

try:
    ...
except NelsonsExceptions:
    ...

在这种情况下,您可以在__init__ and __str__那里进行所需的操作,因此您不必为每个异常重复执行该操作。但是简单地调用message变量而不是message可以解决问题。

无论如何,__init__ or __str__如果您做的事情与Exception本身不同,则仅需要使用。并且因为如果不赞成使用,那么您将同时需要两者,否则您将得到一个错误。每个类不需要很多额外的代码。;)

No, “message” is not forbidden. It’s just deprecated. You application will work fine with using message. But you may want to get rid of the deprecation error, of course.

When you create custom Exception classes for your application, many of them do not subclass just from Exception, but from others, like ValueError or similar. Then you have to adapt to their usage of variables.

And if you have many exceptions in your application it’s usually a good idea to have a common custom base class for all of them, so that users of your modules can do

try:
    ...
except NelsonsExceptions:
    ...

And in that case you can do the __init__ and __str__ needed there, so you don’t have to repeat it for every exception. But simply calling the message variable something else than message does the trick.

In any case, you only need the __init__ or __str__ if you do something different from what Exception itself does. And because if the deprecation, you then need both, or you get an error. That’s not a whole lot of extra code you need per class. ;)


回答 7

请参阅非常好的文章“ Python异常的权威指南 ”。基本原则是:

  • 始终从(至少)继承。
  • 始终BaseException.__init__仅使用一个参数进行调用。
  • 构建库时,请定义从Exception继承的基类。
  • 提供有关错误的详细信息。
  • 从内置异常类型继承是有意义的。

还有关于组织(在模块中)和包装异常的信息,我建议阅读指南。

See a very good article “The definitive guide to Python exceptions“. The basic principles are:

  • Always inherit from (at least) Exception.
  • Always call BaseException.__init__ with only one argument.
  • When building a library, define a base class inheriting from Exception.
  • Provide details about the error.
  • Inherit from builtin exceptions types when it makes sense.

There is also information on organizing (in modules) and wrapping exceptions, I recommend to read the guide.


回答 8

试试这个例子

class InvalidInputError(Exception):
    def __init__(self, msg):
        self.msg = msg
    def __str__(self):
        return repr(self.msg)

inp = int(input("Enter a number between 1 to 10:"))
try:
    if type(inp) != int or inp not in list(range(1,11)):
        raise InvalidInputError
except InvalidInputError:
    print("Invalid input entered")

Try this Example

class InvalidInputError(Exception):
    def __init__(self, msg):
        self.msg = msg
    def __str__(self):
        return repr(self.msg)

inp = int(input("Enter a number between 1 to 10:"))
try:
    if type(inp) != int or inp not in list(range(1,11)):
        raise InvalidInputError
except InvalidInputError:
    print("Invalid input entered")

回答 9

要正确定义自己的异常,需要遵循一些最佳实践:

  • 定义一个继承自的基类Exception。这将允许捕获与项目相关的任何异常(更具体的异常应从该项目继承):

    class MyProjectError(Exception):
        """A base class for MyProject exceptions."""

    在单独的模块(例如exceptions.py)中组织这些异常类通常是一个好主意。

  • 要将额外的参数传递给您的异常,请定义一个__init__()具有可变数量参数的自定义方法。调用基类__init__()将任何位置参数传递给它(记住BaseException/Exception期望有任意数量的位置参数):

    class CustomError(MyProjectError):
        def __init__(self, *args, **kwargs):
            super(CustomError, self).__init__(*args)
            self.foo = kwargs.get('foo')

    要使用额外的参数引发此类异常,可以使用:

    raise CustomError('Something bad happened', foo='foo')

    这样的设计居然坚持里氏替换原则,因为你可以用一个派生的异常类的实例代替基本异常类的实例。

To define your own exceptions correctly, there are a few best practices that you need to follow:

  • Define a base class inheriting from Exception. This will allow to catch any exception related to the project (more specific exceptions should inherit from it):

    class MyProjectError(Exception):
        """A base class for MyProject exceptions."""
    

    Organizing these exception classes in a separate module (e.g. exceptions.py) is generally a good idea.

  • To pass extra argument(s) to your exception, define a custom __init__() method with a variable number of arguments. Call the base class’s __init__() passing any positional arguments to it (remember that BaseException/Exception expect any number of positional arguments):

    class CustomError(MyProjectError):
        def __init__(self, *args, **kwargs):
            super(CustomError, self).__init__(*args)
            self.foo = kwargs.get('foo')
    

    To raise such exception with an extra argument you can use:

    raise CustomError('Something bad happened', foo='foo')
    

    This design actually adheres to the Liskov substitution principle, since you can replace an instance of a base exception class with an instance of a derived exception class.


如何配置Python脚本?

问题:如何配置Python脚本?

欧拉计画和其他编码竞赛经常有最长的运行时间,或者人们吹嘘他们的特定解决方案的运行速度。使用Python时,有时这些方法有些繁琐-即向中添加计时代码__main__

剖析Python程序需要花费多长时间的好方法是什么?

Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey – i.e., adding timing code to __main__.

What is a good way to profile how long a Python program takes to run?


回答 0

Python包含一个名为cProfile的探查器。它不仅给出了总的运行时间,还分别对每个函数进行了计时,并告诉您每个函数被调用了多少次,从而使您轻松确定应该在哪里进行优化。

您可以从代码内部或解释器中调用它,如下所示:

import cProfile
cProfile.run('foo()')

更有用的是,您可以在运行脚本时调用cProfile:

python -m cProfile myscript.py

为了使其更容易,我制作了一个名为“ profile.bat”的批处理文件:

python -m cProfile %1

所以我要做的就是运行:

profile euler048.py

我得到这个:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}

编辑:更新了指向PyCon 2013的视频资源的链接,标题为 Python Profiling
Also via YouTube

Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.

You can call it from within your code, or from the interpreter, like this:

import cProfile
cProfile.run('foo()')

Even more usefully, you can invoke the cProfile when running a script:

python -m cProfile myscript.py

To make it even easier, I made a little batch file called ‘profile.bat’:

python -m cProfile %1

So all I have to do is run:

profile euler048.py

And I get this:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}

EDIT: Updated link to a good video resource from PyCon 2013 titled Python Profiling
Also via YouTube.


回答 1

前一阵子,我pycallgraph从您的Python代码生成了可视化效果。编辑:我已经更新了该示例以使其可用于本文撰写时的最新版本3.3。

pip install pycallgraph安装GraphViz之后,您可以从命令行运行它:

pycallgraph graphviz -- ./mypythonscript.py

或者,您可以分析代码的特定部分:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    code_to_profile()

这些都将生成pycallgraph.png类似于下图的文件:

在此处输入图片说明

A while ago I made pycallgraph which generates a visualisation from your Python code. Edit: I’ve updated the example to work with 3.3, the latest release as of this writing.

After a pip install pycallgraph and installing GraphViz you can run it from the command line:

pycallgraph graphviz -- ./mypythonscript.py

Or, you can profile particular parts of your code:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    code_to_profile()

Either of these will generate a pycallgraph.png file similar to the image below:

enter image description here


回答 2

值得指出的是,使用探查器仅在主线程上有效(默认情况下),如果使用其他线程,则不会从其他线程获得任何信息。这可能有点麻烦,因为在探查器文档中完全没有提及。

如果您还想分析线程,则需要查看文档中的threading.setprofile()函数

您也可以创建自己的threading.Thread子类来做到这一点:

class ProfiledThread(threading.Thread):
    # Overrides threading.Thread.run()
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

并使用ProfiledThread该类而不是标准类。它可能会给您带来更大的灵活性,但是我不确定是否值得,特别是如果您使用的是不使用您的类的第三方代码。

It’s worth pointing out that using the profiler only works (by default) on the main thread, and you won’t get any information from other threads if you use them. This can be a bit of a gotcha as it is completely unmentioned in the profiler documentation.

If you also want to profile threads, you’ll want to look at the threading.setprofile() function in the docs.

You could also create your own threading.Thread subclass to do it:

class ProfiledThread(threading.Thread):
    # Overrides threading.Thread.run()
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

and use that ProfiledThread class instead of the standard one. It might give you more flexibility, but I’m not sure it’s worth it, especially if you are using third-party code which wouldn’t use your class.


回答 3

python Wiki是用于分析资源的好页面:http : //wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

就像python docs一样:http : //docs.python.org/library/profile.html

如Chris Lawlor所示,cProfile是一个很棒的工具,可以轻松地用于打印到屏幕上:

python -m cProfile -s time mine.py <args>

或提交:

python -m cProfile -o output.file mine.py <args>

PS>如果您使用的是Ubuntu,请确保安装python-profile

apt-get install python-profiler 

如果输出到文件,则可以使用以下工具获得不错的可视化效果

PyCallGraph:用于创建调用图图像的工具
安装:

 pip install pycallgraph

跑:

 pycallgraph mine.py args

视图:

 gimp pycallgraph.png

您可以使用任何喜欢的方式查看png文件,我使用的是gimp
不幸的是我经常得到

点:对于cairo-renderer位图,图形太大。按0.257079缩放以适应

这使我的图像变小了。所以我通常创建svg文件:

pycallgraph -f svg -o pycallgraph.svg mine.py <args>

PS>确保安装graphviz(提供点程序):

pip install graphviz

通过@maxy / @quodlibetor使用gprof2dot进行替代绘图:

pip install gprof2dot
python -m cProfile -o profile.pstats mine.py
gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg

The python wiki is a great page for profiling resources: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

as is the python docs: http://docs.python.org/library/profile.html

as shown by Chris Lawlor cProfile is a great tool and can easily be used to print to the screen:

python -m cProfile -s time mine.py <args>

or to file:

python -m cProfile -o output.file mine.py <args>

PS> If you are using Ubuntu, make sure to install python-profile

apt-get install python-profiler 

If you output to file you can get nice visualizations using the following tools

PyCallGraph : a tool to create call graph images
install:

 pip install pycallgraph

run:

 pycallgraph mine.py args

view:

 gimp pycallgraph.png

You can use whatever you like to view the png file, I used gimp
Unfortunately I often get

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.257079 to fit

which makes my images unusably small. So I generally create svg files:

pycallgraph -f svg -o pycallgraph.svg mine.py <args>

PS> make sure to install graphviz (which provides the dot program):

pip install graphviz

Alternative Graphing using gprof2dot via @maxy / @quodlibetor :

pip install gprof2dot
python -m cProfile -o profile.pstats mine.py
gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg

回答 4

@Maxy对这个答案的评论为我提供了足够的帮助,我认为它应该得到自己的答案:我已经有cProfile生成的.pstats文件,并且我不想用pycallgraph重新运行,所以我使用了gprof2dot,并且很漂亮svgs:

$ sudo apt-get install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot
$ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
$ cd $PROJECT_DIR
$ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

和布莱姆!

它使用点(pycallgraph使用相同的东西),因此输出看起来类似。我的印象是,尽管gprof2dot丢失的信息更少:

gprof2dot示例输出

@Maxy’s comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn’t want to re-run things with pycallgraph, so I used gprof2dot, and got pretty svgs:

$ sudo apt-get install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot
$ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
$ cd $PROJECT_DIR
$ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

and BLAM!

It uses dot (the same thing that pycallgraph uses) so output looks similar. I get the impression that gprof2dot loses less information though:

gprof2dot example output


回答 5

在研究此主题时,我遇到了一个名为SnakeViz的便捷工具。SnakeViz是基于Web的配置文件可视化工具。这是非常容易安装和使用。我通常使用的方法是使用生成统计文件,%prun然后在SnakeViz中进行分析。

所使用的主要可视化技术是如下所示的森伯斯特图,其中,函数调用的层次结构被安排为弧形层,并且时间信息以其角宽进行编码。

最好的事情是您可以与图表进行交互。例如,要放大,可以单击圆弧,然后将圆弧及其后代放大为新的旭日形以显示更多详细信息。

在此处输入图片说明

I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do analysis in SnakeViz.

The main viz technique used is Sunburst chart as shown below, in which the hierarchy of function calls is arranged as layers of arcs and time info encoded in their angular widths.

The best thing is you can interact with the chart. For example, to zoom in one can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.

enter image description here


回答 6

最简单最快的方式找到所有的时间是怎么回事。

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

在浏览器中绘制饼图。最大的一块是问题功能。很简单。

Simplest and quickest way to find where all the time is going.

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

Draws a pie chart in a browser. Biggest piece is the problem function. Very simple.


回答 7

我认为这cProfile对于概要分析非常有用,而kcachegrind对于可视化结果则非常有用。该pyprof2calltree文件转换手柄之间英寸

python -m cProfile -o script.profile script.py
pyprof2calltree -i script.profile -o script.calltree
kcachegrind script.calltree

要安装必需的工具(至少在Ubuntu上):

apt-get install kcachegrind
pip install pyprof2calltree

结果:

结果截图

I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.

python -m cProfile -o script.profile script.py
pyprof2calltree -i script.profile -o script.calltree
kcachegrind script.calltree

To install the required tools (on Ubuntu, at least):

apt-get install kcachegrind
pip install pyprof2calltree

The result:

Screenshot of the result


回答 8

同样值得一提的是GUI cProfile转储查看器RunSnakeRun。它允许您排序和选择,从而放大程序的相关部分。图片中矩形的大小与所花费的时间成比例。如果将鼠标悬停在矩形上,它将突出显示表格中以及地图上所有位置的调用。当您双击一个矩形时,它将放大该部分。它将显示谁调用了该部分以及该部分调用了什么。

描述性信息非常有帮助。它显示了该位的代码,在处理内置库调用时可能会有所帮助。它告诉您要查找代码的文件和行。

还想指出,OP表示“概要分析”,但看来他的意思是“定时”。请记住,配置文件后,程序运行速度会变慢。

在此处输入图片说明

Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture is proportional to the time taken. If you mouse over a rectangle it highlights that call in the table and everywhere on the map. When you double-click on a rectangle it zooms in on that portion. It will show you who calls that portion and what that portion calls.

The descriptive information is very helpful. It shows you the code for that bit which can be helpful when you are dealing with built-in library calls. It tells you what file and what line to find the code.

Also want to point at that the OP said ‘profiling’ but it appears he meant ‘timing’. Keep in mind programs will run slower when profiled.

enter image description here


回答 9

一个不错的分析模块是line_profiler(使用脚本kernprof.py调用)。可以在这里下载。

我的理解是cProfile仅提供有关每个功能花费的总时间的信息。因此,单独的代码行不会计时。这是科学计算中的一个问题,因为单行通常会花费很多时间。另外,正如我记得的那样,cProfile并没有抓住我花在numpy.dot上的时间。

A nice profiling module is the line_profiler (called using the script kernprof.py). It can be downloaded here.

My understanding is that cProfile only gives information about total time spent in each function. So individual lines of code are not timed. This is an issue in scientific computing since often one single line can take a lot of time. Also, as I remember, cProfile didn’t catch the time I was spending in say numpy.dot.


回答 10

我最近创建了金枪鱼,用于可视化Python运行时和导入配置文件。这可能会有所帮助。

在此处输入图片说明

用安装

pip install tuna

创建运行时配置文件

python3 -m cProfile -o program.prof yourfile.py

或导入配置文件(需要Python 3.7+)

python3 -X importprofile yourfile.py 2> import.log

然后在文件上运行金枪鱼

tuna program.prof

I recently created tuna for visualizing Python runtime and import profiles; this may be helpful here.

enter image description here

Install with

pip install tuna

Create a runtime profile

python3 -m cProfile -o program.prof yourfile.py

or an import profile (Python 3.7+ required)

python3 -X importprofile yourfile.py 2> import.log

Then just run tuna on the file

tuna program.prof

回答 11

pprofile

line_profiler(已在此处展示)也受到了启发 pprofile,它被描述为:

线粒度,线程感知的确定性和统计纯Python探查器

它提供的行粒度为line_profiler,它是纯Python,可以用作独立命令或模块,甚至可以生成可以使用轻松分析的callgrind格式文件[k|q]cachegrind

vprof

还有vprof,一个Python包,描述为:

为各种Python程序特征(例如运行时间和内存使用情况)提供丰富的交互式可视化。

热图

pprofile

line_profiler (already presented here) also inspired pprofile, which is described as:

Line-granularity, thread-aware deterministic and statistic pure-python profiler

It provides line-granularity as line_profiler, is pure Python, can be used as a standalone command or a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.

vprof

There is also vprof, a Python package described as:

[…] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.

heatmap


回答 12

有很多不错的答案,但是他们要么使用命令行,要么使用某些外部程序来对结果进行概要分析和/或排序。

我真的很想念我可以在IDE(eclipse-PyDev)中使用的某些方式,而无需触摸命令行或安装任何东西。就是这样

不使用命令行进行分析

def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

有关更多信息,请参阅文档或其他答案。

There’s a lot of great answers but they either use command line or some external program for profiling and/or sorting the results.

I really missed some way I could use in my IDE (eclipse-PyDev) without touching the command line or installing anything. So here it is.

Profiling without command line

def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

See docs or other answers for more info.


回答 13

在Joe Shaw回答了多线程代码无法按预期工作的回答之后,我发现runcallcProfile 中的方法只是在做,self.enable()并且self.disable()在配置函数调用周围进行调用,因此您可以自己进行操作,并在中间使用任何想要的代码对现有代码的干扰最小。

Following Joe Shaw’s answer about multi-threaded code not to work as expected, I figured that the runcall method in cProfile is merely doing self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and have whatever code you want in-between with minimal interference with existing code.


回答 14

在Virtaal的资料中,有一个非常有用的类和装饰器,可以使分析(即使对于特定的方法/函数)也非常容易。然后可以在KCacheGrind中非常舒适地查看输出。

In Virtaal’s source there’s a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.


回答 15

cProfile非常适合快速分析,但是大多数情况下它以错误结束。函数runctx通过正确初始化环境和变量来解决此问题,希望它对某人有用:

import cProfile
cProfile.runctx('foo()', None, locals())

cProfile is great for quick profiling but most of the time it was ending for me with the errors. Function runctx solves this problem by initializing correctly the environment and variables, hope it can be useful for someone:

import cProfile
cProfile.runctx('foo()', None, locals())

回答 16

如果要制作累积分析器,则意味着连续运行该函数几次,并观察结果的总和。

您可以使用以下cumulative_profiler装饰器:

它是特定于python> = 3.6的python,但是您可以删除nonlocal它,以便在较旧版本上运行。

import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func =  func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()

        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row

            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # incase you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator

分析功能 baz

import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1

baz()

baz 跑了5次并打印了这个:

         20 function calls in 15.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   15.003    1.500   15.003    1.500 {built-in method time.sleep}
        5    0.000    0.000   15.003    3.001 <ipython-input-9-c89afe010372>:3(baz)
        5    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

指定次数

@cumulative_profiler(3)
def baz():
    ...

If you want to make a cumulative profiler, meaning to run the function several times in a row and watch the sum of the results.

you can use this cumulative_profiler decorator:

it’s python >= 3.6 specific, but you can remove nonlocal for it work on older versions.

import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func =  func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()

        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row

            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # incase you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator

Example

profiling the function baz

import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1

baz()

baz ran 5 times and printed this:

         20 function calls in 15.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   15.003    1.500   15.003    1.500 {built-in method time.sleep}
        5    0.000    0.000   15.003    3.001 <ipython-input-9-c89afe010372>:3(baz)
        5    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

specifying the amount of times

@cumulative_profiler(3)
def baz():
    ...

回答 17

仅终端机(也是最简单的)解决方案,如果所有这些精美的UI无法安装或运行:完全
忽略cProfile并替换为pyinstrument,它将在执行后立即收集并显示调用树。

安装:

$ pip install pyinstrument

配置文件和显示结果:

$ python -m pyinstrument ./prog.py

适用于python2和3。

[编辑]仅用于分析部分代码的API文档可在此处找到。

The terminal-only (and simplest) solution, in case all those fancy UI’s fail to install or to run:
ignore cProfile completely and replace it with pyinstrument, that will collect and display the tree of calls right after execution.

Install:

$ pip install pyinstrument

Profile and display result:

$ python -m pyinstrument ./prog.py

Works with python2 and 3.

[EDIT] The documentation of the API, for profiling only a part of the code, can be found here.


回答 18

我的方式是使用yappi(https://github.com/sumerc/yappi)。与RPC服务器结合使用时特别有用,在RPC服务器中(甚至仅用于调试),您注册方法以启动,停止和打印性能分析信息,例如:

@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space

        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)

然后,当程序工作时,您可以随时通过调用startProfilerRPC方法来启动事件探查器,并通过调用printProfiler(或修改rpc方法以将其返回给调用者)将概要分析信息转储到日志文件中,并获得以下输出:

2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
name                                                                                                                                      ncall     ttot    tsub
2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
C:\Python27\lib\sched.py.run:80                                                                                                           22        0.11    0.05
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293                                                22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515                                                    22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66                                       1         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.date_time_string:464                                                                                    1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243     4         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537                                                                          1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4         0.0     0.0
<string>.__new__:8                                                                                                                        220       0.0     0.0
C:\Python27\lib\socket.py.close:276                                                                                                       4         0.0     0.0
C:\Python27\lib\threading.py.__init__:558                                                                                                 1         0.0     0.0
<string>.__new__:8                                                                                                                        4         0.0     0.0
C:\Python27\lib\threading.py.notify:372                                                                                                   1         0.0     0.0
C:\Python27\lib\rfc822.py.getheader:285                                                                                                   4         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301                                                                                  1         0.0     0.0
C:\Python27\lib\xmlrpclib.py.end:816                                                                                                      3         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467                                                                                         1         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460                                                                               1         0.0     0.0
C:\Python27\lib\SocketServer.py.close_request:475                                                                                         1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066               4         0.0     0.0 

它对于短脚本可能不是很有用,但有助于优化服务器类型的进程,尤其是考虑到该printProfiler方法可以随时间多次调用以概要分析和比较例如不同的程序使用情况时,尤其如此。

在较新版本的yappi中,以下代码将起作用:

@staticmethod
def printProfile():
    yappi.get_func_stats().print_all()

My way is to use yappi (https://github.com/sumerc/yappi). It’s especially useful combined with an RPC server where (even just for debugging) you register method to start, stop and print profiling information, e.g. in this way:

@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space

        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)

Then when your program work you can start profiler at any time by calling the startProfiler RPC method and dump profiling information to a log file by calling printProfiler (or modify the rpc method to return it to the caller) and get such output:

2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
name                                                                                                                                      ncall     ttot    tsub
2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
C:\Python27\lib\sched.py.run:80                                                                                                           22        0.11    0.05
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293                                                22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515                                                    22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66                                       1         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.date_time_string:464                                                                                    1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243     4         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537                                                                          1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4         0.0     0.0
<string>.__new__:8                                                                                                                        220       0.0     0.0
C:\Python27\lib\socket.py.close:276                                                                                                       4         0.0     0.0
C:\Python27\lib\threading.py.__init__:558                                                                                                 1         0.0     0.0
<string>.__new__:8                                                                                                                        4         0.0     0.0
C:\Python27\lib\threading.py.notify:372                                                                                                   1         0.0     0.0
C:\Python27\lib\rfc822.py.getheader:285                                                                                                   4         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301                                                                                  1         0.0     0.0
C:\Python27\lib\xmlrpclib.py.end:816                                                                                                      3         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467                                                                                         1         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460                                                                               1         0.0     0.0
C:\Python27\lib\SocketServer.py.close_request:475                                                                                         1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066               4         0.0     0.0 

It may not be very useful for short scripts but helps to optimize server-type processes especially given the printProfiler method can be called multiple times over time to profile and compare e.g. different program usage scenarios.

In newer versions of yappi, the following code will work:

@staticmethod
def printProfile():
    yappi.get_func_stats().print_all()

回答 19

PyVmMonitor是处理Python中性能分析的新工具:http ://www.pyvmmonitor.com/

它具有一些独特的功能,例如

  • 将探查器附加到正在运行的(CPython)程序
  • 通过Yappi集成进行按需配置
  • 在其他计算机上配置文件
  • 多进程支持(多处理,django …)
  • 实时采样/ CPU视图(具有时间范围选择)
  • 通过cProfile / profile集成进行确定性分析
  • 分析现有的PStats结果
  • 打开DOT文件
  • 程序化API访问
  • 按方法或行对样本进行分组
  • PyDev集成
  • PyCharm整合

注意:它是商业性的,但对开源免费。

A new tool to handle profiling in Python is PyVmMonitor: http://www.pyvmmonitor.com/

It has some unique features such as

  • Attach profiler to a running (CPython) program
  • On demand profiling with Yappi integration
  • Profile on a different machine
  • Multiple processes support (multiprocessing, django…)
  • Live sampling/CPU view (with time range selection)
  • Deterministic profiling through cProfile/profile integration
  • Analyze existing PStats results
  • Open DOT files
  • Programatic API access
  • Group samples by method or line
  • PyDev integration
  • PyCharm integration

Note: it’s commercial, but free for open source.


回答 20

gprof2dot_magic

魔术函数,用于gprof2dot在JupyterLab或Jupyter Notebook中将任何Python语句配置为DOT图。

在此处输入图片说明

GitHub回购:https : //github.com/mattijn/gprof2dot_magic

安装

确保您拥有Python软件包gprof2dot_magic

pip install gprof2dot_magic

它的依赖关系gprof2dotgraphviz也将被安装

用法

要启用魔术功能,请先加载gprof2dot_magic模块

%load_ext gprof2dot_magic

然后将任何行语句配置为DOT图,如下所示:

%gprof2dot print('hello world')

在此处输入图片说明

gprof2dot_magic

Magic function for gprof2dot to profile any Python statement as a DOT graph in JupyterLab or Jupyter Notebook.

enter image description here

GitHub repo: https://github.com/mattijn/gprof2dot_magic

installation

Make sure you’ve the Python package gprof2dot_magic.

pip install gprof2dot_magic

Its dependencies gprof2dot and graphviz will be installed as well

usage

To enable the magic function, first load the gprof2dot_magic module

%load_ext gprof2dot_magic

and then profile any line statement as a DOT graph as such:

%gprof2dot print('hello world')

enter image description here


回答 21

是否曾经想知道python脚本到底在做什么?输入检查外壳。通过Inspect Shell,您可以在不中断正在运行的脚本的情况下打印/更改全局变量并运行函数。现在具有自动完成和命令历史记录(仅在Linux上)。

Inspect Shell不是pdb样式的调试器。

https://github.com/amoffat/Inspect-Shell

您可以使用它(和您的手表)。

Ever want to know what the hell that python script is doing? Enter the Inspect Shell. Inspect Shell lets you print/alter globals and run functions without interrupting the running script. Now with auto-complete and command history (only on linux).

Inspect Shell is not a pdb-style debugger.

https://github.com/amoffat/Inspect-Shell

You could use that (and your wristwatch).


回答 22

要添加到https://stackoverflow.com/a/582337/1070617

我编写了此模块,该模块使您可以使用cProfile并轻松查看其输出。此处更多内容:https//github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

另请参阅:http: //ymichael.com/2014/03/08/profiling-python-with-cprofile.html,了解如何理解收集到的统计信息。

To add on to https://stackoverflow.com/a/582337/1070617,

I wrote this module that allows you to use cProfile and view its output easily. More here: https://github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

Also see: http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html on how to make sense of the collected statistics.


回答 23

这将取决于您希望从分析中看到什么。可以通过(bash)给出简单的时间指标。

time python python_prog.py

甚至’/ usr / bin / time’也可以使用’–verbose’标志输出详细的指标。

要检查每个函数给出的时间指标并更好地了解在函数上花费了多少时间,可以在python中使用内置的cProfile。

进入性能,时间等更详细的指标并不是唯一的指标。您可以担心内存,线程等问题。
分析选项:
1. line_profiler是另一个分析器,通常用于逐行找出时序度量。
2. memory_profiler是用于分析内存使用情况的工具。
3. 堆(来自项目Guppy)描述如何使用堆中的对象。

这些是我倾向于使用的一些常见的东西。但是,如果您想了解更多信息,请尝试阅读本书。 这是一本关于性能入门的不错的书。您可以转到使用Cython和JIT(即时)编译的python的高级主题。

It would depend on what you want to see out of profiling. Simple time metrics can be given by (bash).

time python python_prog.py

Even ‘/usr/bin/time’ can output detailed metrics by using ‘–verbose’ flag.

To check time metrics given by each function and to better understand how much time is spent on functions, you can use the inbuilt cProfile in python.

Going into more detailed metrics like performance, time is not the only metric. You can worry about memory, threads etc.
Profiling options:
1. line_profiler is another profiler used commonly to find out timing metrics line-by-line.
2. memory_profiler is a tool to profile memory usage.
3. heapy (from project Guppy) Profile how objects in the heap are used.

These are some of the common ones I tend to use. But if you want to find out more, try reading this book It is a pretty good book on starting out with performance in mind. You can move onto advanced topics on using Cython and JIT(Just-in-time) compiled python.


回答 24

使用austin之类的统计分析器,不需要任何检测,这意味着您可以轻松地从Python应用程序中分析数据

austin python3 my_script.py

原始输出不是很有用,但是您可以将其通过管道传递到flamegraph.pl 以获取该数据的火焰图表示,从而可以细分所花费的时间(以毫秒为单位的实时时间)。

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

With a statistical profiler like austin, no instrumentation is required, meaning that you can get profiling data out of a Python application simply with

austin python3 my_script.py

The raw output isn’t very useful, but you can pipe that to flamegraph.pl to get a flame graph representation of that data that gives you a breakdown of where the time (measured in microseconds of real time) is being spent.

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

回答 25

还有一个名为的统计分析器statprof。它是一个采样探查器,因此它为您的代码增加了最小的开销,并提供了基于行(不仅仅基于函数)的时序。它更适合诸如游戏之类的软实时应用程序,但精度可能不如cProfile。

pypi中版本有点旧,因此可以pip通过指定git仓库来安装它:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

您可以像这样运行它:

import statprof

with statprof.profile():
    my_questionable_function()

另请参阅https://stackoverflow.com/a/10333592/320036

There’s also a statistical profiler called statprof. It’s a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It’s more suited to soft real-time applications like games, but may be have less precision than cProfile.

The version in pypi is a bit old, so can install it with pip by specifying the git repository:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

You can run it like this:

import statprof

with statprof.profile():
    my_questionable_function()

See also https://stackoverflow.com/a/10333592/320036


回答 26

我刚刚从pypref_time中开发了自己的探查器:

https://github.com/modaresimr/auto_profiler

通过添加装饰器,它将显示一棵耗时的功能树

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

示例输出


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
       └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
           └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
           └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

I just developed my own profiler inspired from pypref_time:

https://github.com/modaresimr/auto_profiler

By adding a decorator it will show a tree of time-consuming functions

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

Example

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

Example Output


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
│   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
│       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
    │   └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
    │       └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
        │   └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

回答 27

当我不在服务器上时,我使用 lsprofcalltree.py并像这样运行我的程序:

python lsprofcalltree.py -o callgrind.1 test.py

然后,我可以使用任何与callgrind兼容的软件打开报告,例如qcachegrind

When i’m not root on the server, I use lsprofcalltree.py and run my program like this:

python lsprofcalltree.py -o callgrind.1 test.py

Then I can open the report with any callgrind-compatible software, like qcachegrind


回答 28

用于在IPython笔记本上快速获取代码段的配置文件统计信息。可以将line_profiler和memory_profiler直接嵌入他们的笔记本中。

得到它!

!pip install line_profiler
!pip install memory_profiler

加载它!

%load_ext line_profiler
%load_ext memory_profiler

用它!


%时间

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

给出:

  • CPU时间:CPU级执行时间
  • 系统时间:系统级执行时间
  • 总计:CPU时间+系统时间
  • 挂钟时间:挂钟时间

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • 在循环(n)次中,在给定的运行次数(r)中给出最佳时间。
  • 输出有关系统缓存的详细信息:
    • 当代码片段多次执行时,系统会缓存一些操作,而不会再次执行它们,这可能会影响概要文件报告的准确性。

%修剪

%prun -s cumulative 'Code to profile' 

给出:

  • 函数调用数(ncalls)
  • 每个函数调用都有条目(不同)
  • 每次通话所花费的时间(每次通话)
  • 直到该函数调用为止的时间(cumtime)
  • 称为etc的func /模块的名称…

累积资料


%mit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

给出:

  • 内存使用情况

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

给出:

  • 逐行统计

LineProfile

For getting quick profile stats for your code snippets on an IPython notebook. One can embed line_profiler and memory_profiler straight into their notebooks.

Get it!

!pip install line_profiler
!pip install memory_profiler

Load it!

%load_ext line_profiler
%load_ext memory_profiler

Use it!


%time

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

Gives:

  • CPU times: CPU level execution time
  • sys times: system level execution time
  • total: CPU time + system time
  • Wall time: Wall Clock Time

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • Gives best time out of given number of runs(r) in looping (n) times.
  • Outputs details on system caching:
    • When code snippets are executed multiple times, system caches a few opearations and doesn’t execute them again that may hamper the accuracy of the profile reports.

%prun

%prun -s cumulative 'Code to profile' 

Gives:

  • number of function calls(ncalls)
  • has entries per function call(distinct)
  • time taken per call(percall)
  • time elapsed till that function call(cumtime)
  • name of the func/module called etc…

Cumulative profile


%memit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

Gives:

  • Memory usage

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

Gives:

  • Line wise stats

LineProfile


如何在Python中使用线程?

问题:如何在Python中使用线程?

我试图了解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我难以理解它们。

您如何清楚地显示为多线程而划分的任务?

I am trying to understand threading in Python. I’ve looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I’m having trouble understanding them.

How do you clearly show tasks being divided for multi-threading?


回答 0

自2010年提出这个问题以来,如何使用带有mappool的 Python进行简单的多线程处理已经有了真正的简化。

下面的代码来自于一篇文章/博客文章,您绝对应该检出(没有从属关系)- 并行显示在一行中:更好的日常线程任务模型。我将在下面进行总结-最终仅是几行代码:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

这是以下内容的多线程版本:

results = []
for item in my_array:
    results.append(my_function(item))

描述

Map是一个很棒的小功能,是轻松将并行性注入Python代码的关键。对于那些不熟悉的人来说,地图是从Lisp之类的功能语言中提炼出来的。它是将另一个功能映射到序列上的功能。

Map为我们处理序列上的迭代,应用函数,并将所有结果存储在最后的方便列表中。

在此处输入图片说明


实作

map函数的并行版本由以下两个库提供:multiprocessing,以及鲜为人知但同样出色的继子child:multiprocessing.dummy。

multiprocessing.dummy与多处理模块完全相同,但是使用线程代替一个重要的区别 -使用多个进程来执行CPU密集型任务;用于I / O的线程)。

multiprocessing.dummy复制了多处理的API,但仅不过是线程模块的包装器。

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

以及计时结果:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

传递多个参数仅在Python 3.3和更高版本中才这样):

要传递多个数组:

results = pool.starmap(function, zip(list_a, list_b))

或传递一个常量和一个数组:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

如果您使用的是Python的早期版本,则可以通过此变通方法()传递多个参数。

(感谢user136036的有用评论。)

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) – Parallelism in one line: A Better Model for Day to Day Threading Tasks. I’ll summarize below – it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Enter image description here


Implementation

Parallel versions of the map function are provided by two libraries:multiprocessing, and also its little known, but equally fantastic step child:multiprocessing.dummy.

multiprocessing.dummy is exactly the same as multiprocessing module, but uses threads instead (an important distinction – use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via this workaround).

(Thanks to user136036 for the helpful comment.)


回答 1

这是一个简单的示例:您需要尝试一些备用URL并返回第一个URL的内容以进行响应。

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

在这种情况下,线程被用作简单的优化:每个子线程都在等待URL解析和响应,以便将其内容放入队列中。每个线程都是一个守护进程(如果主线程结束,则不会使进程继续运行-这比不常见);主线程启动所有子线程,get在队列中执行a ,以等待直到其中一个完成a put,然后发出结果并终止(由于它们是守护线程,因此将取消可能仍在运行的所有子线程)。

正确使用Python中的线程总是会与I / O操作相关联(因为CPython无论如何都不会使用多个内核来运行受CPU约束的任务,因此,线程的唯一原因是在等待某些I / O时不会阻塞进程)。顺便说一句,队列几乎总是将工作分配到线程和/或收集工作结果的最佳方法,并且它们本质上是线程安全的,因此它们使您不必担心锁,条件,事件,信号量以及其他相互之间的关系。线程协调/通信概念。

Here’s a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won’t keep the process up if main thread ends — that’s more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they’re daemon threads).

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn’t use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there’s a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work’s results, by the way, and they’re intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.


回答 2

注意:对于Python中的实际并行化,您应该使用多处理模块来分叉多个并行执行的进程(由于全局解释器锁,Python线程提供了交织,但实际上它们是串行执行的,而不是并行执行的,并且仅仅是在交错I / O操作时很有用)。

但是,如果您只是在寻找交织(或者正在进行尽管可以使用全局解释器锁而可以并行化的I / O操作),那么就可以从线程模块开始。作为一个非常简单的示例,让我们考虑通过并行求和子范围来求和一个大范围的问题:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

请注意,以上示例是一个非常愚蠢的示例,因为它完全不执行任何I / O操作,并且由于全局解释器锁定,尽管在CPython中是交错执行的(带有上下文切换的额外开销),但仍将串行执行。

NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).

However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let’s consider the problem of summing a large range by summing subranges in parallel:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially albeit interleaved (with the added overhead of context switching) in CPython due to the global interpreter lock.


回答 3

像其他提到的一样,由于GIL,CPython只能将线程用于I / O等待。

如果您想从多个内核中受益于CPU绑定任务,请使用multiprocessing

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Like others mentioned, CPython can use threads only for I/O waits due to GIL.

If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

回答 4

仅需注意:线程不需要队列。

这是我能想到的最简单的示例,其中显示了10个进程同时运行。

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

Just a note: A queue is not required for threading.

This is the simplest example I could imagine that shows 10 processes running concurrently.

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

回答 5

Alex Martelli的回答对我有所帮助。但是,这是我认为更有用的修改版本(至少对我而言)。

更新:在Python 2和Python 3中均可使用

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).

Updated: works in both Python 2 and Python 3

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

回答 6

给定一个函数,将f其像这样进行线程化:

import threading
threading.Thread(target=f).start()

将参数传递给 f

threading.Thread(target=f, args=(a,b,c)).start()

Given a function, f, thread it like this:

import threading
threading.Thread(target=f).start()

To pass arguments to f

threading.Thread(target=f, args=(a,b,c)).start()

回答 7

我发现这非常有用:创建与内核一样多的线程,并让它们执行(大量)任务(在这种情况下,调用Shell程序):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done

I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done

回答 8

Python 3具有启动并行任务的功能。这使我们的工作更加轻松。

它具有线程池进程池

以下提供了一个见解:

ThreadPoolExecutor示例

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

Python 3 has the facility of launching parallel tasks. This makes our work easier.

It has thread pooling and process pooling.

The following gives an insight:

ThreadPoolExecutor Example (source)

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor (source)

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

回答 9

使用新的并发模块

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

对于所有以前接触过Java的人来说,执行者方法似乎都很熟悉。

另外请注意:为了使Universe保持理智,如果您不使用with上下文,请不要忘记关闭池/执行器(它非常强大,它可以为您完成此工作)

Using the blazing new concurrent.futures module

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

The executor approach might seem familiar to all those who have gotten their hands dirty with Java before.

Also on a side note: To keep the universe sane, don’t forget to close your pools/executors if you don’t use with context (which is so awesome that it does it for you)


回答 10

对我而言,线程的完美示例是监视异步事件。看这段代码。

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

您可以通过打开IPython会话并执行以下操作来处理此代码:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>>a[0] = 2
Mon = 2

等一下

>>> a[0] = 2
Mon = 2

For me, the perfect example for threading is monitoring asynchronous events. Look at this code.

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

You can play with this code by opening an IPython session and doing something like:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>>a[0] = 2
Mon = 2

Wait a few minutes

>>> a[0] = 2
Mon = 2

回答 11

大多数文档和教程都使用Python ThreadingQueue模块,对于初学者来说,它们似乎不胜枚举。

也许考虑使用concurrent.futures.ThreadPoolExecutorPython 3 的模块。

结合with子句和列表理解,这可能是一个真正的魅力。

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()

Most documentation and tutorials use Python’s Threading and Queue module, and they could seem overwhelming for beginners.

Perhaps consider the concurrent.futures.ThreadPoolExecutor module of Python 3.

Combined with with clause and list comprehension it could be a real charm.

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()

回答 12

我在这里看到了很多没有执行任何实际工作的示例,这些示例主要是CPU约束的。这是一个CPU限制任务的示例,该任务计算1000万到10.05百万之间的所有素数。我在这里使用了所有四种方法:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if the any number from 2 to the square root + 1 divides the current numnber under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible we have found a factor, hence this is not a prime number, lets move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than using pure threading as the pools manage threads more efficiently.
    This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

main()

这是我的Mac OS X四核计算机上的结果

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds

I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I have used all four methods here:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if the any number from 2 to the square root + 1 divides the current numnber under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible we have found a factor, hence this is not a prime number, lets move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than using pure threading as the pools manage threads more efficiently.
    This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

main()

Here are the results on my Mac OS X four-core machine

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds

回答 13

这是使用线程导入CSV的非常简单的示例。(图书馆收录的目的可能有所不同。)

辅助功能:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query

驱动功能:

import_handler(csv_file_name)

Here is the very simple example of CSV import using threading. (Library inclusion may differ for different purpose.)

Helper Functions:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query

Driver Function:

import_handler(csv_file_name)

回答 14

我想举一个简单的例子,当我不得不自己解决这个问题时,我发现这些解释很有用。

在此答案中,您将找到有关Python的GIL(全局解释器锁)的一些信息,以及使用multiprocessing.dummy和一些简单的基准编写的简单的日常示例。

全局翻译锁定(GIL)

Python不允许真正意义上的多线程。它具有一个多线程程序包,但是如果您想使用多线程来加快代码速度,那么使用它通常不是一个好主意。

Python具有称为全局解释器锁(GIL)的构造。GIL确保您的“线程”只能在任何一次执行。线程获取GIL,做一些工作,然后将GIL传递到下一个线程。

这发生得非常快,以至于人眼似乎您的线程正在并行执行,但是实际上它们只是使用相同的CPU内核轮流执行。

所有这些GIL传递都会增加执行开销。这意味着,如果您想使代码运行更快,那么使用线程包通常不是一个好主意。

有理由使用Python的线程包。如果您想同时运行某些东西,而效率不是问题,那么它就很好而且很方便。或者,如果您正在运行的代码需要等待某些东西(例如某些I / O),那么这很有意义。但是线程库不允许您使用额外的CPU内核。

多线程可以外包给操作系统(通过执行多处理),某些外部应用程序可以调用Python代码(例如SparkHadoop),也可以外包给Python代码调用的某些代码(例如:让您的Python代码调用C函数来执行昂贵的多线程任务)。

为什么如此重要

因为很多人在学习GIL是什么之前,会花费大量时间试图在他们喜欢的Python多线程代码中找到瓶颈。

清除此信息后,这是我的代码:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create a quite a few more Pool processes than cores, since the processes
# will probably spend most their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)

I would like to contribute with a simple example and the explanations I’ve found useful when I had to tackle this problem myself.

In this answer you will find some information about Python’s GIL (global interpreter lock) and a simple day-to-day example written using multiprocessing.dummy plus some simple benchmarks.

Global Interpreter Lock (GIL)

Python doesn’t allow multi-threading in the truest sense of the word. It has a multi-threading package, but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it.

Python has a construct called the global interpreter lock (GIL). The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.

This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core.

All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn’t a good idea.

There are reasons to use Python’s threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it’s totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won’t let you use extra CPU cores.

Multi-threading can be outsourced to the operating system (by doing multi-processing), and some external application that calls your Python code (for example, Spark or Hadoop), or some code that your Python code calls (for example: you could have your Python code call a C function that does the expensive multi-threaded stuff).

Why This Matters

Because lots of people spend a lot of time trying to find bottlenecks in their fancy Python multi-threaded code before they learn what the GIL is.

Once this information is clear, here’s my code:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create a quite a few more Pool processes than cores, since the processes
# will probably spend most their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)

回答 15

这是带有一个简单示例的多线程,将很有帮助。您可以运行它并轻松了解Python中多线程的工作方式。在以前的线程完成其工作之前,我使用了一个锁来防止访问其他线程。通过使用这一行代码,

tLock = threading.BoundedSemaphore(值= 4)

您可以一次允许多个进程,并保留其余线程,这些线程将在以后的进程或之前的进程完成后运行。

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print  "\r\nTimer: ", name, " Started"
    tLock.acquire()
    print "\r\n", name, " has the acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print "\r\n", name, ": ", str(time.ctime(time.time()))
        repeat -= 1

    print "\r\n", name, " is releaseing the lock"
    tLock.release()
    print "\r\nTimer: ", name, " Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "\r\nMain Complete"

if __name__ == "__main__":
    Main()

Here is multi threading with a simple example which will be helpful. You can run it and understand easily how multi threading is working in Python. I used a lock for preventing access to other threads until the previous threads finished their work. By the use of this line of code,

tLock = threading.BoundedSemaphore(value=4)

you can allow a number of processes at a time and keep hold to the rest of the threads which will run later or after finished previous processes.

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print  "\r\nTimer: ", name, " Started"
    tLock.acquire()
    print "\r\n", name, " has the acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print "\r\n", name, ": ", str(time.ctime(time.time()))
        repeat -= 1

    print "\r\n", name, " is releaseing the lock"
    tLock.release()
    print "\r\nTimer: ", name, " Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "\r\nMain Complete"

if __name__ == "__main__":
    Main()

回答 16

通过从这篇文章中借用,我们知道在多线程,多处理和异步/ asyncio及其用法之间进行选择。

Python 3具有新的内置库以实现并发性和并行性:current.futures

因此,我将通过一个实验来演示如何通过以下方式运行四个任务(即.sleep()方法)Threading-Pool

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker=1):
    futures = []

    tick = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))

        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())

    print('Total elapsed time by {} workers:'.format(max_worker), time()-tick)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

输出:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[ 注意 ]:

  • 从以上结果中可以看出,最好的情况是这四个任务需要3个工作人员。
  • 如果您有流程任务而不是I / O绑定或阻止(multiprocessingthreading),则可以将更ThreadPoolExecutor改为ProcessPoolExecutor

With borrowing from this post we know about choosing between the multithreading, multiprocessing, and async/asyncio and their usage.

Python 3 has a new built-in library in order to concurrency and parallelism: concurrent.futures

So I’ll demonstrate through an experiment to run four tasks (i.e. .sleep() method) by Threading-Pool manner:

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker=1):
    futures = []

    tick = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))

        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())

    print('Total elapsed time by {} workers:'.format(max_worker), time()-tick)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

Output:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[NOTE]:

  • As you can see in the above results, the best case was 3 workers for those four tasks.
  • If you have a process task instead of I/O bound or blocking (multiprocessing vs threading) you could change the ThreadPoolExecutor to ProcessPoolExecutor.

回答 17

先前的解决方案均未在我的GNU / Linux服务器(我没有管理员权限)上实际使用多个内核。他们只是在一个核心上运行。

我使用了较低级别的os.fork界面来生成多个进程。这是对我有用的代码:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break

None of the previous solutions actually used multiple cores on my GNU/Linux server (where I don’t have administrator rights). They just ran on a single core.

I used the lower level os.fork interface to spawn multiple processes. This is the code that worked for me:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break

回答 18

import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send())
thread.append(t)
t.start()
import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send())
thread.append(t)
t.start()

创建具有列表理解的字典

问题:创建具有列表理解的字典

我喜欢Python列表理解语法。

它也可以用来创建字典吗?例如,通过遍历键和值对:

mydict = {(k,v) for (k,v) in blah blah blah}  # doesn't work

I like the Python list comprehension syntax.

Can it be used to create dictionaries too? For example, by iterating over pairs of keys and values:

mydict = {(k,v) for (k,v) in blah blah blah}  # doesn't work

回答 0

从Python 2.7和3开始,您应该只使用dict comprehension语法

{key: value for (key, value) in iterable}

在Python 2.6和更早版本中,dict内置函数可以接收键/值对的迭代,因此您可以将其传递给列表推导或生成器表达式。例如:

dict((key, func(key)) for key in keys)

但是,如果您已经具有可迭代的键和/或值,则根本不需要使用任何理解-最简单dict的方法是直接调用内置函数:

# consumed from any iterable yielding pairs of keys/vals
dict(pairs)

# "zipped" from two separate iterables of keys/vals
dict(zip(list_of_keys, list_of_values))

From Python 2.7 and 3 onwards, you should just use the dict comprehension syntax:

{key: value for (key, value) in iterable}

In Python 2.6 and earlier, the dict built-in can receive an iterable of key/value pairs, so you can pass it a list comprehension or generator expression. For example:

dict((key, func(key)) for key in keys)

However if you already have iterable(s) of keys and/or vals, you needn’t use a comprehension at all – it’s simplest just call the dict built-in directly:

# consumed from any iterable yielding pairs of keys/vals
dict(pairs)

# "zipped" from two separate iterables of keys/vals
dict(zip(list_of_keys, list_of_values))

回答 1

在Python 3和Python 2.7+中,字典理解如下所示:

d = {k:v for k, v in iterable}

对于Python 2.6或更早版本,请参见fortran的答案

In Python 3 and Python 2.7+, dictionary comprehensions look like the below:

d = {k:v for k, v in iterable}

For Python 2.6 or earlier, see fortran’s answer.


回答 2

实际上,如果它已经包含某种映射,则您甚至不需要遍历可迭代对象,dict构造函数会为您轻松地做到这一点:

>>> ts = [(1, 2), (3, 4), (5, 6)]
>>> dict(ts)
{1: 2, 3: 4, 5: 6}
>>> gen = ((i, i+1) for i in range(1, 6, 2))
>>> gen
<generator object <genexpr> at 0xb7201c5c>
>>> dict(gen)
{1: 2, 3: 4, 5: 6}

In fact, you don’t even need to iterate over the iterable if it already comprehends some kind of mapping, the dict constructor doing it graciously for you:

>>> ts = [(1, 2), (3, 4), (5, 6)]
>>> dict(ts)
{1: 2, 3: 4, 5: 6}
>>> gen = ((i, i+1) for i in range(1, 6, 2))
>>> gen
<generator object <genexpr> at 0xb7201c5c>
>>> dict(gen)
{1: 2, 3: 4, 5: 6}

回答 3

在Python 2.7中,它类似于:

>>> list1, list2 = ['a', 'b', 'c'], [1,2,3]
>>> dict( zip( list1, list2))
{'a': 1, 'c': 3, 'b': 2}

压缩他们

In Python 2.7, it goes like:

>>> list1, list2 = ['a', 'b', 'c'], [1,2,3]
>>> dict( zip( list1, list2))
{'a': 1, 'c': 3, 'b': 2}

Zip them!


回答 4

在Python中创建具有列表理解的字典

我喜欢Python列表理解语法。

它也可以用来创建字典吗?例如,通过遍历键和值对:

mydict = {(k,v) for (k,v) in blah blah blah}

您正在寻找“ dict comprehension”一词-实际上是:

mydict = {k: v for k, v in iterable}

假设blah blah blah是两个元组的迭代-您是如此亲密。让我们创建一些类似的“ blah”:

blahs = [('blah0', 'blah'), ('blah1', 'blah'), ('blah2', 'blah'), ('blah3', 'blah')]

Dict理解语法:

现在,这里的语法是映射部分。使它成为dict理解而不是set理解(这是您的伪代码近似值)的是冒号,:如下所示:

mydict = {k: v for k, v in blahs}

而且我们看到它起作用了,并且应该保留Python 3.7的插入顺序:

>>> mydict
{'blah0': 'blah', 'blah1': 'blah', 'blah2': 'blah', 'blah3': 'blah'}

在Python 2和3.6以下版本中,不能保证顺序:

>>> mydict
{'blah0': 'blah', 'blah1': 'blah', 'blah3': 'blah', 'blah2': 'blah'}

添加过滤器:

所有的理解都具有一个映射组件和一个过滤组件,您可以为它们提供任意表达式。

因此,您可以在末尾添加一个过滤器部分:

>>> mydict = {k: v for k, v in blahs if not int(k[-1]) % 2}
>>> mydict
{'blah0': 'blah', 'blah2': 'blah'}

在这里,我们只是测试最后符是否可被2整除以在映射键和值之前过滤掉数据。

Create a dictionary with list comprehension in Python

I like the Python list comprehension syntax.

Can it be used to create dictionaries too? For example, by iterating over pairs of keys and values:

mydict = {(k,v) for (k,v) in blah blah blah}

You’re looking for the phrase “dict comprehension” – it’s actually:

mydict = {k: v for k, v in iterable}

Assuming blah blah blah is an iterable of two-tuples – you’re so close. Let’s create some “blahs” like that:

blahs = [('blah0', 'blah'), ('blah1', 'blah'), ('blah2', 'blah'), ('blah3', 'blah')]

Dict comprehension syntax:

Now the syntax here is the mapping part. What makes this a dict comprehension instead of a set comprehension (which is what your pseudo-code approximates) is the colon, : like below:

mydict = {k: v for k, v in blahs}

And we see that it worked, and should retain insertion order as-of Python 3.7:

>>> mydict
{'blah0': 'blah', 'blah1': 'blah', 'blah2': 'blah', 'blah3': 'blah'}

In Python 2 and up to 3.6, order was not guaranteed:

>>> mydict
{'blah0': 'blah', 'blah1': 'blah', 'blah3': 'blah', 'blah2': 'blah'}

Adding a Filter:

All comprehensions feature a mapping component and a filtering component that you can provide with arbitrary expressions.

So you can add a filter part to the end:

>>> mydict = {k: v for k, v in blahs if not int(k[-1]) % 2}
>>> mydict
{'blah0': 'blah', 'blah2': 'blah'}

Here we are just testing for if the last character is divisible by 2 to filter out data before mapping the keys and values.


回答 5

Python版本<2.7 (RIP,2010年7月3日至2019年12月31日),请执行以下操作:

d = dict((i,True) for i in [1,2,3])

Python版本> = 2.7,请执行以下操作:

d = {i: True for i in [1,2,3]}

Python version < 2.7(RIP, 3 July 2010 – 31 December 2019), do the below:

d = dict((i,True) for i in [1,2,3])

Python version >= 2.7, do the below:

d = {i: True for i in [1,2,3]}

回答 6

如果要遍历键key_list列表和值列表,请添加到@fortran的答案中value_list

d = dict((key, value) for (key, value) in zip(key_list, value_list))

要么

d = {(key, value) for (key, value) in zip(key_list, value_list)}

To add onto @fortran’s answer, if you want to iterate over a list of keys key_list as well as a list of values value_list:

d = dict((key, value) for (key, value) in zip(key_list, value_list))

or

d = {(key, value) for (key, value) in zip(key_list, value_list)}

回答 7

这是使用dict理解创建字典的另一个示例:

我在这里要做的是在每对字典中创建一个字母字典。是英文字母及其在英文字母中的对应位置

>>> import string
>>> dict1 = {value: (int(key) + 1) for key, value in 
enumerate(list(string.ascii_lowercase))}
>>> dict1
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 
'k': 11, 'j': 10, 'm': 13, 'l': 12, 'o': 15, 'n': 14, 'q': 17, 'p': 16, 's': 
19, 'r': 18, 'u': 21, 't': 20, 'w': 23, 'v': 22, 'y': 25, 'x': 24, 'z': 26}
>>> 

请注意,此处使用枚举可获取列表中的字母及其索引,并交换字母和索引以生成字典的键值对

希望它对您的字典组合有个好主意,并鼓励您更多地使用它来使代码紧凑

Here is another example of dictionary creation using dict comprehension:

What i am tring to do here is to create a alphabet dictionary where each pair; is the english letter and its corresponding position in english alphabet

>>> import string
>>> dict1 = {value: (int(key) + 1) for key, value in 
enumerate(list(string.ascii_lowercase))}
>>> dict1
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 
'k': 11, 'j': 10, 'm': 13, 'l': 12, 'o': 15, 'n': 14, 'q': 17, 'p': 16, 's': 
19, 'r': 18, 'u': 21, 't': 20, 'w': 23, 'v': 22, 'y': 25, 'x': 24, 'z': 26}
>>> 

Notice the use of enumerate here to get a list of alphabets and their indexes in the list and swapping the alphabets and indices to generate the key value pair for dictionary

Hope it gives a good idea of dictionary comp to you and encourages you to use it more often to make your code compact


回答 8

尝试这个,

def get_dic_from_two_lists(keys, values):
    return { keys[i] : values[i] for i in range(len(keys)) }

假设我们有两个清单,国家首都

country = ['India', 'Pakistan', 'China']
capital = ['New Delhi', 'Islamabad', 'Beijing']

然后从两个列表中创建字典:

print get_dic_from_two_lists(country, capital)

输出是这样的,

{'Pakistan': 'Islamabad', 'China': 'Beijing', 'India': 'New Delhi'}

Try this,

def get_dic_from_two_lists(keys, values):
    return { keys[i] : values[i] for i in range(len(keys)) }

Assume we have two lists country and capital

country = ['India', 'Pakistan', 'China']
capital = ['New Delhi', 'Islamabad', 'Beijing']

Then create dictionary from the two lists:

print get_dic_from_two_lists(country, capital)

The output is like this,

{'Pakistan': 'Islamabad', 'China': 'Beijing', 'India': 'New Delhi'}

回答 9

>>> {k: v**3 for (k, v) in zip(string.ascii_lowercase, range(26))}

Python支持dict理解,它允许您使用类似的简洁语法在运行时表示字典的创建。

字典理解采用{key:(key,value)inerable中的值}的形式。该语法是在Python 3中引入的,并且一直移植到Python 2.7,因此,无论安装了哪个版本的Python,您都应该能够使用它。

一个典型的例子是获取两个列表并创建一个字典,其中第一个列表中每个位置的项成为键,而第二个列表中相应位置的项变为值。

此推导中使用的zip函数返回一个元组的迭代器,其中元组中的每个元素均取自每个输入可迭代对象中的相同位置。在上面的示例中,返回的迭代器包含元组(“ a”,1),(“ b”,2)等。

输出:

{‘i’:512,’e’:64,’o’:2744,’h’:343,’l’:1331,’s’:5832,’b’:1,’w’:10648,’ c’:8,’x’:12167,’y’:13824,’t’:6859,’p’:3375,’d’:27,’j’:729,’a’:0,’z’ :15625,’f’:125,’q’:4096,’u’:8000,’n’:2197,’m’:1728,’r’:4913,’k’:1000,’g’:216 ,’v’:9261}

>>> {k: v**3 for (k, v) in zip(string.ascii_lowercase, range(26))}

Python supports dict comprehensions, which allow you to express the creation of dictionaries at runtime using a similarly concise syntax.

A dictionary comprehension takes the form {key: value for (key, value) in iterable}. This syntax was introduced in Python 3 and backported as far as Python 2.7, so you should be able to use it regardless of which version of Python you have installed.

A canonical example is taking two lists and creating a dictionary where the item at each position in the first list becomes a key and the item at the corresponding position in the second list becomes the value.

The zip function used inside this comprehension returns an iterator of tuples, where each element in the tuple is taken from the same position in each of the input iterables. In the example above, the returned iterator contains the tuples (“a”, 1), (“b”, 2), etc.

Output:

{‘i’: 512, ‘e’: 64, ‘o’: 2744, ‘h’: 343, ‘l’: 1331, ‘s’: 5832, ‘b’: 1, ‘w’: 10648, ‘c’: 8, ‘x’: 12167, ‘y’: 13824, ‘t’: 6859, ‘p’: 3375, ‘d’: 27, ‘j’: 729, ‘a’: 0, ‘z’: 15625, ‘f’: 125, ‘q’: 4096, ‘u’: 8000, ‘n’: 2197, ‘m’: 1728, ‘r’: 4913, ‘k’: 1000, ‘g’: 216, ‘v’: 9261}


回答 10

此代码将使用列表推导为多个具有不同值的列表创建字典,这些字典可用于 pd.DataFrame()

#Multiple lists 
model=['A', 'B', 'C', 'D']
launched=[1983,1984,1984,1984]
discontinued=[1986, 1985, 1984, 1986]

#Dictionary with list comprehension
keys=['model','launched','discontinued']
vals=[model, launched,discontinued]
data = {key:vals[n] for n, key in enumerate(keys)}

enumerate将通过nvals使其与key列表匹配

This code will create dictionary using list comprehension for multiple lists with different values that can be used for pd.DataFrame()

#Multiple lists 
model=['A', 'B', 'C', 'D']
launched=[1983,1984,1984,1984]
discontinued=[1986, 1985, 1984, 1986]

#Dictionary with list comprehension
keys=['model','launched','discontinued']
vals=[model, launched,discontinued]
data = {key:vals[n] for n, key in enumerate(keys)}

enumerate will pass n to vals to match each key with its list


回答 11

仅举另一个例子。假设您有以下列表:

nums = [4,2,2,1,3]

并且您想将其变成字典,其中键是索引,值是列表中的元素。您可以使用以下代码行执行此操作:

{index:nums[index] for index in range(0,len(nums))}

Just to throw in another example. Imagine you have the following list:

nums = [4,2,2,1,3]

and you want to turn it into a dict where the key is the index and value is the element in the list. You can do so with the following line of code:

{index:nums[index] for index in range(0,len(nums))}

回答 12

您可以为每对创建一个新的字典并将其与上一个字典合并:

reduce(lambda p, q: {**p, **{q[0]: q[1]}}, bla bla bla, {})

显然,这种方法需要reduce来自functools

You can create a new dict for each pair and merge it with the previous dict:

reduce(lambda p, q: {**p, **{q[0]: q[1]}}, bla bla bla, {})

Obviously this approaches requires reduce from functools.


在Python中检查类型的规范方法是什么?

问题:在Python中检查类型的规范方法是什么?

检查给定对象是否为给定类型的最佳方法是什么?如何检查对象是否从给定类型继承?

假设我有一个对象o。如何检查是否是str

What is the best way to check whether a given object is of a given type? How about checking whether the object inherits from a given type?

Let’s say I have an object o. How do I check whether it’s a str?


回答 0

要检查o是的实例str还是的任何子类str,请使用isinstance(这是“规范”的方式):

if isinstance(o, str):

要检查的类型o是否准确str(排除子类):

if type(o) is str:

以下内容也可以使用,并且在某些情况下可能有用:

if issubclass(type(o), str):

有关相关信息,请参见Python库参考中的内置函数

还有一点要注意:在这种情况下,如果您使用的是Python 2,则实际上可能要使用:

if isinstance(o, basestring):

因为这也将赶上Unicode字符串(unicode不是的子类str,这两个strunicode是的子类basestring)。请注意,basestring在Python 3中不再存在,在Python 3中,字符串()和二进制数据()严格分开strbytes

或者,isinstance接受一个类的元组。True如果o是以下任何一个的任何子类的实例,则将返回(str, unicode)

if isinstance(o, (str, unicode)):

To check if o is an instance of str or any subclass of str, use isinstance (this would be the “canonical” way):

if isinstance(o, str):

To check if the type of o is exactly str (exclude subclasses):

if type(o) is str:

The following also works, and can be useful in some cases:

if issubclass(type(o), str):

See Built-in Functions in the Python Library Reference for relevant information.

One more note: in this case, if you’re using Python 2, you may actually want to use:

if isinstance(o, basestring):

because this will also catch Unicode strings (unicode is not a subclass of str; both str and unicode are subclasses of basestring). Note that basestring no longer exists in Python 3, where there’s a strict separation of strings (str) and binary data (bytes).

Alternatively, isinstance accepts a tuple of classes. This will return True if o is an instance of any subclass of any of (str, unicode):

if isinstance(o, (str, unicode)):

回答 1

检查对象类型的 Pythonic方法是…不检查它。

由于Python鼓励使用Duck Typing,因此您应该只try...except使用对象的方法来使用它们。因此,如果您的函数正在寻找可写文件对象,则不要检查它是否是的子类file,只需尝试使用其.write()方法即可!

当然,有时这些漂亮的抽象会分解,isinstance(obj, cls)这正是您所需要的。但是要谨慎使用。

The most Pythonic way to check the type of an object is… not to check it.

Since Python encourages Duck Typing, you should just try...except to use the object’s methods the way you want to use them. So if your function is looking for a writable file object, don’t check that it’s a subclass of file, just try to use its .write() method!

Of course, sometimes these nice abstractions break down and isinstance(obj, cls) is what you need. But use sparingly.


回答 2

isinstance(o, str)True如果ostr或继承自的类型,则将返回str

type(o) is strTrue当且仅当o是str时将返回。False如果o是继承自的类型,它将返回str

isinstance(o, str) will return True if o is an str or is of a type that inherits from str.

type(o) is str will return True if and only if o is a str. It will return False if o is of a type that inherits from str.


回答 3

在询问并回答问题之后,将类型提示添加到Python中。Python中的类型提示允许检查类型,但与静态类型的语言完全不同。Python中的类型提示将期望的参数类型与函数关联,作为与函数关联的运行时可访问的数据,这允许检查类型。类型提示语法的示例:

def foo(i: int):
    return i

foo(5)
foo('oops')

在这种情况下,我们希望触发错误,foo('oops')因为参数的带注释类型为int。正常运行脚本时,添加的类型提示不会导致发生错误。但是,它向函数添加了属性,这些属性描述了其他程序可以查询并用于检查类型错误的预期类型。

可以用来查找类型错误的其他程序之一是mypy

mypy script.py
script.py:12: error: Argument 1 to "foo" has incompatible type "str"; expected "int"

(您可能需要mypy从软件包管理器进行安装。我不认为CPython附带了该软件包,但似乎具有一定的“正式性”。)

这种方式的类型检查不同于静态类型的编译语言中的类型检查。由于类型在Python中是动态的,因此类型检查必须在运行时完成,如果我们坚持要千方百计地进行,则即使在正确的程序上也要付出代价。显式类型检查也可能比需要的限制更严格,并且会导致不必要的错误(例如,参数是否确实需要精确地list类型或可迭代的内容是否足够?)。

显式类型检查的好处在于,与鸭子输入相比,它可以更早地捕获错误并给出更清晰的错误消息。鸭子类型的确切要求只能用外部文档来表达(希望它是彻底而准确的),并且不兼容类型的错误可能发生在它们起源的地方。

Python的类型提示旨在提供一种折衷方案,可以在其中指定和检查类型,但在常规代码执行过程中不会增加任何成本。

typing可以在类型提示使用配套优惠类型变量来表达需要的行为,而不需要特定类型。例如,它包含诸如Iterable和的变量,Callable用于提示是否需要具有这些行为的任何类型。

尽管类型提示是检查类型的最Pythonic方式,但通常甚至根本不检查类型并依赖于鸭子类型甚至更像Pythonic。类型提示是相对较新的,并且在它们是最Pythonic的解决方案时还没有定论。相对无争议的但非常笼统的比较:类型提示提供了一种可以强制执行的文档形式,允许代码生成更早且更容易理解的错误,可以捕获鸭子输入不能的错误,并且可以静态检查(在不寻常的情况下)感觉,但它仍在运行时之外)。另一方面,鸭子类型很长一段时间以来一直是Python的方式,不会增加静态类型的认知开销,不那么冗长,并且会接受所有可行的类型,然后接受某些类型。

After the question was asked and answered, type hints were added to Python. Type hints in Python allow types to be checked but in a very different way from statically typed languages. Type hints in Python associate the expected types of arguments with functions as runtime accessible data associated with functions and this allows for types to be checked. Example of type hint syntax:

def foo(i: int):
    return i

foo(5)
foo('oops')

In this case we want an error to be triggered for foo('oops') since the annotated type of the argument is int. The added type hint does not cause an error to occur when the script is run normally. However, it adds attributes to the function describing the expected types that other programs can query and use to check for type errors.

One of these other programs that can be used to find the type error is mypy:

mypy script.py
script.py:12: error: Argument 1 to "foo" has incompatible type "str"; expected "int"

(You might need to install mypy from your package manager. I don’t think it comes with CPython but seems to have some level of “officialness”.)

Type checking this way is different from type checking in statically typed compiled languages. Because types are dynamic in Python, type checking must be done at runtime, which imposes a cost — even on correct programs — if we insist that it happen at every chance. Explicit type checks may also be more restrictive than needed and cause unnecessary errors (e.g. does the argument really need to be of exactly list type or is anything iterable sufficient?).

The upside of explicit type checking is that it can catch errors earlier and give clearer error messages than duck typing. The exact requirements of a duck type can only be expressed with external documentation (hopefully it’s thorough and accurate) and errors from incompatible types can occur far from where they originate.

Python’s type hints are meant to offer a compromise where types can be specified and checked but there is no additional cost during usual code execution.

The typing package offers type variables that can be used in type hints to express needed behaviors without requiring particular types. For example, it includes variables such as Iterable and Callable for hints to specify the need for any type with those behaviors.

While type hints are the most Pythonic way to check types, it’s often even more Pythonic to not check types at all and rely on duck typing. Type hints are relatively new and the jury is still out on when they’re the most Pythonic solution. A relatively uncontroversial but very general comparison: Type hints provide a form of documentation that can be enforced, allow code to generate earlier and easier to understand errors, can catch errors that duck typing can’t, and can be checked statically (in an unusual sense but it’s still outside of runtime). On the other hand, duck typing has been the Pythonic way for a long time, doesn’t impose the cognitive overhead of static typing, is less verbose, and will accept all viable types and then some.


回答 4

这是一个为什么鸭式打字是邪恶的却不知道什么时候是危险的例子。例如:这是Python代码(可能省略适当的缩进),请注意,可以通过注意isinstance和issubclassof函数来避免这种情况,以确保当您真正需要鸭子时,不会炸弹。

class Bomb:
    def __init__(self):
        ""

    def talk(self):
        self.explode()

    def explode(self):
        print "BOOM!, The bomb explodes."

class Duck:
    def __init__(self):
        ""
    def talk(self):
        print "I am a duck, I will not blow up if you ask me to talk."    

class Kid:
    kids_duck = None

    def __init__(self):
        print "Kid comes around a corner and asks you for money so he could buy a duck."

    def takeDuck(self, duck):
        self.kids_duck = duck
        print "The kid accepts the duck, and happily skips along"

    def doYourThing(self):
        print "The kid tries to get the duck to talk"
        self.kids_duck.talk()

myKid = Kid()
myBomb = Bomb()
myKid.takeDuck(myBomb)
myKid.doYourThing()

Here is an example why duck typing is evil without knowing when it is dangerous. For instance: Here is the Python code (possibly omitting proper indenting), note that this situation is avoidable by taking care of isinstance and issubclassof functions to make sure that when you really need a duck, you don’t get a bomb.

class Bomb:
    def __init__(self):
        ""

    def talk(self):
        self.explode()

    def explode(self):
        print "BOOM!, The bomb explodes."

class Duck:
    def __init__(self):
        ""
    def talk(self):
        print "I am a duck, I will not blow up if you ask me to talk."    

class Kid:
    kids_duck = None

    def __init__(self):
        print "Kid comes around a corner and asks you for money so he could buy a duck."

    def takeDuck(self, duck):
        self.kids_duck = duck
        print "The kid accepts the duck, and happily skips along"

    def doYourThing(self):
        print "The kid tries to get the duck to talk"
        self.kids_duck.talk()

myKid = Kid()
myBomb = Bomb()
myKid.takeDuck(myBomb)
myKid.doYourThing()

回答 5

isinstance(o, str)

链接到文档


回答 6

我认为使用像Python这样的动态语言的很酷的事情是,您实际上不必检查类似的东西。

我只会在您的对象上调用所需的方法并捕获一个AttributeError。稍后,这将允许您与其他(看似无关)对象一起调用您的方法以完成不同的任务,例如模拟对象进行测试。

当从网络上获取数据urllib2.urlopen()并返回类似对象的文件时,我已经使用了很多。这又可以传递给几乎所有从文件读取的方法,因为它实现了相同的方法。read()方法与真实文件。

但我确定会有时间和地点使用isinstance(),否则可能不会存在:)

I think the cool thing about using a dynamic language like Python is you really shouldn’t have to check something like that.

I would just call the required methods on your object and catch an AttributeError. Later on this will allow you to call your methods with other (seemingly unrelated) objects to accomplish different tasks, such as mocking an object for testing.

I’ve used this a lot when getting data off the web with urllib2.urlopen() which returns a file like object. This can in turn can be passed to almost any method that reads from a file, because it implements the same read() method as a real file.

But I’m sure there is a time and place for using isinstance(), otherwise it probably wouldn’t be there :)


回答 7

对于更复杂的类型验证,我喜欢typeguard基于python类型提示注释的验证方法:

from typeguard import check_type
from typing import List

try:
    check_type('mylist', [1, 2], List[int])
except TypeError as e:
    print(e)

您可以以非常清晰易读的方式执行非常复杂的验证。

check_type('foo', [1, 3.14], List[Union[int, float]])
# vs
isinstance(foo, list) and all(isinstance(a, (int, float)) for a in foo) 

For more complex type validations I like typeguard‘s approach of validating based on python type hint annotations:

from typeguard import check_type
from typing import List

try:
    check_type('mylist', [1, 2], List[int])
except TypeError as e:
    print(e)

You can perform very complex validations in very clean and readable fashion.

check_type('foo', [1, 3.14], List[Union[int, float]])
# vs
isinstance(foo, list) and all(isinstance(a, (int, float)) for a in foo) 

回答 8

您可以使用类型的__name__检查变量的类型。

例如:

>>> a = [1,2,3,4]  
>>> b = 1  
>>> type(a).__name__
'list'
>>> type(a).__name__ == 'list'
True
>>> type(b).__name__ == 'list'
False
>>> type(b).__name__
'int'

You can check for type of a variable using __name__ of a type.

Ex:

>>> a = [1,2,3,4]  
>>> b = 1  
>>> type(a).__name__
'list'
>>> type(a).__name__ == 'list'
True
>>> type(b).__name__ == 'list'
False
>>> type(b).__name__
'int'

回答 9

前往雨果:

你可能是说list而不是array,但这指向类型检查的整个问题-您不想知道所讨论的对象是否是列表,您不想知道它是某种序列还是单个对象。因此,请尝试像序列一样使用它。

假设您要将对象添加到现有序列中,或者如果它是对象序列,则将它们全部添加

try:
   my_sequence.extend(o)
except TypeError:
  my_sequence.append(o)

一个技巧是,如果您正在处理字符串和/或字符串序列-这很棘手,因为字符串通常被认为是单个对象,但是它也是一个字符序列。更糟糕的是,因为它实际上是单长度字符串的序列。

我通常选择设计API,使其只接受一个值或一个序列-这样会使事情变得更容易。[ ]如果需要,在传递单个值时并不难。

(尽管这可能会导致字符串错误,因为它们看起来确实像是序列)。

To Hugo:

You probably mean list rather than array, but that points to the whole problem with type checking – you don’t want to know if the object in question is a list, you want to know if it’s some kind of sequence or if it’s a single object. So try to use it like a sequence.

Say you want to add the object to an existing sequence, or if it’s a sequence of objects, add them all

try:
   my_sequence.extend(o)
except TypeError:
  my_sequence.append(o)

One trick with this is if you are working with strings and/or sequences of strings – that’s tricky, as a string is often thought of as a single object, but it’s also a sequence of characters. Worse than that, as it’s really a sequence of single-length strings.

I usually choose to design my API so that it only accepts either a single value or a sequence – it makes things easier. It’s not hard to put a [ ] around your single value when you pass it in if need be.

(Though this can cause errors with strings, as they do look like (are) sequences.)


回答 10

一种检查类型的简单方法是将其与您知道类型的对象进行比较。

>>> a  = 1
>>> type(a) == type(1)
True
>>> b = 'abc'
>>> type(b) == type('')
True

A simple way to check type is to compare it with something whose type you know.

>>> a  = 1
>>> type(a) == type(1)
True
>>> b = 'abc'
>>> type(b) == type('')
True

回答 11

我认为最好的方法是正确键入变量。您可以使用“ typing”库来做到这一点。

例:

from typing import NewType UserId = NewType ('UserId', int) some_id = UserId (524313)`

参见https://docs.python.org/3/library/typing.html

I think the best way is to typing well your variables. You can do this by using the “typing” library.

Example:

from typing import NewType UserId = NewType ('UserId', int) some_id = UserId (524313)`

See https://docs.python.org/3/library/typing.html


回答 12

您可以检查以下行,以检查给定值是哪种字符类型:

def chr_type(chrx):
    if chrx.isalpha()==True:
        return 'alpha'
    elif chrx.isdigit()==True:
        return 'numeric'
    else:
        return 'nothing'

chr_type("12)

You can check with the below line to check which character type the given value is:

def chr_type(chrx):
    if chrx.isalpha()==True:
        return 'alpha'
    elif chrx.isdigit()==True:
        return 'numeric'
    else:
        return 'nothing'

chr_type("12)

type()和isinstance()有什么区别?

问题:type()和isinstance()有什么区别?

这两个代码片段之间有什么区别?

使用type()

import types

if type(a) is types.DictType:
    do_something()
if type(b) in types.StringTypes:
    do_something_else()

使用isinstance()

if isinstance(a, dict):
    do_something()
if isinstance(b, str) or isinstance(b, unicode):
    do_something_else()

What are the differences between these two code fragments?

Using type():

import types

if type(a) is types.DictType:
    do_something()
if type(b) in types.StringTypes:
    do_something_else()

Using isinstance():

if isinstance(a, dict):
    do_something()
if isinstance(b, str) or isinstance(b, unicode):
    do_something_else()

回答 0

总结其他(已经很好!)答案的内容,isinstance迎合继承(派生类实例也是基类实例),而检查的相等性type则不(要求类型的标识并拒绝实例)子类型,又称为AKA子类)。

通常,在Python中,当然,您希望您的代码支持继承(由于继承非常方便,因此停止使用您的代码来使用它会很糟糕!),这isinstance比检查types的身份要糟糕得多,因为它无缝地支持s遗产。

这并不是说isinstance不错的,你要知道,它只是不那么糟糕不是检查的类型平等。正常的,Python式的首选解决方案几乎总是“鸭式输入”:尝试使用参数,就好像它是某个所需的类型一样,在try/ except语句中进行处理,以捕获如果参数实际上不是该参数可能会出现的所有异常类型(或其他可以模仿它的其他类型;-),然后在except子句中尝试其他操作(使用参数“好像”是其他类型)。

basestring ,但是,相当多的特殊情况,一个内建存在类型让你使用isinstance(包括strunicode子类basestring)。字符串是序列(您可以对它们进行循环,对其进行索引,对其进行切片等),但是您通常希望将它们视为“标量”类型-处理各种类型的字符串有点不方便(但在相当频繁的情况下)字符串(可能还有其他标量类型,即您不能循环的类型),所有容器(列表,集合,字典,…)的另一种方法,basestring加上这些isinstance可以帮助您做到这一点–总体结构成语是这样的:

if isinstance(x, basestring)
  return treatasscalar(x)
try:
  return treatasiter(iter(x))
except TypeError:
  return treatasscalar(x)

您可以说这basestring是一个抽象基类(“ ABC”)-它没有为子类提供具体功能,而是作为“标记”存在,主要用于isinstance。自从引入PEP 3119(引入了它的概括)以来,该概念显然在Python中正在不断发展。自从PEP 3119以来,该概念已从Python 2.6和3.0开始实施。

PEP清楚地表明,尽管ABC通常可以代替鸭类打字,但这样做通常没有很大的压力(请参见此处)。但是,在最新的Python版本中实现的ABC确实提供了额外的好处:(isinstanceissubclass)现在的含义不仅仅是“派生类的一个实例”(特别是,任何类都可以在ABC中“注册”,以便它可以显示为子类,其实例显示为ABC实例);并且ABC还可以通过模板方法设计模式应用程序以一种非常自然的方式为实际子类提供额外的便利(有关TM DP的更多信息,请参见此处此处的 [[Part II]],有关Python的更多信息,特别是在Python中,与ABC无关) 。

有关Python 2.6中提供的ABC支持的基本机制,请参见此处;其3.1版本非常相似,请参见此处。在这两个版本中,标准库模块集合(即3.1版本,对于非常相似的2.6版本,请参见此处)都提供了一些有用的ABC。

出于这个答案的目的,保留ABC的关键是(与TM DP混合类​​的经典Python替代类(例如UserDict.DictMixin相比,TM DP功能可以说是更自然的放置))是它们使isinstance(和issubclass)具有更多优势(在Python 2.6及更高版本中)比以前(在2.5及更低版本中)更具吸引力和普遍性,因此,相比之下,使类型相等性检查在最近的Python版本中比以前更加糟糕。

To summarize the contents of other (already good!) answers, isinstance caters for inheritance (an instance of a derived class is an instance of a base class, too), while checking for equality of type does not (it demands identity of types and rejects instances of subtypes, AKA subclasses).

Normally, in Python, you want your code to support inheritance, of course (since inheritance is so handy, it would be bad to stop code using yours from using it!), so isinstance is less bad than checking identity of types because it seamlessly supports inheritance.

It’s not that isinstance is good, mind you—it’s just less bad than checking equality of types. The normal, Pythonic, preferred solution is almost invariably “duck typing”: try using the argument as if it was of a certain desired type, do it in a try/except statement catching all exceptions that could arise if the argument was not in fact of that type (or any other type nicely duck-mimicking it;-), and in the except clause, try something else (using the argument “as if” it was of some other type).

basestring is, however, quite a special case—a builtin type that exists only to let you use isinstance (both str and unicode subclass basestring). Strings are sequences (you could loop over them, index them, slice them, …), but you generally want to treat them as “scalar” types—it’s somewhat incovenient (but a reasonably frequent use case) to treat all kinds of strings (and maybe other scalar types, i.e., ones you can’t loop on) one way, all containers (lists, sets, dicts, …) in another way, and basestring plus isinstance helps you do that—the overall structure of this idiom is something like:

if isinstance(x, basestring)
  return treatasscalar(x)
try:
  return treatasiter(iter(x))
except TypeError:
  return treatasscalar(x)

You could say that basestring is an Abstract Base Class (“ABC”)—it offers no concrete functionality to subclasses, but rather exists as a “marker”, mainly for use with isinstance. The concept is obviously a growing one in Python, since PEP 3119, which introduces a generalization of it, was accepted and has been implemented starting with Python 2.6 and 3.0.

The PEP makes it clear that, while ABCs can often substitute for duck typing, there is generally no big pressure to do that (see here). ABCs as implemented in recent Python versions do however offer extra goodies: isinstance (and issubclass) can now mean more than just “[an instance of] a derived class” (in particular, any class can be “registered” with an ABC so that it will show as a subclass, and its instances as instances of the ABC); and ABCs can also offer extra convenience to actual subclasses in a very natural way via Template Method design pattern applications (see here and here [[part II]] for more on the TM DP, in general and specifically in Python, independent of ABCs).

For the underlying mechanics of ABC support as offered in Python 2.6, see here; for their 3.1 version, very similar, see here. In both versions, standard library module collections (that’s the 3.1 version—for the very similar 2.6 version, see here) offers several useful ABCs.

For the purpose of this answer, the key thing to retain about ABCs (beyond an arguably more natural placement for TM DP functionality, compared to the classic Python alternative of mixin classes such as UserDict.DictMixin) is that they make isinstance (and issubclass) much more attractive and pervasive (in Python 2.6 and going forward) than they used to be (in 2.5 and before), and therefore, by contrast, make checking type equality an even worse practice in recent Python versions than it already used to be.


回答 1

这是一个无法isinstance达到的目标的示例type

class Vehicle:
    pass

class Truck(Vehicle):
    pass

在这种情况下,卡车对象是车辆,但是您会得到以下信息:

isinstance(Vehicle(), Vehicle)  # returns True
type(Vehicle()) == Vehicle      # returns True
isinstance(Truck(), Vehicle)    # returns True
type(Truck()) == Vehicle        # returns False, and this probably won't be what you want.

换句话说,isinstance对于子类也是如此。

另请参阅:如何在Python中比较对象的类型?

Here’s an example where isinstance achieves something that type cannot:

class Vehicle:
    pass

class Truck(Vehicle):
    pass

in this case, a truck object is a Vehicle, but you’ll get this:

isinstance(Vehicle(), Vehicle)  # returns True
type(Vehicle()) == Vehicle      # returns True
isinstance(Truck(), Vehicle)    # returns True
type(Truck()) == Vehicle        # returns False, and this probably won't be what you want.

In other words, isinstance is true for subclasses, too.

Also see: How to compare type of an object in Python?


回答 2

isinstance()type()Python 之间的区别?

进行类型检查

isinstance(obj, Base)

允许子类实例和多个可能的基数:

isinstance(obj, (Base1, Base2))

而类型检查

type(obj) is Base

仅支持引用的类型。


附带说明,is可能比

type(obj) == Base

因为类是单例。

避免类型检查-使用多态(鸭式输入)

在Python中,通常您希望为您的参数允许任何类型,将其按预期方式对待,如果对象的行为不符合预期,则会引发适当的错误。这被称为多态,也被称为鸭式打字。

def function_of_duck(duck):
    duck.quack()
    duck.swim()

如果上面的代码有效,我们可以假设我们的论点是鸭子。因此,我们可以传入的其他东西是鸭子的实际子类型:

function_of_duck(mallard)

或者像鸭子一样工作:

function_of_duck(object_that_quacks_and_swims_like_a_duck)

并且我们的代码仍然有效。

但是,在某些情况下,需要显式地进行类型检查。也许您与不同的对象类型有关。例如,Pandas Dataframe对象可以由字典记录构造。在这种情况下,您的代码需要知道它获取的参数类型,以便它可以正确处理它。

所以,要回答这个问题:

isinstance()type()Python 之间的区别?

请允许我展示一下区别:

type

假设您的函数获得某种类型的参数(构造函数的常见用例),则需要确保某种行为。如果您检查像这样的类型:

def foo(data):
    '''accepts a dict to construct something, string support in future'''
    if type(data) is not dict:
        # we're only going to test for dicts for now
        raise ValueError('only dicts are supported for now')

如果我们尝试传递dict作为的子类的dict (我们应该能够,如果我们期望我们的代码遵循Liskov Substitution的原理,则可以用子类型代替类型),则代码将中断!:

from collections import OrderedDict

foo(OrderedDict([('foo', 'bar'), ('fizz', 'buzz')]))

引发错误!

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in foo
ValueError: argument must be a dict

isinstance

但是,如果使用isinstance,我们可以支持Liskov替换!:

def foo(a_dict):
    if not isinstance(a_dict, dict):
        raise ValueError('argument must be a dict')
    return a_dict

foo(OrderedDict([('foo', 'bar'), ('fizz', 'buzz')]))

退货 OrderedDict([('foo', 'bar'), ('fizz', 'buzz')])

抽象基类

实际上,我们可以做得更好。collections提供抽象基本类,这些基本类对各种类型强制执行最少的协议。在我们的情况下,如果仅希望使用Mapping协议,则可以执行以下操作,并且代码变得更加灵活:

from collections import Mapping

def foo(a_dict):
    if not isinstance(a_dict, Mapping):
        raise ValueError('argument must be a dict')
    return a_dict

对评论的回应:

应该注意的是,类型可以用来检查使用 type(obj) in (A, B, C)

是的,您可以测试类型的相等性,但是除了上面的以外,请使用多个基础进行控制流,除非您专门允许这些类型:

isinstance(obj, (A, B, C))

同样,区别在于isinstance支持子类,这些子类可以替换父类而又不会破坏程序,该属性称为Liskov替换。

更好的是,倒置依赖项,根本不检查特定类型。

结论

因此,由于我们希望支持替换子类,因此在大多数情况下,我们希望避免使用-进行类型检查,type而更喜欢使用isinstance-进行类型检查-除非您确实需要知道实例的确切类。

Differences between isinstance() and type() in Python?

Type-checking with

isinstance(obj, Base)

allows for instances of subclasses and multiple possible bases:

isinstance(obj, (Base1, Base2))

whereas type-checking with

type(obj) is Base

only supports the type referenced.


As a sidenote, is is likely more appropriate than

type(obj) == Base

because classes are singletons.

Avoid type-checking – use Polymorphism (duck-typing)

In Python, usually you want to allow any type for your arguments, treat it as expected, and if the object doesn’t behave as expected, it will raise an appropriate error. This is known as polymorphism, also known as duck-typing.

def function_of_duck(duck):
    duck.quack()
    duck.swim()

If the code above works, we can presume our argument is a duck. Thus we can pass in other things are actual sub-types of duck:

function_of_duck(mallard)

or that work like a duck:

function_of_duck(object_that_quacks_and_swims_like_a_duck)

and our code still works.

However, there are some cases where it is desirable to explicitly type-check. Perhaps you have sensible things to do with different object types. For example, the Pandas Dataframe object can be constructed from dicts or records. In such a case, your code needs to know what type of argument it is getting so that it can properly handle it.

So, to answer the question:

Differences between isinstance() and type() in Python?

Allow me to demonstrate the difference:

type

Say you need to ensure a certain behavior if your function gets a certain kind of argument (a common use-case for constructors). If you check for type like this:

def foo(data):
    '''accepts a dict to construct something, string support in future'''
    if type(data) is not dict:
        # we're only going to test for dicts for now
        raise ValueError('only dicts are supported for now')

If we try to pass in a dict that is a subclass of dict (as we should be able to, if we’re expecting our code to follow the principle of Liskov Substitution, that subtypes can be substituted for types) our code breaks!:

from collections import OrderedDict

foo(OrderedDict([('foo', 'bar'), ('fizz', 'buzz')]))

raises an error!

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in foo
ValueError: argument must be a dict

isinstance

But if we use isinstance, we can support Liskov Substitution!:

def foo(a_dict):
    if not isinstance(a_dict, dict):
        raise ValueError('argument must be a dict')
    return a_dict

foo(OrderedDict([('foo', 'bar'), ('fizz', 'buzz')]))

returns OrderedDict([('foo', 'bar'), ('fizz', 'buzz')])

Abstract Base Classes

In fact, we can do even better. collections provides Abstract Base Classes that enforce minimal protocols for various types. In our case, if we only expect the Mapping protocol, we can do the following, and our code becomes even more flexible:

from collections import Mapping

def foo(a_dict):
    if not isinstance(a_dict, Mapping):
        raise ValueError('argument must be a dict')
    return a_dict

Response to comment:

It should be noted that type can be used to check against multiple classes using type(obj) in (A, B, C)

Yes, you can test for equality of types, but instead of the above, use the multiple bases for control flow, unless you are specifically only allowing those types:

isinstance(obj, (A, B, C))

The difference, again, is that isinstance supports subclasses that can be substituted for the parent without otherwise breaking the program, a property known as Liskov substitution.

Even better, though, invert your dependencies and don’t check for specific types at all.

Conclusion

So since we want to support substituting subclasses, in most cases, we want to avoid type-checking with type and prefer type-checking with isinstance – unless you really need to know the precise class of an instance.


回答 3

首选后者,因为它将正确处理子类。实际上,由于isinstance()的第二个参数可能是元组,因此您的示例编写起来甚至更加容易:

if isinstance(b, (str, unicode)):
    do_something_else()

或者,使用basestring抽象类:

if isinstance(b, basestring):
    do_something_else()

The latter is preferred, because it will handle subclasses properly. In fact, your example can be written even more easily because isinstance()‘s second parameter may be a tuple:

if isinstance(b, (str, unicode)):
    do_something_else()

or, using the basestring abstract class:

if isinstance(b, basestring):
    do_something_else()

回答 4

根据python文档,这是一条语句:

8.15。类型-内置类型的名称

从Python 2.2开始,内置的工厂函数(例如int()和) str()也是相应类型的名称。

所以isinstance()应该优先于type()

According to python documentation here is a statement:

8.15. types — Names for built-in types

Starting in Python 2.2, built-in factory functions such as int() and str() are also names for the corresponding types.

So isinstance() should be preferred over type().


回答 5

实际用法的区别在于它们如何处理booleans

TrueFalse只是关键字,平均10Python编写的。从而,

isinstance(True, int)

isinstance(False, int)

都回来了True。两个布尔值都是整数的实例。type()但是,它更聪明:

type(True) == int

返回False

A practical usage difference is how they handle booleans:

True and False are just keywords that mean 1 and 0 in python. Thus,

isinstance(True, int)

and

isinstance(False, int)

both return True. Both booleans are an instance of an integer. type(), however, is more clever:

type(True) == int

returns False.


回答 6

对于真正的差异,我们可以在中找到它code,但我找不到的默认行为的实现isinstance()

但是,我们可以根据__instancecheck__获得类似的abc .__ instancecheck__

从上方abc.__instancecheck__,使用以下测试后:

# file tree
# /test/__init__.py
# /test/aaa/__init__.py
# /test/aaa/aa.py
class b():
pass

# /test/aaa/a.py
import sys
sys.path.append('/test')

from aaa.aa import b
from aa import b as c

d = b()

print(b, c, d.__class__)
for i in [b, c, object]:
    print(i, '__subclasses__',  i.__subclasses__())
    print(i, '__mro__', i.__mro__)
    print(i, '__subclasshook__', i.__subclasshook__(d.__class__))
    print(i, '__subclasshook__', i.__subclasshook__(type(d)))
print(isinstance(d, b))
print(isinstance(d, c))

<class 'aaa.aa.b'> <class 'aa.b'> <class 'aaa.aa.b'>
<class 'aaa.aa.b'> __subclasses__ []
<class 'aaa.aa.b'> __mro__ (<class 'aaa.aa.b'>, <class 'object'>)
<class 'aaa.aa.b'> __subclasshook__ NotImplemented
<class 'aaa.aa.b'> __subclasshook__ NotImplemented
<class 'aa.b'> __subclasses__ []
<class 'aa.b'> __mro__ (<class 'aa.b'>, <class 'object'>)
<class 'aa.b'> __subclasshook__ NotImplemented
<class 'aa.b'> __subclasshook__ NotImplemented
<class 'object'> __subclasses__ [..., <class 'aaa.aa.b'>, <class 'aa.b'>]
<class 'object'> __mro__ (<class 'object'>,)
<class 'object'> __subclasshook__ NotImplemented
<class 'object'> __subclasshook__ NotImplemented
True
False

我得到以下结论type

# according to `abc.__instancecheck__`, they are maybe different! I have not found negative one 
type(INSTANCE) ~= INSTANCE.__class__
type(CLASS) ~= CLASS.__class__

对于isinstance

# guess from `abc.__instancecheck__`
return any(c in cls.__mro__ or c in cls.__subclasses__ or cls.__subclasshook__(c) for c in {INSTANCE.__class__, type(INSTANCE)})

顺便说一句:最好不要混用use relative and absolutely import,而要使用absolutely importproject_dir(由添加sys.path

For the real differences, we can find it in code, but I can’t find the implement of the default behavior of the isinstance().

However we can get the similar one abc.__instancecheck__ according to __instancecheck__.

From above abc.__instancecheck__, after using test below:

# file tree
# /test/__init__.py
# /test/aaa/__init__.py
# /test/aaa/aa.py
class b():
pass

# /test/aaa/a.py
import sys
sys.path.append('/test')

from aaa.aa import b
from aa import b as c

d = b()

print(b, c, d.__class__)
for i in [b, c, object]:
    print(i, '__subclasses__',  i.__subclasses__())
    print(i, '__mro__', i.__mro__)
    print(i, '__subclasshook__', i.__subclasshook__(d.__class__))
    print(i, '__subclasshook__', i.__subclasshook__(type(d)))
print(isinstance(d, b))
print(isinstance(d, c))

<class 'aaa.aa.b'> <class 'aa.b'> <class 'aaa.aa.b'>
<class 'aaa.aa.b'> __subclasses__ []
<class 'aaa.aa.b'> __mro__ (<class 'aaa.aa.b'>, <class 'object'>)
<class 'aaa.aa.b'> __subclasshook__ NotImplemented
<class 'aaa.aa.b'> __subclasshook__ NotImplemented
<class 'aa.b'> __subclasses__ []
<class 'aa.b'> __mro__ (<class 'aa.b'>, <class 'object'>)
<class 'aa.b'> __subclasshook__ NotImplemented
<class 'aa.b'> __subclasshook__ NotImplemented
<class 'object'> __subclasses__ [..., <class 'aaa.aa.b'>, <class 'aa.b'>]
<class 'object'> __mro__ (<class 'object'>,)
<class 'object'> __subclasshook__ NotImplemented
<class 'object'> __subclasshook__ NotImplemented
True
False

I get this conclusion, For type:

# according to `abc.__instancecheck__`, they are maybe different! I have not found negative one 
type(INSTANCE) ~= INSTANCE.__class__
type(CLASS) ~= CLASS.__class__

For isinstance:

# guess from `abc.__instancecheck__`
return any(c in cls.__mro__ or c in cls.__subclasses__ or cls.__subclasshook__(c) for c in {INSTANCE.__class__, type(INSTANCE)})

BTW: better not to mix use relative and absolutely import, use absolutely import from project_dir( added by sys.path)