Tag archive: Python

Should I put #! (shebang) in Python scripts, and what form should it take?

Question: Should I put #! (shebang) in Python scripts, and what form should it take?

Should I put the shebang in my Python scripts? In what form?

#!/usr/bin/env python 

or

#!/usr/local/bin/python

Are these equally portable? Which form is used most?

Note: the Tornado project uses the shebang. The Django project, on the other hand, doesn't.


Answer 0

The shebang line in any script determines whether it can be executed like a standalone executable, without typing python before it in the terminal, or by double-clicking it in a file manager (when configured properly). It isn't necessary, but it is generally put there so that anyone who opens the file in an editor immediately knows what they're looking at. However, which shebang line you use IS important.

Correct usage for Python 3 scripts is:

#!/usr/bin/env python3

This picks whichever python3 comes first on your PATH, normally the latest installed Python 3.x. For Python 2.7, use python2 in place of python3.
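As a minimal sketch of that shebang in use (the file name hello.py is only an illustration), the script below can be run as ./hello.py after chmod +x hello.py. Python itself ignores the shebang line, so the version check inside is what catches someone running the file explicitly with a Python 2 interpreter.

```python
#!/usr/bin/env python3
"""hello.py: a minimal executable Python 3 script (hypothetical name).

Run it with ./hello.py after chmod +x hello.py; the kernel then starts
/usr/bin/env python3 ./hello.py, i.e. the first python3 found on PATH.
"""
import sys


def main():
    # Python treats the shebang as an ordinary comment, so this check is
    # what catches the file being run explicitly with a Python 2 interpreter.
    if sys.version_info[0] < 3:
        sys.exit("This script requires Python 3")
    print("Running under Python %d.%d" % sys.version_info[:2])
    return 0


if __name__ == "__main__":
    main()
```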

The following should NOT be used (except for the rare case that you are writing code which is compatible with both Python 2.x and 3.x):

#!/usr/bin/env python

The reason for these recommendations, given in PEP 394, is that python can refer either to python2 or python3 on different systems. It currently refers to python2 on most distributions, but that is likely to change at some point.

Also, DO NOT use:

#!/usr/local/bin/python

“python may be installed at /usr/bin/python or /bin/python; in those cases, the above #! will fail.”

“#!/usr/bin/env python” vs “#!/usr/local/bin/python”


Answer 1

It’s really just a matter of taste. Adding the shebang means people can invoke the script directly if they want (assuming it’s marked as executable); omitting it just means python has to be invoked manually.

The end result of running the program isn't affected either way; only the means of launching it differs.


Answer 2

Should I put the shebang in my Python scripts?

Put a shebang into a Python script to indicate:

  • this module can be run as a script
  • whether it can be run only on Python 2, only on Python 3, or is Python 2/3 compatible
  • on POSIX, it is necessary if you want to run the script directly without invoking the python executable explicitly

Are these equally portable? Which form is used most?

If you write a shebang manually then always use #!/usr/bin/env python unless you have a specific reason not to use it. This form is understood even on Windows (Python launcher).

Note: installed scripts should use a specific python executable, e.g., /usr/bin/python or /home/me/.virtualenvs/project/bin/python. It would be bad if some tool broke just because you activated a virtualenv in your shell. Luckily, in most cases the correct shebang is created automatically by setuptools or by your distribution's packaging tools (on Windows, setuptools can generate wrapper .exe scripts automatically).

In other words, if the script is in a source checkout then you will probably see #!/usr/bin/env python. If it is installed then the shebang is a path to a specific python executable such as #!/usr/local/bin/python (NOTE: you should not write the paths from the latter category manually).

To choose whether you should use python, python2, or python3 in the shebang, see PEP 394 – The “python” Command on Unix-Like Systems:

  • python should be used in the shebang line only for scripts that are source compatible with both Python 2 and 3.

  • in preparation for an eventual change in the default version of Python, Python 2 only scripts should either be updated to be source compatible with Python 3 or else to use python2 in the shebang line.


Answer 3

If you have more than one version of Python and the script needs to run under a specific version, the she-bang can ensure the right one is used when the script is executed directly, for example:

#!/usr/bin/python2.7

Note the script could still be run via a complete Python command line, or via import, in which case the she-bang is ignored. But for scripts run directly, this is a decent reason to use the she-bang.

#!/usr/bin/env python is generally the better approach, but this helps with special cases.

Usually it would be better to establish a Python virtual environment, in which case the generic #!/usr/bin/env python would identify the correct instance of Python for the virtualenv.
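To check which interpreter the generic #!/usr/bin/env python actually picked up, a small sketch like the following can help (in_virtualenv is a hypothetical helper name). It relies on venv storing the original installation in sys.base_prefix, and on older virtualenv releases setting sys.real_prefix instead.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv/virtualenv."""
    if hasattr(sys, "real_prefix"):  # set by older virtualenv releases
        return True
    # venv (Python 3.3+) and modern virtualenv keep the original install
    # in sys.base_prefix, so it differs from sys.prefix inside an environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# With the environment activated, sys.executable points into its bin/ directory.
print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```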


Answer 4

You should add a shebang if the script is intended to be executable. You should also install the script with installation software that rewrites the shebang to something correct, so that it works on the target platform. Examples of such tools are distutils and Distribute.
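The following is a simplified illustration, not the actual distutils or setuptools code, of what such an installer does with the first line of a script: a python shebang is replaced with the path of the target interpreter, and any arguments after it are kept. The function name rewrite_shebang and the regular expression are assumptions made for this sketch.

```python
import re
import sys

# Simplified pattern: a "#!" line whose interpreter name starts with
# "python" (optionally followed by a version such as 3 or 2.7); anything
# after that is kept as interpreter arguments.
SHEBANG_RE = re.compile(r"^#!.*\bpython[0-9.]*\b(?P<args>.*)$")


def rewrite_shebang(source, interpreter=sys.executable):
    """Return `source` with its python shebang pointed at `interpreter`.

    Illustration only; real installers have their own rules.
    """
    first, sep, rest = source.partition("\n")
    match = SHEBANG_RE.match(first)
    if match is None:
        return source  # not a python shebang: leave the script untouched
    return "#!" + interpreter + match.group("args") + sep + rest


script = "#!/usr/bin/env python\nprint('hi')\n"
print(rewrite_shebang(script, "/opt/venv/bin/python"), end="")
```

Scripts without a python shebang (for example #!/bin/sh) come back unchanged.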


Answer 5

The purpose of the shebang is to tell the system which interpreter to run when you execute the script directly from the shell. Often, though not always, you run scripts by supplying the interpreter explicitly instead. Example usage: pythonX.Y script.py

This works even if the script has no shebang line.

The first form is more “portable” because /usr/bin/env searches the directories listed in your PATH, which covers all the places where executables on your system reside.

NOTE: Tornado doesn’t strictly use shebangs, and Django strictly doesn’t. It depends on how your application’s main function is executed.

ALSO: It doesn’t vary with Python.
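To make the PATH lookup concrete, here is a rough, POSIX-only sketch of what /usr/bin/env python3 does before starting the interpreter: it walks the PATH directories in order and takes the first executable file with that name. The helper name resolve_like_env is hypothetical, and the sketch ignores details such as Windows PATHEXT handling; for real use, shutil.which from the standard library does this job.

```python
import os


def resolve_like_env(name, path=None):
    """Return the first executable called `name` on PATH, or None.

    A rough illustration of the lookup that `#!/usr/bin/env name` triggers.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        # An empty PATH entry traditionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


for interpreter in ("python3", "python"):
    print(interpreter, "->", resolve_like_env(interpreter))
```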


Answer 6

Sometimes, if the answer is not clear (that is, you cannot decide between yes and no), it does not matter much, and you can ignore the problem until the answer becomes clear.

The only purpose of the #! is launching the script. Django loads its sources on its own and uses them; it never needs to decide which interpreter should be used. So the #! actually makes no sense there.

Generally, if it is a module and cannot be used as a script, there is no need for using the #!. On the other hand, a module source often contains if __name__ == '__main__': ... with at least some trivial testing of the functionality. Then the #! makes sense again.
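For example, a module along these lines (the name temperature.py and its function are made up for illustration) can be imported as usual, while the shebang plus the __main__ block let it be executed directly to run a trivial self-test:

```python
#!/usr/bin/env python3
"""temperature.py (hypothetical module): importable, and runnable as a script."""


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


if __name__ == "__main__":
    # Runs only when the file is executed directly (./temperature.py),
    # not when another module imports it.
    assert celsius_to_fahrenheit(100) == 212
    assert celsius_to_fahrenheit(-40) == -40
    print("self-test passed")
```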

One good reason for using #! is when you use both Python 2 and Python 3 scripts: they must be interpreted by different versions of Python. Without the #!, you have to remember which python to use when launching each script manually. If you have a mixture of such scripts, it is a good idea to put the #! inside, make them executable, and launch them as executables (chmod …).

On MS-Windows, the #! was meaningless until recently. Python 3.3 introduced the Python Launcher for Windows (py.exe and pyw.exe), which reads the #! line, detects the installed versions of Python, and uses the correct or explicitly requested version. Because the .py extension can be associated with the launcher, you can get behaviour on Windows similar to the execute flag on Unix-based systems.


Answer 7

When I installed Python 3.6.1 on Windows 7 recently, it also installed the Python Launcher for Windows, which is supposed to handle the shebang line. However, I found that the Python Launcher did not do this: the shebang line was ignored and Python 2.7.13 was always used (unless I executed the script using py -3).

To fix this, I had to edit the Windows registry key HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Python.File\shell\open\command. This still had the value

"C:\Python27\python.exe" "%1" %*

from my earlier Python 2.7 installation. I modified this registry key value to

"C:\Windows\py.exe" "%1" %*

and the Python Launcher shebang line processing worked as described above.


Answer 8

If you have different modules installed in different Python installations and need to use a specific one, the shebang appears limiting at first. However, you can use a trick like the one below, where the file is first run as a shell script that then chooses the python. This is very flexible, IMO:

#!/bin/sh
#
# Choose the python we need. Explanation:
# a) '''\' translates to \ in shell, and starts a python multi-line string
# b) "" strings are treated as string concat by python, shell ignores them
# c) "true" command ignores its arguments
# d) exec (or exit) before the ending ''' so the shell reads no further
# e) reset __doc__, since python would otherwise use the string above as the docstring
#
"true" '''\'
PREFERRED_PYTHON=/Library/Frameworks/Python.framework/Versions/2.7/bin/python
ALTERNATIVE_PYTHON=/Library/Frameworks/Python.framework/Versions/3.6/bin/python3
FALLBACK_PYTHON=python3

if [ -x "$PREFERRED_PYTHON" ]; then
    echo "Using preferred python $PREFERRED_PYTHON"
    exec "$PREFERRED_PYTHON" "$0" "$@"
elif [ -x "$ALTERNATIVE_PYTHON" ]; then
    echo "Using alternative python $ALTERNATIVE_PYTHON"
    exec "$ALTERNATIVE_PYTHON" "$0" "$@"
else
    echo "Using fallback python $FALLBACK_PYTHON"
    exec "$FALLBACK_PYTHON" "$0" "$@"
fi
exit 127
'''

import platform

__doc__ = """What this file does"""
print(__doc__)
print(platform.python_version())

Or better yet, perhaps, to facilitate code reuse across multiple python scripts:

#!/bin/bash
"true" '''\'; source "$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)/select.sh"; exec "$CHOSEN_PYTHON" "$0" "$@"; exit 127
'''

and then select.sh has:

PREFERRED_PYTHON=/Library/Frameworks/Python.framework/Versions/2.7/bin/python
ALTERNATIVE_PYTHON=/Library/Frameworks/Python.framework/Versions/3.6/bin/python3
FALLBACK_PYTHON=python3

if [ -x "$PREFERRED_PYTHON" ]; then
    CHOSEN_PYTHON=$PREFERRED_PYTHON
elif [ -x "$ALTERNATIVE_PYTHON" ]; then
    CHOSEN_PYTHON=$ALTERNATIVE_PYTHON
else
    CHOSEN_PYTHON=$FALLBACK_PYTHON
fi

Answer 9

Answer: Only if you plan to make it a command-line executable script.

Here is the procedure:

Start off by verifying the proper shebang string to use:

which python

Take the output from that and add it (with the shebang #!) in the first line.

On my system it responds like so:

$ which python
/usr/bin/python

So your shebang will look like:

#!/usr/bin/python

After saving, it will still run as before since python will see that first line as a comment.

python filename.py

To make it a command, copy it to drop the .py extension.

cp filename.py filename

Tell the file system that this will be executable:

chmod +x filename

To test it, use:

./filename

Best practice is to move it somewhere on your $PATH, so that all you need to type is the file name itself.

sudo cp filename /usr/local/bin

That way it will work everywhere (without the ./ before the filename)
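The chmod +x step can also be done from Python, for instance in a small install helper. This sketch (both function names are hypothetical) is similar in spirit to chmod +x, but it only grants execute permission to each class of user that can already read the file, and it does not consult the umask.

```python
import os
import stat


def add_execute_bits(mode):
    """Return `mode` with an execute bit added for every class (user, group,
    other) that already has read permission."""
    for read_bit, exec_bit in ((stat.S_IRUSR, stat.S_IXUSR),
                               (stat.S_IRGRP, stat.S_IXGRP),
                               (stat.S_IROTH, stat.S_IXOTH)):
        if mode & read_bit:
            mode |= exec_bit
    # Keep only the permission bits (drop the file-type bits from st_mode).
    return stat.S_IMODE(mode)


def make_executable(path):
    """Apply add_execute_bits to the file at `path` (POSIX permissions)."""
    os.chmod(path, add_execute_bits(os.stat(path).st_mode))
```

For example, make_executable("filename") turns a file with mode 644 into 755.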


Answer 10

Absolute vs Logical Path:

This is really a question about whether the path to the Python interpreter should be absolute or logical (/usr/bin/env) with respect to portability.

Having come across other answers on this and other Stack Exchange sites that discussed the issue in general terms without supporting evidence, I performed some really, REALLY granular testing and analysis of this very question on unix.stackexchange.com. Rather than paste that answer here, I'll point those interested in the comparative analysis to it:

https://unix.stackexchange.com/a/566019/334294

As a Linux engineer, my goal is always to provide the most suitable, optimized hosts for my developer clients, so the issue of Python environments was something I really needed a solid answer to. My conclusion after testing was that the logical path in the she-bang was the better of the two options.


Answer 11

First, use

which python

This prints the location of your python interpreter (binary).

The output could be, for example,

/usr/bin/python

or

/bin/python

Now select the shebang line accordingly and use it.

To generalize, we can use:

#!/usr/bin/env python

or

#!/bin/env python

Converting a Unix timestamp string to a readable date

Question: Converting a Unix timestamp string to a readable date

I have a string representing a unix timestamp (i.e. “1284101485”) in Python, and I’d like to convert it to a readable date. When I use time.strftime, I get a TypeError:

>>>import time
>>>print time.strftime("%B %d %Y", "1284101485")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument must be 9-item sequence, not str

Answer 0

Use the datetime module:

from datetime import datetime
ts = int("1284101485")

# if you encounter a "year is out of range" error the timestamp
# may be in milliseconds, try `ts /= 1000` in that case
print(datetime.utcfromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
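On Python 3, the same conversion can be written with a timezone-aware datetime, which avoids utcfromtimestamp() (deprecated since Python 3.12). The sketch below also folds in the milliseconds hint from the comment above; treating any value with 13 or more digits as milliseconds is an assumption of this helper, and the name readable_utc is hypothetical.

```python
from datetime import datetime, timezone


def readable_utc(ts_string, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a Unix timestamp given as a string, in UTC."""
    digits = ts_string.strip().lstrip("-")
    ts = int(ts_string)
    if len(digits) >= 13:
        # Assumption: 13+ digit values are milliseconds since the epoch.
        ts = ts / 1000
    return datetime.fromtimestamp(ts, timezone.utc).strftime(fmt)


print(readable_utc("1284101485"))     # 2010-09-10 06:51:25
print(readable_utc("1493530261000"))  # 2017-04-30 05:31:01
```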

Answer 1

>>> from datetime import datetime
>>> datetime.fromtimestamp(1172969203.1)
datetime.datetime(2007, 3, 4, 0, 46, 43, 100000)

Taken from http://seehuhn.de/pages/pdate


Answer 2

The most-voted answer suggests using fromtimestamp, which is error-prone because it uses the local timezone. To avoid such issues, a better approach is to use UTC:

datetime.datetime.utcfromtimestamp(posix_time).strftime('%Y-%m-%dT%H:%M:%SZ')

where posix_time is the POSIX epoch time you want to convert.


Answer 3

>>> import time
>>> time.ctime(int("1284101485"))
'Fri Sep 10 16:51:25 2010'
>>> time.strftime("%D %H:%M", time.localtime(int("1284101485")))
'09/10/10 16:51'

Answer 4

There are two parts:

  1. Convert the unix timestamp (“seconds since epoch”) to the local time
  2. Display the local time in the desired format.

A portable way to get the local time that works even if the local time zone had a different UTC offset in the past and Python has no access to the tz database is to use a pytz timezone:

#!/usr/bin/env python
from datetime import datetime
import tzlocal  # $ pip install tzlocal

unix_timestamp = float("1284101485")
local_timezone = tzlocal.get_localzone()  # pytz timezone in older tzlocal releases, zoneinfo in newer ones
local_time = datetime.fromtimestamp(unix_timestamp, local_timezone)

To display it, you could use any time format that is supported by your system e.g.:

print(local_time.strftime("%Y-%m-%d %H:%M:%S.%f%z (%Z)"))
print(local_time.strftime("%B %d %Y"))  # print date in your format

If you do not need a local time, to get a readable UTC time instead:

utc_time = datetime.utcfromtimestamp(unix_timestamp)
print(utc_time.strftime("%Y-%m-%d %H:%M:%S.%f+00:00 (UTC)"))

If you don’t care about the timezone issues that might affect what date is returned or if python has access to the tz database on your system:

local_time = datetime.fromtimestamp(unix_timestamp)
print(local_time.strftime("%Y-%m-%d %H:%M:%S.%f"))

On Python 3, you could get a timezone-aware datetime using only stdlib (the UTC offset may be wrong if python has no access to the tz database on your system e.g., on Windows):

#!/usr/bin/env python3
from datetime import datetime, timezone

unix_timestamp = float("1284101485")
utc_time = datetime.fromtimestamp(unix_timestamp, timezone.utc)
local_time = utc_time.astimezone()
print(local_time.strftime("%Y-%m-%d %H:%M:%S.%f%z (%Z)"))

Functions from the time module are thin wrappers around the corresponding C API and therefore may be less portable than the corresponding datetime methods; otherwise, you could use them too:

#!/usr/bin/env python
import time

unix_timestamp = int("1284101485")
utc_time = time.gmtime(unix_timestamp)
local_time = time.localtime(unix_timestamp)
print(time.strftime("%Y-%m-%d %H:%M:%S", local_time))
print(time.strftime("%Y-%m-%d %H:%M:%S+00:00 (UTC)", utc_time))

Answer 5

To get a human-readable date from a UNIX timestamp, I have used this in scripts before:

import os, datetime

datetime.datetime.fromtimestamp(os.path.getmtime("FILE")).strftime("%B %d, %Y")

Output:

‘December 26, 2012’


Answer 6

You can convert the current time like this:

>>> import time
>>> from datetime import datetime
>>> t = datetime.fromtimestamp(time.time())
>>> t.strftime('%Y-%m-%d')
'2012-03-07'

To convert a date string to a different format:

import datetime
import time

def createDateObject(str_date, strFormat="%Y-%m-%d"):
    timeStamp = time.mktime(time.strptime(str_date, strFormat))
    return datetime.datetime.fromtimestamp(timeStamp)

def FormatDate(objectDate, strFormat="%Y-%m-%d"):
    return objectDate.strftime(strFormat)

# Usage
o = createDateObject('2013-03-03')
print(FormatDate(o, '%d-%m-%Y'))

Output: 03-03-2013
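If the input is already a date string, the round trip through time.mktime() is not strictly needed: datetime.strptime() parses the string directly and strftime() writes it back out, which also keeps local-time and DST rules out of the picture. A minimal sketch (reformat_date is a hypothetical name):

```python
from datetime import datetime


def reformat_date(text, in_format="%Y-%m-%d", out_format="%d-%m-%Y"):
    # Parse straight into a naive datetime, then format it again;
    # no timestamp conversion is involved.
    return datetime.strptime(text, in_format).strftime(out_format)


print(reformat_date("2013-03-03"))  # 03-03-2013
```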

Answer 7

Besides the time/datetime modules, pandas can also be used to solve the same problem. Here is how to use pandas to convert a timestamp to a readable date:

Timestamps can be in two formats:

  1. 13 digits (milliseconds) – To convert milliseconds to a date, use:

    import pandas
    # pass a number: recent pandas versions warn when a string is combined with unit=
    result_ms = pandas.to_datetime(int('1493530261000'), unit='ms')
    str(result_ms)
    
    Output: '2017-04-30 05:31:01'
    
  2. 10 digits (seconds) – To convert seconds to a date, use:

    import pandas
    result_s = pandas.to_datetime(int('1493530261'), unit='s')
    str(result_s)
    
    Output: '2017-04-30 05:31:01'
    

Answer 8

import datetime

timestamp = "124542124"
value = datetime.datetime.fromtimestamp(int(timestamp))
exct_time = value.strftime('%d %B %Y %H:%M:%S')

This gets a readable date, including the time, from the timestamp; you can also change the date format.


Answer 9

In Python 3.6+:

import datetime

timestamp = 1579117901
value = datetime.datetime.fromtimestamp(timestamp)
print(f"{value:%Y-%m-%d %H:%M:%S}")

Output:

2020-01-15 19:51:41

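Explanation: in an f-string, the part after the colon is a format specification, which datetime passes to strftime(), so f"{value:%Y-%m-%d %H:%M:%S}" gives the same result as value.strftime('%Y-%m-%d %H:%M:%S'). Note that fromtimestamp() returns local time, so the printed value depends on the machine's timezone (the output above corresponds to UTC).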


Answer 10

import datetime
temp = datetime.datetime.fromtimestamp(1386181800).strftime('%Y-%m-%d %H:%M:%S')
print(temp)

Answer 11

Another way to do this is with gmtime and str.format:

from time import gmtime
print('{}-{}-{} {}:{}:{}'.format(*gmtime(1538654264.703337)))

Output: 2018-10-4 11:57:44
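Because each field is formatted with a plain {}, the month and day above are not zero-padded (10-4 rather than 10-04). A variant with explicit padding, alongside the equivalent time.strftime() call, could look like this:

```python
from time import gmtime, strftime

ts = 1538654264.703337
fields = gmtime(ts)[:6]  # year, month, day, hour, minute, second (UTC)

padded = "{:04d}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}".format(*fields)
via_strftime = strftime("%Y-%m-%d %H:%M:%S", gmtime(ts))

print(padded)        # 2018-10-04 11:57:44
print(via_strftime)  # 2018-10-04 11:57:44
```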


Answer 12

I just successfully used:

>>> type(tstamp)
pandas.tslib.Timestamp
>>> newDt = tstamp.date()
>>> type(newDt)
datetime.date

Answer 13

Quick and dirty one-liner:

'-'.join(str(x) for x in list(tuple(datetime.datetime.now().timetuple())[:6]))

‘2013-5-5-1-9-43’


Answer 14

You can use easy_date to make it easy:

import date_converter
my_date_string = date_converter.timestamp_to_string(1284101485, "%B %d, %Y")

pip install mysql-python fails with EnvironmentError: mysql_config not found

Question: pip install mysql-python fails with EnvironmentError: mysql_config not found

This is the error I get

(mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install mysql-python
Downloading/unpacking mysql-python
  Downloading MySQL-python-1.2.3.tar.gz (70Kb): 70Kb downloaded
  Running setup.py egg_info for package mysql-python
    sh: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>
        metadata, options = get_config()
      File "setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "setup_posix.py", line 24, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found
    Complete output from command python setup.py egg_info:
    sh: mysql_config: not found

Traceback (most recent call last):

  File "<string>", line 14, in <module>

  File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>

    metadata, options = get_config()

  File "setup_posix.py", line 43, in get_config

    libs = mysql_config("libs_r")

  File "setup_posix.py", line 24, in mysql_config

    raise EnvironmentError("%s not found" % (mysql_config.path,))

EnvironmentError: mysql_config not found

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in /home/zjm1126/.pip/pip.log
(mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install mysql-python
Downloading/unpacking mysql-python
  Running setup.py egg_info for package mysql-python
    sh: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>
        metadata, options = get_config()
      File "setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "setup_posix.py", line 24, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found
    Complete output from command python setup.py egg_info:
    sh: mysql_config: not found

Traceback (most recent call last):

  File "<string>", line 14, in <module>

  File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>

    metadata, options = get_config()

  File "setup_posix.py", line 43, in get_config

    libs = mysql_config("libs_r")

  File "setup_posix.py", line 24, in mysql_config

    raise EnvironmentError("%s not found" % (mysql_config.path,))

EnvironmentError: mysql_config not found

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in /home/zjm1126/.pip/pip.log

What can I do to resolve this?


Answer 0

It seems mysql_config is missing from your system, or the installer could not find it. Make sure mysql_config is actually installed.

For example, on Debian/Ubuntu you must install the package:

sudo apt-get install libmysqlclient-dev

Alternatively, mysql_config may not be on your PATH; that is the case when you compile the MySQL suite yourself.

Update: for recent versions of Debian/Ubuntu (as of 2018), the package is:

sudo apt install default-libmysqlclient-dev
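A quick way to check, from Python itself, whether mysql_config is visible before re-running pip is the standard library's shutil.which, which searches the directories listed in PATH. This is a minimal sketch; the helper name find_mysql_config is only for illustration:

```python
import shutil


def find_mysql_config():
    """Return the full path of mysql_config if it is on PATH, else None."""
    return shutil.which("mysql_config")


if __name__ == "__main__":
    path = find_mysql_config()
    if path is None:
        # Same situation as the "mysql_config not found" error from setup.py
        print("mysql_config not found: install the client dev package or fix PATH")
    else:
        print("found mysql_config at", path)
```

If this prints "not found" even after installing the package, the PATH problem described in the other answers is the more likely cause.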

Answer 1

On Mac OS, I simply ran this in the terminal to fix it:

export PATH=$PATH:/usr/local/mysql/bin

This is the quickest fix I found: it adds the directory to PATH for the current shell session only, so I think you’re better off adding it permanently (i.e. adding it to /etc/paths) if you plan to install MySQL-python in another environment.

(tested in OSX Mountain Lion)
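To check whether appending that directory would actually make mysql_config findable, without touching your shell configuration, shutil.which accepts an explicit path argument. A minimal sketch; path_with is a hypothetical helper, and /usr/local/mysql/bin is the location from this answer, which may differ on your machine:

```python
import os
import shutil


def path_with(extra_dir, base_path=None):
    """Return a PATH string with extra_dir appended, like `export PATH=$PATH:extra_dir`."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    return base_path + os.pathsep + extra_dir if base_path else extra_dir


# Would mysql_config be found once /usr/local/mysql/bin is appended?
candidate = path_with("/usr/local/mysql/bin")
print(shutil.which("mysql_config", path=candidate))
```

The same string can be handed to a child process (for example env=dict(os.environ, PATH=candidate) in subprocess.run) to run pip with the extended PATH for that single command.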


Answer 2

apt-get install libmysqlclient-dev python-dev

Seemed to do the trick.


Answer 3

There may be various answers to the above issue; below is an aggregated solution.

For Ubuntu:

$ sudo apt update
$ sudo apt install python-dev
$ sudo apt install python-MySQLdb

For CentOS:

$ yum install python-devel mysql-devel

Answer 4

If you are on a Mac, install this globally:

brew install mysql

then export the path like this:

export PATH=$PATH:/usr/local/mysql/bin

Then install it globally or in your venv, whichever you like:

pip install MySQL-Python

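Note: to install globally for Python 3, use pip3, as a Mac can have both Python 2 and 3 (be aware that MySQL-python itself only supports Python 2; its fork mysqlclient is the usual choice on Python 3):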

pip3 install MySQL-Python

Answer 5

You can use the MySQL Connector/Python

Installation via PyPI (pip):

pip install mysql-connector-python

Further information can be found on the MySQL Connector/Python 1.0.5 beta announcement blog.

On Launchpad there’s a good example of how to add, edit or remove data with the library.


Answer 6

For CentOS users:

yum install -y mysql-devel python-devel python-setuptools

then

pip install MySQL-python


If this solution doesn’t work and gcc prints a compile error like:
_mysql.c:29:20: error: Python.h: No such file or directory

You need to specify the path of Python.h, like this:
pip install --global-option=build_ext --global-option="-I/usr/include/python2.6" MySQL-python
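Rather than hard-coding /usr/include/python2.6, the running interpreter can report where it expects its own C headers through the standard library's sysconfig module. A sketch with illustrative helper names; if Python.h is missing from that directory, the python-devel (or python-dev) package is most likely not installed:

```python
import os
import sysconfig


def python_header_dir():
    """Directory where this interpreter expects Python.h and the other C API headers."""
    return sysconfig.get_paths()["include"]


def python_header_present():
    """True if Python.h actually exists there (i.e. the dev headers are installed)."""
    return os.path.isfile(os.path.join(python_header_dir(), "Python.h"))


print(python_header_dir(), python_header_present())
```

The printed directory is the value that would follow -I in the --global-option shown above.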


Answer 7

I was trying to install mysql-python on an Amazon EC2 Linux instance and I had to install these:

yum install mysql mysql-devel mysql-common mysql-libs gcc

But then I got this error:

_mysql.c:29:20: fatal error: Python.h: No such file or directory

So I installed:

yum install python-devel

And that did the trick.


Answer 8

For anyone that is using MariaDB instead of MySQL, the solution is to install the libmariadbclient-dev package and create a symbolic link to the config file with the correct name.

For example this worked for me:

ln -s /usr/bin/mariadb_config /usr/bin/mysql_config

Answer 9

Try sudo apt-get build-dep python-mysqldb


Answer 10

OS X Mavericks

Due to changes in OS X Mavericks and the Xcode development tools, you may get this error on installation:

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

Therefore use:

sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install mysql-python

Answer 11

For Linux

This works for me:

yum install python-devel mysql-devel

Answer 12

For MariaDB, install libmariadbclient-dev instead of libmysqlclient-dev:

sudo apt-get install libmariadbclient-dev

Answer 13

You should install the MySQL development package first:

yum install python-devel mysql-community-devel -y

Then you can install mysqlclient:

pip install mysqlclient

Answer 14

Sometimes the right fix depends on the actual cause. We had a case where mysql-python was installed through the python-mysqldb Debian package.

A developer who didn’t know this accidentally ran pip uninstall mysql-python, and then failed to recover because pip install mysql-python gave the above error.

pip uninstall mysql-python had destroyed the Debian package’s contents, and pip install mysql-python of course failed because the dev files had never been installed (the Debian package didn’t need them).

The correct solution in that case was apt-get install --reinstall python-mysqldb, which restored mysql-python to its original state.
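To see which copy of MySQLdb an interpreter would actually import before running pip uninstall, importlib.util.find_spec reports the module's file without importing it. A sketch for Python 3 (module_origin is a hypothetical helper name); on Debian-based systems, a path under /usr/lib/python3/dist-packages usually indicates an apt-installed package rather than a pip one:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(module_origin("MySQLdb"))
```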


Answer 15

I had the same problem in the Terraform:light container. It is based on Alpine.

There you have to install mariadb-dev with:

apk add mariadb-dev

But that alone is not enough, because all the other dependencies are missing too:

apk add python2 py2-pip gcc python2-dev musl-dev

Answer 16

The sequence to follow:

pip install mysqlclient
sudo apt-get install python3-dev libmysqlclient-dev
pip install configparser 
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py 

Then try installing MySQL-python again. That worked for me.


Answer 17

I had a similar issue trying to install on OS X Server 10.6.8. Here’s what I had to do, using:

  • MySQL-python 1.2.4b4 (source)
  • MySQL-5.6.19 (binary installer)
  • Python 2.7 (binary installer)

NOTE: Installing in virtualenv…

Unzip the source, open ‘distribute_setup.py’ and edit DEFAULT_VERSION to use the latest version of the distribute tools, like so:

DEFAULT_VERSION = "0.6.49"

Save. Open the ‘site.cfg’ file and uncomment the path to mysql_config so it looks something like this (use your own path to mysql_config):

# The path to mysql_config.
# Only use this if mysql_config is not on your PATH, or you have some weird
# setup that requires it.
mysql_config = /usr/local/mysql/bin/mysql_config

Now clean, build and make will not fail with the ‘mysql_config’ not found error. Hope this helps someone else trying to make use of their old xserves :-)


Answer 18

Your sudo PATH does not know about your local PATH, so go into superuser mode, add the path, and install it from there:

sudo su
export PATH=$PATH:/usr/local/mysql/bin/
pip install mysql-python
exit

And you’re up and running on OS X. Now you have an updated global Python.


Answer 19

If you install MySQL-python in your virtualenv, check the pip version; if it is older than 9.0.1, update it:

pip install --upgrade pip

Answer 20

On macOS Mojave, mysql_config is found at /usr/local/bin/ rather than /usr/local/mysql/bin as mentioned above, so there is no need to add anything to PATH.


Fastest way to check if a value exists in a list

Question: Fastest way to check if a value exists in a list

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

I know that all values in the list are unique as in this example.

The first method I tried (3.8 sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I tried (2x faster: 1.9 sec in my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Method proposed by a Stack Overflow user (2.74 sec in my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, the first method takes 3.81 sec and the second method takes 1.88 sec. It’s a good improvement, but:

I’m a beginner with Python/scripting. Is there a faster way to do the same thing and save more processing time?

A more specific explanation of my application:

In the Blender API I can access a list of particles:

particles = [1, 2, 3, 4, etc.]

From there, I can access a particle’s location:

particles[x].location = [x,y,z]

And for each particle I test if a neighbour exists by searching each particle location like so:

if [x+1,y,z] in particles.location:
    # Find the identity of this neighbour particle in x: the particle's index in the array
    particles.index([x+1,y,z])

Answer 0

7 in a

Clearest and fastest way to do it.

You can also consider using a set, but constructing that set from your list may take more time than faster membership testing will save. The only way to be certain is to benchmark well. (This also depends on what operations you require.)
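A minimal benchmarking sketch with the standard library's timeit. It measures one worst-case lookup in a list, the one-off cost of building a set from that list, and one lookup in the set; membership_timings is a hypothetical helper, and the numbers depend entirely on your machine and data:

```python
import timeit


def membership_timings(n=1_000_000, repeat=5):
    """Return (list_lookup, set_build, set_lookup) in seconds for a worst-case target."""
    data = list(range(n))
    target = n - 1  # last element: the worst case for the linear scan of `in list`
    as_set = set(data)
    # Taking the min() over several repeats is a common way to reduce timing noise
    list_lookup = min(timeit.repeat(lambda: target in data, number=1, repeat=repeat))
    set_build = min(timeit.repeat(lambda: set(data), number=1, repeat=repeat))
    set_lookup = min(timeit.repeat(lambda: target in as_set, number=1, repeat=repeat))
    return list_lookup, set_build, set_lookup
```

Roughly, the set pays off once set_build + k * set_lookup is smaller than k * list_lookup for the k lookups you actually perform.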


Answer 1

As stated by others, in can be very slow for large lists. Here are some performance comparisons of in, set and bisect. Note that the time (in seconds) is on a log scale.

[Figure: log(time) versus list size for the in, set and bisect methods]

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a,b,c):
    start_time = time.time()
    for i,x in enumerate(a):
        if x in b:
            c[i] = 1
    return(time.time()-start_time)   

def method_set_in(a,b,c):
    start_time = time.time()
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = 1
    return(time.time()-start_time)

def method_bisect(a,b,c):
    start_time = time.time()
    b.sort()
    for i,x in enumerate(a):
        index = bisect.bisect_left(b,x)
        if index < len(b):
            if x == b[index]:
                c[i] = 1
    return(time.time()-start_time)

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = [x for x in range(1000,20000,1000)]
    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        time_method_in.append(math.log(method_in(a,b,c)))
        time_method_set_in.append(math.log(method_set_in(a,b,c)))
        time_method_bisect.append(math.log(method_bisect(a,b,c)))

    plt.plot(Nls,time_method_in,marker='o',color='r',linestyle='-',label='in')
    plt.plot(Nls,time_method_set_in,marker='o',color='b',linestyle='-',label='set')
    plt.plot(Nls,time_method_bisect,marker='o',color='g',linestyle='-',label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

Answer 2

You could put your items into a set. Set lookups are very efficient.

Try:

s = set(a)
if 7 in s:
    pass  # do stuff

Edit: In a comment you say that you’d like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.
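The pre-sort plus binary search alternative can be sketched with the standard library's bisect module. Each value is stored together with its original position so the index in the unsorted list can still be recovered; build_sorted_index and find_index are illustrative names:

```python
import bisect


def build_sorted_index(values):
    """Pair each value with its original position and sort by value (one-off O(n log n))."""
    pairs = sorted((v, i) for i, v in enumerate(values))
    keys = [v for v, _ in pairs]  # plain sorted values, searched by bisect
    return keys, pairs


def find_index(keys, pairs, target):
    """O(log n) lookup: return target's index in the ORIGINAL list, or -1 if absent."""
    pos = bisect.bisect_left(keys, target)
    if pos < len(keys) and keys[pos] == target:
        return pairs[pos][1]
    return -1


a = [4, 2, 3, 1, 5, 6]
keys, pairs = build_sorted_index(a)
print(find_index(keys, pairs, 5))  # 4
print(find_index(keys, pairs, 7))  # -1
```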


Answer 3

from collections.abc import Iterable

def check_availability(element, collection: Iterable):
    return element in collection

Usage

check_availability('a', [1,2,3,4,'a','b','c'])

I believe this is the fastest way to know if a chosen value is in an array.


Answer 4

a = [4,2,3,1,5,6]

index = dict((y,x) for x,y in enumerate(a))
try:
    a_index = index[7]
except KeyError:
    print("Not found")
else:
    print("found")

This will only be a good idea if a doesn’t change and thus we can do the dict() part once and then use it repeatedly. If a does change, please provide more detail on what you are doing.
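Applied to the particle-neighbour search from the question, the same idea means building one dict from location to index, then doing O(1) lookups per neighbour. A sketch assuming each location is a sequence of three numbers that match exactly (e.g. integer grid coordinates; floating-point positions would need rounding first); build_location_index and neighbour_index are hypothetical helpers, not part of the Blender API:

```python
def build_location_index(locations):
    """Map each location (as a hashable tuple) to its index, in one O(n) pass."""
    return {tuple(loc): i for i, loc in enumerate(locations)}


def neighbour_index(location_index, loc, offset=(1, 0, 0)):
    """Return the index of the particle at loc + offset, or None if there is none."""
    key = tuple(c + d for c, d in zip(loc, offset))
    return location_index.get(key)


locations = [[0, 0, 0], [1, 0, 0], [5, 5, 5]]
idx = build_location_index(locations)
print(neighbour_index(idx, [0, 0, 0]))  # 1
print(neighbour_index(idx, [5, 5, 5]))  # None
```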


Answer 5

The original question was:

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

Thus there are two things to find:

  1. whether an item is in the list, and
  2. what its index is (if it is in the list).

To this end, I modified @xslittlegrass's code to compute indexes in all cases, and added an additional method.

Results

[Figure: log(time) versus list size for the in, try, set, bisect and reverse methods]

Methods are:

  1. in–basically if x in b: return b.index(x)
  2. try–try/catch on b.index(x) (skips having to check if x in b)
  3. set–basically if x in set(b): return b.index(x)
  4. bisect–sort b together with its indexes, then binary search for x in sorted(b). Note the modification to @xslittlegrass's version, which returned the index in the sorted b rather than in the original b.
  5. reverse–form a reverse lookup dictionary d for b; then d[x] provides the index of x.

Results show that method 5 is the fastest.

Interestingly the try and the set methods are equivalent in time.


Test Code

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    """Wrap func and its arguments into a zero-argument callable for timeit."""
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1
    return c

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(bsorted):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()

Answer 6

It sounds like your application might gain advantage from the use of a Bloom Filter data structure.

In short, a Bloom filter look-up can tell you very quickly if a value is DEFINITELY NOT present in a set. Otherwise, you can do a slower look-up to get the index of a value that POSSIBLY MIGHT BE in the list. So if your application tends to get the “not found” result much more often than the “found” result, you might see a speed-up by adding a Bloom filter.

For details, Wikipedia provides a good overview of how Bloom filters work, and a web search for “python bloom filter library” will turn up at least a couple of useful implementations.
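To make the idea concrete, here is a minimal Bloom filter sketch using only the standard library. It is not one of the libraries mentioned above; the class name, the default sizes, and the choice of deriving all hash positions from a single SHA-256 digest of repr(item) are assumptions made for illustration:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: num_hashes bit positions per item in a num_bits-bit array."""

    def __init__(self, num_bits=8192, num_hashes=4):
        if not 1 <= num_hashes <= 8:
            # One SHA-256 digest (32 bytes) yields at most eight 4-byte hash values
            raise ValueError("num_hashes must be between 1 and 8")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: repr(item) is a stable key (true for ints and strings)
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        for k in range(self.num_hashes):
            chunk = digest[4 * k:4 * k + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False: DEFINITELY never added. True: POSSIBLY added (false positives can occur).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Typical use: add every element of the list once, call might_contain(x) first, and only fall back to a.index(x) when it returns True (still inside try/except ValueError, because of false positives).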


Answer 7

Be aware that the in operator tests not only equality (==) but also identity (is). The in logic for lists is roughly equivalent to the following (in CPython it is actually written in C, not Python):

def list_contains(s, target):
    for element in s:
        if element is target:
            # fast check for identity implies equality
            return True
        if element == target:
            # slower check for actual equality
            return True
    return False

In most circumstances this detail is irrelevant, but in some circumstances it might surprise a Python novice. For example, numpy.nan has the unusual property of not being equal to itself:

>>> import numpy
>>> numpy.nan == numpy.nan
False
>>> numpy.nan is numpy.nan
True
>>> numpy.nan in [numpy.nan]
True

To distinguish between these unusual cases you could use any():

>>> lst = [numpy.nan, 1 , 2]
>>> any(element == numpy.nan for element in lst)
False
>>> any(element is numpy.nan for element in lst)
True 

Note that the in logic for lists, written with any(), would be:

any(element is target or element == target for element in lst)

However, I should emphasize that this is an edge case, and for the vast majority of cases the in operator is highly optimised and exactly what you want of course (either with a list or with a set).
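The same effect can be reproduced without NumPy, which also shows that identity is what makes the difference: two separate NaN objects never match each other, even inside a list. A small sketch using math.nan and float("nan"):

```python
import math

nan_a = math.nan        # one particular NaN object
nan_b = float("nan")    # a different NaN object

print(nan_a == nan_a)                    # False: NaN compares unequal even to itself
print(nan_a in [nan_a])                  # True: the identity check matches first
print(nan_b in [nan_a])                  # False: different object, and == is False
print(any(x == nan_a for x in [nan_a]))  # False: equality alone never matches a NaN
```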


Answer 8

Or use __contains__:

sequence.__contains__(value)

Demo:

>>> l=[1,2,3]
>>> l.__contains__(3)
True
>>> 
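The in operator is what normally calls __contains__, so any class can define its own membership test. A small sketch with a hypothetical EvenNumbers class that stores nothing and computes membership on demand:

```python
class EvenNumbers:
    """A container with no stored elements: membership is computed on demand."""

    def __contains__(self, value):
        # `value in obj` calls obj.__contains__(value) and converts the result to bool
        return isinstance(value, int) and value % 2 == 0


evens = EvenNumbers()
print(4 in evens)             # True
print(7 in evens)             # False
print(evens.__contains__(4))  # True, though `in` is the idiomatic spelling
```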

Answer 9

@Winston Ewert’s solution yields a big speed-up for very large lists, but this Stack Overflow answer indicates that the try:/except:/else: construct will be slowed down if the except branch is often reached. An alternative is to take advantage of the .get() method of the dict:

a = [4,2,3,1,5,6]

index = dict((y, x) for x, y in enumerate(a))

b = index.get(7, None)
if b is not None:
    "Do something with variable b"

The .get(key, default) method is meant for the case where you can’t guarantee a key will be in the dict. If key is present, it returns the value (as dict[key] would), but when it is not, .get() returns your default value (here None). In this case you need to make sure the chosen default can never be a real value in the dict (the values here are indexes, so None is safe).


Answer 10

This is not code, but an algorithm for very fast searching.

If your list and the value you are looking for are all numbers, this is pretty straightforward. If they are strings, see the note at the bottom.

  • Let “n” be the length of your list.
  • Optional step: if you need the index of the element, add a second column to the list holding each element's current index (0 to n-1); see later.
  • Sort your list or a copy of it (.sort()).
  • Loop:
    • Compare your number to the middle (n/2-th) element of the current range
      • If larger, continue with the upper half (indexes n/2 to n)
      • If smaller, continue with the lower half (indexes 0 to n/2)
      • If equal, you found it
  • Keep narrowing the range until you have found it or only 2 numbers remain (just below and just above the one you are looking for).
  • This will find any element of a 1,000,000-element list in at most 20 steps (⌊log2(n)⌋ + 1, to be precise).

If you also need the original position of your number, look for it in the second, index column.

If your list is not made of numbers, the method still works and will be fastest, but you may need to define a function which can compare/order strings.

Of course, this requires the up-front cost of sorting (sorted() or .sort()), but if you keep reusing the same list for lookups, it may be worth it.
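The steps above can be sketched in Python as follows. build_table and binary_search are illustrative names; the standard library's bisect.bisect_left performs the same halving on a plain sorted list. The step counter is only there to show the logarithmic bound:

```python
def build_table(values):
    """Optional step plus sorting: pair each value with its original index, sort by value."""
    return sorted((v, i) for i, v in enumerate(values))


def binary_search(table, target):
    """Halve the range until target is found; return (original_index, steps) or (-1, steps)."""
    lo, hi = 0, len(table) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        value, original_index = table[mid]
        if value == target:
            return original_index, steps
        if value < target:
            lo = mid + 1  # target is larger: keep the upper half
        else:
            hi = mid - 1  # target is smaller: keep the lower half
    return -1, steps


table = build_table([4, 2, 3, 1, 5, 6])
print(binary_search(table, 5))  # (4, 2): found at original index 4 after 2 comparisons
print(binary_search(table, 7))  # (-1, 3): not found
```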


Answer 11

Because the question need not be read as asking for the technically fastest way, I always suggest the way that is fastest to understand and write: a one-line list comprehension.

[i for i, item in enumerate(list_from_which_to_search) if item in list_to_search_in]

I had a list_to_search_in with all the items, and wanted to return the indexes of the items in the list_from_which_to_search.

This returns the indexes in a nice list.

There are other ways to approach this problem, but list comprehensions are fast enough to run, and quick enough to write, to solve the problem.


Answer 12

For me it was 0.030 sec (real), 0.026 sec (user), and 0.004 sec (sys).

print("Started")
x = ["a", "b", "c", "d", "e", "f"]

i = 0
# Check the current element before advancing, so x[0] is included
# and i never runs past the end of the list
while i < len(x):
    if x[i] == "e":
        print("Found")
        break
    i += 1
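For comparison, a minimal sketch of the same linear search using Python's built-ins: the in operator scans the list until it finds a match, and list.index returns the position of the first match, raising ValueError if there is none:

```python
x = ["a", "b", "c", "d", "e", "f"]

# Membership test: linear scan that stops at the first match
if "e" in x:
    print("Found")

# Position of the first match; list.index raises ValueError when absent
try:
    pos = x.index("e")
except ValueError:
    pos = -1
print(pos)  # 4
```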

Answer 13

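Code to check whether the array contains two elements whose product equals k (assuming non-zero integer elements):

seen = set()
for i in arr1:
    # i pairs with an earlier element j if j * i == k, i.e. j == k // i
    if i != 0 and k % i == 0 and k // i in seen:
        print(k // i, i)
    seen.add(i)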

Get difference between two lists

Question: Get difference between two lists

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with items from the first list which aren’t present in the second one. From the example I have to get:

temp3 = ['Three', 'Four']

Are there any fast ways to do this without loops and checking?


Answer 0

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you’ll need to use set([1, 2]).symmetric_difference(set([2, 3])).
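A small sketch contrasting the two operations on the question's data. Set iteration order is arbitrary, so the results are passed through sorted() to make the output predictable:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# One direction: items of temp1 that are not in temp2
only_in_first = set(temp1) - set(temp2)
print(sorted(only_in_first))  # ['Four', 'Three']

# Difference is one-directional ...
print(sorted({1, 2} - {2, 3}))                      # [1]
# ... while the symmetric difference keeps items found in exactly one set
print(sorted({1, 2}.symmetric_difference({2, 3})))  # [1, 3]
print(sorted({1, 2} ^ {2, 3}))                      # [1, 3], operator form
```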


Answer 1

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000))
print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

Besides preserving order, the method I presented is also (slightly) faster than set subtraction, because it doesn’t require constructing an unnecessary set. The performance difference is more noticeable if the first list is considerably longer than the second and if hashing is expensive. Here’s a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
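As a sketch, the same approach wrapped in a reusable function (the name ordered_difference is hypothetical; items of the second list must be hashable). Unlike set subtraction, it also keeps duplicates from the first list:

```python
def ordered_difference(first, second):
    """Items of `first` not in `second`, keeping first's order and duplicates."""
    lookup = set(second)  # built once, not once per element
    return [x for x in first if x not in lookup]

temp1 = ['One', 'Two', 'Three', 'Four', 'Three']
temp2 = ['One', 'Two']
print(ordered_difference(temp1, temp2))  # ['Three', 'Four', 'Three']
```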

Answer 2

temp3 = [item for item in temp1 if item not in temp2]

Answer 3

The symmetric difference between two lists (say list1 and list2), i.e. the items that appear in only one of them, can be found using the following simple function.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

or

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

Using the above function, the difference can be found with diff(temp2, temp1) or diff(temp1, temp2). Both give the items 'Four' and 'Three' (in arbitrary order, since sets are unordered), e.g. ['Four', 'Three']. Because this is a symmetric difference, you don’t have to worry about which list is given first.

Python doc reference


Answer 4

If you want the difference computed recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named Tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Answer 5

This can be done with Python’s XOR operator (^), which on sets computes the symmetric difference.

  • This removes the duplicates in each list.
  • This shows both the items of temp1 that are not in temp2 and the items of temp2 that are not in temp1.

set(temp1) ^ set(temp2)

Answer 6

The simplest way is to use set().difference(set()):

list_a = [1,2,3]
list_b = [2,3]
print(set(list_a).difference(set(list_b)))

The answer is {1} (shown as set([1]) in Python 2).

It can be printed as a list:

print(list(set(list_a).difference(set(list_b))))

Answer 7

If you are really looking into performance, then use numpy!

Here is the full notebook as a gist on GitHub, with a comparison between list, numpy, and pandas.

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451



Answer 8

I’ll toss this in, since none of the present solutions yield a tuple:

temp3 = tuple(set(temp1) - set(temp2))

Alternatively:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
s = set(temp2)  # build the set once, not once per element
temp3 = tuple(x for x in temp1 if x not in s)

Like the other non-tuple answers along these lines, this second version preserves the order of temp1.


Answer 9

I wanted something that would take two lists and do what the diff command does in a shell. Since this question pops up first when you search for “python diff two lists” and is not very specific, I will post what I came up with.

Using SequenceMatcher from difflib, you can compare two lists the way diff does. None of the other answers tell you the position where a difference occurs, but this one does. Some answers give the difference in only one direction, some reorder the elements, and some don’t handle duplicates. This solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from those answers the most. But if you are looking for true diff functionality, then this is the way to go.

For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]
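A sketch that packages the same SequenceMatcher loop as a reusable function returning each opcode together with the slices it refers to (list_diff is a hypothetical name). The example reproduces the reversed-list case above:

```python
from difflib import SequenceMatcher

def list_diff(a, b):
    """Return (tag, items_from_a, items_from_b) tuples describing how a turns into b."""
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes()]

for op in list_diff([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]):
    print(op)
# ('insert', [], [5, 4, 3, 2])
# ('equal', [1], [1])
# ('delete', [2, 3, 4, 5], [])
```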

Answer 10

Try this:

temp3 = set(temp1) - set(temp2)

Answer 11

This could be even faster than Mark’s list comprehension:

import itertools
list(itertools.filterfalse(set(temp2).__contains__, temp1))

Answer 12

Here’s a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what’s in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles repeated items properly, unlike the many set-based answers. For example, on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']
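Note that elements() groups equal items together (in order of first appearance), so the original interleaving of the first list is lost. If that order matters, a Counter can still drive an order-preserving variant. This is a sketch, and multiset_difference is a hypothetical name:

```python
from collections import Counter

def multiset_difference(first, second):
    """Drop one occurrence from `first` per occurrence in `second`; keep first's order."""
    remaining = Counter(second)  # how many of each item still need removing
    result = []
    for item in first:
        if remaining[item] > 0:
            remaining[item] -= 1  # consume one matching occurrence
        else:
            result.append(item)
    return result

lst1 = ['a', 'b', 'a']
lst2 = ['a']
print(list((Counter(lst1) - Counter(lst2)).elements()))  # ['a', 'b'] (grouped)
print(multiset_difference(lst1, lst2))                  # ['b', 'a'] (original order)
```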

Answer 13

You could use a naive method if the second list is a prefix of the first (both lists sorted, without duplicates, and list1 starting with exactly the elements of list2).

list1=[1,2,3,4,5]
list2=[1,2,3]

print(list1[len(list2):])

or with native set methods:

subset=set(list1).difference(list2)

print(subset)

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print("Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000))
print("Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000))

Naive solution: 0.0787101593292

Native set solution: 0.998837615564

Note that in this benchmark temp2 (the even numbers) is not a prefix of temp1, so the slice only shows how cheap slicing is; it does not compute the correct difference there.
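Because the slice silently returns a wrong result when list2 is not actually a prefix of list1, a small guard can help. A minimal sketch (prefix_difference is a hypothetical helper name):

```python
def prefix_difference(longer, prefix):
    """Return the items of `longer` after `prefix`, checking the prefix assumption first."""
    if longer[:len(prefix)] != prefix:
        raise ValueError("second list is not a prefix of the first")
    return longer[len(prefix):]

print(prefix_difference([1, 2, 3, 4, 5], [1, 2, 3]))  # [4, 5]
```

The check is a single O(len(prefix)) comparison, so it keeps the method cheap.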


Answer 14

I’m a little late to the game, but you can compare the performance of some of the code mentioned above with this; two of the fastest contenders are:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x, y, z):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it for timing
    start = time.perf_counter()
    if z == 0:
        lists = str(list(set(x) - set(y)) + list(set(y) - set(x)))
    elif z == 1:
        lists = str(list(set(x).symmetric_difference(set(y))))
    elif z == 2:
        lists = str(list(set(x) ^ set(y)))
    elif z == 3:
        # filterfalse is lazy, so materialize it or nothing gets timed
        lists = list(filterfalse(set(y).__contains__, x))
    elif z == 4:
        lists = tuple(set(x) - set(y))
    elif z == 5:
        lists = [tt for tt in x if tt not in y]
    else:
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = str(Xarray + Yarray)
    times = str(z + 1) + " = " + str(time.perf_counter() - start)
    return (lists, times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number " + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

Answer 15

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(pathlib.Path(*temp2))
list(r.parts)
# ['Three', 'Four']

This assumes the second list is a prefix of the first (both lists start with the same strings) and that none of the strings contains a path separator. See the pathlib docs for more details. Note that it is not particularly fast compared to set operations.


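A straightforward implementation using itertools.zip_longest, which likewise compares the lists position by position and so also assumes temp2 is a prefix of temp1: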

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']

Answer 16

This is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

Answer 17

If you run into TypeError: unhashable type: 'list' you need to turn lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

See also How to compare a list of lists/sets in python?
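A minimal sketch with made-up data, showing the tuple conversion in action (inner lists become tuples so they can live in a set; convert back with list() if needed):

```python
list_of_lists1 = [[1, 2], [3, 4]]
list_of_lists2 = [[3, 4], [5, 6]]

# Inner lists are unhashable, so map each one to a tuple first
result = set(map(tuple, list_of_lists1)).symmetric_difference(map(tuple, list_of_lists2))
print(sorted(result))                   # [(1, 2), (5, 6)]

# Back to a list of lists, if that is what the caller expects
print(sorted(list(t) for t in result))  # [[1, 2], [5, 6]]
```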


Answer 18

Let’s say we have two lists:

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

We can see from the above two lists that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All the answers above find the solution; now, which is the most efficient?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit, we can compare the two:

import timeit

t1 = timeit.Timer("difference(list1, list2)",
                  "from __main__ import difference, list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)",
                  "from __main__ import sym_diff, list1, list2")

# timeit() returns the total time in seconds for all `number` runs
print('Using two for loops', t1.timeit(number=100000), 'seconds')
print('Using symmetric_difference', t2.timeit(number=100000), 'seconds')

which prints:

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 seconds
Using symmetric_difference 0.11285737506113946 seconds
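If the order produced by the loop version matters but the lists are large, here is a sketch that keeps that order while using set lookups, making it O(len(list1) + len(list2)) instead of O(len(list1) * len(list2)). The name difference_fast is hypothetical, and the items must be hashable:

```python
def difference_fast(list1, list2):
    """Symmetric difference in the loop version's order, using set lookups."""
    set1, set2 = set(list1), set(list2)
    return [i for i in list1 if i not in set2] + [j for j in list2 if j not in set1]

print(difference_fast([1, 3, 5, 7, 9], [1, 2, 3, 4, 5]))  # [7, 9, 2, 4]
```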

Answer 19

Single-line version of arulmr’s solution:

def diff(listA, listB):
    return (set(listA) - set(listB)) | (set(listB) - set(listA))

Answer 20

If you want something more like a changeset, you could use collections.Counter:

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to listb, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

Answer 21

We can calculate the union of the lists minus their intersection:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 

Answer 22

This can be solved in one line. Given two lists (temp1 and temp2), the question asks for their difference in a third list (temp3):

temp3 = list(set(temp1).difference(set(temp2)))

Answer 23

Here is a simple way to find the items that appear in only one of two lists (whatever their contents, as long as the items are hashable), using the built-in set type (the older sets.Set class was removed in Python 3):

>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> set(l1).symmetric_difference(set(l2))
{False, 'xvdbd', None, 12}

The order of the elements in the resulting set may vary. Hope this helps.


Answer 24

I prefer converting the lists to sets and then using the difference() method. The full code is:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output (sets are unordered, so the order of the items may vary):

['Three', 'Four']

It's the easiest to understand. Moreover, converting to sets removes duplicates, which is convenient with large data as long as you don't need to keep them. Hope it helps ;-)
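If you need to keep the order of temp1 (and any duplicates in it), a common alternative is to convert only the second list to a set and filter the first one with a list comprehension. A minimal sketch; the helper name list_difference is hypothetical:

```python
def list_difference(a, b):
    """Items of `a` that are not in `b`, keeping a's order and duplicates.

    `b` is converted to a set once, so each membership test is O(1) on average.
    Assumes the items of `b` are hashable.
    """
    b_set = set(b)
    return [x for x in a if x not in b_set]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
print(list_difference(temp1, temp2))  # ['Three', 'Four']
```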


Answer 25

(list(set(a)-set(b))+list(set(b)-set(a)))

Answer 26

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

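For example, if list1 = [10, 15, 20, 25, 30, 35, 40] and list2 = [25, 40, 35], the returned list contains 10, 15, 20 and 30, e.g. output = [10, 20, 30, 15] (the order follows the set, so it may vary). Note that only the items of the longer list that are missing from the shorter one are returned; items found only in the shorter list are ignored.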


How to sort a list of objects based on an attribute of the objects?

I’ve got a list of Python objects that I’d like to sort by an attribute of the objects themselves. The list looks like:

>>> ut
[<Tag: 128>, <Tag: 2008>, <Tag: <>, <Tag: actionscript>, <Tag: addresses>,
 <Tag: aes>, <Tag: ajax> ...]

Each object has a count:

>>> ut[1].count
1L

I need to sort the list by count, in descending order.

I’ve seen several methods for this, but I’m looking for best practice in Python.


Answer 0

# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)

# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)

More on sorting by keys.
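A self-contained version of the above, assuming a minimal Tag class (a hypothetical stand-in for the question's objects) with name and count attributes:

```python
from dataclasses import dataclass


@dataclass
class Tag:
    # Hypothetical stand-in for the question's Tag objects
    name: str
    count: int


ut = [Tag("128", 1), Tag("ajax", 7), Tag("aes", 3)]

# sorted() returns a new, sorted list and leaves ut unchanged
newlist = sorted(ut, key=lambda t: t.count, reverse=True)
print([t.name for t in newlist])  # ['ajax', 'aes', '128']

# list.sort() sorts ut in place and returns None
ut.sort(key=lambda t: t.count, reverse=True)
print([t.name for t in ut])  # ['ajax', 'aes', '128']
```

Python's sort is stable, and reverse=True preserves that stability, so tags with equal counts keep their original relative order.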


Answer 1

Possibly the fastest way, especially if your list has a lot of records, is to use operator.attrgetter("count"). However, your code might run on a Python version that predates the operator module, so it would be nice to have a fallback mechanism. You might then do the following:

try:
    import operator
except ImportError:
    keyfun = lambda x: x.count  # use a lambda if there is no operator module
else:
    keyfun = operator.attrgetter("count")  # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True)  # sort in place
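operator.attrgetter also accepts several attribute names, in which case the key is a tuple and ties on the first attribute are broken by the next one. A small sketch with hypothetical records built from types.SimpleNamespace:

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical records with the two attributes we want to sort by
ut = [
    SimpleNamespace(name="ajax", count=3),
    SimpleNamespace(name="aes", count=7),
    SimpleNamespace(name="128", count=3),
]

# attrgetter("count", "name") returns (obj.count, obj.name) for each object
by_count_then_name = sorted(ut, key=attrgetter("count", "name"))
print([(t.count, t.name) for t in by_count_then_name])
# [(3, '128'), (3, 'ajax'), (7, 'aes')]
```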

Answer 2

Readers should notice that the key= method:

ut.sort(key=lambda x: x.count, reverse=True)

is many times faster than adding rich comparison operators to the objects. I was surprised to read this (page 485 of “Python in a Nutshell”). You can confirm this by running tests on this little program (Python 2 code: it uses __cmp__, cmp and xrange):

#!/usr/bin/env python
import random

class C:
    def __init__(self,count):
        self.count = count

    def __cmp__(self,other):
        return cmp(self.count,other.count)

longList = [C(random.random()) for i in xrange(1000000)] #about 6.1 secs
longList2 = longList[:]

longList.sort() #about 52 - 6.1 = 46 secs
longList2.sort(key = lambda c: c.count) #about 9 - 6.1 = 3 secs

My (very minimal) tests show the first sort is more than 10 times slower, but the book says it is only about 5 times slower in general. The reason, they say, is the highly optimized sort algorithm used in Python (timsort). Part of the difference is also that the key function is called only once per element, whereas __cmp__ is called for every comparison the sort makes.

Still, it's very odd that .sort(lambda) is faster than plain old .sort(). I hope they fix that.
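The program above only runs on Python 2. A rough Python 3 equivalent is sketched below: it defines __lt__ (a rich comparison method) instead of __cmp__ and times both approaches with timeit. The list size and repeat count are arbitrary choices, and the timings you get will depend on your machine and Python version:

```python
import random
import timeit


class C:
    def __init__(self, count):
        self.count = count

    # Python 3 ignores __cmp__; sort() uses the rich comparison __lt__
    def __lt__(self, other):
        return self.count < other.count


items = [C(random.random()) for _ in range(50_000)]

# sorted() builds a new list each time, so items stays unsorted between runs
t_lt = timeit.timeit(lambda: sorted(items), number=3)
t_key = timeit.timeit(lambda: sorted(items, key=lambda c: c.count), number=3)

print(f"__lt__ comparisons: {t_lt:.3f} s")
print(f"key= function:      {t_key:.3f} s")
```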


Answer 3

Object-oriented approach

It’s good practice to make object sorting logic, if applicable, a property of the class rather than repeating it everywhere the ordering is required.

This ensures consistency and removes the need for boilerplate code.

Sorting itself only needs __lt__, but you should at least define __eq__ as well so that comparisons stay consistent. Then just use sorted(list_of_objects).

class Card(object):

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

hand = [Card(10, 'h'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand]  # [10, 2, 12, 13, 14]

hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted]  # [2, 10, 12, 13, 14]
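If you also want the other comparison operators (<=, >, >=), functools.total_ordering can derive them from __eq__ and __lt__. The sketch below is a variation on the class above: it compares rank first and suit second (an assumption, so that ties are broken deterministically) and returns NotImplemented for non-Card operands:

```python
from functools import total_ordering


@total_ordering
class Card:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)

    def __lt__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        # Order by rank, then by suit to break ties deterministically
        return (self.rank, self.suit) < (other.rank, other.suit)


hand = [Card(10, 'h'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
print([c.rank for c in sorted(hand)])                # [2, 10, 12, 13, 14]
print([c.rank for c in sorted(hand, reverse=True)])  # [14, 13, 12, 10, 2]
print(Card(3, 'h') >= Card(2, 's'))                  # True (>= comes from total_ordering)
```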

Answer 4

from operator import attrgetter
ut.sort(key=attrgetter('count'), reverse=True)

Answer 5

It looks much like a list of Django ORM model instances.

Why not sort them in the query, like this:

ut = Tag.objects.order_by('-count')

Answer 6

Add rich comparison operators to the object's class, then use the list's sort() method.
See rich comparison in Python.

Update: Although this method would work, I think Triptych's solution is better suited to your case because it's much simpler.


Answer 7

If the attribute you want to sort by is a property, then you can avoid importing operator.attrgetter and use the property’s fget method instead.

For example, for a class Circle with a property radius we could sort a list of circles by radii as follows:

result = sorted(circles, key=Circle.radius.fget)

This is not the most well-known feature but often saves me a line with the import.
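A minimal self-contained sketch of this, assuming a Circle class whose radius is exposed as a read-only property:

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius


circles = [Circle(3), Circle(1), Circle(2)]

# Circle.radius is the property object; .fget is its plain getter function
result = sorted(circles, key=Circle.radius.fget)
print([c.radius for c in result])  # [1, 2, 3]
```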


Change column type in pandas

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type for each column? Ideally I would like to do this in a dynamic way because there can be hundreds of columns and I don’t want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.


Answer 0

You have three main options for converting types in pandas:

  1. to_numeric() – provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

  2. astype() – convert (almost) any type to (almost) any other type (even if it’s not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).

  3. infer_objects() – a utility method to convert object columns holding Python objects to a pandas type if possible.

Read on for more detailed explanations and usage of each of these methods.


1. to_numeric()

The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric().

This function will try to change non-numeric objects (such as strings) into integers or floating point numbers as appropriate.

Basic usage

The input to to_numeric() is a Series or a single column of a DataFrame.

>>> s = pd.Series(["8", 6, "7.5", 3, "0.9"]) # mixed string and numeric values
>>> s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>> pd.to_numeric(s) # convert everything to float values
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64

As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:

# convert Series
my_series = pd.to_numeric(my_series)

# convert column "a" of a DataFrame
df["a"] = pd.to_numeric(df["a"])

You can also use it to convert multiple columns of a DataFrame via the apply() method:

# convert all columns of DataFrame
df = df.apply(pd.to_numeric) # convert all columns of DataFrame

# convert just columns "a" and "b"
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)

As long as your values can all be converted, that’s probably all you need.

Error handling

But what if some values can’t be converted to a numeric type?

to_numeric() also takes an errors keyword argument that allows you to force non-numeric values to be NaN, or simply ignore columns containing these values.

Here’s an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

The default behaviour is to raise if it can’t convert a value. In this case, it can’t cope with the string ‘pandas’:

>>> pd.to_numeric(s) # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string

Rather than fail, we might want ‘pandas’ to be considered a missing/bad numeric value. We can coerce invalid values to NaN as follows using the errors keyword argument:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64

The third option for errors is just to ignore the operation if an invalid value is encountered (note that errors='ignore' is deprecated as of pandas 2.2):

>>> pd.to_numeric(s, errors='ignore')
# the original Series is returned untouched

This last option is particularly useful when you want to convert your entire DataFrame, but don’t know which of your columns can be converted reliably to a numeric type. In that case just write:

df.apply(pd.to_numeric, errors='ignore')

The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.
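Because the 'ignore' option is deprecated in recent pandas versions, the same column-by-column behaviour can be written explicitly by catching the exception. A minimal sketch; the helper name to_numeric_where_possible is hypothetical, and it is applied here to the data from the question:

```python
import pandas as pd


def to_numeric_where_possible(df):
    """Convert each column to a numeric dtype if every value parses; otherwise keep it.

    Mimics df.apply(pd.to_numeric, errors='ignore') without relying on the
    'ignore' option.
    """
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # leave columns that cannot be parsed (e.g. text) unchanged
    return out


a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = to_numeric_where_possible(pd.DataFrame(a))
print(df.dtypes)  # column 0 keeps its text dtype; columns 1 and 2 become float64
```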

Downcasting

By default, conversion with to_numeric() will give you either an int64 or float64 dtype (or whatever integer width is native to your platform).

That’s usually what you want, but what if you wanted to save some memory and use a more compact dtype, like float32, or int8?

to_numeric() gives you the option to downcast to either ‘integer’, ‘signed’, ‘unsigned’ or ‘float’. Here’s an example for a simple Series s of integer type:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

Downcasting to ‘integer’ uses the smallest possible integer that can hold the values:

>>> pd.to_numeric(s, downcast='integer')
0    1
1    2
2   -7
dtype: int8

Downcasting to ‘float’ similarly picks a smaller than normal floating type:

>>> pd.to_numeric(s, downcast='float')
0    1.0
1    2.0
2   -7.0
dtype: float32

2. astype()

The astype() method enables you to be explicit about the dtype you want your DataFrame or Series to have. It’s very versatile in that you can try and go from one type to any other.

Basic usage

Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).

Call the method on the object you want to convert and astype() will try and convert it for you:

# convert all DataFrame columns to the int64 dtype
df = df.astype(int)

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

# convert Series to float16 type
s = s.astype(np.float16)

# convert Series to Python strings
s = s.astype(str)

# convert Series to categorical type - see docs for more details
s = s.astype('category')

Notice I said “try” – if astype() does not know how to convert a value in the Series or DataFrame, it will raise an error. For example if you have a NaN or inf value you’ll get an error trying to convert it to an integer.

As of pandas 0.20.0, this error can be suppressed by passing errors='ignore'. Your original object will be returned untouched.

Be careful

astype() is powerful, but it will sometimes convert values “incorrectly”. For example:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

These are small integers, so how about converting to an unsigned 8-bit type to save memory?

>>> s.astype(np.uint8)
0      1
1      2
2    249
dtype: uint8

The conversion worked, but the -7 was wrapped round to become 249 (i.e. 2^8 - 7)!

Trying to downcast using pd.to_numeric(s, downcast='unsigned') instead could help prevent this error.
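To see why, here is a small sketch comparing two cases: to_numeric() only picks an unsigned type when every value is non-negative, so -7 is kept intact instead of wrapping around:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, -7])

# -7 does not fit in any unsigned type, so to_numeric keeps the original dtype
safe = pd.to_numeric(s, downcast='unsigned')
print(safe.dtype, safe.tolist())  # int64 [1, 2, -7]

# With only non-negative values, the smallest unsigned type that fits is used
small = pd.to_numeric(pd.Series([1, 2, 7]), downcast='unsigned')
print(small.dtype)  # uint8
```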


3. infer_objects()

Version 0.21.0 of pandas introduced the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, here’s a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3','2','1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object

Using infer_objects(), you can change the type of column ‘a’ to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object

Column ‘b’ has been left alone since its values were strings, not integers. If you wanted to try and force the conversion of both columns to an integer type, you could use df.astype(int) instead.


Answer 1

How about this?

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df
Out[16]: 
  one  two three
0   a  1.2   4.2
1   b   70  0.03
2   x    5     0

df.dtypes
Out[17]: 
one      object
two      object
three    object

df[['two', 'three']] = df[['two', 'three']].astype(float)

df.dtypes
Out[19]: 
one       object
two      float64
three    float64

Answer 2

The code below changes the data type of the selected columns:

df[['col.name1', 'col.name2']] = df[['col.name1', 'col.name2']].astype('data_type')

In place of 'data_type', give the type you want, e.g. str, float, int, etc.


Answer 3

When I’ve only needed to specify specific columns, and I want to be explicit, I’ve used (per the pandas DataFrame.astype documentation):

dataframe = dataframe.astype({'col_name_1': 'int', 'col_name_2': 'float64'})  # add more columns as needed

So, using the data from the original question, but providing column names:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col_name_1', 'col_name_2', 'col_name_3'])
df = df.astype({'col_name_2':'float64', 'col_name_3':'float64'})

Answer 4

Here is a function that takes as its arguments a DataFrame and a list of columns and coerces all data in the columns to numbers.

# df is the DataFrame, and column_list is a list of columns as strings (e.g ["col1","col2","col3"])
# dependencies: pandas

def coerce_df_columns_to_numeric(df, column_list):
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

So, for your example:

import pandas as pd

def coerce_df_columns_to_numeric(df, column_list):
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1','col2','col3'])

coerce_df_columns_to_numeric(df, ['col2','col3'])

Answer 5

How about creating two dataframes, each with different data types for their columns, and then appending them together?

d1 = pd.DataFrame(columns=[ 'float_column' ], dtype=float)
d1 = d1.append(pd.DataFrame(columns=[ 'string_column' ], dtype=str))

Results

In [8]:  d1.dtypes
Out[8]: 
float_column     float64
string_column     object
dtype: object

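After the dataframe is created, you can populate it with floating point values in the 1st column, and strings (or any data type you desire) in the 2nd column. Note that DataFrame.append was removed in pandas 2.0; on newer versions, combine the frames with pd.concat instead.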
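Alternatively, an empty DataFrame with a different dtype per column can be created in one step from empty Series, without appending or concatenating. A sketch; the column names follow the example above:

```python
import pandas as pd

# One empty Series per column, each with its own dtype
d1 = pd.DataFrame({
    'float_column': pd.Series(dtype=float),
    'string_column': pd.Series(dtype=str),
})
print(d1.dtypes)
```

On pandas 2.x the string column is reported as object; newer pandas versions may use a dedicated string dtype instead.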


Answer 6

pandas >= 1.0

Here’s a chart that summarises some of the most important conversions in pandas.

[Figure: chart summarising the most important type conversions in pandas]

Conversion to string is trivial with .astype(str), so it is not shown in the figure.

“Hard” versus “Soft” conversions

Note that “conversions” in this context could either refer to converting text data into their actual data type (hard conversion), or inferring more appropriate data types for data in object columns (soft conversion). To illustrate the difference, take a look at this example:

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]}, dtype=object)
df.dtypes                                                                  

a    object
b    object
dtype: object

# Actually converts string to numeric - hard conversion
df.apply(pd.to_numeric).dtypes                                             

a    int64
b    int64
dtype: object

# Infers better data types for object data - soft conversion
df.infer_objects().dtypes                                                  

a    object  # no change
b     int64
dtype: object

# Same as infer_objects, but converts to equivalent ExtensionType
df.convert_dtypes().dtypes

a    string
b     Int64
dtype: object

Answer 7

I thought I had the same problem but actually I have a slight difference that makes the problem easier to solve. For others looking at this question it’s worth checking the format of your input list. In my case the numbers are initially floats not strings as in the question:

a = [['a', 1.2, 4.2], ['b', 70, 0.03], ['x', 5, 0]]

but by processing the list too much before creating the dataframe I lose the types and everything becomes a string.

Creating the data frame via a numpy array

df = pd.DataFrame(np.array(a))

df
Out[5]: 
   0    1     2
0  a  1.2   4.2
1  b   70  0.03
2  x    5     0

df[1].dtype
Out[7]: dtype('O')

gives the same data frame as in the question, where the entries in columns 1 and 2 are considered as strings. However doing

df = pd.DataFrame(a)

df
Out[10]: 
   0     1     2
0  a   1.2  4.20
1  b  70.0  0.03
2  x   5.0  0.00

df[1].dtype
Out[11]: dtype('float64')

does actually give a data frame with the columns in the correct format


Answer 8

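Starting with pandas 1.0.0, we have pandas.DataFrame.convert_dtypes. You can even control which types to convert! (The output below is from pandas 1.0; since pandas 1.2, float columns such as f are also converted, to the nullable Float64 dtype.)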

In [40]: df = pd.DataFrame(
    ...:     {
    ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    ...:         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
    ...:         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
    ...:         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
    ...:         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    ...:         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
    ...:     }
    ...: )

In [41]: dff = df.copy()

In [42]: df 
Out[42]: 
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

In [43]: df.dtypes
Out[43]: 
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

In [44]: df = df.convert_dtypes()

In [45]: df.dtypes
Out[45]: 
a      Int32
b     string
c    boolean
d     string
e      Int64
f    float64
dtype: object

In [46]: dff = dff.convert_dtypes(convert_boolean = False)

In [47]: dff.dtypes
Out[47]: 
a      Int32
b     string
c     object
d     string
e      Int64
f    float64
dtype: object

How do I get the full path of the current file's directory?

I want to get the current file’s directory path. I tried:

>>> os.path.abspath(__file__)
'C:\\python27\\test.py'

But how can I retrieve the directory’s path?

For example:

'C:\\python27\\'

Answer 0

Python 3

For the directory of the script being run:

import pathlib
pathlib.Path(__file__).parent.absolute()

For the current working directory:

import pathlib
pathlib.Path().absolute()

Python 2 and 3

For the directory of the script being run:

import os
os.path.dirname(os.path.abspath(__file__))

If you mean the current working directory:

import os
os.path.abspath(os.getcwd())

Note that __file__ has two underscores before and after file, not just one.

Also note that if you are running interactively or have loaded code from something other than a file (e.g. a database or an online resource), __file__ may not be set, since there is no notion of a “current file”. The above answer assumes the most common scenario: running a Python script that lives in a file.
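If the code might run where __file__ is not set, one option is to fall back to the working directory. The helper below is only a sketch: script_directory is a hypothetical name, and the optional namespace argument exists just so the fallback can be exercised; by default it inspects the module's own globals().

```python
from pathlib import Path


def script_directory(namespace=None):
    """Directory of the file that owns `namespace`, or the cwd as a fallback.

    `namespace` defaults to this module's globals(); when it has no
    __file__ (interactive session, notebook, code loaded from elsewhere),
    the current working directory is returned instead.
    """
    if namespace is None:
        namespace = globals()
    file = namespace.get("__file__")
    if file:
        # resolve() makes the path absolute and follows symlinks
        return Path(file).resolve().parent
    return Path.cwd()


print(script_directory())
```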

References

  1. pathlib in the python documentation.
  2. os.path 2.7, os.path 3.8
  3. os.getcwd 2.7, os.getcwd 3.8
  4. what does the __file__ variable mean/do?

Answer 1

Using Path from pathlib is the recommended way in Python 3 (pathlib was added in Python 3.4):

from pathlib import Path
print("File      Path:", Path(__file__).absolute())
print("Directory Path:", Path(__file__).absolute().parent)
print("Working   Dir :", Path().absolute())

Documentation: pathlib

Note: In a Jupyter Notebook, __file__ is not defined, so Path().absolute() has to be used instead. It returns the current working directory, which is normally the folder containing the notebook.


Answer 2

In Python 3.x I do:

from pathlib import Path

path = Path(__file__).parent.absolute()

Explanation:

  • Path(__file__) is the path to the current file.
  • .parent gives you the directory the file is in.
  • .absolute() gives you the full absolute path to it.

Using pathlib is the modern way to work with paths. If you need it as a string later for some reason, just do str(path).
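Building on this, a path to a file next to the script can be joined onto the directory and turned into a string with str() when an API wants one. path_next_to is a hypothetical helper, and the fixed POSIX path stands in for __file__ so the sketch runs anywhere:

```python
from pathlib import Path


def path_next_to(script_path, *parts):
    """Join `parts` onto the absolute directory that contains `script_path`."""
    base = Path(script_path).parent.absolute()
    return base.joinpath(*parts)


# In a real script you would pass __file__; a fixed path is used here
target = path_next_to("/srv/app/tool.py", "data", "input.csv")
print(target)        # a Path object
print(str(target))   # the same location as a plain string
```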


Answer 3

import os
print(os.path.dirname(__file__))
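One caveat, to the best of my knowledge: before Python 3.9, __file__ of the main script could be a bare relative name such as script.py when the script was launched from its own directory, and os.path.dirname of that is an empty string. Calling os.path.abspath first avoids this; containing_dir is a hypothetical helper name:

```python
import os


def containing_dir(file_path):
    """Absolute directory of file_path, even when it is a bare file name."""
    return os.path.dirname(os.path.abspath(file_path))


print(repr(os.path.dirname("script.py")))  # '' because there is no directory part
print(containing_dir("script.py"))          # the current working directory
```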

Answer 4

You can use the os and os.path libraries easily, as follows:

import os
os.chdir(os.path.dirname(os.getcwd()))

os.path.dirname returns the parent of the path it is given, so applied to os.getcwd() it gives the directory one level above the current one. This lets us move up a level without passing any file argument and without knowing the absolute path.
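For comparison, a minimal sketch (POSIX-style paths assumed) showing that os.path.dirname and pathlib's .parent both drop the last path component, and computing the parent of the working directory without calling os.chdir:

```python
import os
from pathlib import Path

# Both strip the last path component
print(os.path.dirname("/home/user/project"))  # /home/user
print(Path("/home/user/project").parent)       # /home/user

# Applied to the working directory: the directory one level up,
# computed without changing directory
parent_of_cwd = Path.cwd().parent
print(parent_of_cwd == Path(os.path.dirname(os.getcwd())))  # True
```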


Answer 5

Try this:

import os
dir_path = os.path.dirname(os.path.realpath(__file__))
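The difference between realpath and abspath only shows up with symbolic links: realpath follows them, abspath does not. The sketch below builds a throwaway layout in a temporary directory (the file names are made up) and assumes a platform where os.symlink is permitted:

```python
import os
import tempfile


def link_and_target_dirs(tmp):
    """Create tmp/real/script.py plus a symlink tmp/link.py pointing at it,
    then return (dirname via abspath, dirname via realpath) for the link."""
    real_dir = os.path.join(tmp, "real")
    os.mkdir(real_dir)
    target = os.path.join(real_dir, "script.py")
    open(target, "w").close()
    link = os.path.join(tmp, "link.py")
    os.symlink(target, link)  # on Windows this may require extra privileges
    return (os.path.dirname(os.path.abspath(link)),
            os.path.dirname(os.path.realpath(link)))


with tempfile.TemporaryDirectory() as tmp:
    via_abspath, via_realpath = link_and_target_dirs(tmp)
    print(via_abspath)   # the directory holding the link
    print(via_realpath)  # the "real" directory holding the target
```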

Answer 6

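I found that the following commands all return a full, absolute directory path in a Python 3.6 script. Note that dir1 and dir2 are the current working directory, which is the script's parent directory only when the script is run from there; dir3 is always the directory containing the script.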

Python 3.6 Script:

#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-

from pathlib import Path

#Get absolute directory paths in a Python 3.6 script
dir1 = Path().resolve()  #The cwd, made absolute with any symlinks resolved
dir2 = Path().absolute() #The cwd, made absolute. See @RonKalian answer
dir3 = Path(__file__).parent.absolute() #The script's directory. See @Arminius answer

print(f'dir1={dir1}\ndir2={dir2}\ndir3={dir3}')

Explanation links: .resolve(), .absolute(), Path(__file__).parent.absolute()
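The two methods also differ on '..' components: as far as I can tell, absolute() only prepends the working directory when needed, while resolve() normalizes '..' and follows symlinks. A small sketch in a temporary directory, with made-up names:

```python
import tempfile
from pathlib import Path


def absolute_vs_resolve(tmp):
    """Return (absolute(), resolve()) for a path that contains '..'."""
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "x.txt").touch()
    p = base / "sub" / ".." / "x.txt"
    return p.absolute(), p.resolve()


with tempfile.TemporaryDirectory() as tmp:
    absolute, resolved = absolute_vs_resolve(tmp)
    print(absolute)   # still contains "sub/.."
    print(resolved)   # ".." collapsed, symlinks followed
```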


Answer 7

System: MacOS

Version: Python 3.6 w/ Anaconda

import os

rootpath = os.getcwd()  # absolute path of the current working directory

os.chdir(rootpath)  # already there, so this call changes nothing

Answer 8

Useful path properties in Python:

from pathlib import Path

# Path() is the current working directory; it is the script's
# directory only when the script is launched from there
mypath = Path().absolute()
print('Absolute path : {}'.format(mypath))

# Build a path to a file in a subdirectory with the / operator
filePath = mypath/'data'/'fuel_econ.csv'
print('File path : {}'.format(filePath))

# Check whether the file exists
isfileExist = filePath.exists()
print('isfileExist : {}'.format(isfileExist))

# Check whether the path is a directory (False for a file)
isadirectory = filePath.is_dir()
print('isadirectory : {}'.format(isadirectory))

# Get the file extension
print('File extension : {}'.format(filePath.suffix))

Output (the absolute path is the working directory, which here was the folder containing the Python file):

Absolute path : D:\Study\Machine Learning\Jupitor Notebook\JupytorNotebookTest2\Udacity_Scripts\Matplotlib and seaborn Part2

File path : D:\Study\Machine Learning\Jupitor Notebook\JupytorNotebookTest2\Udacity_Scripts\Matplotlib and seaborn Part2\data\fuel_econ.csv

isfileExist : True

isadirectory : False

File extension : .csv
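A self-contained variant of the same checks, using a temporary directory in place of the real project folder so it can run anywhere (the CSV contents are a placeholder); .stem, which drops the suffix, is added for comparison:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)  # stands in for Path().absolute()
    csv_path = base / "data" / "fuel_econ.csv"
    csv_path.parent.mkdir()            # create the data/ folder
    csv_path.write_text("make,mpg\n")  # placeholder contents

    exists = csv_path.exists()    # True
    is_dir = csv_path.is_dir()    # False: it is a file
    print(exists, is_dir, csv_path.suffix, csv_path.stem)
```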


Answer 9

If you just want to see the current working directory:

import os
print(os.getcwd())

If you want to change the current working directory

os.chdir(path)

where path is a string containing the directory you want to move to, e.g.

path = "C:\\Users\\xyz\\Desktop\\move here"
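When the directory change should only be temporary, a small context manager can restore the previous directory afterwards. working_directory is a hypothetical name for this sketch; Python 3.11 and later also ship contextlib.chdir for the same purpose:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        print(os.getcwd())   # the temporary directory
    print(os.getcwd())       # back to where we started
```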

Answer 10

IPython has a magic command %pwd to get the present working directory. It can be used in the following way:

from IPython.terminal.embed import InteractiveShellEmbed

ip_shell = InteractiveShellEmbed()

present_working_directory = ip_shell.magic("%pwd")

In a Jupyter Notebook running IPython, %pwd can be used directly as follows:

present_working_directory = %pwd

Answer 11

To keep paths consistent across platforms (macOS/Windows/Linux), convert the backslashes to forward slashes:

import os
path = os.getcwd().replace('\\', '/')
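pathlib can do the same conversion with as_posix(). A brief sketch; the Windows path is only an example value, and PureWindowsPath lets it be demonstrated on any OS:

```python
import os
from pathlib import Path, PureWindowsPath

# as_posix() renders a path with forward slashes, whatever the flavour
print(PureWindowsPath(r"C:\Users\xyz\Desktop").as_posix())  # C:/Users/xyz/Desktop

# The current working directory with forward slashes on any platform
cwd_posix = Path(os.getcwd()).as_posix()
print(cwd_posix)
```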

Answer 12

I made a function to get the name of the current folder when running Python under IIS via CGI:

import os

def getLocalFolder():
    # Directory containing this file, then its last component (the folder name)
    path = os.path.dirname(os.path.abspath(__file__))
    return os.path.basename(path)

Answer 13

Let’s assume you have the following directory structure:

main/fold1, main/fold2, main/fold3, …

import glob
import os

sch = "file.txt"  # hypothetical value: the original answer does not define sch

folders = glob.glob("main/fold*")

for fold in folders:
    abspath = os.path.dirname(os.path.abspath(fold))
    fullpath = os.path.join(abspath, sch)
    print(fullpath)

Answer 14

## IMPORT MODULES
import os

## CALCULATE FILEPATH VARIABLE
filepath = os.path.abspath('') ## ~ os.getcwd()
## TEST TO MAKE SURE os.getcwd() is EQUIVALENT ALWAYS..
## ..OR DIFFERENT IN SOME CIRCUMSTANCES
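A quick check of the comparison the comments ask about, assuming CPython's os.path: abspath('') joins the empty string onto the current working directory and normalizes it, so it should always match os.getcwd(). Both describe the process's working directory, not the script's location.

```python
import os

filepath = os.path.abspath('')
# '' is joined onto the current working directory and normalized,
# so the result matches os.getcwd()
print(filepath == os.getcwd())  # True
```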

Emulate a do-while loop in Python?

I need to emulate a do-while loop in a Python program. Unfortunately, the following straightforward code does not work:

list_of_ints = [ 1, 2, 3 ]
iterator = list_of_ints.__iter__()
element = None

while True:
  if element:
    print element

  try:
    element = iterator.next()
  except StopIteration:
    break

print "done"

Instead of “1,2,3,done”, it prints the following output:

[stdout:]1
[stdout:]2
[stdout:]3
None['Traceback (most recent call last):
', '  File "test_python.py", line 8, in <module>
    s = i.next()
', 'StopIteration
']

What can I do in order to catch the ‘stop iteration’ exception and break a while loop properly?

An example of why such a thing may be needed is shown below as pseudocode.

State machine:

s = ""
while True :
  if state is STATE_CODE :
    if "//" in s :
      tokens.add( TOKEN_COMMENT, s.split( "//" )[1] )
      state = STATE_COMMENT
    else :
      tokens.add( TOKEN_CODE, s )
  if state is STATE_COMMENT :
    if "//" in s :
      tokens.append( TOKEN_COMMENT, s.split( "//" )[1] )
    else :
      state = STATE_CODE
      # Re-evaluate same line
      continue
  try :
    s = i.next()
  except StopIteration :
    break

Answer 0

I am not sure what you are trying to do. You can implement a do-while loop like this:

while True:
  stuff()
  if fail_condition:
    break

Or:

stuff()
while not fail_condition:
  stuff()
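A runnable sketch of the first pattern, wrapped in a hypothetical helper run_do_while, showing that the body runs once even when the condition is false from the start:

```python
def run_do_while(body, keep_going):
    """Call body() once, then again for as long as keep_going() is true."""
    while True:
        body()
        if not keep_going():  # tested at the bottom, as in a do-while loop
            break


calls = []
run_do_while(lambda: calls.append(len(calls)), lambda: False)
print(calls)  # [0]: the body ran once although the condition was false

calls = []
run_do_while(lambda: calls.append(len(calls)), lambda: len(calls) < 3)
print(calls)  # [0, 1, 2]
```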

Why are you trying to use a do-while loop to print the items in the list? Why not just use:

for i in l:
  print(i)
print("done")

Update:

So do you have a list of lines? And you want to keep iterating through it? How about:

for s in l: 
  while True: 
    stuff() 
    # use a "break" instead of s = i.next()

Does that seem like something close to what you would want? With your code example, it would be:

for s in some_list:
  while True:
    if state is STATE_CODE:
      if "//" in s:
        tokens.add( TOKEN_COMMENT, s.split( "//" )[1] )
        state = STATE_COMMENT
      else :
        tokens.add( TOKEN_CODE, s )
        break # get next s
    if state is STATE_COMMENT:
      if "//" in s:
        tokens.append( TOKEN_COMMENT, s.split( "//" )[1] )
        break # get next s
      else:
        state = STATE_CODE
        # re-evaluate same line
        # continues automatically

Answer 1

Here’s a very simple way to emulate a do-while loop:

condition = True
while condition:
    # loop body here
    condition = test_loop_condition()
# end of loop

The key features of a do-while loop are that the loop body always executes at least once, and that the condition is evaluated at the bottom of the loop body. The control structure shown here accomplishes both of these with no need for exceptions or break statements. It does introduce one extra Boolean variable.
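A concrete, runnable instance of this flag-based pattern, with a made-up task: take numbers from an iterator until one is negative, always taking at least one. It assumes a negative value eventually appears.

```python
def read_until_negative(values):
    """Take values do-while style, stopping after the first negative one.

    Assumes a negative value eventually appears; otherwise next() raises
    StopIteration when the input runs out.
    """
    it = iter(values)
    taken = []
    condition = True
    while condition:
        value = next(it)           # loop body
        taken.append(value)
        condition = value >= 0     # evaluated at the bottom of the body
    return taken


print(read_until_negative([3, 1, -2, 5]))  # [3, 1, -2]
print(read_until_negative([-1]))           # [-1]: the body always runs once
```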


Answer 2

My code below might be a useful implementation, highlighting the main difference between do-while and while as I understand it.

So in this one case, you always go through the loop at least once.

first_pass = True
while first_pass or condition:
    first_pass = False
    do_stuff()

Answer 3

An exception will break out of the loop, so you might as well handle it outside the loop.

s = None
try:
  while True:
    if s:
      print(s)
    s = next(i)
except StopIteration:
  pass

Note that break inside except is well defined: it leaves the innermost enclosing loop, and if the try statement has a finally clause, that clause runs first. What break cannot do is leave more than one level of loops at once.

Related PEP: http://www.python.org/dev/peps/pep-3136
Related question: Breaking out of nested loops
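To check how break behaves inside try/except/finally, a small sketch (break_demo is a made-up name) that logs each step: the break in the except clause runs the finally clause and then leaves the for loop.

```python
def break_demo():
    log = []
    for n in range(5):
        try:
            if n == 2:
                raise ValueError(n)
            log.append(("body", n))
        except ValueError:
            log.append(("except", n))
            break  # leaves the for loop, not just the try statement
        finally:
            log.append(("finally", n))  # still runs on the way out
    log.append("after loop")
    return log


print(break_demo())
```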


Answer 4

do {
  stuff()
} while (condition())

->

while True:
  stuff()
  if not condition():
    break

You can write a function:

def do_while(stuff, condition):
  while condition(stuff()):
    pass

But 1) it’s ugly, and 2) the condition has to be a function with one parameter, which receives the value returned by stuff() (that is the only reason not to use the classic while loop).
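A runnable version of this helper, with a hypothetical step() body that returns a counter value for condition to inspect:

```python
def do_while(stuff, condition):
    # stuff() always runs first; its return value is passed to condition()
    while condition(stuff()):
        pass


counter = {"n": 0}


def step():
    """Hypothetical loop body: bump a counter and report its new value."""
    counter["n"] += 1
    return counter["n"]


do_while(step, lambda n: n < 3)
print(counter["n"])  # 3

counter["n"] = 0
do_while(step, lambda n: False)
print(counter["n"])  # 1: runs once even when the condition is false
```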


Answer 5

Here is a crazier solution of a different pattern — using coroutines. The code is still very similar, but with one important difference; there are no exit conditions at all! The coroutine (chain of coroutines really) just stops when you stop feeding it with data.

def coroutine(func):
    """Coroutine decorator

    Coroutines must be started, advanced to their first "yield" point,
    and this decorator does this automatically.
    """
    def startcr(*ar, **kw):
        cr = func(*ar, **kw)
        next(cr)  # Python 3 spelling of cr.next()
        return cr
    return startcr

@coroutine
def collector(storage):
    """Act as "sink" and collect all sent in @storage"""
    while True:
        storage.append((yield))

@coroutine      
def state_machine(sink):
    """ .send() new parts to be tokenized by the state machine,
    tokens are passed on to @sink
    """ 
    s = ""
    state = STATE_CODE
    while True: 
        if state is STATE_CODE :
            if "//" in s :
                sink.send((TOKEN_COMMENT, s.split( "//" )[1] ))
                state = STATE_COMMENT
            else :
                sink.send(( TOKEN_CODE, s ))
        if state is STATE_COMMENT :
            if "//" in s :
                sink.send(( TOKEN_COMMENT, s.split( "//" )[1] ))
            else :
                state = STATE_CODE
                # re-evaluate same line
                continue
        s = (yield)

tokens = []
sm = state_machine(collector(tokens))
for piece in i:
    sm.send(piece)

The code above collects all tokens as tuples in tokens and I assume there is no difference between .append() and .add() in the original code.
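A simplified, runnable Python 3 sketch of the same coroutine plumbing: priming with next(), a collector sink, and one stage feeding it. line_tagger is a hypothetical, stateless stand-in that splits each line at '//'; it is not the state machine above.

```python
from functools import wraps


def coroutine(func):
    """Create the generator and advance it to its first yield (priming)."""
    @wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


@coroutine
def collector(storage):
    """Sink: append everything sent in to storage."""
    while True:
        storage.append((yield))


@coroutine
def line_tagger(sink):
    """Hypothetical stage: tag the code part and the comment part of each line."""
    while True:
        line = (yield)
        code, sep, comment = line.partition("//")
        if code.strip():
            sink.send(("CODE", code.strip()))
        if sep:
            sink.send(("COMMENT", comment.strip()))


tokens = []
pipeline = line_tagger(collector(tokens))
for line in ["x = 1 // set x", "// only a comment", "y = 2"]:
    pipeline.send(line)
print(tokens)
```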


Answer 6

The way I’ve done this is as follows…

condition = True
while condition:
     do_stuff()
     condition = (<something that evaluates to True or False>)

This seems to me to be the simplest solution; I’m surprised I haven’t seen it here already. This can obviously also be inverted to

while not condition:

etc.


Answer 7

For a do-while loop containing try statements:

loop = True
while loop:
    generic_stuff()
    try:
        questionable_stuff()
        # to break from successful completion:
        # loop = False
    except Exception:
        optional_stuff()
        # to break from unsuccessful completion -
        # the case referenced in the OP's question
        loop = False
    finally:
        more_generic_stuff()

Alternatively, when there’s no need for the ‘finally’ clause:

while True:
    generic_stuff()
    try:
        questionable_stuff()
        # to break from successful completion:
        # break
    except Exception:
        optional_stuff()
        # to break from unsuccessful completion -
        # the case referenced in the OP's question
        break

Answer 8

while condition is True: 
  stuff()
else:
  stuff()

Answer 9

Quick hack:

def dowhile(func = None, condition = None):
    if not func or not condition:
        return
    else:
        func()
        while condition():
            func()

Use like so:

>>> x = 10
>>> def f():
...     global x
...     x = x - 1
...
>>> def c():
...     global x
...     return x > 0
...
>>> dowhile(f, c)
>>> print(x)
0

Answer 10

Why don’t you just do this?

for s in l:
    print(s)
print("done")


Answer 11

See if this helps:

Set a flag inside the exception handler and check it before working on s.

s = None
flagBreak = False
while True:

    if flagBreak:
        break

    if s:
        print(s)
    try:
        s = next(i)
    except StopIteration:
        flagBreak = True

print("done")

Answer 12

If you’re in a scenario where you are looping while a resource is unavailable, or something similar that raises an exception, you could use something like:

import time

while True:
    try:
        f = open('some/path', 'r')
    except IOError:
        print('File could not be read. Retrying in 5 seconds')
        time.sleep(5)
    else:
        break
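A variant that gives up after a fixed number of attempts instead of retrying forever. retry is a hypothetical helper; exceptions defaults to OSError, of which IOError is an alias in Python 3, and the flaky() demo action is made up:

```python
import time


def retry(action, attempts=5, delay=5.0, exceptions=(OSError,)):
    """Call action() until it succeeds, at most `attempts` times.

    The last failure is re-raised once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # give up: propagate the last error
            print(f"Attempt {attempt} failed. Retrying in {delay} seconds")
            time.sleep(delay)


# Demo with a hypothetical action that fails twice before succeeding
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError("resource not ready")
    return "ok"


print(retry(flaky, delay=0))  # "ok" after two retries
```

To mirror the loop above, you would call something like f = retry(lambda: open('some/path', 'r')), where some/path is the same placeholder as in the answer.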

Answer 13

For me a typical while loop will be something like this:

xBool = True
yCount = 3  # a counter to force a condition (any integer value)

while xBool:
    if yCount > 0:  # set up the condition
        # (do something)
        yCount = yCount - 1
    else:
        # condition is not met: set xBool to False
        xBool = False

If the situation warrants it, I could also include a for loop inside the while loop, to iterate over another set of conditions.


How do I unload (reload) a module?

I have a long-running Python server and would like to be able to upgrade a service without restarting the server. What’s the best way to do this?

if foo.py has changed:
    unimport foo  <-- How do I do this?
    import foo
    myfoo = foo.Foo()

Answer 0

You can reload a module that has already been imported by using importlib.reload (Python 3.4+; in Python 2, reload is a built-in function):

from importlib import reload  
import foo

while True:
    # Do some things.
    if is_changed(foo):
        foo = reload(foo)

In Python 3, reload was moved to the imp module. In 3.4, imp was deprecated in favor of importlib, and reload was added to the latter. When targeting 3 or later, either reference the appropriate module when calling reload or import it.

I think that this is what you want. Web servers like Django’s development server use this so that you can see the effects of your code changes without restarting the server process itself.

To quote from the docs:

Python modules’ code is recompiled and the module-level code reexecuted, defining a new set of objects which are bound to names in the module’s dictionary. The init function of extension modules is not called a second time. As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero. The names in the module namespace are updated to point to any new or changed objects. Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.

As you noted in your question, you’ll have to reconstruct Foo objects if the Foo class resides in the foo module.
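A self-contained sketch of the reload cycle: it writes a throwaway module (the name hot_module is made up) to a temporary directory, imports it, rewrites the source and reloads it. One assumption worth stating: the new source has a different size, so a cached .pyc from the first import is not mistaken for up to date.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    """Import a throwaway module, edit its source, reload it, compare values."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hot_module.py"
        source.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("hot_module")
            before = module.VALUE
            # Different file size, so the cached .pyc is recompiled
            source.write_text("VALUE = 200\n")
            module = importlib.reload(module)
            after = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hot_module", None)
    return before, after


print(reload_demo())  # (1, 200)
```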


Answer 1

In Python 3.0–3.3 you would use: imp.reload(module)

The BDFL has answered this question.

However, imp was deprecated in 3.4, in favour of importlib (thanks @Stefan!).

Therefore, you would now use importlib.reload(module).


Answer 2

It can be especially difficult to delete a module if it is not pure Python.

Here is some information from: How do I really delete an imported module?

You can use sys.getrefcount() to find out the actual number of references.

>>> import sys, empty, os
>>> sys.getrefcount(sys)
9
>>> sys.getrefcount(os)
6
>>> sys.getrefcount(empty)
3

Numbers greater than 3 indicate that it will be hard to get rid of the module. The homegrown “empty” (containing nothing) module should be garbage collected after

>>> del sys.modules["empty"]
>>> del empty

as the third reference is an artifact of the getrefcount() function.
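A minimal sketch of what deleting the sys.modules entry does: the next import runs the module again and produces a new module object. It uses a throwaway empty module (the name empty_demo is made up) in a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_twice(tmp):
    """Import a new empty module, drop it from sys.modules, import it again."""
    Path(tmp, "empty_demo.py").write_text("")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        first = importlib.import_module("empty_demo")
        del sys.modules["empty_demo"]  # the import system forgets it
        second = importlib.import_module("empty_demo")
        return first, second
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("empty_demo", None)


with tempfile.TemporaryDirectory() as tmp:
    first, second = import_twice(tmp)
    print(first is second)  # False: the second import built a new module object
```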


Answer 3

reload(module), but only if it’s completely stand-alone. If anything else has a reference to the module (or any object belonging to the module), then you’ll get subtle and curious errors caused by the old code hanging around longer than you expected, and things like isinstance not working across different versions of the same code.

If you have one-way dependencies, you must also reload all modules that depend on the reloaded module to get rid of all the references to the old code. And then reload modules that depend on the reloaded modules, recursively.

If you have circular dependencies, which is very common for example when you are dealing with reloading a package, you must unload all the modules in the group in one go. You can’t do this with reload() because it will re-import each module before its dependencies have been refreshed, allowing old references to creep into new modules.

The only way to do it in this case is to hack sys.modules, which is kind of unsupported. You’d have to go through and delete each sys.modules entry you wanted to be reloaded on next import, and also delete entries whose values are None to deal with an implementation issue to do with caching failed relative imports. It’s not terribly nice but as long as you have a fully self-contained set of dependencies that doesn’t leave references outside its codebase, it’s workable.
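A sketch of the sys.modules purge described above. purge_package is a hypothetical helper; it also removes None entries under the package name, and it leaves untouched any references that other code still holds to the old modules. The demo uses fake module entries.

```python
import sys
import types


def purge_package(package):
    """Delete `package` and every `package.*` entry from sys.modules."""
    doomed = [name for name in list(sys.modules)
              if name == package or name.startswith(package + ".")]
    for name in doomed:
        del sys.modules[name]
    return doomed


# Demo with fake, hypothetical module entries
sys.modules["demo_pkg"] = types.ModuleType("demo_pkg")
sys.modules["demo_pkg.sub"] = types.ModuleType("demo_pkg.sub")
sys.modules["demo_pkg.missing"] = None
sys.modules["demo_pkg_other"] = types.ModuleType("demo_pkg_other")

print(sorted(purge_package("demo_pkg")))
# ['demo_pkg', 'demo_pkg.missing', 'demo_pkg.sub']
del sys.modules["demo_pkg_other"]  # tidy up the unrelated demo entry
```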

It’s probably best to restart the server. :-)


Answer 4

import sys

if 'myModule' in sys.modules:
    del sys.modules["myModule"]

Answer 5

For Python 2 use built-in function reload():

reload(module)

For Python 2 and 3.2–3.3 use reload from module imp:

import imp
imp.reload(module)

But imp has been deprecated since Python 3.4 in favor of importlib (and was removed entirely in Python 3.12), so use:

import importlib
importlib.reload(module)

or

from importlib import reload
reload(module)

Answer 6

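The following code gives you Python 2/3 compatibility:

try:
    reload  # Python 2: reload is a built-in
except NameError:
    try:
        from importlib import reload  # Python 3.4+
    except ImportError:
        from imp import reload  # Python 3.0-3.3

Then you can use reload() in both versions, which makes things simpler.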


Answer 7

The accepted answer doesn’t handle the from X import Y case. This code handles it and the standard import case as well:

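import importlib
import sys


def importOrReload(module_name, *names):
    if module_name in sys.modules:
        # Already imported: re-execute it (reload() is not a built-in on Python 3)
        importlib.reload(sys.modules[module_name])
    else:
        __import__(module_name, fromlist=names)

    # Rebind each requested name to its value in the (re)loaded module.
    # globals() here is the namespace of the module that defines this function.
    for name in names:
        globals()[name] = getattr(sys.modules[module_name], name)

# use instead of: from dfly_parser import parseMessages
importOrReload("dfly_parser", "parseMessages")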

In the reloading case, we reassign the top level names to the values stored in the newly reloaded module, which updates them.
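`globals()` inside `importOrReload` refers to the module where the function is *defined*. If you keep the function in a utility module, the names end up there rather than in your script. A hypothetical variant that takes the target namespace explicitly avoids this. `import_or_reload` is not a standard function; it is demonstrated here on the standard-library `colorsys` module:

```python
import importlib
import sys


def import_or_reload(namespace, module_name, *names):
    """Hypothetical variant of importOrReload that writes into `namespace`.

    Pass globals() from the calling module so the names land where you use them.
    """
    if module_name in sys.modules:
        importlib.reload(sys.modules[module_name])
    else:
        __import__(module_name, fromlist=names)
    module = sys.modules[module_name]
    for name in names:
        namespace[name] = getattr(module, name)
    return module


ns = {}
import_or_reload(ns, "colorsys", "rgb_to_hsv")
first = ns["rgb_to_hsv"]
import_or_reload(ns, "colorsys", "rgb_to_hsv")  # second call reloads
print(ns["rgb_to_hsv"] is first)  # False: rebound to the freshly defined function
```

In a script you would call it as `import_or_reload(globals(), "dfly_parser", "parseMessages")`.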


Answer 8

This is the modern way of reloading a module:

from importlib import reload

If you want to support versions of Python older than 3.4 (importlib.reload was added in 3.4), try this:

import sys

if sys.version_info[0] < 3:
    pass  # Python 2 has a built-in reload()
elif sys.version_info < (3, 4):
    from imp import reload  # Python 3.0 - 3.3
else:
    from importlib import reload  # Python 3.4+

To use it, run reload(MODULE), replacing MODULE with the module you want to reload.

For example, reload(math) will reload the math module.


Answer 9

If you are not running a server, but are developing and need to reload a module frequently, here’s a nice tip.

First, make sure you are using the excellent IPython shell, from the Jupyter Notebook project. After installing Jupyter, you can start it with ipython, or jupyter console, or even better, jupyter qtconsole, which will give you a nice colorized console with code completion in any OS.

Now in your shell, type:

%load_ext autoreload
%autoreload 2

Now, every time you run your script, your modules will be reloaded.

Besides 2, the autoreload magic has other options:

%autoreload
Reload all modules (except those excluded by %aimport) automatically now.

%autoreload 0
Disable automatic reloading.

%autoreload 1
Reload all modules imported with %aimport every time before executing the Python code typed.

%autoreload 2
Reload all modules (except those excluded by %aimport) every time before
executing the Python code typed.
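To have autoreload switched on every time IPython starts, the same two lines can go in an IPython profile configuration file. This is a sketch that assumes the default profile (`~/.ipython/profile_default/ipython_config.py`, which `ipython profile create` generates):

```python
# ~/.ipython/profile_default/ipython_config.py
c = get_config()  # provided by IPython when it loads this file

c.InteractiveShellApp.extensions = ["autoreload"]
c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
```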

Answer 10

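For those like me who want to reload all modules (when running in the Python interpreter under Emacs):

import importlib
import sys
for name, mod in list(sys.modules.items()):  # copy: reloading may add entries
    if mod is not None and name != "__main__":  # __main__ cannot be reloaded
        importlib.reload(mod)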

More information is in Reloading Python modules.
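Reloading every entry in sys.modules also reloads the standard library and the interpreter's own modules, which is rarely what you want. A hedged alternative is to reload only modules whose files live under your own project directory. The `reload_modules_under` helper and the `proj_mod` module below are hypothetical:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path


def reload_modules_under(root):
    """Reload every loaded module whose source file lives under `root`."""
    root = os.path.realpath(root)
    reloaded = []
    # Snapshot sys.modules: reloading may import new modules and resize it.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-ins, namespace packages, None placeholders
        if os.path.realpath(path).startswith(root + os.sep):
            importlib.reload(module)
            reloaded.append(name)
    return reloaded


# Demo: a one-file "project" in a temporary directory.
sys.dont_write_bytecode = True
project = tempfile.mkdtemp()
sys.path.insert(0, project)
Path(project, "proj_mod.py").write_text("X = 1\n")
importlib.invalidate_caches()
import proj_mod

Path(project, "proj_mod.py").write_text("X = 2\n")
result = reload_modules_under(project)
print(result)  # ['proj_mod']
print(proj_mod.X)  # 2
```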


Answer 11

Enthought Traits has a module that works fairly well for this. https://traits.readthedocs.org/en/4.3.0/_modules/traits/util/refresh.html

It will reload any module that has been changed, and update other modules and instance objects that are using it. Most of the time it does not work with __very_private__ methods, and it can choke on class inheritance, but it saves me crazy amounts of time I would otherwise spend restarting the host application when writing PyQt GUIs, or code that runs inside programs such as Maya or Nuke. It fails maybe 20 to 30% of the time, but it’s still incredibly helpful.

Enthought’s package doesn’t reload files the moment they change; you have to call it explicitly. That shouldn’t be all that hard to implement, though, if you really need it.
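As an illustration only, here is a standard-library sketch that reloads modules whose source file's modification time has changed. It is not the Traits API; `reload_changed` is a hypothetical helper. Calling it periodically, for example from a timer, gives a crude automatic reloader. Unlike the Traits module, it only calls `importlib.reload`, so existing instances and `from`-imported names are not updated.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path


def reload_changed(mtimes):
    """Reload loaded modules whose source file changed since the last call.

    `mtimes` maps module name -> last seen modification time; pass the same
    dict on every call.
    """
    reloaded = []
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path or not os.path.exists(path):
            continue
        mtime = os.path.getmtime(path)
        previous = mtimes.setdefault(name, mtime)  # first sighting: just record it
        if mtime != previous:
            mtimes[name] = mtime
            importlib.reload(module)
            reloaded.append(name)
    return reloaded


# Demo with a throwaway module.
sys.dont_write_bytecode = True
tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
src = Path(tmp_dir, "watched_mod.py")
src.write_text("GREETING = 'hi'\n")
importlib.invalidate_caches()
import watched_mod

seen = {}
reload_changed(seen)  # first pass only records modification times

src.write_text("GREETING = 'hello'\n")
stamp = os.path.getmtime(src) + 5
os.utime(src, (stamp, stamp))  # make the change visible even with coarse timestamps
changed = reload_changed(seen)
print(changed)
print(watched_mod.GREETING)  # hello
```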


Answer 12

For those who are using Python 3 and reload from importlib:

If the module does not seem to reload, that may be because it needs some time to recompile the .pyc file (up to 60 seconds). I’m writing this hint just so you know, in case you have run into this kind of problem.
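One mechanism that can make a reload appear to have no effect is offered here as a possible explanation, not a confirmed diagnosis of the case above. CPython validates a cached `.pyc` against the source file's size and its modification time, stored in whole seconds. An edit that keeps the file the same size within the same second therefore looks unchanged, and `reload` runs the old bytecode. The demonstration below pins the modification time with `os.utime` so the result is deterministic; the module name `stale_demo` is made up:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = False  # this demo needs the .pyc cache to be written

tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
src = Path(tmp_dir, "stale_demo.py")
STAMP = 1_600_000_000  # fixed modification time, in seconds since the epoch

src.write_text("VALUE = 1\n")
os.utime(src, (STAMP, STAMP))
importlib.invalidate_caches()
import stale_demo  # compiles the source and caches it in __pycache__

# Same size, same mtime: the cached bytecode still looks valid.
src.write_text("VALUE = 2\n")
os.utime(src, (STAMP, STAMP))
importlib.reload(stale_demo)
stale_value = stale_demo.VALUE
print(stale_value)  # 1 (stale)

# A different mtime invalidates the cache, so the source is recompiled.
os.utime(src, (STAMP + 1, STAMP + 1))
importlib.reload(stale_demo)
fresh_value = stale_demo.VALUE
print(fresh_value)  # 2
```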


Answer 13

2018-02-01

  1. Module foo must already have been imported successfully.
  2. Run from importlib import reload, then call reload(foo).

31.5. importlib — The implementation of import — Python 3.6.4 documentation
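Step 1 matters because `importlib.reload` only accepts a module that is currently registered in `sys.modules`; otherwise it raises `ImportError`. The short check below uses the standard-library `colorsys` module, chosen arbitrarily:

```python
import importlib
import sys
import types

import colorsys

# Works: colorsys was imported normally, so it is registered in sys.modules.
same = importlib.reload(colorsys)
print(same is colorsys)  # True

# Fails: this module object never went through the import system.
orphan = types.ModuleType("never_imported")
print("never_imported" in sys.modules)  # False
error = None
try:
    importlib.reload(orphan)
except ImportError as exc:
    error = exc
    print("reload refused:", exc)
```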


Answer 14

Another option: note that Python’s default importlib.reload only re-imports the library passed as an argument. It won’t reload the libraries that your library imports. If you changed a lot of files and have a somewhat complex package to import, you must do a deep reload.

If you have IPython or Jupyter installed, you can use a function to deep reload all libs:

from IPython.lib.deepreload import reload as dreload
dreload(foo)

If you don’t have Jupyter, install it with this command in your shell:

pip3 install jupyter
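Without IPython, the same idea can be approximated with the standard library. In the rough sketch below, the `reload_package` helper and the `deep_pkg` package (built in a temporary directory) are hypothetical. It reloads a package's already-imported submodules before the package itself. Unlike `deepreload`, it does not follow imports outside the package, and it does not order sibling submodules that import each other.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_package(package):
    """Reload all loaded submodules of `package`, deepest first, then the package.

    Reloading children before parents means `from .child import X` in a parent
    picks up the new child.
    """
    prefix = package.__name__ + "."
    names = [n for n in list(sys.modules)
             if n.startswith(prefix) and sys.modules[n] is not None]
    for name in sorted(names, key=lambda n: n.count("."), reverse=True):
        importlib.reload(sys.modules[name])
    return importlib.reload(package)


# Demo: deep_pkg/__init__.py re-exports a value from deep_pkg/child.py.
sys.dont_write_bytecode = True
root = tempfile.mkdtemp()
sys.path.insert(0, root)
pkg_dir = Path(root, "deep_pkg")
pkg_dir.mkdir()
Path(pkg_dir, "__init__.py").write_text("from .child import VALUE\n")
Path(pkg_dir, "child.py").write_text("VALUE = 1\n")
importlib.invalidate_caches()
import deep_pkg

Path(pkg_dir, "child.py").write_text("VALUE = 2\n")
importlib.reload(deep_pkg)
before = deep_pkg.VALUE
print(before)  # 1: the parent re-ran, but deep_pkg.child was not reloaded

reload_package(deep_pkg)
after = deep_pkg.VALUE
print(after)  # 2
```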

Answer 15

Edit (Answer V2)

The previous solution only gets part of the way: it updates more references than reload does, but not all of the ones that need updating. To actually update every reference, I had to go into the garbage collector and rewrite the references there. Now it works like a charm!

Note that this will not work if the GC is turned off, or if reloading data that’s not monitored by the GC. If you don’t want to mess with the GC, the original answer might be enough for you.

New code:

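import importlib
import inspect
import gc
from enum import EnumMeta
from weakref import ref

# Also copy the _readonly_attrs set from the original answer below;
# _get_tree_references_to_reset_recursively depends on it.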


def reset_module(module, inner_modules_also=True):
    """
    This function is a stronger form of importlib's `reload` function. What it does, is that aside from reloading a
    module, it goes to the old instance of the module, and sets all the (not read-only) attributes, functions and classes
    to be the reloaded-module's
    :param module: The module to reload (module reference, not the name)
    :param inner_modules_also: Whether to treat this module as a package as well, and reload all the modules within it.
    """

    # For the case when the module is actually a package
    if inner_modules_also:
        submods = {submod for _, submod in inspect.getmembers(module)
                   if (type(submod).__name__ == 'module') and (submod.__package__.startswith(module.__name__))}
        for submod in submods:
            reset_module(submod, True)

    # First, log all the references before reloading (because some references may be changed by the reload operation).
    module_tree = _get_tree_references_to_reset_recursively(module, module.__name__)

    new_module = importlib.reload(module)
    _reset_item_recursively(module, module_tree, new_module)


def _update_referrers(item, new_item):
    refs = gc.get_referrers(item)

    weak_ref_item = ref(item)
    for coll in refs:
        if type(coll) == dict:
            enumerator = coll.keys()
        elif type(coll) == list:
            enumerator = range(len(coll))
        else:
            continue

        for key in enumerator:

            if weak_ref_item() is None:
                # No refs are left in the GC
                return

            if coll[key] is weak_ref_item():
                coll[key] = new_item

def _get_tree_references_to_reset_recursively(item, module_name, grayed_out_item_ids = None):
    if grayed_out_item_ids is None:
        grayed_out_item_ids = set()

    item_tree = dict()
    attr_names = set(dir(item)) - _readonly_attrs
    for sub_item_name in attr_names:

        sub_item = getattr(item, sub_item_name)
        item_tree[sub_item_name] = [sub_item, None]

        try:
            # Will work for classes and functions defined in that module.
            mod_name = sub_item.__module__
        except AttributeError:
            mod_name = None

        # If this item was defined within this module, deep-reset
        if (mod_name is None) or (mod_name != module_name) or (id(sub_item) in grayed_out_item_ids) \
                or isinstance(sub_item, EnumMeta):
            continue

        grayed_out_item_ids.add(id(sub_item))
        item_tree[sub_item_name][1] = \
            _get_tree_references_to_reset_recursively(sub_item, module_name, grayed_out_item_ids)

    return item_tree


def _reset_item_recursively(item, item_subtree, new_item):

    # Set children first so we don't lose the current references.
    if item_subtree is not None:
        for sub_item_name, (sub_item, sub_item_tree) in item_subtree.items():

            try:
                new_sub_item = getattr(new_item, sub_item_name)
            except AttributeError:
                # The item doesn't exist in the reloaded module. Ignore.
                continue

            try:
                # Set the item
                _reset_item_recursively(sub_item, sub_item_tree, new_sub_item)
            except Exception as ex:
                pass

    _update_referrers(item, new_item)

Original Answer

As written in @bobince’s answer, if there’s already a reference to that module in another module (especially if it was imported with the as keyword like import numpy as np), that instance will not be overwritten.

This proved quite problematic for me when running tests that required the configuration modules to be in a “clean-slate” state, so I wrote a function named reset_module that uses importlib’s reload function and recursively overwrites all of the declared module’s attributes. It has been tested with Python 3.6.

import importlib
import inspect
from enum import EnumMeta

_readonly_attrs = {'__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__',
               '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__func__', '__ge__', '__get__',
               '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__',
               '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__',
               '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__',
               '__subclasshook__', '__weakref__', '__members__', '__mro__', '__itemsize__', '__isabstractmethod__',
               '__basicsize__', '__base__'}


def reset_module(module, inner_modules_also=True):
    """
    This function is a stronger form of importlib's `reload` function. What it does, is that aside from reloading a
    module, it goes to the old instance of the module, and sets all the (not read-only) attributes, functions and classes
    to be the reloaded-module's
    :param module: The module to reload (module reference, not the name)
    :param inner_modules_also: Whether to treat this module as a package as well, and reload all the modules within it.
    """

    new_module = importlib.reload(module)

    reset_items = set()

    # For the case when the module is actually a package
    if inner_modules_also:
        submods = {submod for _, submod in inspect.getmembers(module)
                   if (type(submod).__name__ == 'module') and (submod.__package__.startswith(module.__name__))}
        for submod in submods:
            reset_module(submod, True)

    _reset_item_recursively(module, new_module, module.__name__, reset_items)


def _reset_item_recursively(item, new_item, module_name, reset_items=None):
    if reset_items is None:
        reset_items = set()

    attr_names = set(dir(item)) - _readonly_attrs

    for sitem_name in attr_names:

        sitem = getattr(item, sitem_name)
        new_sitem = getattr(new_item, sitem_name)

        try:
            # Set the item
            setattr(item, sitem_name, new_sitem)

            try:
                # Will work for classes and functions defined in that module.
                mod_name = sitem.__module__
            except AttributeError:
                mod_name = None

            # If this item was defined within this module, deep-reset
            if (mod_name is None) or (mod_name != module_name) or (id(sitem) in reset_items) \
                    or isinstance(sitem, EnumMeta):  # Deal with enums
                continue

            reset_items.add(id(sitem))
            _reset_item_recursively(sitem, new_sitem, module_name, reset_items)
        except Exception as ex:
            raise Exception(sitem_name) from ex

Note: Use with care! Using these on non-peripheral modules (modules that define externally-used classes, for example) might lead to internal problems in Python (such as pickling/un-pickling issues).
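The underlying problem is easy to reproduce. `reload` re-executes `class` statements and so creates new class objects, while instances and any other references created earlier keep pointing at the old ones. Here is a minimal demonstration with a throwaway `shapes_demo.py` module created only for this example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True

tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
Path(tmp_dir, "shapes_demo.py").write_text("class Shape:\n    sides = 0\n")
importlib.invalidate_caches()
import shapes_demo

old_class = shapes_demo.Shape
obj = shapes_demo.Shape()  # instance created before the reload

importlib.reload(shapes_demo)  # the source did not even change

print(shapes_demo.Shape is old_class)  # False: reload created a new class
print(isinstance(obj, shapes_demo.Shape))  # False: obj still uses the old class
print(type(obj) is old_class)  # True
```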


Answer 16

For me, in Abaqus, this is what works. Suppose your file is Class_VerticesEdges.py:

import sys

sys.path.append(r'D:\...\My Pythons')
if 'Class_VerticesEdges' in sys.modules:
    del sys.modules['Class_VerticesEdges']
    print('old module Class_VerticesEdges deleted')
from Class_VerticesEdges import *
reload(sys.modules['Class_VerticesEdges'])  # built-in on Python 2; use importlib.reload on Python 3

Answer 17

I had a lot of trouble trying to reload something inside Sublime Text, but finally I wrote this utility to reload modules in Sublime Text, based on the code sublime_plugin.py uses to reload modules.

The code below lets you reload modules from paths with spaces in their names; after reloading, you can import them as you usually do.

def reload_module(full_module_name):
    """
        Assuming the folder `full_module_name` is a folder inside some
        folder on the python sys.path, for example, sys.path as `C:/`, and
        you are inside the folder `C:/Path With Spaces` on the file 
        `C:/Path With Spaces/main.py` and want to re-import some files on
        the folder `C:/Path With Spaces/tests`

        @param full_module_name   the relative full path to the module file
                                  you want to reload from a folder on the
                                  python `sys.path`
    """
    import imp
    import sys
    import importlib

    if full_module_name in sys.modules:
        module_object = sys.modules[full_module_name]
        module_object = imp.reload( module_object )

    else:
        importlib.import_module( full_module_name )

def run_tests():
    print( "\n\n" )
    reload_module( "Path With Spaces.tests.semantic_linefeed_unit_tests" )
    reload_module( "Path With Spaces.tests.semantic_linefeed_manual_tests" )

    from .tests import semantic_linefeed_unit_tests
    from .tests import semantic_linefeed_manual_tests

    semantic_linefeed_unit_tests.run_unit_tests()
    semantic_linefeed_manual_tests.run_manual_tests()

if __name__ == "__main__":
    run_tests()

The first time it runs, this loads the modules; if you later call run_tests() again, it reloads the test files. With Sublime Text (Python 3.3.6) this happens a lot, because its interpreter never closes (unless you restart Sublime Text, i.e., the Python 3.3 interpreter).


Answer 18

Another way could be to import the module in a function. This way when the function completes the module gets garbage collected.
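Note, though, that in CPython an import inside a function only makes the *name* local. The module object is still registered in `sys.modules`, so it stays alive after the function returns, and a later call reuses it without re-executing the source. Getting fresh code on each call would still require `importlib.reload` or deleting the `sys.modules` entry first. A quick check with the standard-library `colorsys` module:

```python
import sys


def use_colorsys():
    import colorsys  # binds a local name only
    return colorsys


first = use_colorsys()
print("colorsys" in sys.modules)  # True: the module is still cached after the call
second = use_colorsys()
print(first is second)  # True: the second import reused the cached module
```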