pip install mysql-python fails with EnvironmentError: mysql_config not found

Question: pip install mysql-python fails with EnvironmentError: mysql_config not found

This is the error I get

(mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install mysql-python
Downloading/unpacking mysql-python
  Downloading MySQL-python-1.2.3.tar.gz (70Kb): 70Kb downloaded
  Running setup.py egg_info for package mysql-python
    sh: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>
        metadata, options = get_config()
      File "setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "setup_posix.py", line 24, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found
    Complete output from command python setup.py egg_info:
    sh: mysql_config: not found

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/zjm1126/zjm_test/mysite/build/mysql-python/setup.py", line 15, in <module>
    metadata, options = get_config()
  File "setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "setup_posix.py", line 24, in mysql_config
    raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in /home/zjm1126/.pip/pip.log

What can I do to resolve this?


Answer 0

It seems mysql_config is missing on your system or the installer could not find it. Be sure mysql_config is really installed.

For example on Debian/Ubuntu you must install the package:

sudo apt-get install libmysqlclient-dev

Alternatively, mysql_config may not be on your PATH; this will be the case if you compiled the MySQL suite yourself.

Update: for recent versions of Debian/Ubuntu (as of 2018), the package is:

sudo apt install default-libmysqlclient-dev
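Before installing packages, it can help to confirm whether mysql_config is reachable at all. The sketch below is a hypothetical helper (find_mysql_config is not part of MySQL-python): it checks PATH with shutil.which, then a list of extra directories you pass in, such as /usr/local/mysql/bin as used in some of the macOS answers. Those extra directories are assumptions, not a definitive list.

```python
import os
import shutil
import subprocess
from typing import Iterable, Optional


def find_mysql_config(extra_dirs: Iterable[str] = (),
                      search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path to an executable mysql_config, or None if none is found.

    search_path=None means "use the current PATH", exactly as shutil.which does.
    extra_dirs are checked afterwards (assumed locations, e.g. a self-compiled MySQL).
    """
    found = shutil.which("mysql_config", path=search_path)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "mysql_config")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_mysql_config(extra_dirs=["/usr/local/mysql/bin", "/usr/local/bin"])
    if path is None:
        print("mysql_config not found: install the client dev package or extend PATH")
    else:
        # mysql_config --version should print the MySQL version it belongs to
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(path, result.stdout.strip())
```

If the helper only finds mysql_config in one of the extra directories, setup.py still will not see it until that directory is added to PATH (or configured in site.cfg).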

Answer 1

On Mac OS, I simply ran this in the terminal to fix it:

export PATH=$PATH:/usr/local/mysql/bin

This is the quickest fix I found. It only adds the directory to PATH for the current shell session, so if you plan to install MySQL-python in another environment you are better off adding it permanently (i.e. add it to /etc/paths).

(Tested on OS X Mountain Lion.)


Answer 2

apt-get install libmysqlclient-dev python-dev

Seemed to do the trick.


Answer 3

There may be various answers to the above issue; below is an aggregated solution.

For Ubuntu:

$ sudo apt update
$ sudo apt install python-dev
$ sudo apt install python-MySQLdb

For CentOS:

$ yum install python-devel mysql-devel

Answer 4

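If you are on a Mac, install MySQL globally:

brew install mysql

then export the path like this:

export PATH=$PATH:/usr/local/mysql/bin

Then install the package, globally or in your virtualenv, whichever you prefer:

pip install MySQL-Python

Note: to install globally for Python 3 (a Mac can have both Python 2 and 3), use pip3:

pip3 install MySQL-Python

Be aware that MySQL-python itself only supports Python 2; for Python 3 the usual replacement is its maintained fork, mysqlclient (pip3 install mysqlclient).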

Answer 5

You can use the MySQL Connector/Python

Installation via PyPI:

pip install mysql-connector-python

Further information can be found on the MySQL Connector/Python 1.0.5 beta announcement blog.

On Launchpad there's a good example of how to add, edit, or remove data with the library.
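For a rough idea of what adding, editing and removing data looks like, here is a hedged sketch (not the Launchpad example itself). It assumes a hypothetical table users(id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100)), and the helper names are made up. It relies only on standard DB-API calls (cursor(), execute(), commit(), close(), lastrowid, rowcount) with %s placeholders, which MySQL Connector/Python connections support.

```python
# Hypothetical schema assumed by this sketch:
#   CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))


def _execute(conn, sql, params):
    """Run one statement, commit it, and return the cursor's (lastrowid, rowcount)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)  # %s placeholders: the driver quotes the values
        conn.commit()
        return cur.lastrowid, cur.rowcount
    finally:
        cur.close()


def add_user(conn, name):
    """Insert a row and return its AUTO_INCREMENT id."""
    last_id, _ = _execute(conn, "INSERT INTO users (name) VALUES (%s)", (name,))
    return last_id


def rename_user(conn, user_id, new_name):
    """Update a row; return the number of affected rows."""
    _, count = _execute(conn, "UPDATE users SET name = %s WHERE id = %s", (new_name, user_id))
    return count


def delete_user(conn, user_id):
    """Delete a row; return the number of affected rows."""
    _, count = _execute(conn, "DELETE FROM users WHERE id = %s", (user_id,))
    return count
```

You would pass in a connection obtained with mysql.connector.connect(host=..., user=..., password=..., database=...) and call conn.close() when done. Passing values as execute() parameters, rather than formatting them into the SQL string, leaves quoting to the driver.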


Answer 6

For CentOS users:

yum install -y mysql-devel python-devel python-setuptools

then

pip install MySQL-python


If this doesn't work and gcc prints a compile error like:
_mysql.c:29:20: error: Python.h: No such file or directory

You need to specify the path of Python.h, like this:
pip install --global-option=build_ext --global-option="-I/usr/include/python2.6" MySQL-python
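Rather than hard-coding /usr/include/python2.6, you can ask the interpreter you are installing for where its headers are expected, using the standard-library sysconfig module. This is a sketch with hypothetical helper names; it builds the same pip command as above as an argument list, with one assumption of ours: it uses sys.executable -m pip so the command targets the same interpreter. If Python.h is missing from that directory, the python-devel (or python-dev) package for that interpreter is still not installed.

```python
import os
import sys
import sysconfig


def python_include_dir():
    """Directory where this interpreter expects its C headers (Python.h)."""
    return sysconfig.get_paths()["include"]


def has_python_h(include_dir=None):
    """True if Python.h exists in the given (or this interpreter's) include directory."""
    include_dir = include_dir or python_include_dir()
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


def pip_args_with_include(include_dir=None, package="MySQL-python"):
    """The pip command above, with -I pointing at this interpreter's headers."""
    include_dir = include_dir or python_include_dir()
    return [sys.executable, "-m", "pip", "install",
            "--global-option=build_ext",
            "--global-option=-I" + include_dir,
            package]


if __name__ == "__main__":
    inc = python_include_dir()
    print(inc, "has Python.h" if has_python_h(inc) else "is missing Python.h")
    print(" ".join(pip_args_with_include(inc)))
```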


Answer 7

I was trying to install mysql-python on an Amazon EC2 Linux instance and had to install these:

yum install mysql mysql-devel mysql-common mysql-libs gcc

But then I got this error:

_mysql.c:29:20: fatal error: Python.h: No such file or directory

So I installed:

yum install python-devel

And that did the trick.


Answer 8

For anyone that is using MariaDB instead of MySQL, the solution is to install the libmariadbclient-dev package and create a symbolic link to the config file with the correct name.

For example this worked for me:

ln -s /usr/bin/mariadb_config /usr/bin/mysql_config

Answer 9

Try sudo apt-get build-dep python-mysqldb


Answer 10

OS X Mavericks

Due to changes in OS X Mavericks and the Xcode development tools, you may get this error during installation:

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

Therefore use:

sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install mysql-python

Answer 11

For Linux

This works for me:

yum install python-devel mysql-devel

Answer 12

For MariaDB, install libmariadbclient-dev instead of libmysqlclient-dev:

sudo apt-get install libmariadbclient-dev

Answer 13

You should install the MySQL and Python development packages first:

yum install python-devel mysql-community-devel -y

Then you can install mysqlclient:

pip install mysqlclient

Answer 14

Sometimes the fix depends on the actual cause. We had a case where mysql-python had been installed through the python-mysqldb Debian package.

A developer who didn't know this accidentally ran pip uninstall mysql-python, and then failed to recover with pip install mysql-python, which gave the above error.

pip uninstall mysql-python had destroyed the Debian package contents, and of course pip install mysql-python failed, because the development files it needs had never been installed (the Debian package didn't need them).

The correct solution in that case was apt-get install --reinstall python-mysqldb, which restored mysql-python to its original state.


Answer 15

I had the same problem in the Terraform:light container. It is based on Alpine.

There you have to install mariadb-dev with:

apk add mariadb-dev

But that alone is not enough, because all the other dependencies are missing as well:

apk add python2 py2-pip gcc python2-dev musl-dev

Answer 16

The sequence to follow:

pip install mysqlclient
sudo apt-get install python3-dev libmysqlclient-dev
pip install configparser 
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py 

Then try to install MySQL-python again. That worked for me.


Answer 17

Had a similar issue trying to install on OS X Server 10.6.8. Here’s what I had to do. Using:

MySQL-python 1.2.4b4 (source), MySQL 5.6.19 (binary installer), Python 2.7 (binary installer). Note: installing in a virtualenv.

Unzip the source, open distribute_setup.py and edit DEFAULT_VERSION to use the latest version of the distribute tools, like so:

DEFAULT_VERSION = "0.6.49"

Save. Open the site.cfg file and uncomment the path to mysql_config so that it looks something like this (use your own path to mysql_config):

# The path to mysql_config.
# Only use this if mysql_config is not on your PATH, or you have some weird
# setup that requires it.
mysql_config = /usr/local/mysql/bin/mysql_config

Now clean, build and make will no longer fail with the "mysql_config not found" error. Hope this helps someone else trying to make use of their old Xserves :-)
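If you have several machines to patch, the site.cfg edit can be scripted. This is only a sketch: set_mysql_config_path is a hypothetical helper, and it assumes the file already contains a commented or uncommented "mysql_config = ..." line, as in the excerpt above. It rewrites that line in place and leaves the surrounding comments alone; if no such line exists it raises an error rather than guessing where a new line belongs.

```python
import re

# Matches "mysql_config = ..." or "#mysql_config = ..." at the start of a line,
# but not prose comments such as "# The path to mysql_config."
_MYSQL_CONFIG_LINE = re.compile(r"^#?[ \t]*mysql_config[ \t]*=.*$", re.MULTILINE)


def set_mysql_config_path(cfg_text, path):
    """Return cfg_text with the first mysql_config line set (and uncommented) to path."""
    if not _MYSQL_CONFIG_LINE.search(cfg_text):
        raise ValueError("no mysql_config line found; edit site.cfg by hand")
    # A function replacement keeps backslashes in path from being read as escapes.
    return _MYSQL_CONFIG_LINE.sub(lambda _m: "mysql_config = " + path, cfg_text, count=1)
```

To use it, read site.cfg, pass its text through the helper with your own mysql_config path, and write the result back before running the build.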


Answer 18

Your sudo PATH does not know about your local PATH, so go into superuser mode, add the path, and install it from there:

sudo su
export PATH=$PATH:/usr/local/mysql/bin/
pip install mysql-python
exit

And you’re up and running on OSX. Now you have an updated global python.


Answer 19

If you install MySQL-python in a virtualenv, check the pip version; if it is older than 9.0.1, update it:

pip install --upgrade pip
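To make the version check from Python itself, here is a small sketch. It assumes Python 3.8+ for importlib.metadata, and its version parsing is deliberately simple (numeric dot-separated parts only; a pre-release suffix such as "b1" is ignored), so it is not a substitute for a full version parser like packaging.version. The helper names are hypothetical.

```python
from importlib.metadata import version


def version_tuple(text):
    """'9.0.1' -> (9, 0, 1); trailing non-digits in a part are dropped ('10b1' -> 10)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def pip_needs_upgrade(installed=None, minimum="9.0.1"):
    """True if the installed pip (or the given version string) is older than minimum."""
    installed = installed or version("pip")
    return version_tuple(installed) < version_tuple(minimum)
```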

Answer 20

On macOS Mojave, mysql_config is found at /usr/local/bin/ rather than /usr/local/mysql/bin as mentioned above, so there is no need to add anything to PATH.


Fastest way to check if a value exists in a list

Question: Fastest way to check if a value exists in a list

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

I know that all values in the list are unique as in this example.

The first method I tried (3.8 sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I tried (2x faster: 1.9 sec in my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Method proposed by a Stack Overflow user (2.74 sec in my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, the first method takes 3.81 sec and the second method takes 1.88 sec. It’s a good improvement, but:

I'm a beginner with Python/scripting. Is there a faster way to do the same thing and save more processing time?

A more specific explanation of my application:

In the Blender API I can access a list of particles:

particles = [1, 2, 3, 4, etc.]

From there, I can access a particle’s location:

particles[x].location = [x,y,z]

And for each particle I test if a neighbour exists by searching each particle location like so:

if [x+1,y,z] in particles.location:
    # Find the identity of this neighbour particle in x: the particle's index in the array
    particles.index([x+1,y,z])

Answer 0

7 in a

Clearest and fastest way to do it.

You can also consider using a set, but constructing that set from your list may take more time than the faster membership testing will save. The only way to be certain is to benchmark (this also depends on which operations you need).
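One way to benchmark that trade-off is the standard timeit module. In the sketch below, compare_membership is a hypothetical helper: it times repeated membership tests on a list, a one-off set construction, and the same tests on the set. It uses the last element as the target, which is the worst case for a linear list scan; absolute numbers will vary by machine.

```python
import timeit


def compare_membership(n=100_000, lookups=100):
    """Time `lookups` membership tests on a list vs. on a set built from the same data."""
    data = list(range(n))
    target = n - 1  # last element: the worst case for the list's linear scan
    list_in = timeit.timeit(lambda: target in data, number=lookups)
    set_build = timeit.timeit(lambda: set(data), number=1)
    s = set(data)
    set_in = timeit.timeit(lambda: target in s, number=lookups)
    return {"list_in": list_in, "set_build": set_build, "set_in": set_in}


if __name__ == "__main__":
    t = compare_membership()
    print(t)
    # Building the set pays off once build time + set lookups beat the list lookups.
    print("set worth it:", t["set_build"] + t["set_in"] < t["list_in"])
```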


Answer 1

As stated by others, in can be very slow for large lists. Here are some performance comparisons for in, set and bisect. Note that the time (in seconds) is on a log scale.

[Figure: log(time) versus list size for the in, set and bisect methods]

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a, b, c):
    # Linear scan of list b for every x in a
    start_time = time.perf_counter()
    for i, x in enumerate(a):
        if x in b:
            c[i] = 1
    return time.perf_counter() - start_time

def method_set_in(a, b, c):
    # Build a set once, then each membership test is a hash lookup
    start_time = time.perf_counter()
    s = set(b)
    for i, x in enumerate(a):
        if x in s:
            c[i] = 1
    return time.perf_counter() - start_time

def method_bisect(a, b, c):
    # Sort b once (in place), then binary-search for each x
    start_time = time.perf_counter()
    b.sort()
    for i, x in enumerate(a):
        index = bisect.bisect_left(b, x)
        if index < len(b) and x == b[index]:
            c[i] = 1
    return time.perf_counter() - start_time

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = list(range(1000, 20000, 1000))
    for N in Nls:
        a = list(range(N))
        random.shuffle(a)
        b = list(range(N))
        random.shuffle(b)
        c = [0] * N

        time_method_in.append(math.log(method_in(a, b, c)))
        time_method_set_in.append(math.log(method_set_in(a, b, c)))
        time_method_bisect.append(math.log(method_bisect(a, b, c)))

    plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
    plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
    plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc='upper left')
    plt.show()

if __name__ == "__main__":
    profile()

Answer 2

You could put your items into a set. Set lookups are very efficient.

Try:

s = set(a)
if 7 in s:
    pass  # do stuff

Edit: In a comment you say that you'd like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.
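A minimal sketch of the pre-sort plus binary search idea, using the standard bisect module. The helper names are hypothetical. The sort is done once and keeps each value's original position, so a lookup returns the index in the original list (or -1 if the value is absent); it assumes the values are mutually comparable.

```python
import bisect


def build_sorted_index(values):
    """Sort once, remembering each value's original position. O(n log n)."""
    pairs = sorted((v, i) for i, v in enumerate(values))
    keys = [v for v, _ in pairs]
    positions = [i for _, i in pairs]
    return keys, positions


def find_index(keys, positions, target):
    """Binary-search for target; return its index in the original list, or -1. O(log n)."""
    j = bisect.bisect_left(keys, target)
    if j < len(keys) and keys[j] == target:
        return positions[j]
    return -1


if __name__ == "__main__":
    a = [4, 2, 3, 1, 5, 6]
    keys, positions = build_sorted_index(a)
    print(find_index(keys, positions, 5), find_index(keys, positions, 7))
```

If the list changes, the sorted copy has to be rebuilt (or maintained with bisect.insort) before further lookups.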


Answer 3

from collections.abc import Iterable

def check_availability(element, collection: Iterable) -> bool:
    return element in collection

Usage

check_availability('a', [1,2,3,4,'a','b','c'])

I believe this is the fastest way to know if a chosen value is in an array.


Answer 4

a = [4,2,3,1,5,6]

index = dict((y, x) for x, y in enumerate(a))  # value -> position, built once
try:
    a_index = index[7]
except KeyError:
    print("Not found")
else:
    print("found")

This will only be a good idea if a doesn’t change and thus we can do the dict() part once and then use it repeatedly. If a does change, please provide more detail on what you are doing.
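Applied to the neighbour search in the question, the same idea means building the dictionary once from the particle locations and then doing constant-time lookups. This sketch rests on assumptions: locations are plain 3-number sequences (extracting them from the Blender API is not shown), and they lie on an integer grid so exact equality works; floating-point positions would need rounding or a tolerance. Lists are not hashable, so each location is converted to a tuple. The helper names are hypothetical.

```python
def build_location_index(locations):
    """Map each location, as a hashable tuple, to its index in the list."""
    return {tuple(loc): i for i, loc in enumerate(locations)}


def neighbour_index(location_index, loc, offset=(1, 0, 0)):
    """Index of the particle at loc + offset, or None if there is no particle there."""
    key = tuple(c + d for c, d in zip(loc, offset))
    return location_index.get(key)


if __name__ == "__main__":
    locations = [[0, 0, 0], [1, 0, 0], [5, 5, 5]]
    index = build_location_index(locations)
    for i, loc in enumerate(locations):
        print(i, "->", neighbour_index(index, loc))
```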


Answer 5

The original question was:

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

Thus there are two things to find:

  1. is an item in the list, and
  2. what is the index (if in the list).

To this end, I modified @xslittlegrass's code to compute indexes in all cases, and added an additional method.

Results

[Figure: log(time) versus list size for the in, try, set, bisect and reverse methods]

Methods are:

  1. in: basically, if x in b: return b.index(x)
  2. try: try/except around b.index(x) (skips having to check if x in b)
  3. set: basically, if x in set(b): return b.index(x)
  4. bisect: sort b together with its indexes, then binary-search for x in sorted(b). (Note the modification from @xslittlegrass's version, which returned the index in the sorted b rather than in the original b.)
  5. reverse: build a reverse-lookup dictionary d for b; then d[x] gives the index of x.

Results show that method 5 is the fastest.

Interestingly, the try and set methods are equivalent in time.


Test Code

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    """Return a zero-argument function that calls func(*args, **kwargs), as timeit expects."""
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1
    return c

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(bsorted):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()

Answer 6

It sounds like your application might gain advantage from the use of a Bloom Filter data structure.

In short, a Bloom filter lookup can tell you very quickly if a value is DEFINITELY NOT present in a set. Otherwise, you can do a slower lookup to get the index of a value that POSSIBLY MIGHT BE in the list. So if your application tends to get the "not found" result much more often than the "found" result, you might see a speed-up by adding a Bloom filter.

For details, Wikipedia provides a good overview of how Bloom Filters work, and a web search for “python bloom filter library” will provide at least a couple useful implementations.
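To make the idea concrete, here is a small, unoptimised Bloom filter written from scratch with hashlib; it is an illustrative sketch, not one of the libraries a web search would turn up. It hashes repr(item), so it assumes items of a consistent type (1 and 1.0 compare equal but have different reprs). The bit-array size and number of hashes are arbitrary defaults; in practice you would size them from the expected item count and the false-positive rate you can tolerate. index_if_present is a hypothetical helper showing the "fast no, slow maybe" pattern described above.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=10_000, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # k bit positions from BLAKE2b digests of repr(item), one salt per hash.
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8,
                                     salt=seed.to_bytes(16, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def index_if_present(values, bloom, target):
    """Return target's index in values, or -1, skipping the scan when the filter says no."""
    if target not in bloom:
        return -1  # definitely absent: no O(n) scan needed
    try:
        return values.index(target)  # possibly present: confirm with the slow lookup
    except ValueError:
        return -1  # it was a false positive
```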


Answer 7

Be aware that the in operator tests not only equality (==) but also identity (is). The in logic for lists is roughly equivalent to the following (in CPython it is actually written in C, not Python):

def list_contains(s, target):
    for element in s:
        if element is target:
            # fast identity check: the same object always counts as a match
            return True
        if element == target:
            # slower check for actual equality
            return True
    return False

In most circumstances this detail is irrelevant, but in some it might surprise a Python novice. For example, numpy.nan has the unusual property of not being equal to itself:

>>> import numpy
>>> numpy.nan == numpy.nan
False
>>> numpy.nan is numpy.nan
True
>>> numpy.nan in [numpy.nan]
True

To distinguish between these unusual cases you could use any() like:

>>> lst = [numpy.nan, 1, 2]
>>> any(element == numpy.nan for element in lst)
False
>>> any(element is numpy.nan for element in lst)
True

Expressed with any(), the in logic for lists would be:

any(element is target or element == target for element in lst)

However, I should emphasize that this is an edge case; in the vast majority of cases the in operator is highly optimised and exactly what you want (with either a list or a set).
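The same edge case can be reproduced without NumPy, using the standard library's float("nan"). A small sketch; math.isnan() is one way to test for NaN by value rather than relying on identity:

```python
import math

nan = float("nan")
lst = [nan, 1, 2]

print(nan == nan)                        # False: NaN is not equal to itself
print(nan in lst)                        # True: the identity check matches first
print(float("nan") in lst)               # False: a different NaN object, and NaN != NaN
print(any(math.isnan(x) for x in lst))   # True: tests for "any NaN" by value
```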


Answer 8

Or use __contains__:

sequence.__contains__(value)

Demo:

>>> l=[1,2,3]
>>> l.__contains__(3)
True
>>> 
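For context, x in seq is the usual way to write this: the in operator itself calls the container's __contains__ method (falling back to iteration when a class does not define one). A sketch with a hypothetical EvenNumbers class shows the connection:

```python
class EvenNumbers:
    """Hypothetical container that 'contains' every even integer."""

    def __contains__(self, value):
        return isinstance(value, int) and value % 2 == 0


evens = EvenNumbers()
print(4 in evens)              # True: `in` dispatches to __contains__
print(evens.__contains__(3))   # False: calling the method directly
```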

Answer 9

@Winston Ewert’s solution yields a big speed-up for very large lists, but this stackoverflow answer indicates that the try:/except:/else: construct will be slowed down if the except branch is reached often. An alternative is to take advantage of the .get() method of dict:

a = [4,2,3,1,5,6]

index = dict((y, x) for x, y in enumerate(a))

b = index.get(7, None)
if b is not None:
    print("Found at index", b)  # do something with variable b

The .get(key, default) method is meant for the case where you can’t guarantee a key will be in the dict. If the key is present, it returns the value (as dict[key] would); when it is not, .get() returns your default value (here None). In this case you need to make sure the chosen default can never be a real value in the dict; None is safe here because every value is an integer index into a.
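One detail worth knowing: if a contains duplicates, the dict built above keeps the last index of each value, whereas a.index() returns the first. A small sketch that keeps the first position instead, using dict.setdefault (the extra duplicate 2 in a is added here purely for illustration):

```python
a = [4, 2, 3, 1, 5, 6, 2]   # note the duplicate 2

# setdefault only stores a key the first time it is seen,
# so each value maps to its FIRST position, matching a.index().
index = {}
for position, value in enumerate(a):
    index.setdefault(value, position)

print(index.get(2))   # 1 (same as a.index(2))
print(index.get(7))   # None: 7 is not in a
```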


Answer 10

This is not code but an algorithm for very fast searching (binary search).

If your list and the value you are looking for are all numbers, this is pretty straightforward. If they are strings, see the note at the bottom.

  • Let “n” be the length of your list.
  • Optional step: if you need the index of the element, add a second column to the list holding the current index of each element (0 to n-1); see later.
  • Sort your list or a copy of it (.sort() or sorted()).
  • Loop through:
    • Compare your number to the middle element (position n/2) of the current range
      • If larger, repeat on the upper half (n/2 to n)
      • If smaller, repeat on the lower half (0 to n/2)
      • If equal: you found it
  • Keep narrowing the range until you have found it or only 2 numbers remain (just below and just above the one you are looking for).
  • This finds any element of a 1,000,000-item list in at most 20 steps (log2(n), rounded up).

If you also need the original position of your number, look it up in the second (index) column.

If your list is not made of numbers, the method still works and is still fastest. Python already orders strings lexicographically, so you only need a custom comparison (key function) if you want a different ordering.

Of course, this needs the up-front investment of sorting (O(n log n)), but if you keep reusing the same list for checking, it may be worth it.
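The standard library's bisect module implements the halving loop described above. A minimal sketch, assuming you want the original position back: the “second column” is modelled as (value, original_index) pairs, and the helper names build_sorted_index and find are hypothetical.

```python
import bisect


def build_sorted_index(values):
    # Pair each value with its original position, then sort by value (the "second column").
    pairs = sorted((value, pos) for pos, value in enumerate(values))
    keys = [value for value, _ in pairs]   # plain values, so bisect compares like with like
    return pairs, keys


def find(pairs, keys, target):
    # Binary search: O(log n) comparisons on the sorted keys.
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return pairs[i][1]                 # original position of the match
    return None                            # not present


data = [42, 7, 19, 3, 88]
pairs, keys = build_sorted_index(data)
print(find(pairs, keys, 19))   # 2
print(find(pairs, keys, 5))    # None
```

The sort is paid once in build_sorted_index; every later find call only does the binary search, which is where reusing the list pays off. The same code works unchanged for lists of strings.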


Answer 11

Because the question is not always meant as “the fastest technical way”, I always suggest the way that is fastest to understand and write: a list comprehension one-liner

[i for i in list_from_which_to_search if i in list_to_search_in]

I had a list_to_search_in with all the items, and wanted to return the indexes of the items in the list_from_which_to_search.

This returns, in a new list, every item of list_from_which_to_search that also appears in list_to_search_in.

There are other ways to approach this problem, but list comprehensions are fast enough, and quick enough to write, to solve it.
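A small sketch of the same one-liner next to a variant that returns positions via enumerate. The sample lists are made up, and converting list_to_search_in to a set first is an optional tweak that makes each in test O(1) on average instead of a scan:

```python
list_to_search_in = ["a", "b", "c", "d"]
list_from_which_to_search = ["x", "b", "y", "d"]

# The one-liner: the matching items themselves
items = [i for i in list_from_which_to_search if i in list_to_search_in]

# Variant: positions of the matches, with a set for faster membership tests
lookup = set(list_to_search_in)
positions = [pos for pos, item in enumerate(list_from_which_to_search) if item in lookup]

print(items)      # ['b', 'd']
print(positions)  # [1, 3]
```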


Answer 12

For me it was 0.030 sec (real), 0.026 sec (user), and 0.004 sec (sys).

print("Started")
x = ["a", "b", "c", "d", "e", "f"]

i = 0

# Check x[i] before incrementing, so x[0] is tested and the loop
# never indexes past the end (no IndexError left to catch).
while i < len(x):
    if x[i] == "e":
        print("Found")
    i += 1
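If timing is the point, the standard timeit module gives more reliable numbers than timing a whole script. A minimal sketch comparing the manual scan with the built-in in operator; the function names are hypothetical and no timings are quoted, since they depend on the machine:

```python
import timeit

x = ["a", "b", "c", "d", "e", "f"]


def manual_scan(target="e"):
    # A while-loop scan like the one above, wrapped in a function.
    i = 0
    while i < len(x):
        if x[i] == target:
            return True
        i += 1
    return False


def builtin_in(target="e"):
    return target in x


print("manual scan:", timeit.timeit(manual_scan, number=100_000))
print("in operator:", timeit.timeit(builtin_in, number=100_000))
```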

Answer 13

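Code to check whether two elements exist in an array whose product equals k:

seen = set()
for i in arr1:
    # i must divide k, and its partner k // i must have been seen earlier
    if i != 0 and k % i == 0 and k // i in seen:
        print(i, k // i)
    seen.add(i)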
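A self-contained function version of this check, written as a sketch (the name has_pair_with_product is hypothetical). It stops at the first matching pair and also covers k == 0, where any zero paired with any other element works:

```python
def has_pair_with_product(arr1, k):
    """Return True if two elements at different positions in arr1 multiply to k."""
    if k == 0:
        # 0 * anything == 0, so we need a zero plus at least one other element.
        return 0 in arr1 and len(arr1) >= 2
    seen = set()
    for i in arr1:
        # Zero cannot divide a non-zero k; otherwise look for the partner k // i.
        if i != 0 and k % i == 0 and k // i in seen:
            return True
        seen.add(i)
    return False


print(has_pair_with_product([3, 5, 2, 8], 16))   # True (2 * 8)
print(has_pair_with_product([3, 5, 2, 8], 7))    # False
```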

Get the difference between two lists

Question: Get the difference between two lists

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with items from the first list which aren’t present in the second one. From the example I have to get:

temp3 = ['Three', 'Four']

Are there any fast ways to do this without loops and checking?


Answer 0

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1]) 

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you’ll need to use set([1, 2]).symmetric_difference(set([2, 3])).
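A quick Python 3 sketch of both operations. Since sets are unordered, the one-way result is sorted here only to make the printed output predictable:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

only_in_temp1 = set(temp1) - set(temp2)        # one-way difference
either_not_both = set([1, 2]) ^ set([2, 3])    # same as set([1, 2]).symmetric_difference(set([2, 3]))

print(sorted(only_in_temp1))   # ['Four', 'Three']
print(either_not_both)         # {1, 3}
```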


Answer 1

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000))
print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

Besides preserving order, the method I presented is also (slightly) faster than set subtraction, because it doesn’t need to construct an unnecessary set. The performance difference is more noticeable when the first list is considerably longer than the second and hashing is expensive. Here’s a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
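Wrapped up as a reusable function (ordered_difference is a hypothetical name), the same approach also keeps duplicates from the first list, which set subtraction would collapse into one:

```python
def ordered_difference(first, second):
    """Items of `first` that are not in `second`, keeping first's order and duplicates."""
    exclude = set(second)   # built once, so each membership test is O(1) on average
    return [x for x in first if x not in exclude]


print(ordered_difference(['One', 'Two', 'Three', 'Four', 'Three'], ['One', 'Two']))
# ['Three', 'Four', 'Three']
```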

Answer 2

temp3 = [item for item in temp1 if item not in temp2]

Answer 3

The difference between two lists (say list1 and list2) can be found using the following simple function.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

or

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

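Using the above function, the difference can be found with either diff(temp2, temp1) or diff(temp1, temp2). Both give a list containing 'Three' and 'Four' (for example ['Four', 'Three']; set order is arbitrary). You don’t have to worry about the order of the lists or which one is given first, but note that this is the symmetric difference: items that appear only in the second list are included too.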

Python doc reference
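To see where the symmetric difference and the question's one-way difference part ways, here is a sketch in which the second list has an extra item ('Five' is added purely for illustration):

```python
def diff(list1, list2):
    # Symmetric difference: items in exactly one of the two lists
    return list(set(list1) ^ set(list2))


a = ['One', 'Two', 'Three', 'Four']
b = ['One', 'Two', 'Five']

print(sorted(diff(a, b)))        # ['Five', 'Four', 'Three']
print(sorted(set(a) - set(b)))   # ['Four', 'Three']: only what a has and b lacks
```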


Answer 4

In case you want the difference recursively, I have written a package for python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

pip install deepdiff

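Example usage

(The outputs below come from an older DeepDiff release; newer versions use different key names, such as dictionary_item_added and new_value/old_value, so expect some differences with your installed version.)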

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named Tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Answer 5

This can be done with Python's XOR operator (^), which on sets computes the symmetric difference.

  • This removes the duplicates in each list.
  • The result contains the items of temp1 missing from temp2 as well as the items of temp2 missing from temp1.

set(temp1) ^ set(temp2)

Answer 6

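The simplest way:

use set().difference(set())

list_a = [1,2,3]
list_b = [2,3]
print(set(list_a).difference(set(list_b)))

The answer is {1} (shown as set([1]) in Python 2). Since difference() accepts any iterable, set(list_b) could also be passed simply as list_b.

It can be printed as a list:

print(list(set(list_a).difference(set(list_b))))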

Answer 7

If you are really looking into performance, then use numpy!

Here is the full notebook, as a gist on GitHub, comparing list, numpy and pandas approaches:

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

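The linked notebook is not reproduced here. As a minimal sketch of NumPy's own set routines for this question (assuming NumPy is installed): np.setdiff1d returns the sorted, de-duplicated values of the first array that are not in the second, while a boolean mask built with np.isin keeps the original order.

```python
import numpy as np

temp1 = np.array(['One', 'Two', 'Three', 'Four'])
temp2 = np.array(['One', 'Two'])

# Sorted, unique values of temp1 that are not in temp2
print(np.setdiff1d(temp1, temp2))         # ['Four' 'Three']

# Boolean mask: keeps temp1's original order (and any duplicates)
print(temp1[~np.isin(temp1, temp2)])      # ['Three' 'Four']
```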


Answer 8

I’ll toss this in, since none of the present solutions yields a tuple:

temp3 = tuple(set(temp1) - set(temp2))

alternatively:

#edited using @Mark Byers idea. If you accept this one as answer, just accept his instead.
s = set(temp2)  # build the set once instead of once per element
temp3 = tuple(x for x in temp1 if x not in s)

Like the other order-preserving answers in this direction, this version keeps the order of temp1 (the set-based one above does not).


Answer 9

I wanted something that would take two lists and do what the diff command does in bash. Since this question pops up first when you search for “python diff two lists” and is not very specific, I will post what I came up with.

Using SequenceMatcher from difflib, you can compare two lists like diff does. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some don’t handle duplicates. But this solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for true diff functionality, this is the way to go.

For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]

Answer 10

Try this:

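temp3 = set(temp1) - set(temp2)

Note that temp3 is then a set (unordered, without duplicates); wrap it in list(...) if you need a list.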

Answer 11

This could be even faster than Mark’s list comprehension (itertools.filterfalse is Python 3; Python 2 calls it itertools.ifilterfalse):

import itertools

list(itertools.filterfalse(set(temp2).__contains__, temp1))

Answer 12

Here’s a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what’s in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly, unlike the many set-based answers. For example, on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']

Answer 13

If list2 is known to be exactly the beginning of list1 (for example, both sorted and without duplicates, with list2 holding list1's smallest values), you could use a naive slice:

list1=[1,2,3,4,5]
list2=[1,2,3]

print(list1[len(list2):])

or the native set methods:

subset=set(list1).difference(list2)

print(subset)

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print("Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000))
print("Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000))

Naive solution: 0.0787101593292

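Native set solution: 0.998837615564

Note that with this init, temp2 (the even numbers 0 to 98) is not a prefix of temp1, so the slice returns a different, incorrect result; the timing only shows how cheap slicing is when the prefix assumption holds.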
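Because the slice silently gives a wrong answer when its assumption fails, a cheap guard can check the prefix first. A minimal sketch (diff_if_prefix is a hypothetical helper name):

```python
def diff_if_prefix(list1, list2):
    """Return the items of list1 after its leading list2; raise if list2 is not a prefix."""
    # Comparing one slice is O(len(list2)), far cheaper than building sets.
    if list1[:len(list2)] != list2:
        raise ValueError("list2 is not a prefix of list1; use a set-based difference instead")
    return list1[len(list2):]


print(diff_if_prefix([1, 2, 3, 4, 5], [1, 2, 3]))   # [4, 5]
```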


Answer 14

I am a little late to the game, but you can compare the performance of some of the code mentioned above with this; two of the fastest contenders are:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
    if z == 0:
        start = time.perf_counter()
        # second term fixed: set(y) - set(x), not set(y) - set(y)
        lists = (str(list(set(x)-set(y))+list(set(y)-set(x))))
        times = ("1 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 1:
        start = time.perf_counter()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 2:
        start = time.perf_counter()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 3:
        start = time.perf_counter()
        # filterfalse is lazy; list() forces the work so it is actually timed
        lists = (list(filterfalse(set(y).__contains__, x)))
        times = ("4 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 4:
        start = time.perf_counter()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 5:
        start = time.perf_counter()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.perf_counter() - start))
        return (lists,times)

    else:
        start = time.perf_counter()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.perf_counter() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

Answer 15

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

This assumes temp2 is a prefix of temp1, i.e. both lists start with the same strings; otherwise relative_to() raises a ValueError. See the docs for more details. Note that it is not particularly fast compared to set operations.


A straightforward implementation using itertools.zip_longest, which compares the lists position by position:

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']
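
Both approaches above assume the lists line up element by element. If they do not share a common prefix, a minimal order-preserving sketch (not part of the original answer; the name ordered_diff is hypothetical) keeps list order but uses sets for fast membership tests:

```python
def ordered_diff(a, b):
    """Items of a not in b, then items of b not in a, keeping original order."""
    set_a, set_b = set(a), set(b)  # O(1) average-time membership checks
    return [x for x in a if x not in set_b] + [y for y in b if y not in set_a]


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two", "Five"]
print(ordered_diff(temp1, temp2))  # ['Three', 'Four', 'Five']
```

Unlike the pathlib and zip_longest versions, this does not care where in the lists the shared items appear. The items must be hashable.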

Answer 16

This is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

Answer 17

If you run into TypeError: unhashable type: 'list' you need to turn lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

See also How to compare a list of lists/sets in python?
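
Here is a small worked example of the tuple conversion. The sample data is made up for illustration:

```python
list_of_lists1 = [[1, 2], [3, 4], [5, 6]]
list_of_lists2 = [[1, 2], [7, 8]]

# set(list_of_lists1) would raise TypeError: unhashable type: 'list',
# so each inner list is converted to a hashable tuple first
diff = set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))
print(sorted(diff))  # [(3, 4), (5, 6), (7, 8)]

# convert back to lists if the caller expects lists
diff_as_lists = [list(t) for t in sorted(diff)]
```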


Answer 18

Let’s say we have two lists:

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

We can see from the two lists above that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All the answers above find the solution; now, what's the most optimal?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit, we can compare them:

import timeit

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, list1, list2")

# timeit() returns the total time in seconds for all `number` runs
print('Using two for loops', t1.timeit(number=100000), 'seconds')
print('Using symmetric_difference', t2.timeit(number=100000), 'seconds')

which returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 seconds
Using symmetric_difference 0.11285737506113946 seconds
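
With five-element inputs, both versions take about the same time, which says little about larger data. The loop version scans the other list for every element (roughly O(n·m)), while the set version hashes each element once. Also note that sym_diff does not preserve order or duplicates.

The sketch below reruns the comparison on larger random lists. The list sizes are arbitrary assumptions, and timings depend on your machine, so no expected numbers are shown:

```python
import random
import timeit


def difference(list1, list2):
    # list membership tests scan the other list: roughly O(len(list1) * len(list2))
    return [i for i in list1 if i not in list2] + [j for j in list2 if j not in list1]


def sym_diff(list1, list2):
    # hashing each element once: roughly O(len(list1) + len(list2))
    return list(set(list1).symmetric_difference(set(list2)))


# hypothetical larger inputs (unique elements, so both functions agree as sets)
big1 = random.sample(range(100000), 1000)
big2 = random.sample(range(100000), 1000)

for func in (difference, sym_diff):
    seconds = timeit.timeit(lambda: func(big1, big2), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```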

Answer 19

Single-line version of arulmr's solution:

def diff(listA, listB):
    return set(listA) - set(listB) | set(listB) - set(listA)

Answer 20

If you want something more like a changeset, you could use Counter:

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to listb, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"
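
To check that the changeset really describes the edit, you can apply it back to the first list. The sketch below restates the answer's diff in compact form:

- Counter.update() adds the (possibly negative) counts.
- Unary + then drops entries that fell to zero or below.

```python
from collections import Counter


def diff(a, b):
    """Counter of changes needed to turn multiset a into multiset b."""
    ca, cb = Counter(a), Counter(b)
    changes = Counter(cb - ca)   # positive counts: items to add
    changes.subtract(ca - cb)    # negative counts: items to remove
    return changes


lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

changes = diff(lista, listb)

# apply the changeset to lista's counts
result = Counter(lista)
result.update(changes)
print(+result == Counter(listb))  # True
```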

Answer 21

We can calculate the union minus the intersection of the lists:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three']) 
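
Union minus intersection is exactly the symmetric difference, so Python's built-in ^ operator or the symmetric_difference() method gives the same set. Here is a quick check on the same data:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

union_minus_intersection = set(temp1 + temp2) - (set(temp1) & set(temp2))

# the same result via the symmetric-difference operator and method
assert union_minus_intersection == set(temp1) ^ set(temp2)
assert union_minus_intersection == set(temp1).symmetric_difference(temp2)
print(sorted(union_minus_intersection))  # ['Five', 'Four', 'Three']
```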

Answer 22

This can be solved with one line. Given two lists (temp1 and temp2), the question asks for their difference in a third list (temp3):

temp3 = list(set(temp1).difference(set(temp2)))

Answer 23

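Here is a simple way to find the difference between two lists (whatever their contents are, as long as the items are hashable). You get the result shown below:

>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> set(l1).symmetric_difference(set(l2))
{False, 'xvdbd', None, 12}

The original used the Python 2 sets module (from sets import Set), which was removed in Python 3. The built-in set works the same way. The order of elements in the output may vary.

Hope this helps.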


Answer 24

I prefer converting to sets and then using the difference() method. The full code is:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output:

>>> print(temp3)
['Three', 'Four']

It’s the easiest to understand. Moreover, if you work with large data in the future, converting to sets will remove duplicates (if duplicates are not required). Note that sets are unordered, so the order of the result may vary. Hope it helps ;-)
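
If the order of temp1 matters, a minimal alternative uses the set only for lookups and keeps the list order. This variation is a suggestion, not part of the original answer:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

exclude = set(temp2)  # fast membership tests
temp3 = [item for item in temp1 if item not in exclude]  # keeps temp1's order
print(temp3)  # ['Three', 'Four']

# if duplicates should also be dropped, dict.fromkeys keeps first occurrences in order
temp3_unique = list(dict.fromkeys(temp3))
```

dict.fromkeys preserves insertion order on Python 3.7+, which is what makes the de-duplication order-preserving.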


Answer 25

(list(set(a)-set(b))+list(set(b)-set(a)))

Answer 26

def diffList(list1, list2):  # returns the items of the longer list that are not in the shorter one
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

For example, if list1 = [10, 15, 20, 25, 30, 35, 40] and list2 = [25, 40, 35], the returned list will be output = [10, 20, 30, 15].


How to sort a list of objects based on an attribute of the objects?

Question: How to sort a list of objects based on an attribute of the objects?

I’ve got a list of Python objects that I’d like to sort by an attribute of the objects themselves. The list looks like:

>>> ut
[<Tag: 128>, <Tag: 2008>, <Tag: <>, <Tag: actionscript>, <Tag: addresses>,
 <Tag: aes>, <Tag: ajax> ...]

Each object has a count:

>>> ut[1].count
1L

I need to sort the list by number of counts descending.

I’ve seen several methods for this, but I’m looking for best practice in Python.


Answer 0

# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)

# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)

More on sorting by keys.
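
A key function can also return a tuple, which is a common way to add a tie-breaker. The sketch below uses a hypothetical Tag dataclass with name and count attributes as a stand-in for the question's objects. Negating the count only works for numeric attributes.

```python
from dataclasses import dataclass


@dataclass
class Tag:  # hypothetical stand-in for the question's Tag objects
    name: str
    count: int


ut = [Tag("ajax", 3), Tag("aes", 7), Tag("addresses", 3), Tag("128", 1)]

# highest count first; ties broken alphabetically by name
ut.sort(key=lambda t: (-t.count, t.name))
print([(t.name, t.count) for t in ut])
# [('aes', 7), ('addresses', 3), ('ajax', 3), ('128', 1)]
```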


Answer 1

A way that can be fastest, especially if your list has a lot of records, is to use operator.attrgetter("count"). However, this might run on a pre-operator version of Python, so it would be nice to have a fallback mechanism. You might want to do the following, then:

try:
    import operator
except ImportError:
    keyfun = lambda x: x.count  # use a lambda if no operator module
else:
    keyfun = operator.attrgetter("count")  # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True) # sort in-place
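
attrgetter also accepts several attribute names, in which case it returns a tuple. This gives a multi-level sort key without a lambda. The objects below are hypothetical SimpleNamespace stand-ins with count and name attributes:

```python
from operator import attrgetter
from types import SimpleNamespace

# hypothetical objects with "count" and "name" attributes
ut = [SimpleNamespace(name="b", count=2),
      SimpleNamespace(name="a", count=2),
      SimpleNamespace(name="c", count=5)]

# attrgetter("count", "name") returns (obj.count, obj.name) for each object
ut.sort(key=attrgetter("count", "name"), reverse=True)
print([(o.name, o.count) for o in ut])  # [('c', 5), ('b', 2), ('a', 2)]
```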

Answer 2

Readers should notice that the key= method:

ut.sort(key=lambda x: x.count, reverse=True)

is many times faster than adding rich comparison operators to the objects. I was surprised to read this (page 485 of “Python in a Nutshell”). You can confirm this by running tests on this little program:

#!/usr/bin/env python3
import random

class C:
    def __init__(self, count):
        self.count = count

    # Python 3 has no __cmp__/cmp(); sort() only needs __lt__
    def __lt__(self, other):
        return self.count < other.count

# the timings below are the author's original Python 2 (__cmp__) measurements
longList = [C(random.random()) for i in range(1000000)]  # about 6.1 secs
longList2 = longList[:]

longList.sort()  # about 52 - 6.1 = 46 secs
longList2.sort(key=lambda c: c.count)  # about 9 - 6.1 = 3 secs

My very minimal tests show the first sort is more than 10 times slower, but the book says it is only about 5 times slower in general. The reason, they say, is the highly optimized sort algorithm used in Python (timsort).

Still, it's very odd that .sort(lambda) is faster than plain old .sort(). I hope they fix that.


Answer 3

Object-oriented approach

It’s good practice to make object sorting logic, if applicable, a property of the class rather than repeating it everywhere the ordering is required.

This ensures consistency and removes the need for boilerplate code.

At a minimum, you should specify __eq__ and __lt__ operations for this to work. Then just use sorted(list_of_objects).

class Card(object):

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

hand = [Card(10, 'H'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand]  # [10, 2, 12, 13, 14]

hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted]  # [2, 10, 12, 13, 14]
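
If you also want >, <= and >= (for example for max() or direct comparisons), functools.total_ordering can derive them from __eq__ and __lt__. This is a minimal sketch built on the answer's Card class. The NotImplemented checks are an added convention, not part of the original.

```python
from functools import total_ordering


@total_ordering
class Card:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)

    def __lt__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        return self.rank < other.rank


hand = [Card(10, 'h'), Card(2, 'h'), Card(14, 'h')]
print([c.rank for c in sorted(hand, reverse=True)])  # [14, 10, 2]
print(max(hand).rank)  # 14, max() uses the derived __gt__
```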

Answer 4

from operator import attrgetter
ut.sort(key = attrgetter('count'), reverse = True)

Answer 5

It looks much like a list of Django ORM model instances.

Why not sort them on query like this:

ut = Tag.objects.order_by('-count')

Answer 6

Add rich comparison operators to the object class, then use sort() method of the list.
See rich comparison in python.


Update: Although this method would work, I think Triptych's solution is better suited to your case because it is much simpler.


Answer 7

If the attribute you want to sort by is a property, then you can avoid importing operator.attrgetter and use the property’s fget method instead.

For example, for a class Circle with a property radius we could sort a list of circles by radii as follows:

result = sorted(circles, key=Circle.radius.fget)

This is not the most well-known feature but often saves me a line with the import.
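
Here is a self-contained version of the example. The Circle class is hypothetical and exists only to have a radius property:

```python
class Circle:  # hypothetical class for illustration
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius


circles = [Circle(3), Circle(1), Circle(2)]

# Circle.radius is a property object; .fget is its underlying getter function
result = sorted(circles, key=Circle.radius.fget)
print([c.radius for c in result])  # [1, 2, 3]
```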


Change the data type of columns in Pandas

Question: Change the data type of columns in Pandas

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type for each column? Ideally I would like to do this in a dynamic way because there can be hundreds of columns and I don’t want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.


Answer 0

You have three main options for converting types in pandas:

  1. to_numeric() – provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

  2. astype() – convert (almost) any type to (almost) any other type (even if it’s not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).

  3. infer_objects() – a utility method to convert object columns holding Python objects to a pandas type if possible.

Read on for more detailed explanations and usage of each of these methods.


1. to_numeric()

The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric().

This function will try to change non-numeric objects (such as strings) into integers or floating point numbers as appropriate.

Basic usage

The input to to_numeric() is a Series or a single column of a DataFrame.

>>> s = pd.Series(["8", 6, "7.5", 3, "0.9"]) # mixed string and numeric values
>>> s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>> pd.to_numeric(s) # convert everything to float values
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64

As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:

# convert Series
my_series = pd.to_numeric(my_series)

# convert column "a" of a DataFrame
df["a"] = pd.to_numeric(df["a"])

You can also use it to convert multiple columns of a DataFrame via the apply() method:

# convert all columns of DataFrame
df = df.apply(pd.to_numeric) # convert all columns of DataFrame

# convert just columns "a" and "b"
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
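
Applied to the question's own data, this looks like the following. Columns built from a list of lists get the integer labels 0, 1 and 2, and only 1 and 2 hold numeric strings:

```python
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)  # column labels are 0, 1, 2; all hold strings

# columns 1 and 2 contain numeric strings, so convert just those
df[[1, 2]] = df[[1, 2]].apply(pd.to_numeric)
print(df.dtypes)
```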

As long as your values can all be converted, that’s probably all you need.

Error handling

But what if some values can’t be converted to a numeric type?

to_numeric() also takes an errors keyword argument that allows you to force non-numeric values to be NaN, or simply ignore columns containing these values.

Here’s an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

The default behaviour is to raise if it can’t convert a value. In this case, it can’t cope with the string ‘pandas’:

>>> pd.to_numeric(s) # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string

Rather than fail, we might want ‘pandas’ to be considered a missing/bad numeric value. We can coerce invalid values to NaN as follows using the errors keyword argument:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64

The third option for errors is just to ignore the operation if an invalid value is encountered:

>>> pd.to_numeric(s, errors='ignore')
# the original Series is returned untouched

This last option is particularly useful when you want to convert your entire DataFrame, but don’t know which of your columns can be converted reliably to a numeric type. In that case, just write:

df.apply(pd.to_numeric, errors='ignore')

The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.
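
Recent pandas releases (2.2 onwards, as far as I know) deprecate errors='ignore' in to_numeric() and suggest catching the exception yourself. Here is a minimal sketch of the same "convert what you can" behaviour. The helper name to_numeric_if_possible is hypothetical.

```python
import pandas as pd


def to_numeric_if_possible(col):
    """Convert a Series to numbers, or return it unchanged if any value fails."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a).apply(to_numeric_if_possible)
print(df.dtypes)  # column 0 stays text; columns 1 and 2 become floats
```

This also answers the question's "dynamic" requirement: no column types need to be listed up front.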

Downcasting

By default, conversion with to_numeric() will give you either an int64 or float64 dtype (or whatever integer width is native to your platform).

That’s usually what you want, but what if you wanted to save some memory and use a more compact dtype, like float32, or int8?

to_numeric() gives you the option to downcast to either ‘integer’, ‘signed’, ‘unsigned’, ‘float’. Here’s an example for a simple series s of integer type:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

Downcasting to ‘integer’ uses the smallest possible integer that can hold the values:

>>> pd.to_numeric(s, downcast='integer')
0    1
1    2
2   -7
dtype: int8

Downcasting to ‘float’ similarly picks a smaller than normal floating type:

>>> pd.to_numeric(s, downcast='float')
0    1.0
1    2.0
2   -7.0
dtype: float32

2. astype()

The astype() method enables you to be explicit about the dtype you want your DataFrame or Series to have. It’s very versatile in that you can try and go from one type to any other.

Basic usage

Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).

Call the method on the object you want to convert and astype() will try and convert it for you:

# convert all DataFrame columns to the int64 dtype
df = df.astype(int)

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

# convert Series to float16 type
s = s.astype(np.float16)

# convert Series to Python strings
s = s.astype(str)

# convert Series to categorical type - see docs for more details
s = s.astype('category')
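
With the question's data, the dict form converts just the numeric columns. Columns not named in the dict are left unchanged:

```python
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)  # integer column labels 0, 1, 2

# dict maps column label -> target dtype; column 0 is not listed, so it is untouched
df = df.astype({1: float, 2: float})
print(df.dtypes)
```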

Notice I said “try” – if astype() does not know how to convert a value in the Series or DataFrame, it will raise an error. For example if you have a NaN or inf value you’ll get an error trying to convert it to an integer.

As of pandas 0.20.0, this error can be suppressed by passing errors='ignore'. Your original object will be returned untouched.

Be careful

astype() is powerful, but it will sometimes convert values “incorrectly”. For example:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

These are small integers, so how about converting to an unsigned 8-bit type to save memory?

>>> s.astype(np.uint8)
0      1
1      2
2    249
dtype: uint8

The conversion worked, but the -7 was wrapped round to become 249 (i.e. 2^8 - 7)!

Trying to downcast using pd.to_numeric(s, downcast='unsigned') instead could help prevent this error.
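
The reason this helps is that to_numeric() only downcasts to an unsigned type when every value is non-negative. Otherwise it leaves the dtype alone instead of wrapping. A quick check:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, -7])

# the negative value blocks the unsigned downcast, so the dtype stays int64
print(pd.to_numeric(s, downcast='unsigned').dtype)  # int64

# with only non-negative values, the smallest unsigned type is chosen
print(pd.to_numeric(pd.Series([1, 2, 7]), downcast='unsigned').dtype)  # uint8
```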


3. infer_objects()

Version 0.21.0 of pandas introduced the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, here’s a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3','2','1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object

Using infer_objects(), you can change the type of column ‘a’ to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object

Column ‘b’ has been left alone since its values were strings, not integers. If you wanted to try and force the conversion of both columns to an integer type, you could use df.astype(int) instead.


Answer 1

How about this?

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df
Out[16]: 
  one  two three
0   a  1.2   4.2
1   b   70  0.03
2   x    5     0

df.dtypes
Out[17]: 
one      object
two      object
three    object

df[['two', 'three']] = df[['two', 'three']].astype(float)

df.dtypes
Out[19]: 
one       object
two      float64
three    float64

Answer 2

The code below changes the data type of the given columns.

df[['col.name1', 'col.name2']] = df[['col.name1', 'col.name2']].astype('data_type')  # list as many columns as needed

In place of 'data_type', give the data type you want, e.g. str, float, int.


Answer 3

When I’ve only needed to specify specific columns, and I want to be explicit, I’ve used (per the pandas docs):

dataframe = dataframe.astype({'col_name_1': 'int', 'col_name_2': 'float64'})  # add more columns as needed

So, using the original question, but providing column names to it …

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col_name_1', 'col_name_2', 'col_name_3'])
df = df.astype({'col_name_2':'float64', 'col_name_3':'float64'})

Answer 4

Here is a function that takes as its arguments a DataFrame and a list of columns and coerces all data in the columns to numbers.

# df is the DataFrame (modified in place), and column_list is a list of column names as strings (e.g. ["col1","col2","col3"])
# dependencies: pandas

def coerce_df_columns_to_numeric(df, column_list):
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

So, for your example:

import pandas as pd

def coerce_df_columns_to_numeric(df, column_list):
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1','col2','col3'])

coerce_df_columns_to_numeric(df, ['col2','col3'])

Answer 5

How about creating two dataframes, each with different data types for their columns, and then appending them together?

d1 = pd.DataFrame(columns=['float_column'], dtype=float)
# DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
d1 = pd.concat([d1, pd.DataFrame(columns=['string_column'], dtype=str)])

Results

In [8]: d1.dtypes
Out[8]: 
float_column     float64
string_column     object
dtype: object

After the dataframe is created, you can populate it with floating point variables in the 1st column, and strings (or any data type you desire) in the 2nd column.
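
Another way to get an empty, typed DataFrame without concatenating anything is to build each column from an empty Series of the wanted dtype. This is an alternative suggestion, not part of the original answer:

```python
import pandas as pd

# each column starts as an empty Series with the desired dtype
d1 = pd.DataFrame({
    'float_column': pd.Series(dtype='float64'),
    'string_column': pd.Series(dtype='object'),
})
print(d1.dtypes)
```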


Answer 6

pandas >= 1.0

Here’s a chart that summarises some of the most important conversions in pandas.

[Chart not reproduced: summary of the main type conversions in pandas]

Conversions to string are trivial (.astype(str)) and are not shown in the figure.

“Hard” versus “Soft” conversions

Note that “conversions” in this context could either refer to converting text data into their actual data type (hard conversion), or inferring more appropriate data types for data in object columns (soft conversion). To illustrate the difference, take a look at

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]}, dtype=object)
df.dtypes                                                                  

a    object
b    object
dtype: object

# Actually converts string to numeric - hard conversion
df.apply(pd.to_numeric).dtypes                                             

a    int64
b    int64
dtype: object

# Infers better data types for object data - soft conversion
df.infer_objects().dtypes                                                  

a    object  # no change
b     int64
dtype: object

# Same as infer_objects, but converts to equivalent ExtensionType
df.convert_dtypes().dtypes                                                     
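A hard conversion can fail on text that is not a number. Here is a sketch with made-up sample data. By default pd.to_numeric raises ValueError on such text. With errors="coerce", unparseable entries become NaN instead, which forces a float64 result.

```python
import pandas as pd

s = pd.Series(["1", "2", "x"], dtype=object)  # made-up sample data

# Default errors="raise": one bad entry makes the whole conversion fail.
try:
    pd.to_numeric(s)
except ValueError as exc:
    print("hard conversion failed:", exc)

# errors="coerce": "x" becomes NaN, so the column ends up as float64.
converted = pd.to_numeric(s, errors="coerce")
print(converted.dtype)
print(converted.tolist())
```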

Answer 7

I thought I had the same problem but actually I have a slight difference that makes the problem easier to solve. For others looking at this question it’s worth checking the format of your input list. In my case the numbers are initially floats not strings as in the question:

a = [['a', 1.2, 4.2], ['b', 70, 0.03], ['x', 5, 0]]

but by processing the list too much before creating the dataframe I lose the types and everything becomes a string.

Creating the data frame via a numpy array

df = pd.DataFrame(np.array(a))

df
Out[5]: 
   0    1     2
0  a  1.2   4.2
1  b   70  0.03
2  x    5     0

df[1].dtype
Out[7]: dtype('O')

gives the same data frame as in the question, where the entries in columns 1 and 2 are treated as strings. However, doing

df = pd.DataFrame(a)

df
Out[10]: 
   0     1     2
0  a   1.2  4.20
1  b  70.0  0.03
2  x   5.0  0.00

df[1].dtype
Out[11]: dtype('float64')

does actually give a data frame with the columns in the correct format.
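The cause is that a NumPy array holds a single dtype, so mixing strings and numbers turns every entry into a string. The sketch below reuses the list a from this answer. Its last step, which uses dtype=object followed by infer_objects(), is an extra option that is not part of the original answer.

```python
import numpy as np
import pandas as pd

a = [['a', 1.2, 4.2], ['b', 70, 0.03], ['x', 5, 0]]

# NumPy picks one dtype for the whole array: a Unicode string type here.
arr = np.array(a)
print(arr.dtype.kind)                # 'U'
print(pd.DataFrame(arr)[1].dtype)    # not numeric: the numbers are now strings

# Passing the list directly lets pandas infer a dtype per column.
print(pd.DataFrame(a)[1].dtype)      # float64

# If an array is unavoidable, dtype=object keeps the original Python objects,
# and infer_objects() can then recover the numeric columns.
fixed = pd.DataFrame(np.array(a, dtype=object)).infer_objects()
print(fixed[1].dtype)                # float64
```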


Answer 8

Starting with pandas 1.0.0, we have pandas.DataFrame.convert_dtypes. You can even control which types to convert!

In [40]: df = pd.DataFrame(
    ...:     {
    ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    ...:         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
    ...:         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
    ...:         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
    ...:         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    ...:         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
    ...:     }
    ...: )

In [41]: dff = df.copy()

In [42]: df 
Out[42]: 
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

In [43]: df.dtypes
Out[43]: 
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

In [44]: df = df.convert_dtypes()

In [45]: df.dtypes
Out[45]: 
a      Int32
b     string
c    boolean
d     string
e      Int64
f    float64
dtype: object

In [46]: dff = dff.convert_dtypes(convert_boolean=False)

In [47]: dff.dtypes
Out[47]: 
a      Int32
b     string
c     object
d     string
e      Int64
f    float64
dtype: object

How do I get the full path of the current file's directory?

Question: How do I get the full path of the current file's directory?

I want to get the current file’s directory path. I tried:

>>> os.path.abspath(__file__)
'C:\\python27\\test.py'

But how can I retrieve the directory’s path?

For example:

'C:\\python27\\'

Answer 0

Python 3

For the directory of the script being run:

import pathlib
pathlib.Path(__file__).parent.absolute()

For the current working directory:

import pathlib
pathlib.Path().absolute()

Python 2 and 3

For the directory of the script being run:

import os
os.path.dirname(os.path.abspath(__file__))

If you mean the current working directory:

import os
os.path.abspath(os.getcwd())

Note that __file__ has two underscores before and after file, not just one.

Also note that if you are running interactively or have loaded code from something other than a file (e.g. a database or online resource), __file__ may not be set since there is no notion of “current file”. The above answer assumes the most common scenario of running a Python script that is in a file.

References

  1. pathlib in the python documentation.
  2. os.path 2.7, os.path 3.8
  3. os.getcwd 2.7, os.getcwd 3.8
  4. what does the __file__ variable mean/do?
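As the answer notes, __file__ may not exist when you run code interactively. The sketch below falls back to the working directory in that case. The helper name script_dir is made up for this example.

```python
from pathlib import Path


def script_dir() -> Path:
    """Directory of the running script, or the working directory as a fallback.

    Hypothetical helper. __file__ is looked up in globals() because it is
    not defined in an interactive session such as a REPL.
    """
    file_name = globals().get("__file__")
    if file_name:
        # resolve() makes the path absolute and also follows symlinks.
        return Path(file_name).resolve().parent
    return Path.cwd()


print(script_dir())
```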

Answer 1

Using Path is the recommended way since Python 3:

from pathlib import Path
print("File      Path:", Path(__file__).absolute())
print("Directory Path:", Path().absolute())  # current working directory, not necessarily the file's folder

Documentation: pathlib

Note: in a Jupyter notebook, __file__ is not defined (or does not return the expected value), so you have to use Path().absolute(), which gives the current working directory, instead.


Answer 2

In Python 3.x I do:

from pathlib import Path

path = Path(__file__).parent.absolute()

Explanation:

  • Path(__file__) is the path to the current file.
  • .parent gives you the directory the file is in.
  • .absolute() gives you the full absolute path to it.

Using pathlib is the modern way to work with paths. If you need it as a string later for some reason, just do str(path).
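Here are the steps above on a made-up path, so the output does not depend on where the code runs; /home/user/project/test.py stands in for __file__. This is only a sketch. PurePosixPath manipulates strings without touching the file system, so .absolute() is left out; the sample path is already absolute anyway.

```python
import posixpath
from pathlib import PurePosixPath

# Made-up absolute path standing in for __file__.
fake_file = "/home/user/project/test.py"

p = PurePosixPath(fake_file)
print(p.parent)                      # /home/user/project  (.parent: containing directory)
print(p.parent.name)                 # project             (last component only)
print(posixpath.dirname(fake_file))  # /home/user/project  (os.path-style equivalent)

as_text = str(p.parent)              # plain string when an API needs one
```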


Answer 3

import os
print(os.path.dirname(__file__))

Answer 4

You can use the os and os.path modules as follows:

import os
os.chdir(os.path.dirname(os.getcwd()))

os.path.dirname(os.getcwd()) returns the parent of the current working directory, so this moves one level up without passing any file argument and without knowing the absolute path.


Answer 5

Try this:

import os
dir_path = os.path.dirname(os.path.realpath(__file__))
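realpath and abspath only give different results when a symbolic link is involved. The sketch below shows the difference. It assumes a POSIX system where the code can create a symlink inside a temporary directory, and the file name script.py is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "real")
    link_dir = os.path.join(tmp, "link")
    os.mkdir(real_dir)
    os.symlink(real_dir, link_dir)  # on Windows this may need extra privileges

    # Hypothetical script location reached through the symlink.
    script = os.path.join(link_dir, "script.py")

    # abspath only normalizes the string, so the symlink stays in the result.
    via_abspath = os.path.dirname(os.path.abspath(script))
    # realpath follows the symlink to the directory it points at.
    via_realpath = os.path.dirname(os.path.realpath(script))
    expected_real = os.path.realpath(real_dir)

print(via_abspath)
print(via_realpath)
```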

Answer 6

I found that the following commands all return the full path of the directory containing a Python 3.6 script, provided the script is run from that directory (dir1 and dir2 are based on the current working directory, dir3 on the script's own location).

Python 3.6 Script:

#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-

from pathlib import Path

# Get the absolute path of a Python 3.6 script's directory
dir1 = Path().resolve()  # Working directory made absolute, resolving any symlinks
dir2 = Path().absolute()  # Working directory made absolute (see @RonKalian's answer)
dir3 = Path(__file__).parent.absolute()  # Directory of this script (see @Arminius's answer)

print(f'dir1={dir1}\ndir2={dir2}\ndir3={dir3}')

Explanation links: .resolve(), .absolute(), Path(__file__).parent.absolute()


Answer 7

System: MacOS

Version: Python 3.6 w/ Anaconda

import os

rootpath = os.getcwd()

os.chdir(rootpath)

Answer 8

Useful path properties in Python:

from pathlib import Path

# Returns the absolute path of the current working directory
# (the folder the script was started from, not necessarily where the file is stored)
mypath = Path().absolute()
print('Absolute path : {}'.format(mypath))

# If you want to go to any other file inside the subdirectories of the directory obtained above
filePath = mypath/'data'/'fuel_econ.csv'
print('File path : {}'.format(filePath))

# To check if the file is present in that directory or not
isfileExist = filePath.exists()
print('isfileExist : {}'.format(isfileExist))

# To check if the path is a directory or a file
isadirectory = filePath.is_dir()
print('isadirectory : {}'.format(isadirectory))

# To get the extension of the file
fileExtension = filePath.suffix
print('File extension : {}'.format(fileExtension))

Output (here the absolute path is the folder where the Python file is placed, because the code was run from that folder):

Absolute path : D:\Study\Machine Learning\Jupitor Notebook\JupytorNotebookTest2\Udacity_Scripts\Matplotlib and seaborn Part2

File path : D:\Study\Machine Learning\Jupitor Notebook\JupytorNotebookTest2\Udacity_Scripts\Matplotlib and seaborn Part2\data\fuel_econ.csv

isfileExist : True

isadirectory : False

File extension : .csv


Answer 9

If you just want to see the current working directory:

import os
print(os.getcwd())

If you want to change the current working directory:

os.chdir(path)

path is a string containing the directory you want to move to, e.g.

path = "C:\\Users\\xyz\\Desktop\\move here"
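Changing the working directory affects the whole process, so it is often worth restoring it afterwards. Below is a minimal sketch with a made-up helper, run_in_directory. Python 3.11 and later also ship contextlib.chdir for the same purpose.

```python
import os
import tempfile


def run_in_directory(path, func):
    """Call func() with path as the working directory, then restore the old one.

    Hypothetical helper: try/finally puts the original directory back
    even if func() raises.
    """
    original = os.getcwd()
    os.chdir(path)
    try:
        return func()
    finally:
        os.chdir(original)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    inside = run_in_directory(tmp, os.getcwd)

print(inside)                 # the temporary directory
print(os.getcwd() == start)   # True: the original directory is back
```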

Answer 10

IPython has a magic command %pwd to get the present working directory. It can be used in the following way:

from IPython.terminal.embed import InteractiveShellEmbed

ip_shell = InteractiveShellEmbed()

present_working_directory = ip_shell.run_line_magic("pwd", "")

In a Jupyter notebook (IPython kernel), %pwd can be used directly as follows:

present_working_directory = %pwd

Answer 11

To keep paths consistent across platforms (macOS/Windows/Linux), try:

import os
path = os.getcwd().replace('\\', '/')
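pathlib can do the same normalisation with as_posix(), which returns the path as a string with forward slashes. The sketch uses a made-up Windows path, which PureWindowsPath parses with Windows rules on any OS. One difference: on POSIX systems a backslash is a legal character in file names. Path.cwd().as_posix() leaves such a backslash alone, whereas the replace() call above would rewrite it.

```python
from pathlib import Path, PureWindowsPath

# Made-up Windows path; PureWindowsPath applies Windows rules on any OS.
win = PureWindowsPath(r"C:\Users\xyz\Desktop")
print(win.as_posix())          # C:/Users/xyz/Desktop

# The same call on the real working directory of this process.
print(Path.cwd().as_posix())
```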

Answer 12

I made a function to use when running Python under IIS (as CGI) to get the name of the current folder:

import os

def getLocalFolder():
    # Split the script's directory on Windows backslashes and keep the last part
    path = str(os.path.dirname(os.path.abspath(__file__))).split('\\')
    return path[-1]
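Splitting on '\\' only works with Windows-style paths. A more portable sketch lets os.path do the splitting; folder_name is a hypothetical helper. The pure-path lines show the same idea without touching the file system.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def folder_name(file_path):
    """Name of the directory that contains file_path (hypothetical helper)."""
    return os.path.basename(os.path.dirname(os.path.abspath(file_path)))


# Inside a script you would call folder_name(__file__).
print(folder_name("/srv/www/cgi-bin/script.py"))                     # cgi-bin

# Pure paths: same idea, no file system access, either path flavour on any OS.
print(PureWindowsPath(r"C:\inetpub\cgi-bin\script.py").parent.name)  # cgi-bin
print(PurePosixPath("/srv/www/cgi-bin/script.py").parent.name)       # cgi-bin
```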

Answer 13

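Let’s assume you have the following directory structure:

main/fold1, main/fold2, main/fold3, …

import glob
import os

folders = glob.glob("main/fold*")

for fold in folders:
    # Absolute path of the folder that contains each match (i.e. main)
    abspath = os.path.dirname(os.path.abspath(fold))
    # sch is a name to append; it is not defined in the original answer
    fullpath = os.path.join(abspath, sch)
    print(fullpath)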

Answer 14

## Import modules
import os

## Calculate the file path variable
filepath = os.path.abspath('')  ## similar to os.getcwd()
## Test whether this is always equivalent to os.getcwd()
## or differs in some circumstances

Emulate a do-while loop in Python?

Question: Emulate a do-while loop in Python?

I need to emulate a do-while loop in a Python program. Unfortunately, the following straightforward code does not work:

list_of_ints = [ 1, 2, 3 ]
iterator = list_of_ints.__iter__()
element = None

while True:
  if element:
    print element

  try:
    element = iterator.next()
  except StopIteration:
    break

print "done"

Instead of “1,2,3,done”, it prints the following output:

[stdout:]1
[stdout:]2
[stdout:]3
None['Traceback (most recent call last):
', '  File "test_python.py", line 8, in <module>
    s = i.next()
', 'StopIteration
']

What can I do in order to catch the ‘stop iteration’ exception and break a while loop properly?

An example of why such a thing may be needed is shown below as pseudocode.

State machine:

s = ""
while True :
  if state is STATE_CODE :
    if "//" in s :
      tokens.add( TOKEN_COMMENT, s.split( "//" )[1] )
      state = STATE_COMMENT
    else :
      tokens.add( TOKEN_CODE, s )
  if state is STATE_COMMENT :
    if "//" in s :
      tokens.append( TOKEN_COMMENT, s.split( "//" )[1] )
    else :
      state = STATE_CODE
      # Re-evaluate same line
      continue
  try :
    s = i.next()
  except StopIteration :
    break

Answer 0

I am not sure what you are trying to do. You can implement a do-while loop like this:

while True:
  stuff()
  if fail_condition:
    break

Or:

stuff()
while not fail_condition:
  stuff()

Why are you trying to use a do-while loop to print the items in the list? Why not just use:

for i in l:
  print i
print "done"

Update:

So do you have a list of lines? And you want to keep iterating through it? How about:

for s in l: 
  while True: 
    stuff() 
    # use a "break" instead of s = i.next()

Does that seem like something close to what you would want? With your code example, it would be:

for s in some_list:
  while True:
    if state is STATE_CODE:
      if "//" in s:
        tokens.add( TOKEN_COMMENT, s.split( "//" )[1] )
        state = STATE_COMMENT
      else :
        tokens.add( TOKEN_CODE, s )
        break # get next s
    if state is STATE_COMMENT:
      if "//" in s:
        tokens.append( TOKEN_COMMENT, s.split( "//" )[1] )
        break # get next s
      else:
        state = STATE_CODE
        # re-evaluate same line
        # continues automatically

Answer 1

Here’s a very simple way to emulate a do-while loop:

condition = True
while condition:
    # loop body here
    condition = test_loop_condition()
# end of loop

The key features of a do-while loop are that the loop body always executes at least once, and that the condition is evaluated at the bottom of the loop body. The control structure shown here accomplishes both of these with no need for exceptions or break statements. It does introduce one extra Boolean variable.
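Here is the flag pattern applied to a small task: a sketch that consumes values until a sentinel appears. The function and its arguments are made up for illustration. It assumes the sentinel is actually present; otherwise next() raises StopIteration.

```python
def read_until_sentinel(values, sentinel):
    """Collect items up to (not including) sentinel, do-while style.

    Hypothetical example; assumes sentinel occurs in values.
    """
    it = iter(values)
    seen = []
    condition = True
    while condition:
        item = next(it)               # loop body: always runs at least once
        if item != sentinel:
            seen.append(item)
        condition = item != sentinel  # test evaluated at the bottom
    return seen


print(read_until_sentinel([3, 1, 0, 5], 0))  # [3, 1]
print(read_until_sentinel([0], 0))           # []
```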


Answer 2

My code below might be a useful implementation, highlighting the main difference between do-while and while as I understand it.

So in this one case, you always go through the loop at least once.

first_pass = True
while first_pass or condition:
    first_pass = False
    do_stuff()

Answer 3

An exception will break the loop, so you might as well handle it outside the loop.

try:
  while True:
    if s:
      print s
    s = i.next()
except StopIteration:   
  pass

I guess that the problem with your code is that the behaviour of break inside except is not defined. Generally break goes only one level up, so e.g. break inside try goes directly to finally (if it exists) and out of the try, but not out of the loop.

Related PEP: http://www.python.org/dev/peps/pep-3136
Related question: Breaking out of nested loops
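For comparison, here is a quick check of how break behaves in current Python 3; both helpers are made up for this test. A break inside an except block leaves the enclosing loop. A break inside try runs the finally clause first and then also leaves the loop.

```python
def first_failure(values):
    """Index of the first value int() rejects, or None."""
    index = None
    for i, value in enumerate(values):
        try:
            int(value)
        except ValueError:
            index = i
            break          # leaves the for loop, not just the except block
    return index


def break_in_try_with_finally():
    """Record the order in which try, finally and the code after the loop run."""
    log = []
    while True:
        try:
            log.append("try")
            break          # finally runs first, then the loop is exited
        finally:
            log.append("finally")
    log.append("after loop")
    return log


print(first_failure(["1", "2", "x", "y"]))  # 2
print(break_in_try_with_finally())          # ['try', 'finally', 'after loop']
```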


Answer 4

do {
  stuff()
} while (condition())

->

while True:
  stuff()
  if not condition():
    break

You can write a function:

def do_while(stuff, condition):
  while condition(stuff()):
    pass

But 1) it’s ugly, and 2) condition has to be a function with one parameter, which receives the return value of stuff() (that is the only reason not to use the classic while loop).
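The usage sketch below shows how the return value of stuff feeds condition; the stack-popping task is made up. Because the body always runs first, calling it on an empty stack would raise IndexError, as a real do-while would.

```python
def do_while(stuff, condition):
    # stuff() always runs first; its return value is what condition() tests.
    while condition(stuff()):
        pass


# Hypothetical usage: pop items until the stack is empty.
stack = [3, 2, 1]
popped = []


def pop_one():
    popped.append(stack.pop())
    return len(stack)          # handed to condition()


do_while(pop_one, lambda remaining: remaining > 0)
print(popped)   # [1, 2, 3]
```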


Answer 5

Here is a crazier solution of a different pattern — using coroutines. The code is still very similar, but with one important difference; there are no exit conditions at all! The coroutine (chain of coroutines really) just stops when you stop feeding it with data.

def coroutine(func):
    """Coroutine decorator

    Coroutines must be started, advanced to their first "yield" point,
    and this decorator does this automatically.
    """
    def startcr(*ar, **kw):
        cr = func(*ar, **kw)
        next(cr)
        return cr
    return startcr

@coroutine
def collector(storage):
    """Act as "sink" and collect all sent in @storage"""
    while True:
        storage.append((yield))

@coroutine      
def state_machine(sink):
    """ .send() new parts to be tokenized by the state machine,
    tokens are passed on to @sink
    """ 
    s = ""
    state = STATE_CODE
    while True: 
        if state is STATE_CODE :
            if "//" in s :
                sink.send((TOKEN_COMMENT, s.split( "//" )[1] ))
                state = STATE_COMMENT
            else :
                sink.send(( TOKEN_CODE, s ))
        if state is STATE_COMMENT :
            if "//" in s :
                sink.send(( TOKEN_COMMENT, s.split( "//" )[1] ))
            else :
                state = STATE_CODE
                # re-evaluate same line
                continue
        s = (yield)

tokens = []
sm = state_machine(collector(tokens))
for piece in i:
    sm.send(piece)

The code above collects all tokens as tuples in tokens and I assume there is no difference between .append() and .add() in the original code.
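The sketch below is a self-contained, Python 3 version of the same pipeline idea. It repeats the priming decorator and the collector, and adds a hypothetical upper_stage that forwards each piece upper-cased to the collector. As in the answer, nothing checks for an exit condition; the chain simply stops when no more data is sent.

```python
def coroutine(func):
    """Start a generator-based coroutine by advancing it to its first yield."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


@coroutine
def collector(storage):
    """Sink: append everything that is sent in to storage."""
    while True:
        storage.append((yield))


@coroutine
def upper_stage(sink):
    """Hypothetical middle stage: forward each piece upper-cased to sink."""
    while True:
        piece = (yield)
        sink.send(piece.upper())


out = []
pipeline = upper_stage(collector(out))
for piece in ["a", "b"]:
    pipeline.send(piece)

print(out)  # ['A', 'B']
```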


Answer 6

The way I’ve done this is as follows…

condition = True
while condition:
     do_stuff()
     condition = (<something that evaluates to True or False>)

This seems to me to be the simplest solution; I’m surprised I haven’t seen it here already. This can obviously also be inverted to

while not condition:

etc.


Answer 7

For a do-while loop containing try statements:

loop = True
while loop:
    generic_stuff()
    try:
        questionable_stuff()
        # to break from successful completion:
        # loop = False
    except Exception:
        optional_stuff()
        # to break from unsuccessful completion -
        # the case referenced in the OP's question
        loop = False
    finally:
        more_generic_stuff()

Alternatively, when there’s no need for the finally clause:

while True:
    generic_stuff()
    try:
        questionable_stuff()
        # to break from successful completion:
        # break
    except Exception:
        optional_stuff()
        # to break from unsuccessful completion -
        # the case referenced in the OP's question
        break

Answer 8

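# Note: the else clause runs once whenever the loop finishes without break,
# so stuff() runs at least once, but after a loop that did iterate it also
# runs one extra time.
while condition is True: 
  stuff()
else:
  stuff()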

Answer 9

Quick hack:

def dowhile(func = None, condition = None):
    if not func or not condition:
        return
    else:
        func()
        while condition():
            func()

Use like so:

>>> x = 10
>>> def f():
...     global x
...     x = x - 1
>>> def c():
...     global x
...     return x > 0
>>> dowhile(f, c)
>>> print x
0

Answer 10

Why don’t you just do the following?

for s in l :
    print s
print "done"


Answer 11

See if this helps:

Set a flag inside the exception handler and check it before working on the s.

flagBreak = False
while True :

    if flagBreak : break

    if s :
        print s
    try :
        s = i.next()
    except StopIteration :
        flagBreak = True

print "done"

Answer 12

If you’re in a scenario where you are looping while a resource is unavailable, or something similar that throws an exception, you could use something like:

import time

while True:
    try:
        f = open('some/path', 'r')
    except IOError:
        print('File could not be read. Retrying in 5 seconds')
        time.sleep(5)
    else:
        break
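The loop above retries forever. A bounded variant might look like the sketch below. The name open_with_retry and its attempts and delay parameters are made up for illustration. In Python 3, IOError is an alias of OSError, so catching OSError covers the same case.

```python
import time


def open_with_retry(path, attempts=3, delay=5.0):
    """Try to open path up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return open(path, "r")
        except OSError:
            if attempt == attempts:
                raise  # give up and surface the last error
            print(f"File could not be read. Retrying in {delay} seconds")
            time.sleep(delay)
```

Usage would be along the lines of `with open_with_retry('some/path') as f: ...`, so the file is closed once the work is done.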

Answer 13

For me a typical while loop will be something like this:

xBool = True
yCount = 3  # a counter to force a condition (e.g. some integer value)

while xBool:
    if yCount > 0:  # set up the condition
        # (do something)
        yCount = yCount - 1
    else:
        # condition is not met, so set xBool to False
        xBool = False

I could include a for loop within the while loop as well, if the situation warrants it, to loop through another set of conditions.


How do I unload (reload) a module?

Question: How do I unload (reload) a module?

I have a long-running Python server and would like to be able to upgrade a service without restarting the server. What’s the best way to do this?

if foo.py has changed:
    unimport foo  <-- How do I do this?
    import foo
    myfoo = foo.Foo()

Answer 0

You can reload a module that has already been imported by using the reload function (importlib.reload in Python 3.4+; in Python 2 it was the reload builtin):

from importlib import reload  
import foo

while True:
    # Do some things.
    if is_changed(foo):
        foo = reload(foo)

In Python 3, reload was moved to the imp module. In 3.4, imp was deprecated in favor of importlib, and reload was added to the latter (imp itself was removed in Python 3.12). When targeting 3 or later, either reference the appropriate module when calling reload or import it.

I think that this is what you want. Web servers like Django’s development server use this so that you can see the effects of your code changes without restarting the server process itself.

To quote from the docs:

Python modules’ code is recompiled and the module-level code reexecuted, defining a new set of objects which are bound to names in the module’s dictionary. The init function of extension modules is not called a second time. As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero. The names in the module namespace are updated to point to any new or changed objects. Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.

As you noted in your question, you’ll have to reconstruct Foo objects if the Foo class resides in the foo module.
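The sketch below makes both points visible. It writes a throwaway module to a temporary directory (the name foo_demo is hypothetical), creates a Foo instance, rewrites the file and reloads it. Bytecode writing is switched off during the demo so that a cached .pyc cannot hide the edit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    """Write a throwaway module, import it, change it on disk, then reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "foo_demo.py"    # hypothetical module name
        source.write_text("class Foo:\n    version = 1\n")
        sys.path.insert(0, tmp)
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True        # no .pyc, so the edit is always seen
        try:
            importlib.invalidate_caches()     # let the import system see the new file
            import foo_demo
            old_obj = foo_demo.Foo()          # instance created before the reload

            source.write_text("class Foo:\n    version = 22\n")
            foo_demo = importlib.reload(foo_demo)

            return (
                old_obj.version,                    # 1: still the old class object
                foo_demo.Foo().version,             # 22: the re-executed class
                isinstance(old_obj, foo_demo.Foo),  # False: the two classes differ
            )
        finally:
            sys.dont_write_bytecode = old_flag
            sys.path.remove(tmp)
            sys.modules.pop("foo_demo", None)


print(reload_demo())  # (1, 22, False)
```

The last value is why existing Foo objects have to be rebuilt: reload replaces the class in the module, but old instances keep pointing at the old class.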


Answer 1

In Python 3.0–3.3 you would use: imp.reload(module)

The BDFL has answered this question.

However, imp was deprecated in 3.4, in favour of importlib (thanks @Stefan!).

I think, therefore, you’d now use importlib.reload(module), although I’m not sure.


Answer 2

It can be especially difficult to delete a module if it is not pure Python.

Here is some information from: How do I really delete an imported module?

You can use sys.getrefcount() to find out the actual number of references.

>>> import sys, empty, os
>>> sys.getrefcount(sys)
9
>>> sys.getrefcount(os)
6
>>> sys.getrefcount(empty)
3

Numbers greater than 3 indicate that it will be hard to get rid of the module. The homegrown “empty” (containing nothing) module should be garbage collected after

>>> del sys.modules["empty"]
>>> del empty

as the third reference is an artifact of the getrefcount() function.
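Rather than reading raw reference counts, whose exact values differ between Python versions, you can ask the garbage collector directly through a weak reference. A sketch, assuming you pass the name of a throwaway top-level module that nothing else references (the helper name is hypothetical; do not try it on sys or other modules the interpreter depends on, because it removes the sys.modules entry either way):

```python
import gc
import importlib
import sys
import weakref


def unload_and_check(name: str) -> bool:
    """Import `name`, drop the references held by sys.modules and by this
    function, and report whether the module object was actually reclaimed."""
    module = importlib.import_module(name)
    tracker = weakref.ref(module)   # a weak reference does not keep it alive
    del sys.modules[name]           # the import system's reference
    del module                      # our own local reference
    gc.collect()                    # also break any reference cycles
    return tracker() is None
```

A False result means some other code still holds a reference to the module, which is the situation the reference counts above are warning about.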


Answer 3

reload(module), but only if it’s completely stand-alone. If anything else has a reference to the module (or any object belonging to the module), then you’ll get subtle and curious errors caused by the old code hanging around longer than you expected, and things like isinstance not working across different versions of the same code.

If you have one-way dependencies, you must also reload all modules that depend on the reloaded module to get rid of all the references to the old code. And then reload modules that depend on the reloaded modules, recursively.

If you have circular dependencies, which is very common for example when you are dealing with reloading a package, you must unload all the modules in the group in one go. You can’t do this with reload() because it will re-import each module before its dependencies have been refreshed, allowing old references to creep into new modules.

The only way to do it in this case is to hack sys.modules, which is kind of unsupported. You’d have to go through and delete each sys.modules entry you wanted to be reloaded on next import, and also delete entries whose values are None to deal with an implementation issue to do with caching failed relative imports. It’s not terribly nice but as long as you have a fully self-contained set of dependencies that doesn’t leave references outside its codebase, it’s workable.
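A minimal sketch of that sys.modules purge for a package, assuming the whole dependency group lives under one package name (the helper name purge_package is hypothetical). It removes the package and all of its submodules, including any None placeholder entries under that prefix, so the next import re-executes everything from source; references held outside the package still point at the old objects:

```python
import importlib
import sys


def purge_package(package_name: str) -> list:
    """Delete `package_name` and every `package_name.*` entry from sys.modules.

    Returns the removed names. The next import of the package re-executes all
    of its modules from source. Existing outside references are not updated.
    """
    prefix = package_name + "."
    doomed = [name for name in list(sys.modules)
              if name == package_name or name.startswith(prefix)]
    for name in doomed:
        del sys.modules[name]
    importlib.invalidate_caches()  # also drop finders' cached directory listings
    return doomed
```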

It’s probably best to restart the server. :-)


Answer 4

import sys

# Drop the cached module so that the next "import myModule" loads it afresh.
if 'myModule' in sys.modules:
    del sys.modules["myModule"]

Answer 5

For Python 2 use built-in function reload():

reload(module)

For Python 2 and 3.2–3.3 use reload from module imp:

import imp
imp.reload(module)

But imp is deprecated since version 3.4 in favor of importlib, so use:

import importlib
importlib.reload(module)

or

from importlib import reload
reload(module)

Answer 6

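The following code gives you Python 2/3 compatibility:

try:
    reload  # Python 2: reload is a built-in
except NameError:
    # Python 3: reload is no longer a built-in
    try:
        from importlib import reload  # Python 3.4+
    except ImportError:
        from imp import reload  # Python 3.0-3.3 (imp was removed in 3.12)

Then you can use reload() in both versions, which makes things simpler.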


Answer 7

The accepted answer doesn’t handle the from X import Y case. This code handles it and the standard import case as well:

import importlib
import sys

def importOrReload(module_name, *names):
    if module_name in sys.modules:
        importlib.reload(sys.modules[module_name])
    else:
        __import__(module_name, fromlist=names)

    # globals() is the namespace of the module that defines importOrReload
    for name in names:
        globals()[name] = getattr(sys.modules[module_name], name)

# use instead of: from dfly_parser import parseMessages
importOrReload("dfly_parser", "parseMessages")

In the reloading case, we reassign the top level names to the values stored in the newly reloaded module, which updates them.
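Because globals() inside importOrReload refers to the module that defines the function, the names only land where you expect if the helper lives in the same module that uses them. A variant sketch that takes the target namespace explicitly (the name import_or_reload and its namespace parameter are hypothetical):

```python
import importlib
import sys


def import_or_reload(module_name, *names, namespace=None):
    """Import `module_name` (or reload it if it is already imported) and bind
    each of `names` from it into the dict `namespace`. Returns the module."""
    if module_name in sys.modules:
        module = importlib.reload(sys.modules[module_name])
    else:
        module = importlib.import_module(module_name)
    if namespace is not None:
        for name in names:
            # Re-bind each name so it refers to the freshly executed object.
            namespace[name] = getattr(module, name)
    return module
```

For example, import_or_reload("dfly_parser", "parseMessages", namespace=globals()) mirrors the original call while making the target namespace explicit.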


Answer 8

This is the modern way of reloading a module:

from importlib import reload

If you want to support versions of Python older than 3.4, try this:

from sys import version_info
if version_info[0] < 3:
    pass  # Python 2 has a built-in reload
elif version_info < (3, 4):
    from imp import reload  # Python 3.0 - 3.3
else:
    from importlib import reload  # Python 3.4+

To use it, run reload(MODULE), replacing MODULE with the module you want to reload.

For example, reload(math) will reload the math module.


Answer 9

If you are not in a server, but developing and need to frequently reload a module, here’s a nice tip.

First, make sure you are using the excellent IPython shell, from the Jupyter Notebook project. After installing Jupyter, you can start it with ipython, or jupyter console, or even better, jupyter qtconsole, which will give you a nice colorized console with code completion in any OS.

Now in your shell, type:

%load_ext autoreload
%autoreload 2

Now, every time you execute code, your modules will be reloaded first.

Besides 2, the autoreload magic has other options:

%autoreload
Reload all modules (except those excluded by %aimport) automatically now.

%autoreload 0
Disable automatic reloading.

%autoreload 1
Reload all modules imported with %aimport every time before executing the Python code typed.

%autoreload 2
Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.

Answer 10

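For those like me who want to reload all modules (when running in the Python interpreter under Emacs):

import importlib, sys
for mod in list(sys.modules.values()):  # copy: reloading may add entries
    if mod is not None and mod.__name__ != "__main__":
        importlib.reload(mod)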

More information is in Reloading Python modules.
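Reloading everything in sys.modules also reloads the standard library and third-party packages, which is slow and can misbehave. A narrower sketch, assuming your own code lives under a single directory, reloads only modules whose source file is inside it (reload_project_modules is a hypothetical helper; it reloads in sys.modules order, which is not necessarily dependency order):

```python
import importlib
import os
import sys


def reload_project_modules(project_dir: str) -> list:
    """Reload every imported module whose __file__ lies under `project_dir`.

    Returns the names of the reloaded modules.
    """
    root = os.path.abspath(project_dir) + os.sep
    reloaded = []
    # Iterate over a copy: reloading can import new modules and resize sys.modules.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if name == "__main__" or not path:
            continue  # skip the running script, built-ins and None placeholders
        if os.path.abspath(path).startswith(root):
            importlib.reload(module)
            reloaded.append(name)
    return reloaded
```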


Answer 11

Enthought Traits has a module that works fairly well for this. https://traits.readthedocs.org/en/4.3.0/_modules/traits/util/refresh.html

It will reload any module that has been changed, and update other modules and instanced objects that are using it. It does not work most of the time with __very_private__ methods, and can choke on class inheritance, but it saves me crazy amounts of time from having to restart the host application when writing PyQt guis, or stuff that runs inside programs such as Maya or Nuke. It doesn’t work maybe 20-30 % of the time, but it’s still incredibly helpful.

Enthought's package doesn't reload files the moment they change (you have to call it explicitly), but that shouldn't be all that hard to implement if you really need it.


Answer 12

A hint for those who are using Python 3 and reload from importlib:

If it seems that a module doesn't reload, that may be because the .pyc file takes some time to be recompiled (up to 60 seconds in my experience). I'm writing this hint just so you know what is happening if you run into this kind of problem.
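One possible explanation, assuming the default timestamp-based .pyc validation, is that the cached bytecode records only the source file's modification time (in whole seconds) and its size, so an edit that keeps the same size within the same second can go unnoticed. A sketch of a workaround that deletes the cached bytecode before reloading (force_reload is a hypothetical helper, and it assumes a plain source module with a __file__):

```python
import importlib
import importlib.util
import os
import types


def force_reload(module: types.ModuleType) -> types.ModuleType:
    """Reload `module` from its .py source, deleting its cached bytecode first."""
    # Path of the matching __pycache__/<name>.<tag>.pyc file.
    cached = importlib.util.cache_from_source(module.__file__)
    try:
        os.remove(cached)
    except FileNotFoundError:
        pass  # no bytecode was written (e.g. PYTHONDONTWRITEBYTECODE is set)
    importlib.invalidate_caches()  # drop finders' cached directory listings
    return importlib.reload(module)
```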


Answer 13

2018-02-01

  1. module foo must be imported successfully in advance.
  2. from importlib import reload, then reload(foo)

31.5. importlib — The implementation of import — Python 3.6.4 documentation


Answer 14

Another option: note that Python's default importlib.reload only re-imports the module passed as an argument. It won't reload the modules that your module imports. If you changed a lot of files and have a somewhat complex package to import, you must do a deep reload.

If you have IPython or Jupyter installed, you can use a function to deep reload all libs:

from IPython.lib.deepreload import reload as dreload
dreload(foo)

If you don’t have Jupyter, install it with this command in your shell:

pip3 install jupyter

Answer 15

Edit (Answer V2)

The solution from before is good for just getting the reset information, but it will not change all the references (it does more than reload, but less than required). To actually set all the references as well, I had to go into the garbage collector and rewrite the references there. Now it works like a charm!

Note that this will not work if the GC is turned off, or if reloading data that’s not monitored by the GC. If you don’t want to mess with the GC, the original answer might be enough for you.

New code:

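import importlib
import inspect
import gc
from enum import EnumMeta  # used by _get_tree_references_to_reset_recursively
from weakref import ref

# _readonly_attrs is the set defined in the original answer's code below.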


def reset_module(module, inner_modules_also=True):
    """
    This function is a stronger form of importlib's `reload` function. What it does, is that aside from reloading a
    module, it goes to the old instance of the module, and sets all the (not read-only) attributes, functions and classes
    to be the reloaded-module's
    :param module: The module to reload (module reference, not the name)
    :param inner_modules_also: Whether to treat this module as a package as well, and reload all the modules within it.
    """

    # For the case when the module is actually a package
    if inner_modules_also:
        submods = {submod for _, submod in inspect.getmembers(module)
                   if (type(submod).__name__ == 'module') and (submod.__package__.startswith(module.__name__))}
        for submod in submods:
            reset_module(submod, True)

    # First, log all the references before reloading (because some references may be changed by the reload operation).
    module_tree = _get_tree_references_to_reset_recursively(module, module.__name__)

    new_module = importlib.reload(module)
    _reset_item_recursively(module, module_tree, new_module)


def _update_referrers(item, new_item):
    refs = gc.get_referrers(item)

    weak_ref_item = ref(item)
    for coll in refs:
        if type(coll) == dict:
            enumerator = coll.keys()
        elif type(coll) == list:
            enumerator = range(len(coll))
        else:
            continue

        for key in enumerator:

            if weak_ref_item() is None:
                # No refs are left in the GC
                return

            if coll[key] is weak_ref_item():
                coll[key] = new_item

def _get_tree_references_to_reset_recursively(item, module_name, grayed_out_item_ids = None):
    if grayed_out_item_ids is None:
        grayed_out_item_ids = set()

    item_tree = dict()
    attr_names = set(dir(item)) - _readonly_attrs
    for sub_item_name in attr_names:

        sub_item = getattr(item, sub_item_name)
        item_tree[sub_item_name] = [sub_item, None]

        try:
            # Will work for classes and functions defined in that module.
            mod_name = sub_item.__module__
        except AttributeError:
            mod_name = None

        # If this item was defined within this module, deep-reset
        if (mod_name is None) or (mod_name != module_name) or (id(sub_item) in grayed_out_item_ids) \
                or isinstance(sub_item, EnumMeta):
            continue

        grayed_out_item_ids.add(id(sub_item))
        item_tree[sub_item_name][1] = \
            _get_tree_references_to_reset_recursively(sub_item, module_name, grayed_out_item_ids)

    return item_tree


def _reset_item_recursively(item, item_subtree, new_item):

    # Set children first so we don't lose the current references.
    if item_subtree is not None:
        for sub_item_name, (sub_item, sub_item_tree) in item_subtree.items():

            try:
                new_sub_item = getattr(new_item, sub_item_name)
            except AttributeError:
                # The item doesn't exist in the reloaded module. Ignore.
                continue

            try:
                # Set the item
                _reset_item_recursively(sub_item, sub_item_tree, new_sub_item)
            except Exception as ex:
                pass

    _update_referrers(item, new_item)

Original Answer

As written in @bobince’s answer, if there’s already a reference to that module in another module (especially if it was imported with the as keyword like import numpy as np), that instance will not be overwritten.

This proved quite problematic to me when applying tests that required a "clean-slate" state of the configuration modules, so I've written a function named reset_module that uses importlib's reload function and recursively overwrites all the declared module's attributes. It has been tested with Python version 3.6.

import importlib
import inspect
from enum import EnumMeta

_readonly_attrs = {'__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__',
               '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__func__', '__ge__', '__get__',
               '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__',
               '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__',
               '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__',
               '__subclasshook__', '__weakref__', '__members__', '__mro__', '__itemsize__', '__isabstractmethod__',
               '__basicsize__', '__base__'}


def reset_module(module, inner_modules_also=True):
    """
    This function is a stronger form of importlib's `reload` function. What it does, is that aside from reloading a
    module, it goes to the old instance of the module, and sets all the (not read-only) attributes, functions and classes
    to be the reloaded-module's
    :param module: The module to reload (module reference, not the name)
    :param inner_modules_also: Whether to treat this module as a package as well, and reload all the modules within it.
    """

    new_module = importlib.reload(module)

    reset_items = set()

    # For the case when the module is actually a package
    if inner_modules_also:
        submods = {submod for _, submod in inspect.getmembers(module)
                   if (type(submod).__name__ == 'module') and (submod.__package__.startswith(module.__name__))}
        for submod in submods:
            reset_module(submod, True)

    _reset_item_recursively(module, new_module, module.__name__, reset_items)


def _reset_item_recursively(item, new_item, module_name, reset_items=None):
    if reset_items is None:
        reset_items = set()

    attr_names = set(dir(item)) - _readonly_attrs

    for sitem_name in attr_names:

        sitem = getattr(item, sitem_name)
        new_sitem = getattr(new_item, sitem_name)

        try:
            # Set the item
            setattr(item, sitem_name, new_sitem)

            try:
                # Will work for classes and functions defined in that module.
                mod_name = sitem.__module__
            except AttributeError:
                mod_name = None

            # If this item was defined within this module, deep-reset
            if (mod_name is None) or (mod_name != module_name) or (id(sitem) in reset_items) \
                    or isinstance(sitem, EnumMeta):  # Deal with enums
                continue

            reset_items.add(id(sitem))
            _reset_item_recursively(sitem, new_sitem, module_name, reset_items)
        except Exception as ex:
            raise Exception(sitem_name) from ex

Note: Use with care! Using these on non-peripheral modules (modules that define externally-used classes, for example) might lead to internal problems in Python (such as pickling/un-pickling issues).
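The heart of the V2 code is its last step: rewriting whatever the garbage collector reports as referring to the old object. Stripped of the module walking, that step looks roughly like this sketch (rebind_references is a hypothetical name; unlike _update_referrers it only patches dicts, and on recent CPython versions instance attributes may not be stored in a separate dict, so such references might be missed):

```python
import gc


def rebind_references(old, new) -> int:
    """Point every GC-tracked dict entry that refers to `old` at `new` instead.

    Returns the number of entries changed. A simplified take on _update_referrers:
    only dicts (module globals, plain dicts, materialized __dict__s) are patched.
    """
    changed = 0
    for referrer in gc.get_referrers(old):
        if not isinstance(referrer, dict):
            continue  # frames, lists, tuples, ... are left alone in this sketch
        for key, value in list(referrer.items()):
            if value is old:
                referrer[key] = new
                changed += 1
    return changed
```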


Answer 16

For me, in Abaqus, this is the way that works. Suppose your file is Class_VerticesEdges.py:

import sys

sys.path.append(r'D:\...\My Pythons')
if 'Class_VerticesEdges' in sys.modules:
    del sys.modules['Class_VerticesEdges']
    print('old module Class_VerticesEdges deleted')
from Class_VerticesEdges import *
reload(sys.modules['Class_VerticesEdges'])  # built-in reload() (Python 2)

Answer 17

I had a lot of trouble trying to reload things inside Sublime Text, but finally I wrote this utility to reload modules in Sublime Text, based on the code sublime_plugin.py uses to reload modules.

The code below lets you reload modules from paths with spaces in their names; after reloading, you can just import them as you usually do.

def reload_module(full_module_name):
    """
        Assuming the folder `full_module_name` is a folder inside some
        folder on the python sys.path, for example, sys.path as `C:/`, and
        you are inside the folder `C:/Path With Spaces` on the file 
        `C:/Path With Spaces/main.py` and want to re-import some files on
        the folder `C:/Path With Spaces/tests`

        @param full_module_name   the relative full path to the module file
                                  you want to reload from a folder on the
                                  python `sys.path`
    """
    import imp
    import sys
    import importlib

    if full_module_name in sys.modules:
        module_object = sys.modules[full_module_name]
        module_object = imp.reload( module_object )

    else:
        importlib.import_module( full_module_name )

def run_tests():
    print( "\n\n" )
    reload_module( "Path With Spaces.tests.semantic_linefeed_unit_tests" )
    reload_module( "Path With Spaces.tests.semantic_linefeed_manual_tests" )

    from .tests import semantic_linefeed_unit_tests
    from .tests import semantic_linefeed_manual_tests

    semantic_linefeed_unit_tests.run_unit_tests()
    semantic_linefeed_manual_tests.run_manual_tests()

if __name__ == "__main__":
    run_tests()

The first time you run it, this loads the modules; if you later call run_tests() again, it reloads the test files. With Sublime Text (Python 3.3.6) this happens a lot, because its interpreter never closes (unless you restart Sublime Text, i.e., the Python 3.3 interpreter).


Answer 18

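Another way could be to import the module inside a function, so that its name is only bound locally. Note, however, that the import still caches the module in sys.modules, so the module is not garbage-collected when the function completes unless that entry is removed as well.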


How do I import other Python files?

Question: How do I import other Python files?

How do I import other files in Python?

  1. How exactly can I import a specific python file like import file.py?
  2. How can I import a folder instead of a specific file?
  3. I want to load a Python file dynamically at runtime, based on user input.
  4. I want to know how to load just one specific part from the file.

For example, in main.py I have:

from extra import * 

Although this gives me all the definitions in extra.py, when maybe all I want is a single definition:

def gap():
    print
    print

What do I add to the import statement to just get gap from extra.py?


Answer 0

importlib was added to Python 3 to programmatically import a module. It is just a wrapper around __import__, see the docs.

import importlib

moduleName = input('Enter module name:')
module = importlib.import_module(moduleName)  # returns the imported module

Note: the .py extension should be removed from moduleName. The function also defines a package argument for relative imports.
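As a quick illustration of both points, using the standard-library json package as the example target: the function returns the module itself, and a leading dot together with package= performs a relative import.

```python
import importlib

# Absolute import by dotted name: the return value is the module object itself.
decoder = importlib.import_module("json.decoder")

# Relative import: a leading dot, resolved against the anchor given as `package`.
same_decoder = importlib.import_module(".decoder", package="json")

print(decoder.__name__)         # json.decoder
print(decoder is same_decoder)  # True (both come from the sys.modules cache)
```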


Update: Answer below is outdated. Use the more recent alternative above.

  1. Just import file without the ‘.py’ extension.

  2. You can mark a folder as a package, by adding an empty file named __init__.py.

  3. You can use the __import__ function. It takes the module name as a string. (Again: module name without the ‘.py’ extension.)

    pmName = input('Enter module name:')
    pm = __import__(pmName)
    print(dir(pm))
    

    Type help(__import__) for more details.
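One practical difference between the two approaches: for a dotted name, __import__ returns the top-level package, while importlib.import_module returns the submodule you asked for. A small check using the standard-library xml.dom package:

```python
import importlib

top = __import__("xml.dom")                # imports xml.dom, returns the top-level package
leaf = importlib.import_module("xml.dom")  # imports xml.dom, returns xml.dom itself

print(top.__name__)     # xml
print(leaf.__name__)    # xml.dom
print(top.dom is leaf)  # True: the submodule is bound as an attribute of the package
```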


Answer 1

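There are many ways to import a python file, each with its own pros and cons.

Don't just hastily pick the first import strategy that works for you, or else you'll have to rewrite the codebase later when you find it doesn't meet your needs.

I'll start out explaining the easiest example, #1, then I'll move toward the most professional and robust example, #7.

Example 1, import a python module with the python interpreter: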

  1. Put this in /home/el/foo/fox.py:

    def what_does_the_fox_say():
      print("vixens cry")
  2. Get into the python interpreter:

    el@apollo:/home/el/foo$ python
    Python 2.7.3 (default, Sep 26 2013, 20:03:06) 
    >>> import fox
    >>> fox.what_does_the_fox_say()
    vixens cry
    >>> 

    You imported fox through the python interpreter and invoked the python function what_does_the_fox_say() from within fox.py.

Example 2, use execfile (or exec in Python 3) in a script to execute the other python file in place:

  1. Put this in /home/el/foo2/mylib.py:

    def moobar():
      print("hi")
  2. 将其放在/home/el/foo2/main.py中:

    execfile("/home/el/foo2/mylib.py")
    moobar()
  3. 运行文件:

    el@apollo:/home/el/foo$ python main.py
    hi

    功能moobar是从mylib.py导入的,并在main.py中可用

示例3,从…使用…导入…功能:

  1. 将其放在/home/el/foo3/chekov.py中:

    def question():
      print "where are the nuclear wessels?"
  2. 将其放在/home/el/foo3/main.py中:

    from chekov import question
    question()
  3. 像这样运行它:

    el@apollo:/home/el/foo3$ python main.py 
    where are the nuclear wessels?

    如果您在chekov.py中定义了其他函数,除非您定义了这些函数,否则它们将不可用。 import *

示例4,如果导入的riaa.py与导入的文件位于不同的文件位置

  1. 将其放在/home/el/foo4/stuff/riaa.py中:

    def watchout():
      print "computers are transforming into a noose and a yoke for humans"
  2. 将其放在/home/el/foo4/main.py中:

    import sys 
    import os
    sys.path.append(os.path.abspath("/home/el/foo4/stuff"))
    from riaa import *
    watchout()
  3. 运行:

    el@apollo:/home/el/foo4$ python main.py 
    computers are transforming into a noose and a yoke for humans

    那会从另一个目录导入外部文件中的所有内容。

示例5,使用 os.system("python yourfile.py")

import os
os.system("python yourfile.py")

示例6,通过piggy带python startuphook导入文件:

更新:此示例曾经同时适用于python2和3,但现在仅适用于python2。python3摆脱了此用户启动钩子功能集,因为它被低技能的python库编写者滥用,使用它在所有用户定义的程序之前不礼貌地将其代码注入到全局命名空间中。如果您希望此功能适用于python3,则必须变得更有创意。如果我告诉您如何做,python开发人员也会禁用该功能集,因此您是一个人。

参见:https : //docs.python.org/2/library/user.html

将此代码放入您的主目录中 ~/.pythonrc.py

class secretclass:
    def secretmessage(cls, myarg):
        return myarg + " is if.. up in the sky, the sky"
    secretmessage = classmethod( secretmessage )

    def skycake(cls):
        return "cookie and sky pie people can't go up and "
    skycake = classmethod( skycake )

将此代码放入您的main.py(可以在任何地方):

import user
msg = "The only way skycake tates good" 
msg = user.secretclass.secretmessage(msg)
msg += user.secretclass.skycake()
print(msg + " have the sky pie! SKYCAKE!")

运行它,您应该获得以下信息:

$ python main.py
The only way skycake tates good is if.. up in the sky, 
the skycookie and sky pie people can't go up and  have the sky pie! 
SKYCAKE!

如果您在这里遇到错误:ModuleNotFoundError: No module named 'user'这意味着您正在使用python3,默认情况下会禁用启动钩。

值得一提的是:https : //github.com/docwhat/homedir-examples/blob/master/python-commandline/.pythonrc.py随便 发送。

示例7,最健壮:使用裸导入命令在python中导入文件:

  1. 建立一个新目录 /home/el/foo5/
  2. 建立一个新目录 /home/el/foo5/herp
  3. 制作一个以__init__.pyherp 命名的空文件:

    el@apollo:/home/el/foo5/herp$ touch __init__.py
    el@apollo:/home/el/foo5/herp$ ls
    __init__.py
  4. 新建一个目录/ home / el / foo5 / herp / derp

  5. 在derp下,制作另一个__init__.py文件:

    el@apollo:/home/el/foo5/herp/derp$ touch __init__.py
    el@apollo:/home/el/foo5/herp/derp$ ls
    __init__.py
  6. 在/ home / el / foo5 / herp / derp下,创建一个名为yolo.pyPut this 的新文件:

    def skycake():
      print "SkyCake evolves to stay just beyond the cognitive reach of " +
      "the bulk of men. SKYCAKE!!"
  7. 关键时刻,创建新文件/home/el/foo5/main.py,并将其放入其中;

    from herp.derp.yolo import skycake
    skycake()
  8. 运行:

    el@apollo:/home/el/foo5$ python main.py
    SkyCake evolves to stay just beyond the cognitive reach of the bulk 
    of men. SKYCAKE!!

    __init__.py文件会通知python解释器开发人员打算将此目录作为可导入包。

如果您想查看我的帖子,如何在目录下包含所有.py文件,请参见此处:https : //stackoverflow.com/a/20753073/445131

There are many ways to import a python file, all with their pros and cons.

Don’t just hastily pick the first import strategy that works for you or else you’ll have to rewrite the codebase later on when you find it doesn’t meet your needs.

I’ll start by explaining the easiest example, #1, and then move toward the most professional and robust one, #7.

Example 1, Import a python module with python interpreter:

  1. Put this in /home/el/foo/fox.py:

    def what_does_the_fox_say():
      print("vixens cry")
    
  2. Get into the python interpreter:

    el@apollo:/home/el/foo$ python
    Python 2.7.3 (default, Sep 26 2013, 20:03:06) 
    >>> import fox
    >>> fox.what_does_the_fox_say()
    vixens cry
    >>> 
    

    You imported fox through the python interpreter and invoked the function what_does_the_fox_say() defined in fox.py.

Example 2, Use execfile (or exec in Python 3) in a script to execute the other python file in place:

  1. Put this in /home/el/foo2/mylib.py:

    def moobar():
      print("hi")
    
  2. Put this in /home/el/foo2/main.py:

    execfile("/home/el/foo2/mylib.py")
    moobar()
    
  3. run the file:

    el@apollo:/home/el/foo2$ python main.py
    hi
    

    The function moobar was defined by executing mylib.py and is now available in main.py.
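execfile() no longer exists in Python 3. Here is a minimal sketch of an equivalent, assuming you want mylib.py's definitions in the caller's namespace. run_file is a hypothetical helper name, and a throwaway temp file stands in for /home/el/foo2/mylib.py:

```python
import os
import tempfile


def run_file(path, namespace):
    """Hypothetical helper: a Python 3 stand-in for Python 2's execfile()."""
    with open(path, "rb") as f:
        source = f.read()
    # Compiling with the real file name makes tracebacks point at that file.
    code = compile(source, path, "exec")
    exec(code, namespace)


# Demo: write a throwaway mylib.py, then run it "in place" in this module.
lib_path = os.path.join(tempfile.mkdtemp(), "mylib.py")
with open(lib_path, "w") as f:
    f.write('def moobar():\n    print("hi")\n')

run_file(lib_path, globals())
moobar()  # prints: hi
```

The shorter exec(open(path).read()) also works. Going through compile() with the file name is just a small convenience for debugging.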

Example 3, Use from … import … functionality:

  1. Put this in /home/el/foo3/chekov.py:

    def question():
      print("where are the nuclear wessels?")
    
  2. Put this in /home/el/foo3/main.py:

    from chekov import question
    question()
    
  3. Run it like this:

    el@apollo:/home/el/foo3$ python main.py 
    where are the nuclear wessels?
    

    If you defined other functions in chekov.py, they would not be available unless you imported them as well (or used from chekov import *).

Example 4, Import riaa.py if it’s in a different directory from the file that imports it:

  1. Put this in /home/el/foo4/stuff/riaa.py:

    def watchout():
      print("computers are transforming into a noose and a yoke for humans")
    
  2. Put this in /home/el/foo4/main.py:

    import sys 
    import os
    sys.path.append(os.path.abspath("/home/el/foo4/stuff"))
    from riaa import *
    watchout()
    
  3. Run it:

    el@apollo:/home/el/foo4$ python main.py 
    computers are transforming into a noose and a yoke for humans
    

    That imports everything in the foreign file from a different directory.

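Example 5, use os.system("python yourfile.py"). This runs yourfile.py in a separate Python process, so nothing from it is imported into your program's namespace: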

import os
os.system("python yourfile.py")
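If you want the Example 5 approach but more robustly, the subprocess module is the usual replacement for os.system. This is a sketch, with a throwaway script standing in for yourfile.py. Using sys.executable (rather than the bare name python) runs the child with the same interpreter and virtualenv as the parent:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway script standing in for yourfile.py.
script = Path(tempfile.mkdtemp()) / "yourfile.py"
script.write_text('print("hello from yourfile")\n')

# Run it in a child process with the current interpreter and capture its output.
result = subprocess.run(
    [sys.executable, str(script)],
    capture_output=True,  # Python 3.7+
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())  # hello from yourfile
```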

Example 6, import your file via piggybacking the python startuphook:

Update: This example only works in Python 2. The user module it relies on was deprecated in Python 2.6 and removed in Python 3.0. Python 3 got rid of this startup-hook feature because it was abused by low-skill python library writers, who used it to impolitely inject their code into the global namespace before all user-defined programs. If you want this to work in Python 3, you’ll have to get more creative, and you’re on your own.

See: https://docs.python.org/2/library/user.html

Put this code into ~/.pythonrc.py in your home directory:

class secretclass:
    def secretmessage(cls, myarg):
        return myarg + " is if.. up in the sky, the sky"
    secretmessage = classmethod( secretmessage )

    def skycake(cls):
        return "cookie and sky pie people can't go up and "
    skycake = classmethod( skycake )

Put this code into your main.py (can be anywhere):

import user
msg = "The only way skycake tates good" 
msg = user.secretclass.secretmessage(msg)
msg += user.secretclass.skycake()
print(msg + " have the sky pie! SKYCAKE!")

Run it and you should get this:

$ python main.py
The only way skycake tates good is if.. up in the sky, 
the skycookie and sky pie people can't go up and  have the sky pie! 
SKYCAKE!

If you get the error ModuleNotFoundError: No module named 'user' here, it means you’re using Python 3, where the user module no longer exists.

Credit for this gist goes to: https://github.com/docwhat/homedir-examples/blob/master/python-commandline/.pythonrc.py Send along your up-boats.

Example 7, Most Robust: Import files in python with the bare import command:

  1. Make a new directory /home/el/foo5/
  2. Make a new directory /home/el/foo5/herp
  3. Make an empty file named __init__.py under herp:

    el@apollo:/home/el/foo5/herp$ touch __init__.py
    el@apollo:/home/el/foo5/herp$ ls
    __init__.py
    
  4. Make a new directory /home/el/foo5/herp/derp

  5. Under derp, make another __init__.py file:

    el@apollo:/home/el/foo5/herp/derp$ touch __init__.py
    el@apollo:/home/el/foo5/herp/derp$ ls
    __init__.py
    
  6. Under /home/el/foo5/herp/derp make a new file called yolo.py and put this in there:

    def skycake():
      print("SkyCake evolves to stay just beyond the cognitive reach of "
            "the bulk of men. SKYCAKE!!")
    
  7. The moment of truth: make the new file /home/el/foo5/main.py and put this in there:

    from herp.derp.yolo import skycake
    skycake()
    
  8. Run it:

    el@apollo:/home/el/foo5$ python main.py
    SkyCake evolves to stay just beyond the cognitive reach of the bulk 
    of men. SKYCAKE!!
    

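    The empty __init__.py file communicates to the python interpreter that the developer intends this directory to be an importable package. (Since Python 3.3, a directory without __init__.py can still be imported as an implicit namespace package, per PEP 420, but an explicit __init__.py remains the usual way to mark a regular package.)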

If you want to see my post on how to include ALL .py files under a directory see here: https://stackoverflow.com/a/20753073/445131


Answer 2

To import a specific Python file at ‘runtime’ with a known name:

import os
import sys

scriptpath = "../Test/"

# Add the directory containing your module to the Python path (wants absolute paths)
sys.path.append(os.path.abspath(scriptpath))

# Do the import
import MyModule

Answer 3

You don't need any complex method to import a python file from one folder into another. Just create an __init__.py file to declare that the folder is a python package. Then, in the host file where you want to import, just type:

from root.parent.folder.file import some_variable, SomeClass, some_function


Answer 4

From the import documentation (linked for reference):

The __init__.py files are required to make Python treat the directories as containing packages, this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path.

__init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable.

Given this layout:

mydir/spam/__init__.py
mydir/spam/module.py

and with mydir on the import path, you can write:

import spam.module

or

from spam import module
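The layout and both import forms above can be checked with a throwaway directory standing in for mydir. This sketch also shows the __all__ behaviour the documentation mentions: the names listed in __all__ are what from spam import * brings in, and listed submodules are imported automatically. The module contents and the hello() function are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the layout above in a throwaway directory that plays "mydir".
mydir = Path(tempfile.mkdtemp())
pkg = mydir / "spam"
pkg.mkdir()
(pkg / "__init__.py").write_text('__all__ = ["module"]\n')
(pkg / "module.py").write_text('def hello():\n    return "hello from spam.module"\n')

sys.path.insert(0, str(mydir))  # mydir must be on the import path
importlib.invalidate_caches()   # recommended after creating modules at runtime

import spam.module        # binds the name "spam"
from spam import module   # binds the name "module"
print(module is spam.module, module.hello())

# __all__ decides what "from spam import *" brings in;
# submodules listed there are imported automatically.
namespace = {}
exec("from spam import *", namespace)
print(sorted(name for name in namespace if not name.startswith("__")))  # ['module']
```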

Answer 5

First case: You want to import file A.py in file B.py, these two files are in the same folder, like this:

. 
├── A.py 
└── B.py

You can do this in file B.py:

import A

or

from A import *

or

from A import THINGS_YOU_WANT_TO_IMPORT_IN_A

Then you will be able to use the functions of file A.py (all of them, or only the ones you imported by name) in file B.py.


Second case: You want to import file folder/A.py in file B.py, these two files are not in the same folder, like this:

.
├── B.py
└── folder
     └── A.py

You can do this in file B.py:

import folder.A

or

from folder.A import *

or

from folder.A import THINGS_YOU_WANT_TO_IMPORT_IN_A

Then you will be able to use the functions of file A.py (all of them, or only the ones you imported by name) in file B.py.


Summary: In the first case, file A.py is a module that you import in file B.py, using the syntax import module_name. In the second case, folder is the package that contains the module A.py, and you use the syntax import package_name.module_name.

For more info on packages and modules, consult this link.


Answer 6

from file import function_name  # import one specific function
function_name()                 # call it directly

and

import file                     # import the whole module
file.function1_name()           # call its functions through the module name
file.function2_name()

These are the two simple ways I know of so far. Make sure the “file.py” you want to import as a library is in the same directory as the script that imports it.


Answer 7

If the function is defined in a file x.py:

def greet():
    print('Hello! How are you?')

In the file where you are importing the function, write this:

from x import greet

This is useful if you do not wish to import all the functions in a file.


Answer 8

The best way to import .py files is by way of __init__.py. The simplest thing to do is to create an empty file named __init__.py in the same directory where your .py file is located.

This post by Mike Grouchy is a great explanation of __init__.py and its use for making, importing, and setting up python packages.


Answer 9

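The way I import is to import the file and give its name a short alias:

import DoStuff as DS
DS.main()

Don’t forget that the file you import MUST be named with the .py extension (DoStuff.py), but you leave the extension out of the import statement. import DoStuff.py would look for a submodule named py inside a package named DoStuff.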


Answer 10

I’d like to add this note, which I haven’t seen stated very clearly elsewhere. Inside a module/package, when loading from its own files, the module names must be qualified with the package name, mymodule, unless you use explicit relative imports. Imagine mymodule is laid out like this:

/main.py
/mymodule
    /__init__.py
    /somefile.py
    /otherstuff.py

When loading somefile.py/otherstuff.py from __init__.py the contents should look like:

from mymodule.somefile import somefunc
from mymodule.otherstuff import otherfunc
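Inside the package itself, explicit relative imports are an alternative to spelling out mymodule. in every import. Here is a sketch that rebuilds the layout above in a temporary directory, where __init__.py uses leading-dot imports. The function bodies are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway copy of the layout above; root plays the directory holding main.py.
root = Path(tempfile.mkdtemp())
pkg = root / "mymodule"
pkg.mkdir()
(pkg / "somefile.py").write_text("def somefunc():\n    return 'somefunc'\n")
(pkg / "otherstuff.py").write_text("def otherfunc():\n    return 'otherfunc'\n")
# The leading dot means "relative to the package this __init__.py belongs to".
(pkg / "__init__.py").write_text(
    "from .somefile import somefunc\n"
    "from .otherstuff import otherfunc\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mymodule
print(mymodule.somefunc(), mymodule.otherfunc())  # somefunc otherfunc
```

Relative imports only work in modules that are imported as part of a package. A file containing them fails with ImportError if you run it directly as a script.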

Answer 11

In case the module you want to import is not in a sub-directory, then try the following and run app.py from the deepest common parent directory:

Directory Structure:

/path/to/common_dir/module/file.py
/path/to/common_dir/application/app.py
/path/to/common_dir/application/subpath/config.json

In app.py, append the current working directory (the common parent you run from) to sys.path:

import os, sys, inspect

sys.path.append(os.getcwd())
from module.file import MyClass
instance = MyClass()

Optional, if you also load files such as configs relative to app.py (inspect seems to be the most robust approach for my use cases):

# Get dirname from inspect module
filename = inspect.getframeinfo(inspect.currentframe()).filename
dirname = os.path.dirname(os.path.abspath(filename))
MY_CONFIG = os.path.join(dirname, "subpath/config.json")

Run:

user@host:/path/to/common_dir$ python3 application/app.py

This solution works for me from the CLI as well as in PyCharm.


Answer 12

There are a couple of ways to include a python script named abc.py:

  1. If your file is called abc.py, use import abc. The limitation is that the file must be in the same directory as the calling python script. (Beware: abc is also the name of a standard-library module that is normally already imported, so import abc may give you the standard library's abc instead of your file. A unique file name avoids the problem.)

import abc

  2. If your python file is inside a folder named folder, which sits in the same directory as the calling python script:

from folder import abc

  3. If the abc.py script is inside internal_folder, which is itself inside folder:

from folder.internal_folder import abc

  4. As answered by James above, if your file is at some fixed location:

import os
import sys
# sys.path needs the directory that contains MyModule.py, not the file itself
scriptpath = "../Test/"
sys.path.append(os.path.abspath(scriptpath))
import MyModule

Bonus :) If you work in IPython or Jupyter, your script keeps changing while the session is running, and you don’t want to re-import it by hand, use these magics to reload it automatically:

%load_ext autoreload 
%autoreload 2

Answer 13

To import a python file into another python file:

Let’s say I have a helper.py python file with a display function:

def display():
    print("I'm working sundar gsv")

Now in app.py, you can use the display function,

import helper
helper.display()

The output:

I'm working sundar gsv

NOTE: No need to specify the .py extension.


Answer 14

This may sound crazy but you can just create a symbolic link to the file you want to import if you’re just creating a wrapper script to it.


Answer 15

You can also do this: from filename import something

Example: from client import Client. Note that you do not include the .py, .pyw or .pyui extension.


Answer 16

This is how I call a function from a python file. It is flexible enough to call any function.

import os, importlib, sys

def callfunc(myfile, myfunc, *args):
    pathname, filename = os.path.split(myfile)
    sys.path.append(os.path.abspath(pathname))
    modname = os.path.splitext(filename)[0]
    mymod = importlib.import_module(modname)
    result = getattr(mymod, myfunc)(*args)
    return result

result = callfunc("pathto/myfile.py", "myfunc", arg1, arg2)
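Here is a variant of callfunc that avoids appending to sys.path. importlib.util can load a single file by path, following the "importing a source file directly" recipe in the importlib documentation. load_module_from_path, myfile.py and myfunc are hypothetical names used for illustration:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Import a single source file by path, without touching sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing the module
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway myfile.py containing a hypothetical myfunc.
path = Path(tempfile.mkdtemp()) / "myfile.py"
path.write_text("def myfunc(a, b):\n    return a + b\n")

mymod = load_module_from_path("myfile", path)
print(getattr(mymod, "myfunc")(2, 3))  # 5
```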

Answer 17

There are many ways, as listed above, but I find that I just want to import the contents of a file without writing line after line and importing other modules. So I came up with a way to get at the contents of a file with dot syntax (file.property), as opposed to merging the imported file into mine.
First of all, here is the file I’ll import, data.py:

    testString= "A string literal to import and test with"


Note: You could use the .txt extension instead.
In mainfile.py, start by opening the file and reading its contents:

    #!/usr/bin/env python3
    with open('data.py') as f:
        data = f.read()

Now you have the contents as a string, but trying to access data.testString will raise an AttributeError, because data is an instance of str, which has no testString attribute.
Next, create a class, for instance (pun intended) ImportedFile:

    class ImportedFile:

And put this into it (with the appropriate indentation):

    exec(data)


And finally, re-assign data like so:

    data=ImportedFile()

And that’s it! Access it like you would any other module: print(data.testString) will print A string literal to import and test with to the console.
If, however, you want the equivalent of from mod import * just drop the class, instance assignment, and de-dent the exec.

Hope this helps:)
-Benji
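One gotcha with the class trick is that any def in the imported file becomes a method of ImportedFile, so calling it through the instance passes an unexpected self. The sketch below applies the same idea to a real module object (types.ModuleType) instead of a class, which keeps functions as plain functions. import_as_module is a hypothetical helper, and a temp file stands in for the data file:

```python
import tempfile
import types
from pathlib import Path


def import_as_module(path, name="data"):
    """Hypothetical helper: run a file's source inside a fresh module object."""
    module = types.ModuleType(name)
    module.__file__ = str(path)
    source = Path(path).read_text()
    # The module's __dict__ serves as the globals of the executed code.
    exec(compile(source, str(path), "exec"), module.__dict__)
    return module


# Demo with a throwaway data file like the one above, plus one function.
path = Path(tempfile.mkdtemp()) / "data.txt"
path.write_text(
    'testString = "A string literal to import and test with"\n'
    "def shout():\n    return testString.upper()\n"
)

data = import_as_module(path)
print(data.testString)
print(data.shout())  # a plain function, not a method, so no self is passed
```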


Answer 18

One little-known feature of Python is the ability to import from zip files:

library.zip
|-library
|--__init__.py

The file __init__.py of the package contains the following:

def dummy():
    print('Testing things out...')

We can write another script which can import a package from the zip archive. It is only necessary to add the zip file to the sys.path.

import sys
sys.path.append(r'library.zip')

import library

def run():
    library.dummy()

run()
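This is a self-contained sketch of the zip import above, in Python 3 syntax. It builds a throwaway library.zip with the zipfile module and then imports from it. The package contents mirror the example:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a throwaway library.zip containing the package "library".
zip_path = Path(tempfile.mkdtemp()) / "library.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr(
        "library/__init__.py",
        "def dummy():\n    print('Testing things out...')\n",
    )

# A .zip entry on sys.path is handled by the built-in zipimport machinery.
sys.path.insert(0, str(zip_path))
importlib.invalidate_caches()

import library

library.dummy()          # prints: Testing things out...
print(library.__file__)  # a path inside library.zip
```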

Behaviour of increment and decrement operators in Python

Question: Behaviour of increment and decrement operators in Python

I notice that a pre-increment/decrement operator can be applied on a variable (like ++count). It compiles, but it does not actually change the value of the variable!

What is the behavior of the pre-increment/decrement operators (++/--) in Python?

Why does Python deviate from the behavior of these operators seen in C/C++?


Answer 0

++ is not an operator. It is two + operators. For numbers, the unary + operator is the identity operator: it returns the value unchanged. (Clarification: the unary + and - operators only work on numbers and on other types that define __pos__/__neg__, but I presume that you wouldn’t expect a hypothetical ++ operator to work on strings.)

++count

Parses as

+(+count)

Which translates to

count

You have to use the slightly longer += operator to do what you want to do:

count += 1
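The parse described above can be checked with the ast module. This small sketch shows that ++count and --count leave the value unchanged, and that the tree really is two nested unary + operations:

```python
import ast

count = 5

# Two unary pluses, and two unary minuses, both give back the same value.
print(++count, --count)  # 5 5

# The parse tree is UnaryOp(UAdd, UnaryOp(UAdd, Name('count'))).
tree = ast.parse("++count", mode="eval").body
print(type(tree.op).__name__, type(tree.operand.op).__name__, tree.operand.operand.id)
# UAdd UAdd count

count += 1  # the actual increment
print(count)  # 6
```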

I suspect the ++ and -- operators were left out for consistency and simplicity. I don’t know the exact argument Guido van Rossum gave for the decision, but I can imagine a few arguments:

  • Simpler parsing. Technically, parsing ++count is ambiguous, as it could be +, +, count (two unary + operators) just as easily as it could be ++, count (one unary ++ operator). It’s not a significant syntactic ambiguity, but it does exist.
  • Simpler language. ++ is nothing more than a synonym for += 1. It was a shorthand invented because C compilers were stupid and didn’t know how to optimize a += 1 into the inc instruction most computers have. In this day of optimizing compilers and bytecode interpreted languages, adding operators to a language to allow programmers to optimize their code is usually frowned upon, especially in a language like Python that is designed to be consistent and readable.
  • Confusing side-effects. One common newbie error in languages with ++ operators is mixing up the differences (both in precedence and in return value) between the pre- and post-increment/decrement operators, and Python likes to eliminate language “gotcha”-s. The precedence issues of pre-/post-increment in C are pretty hairy, and incredibly easy to mess up.

Answer 1

When you want to increment or decrement, you typically want to do that on an integer. Like so:

b++

But in Python, integers are immutable. That is, you can’t change them. This is because the integer objects can be used under several names. Try this:

>>> b = 5
>>> a = 5
>>> id(a)
162334512
>>> id(b)
162334512
>>> a is b
True

a and b above are actually the same object (CPython caches small integers, so this sharing is an implementation detail). If you could increment a in place, you would also increment b. That’s not what you want. So you have to reassign, like this:

b = b + 1

Or simpler:

b += 1

Which will reassign b to b+1. That is not an increment operator, because it does not increment b, it reassigns it.
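A quick sketch confirming that += rebinds the name instead of changing the shared object, so another name bound to the old value keeps it:

```python
a = 5
b = a            # a and b are two names for the same int object
same_before = a is b

a += 1           # creates a new int (6) and rebinds only the name a

print(same_before, a is b)  # True False
print(a, b)                 # 6 5
```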

In short: Python behaves differently here, because it is not C, and is not a low level wrapper around machine code, but a high-level dynamic language, where increments don’t make sense, and also are not as necessary as in C, where you use them every time you have a loop, for example.


Answer 2

While the other answers are correct in so far as they show what a mere + usually does (namely, leave the number as it is, if it is one), they are incomplete in so far as they don’t explain what happens.

To be exact, +x evaluates to x.__pos__() and ++x to x.__pos__().__pos__().

I could imagine a VERY weird class structure (Children, don’t do this at home!) like this:

class ValueKeeper(object):
    def __init__(self, value): self.value = value
    def __str__(self): return str(self.value)

class A(ValueKeeper):
    def __pos__(self):
        print('called A.__pos__')
        return B(self.value - 3)

class B(ValueKeeper):
    def __pos__(self):
        print('called B.__pos__')
        return A(self.value + 19)

x = A(430)
print(x, type(x))
print(+x, type(+x))
print(++x, type(++x))
print(+++x, type(+++x))

Answer 3

Python does not have these operators, but if you really need them you can write a function having the same functionality.

def PreIncrement(name, local={}):
    #Equivalent to ++name
    if name in local:
        local[name]+=1
        return local[name]
    globals()[name]+=1
    return globals()[name]

def PostIncrement(name, local={}):
    #Equivalent to name++
    if name in local:
        local[name]+=1
        return local[name]-1
    globals()[name]+=1
    return globals()[name]-1

Usage:

x = 1
y = PreIncrement('x') #y and x are both 2
a = 1
b = PostIncrement('a') #b is 1 and a is 2

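Inside a function, PreIncrement('x') changes the global x. Passing locals() as the second argument does not reliably change the local variable instead: in CPython, writes to the dictionary returned by locals() are generally not copied back into the function's variables, so only the returned value reflects the increment.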

x = 1
def test():
    x = 10
    y = PreIncrement('x') #y will be 2, local x will be still 10 and global x will be changed to 2
    z = PreIncrement('x', locals()) #z will be 11, but the real local x generally stays 10 and global x is unaltered
test()

Also with these functions you can do:

x = 1
print(PreIncrement('x'))   #print(x+=1) is illegal!

But in my opinion the following approach is much clearer:

x = 1
x+=1
print(x)

Decrement operators:

def PreDecrement(name, local={}):
    #Equivalent to --name
    if name in local:
        local[name]-=1
        return local[name]
    globals()[name]-=1
    return globals()[name]

def PostDecrement(name, local={}):
    #Equivalent to name--
    if name in local:
        local[name]-=1
        return local[name]+1
    globals()[name]-=1
    return globals()[name]+1

I used these functions in my module that translates JavaScript to Python.


Answer 4

In Python, a distinction between expressions and statements is rigidly enforced, in contrast to languages such as Common Lisp, Scheme, or Ruby.

Wikipedia

So by introducing such operators, you would break the expression/statement split.

For the same reason you can’t write

if x = 0:
  y = 1

as you can in some other languages where such distinction is not preserved.
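Python 3.8 later added assignment expressions (:=, PEP 572). These let you rebind a name inside an expression, while plain = remains a statement, so if x = 0: is still a syntax error. Here is a sketch of how := can loosely stand in for C's ++x and x++:

```python
# Requires Python 3.8+ (PEP 572 assignment expressions).
x = 1

# Loose stand-in for C's ++x: rebind x; the expression's value is the new x.
print(x := x + 1)       # 2

# Loose stand-in for C's x++: the expression's value is the old x.
y = (x := x + 1) - 1
print(x, y)             # 3 2

# ":=" is the expression form; "=" can still only be used as a statement.
if (n := len("abc")) > 2:
    print(n)            # 3
```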


Answer 5

TL;DR

Python does not have unary increment/decrement operators (--/++). Instead, to increment a value, use

a += 1

More detail and gotchas

But be careful here: if you’re coming from C, even this works differently in Python. Python doesn’t have “variables” in the sense that C does; instead, Python uses names and objects, and in Python ints are immutable.

So let’s say you do

a = 1

What this means in python is: create an object of type int having value 1 and bind the name a to it. The object is an instance of int having value 1, and the name a refers to it. The name a and the object to which it refers are distinct.

Now let’s say you do

a += 1

Since ints are immutable, what happens here is as follows:

  1. look up the object that a refers to (it is an int with id 0x559239eeb380)
  2. look up the value of object 0x559239eeb380 (it is 1)
  3. add 1 to that value (1 + 1 = 2)
  4. create a new int object with value 2 (it has object id 0x559239eeb3a0)
  5. rebind the name a to this new object
  6. Now a refers to object 0x559239eeb3a0 and the original object (0x559239eeb380) is no longer referred to by the name a. If there aren’t any other names referring to the original object, it will be garbage collected later (CPython keeps small ints such as 1 cached, so in this particular case the object stays alive).

Give it a try yourself:

a = 1
print(hex(id(a)))
a += 1
print(hex(id(a)))
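A short sketch of what rebinding means in practice (the extra names `b`, `xs` and `ys` are just for illustration): a second name bound to the original int keeps seeing the old value, whereas `+=` on a mutable list changes the object in place, so every name bound to it sees the change.

```python
a = 1
b = a                 # b is a second name for the same int object
print(a is b)         # True: both names refer to one object

a += 1                # creates/looks up the int 2 and rebinds only the name `a`
print(a, b)           # 2 1: `b` still refers to the original object
print(a is b)         # False

# Contrast with a mutable object: `+=` on a list extends it in place.
xs = [1]
ys = xs
xs += [2]
print(ys)             # [1, 2]
print(xs is ys)       # True
```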

Answer 6

Yeah, I missed ++ and -- functionality as well. A few million lines of C code ingrained that kind of thinking in my old head, and rather than fight it… here’s a class I cobbled together that implements:

pre- and post-increment, pre- and post-decrement, addition,
subtraction, multiplication, division, results assignable
as integer, printable, settable.

Here ’tis:

class counter(object):
    def __init__(self,v=0):
        self.set(v)

    def preinc(self):
        self.v += 1
        return self.v
    def predec(self):
        self.v -= 1
        return self.v

    def postinc(self):
        self.v += 1
        return self.v - 1
    def postdec(self):
        self.v -= 1
        return self.v + 1

    def __add__(self,addend):
        return self.v + addend
    def __sub__(self,subtrahend):
        return self.v - subtrahend
    def __mul__(self,multiplier):
        return self.v * multiplier
    def __div__(self,divisor):
        return self.v / divisor

    def __getitem__(self):
        return self.v

    def __str__(self):
        return str(self.v)

    def set(self,v):
        if type(v) != int:
            v = 0
        self.v = v

You might use it like this:

c = counter()                          # defaults to zero
for listItem in myList:                # imaginary task
    doSomething(c.postinc(), listItem)  # passes the current value of c, then c becomes c+1

…already having c, you could do this…

c.set(11)
while c.predec() > 0:
    print c

…or just…

d = counter(11)
while d.predec() > 0:
    print d

…and for (re-)assignment into integer…

c = counter(100)
d = c + 223 # assignment as integer
c = c + 223 # re-assignment as integer
print type(c),c # <type 'int'> 323

…while this will maintain c as type counter:

c = counter(100)
c.set(c + 223)
print type(c),c # <class '__main__.counter'> 323

EDIT:

And then there’s this bit of unexpected (and thoroughly unwanted) behavior:

c = counter(42)
s = '%s: %d' % ('Expecting 42',c) # but getting non-numeric exception
print s

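…because inside that tuple, __getitem__() isn’t what’s used (it is the hook for indexing like c[key], not for numeric conversion); instead a reference to the object itself is passed to the formatting operator, and %d can’t treat it as a number. Sigh. So: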

c = counter(42)
s = '%s: %d' % ('Expecting 42',c.v) # and getting 42.
print s

…or, more verbosely and explicitly, what we actually wanted to happen, although the verbosity argues against writing it this way in practice (use c.v instead)…

c = counter(42)
s = '%s: %d' % ('Expecting 42',c.__getitem__()) # and getting 42.
print s
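The `%d` failure in the EDIT happens because nothing in the class tells Python how to turn a counter into a number. Below is a hedged sketch of one fix, assuming Python 3 rather than the Python 2 used above; the class name `Counter` and the trimmed method set are my own. It defines `__int__` (used by `int()` and by `%d` formatting) and `__index__` (used wherever an exact integer is required, such as `range()`).

```python
class Counter:
    """Hypothetical Python 3 variant of the counter class above,
    trimmed to the parts needed to show numeric conversion."""

    def __init__(self, v=0):
        self.v = v

    def postinc(self):
        # Emulates v++: increment, then return the old value.
        self.v += 1
        return self.v - 1

    def __int__(self):
        # Used by int(c) and by '%d' % c.
        return self.v

    def __index__(self):
        # Used where an exact integer is required, e.g. range(c) or hex(c).
        return self.v

    def __str__(self):
        return str(self.v)


c = Counter(42)
print('%s: %d' % ('Expecting 42', c))   # Expecting 42: 42
print(int(c) + 1)                        # 43
print(list(range(Counter(3))))           # [0, 1, 2]
```

With these hooks, the formatting line from the EDIT works without reaching for `c.v`.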

Answer 7

Python has no pre/post increment/decrement operators like those in C.

Instead, ++ and -- are parsed as repeated unary signs, which combine the way signs multiply in maths: (-1) * (-1) = (+1).

E.g.

---count

Parses as

-(-(-count))

Which translates to

-(+count)

because multiplying a - sign by a - sign gives +

And finally,

-count
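The parse described above can be checked with the standard `ast` module. A minimal sketch; `unary_minus_depth` is a hypothetical helper that walks down nested unary-minus nodes.

```python
import ast


def unary_minus_depth(source: str) -> int:
    """Count how many unary minus operators are nested at the top of `source`."""
    node = ast.parse(source, mode="eval").body
    depth = 0
    while isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        depth += 1
        node = node.operand
    return depth


print(unary_minus_depth("---count"))   # 3: parsed as -(-(-count))

count = 5
print(---count)   # -5: an odd number of minus signs flips the sign
print(--count)    # 5: an even number cancels out
print(++count)    # 5: unary plus does nothing to an int, so this is not an increment
print(count)      # 5: count itself never changed
```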

Answer 8

In Python 3.8+ you can do:

(a:=a+1)  # like ++a: evaluates to the new value

You can do a lot of things with this.

>>> a = 0
>>> while (a:=a+1) < 5:
...     print(a)
...
1
2
3
4

Or if you want to write something with more sophisticated syntax (the goal is not optimization):

>>> del a
>>> while (a := (a if 'a' in locals() else 0) + 1) < 5:
...     print(a)
...
1
2
3
4

If a doesn’t exist yet, the expression falls back to 0 instead of raising an error, and then sets a to 1.
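Because `:=` evaluates to the value it just assigned, it behaves like C’s pre-increment. A small sketch (the helper name `pre_and_post_increment` is hypothetical; requires Python 3.8+) showing that, and how subtracting 1 from the result recovers the old value that a post-increment would yield.

```python
def pre_and_post_increment(a):
    """Emulate ++a and then a++ on `a` using assignment expressions."""
    # ++a: the assignment expression evaluates to the NEW value.
    pre = (a := a + 1)
    # a++: a is incremented again; subtracting 1 gives back the OLD value.
    post = (a := a + 1) - 1
    return pre, post, a


print(pre_and_post_increment(0))   # (1, 1, 2)
```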