分类目录归档:知识问答

安装Graphviz 2.38后,“ RuntimeError:确保Graphviz可执行文件在系统路径上”

问题:安装Graphviz 2.38后,“ RuntimeError:确保Graphviz可执行文件在系统路径上”

我下载了Graphviz 2.38MSI版本并安装在文件夹下C:\Python34,然后运行pip install Graphviz,一切顺利。在系统的路径中,我添加了C:\Python34\bin。在尝试在线运行测试脚本时filename=dot.render(filename='test'),我收到一条消息

 RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

我试图"C:\Python34\bin\dot.exe"输入系统的路径,但是它没有用,甚至创建了一个"GRAPHVIZ_DOT"带有value 的新环境变量"C:\Python34\bin\dot.exe",但仍然没有用。我尝试卸载Graphviz和pip uninstall graphviz,然后重新安装并再次进行pip安装,但是没有任何效果。

整个回溯消息是:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 220, in render
    proc = subprocess.Popen(cmd, startupinfo=STARTUPINFO)
  File "C:\Python34\lib\subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "C:\Python34\lib\subprocess.py", line 1112, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Documents\Kissmetrics\curves and lines\eventNodes.py", line 56, in <module>
    filename=dot.render(filename='test')
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 225, in render
    'are on your systems\' path' % cmd)
RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

有人有经验吗?

I downloaded Graphviz 2.38 MSI version and installed under folder C:\Python34, then I run pip install Graphviz, everything went well. In system’s path I added C:\Python34\bin. When I tried to run a test script, in line filename=dot.render(filename='test'), I got a message

 RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

I tried to put "C:\Python34\bin\dot.exe" in system’s path, but it didn’t work, and I even created a new environment variable "GRAPHVIZ_DOT" with value "C:\Python34\bin\dot.exe", still not working. I tried to uninstall Graphviz and pip uninstall graphviz, then reinstall it and pip install again, but nothing works.

The whole traceback message is:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 220, in render
    proc = subprocess.Popen(cmd, startupinfo=STARTUPINFO)
  File "C:\Python34\lib\subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "C:\Python34\lib\subprocess.py", line 1112, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Documents\Kissmetrics\curves and lines\eventNodes.py", line 56, in <module>
    filename=dot.render(filename='test')
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 225, in render
    'are on your systems\' path' % cmd)
RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

Does anybody have any experience with it?


回答 0

import os
os.environ["PATH"] += os.pathsep + 'D:/Program Files (x86)/Graphviz2.38/bin/'

在Windows中,只需在开头添加这两行,其中“ D:/ Program Files(x86)/Graphviz2.38/bin/”将替换为bin文件所在的地址。

那解决了问题。

import os
os.environ["PATH"] += os.pathsep + 'D:/Program Files (x86)/Graphviz2.38/bin/'

In windows just add these 2 lines in the beginning, where ‘D:/Program Files (x86)/Graphviz2.38/bin/’ is replaced by the address of where your bin file is.

That solves the problem.


回答 1

您应该在系统中安装graphviz软件包(而不仅仅是python软件包)。在Ubuntu上,您应该尝试:

sudo apt-get install graphviz

You should install the graphviz package in your system (not just the python package). On Ubuntu you should try:

sudo apt-get install graphviz

回答 2

这是我在MAC上为我解决的问题

  brew install graphviz

This one solved the problem for me on MAC:

  brew install graphviz

回答 3

对于Windows:

  1. 从以下位置安装Windows软件包:https : //graphviz.gitlab.io/_pages/Download/Download_windows.html
  2. 安装python graphviz
  3. 添加C:\Program Files (x86)\Graphviz2.38\bin到用户路径
  4. 添加C:\Program Files (x86)\Graphviz2.38\bin\dot.exe到系统路径

这对我有用!

For Windows:

  1. Install windows package from: https://graphviz.gitlab.io/_pages/Download/Download_windows.html
  2. Install python graphviz package
  3. Add C:\Program Files (x86)\Graphviz2.38\bin to User path
  4. Add C:\Program Files (x86)\Graphviz2.38\bin\dot.exe to System Path

This worked for me!


回答 4

尝试使用:

conda install python-graphviz

如果使用,graphviz可执行文件与conda目录位于不同的路径pip install graphviz

Try using:

conda install python-graphviz

The graphviz executable sit on a different path from your conda directory, if you use pip install graphviz.


回答 5

OSX Sierra,Python 2.7,Graphviz 2.38

使用pip install graphvizconda install graphviz解决了该问题。

pip只得到与您相同的路径问题,并且conda只得到导入错误。

OSX Sierra, Python 2.7, Graphviz 2.38

Using pip install graphviz and conda install graphviz BOTH resolves the problem.

pip only gets path problem same as yours and conda only gets import error.


回答 6

只需将以下内容添加到 Windows上的环境变量(系统)路径

C:\ Program档案(x86)\ Graphviz2.38 \ bin

在那里,您可以找到 .exe文件

如果不行

在您的程序文件中而不是python lib中找到Graphviz2.38 / bin文件夹

然后,添加到您的PATH

找到存在.exe文件的文件夹很重要

Just add below to your Environmental Variable(system) PATH on Windows

C:\Program Files (x86)\Graphviz2.38\bin

there, you can find .exe files

If not work

Find Graphviz2.38/bin folder in your Program Files not in python lib

Then, add to your PATH

It’s important to find a folder where .exe files exist


回答 7

步骤1:安装Graphviz二进制文件

视窗:

  1. http://www.graphviz.org/download/下载Graphviz
  2. 在下面添加到PATH环境变量中(提及已安装的graphviz版本):
    • C:\ Program档案(x86)\ Graphviz2.38 \ bin
    • C:\ Program Files(x86)\ Graphviz2.38 \ bin \ dot.exe
  3. 关闭所有打开的Juypter笔记本和命令提示符
  4. 重新启动Jupyter / cmd提示并测试

Linux:

  1. sudo apt-get更新
  2. 须藤apt-get install graphviz
  3. 或从http://www.graphviz.org/download/手动构建

步骤2:为Python安装graphviz模块

点:

  • 点安装graphviz

康达:

  • 康达安装graphviz

Step 1: Install Graphviz binary

Windows:

  1. Download Graphviz from http://www.graphviz.org/download/
  2. Add below to PATH environment variable (mention the installed graphviz version):
    • C:\Program Files (x86)\Graphviz2.38\bin
    • C:\Program Files (x86)\Graphviz2.38\bin\dot.exe
  3. Close any opened Juypter notebook and the command prompt
  4. Restart Jupyter / cmd prompt and test

Linux:

  1. sudo apt-get update
  2. sudo apt-get install graphviz
  3. or build it manually from http://www.graphviz.org/download/

Step 2: Install graphviz module for python

pip:

  • pip install graphviz

conda:

  • conda install graphviz

回答 8

尝试conda install graphviz。我有同样的问题,我通过MacOS中提到的命令解决了。

Try conda install graphviz. I had the same problem, I resolved it by mentioned command in MacOS.


回答 9

对我来说,在Windows10上使用conda install graphvizconda install python-graphviz安装GraphViz所需的路径是C:/ ProgramData / Anaconda3 / Library / bin / graphviz /。即添加

import os
os.environ["PATH"] += os.pathsep + 'C:/ProgramData/Anaconda3/Library/bin/graphviz/'

为我解决了这个问题。

Using conda install graphviz and conda install python-graphviz to install GraphViz on Windows10 the path needed was C:/ProgramData/Anaconda3/Library/bin/graphviz/ for me. I.e. adding

import os
os.environ["PATH"] += os.pathsep + 'C:/ProgramData/Anaconda3/Library/bin/graphviz/'

solved the issue for me.


回答 10

conda install python-graphviz

对于Windows,请安装Python Graphviz,它将在路径中包含可执行文件。

conda install python-graphviz

For Windows, install the Python Graphviz which will include the executables in the path.


回答 11

在Ubuntu Linux上,这为我解决了问题:

pip install graphviz
sudo apt-get install graphviz

conda install -c conda-forge graphviz如果使用Anaconda,也可以尝试代替pip。

On Ubuntu Linux this solved it for me:

pip install graphviz
sudo apt-get install graphviz

You could also try conda install -c conda-forge graphviz instead of pip if using Anaconda.


回答 12

在为自己解决此问题时,我使用了此GitHub教程,该教程分析了导致此问题的原因。如果我们在两行之间阅读,它说它需要系统以及python graph viz。除了conda install,我们还需要运行:

conda install -c conda-forge python-graphviz

然后重启内核;它就像一个魅力。

When solving this issue for myself, I used this GitHub tutorial, which analysed the cause of this issue. If we read in between the lines, it says it needs system as well as python graph viz. In addition to conda install, we would need to run:

conda install -c conda-forge python-graphviz

Then restart the kernel; it works like a charm.


回答 13

1)Graphviz –在系统的特定位置下载解压缩文件(pip在Windows中不起作用),并在每个程序中手动设置的路径中包含bin文件夹(“在Windows中设置环境变量”或)

import os
os.environ["PATH"] += os.pathsep + 'C:/GraphViz/bin'

2)然后将模型绘制

clf = xgb.train(params, d_train, 1000, evals=evallist, early_stopping_rounds=10)
xgb.plot_tree(clf)
plt.rcParams['figure.figsize'] = [50, 10]
plt.show()

1) Graphviz – download unzip in a particular place in the system (pip does not work in windows ) and include the bin folder in the path (‘set environment variables in windows’ OR) set manually in each program

import os
os.environ["PATH"] += os.pathsep + 'C:/GraphViz/bin'

2) Then put the model to plot

clf = xgb.train(params, d_train, 1000, evals=evallist, early_stopping_rounds=10)
xgb.plot_tree(clf)
plt.rcParams['figure.figsize'] = [50, 10]
plt.show()

回答 14

安装软件包后(如果尚未安装,请链接),将dot.exe的路径添加为新的系统变量。

默认路径是:

C:\ Program Files(x86)\ Graphviz2.38 \ bin \ dot.exe

After you’ve installed the package (link if you haven’t), add the path to dot.exe as a new system variable.

Default path is:

C:\Program Files (x86)\Graphviz2.38\bin\dot.exe


回答 15

我在使用Jupyter的Linux上遇到了相同的问题。

为了解决这个问题,我已经将点库添加到python sys.path中

首先:检查是否dot已安装,

然后:
找到他的路径whereis dot-> / local / notebook / miniconda2 / envs / ik2 / bin / dot

最后在python脚本中:sys.path.append(“ / local / notebook / miniconda2 / envs / ik2 / bin / dot”)

I had the same issue on Linux with Jupyter.

To solve it I’ve added the dot library to python sys.path

First: check if dot is installed,

Then:
find his path whereis dot -> /local/notebook/miniconda2/envs/ik2/bin/dot

Finally in python script : sys.path.append(“/local/notebook/miniconda2/envs/ik2/bin/dot”)


回答 16

首先,您应该使用pip install,然后在http://www.graphviz.org/Download_windows.php中下载另一个软件包 ,并将安装位置添加到环境路径中,然后它可以工作。

First, you should use pip install, and then download another package in http://www.graphviz.org/Download_windows.php and add the install location into the environmental path, then it works.


回答 17

在Mac OS(El Capitan)上,我使用PyCharm IDE遇到了相同的错误消息。我已按照RZK的答案中的建议使用brew安装了Graphviz,并使用PyCharm 安装了graphviz python软件包(我可以通过dot -V在终端中尝试并获取以下内容来检查Graphviz是否已正确安装:dot - graphviz version 2.40.1 (20161225.0304))。但是,当尝试从PyCharm调用Graphviz时,我仍然收到错误消息。

我必须按照此问题的答案中的建议,在PyCharm选项中添加路径/ usr / local / bin 。

I had the same error message on Mac OS (El Capitan), using the PyCharm IDE. I had installed Graphviz using brew, as recommended in RZK’s answer, and installed the graphviz python package using PyCharm (I could check Graphviz was installed correctly by trying dot -V in a terminal and getting: dot - graphviz version 2.40.1 (20161225.0304)). Yet I was still getting the error message when trying to call Graphviz from PyCharm.

I had to add the path /usr/local/bin in PyCharm options, as recommended in the answer to this question to resolve the problem.


回答 18

这显示了一些路径问题:

pip install graphviz

所以这对我有用:

sudo apt-get install graphviz

This is showing some path issue:

pip install graphviz

So this worked for me:

sudo apt-get install graphviz

回答 19

我在macOS Catalina 10.15.3上,并且遇到了类似的错误: ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH

使用以下方法修复了该问题:

pip3 install graphvizbrew install graphviz

请注意,pip3 install只会返回成功消息,Successfully installed graphviz-0.13.2因此我们仍然需要运行brew install以获得graphviz 2.42.3(截至2020年3月10日,下午6点)。

I’m on macOS Catalina 10.15.3, and I had a similar error: ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH

Fixed it with:

pip3 install graphviz AND brew install graphviz

Note the pip3 install will only return the success message Successfully installed graphviz-0.13.2 so we still need to run brew install to get graphviz 2.42.3 (as of 10 Mar 2020, 6PM).


回答 20

对于没有root权限并因此无法sudo按照其他答案中的建议使用命令的Linux用户…

首先,通过以下方法激活您的conda虚拟环境(如果要使用):

source activate virtual-env-name

然后安装graphviz,即使您已经使用pip完成了它:

conda install graphviz

然后复制以下命令的结果:

whereis dot

就我而言,其输出为:

/home/nader/anaconda2/bin/dot

并将其添加到您的PATH变量中。只需运行以下命令

nano ~/.bashrc

并将这些行添加到打开的文件的末尾:

PATH="/home/username/anaconda2/bin/dot:$PATH"
export PATH

现在按Ctrl+ O,然后Ctrl+ X保存并退出。

现在应该解决问题。

Pycharm用户,请注意:Pycharm并不总是看到与您的终端相同的PATH变量。该解决方案不适用于Pycharm以及其他IDE。但是您可以通过添加以下代码行来解决此问题:

os.environ["PATH"] += os.pathsep + '/home/nader/anaconda2/bin'

到您的python程序。不要忘记

import os

首先:)

编辑:如果您不想使用conda,您仍然可以没有任何root权限的情况下从此处安装graphviz ,并将bin文件夹添加到PATH变量中。我没有测试。

For Linux users who don’t have root access and hence can’t use sudo command as suggested in other answers…

First, activate your conda virtual-environment (if you want to use one) by:

source activate virtual-env-name

Then install graphviz, even if you have already done it using pip:

conda install graphviz

then copy the result of the following command:

whereis dot

In my case, its output is:

/home/nader/anaconda2/bin/dot

and add it to your PATH variable. Just run the command below

nano ~/.bashrc

and add these lines to the end of the opened file:

PATH="/home/username/anaconda2/bin/dot:$PATH"
export PATH

now press Ctrl+O and then Ctrl+X to save and exit.

Problem should be solved by now.

Pycharm users, please note: Pycharm does not always see the PATH variable the same as your terminal. This solution does not work for Pycharm, and maybe other IDEs. But you can fix this by adding this line of code:

os.environ["PATH"] += os.pathsep + '/home/nader/anaconda2/bin'

to your python program. Do not forget to

import os

first :)

Edit: If you don’t want to use conda, you can still install graphviz from here without any root permissions and add the bin folder to your PATH variable. I didn’t test this.


回答 21

1.从以下位置安装Windows软件包:https ://graphviz.gitlab.io/_pages/Download/Download_windows.html 并下载msi文件

添加环境变量2.将C:\ Program Files(x86)\ Graphviz2.38 \ bin添加到用户路径

  1. 将C:\ Program Files(x86)\ Graphviz2.38 \ bin \ dot.exe添加到系统路径

  2. 重新启动您的python笔记本。

会的。

1.install windows package from: https://graphviz.gitlab.io/_pages/Download/Download_windows.html and download msi file

Add in Environmental variables 2. Add C:\Program Files (x86)\Graphviz2.38\bin to User path

  1. Add C:\Program Files (x86)\Graphviz2.38\bin\dot.exe to System Path

  2. Restart your python notebook.

It will work.


回答 22

graphviz添加到系统路径

  1. Windows-编辑系统环境变量。
  2. 选择环境变量。
  3. 选择路径-新建
  4. 添加graphviz的路径

例如:C:\ Users \ AppData \ Local \ Continuum \ anaconda3 \ Library \ bin \ graphviz

Add graphviz to the System Path

  1. Windows – Edit the System Environment Variables.
  2. Choose Environment Variables.
  3. Select Path – New
  4. Add the Path of graphviz

Ex: C:\Users\AppData\Local\Continuum\anaconda3\Library\bin\graphviz


回答 23

OS Mojave 10.14。,Python 3.6

使用pip install graphviz在终端上有很好的反馈,但是当我尝试在Jupyter笔记本中绘制图形时导致使用此错误。然后brew install graphviz,我运行,这在终端中给出了错误。然后我跑conda install graphviz了,图开始工作了。

来自@Leighton的评论:pip仅会遇到与您相同的路径问题,而conda仅会导致导入错误。

OS Mojave 10.14., Python 3.6

Using pip install graphviz had good feedback in terminal, but lead to this error when I tried to make a graph in a Jupyter notebook. I then ran brew install graphviz, which gave an error in terminal. Then I ran conda install graphviz and the graph worked.

From @Leighton’s comment: pip only gets path problem same as yours and conda only gets import error.


回答 24

import os
os.environ["PATH"] += os.pathsep + "/Macintosh HD⁩/anaconda3⁩/lib⁩/⁨python3.7⁩/site-packages⁩/sphinx⁩/templates⁩/graphviz"

这为我解决了MAC上的PATH问题!

import os
os.environ["PATH"] += os.pathsep + "/Macintosh HD⁩/anaconda3⁩/lib⁩/⁨python3.7⁩/site-packages⁩/sphinx⁩/templates⁩/graphviz"

This solved the PATH issue on MAC for me!


回答 25

如果您不是使用Conda而是使用Vanilla Python,则“ brew install graphviz”有效。

If you are not using Conda but vanilla Python, ‘brew install graphviz’ works.


回答 26

#Write this on anaconda prompt in admin mode
conda install -c anaconda graphviz
conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz

#check dot -v in window's cmd prompt
C:\WINDOWS\system32>dot -V
dot - graphviz version 2.38.0 (20140413.2041)
(this means graphviz installed successfully)

#Add path to sys and user eve variables
PATH
C:\Anaconda3\pkgs\graphviz-2.38-hfd603c8_2\Library\bin
(search bin folder of graphviz and then copy n paste path in env variables)

#Re-run all cmds in jyupter notebook
#if error occurs (less chances)
#then 
#Restart anaconda and again run all cmds in jyupter notebook
eg.
import graphviz as gp
with open("tree.dot") as f:
    dot_read=f.read()
display(gp.Source(dot_read))
#Write this on anaconda prompt in admin mode
conda install -c anaconda graphviz
conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz

#check dot -v in window's cmd prompt
C:\WINDOWS\system32>dot -V
dot - graphviz version 2.38.0 (20140413.2041)
(this means graphviz installed successfully)

#Add path to sys and user eve variables
PATH
C:\Anaconda3\pkgs\graphviz-2.38-hfd603c8_2\Library\bin
(search bin folder of graphviz and then copy n paste path in env variables)

#Re-run all cmds in jyupter notebook
#if error occurs (less chances)
#then 
#Restart anaconda and again run all cmds in jyupter notebook
eg.
import graphviz as gp
with open("tree.dot") as f:
    dot_read=f.read()
display(gp.Source(dot_read))

回答 27

尝试在anaconda提示中一一键入以下代码。

这对我有用。

资料来源:https : //anaconda.org/conda-forge/python-graphviz

conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz 

try typing the following code in anaconda prompt one by one.

this worked for me.

Source: https://anaconda.org/conda-forge/python-graphviz

conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz 

回答 28

尝试在python import sys中做到这一点!conda install –yes –prefix {sys.prefix} graphviz import graphviz

trying doing this in python import sys !conda install –yes –prefix {sys.prefix} graphviz import graphviz


一个Flask进程接收多少个并发请求?

问题:一个Flask进程接收多少个并发请求?

我正在用Flask构建一个应用程序,但是我对WSGI并不太了解,它是基于HTTP的Werkzeug。当我开始使用gunicorn和4个工作进程处理Flask应用程序时,这是否意味着我可以处理4个并发请求?

我的意思是并发请求,而不是每秒的请求或其他任何请求。

I’m building an app with Flask, but I don’t know much about WSGI and it’s HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?

I do mean concurrent requests, and not requests per second or anything else.


回答 0

在运行开发服务器时(这是通过运行获得的)app.run(),您将获得一个同步过程,这意味着一次最多处理一个请求。

通过将Gunicorn保留在其默认配置中并简单地增加Gunicorn的数量--workers,您所获得的实际上是一些流程(由Gunicorn管理),每个流程的行为都类似于app.run()开发服务器。4个工作人员== 4个并发请求。这是因为Gunicorn sync默认使用其包含的工作程序类型。

重要的是要注意,Gunicorn还包含异步工作程序,即eventletgevent(以及tornado,但似乎最好在Tornado框架中使用)。通过使用--worker-class标志指定这些异步工作程序之一,您将获得Gunicorn管理多个异步进程的信息,每个进程管理自己的并发性。这些进程不使用线程,而是协程。基本上,在每个进程中,一次只能发生1件事(1个线程),但是当对象等待外部进程完成(例如数据库查询或等待网络I / O)时,它们可以被“暂停”。

这意味着,如果您使用的是Gunicorn的异步工作程序之一,则每个工作程序一次最多可以处理多个请求。多少工人才是最好的,取决于您的应用程序的性质,其环境,运行的硬件等。有关更多详细信息,请参见Gunicorn的设计页面,并在其介绍页面上介绍gevent的工作方式

When running the development server – which is what you get by running app.run(), you get a single synchronous process, which means at most 1 request is being processed at a time.

By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.

It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that’s best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which managing its own concurrency. These processes don’t use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but objects can be ‘paused’ when they are waiting on external processes to finish (think database queries or waiting on network I/O).

This means, if you’re using one of Gunicorn’s async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn’s design page and notes on how gevent works on its intro page.


回答 1

当前,存在比已提供的解决方案简单得多的解决方案。运行应用程序时,只需将threaded=True参数传递给app.run()调用,例如:

app.run(host="your.host", port=4321, threaded=True)

根据在werkzeug文档中可以看到的另一种选择是使用processes参数,该参数接收的数字> 1表示要处理的最大并发进程数:

  • 线程化–进程应在单独的线程中处理每个请求吗?
  • 进程–如果大于1,则将处理新进程中的每个请求,直到最大并发进程数。

就像是:

app.run(host="your.host", port=4321, processes=3) #up to 3 processes

关于更多信息run()方法在这里,和博客文章,导致我找到解决方案和API引用。


注意:在Flask文档中,关于run()方法的方法表明不鼓励在生产环境中使用它,因为(quote):“虽然Flask轻巧易用,但其内置服务器不适合生产,因为它的扩展性不好”。

但是,他们确实指向其“ 部署选项”页面,以了解在投入生产时执行此操作的推荐方法。

Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:

app.run(host="your.host", port=4321, threaded=True)

Another option as per what we can see in the werkzeug docs, is to use the processes parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle:

  • threaded – should the process handle each request in a separate thread?
  • processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.

Something like:

app.run(host="your.host", port=4321, processes=3) #up to 3 processes

More info on the run() method here, and the blog post that led me to find the solution and api references.


Note: on the Flask docs on the run() methods it’s indicated that using it in a Production Environment is discouraged because (quote): “While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well.”

However, they do point to their Deployment Options page for the recommended ways to do this when going for production.


回答 2

Flask将同时为每个线程处理一个请求。如果您有2个进程,每个进程有4个线程,则是8个并发请求。

Flask不会产生或管理线程或进程。这就是WSGI网关(例如gunicorn)的责任。

Flask will process one request per thread at the same time. If you have 2 processes with 4 threads each, that’s 8 concurrent requests.

Flask doesn’t spawn or manage threads or processes. That’s the responsability of the WSGI gateway (eg. gunicorn).


回答 3

不,您绝对可以处理更多。

重要的是要记住,假设您正在运行一台单核计算机,那么CPU实际上一次只能运行一条指令*。

也就是说,CPU只能执行非常有限的一组指令,并且每个时钟周期不能执行多个指令(许多指令甚至需要1个周期以上)。

因此,我们在计算机科学中谈论的大多数并发是软件并发。换句话说,有一些软件实现层从我们这里提取底层CPU,并使我们认为我们正在同时运行代码。

这些“事物”可以是进程,它们是代码单元,它们在每个进程都认为其在自己的世界中使用自己的非共享内存在运行时可以并发运行。

另一个示例是线程,线程是进程内部的代码单元,也允许并发。

您的4个辅助进程能够处理4个以上请求的原因是,它们将触发线程以处理越来越多的请求。

实际的请求限制取决于所选择的HTTP服务器,I / O,操作系统,硬件,网络连接等。

祝好运!

*说明是CPU可以运行的最基本的命令。示例-加两个数字,从一条指令跳转到另一条指令

No- you can definitely handle more than that.

Its important to remember that deep deep down, assuming you are running a single core machine, the CPU really only runs one instruction* at a time.

Namely, the CPU can only execute a very limited set of instructions, and it can’t execute more than one instruction per clock tick (many instructions even take more than 1 tick).

Therefore, most concurrency we talk about in computer science is software concurrency. In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.

These “things” can be processes, which are units of code that get run concurrently in the sense that each process thinks its running in its own world with its own, non-shared memory.

Another example is threads, which are units of code inside processes that allow concurrency as well.

The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.

The actual request limit depends on HTTP server chosen, I/O, OS, hardware, network connection etc.

Good luck!

*instructions are the very basic commands the CPU can run. examples – add two numbers, jump from one instruction to another


Django-makemigrations-未检测到更改

问题:Django-makemigrations-未检测到更改

我试图使用makemigrations命令在现有应用程序中创建迁移,但输出“未检测到更改”。

通常,我使用startapp命令创建新应用,但在创建该应用时并未将其用于该应用。

调试后,我发现它没有创建迁移,因为migrations应用程序中缺少软件包/文件夹。

如果不存在该文件夹,还是创建丢失的文件夹,会更好吗?

I was trying to create migrations within an existing app using the makemigrations command but it outputs “No changes detected”.

Usually I create new apps using the startapp command but did not use it for this app when I created it.

After debugging, I found that it is not creating migration because the migrations package/folder is missing from an app.

Would it be better if it creates the folder if it is not there or am I missing something?


回答 0

要为应用创建初始迁移,请运行makemigrations并指定应用名称。将创建迁移文件夹。

./manage.py makemigrations <myapp>

您的应用必须首先包含INSTALLED_APPS(在settings.py内部)。

To create initial migrations for an app, run makemigrations and specify the app name. The migrations folder will be created.

./manage.py makemigrations <myapp>

Your app must be included in INSTALLED_APPS first (inside settings.py).


回答 1

我的问题(以及解决方案)与上述问题有所不同。

我没有使用models.py文件,而是创建了一个models目录并在my_model.py其中创建了文件,并在其中放置了模型。Django找不到我的模型,因此它写道没有要应用的迁移。

我的解决方案是:在my_app/models/__init__.py文件中添加以下行: from .my_model import MyModel

My problem (and so solution) was yet different from those described above.

I wasn’t using models.py file, but created a models directory and created the my_model.py file there, where I put my model. Django couldn’t find my model so it wrote that there are no migrations to apply.

My solution was: in the my_app/models/__init__.py file I added this line: from .my_model import MyModel


回答 2

django在makemigrations命令执行期间未检测到要迁移的内容有多种可能的原因。

  1. 迁移文件夹您的应用程序中需要一个迁移软件包。
  2. INSTALLED_APPS您需要在INSTALLED_APPS.dict中指定您的应用
  3. 详尽度 从运行makemigrations -v 3冗长开始。这可能会为这个问题提供一些启示。
  4. 完整路径INSTALLED_APPS建议指定完整的模块应用程序配置路径“apply.apps.MyAppConfig”
  5. –settings,您可能需要确保设置了正确的设置文件:manage.py makemigrations --settings mysite.settings
  6. 指定应用程序名称可以显式地放入应用程序名称manage.py makemigrations myapp-不仅可以缩小应用程序的迁移范围,而且可以帮助您隔离问题。
  7. 模型元检查你有合适的app_label模型中的元

  8. 调试django调试django核心脚本。makemigrations命令非常简单。这是在pycharm中执行的方法。相应地改变你的脚本定义(例如:makemigrations --traceback myapp

多个数据库:

  • DB路由器时使用Django DB路由器,路由器类(您的自定义类路由器)工作需要实现的allow_syncdb方法。

makemigrations总是为模型更改创建迁移,但是如果allow_migrate()返回False,

There are multiple possible reasons for django not detecting what to migrate during the makemigrations command.

  1. migration folder You need a migrations package in your app.
  2. INSTALLED_APPS You need your app to be specified in the INSTALLED_APPS .dict
  3. Verbosity start by running makemigrations -v 3 for verbosity. This might shed some light on the problem.
  4. Full path In INSTALLED_APPS it is recommended to specify the full module app config path ‘apply.apps.MyAppConfig’
  5. –settings you might want to make sure the correct settings file is set: manage.py makemigrations --settings mysite.settings
  6. specify app name explicitly put the app name in manage.py makemigrations myapp – that narrows down the migrations for the app alone and helps you isolate the problem.
  7. model meta check you have the right app_label in your model meta

  8. Debug django debug django core script. makemigrations command is pretty much straight forward. Here’s how to do it in pycharm. change your script definition accordingly (ex: makemigrations --traceback myapp)

Multiple databases:

  • Db Router when working with django db router, the router class (your custom router class) needs to implement the allow_syncdb method.

makemigrations always creates migrations for model changes, but if allow_migrate() returns False,


回答 3

我已经读过许多关于这个问题的答案,通常说它们只是makemigrations以其他方式运行。但是对我来说,问题出在Meta模型的子类中。

我有一个应用程序配置说label = <app name>(在apps.py文件中models.pyviews.py等等)。如果您的元类碰巧没有与应用程序标签相同的标签(例如,因为您将一个太大的应用程序拆分为多个应用程序),则不会检测到任何更改(也不会显示任何有用的错误消息)。因此,在我的模型课中,我现在有:

class ModelClassName(models.Model):

    class Meta:
        app_label = '<app name>' # <-- this label was wrong before.

    field_name = models.FloatField()
    ...

在此处运行Django 1.10。

I’ve read many answers to this question often stating to simply run makemigrations in some other ways. But to me, the problem was in the Meta subclass of models.

I have an app config that says label = <app name> (in the apps.py file, beside models.py, views.py etc). If by any chance your meta class doesn’t have the same label as the app label (for instance because you are splitting one too big app into multiple ones), no changes are detected (and no helpful error message whatsoever). So in my model class I have now:

class ModelClassName(models.Model):

    class Meta:
        app_label = '<app name>' # <-- this label was wrong before.

    field_name = models.FloatField()
    ...

Running Django 1.10 here.


回答 4

这是一个评论,但可能应该是一个答案。

确保您的应用程序名称位于settings.py中,INSTALLED_APPS否则无论执行什么操作都不会运行迁移。

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'blog',
]

然后运行:

./manage.py makemigrations blog

It is a comment but should probably be an answer.

Make sure that your app name is in settings.py INSTALLED_APPS otherwise no matter what you do it will not run the migrations.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'blog',
]

Then run:

./manage.py makemigrations blog

回答 5

我还有另一个这里未描述的问题,这让我发疯。

class MyModel(models.Model):
    name = models.CharField(max_length=64, null=True)  # works
    language_code = models.CharField(max_length=2, default='en')  # works
    is_dumb = models.BooleanField(default=False),  # doesn't work

在复制和粘贴的一行中,我有一个尾随的“,”。is_dumb所在的行不会使用’./manage.py makemigrations’创建模型迁移,但也不会引发错误。删除“后,”它按预期方式工作。

因此,在复制粘贴时请多加注意:-)

I had another problem not described here, which drove me nuts.

class MyModel(models.Model):
    name = models.CharField(max_length=64, null=True)  # works
    language_code = models.CharField(max_length=2, default='en')  # works
    is_dumb = models.BooleanField(default=False),  # doesn't work

I had a trailing ‘,’ in one line perhaps from copy&paste. The line with is_dumb doesn’t created a model migration with ‘./manage.py makemigrations’ but also didn’t throw an error. After removing the ‘,’ it worked as expected.

So be careful when you do copy&paste :-)


回答 6

有时./manage.py makemigrations有优于./manage.py makemigrations <myapp>因为它可以处理应用程序之间的某些冲突。

这些场合默默发生,花了几个小时swearing才能理解恐惧的真正含义No changes detected消息。

因此,使用以下命令是一个更好的选择:

./manage.py makemigrations <myapp1> <myapp2> ... <myappN>

There are sometimes when ./manage.py makemigrations is superior to ./manage.py makemigrations <myapp> because it can handle certain conflicts between apps.

Those occasions occur silently and it takes several hours of swearing to understand the real meaning of the dreaded No changes detected message.

Therefore, it is a far better choice to make use of the following command:

./manage.py makemigrations <myapp1> <myapp2> ... <myappN>


回答 7

我从django外部复制了一个表,默认将Meta类设置为“ managed = false”。例如:

class Rssemailsubscription(models.Model):
    id = models.CharField(primary_key=True, max_length=36)
    ...
    area = models.FloatField('Area (Sq. KM)', null=True)

    class Meta:
        managed = False
        db_table = 'RSSEmailSubscription'

通过将managed更改为True,makemigrations开始接受更改。

I had copied a table in from outside of django and the Meta class defaulted to “managed = false”. For example:

class Rssemailsubscription(models.Model):
    id = models.CharField(primary_key=True, max_length=36)
    ...
    area = models.FloatField('Area (Sq. KM)', null=True)

    class Meta:
        managed = False
        db_table = 'RSSEmailSubscription'

By changing manged to True, makemigrations started picking up changes.


回答 8

  1. 确保您的应用在settings.py中的installed_apps中被提及
  2. 确保您的模型类扩展了模型。
  1. Make sure your app is mentioned in installed_apps in settings.py
  2. Make sure you model class extends models.Model

回答 9

我这样做来解决了这个问题:

  1. 擦除“ db.sqlite3”文件。这里的问题是您当前的数据库将被删除,因此您将不得不重新制作它。
  2. 在已编辑的应用程序的迁移文件夹中,删除上次更新的文件。请记住,第一个创建的文件是:“ 0001_initial.py”。例如:我创建了一个新类,并通过“ makemigrations”和“ migrate”过程进行了注册,现在创建了一个名为“ 0002_auto_etc.py”的新文件;删除它。
  3. 转到pycache ”文件夹(位于migrations文件夹内),然后删除文件“ 0002_auto_etc.pyc”。
  4. 最后,转到控制台并使用“ python manage.py makemigrations”和“ python manage.py migration”。

I solved that problem by doing this:

  1. Erase the “db.sqlite3” file. The issue here is that your current data base will be erased, so you will have to remake it again.
  2. Inside the migrations folder of your edited app, erase the last updated file. Remember that the first created file is: “0001_initial.py”. For example: I made a new class and register it by the “makemigrations” and “migrate” procedure, now a new file called “0002_auto_etc.py” was created; erase it.
  3. Go to the “pycache” folder (inside the migrations folder) and erase the file “0002_auto_etc.pyc”.
  4. Finally, go to the console and use “python manage.py makemigrations” and “python manage.py migrate”.

回答 10

我忘了提出正确的论点:

class LineInOffice(models.Model):   # here
    addressOfOffice = models.CharField("Корхоная жош",max_length= 200)   #and here
    ...

在models.py中,然后就开始消除烦人的

在应用程序“ myApp”中未检测到更改

I forgot to put correct arguments:

class LineInOffice(models.Model):   # here
    addressOfOffice = models.CharField("Корхоная жош",max_length= 200)   #and here
    ...

in models.py and then it started to drop that annoying

No changes detected in app ‘myApp ‘


回答 11

另一个可能的原因是,如果您在其他文件中(而不是在程序包中)定义了某些模型,而在其他任何地方都没有引用过。

对我来说,简单地增加from .graph_model import *,以admin.py(其中graph_model.py为新文件)解决了这一问题。

Another possible reason is if you had some models defined in another file (not in a package) and haven’t referenced that anywhere else.

For me, simply adding from .graph_model import * to admin.py (where graph_model.py was the new file) fixed the problem.


回答 12

我的问题比上面的答案要简单得多,并且可能是一个更普遍的原因,只要您的项目已经建立并可以运行。在我的一个已经使用了很长时间的应用程序中,迁移似乎很困难,因此,我急着做了以下事情:

rm -r */migrations/*
rm db.sqlite3
python3 manage.py makemigrations
No changes detected

哇?

我错误地还删除了所有__init__.py文件:(-进入后,一切又恢复正常了:

touch ads1/migrations/__init__.py

对于我的每个应用程序,都可以makemigrations再次使用。

事实证明,我已经手动通过复制另一个创建了一个新的应用程序,忘了把__init__.pymigrations文件夹和confinved我,一切都是靠不住的-我的领先使得它与更坏rm -r如上所述。

希望这可以帮助某人在几个小时内发誓“未检测到更改”错误。

My problem was much simpler than the above answers and probably a far more common reason as long as your project is already set up and working. In one of my applications that had been working for a long time, migrations seemed wonky, so in a hurry, I did the following:

rm -r */migrations/*
rm db.sqlite3
python3 manage.py makemigrations
No changes detected

Whaat??

I had mistakenly also removed all the __init__.py files :( – Everything was working again after I went in and:

touch ads1/migrations/__init__.py

For each of my applications then the makemigrations worked again.

It turns out that I had manually created a new application by copying another and forgot to put the __init__.py in the migrations folder and that confinved me that everything was wonky – leading my making it worse with an rm -r as described above.

Hope this helps someone from swearing at the “No changes detected” error for a few hours.


回答 13

解决方案是您必须将您的应用程序包含在INSTALLED_APPS中。

我错过了,发现了同样的问题。

指定我的应用名称后,迁移成功

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'boards',
]

请注意,我最后提到的木板是我的应用名称。

The solution is you have to include your app in INSTALLED_APPS.

I missed it and I found this same issue.

after specifying my app name migration became successful

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'boards',
]

please note I mentioned boards in last, which is my app name.


回答 14

INSTALLED_APPS = [

'blog.apps.BlogConfig',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',

]

确保“ blog.apps.BlogConfig”(此设置包含在settings.py中,以便进行应用迁移)

然后运行python3 manage.py makemigrations博客或您的应用名称

INSTALLED_APPS = [

'blog.apps.BlogConfig',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',

]

make sure ‘blog.apps.BlogConfig’, (this is included in your settings.py in order to make your app migrations)

then run python3 manage.py makemigrations blog or your app name


回答 15

您可能还会遇到的一个非常愚蠢的问题是class Meta在模型中定义两个。在这种情况下,运行时不会应用对第一个所做的任何更改makemigrations

class Product(models.Model):
    somefield = models.CharField(max_length=255)
    someotherfield = models.CharField(max_length=255)

    class Meta:
        indexes = [models.Index(fields=["somefield"], name="somefield_idx")]

    def somefunc(self):
        pass

    # Many lines...

    class Meta:
        indexes = [models.Index(fields=["someotherfield"], name="someotherfield_idx")]

A very dumb issue you can have as well is to define two class Meta in your model. In that case, any change to the first one won’t be applied when running makemigrations.

class Product(models.Model):
    somefield = models.CharField(max_length=255)
    someotherfield = models.CharField(max_length=255)

    class Meta:
        indexes = [models.Index(fields=["somefield"], name="somefield_idx")]

    def somefunc(self):
        pass

    # Many lines...

    class Meta:
        indexes = [models.Index(fields=["someotherfield"], name="someotherfield_idx")]

回答 16

我知道这是一个老问题,但是我整天都在同一个问题上作斗争,而我的解决方案很简单。

我的目录结构类似于…

apps/
   app/
      __init__.py
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

而且由于直到我遇到问题的所有其他模型都被导入到其他地方,最后又从main_app那里导入了INSTALLED_APPS,所以我很幸运,他们都可以使用。

但是由于我只添加了一个appINSTALLED_APPS而不是app_sub*当我最终添加了一个其他未导入的新模型文件时添加的,因此Django完全忽略了它。

我的解决方法是像这样将models.py文件添加到每个文件的基本目录中app

apps/
   app/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

然后添加from apps.app.app_sub1 import *等等到每个app关卡models.py文件中。

Bleh …这花了我很长时间才弄清楚,我在任何地方都找不到解决方案…我什至转到了Google搜索结果的第2页。

希望这对某人有帮助!

I know this is an old question but I fought with this same issue all day and my solution was a simple one.

I had my directory structure something along the lines of…

apps/
   app/
      __init__.py
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

And since all the other models up until the one I had a problem with were being imported somewhere else that ended up importing from main_app which was registered in the INSTALLED_APPS, I just got lucky that they all worked.

But since I only added each app to INSTALLED_APPS and not the app_sub* when I finally added a new models file that wasn’t imported ANYWHERE else, Django totally ignored it.

My fix was adding a models.py file to the base directory of each app like this…

apps/
   app/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

and then add from apps.app.app_sub1 import * and so on to each of the app level models.py files.

Bleh… this took me SO long to figure out and I couldn’t find the solution anywhere… I even went to page 2 of the google results.

Hope this helps someone!


回答 17

根据官方文档中的“迁移”部分,我在django 3.0中遇到了类似的问题,运行它足以更新我的表结构:

python manage.py makemigrations
python manage.py migrate

但是输出始终是相同的:执行“ makemigrations”脚本后,模型“未检测到更改”。我要在数据库上更新的模型在models.py上出现语法错误:

field_model : models.CharField(max_length=255, ...)

代替:

field_model = models.CharField(max_length=255, ...)

解决了这个愚蠢的错误,使用这些命令可以顺利进行迁移。也许这可以帮助某人。

I had a similar issue with django 3.0, according migrations section in the official documentation, running this was enough to update my table structure:

python manage.py makemigrations
python manage.py migrate

But the output was always the same: ‘no change detected’ about my models after I executed ‘makemigrations’ script. I had a syntax error on models.py at the model I wanted to update on db:

field_model : models.CharField(max_length=255, ...)

instead of:

field_model = models.CharField(max_length=255, ...)

Solving this stupid mistake, with those command the migration was done without problems. Maybe this helps someone.


回答 18

您应该添加polls.apps.PollsConfigINSTALLED_APPSsetting.py

You should add polls.apps.PollsConfig to INSTALLED_APPS in setting.py


回答 19

就我而言,我忘记插入类参数

错误:

class AccountInformation():

正确

class AccountInformation(models.Model):

In my case i forgot to insert the class arguments

Wrong:

class AccountInformation():

Correct

class AccountInformation(models.Model):

回答 20

就我而言,我首先向模型添加了一个字段,而Django说没有任何变化。

比起我决定更改模型的“表名”,makemigrations起作用了。比起将表名改回默认值,新字段也在那里。

django迁移系统中有一个“ bug”,有时看不到新字段。可能与日期字段有关。

In my case, I first added a field to the model, and Django said there’re no changes.

Than I decided to change the “table name” of the model, makemigrations worked. Than I changed table name back to default, and the new field was also there.

There is a “bug” in django migration system, sometimes it doesn’t see the new field. Might be related with date field.


回答 21

可能的原因可能是删除了现有的db文件和migrations文件夹,您可以使用python manage.py makemigrations <app_name>来工作。我曾经遇到过类似的问题。

The possible reason could be deletion of the existing db file and migrations folder you can use python manage.py makemigrations <app_name> this should work. I once faced a similar problem.


回答 22

另一种边缘情况和解决方案:

我添加了一个布尔字段,并同时添加了一个引用该属性的@property,其名称相同(doh)。注释了该属性,并且迁移看到并添加了新字段。重命名该属性,一切都很好。

One more edge case and solution:

I added a boolean field, and at the same time added an @property referencing it, with the same name (doh). Commented the property and migration sees and adds the new field. Renamed the property and all is good.


回答 23

如果您拥有managed = True模型内部元数据,则需要将其删除并进行迁移。然后再次运行迁移,它将检测到新更新。

If you have the managed = True in yout model Meta, you need to remove it and do a migration. Then run the migrations again, it will detect the new updates.


回答 24

将新模型添加到django api应用程序并运行python manage.py makemigrations该工具时,未检测到任何新模型。

奇怪的是,旧模型确实被选中了makemigrations,但这是因为它们在urlpatterns链中并且该工具以某种方式检测到了它们。因此,请注意该行为。

问题是因为与models包相对应的目录结构具有子包,并且所有__init__.py文件均为空。他们必须在每个子文件夹和模型中 显式导入所有必需的类,以__init__.py供Django使用该makemigrations工具将其提取。

models
  ├── __init__.py          <--- empty
  ├── patient
     ├── __init__.py      <--- empty
     ├── breed.py
     └── ...
  ├── timeline
     ├── __init__.py      <-- empty
     ├── event.py
     └── ...

When adding new models to the django api application and running the python manage.py makemigrations the tool did not detect any new models.

The strange thing was that the old models did got picked by makemigrations, but this was because they were referenced in the urlpatterns chain and the tool somehow detected them. So keep an eye on that behavior.

The problem was because the directory structure corresponding to the models package had subpackages and all the __init__.py files were empty. They must explicitly import all the required classes in each subfolder and in the models __init__.py for Django to pick them up with the makemigrations tool.

models
  ├── __init__.py          <--- empty
  ├── patient
  │   ├── __init__.py      <--- empty
  │   ├── breed.py
  │   └── ...
  ├── timeline
  │   ├── __init__.py      <-- empty
  │   ├── event.py
  │   └── ...

回答 25

尝试在admin.py中注册模型,这是一个示例:-admin.site.register(YourModelHere)

您可以执行以下操作:1. 1. admin.site.register(YourModelHere)#在admin.py中2.重新加载页面并重试3.按下CTRL-S并保存4.可能有错误,特别是检查模型.py和admin.py5。或者,在结束时,只需重新启动服务器即可。

Try registering your model in admin.py, here’s an example:- admin.site.register(YourModelHere)

You can do the following things:- 1. admin.site.register(YourModelHere) # In admin.py 2. Reload the page and try again 3. Hit CTRL-S and save 4. There might be an error, specially check models.py and admin.py 5. Or, at the end of it all just restart the server


回答 26

这可能希望对其他人有所帮助,因为我最终花了数小时试图将其消除。

如果你有一个函数模型由同一个名字,这将删除值。事后看来很明显,但仍然如此。

因此,如果您有这样的事情:

class Foobar(models.Model):
    [...]
    something = models.BooleanField(default=False)

    [...]
    def something(self):
        return [some logic]

在这种情况下,该功能将覆盖上面的设置,使其对变为“不可见” makemigrations

This might hopefully help someone else, as I ended up spending hours trying to chase this down.

If you have a function within your model by the same name, this will remove the value. Pretty obvious in hindsight, but nonetheless.

So, if you have something like this:

class Foobar(models.Model):
    [...]
    something = models.BooleanField(default=False)

    [...]
    def something(self):
        return [some logic]

In that case, the function will override the setting above, making it “invisible” to makemigrations.


回答 27

您可以做的最好的事情是,删除现有数据库。就我而言,我使用的是phpMyAdmin SQL数据库,因此我手动删除了上面创建的数据库。

删除后: 我在PhpMyAdmin中创建数据库,并且不添加任何表。

再次运行以下命令:

python manage.py makemigrations

python manage.py migrate

这些命令之后:您可以看到django在数据库中自动创建了其他必要的表(大约有10个表)。

python manage.py makemigrations <app_name>

python manage.py migrate

最后:在上述命令之后,您创建的所有模型(表)都将直接导入数据库。

希望这会有所帮助。

The Best Thing You can do is, Delete the existing database. In my case, I were using phpMyAdmin SQL database, so I manually delete the created database overthere.

After Deleting: I create database in PhpMyAdmin, and doesn,t add any tables.

Again run the following Commands:

python manage.py makemigrations

python manage.py migrate

After These Commands: You can see django has automatically created other necessary tables in Database(Approx there are 10 tables).

python manage.py makemigrations <app_name>

python manage.py migrate

And Lastly: After above commands all the model(table) you have created are directly imported to the database.

Hope this will help.


回答 28

我对此错误的问题是,我包括了:

class Meta:
   abstract = True

我要为其创建的内部模型。

My problem with this error, was that I had included:

class Meta:
   abstract = True

Inside model that I wanted to creation migrate for.


回答 29

创建一个名为的新应用时,我遇到了另一个问题deals。我想在该应用中分离模型,所以我有2个名为deals.py和的模型文件dealers.py。运行时,python manage.py makemigrations我得到了:No changes detected

我继续前进,并将__init__.py其放在我的模型文件所在的同一目录(交易和经销商)中

from .deals import *
from .dealers import *

然后makemigrations命令起作用了。

原来,如果您不打算在任何地方导入模型,或者模型文件名不是 models.py不会检测到模型。

我遇到的另一个问题是我在settings.py以下位置编写应用程序的方式:

我有:

apps.deals

它应该已经包含了根项目文件夹:

cars.apps.deals

I had a different issue while creating a new app called deals. I wanted to separate the models inside that app so I had 2 model files named deals.py and dealers.py. When running python manage.py makemigrations I got: No changes detected.

I went ahead and inside the __init__.py which lives on the same directory where my model files lived (deals and dealer) I did

from .deals import *
from .dealers import *

And then the makemigrations command worked.

Turns out that if you are not importing the models anywhere OR your models file name isn’t models.py then the models wont be detected.

Another issue that happened to me is the way I wrote the app in settings.py:

I had:

apps.deals

It should’ve been including the root project folder:

cars.apps.deals

了解__getitem__方法

问题:了解__getitem__方法

我已经阅读了__getitem__Python文档中的大多数文档,但仍然无法理解它的含义。

因此,我所能理解的__getitem__就是用于实现类似的调用self[key]。但是它有什么用?

可以说我以这种方式定义了一个python类:

class Person:
    def __init__(self,name,age):
        self.name = name
        self.age = age

    def __getitem__(self,key):
        print ("Inside `__getitem__` method!")
        return getattr(self,key)

p = Person("Subhayan",32)
print (p["age"])

这将返回预期的结果。但是为什么要__getitem__首先使用?我还听说过Python __getitem__内部调用。但是为什么要这样做呢?

有人可以详细解释一下吗?

I have gone through most of the documentation of __getitem__ in the Python docs, but I am still unable to grasp the meaning of it.

So all I can understand is that __getitem__ is used to implement calls like self[key]. But what is the use of it?

Lets say I have a python class defined in this way:

class Person:
    def __init__(self,name,age):
        self.name = name
        self.age = age

    def __getitem__(self,key):
        print ("Inside `__getitem__` method!")
        return getattr(self,key)

p = Person("Subhayan",32)
print (p["age"])

This returns the results as expected. But why use __getitem__ in the first place? I have also heard that Python calls __getitem__ internally. But why does it do it?

Can someone please explain this in more detail?


回答 0

马聪(Cong Ma)很好地解释了__getitem__用途-但我想举一个可能有用的示例。想象一个为建筑物建模的类。在建筑物数据中,它包含许多属性,包括对占据每个楼层的公司的描述:

如果不使用,__getitem__我们将有一个像这样的类:

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def occupy(self, floor_number, data):
          self._floors[floor_number] = data
     def get_floor_data(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1.occupy(0, 'Reception')
building1.occupy(1, 'ABC Corp')
building1.occupy(2, 'DEF Inc')
print( building1.get_floor_data(2) )

但是,我们可以使用__getitem__(及其对应的__setitem__)来使用Building类“ nicer”。

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def __setitem__(self, floor_number, data):
          self._floors[floor_number] = data
     def __getitem__(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'
print( building1[2] )

是否__setitem__像这样使用,实际上取决于您计划如何抽象化数据-在这种情况下,我们决定将建筑物视为楼层的容器(并且您还可以为建筑物实现迭代器,甚至可以实现切片功能-即一次获取多个楼层的数据-这取决于您的需求。

Cong Ma does a good job of explaining what __getitem__ is used for – but I want to give you an example which might be useful. Imagine a class which models a building. Within the data for the building it includes a number of attributes, including descriptions of the companies that occupy each floor :

Without using __getitem__ we would have a class like this :

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def occupy(self, floor_number, data):
          self._floors[floor_number] = data
     def get_floor_data(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1.occupy(0, 'Reception')
building1.occupy(1, 'ABC Corp')
building1.occupy(2, 'DEF Inc')
print( building1.get_floor_data(2) )

We could however use __getitem__ (and its counterpart __setitem__) to make the usage of the Building class ‘nicer’.

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def __setitem__(self, floor_number, data):
          self._floors[floor_number] = data
     def __getitem__(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'
print( building1[2] )

Whether you use __setitem__ like this really depends on how you plan to abstract your data – in this case we have decided to treat a building as a container of floors (and you could also implement an iterator for the Building, and maybe even the ability to slice – i.e. get more than one floor’s data at a time – it depends on what you need.


回答 1

[]通过键或索引获取项目的语法仅仅是语法糖。

在评估a[i]Python调用时a.__getitem__(i)(或type(a).__getitem__(a, i),但是这种区别是关于继承模型的,在这里并不重要)。即使的类a可能未明确定义此方法,它也通常是从祖先类继承的。

此处列出了所有(Python 2.7)特殊方法名称及其语义:https : //docs.python.org/2.7/reference/datamodel.html#special-method-names

The [] syntax for getting item by key or index is just syntax sugar.

When you evaluate a[i] Python calls a.__getitem__(i) (or type(a).__getitem__(a, i), but this distinction is about inheritance models and is not important here). Even if the class of a may not explicitly define this method, it is usually inherited from an ancestor class.

All the (Python 2.7) special method names and their semantics are listed here: https://docs.python.org/2.7/reference/datamodel.html#special-method-names


回答 2

magic方法__getitem__基本上用于访问列表项,字典条目,数组元素等。它对于快速查找实例属性非常有用。

在这里,我用一个示例类Person来显示这一点,该类可以通过“名称”,“年龄”和“出生日期”(出生日期)实例化。该__getitem__方法以一种可以访问索引实例属性的方式编写,例如,名字或姓氏,日期,日期,月份或年份等。

import copy

# Constants that can be used to index date of birth's Date-Month-Year
D = 0; M = 1; Y = -1

class Person(object):
    def __init__(self, name, age, dob):
        self.name = name
        self.age = age
        self.dob = dob

    def __getitem__(self, indx):
        print ("Calling __getitem__")
        p = copy.copy(self)

        p.name = p.name.split(" ")[indx]
        p.dob = p.dob[indx] # or, p.dob = p.dob.__getitem__(indx)
        return p

假设一个用户输入如下:

p = Person(name = 'Jonab Gutu', age = 20, dob=(1, 1, 1999))

借助__getitem__方法,用户可以访问索引属性。例如,

print p[0].name # print first (or last) name
print p[Y].dob  # print (Date or Month or ) Year of the 'date of birth'

The magic method __getitem__ is basically used for accessing list items, dictionary entries, array elements etc. It is very useful for a quick lookup of instance attributes.

Here I am showing this with an example class Person that can be instantiated by ‘name’, ‘age’, and ‘dob’ (date of birth). The __getitem__ method is written in a way that one can access the indexed instance attributes, such as first or last name, day, month or year of the dob, etc.

import copy

# Constants that can be used to index date of birth's Date-Month-Year
D = 0; M = 1; Y = -1

class Person(object):
    def __init__(self, name, age, dob):
        self.name = name
        self.age = age
        self.dob = dob

    def __getitem__(self, indx):
        print ("Calling __getitem__")
        p = copy.copy(self)

        p.name = p.name.split(" ")[indx]
        p.dob = p.dob[indx] # or, p.dob = p.dob.__getitem__(indx)
        return p

Suppose one user input is as follows:

p = Person(name = 'Jonab Gutu', age = 20, dob=(1, 1, 1999))

With the help of __getitem__ method, the user can access the indexed attributes. e.g.,

print p[0].name # print first (or last) name
print p[Y].dob  # print (Date or Month or ) Year of the 'date of birth'

为什么TensorFlow 2比TensorFlow 1慢得多?

问题:为什么TensorFlow 2比TensorFlow 1慢得多?

许多用户都将其作为切换到Pytorch的原因,但是我还没有找到牺牲/最渴望的实用质量,速度和执行力的理由/解释。

以下是代码基准测试性能,即TF1与TF2的对比-TF1的运行速度提高了47%至276%

我的问题是:在图形或硬件级别上,什么导致如此显着的下降?


寻找详细的答案-已经熟悉广泛的概念。相关的Git

规格:CUDA 10.0.130,cuDNN 7.4.2,Python 3.7.4,Windows 10,GTX 1070


基准测试结果


UPDATE:禁用每下面的代码不会急于执行没有帮助。但是,该行为是不一致的:有时以图形方式运行会有所帮助,而其他时候其运行速度要比 Eager

由于TF开发人员没有出现在任何地方,因此我将自己进行调查-可以跟踪相关Github问题的进展。

更新2:分享大量实验结果,并附有解释;应该在今天完成。


基准代码

# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

使用的功能

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))

It’s been cited by many users as the reason for switching to Pytorch, but I’ve yet to find a justification / explanation for sacrificing the most important practical quality, speed, for eager execution.

Below is code benchmarking performance, TF1 vs. TF2 – with TF1 running anywhere from 47% to 276% faster.

My question is: what is it, at the graph or hardware level, that yields such a significant slowdown?


Looking for a detailed answer – am already familiar with broad concepts. Relevant Git

Specs: CUDA 10.0.130, cuDNN 7.4.2, Python 3.7.4, Windows 10, GTX 1070


Benchmark results:


UPDATE: Disabling Eager Execution per below code does not help. The behavior, however, is inconsistent: sometimes running in graph mode helps considerably, other times it runs slower relative to Eager.

As TF devs don’t appear around anywhere, I’ll be investigating this matter myself – can follow progress in the linked Github issue.

UPDATE 2: tons of experimental results to share, along explanations; should be done today.


Benchmark code:

# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

Functions used:

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))

回答 0

2020年2月18日更新:我每晚排练 2.1和2.1;结果好坏参半。除了一个配置(模型和数据大小)外,其他配置的运行速度都快于TF2和TF1的最佳配置。速度较慢且急剧下降的是大型-尤其是。在图形执行中(慢1.6倍至2.5倍)。

此外,对于我测试的大型模型,Graph和Eager之间存在极大的可重复性差异-无法通过随机性/计算并行性来解释这一差异。我目前无法按时间限制显示这些声明的可重现代码,因此我强烈建议您针对自己的模型进行测试。

尚未针对这些问题打开Git问题,但我确实对原始内容发表了评论-尚未回复。取得进展后,我将更新答案。


VERDICT:它是不是,如果你知道自己在做什么。但是,如果您不这样做,则可能会花费大量成本-平均而言,需要进行几次GPU升级,而在最坏的情况下,则需要多个GPU。


答案:旨在提供对该问题的高级描述,以及有关如何根据您的需求决定培训配置的指南。有关详细的低级描述(包括所有基准测试结果和所使用的代码),请参阅我的其他答案。

如果我学到更多信息,我将更新我的答案,并提供更多信息-可以为该问题添加书签/“加上星号”以供参考。


问题摘要:正如TensorFlow开发人员Q. Scott Zhu 确认的那样,TF2专注于Eager执行和带有Keras的紧密集成的开发,这涉及到TF源的全面更改-包括图形级。好处:大大扩展了处理,分发,调试和部署功能。但是,其中一些成本是速度。

但是,这个问题要复杂得多。不仅仅是TF1和TF2-导致火车速度显着差异的因素包括:

  1. TF2与TF1
  2. 渴望与图表模式
  3. kerastf.keras
  4. numpyvs. tf.data.Datasetvs ….
  5. train_on_batch()fit()
  6. GPU与CPU
  7. model(x)vs. model.predict(x)vs ….

不幸的是,以上几乎没有一个是彼此独立的,并且每个相对于另一个可以至少使执行时间加倍。幸运的是,您可以确定哪些是系统上最有效的方法,并提供一些捷径-正如我将要展示的。


我该怎么办?当前,唯一的方法是-针对您的特定模型,数据和硬件进行实验。没有任何单一的配置总是最好的工作-但也做的,并没有对简化搜索:

>>做:

  • train_on_batch()++ numpy+ tf.kerasTF1 +热切/图
  • train_on_batch()+ numpy+ tf.keras+ + TF2图
  • fit()++ numpy+ tf.kerasTF1 / TF2 +图表+大型模型和数据

>>不要:

  • fit()+ numpy+ keras用于中小型模型和数据
  • fit()++ numpy+ tf.kerasTF1 / TF2 +渴望
  • train_on_batch()+ numpy+ keras+ + TF1伊格

  • [主要] tf.python.keras;它的运行速度可以降低10到100倍,并且带有许多错误;更多信息

    • 这包括layersmodelsoptimizers,和相关的“乱用”的使用进口; ops,utils和相关的“私有”导入都可以-但可以肯定的是,请检查alt以及它们是否用于tf.keras

请参阅其他答案底部的代码,以获取基准测试设置示例。上面的列表主要基于其他答案中的“ BENCHMARKS”表。


上述注意事项的局限性

  • 这个问题的标题是“为什么TF2比TF1慢得多?”,尽管它的主体明确地涉及训练,但问题并不局限于此。即使在相同的TF版本,导入,数据格式等中,推理也将受到主要速度差异的影响-参见此答案
  • RNN在TF2中得到了改进,很可能会明显改变其他答案中的数据网格。
  • 模型主要用于Conv1DDense-不RNNs,稀疏数据/目标,4 / 5D输入,和其他CONFIGS
  • 输入数据限制为numpytf.data.Dataset,同时存在许多其他格式;查看其他答案
  • 使用了GPU;结果在CPU上有所不同。实际上,当我问这个问题时,我的CUDA配置不正确,并且某些结果是基于CPU的。

为什么TF2为急切执行而牺牲了最实用的质量,速度?显然,它还没有-图形仍然可用。但是如果问题是“为什么要渴望”:

  • 出色的调试:您可能会遇到许多问题,询问“如何获得中间层输出”或“如何检查权重”;渴望,它(几乎)很简单.__dict__。相比之下,Graph需要熟悉特殊的后端功能-极大地增加了调试和自省的整个过程。
  • 更快的原型制作:与上述类似的想法;更快的理解=剩下更多的时间用于实际DL。

如何启用/禁用EAGER?

tf.enable_eager_execution()  # TF1; must be done before any model/tensor creation
tf.compat.v1.disable_eager_execution() # TF2; above holds

附加信息

  • 仔细_on_batch()研究TF2中的方法;根据TF开发人员的说法,他们仍然使用较慢的实现方式,但不是故意的 -即必须解决。有关详细信息,请参见其他答案。

张力流需求

  1. 请修复train_on_batch(),以及fit()迭代调用的性能方面;定制火车循环对许多人尤其是我来说很重要。
  2. 添加有关这些性能差异的文档/文档字符串,以供用户了解。
  3. 提高一般执行速度,以防止窥视现象跳入Pytorch。

致谢:感谢


更新

  • 191114日 -找到了一个模型(在我的实际应用程序中),该模型在TF2上针对所有*配置(带有Numpy输入数据)的速度较慢。差异范围为13-19%,平均为17%。但是,keras和之间的tf.keras差异更为明显:平均18-40%。32%(TF1和2)。(*-渴望者(TF2 OOM’d为此)

  • 11/17/19 -devs on_batch()最近的一次提交中更新了方法,指出已提高了速度-将在TF 2.1中发布,或现在以形式提供tf-nightly。由于我无法让后者运行,因此将替补席推迟到2.1。

  • 2/20/20-预测性能也值得借鉴;例如,在TF2中,CPU预测时间可能涉及周期性的峰值

UPDATE 2/18/2020: I’ve benched 2.1 and 2.1-nightly; the results are mixed. All but one configs (model & data size) are as fast as or much faster than the best of TF2 & TF1. The one that’s slower, and slower dramatically, is Large-Large – esp. in Graph execution (1.6x to 2.5x slower).

Furthermore, there are extreme reproducibility differences between Graph and Eager for a large model I tested – one not explainable via randomness/compute-parallelism. I can’t currently present reproducible code for these claims per time constraints, so instead I strongly recommend testing this for your own models.

Haven’t opened a Git issue on these yet, but I did comment on the original – no response yet. I’ll update the answer(s) once progress is made.


VERDICT: it isn’t, IF you know what you’re doing. But if you don’t, it could cost you, lots – by a few GPU upgrades on average, and by multiple GPUs worst-case.


THIS ANSWER: aims to provide a high-level description of the issue, as well as guidelines for how to decide on the training configuration specific to your needs. For a detailed, low-level description, which includes all benchmarking results + code used, see my other answer.

I’ll be updating my answer(s) w/ more info if I learn any – can bookmark / “star” this question for reference.


ISSUE SUMMARY: as confirmed by a TensorFlow developer, Q. Scott Zhu, TF2 focused development on Eager execution & tight integration w/ Keras, which involved sweeping changes in TF source – including at graph-level. Benefits: greatly expanded processing, distribution, debug, and deployment capabilities. The cost of some of these, however, is speed.

The matter, however, is fairly more complex. It isn’t just TF1 vs. TF2 – factors yielding significant differences in train speed include:

  1. TF2 vs. TF1
  2. Eager vs. Graph mode
  3. keras vs. tf.keras
  4. numpy vs. tf.data.Dataset vs. …
  5. train_on_batch() vs. fit()
  6. GPU vs. CPU
  7. model(x) vs. model.predict(x) vs. …

Unfortunately, almost none of the above are independent of the other, and each can at least double execution time relative to another. Fortunately, you can determine what’ll work best systematically, and with a few shortcuts – as I’ll be showing.


WHAT SHOULD I DO? Currently, the only way is – experiment for your specific model, data, and hardware. No single configuration will always work best – but there are do’s and don’t’s to simplify your search:

>> DO:

  • train_on_batch() + numpy + tf.keras + TF1 + Eager/Graph
  • train_on_batch() + numpy + tf.keras + TF2 + Graph
  • fit() + numpy + tf.keras + TF1/TF2 + Graph + large model & data

>> DON’T:

  • fit() + numpy + keras for small & medium models and data
  • fit() + numpy + tf.keras + TF1/TF2 + Eager
  • train_on_batch() + numpy + keras + TF1 + Eager

  • [Major] tf.python.keras; it can run 10-100x slower, and w/ plenty of bugs; more info

    • This includes layers, models, optimizers, & related “out-of-box” usage imports; ops, utils, & related ‘private’ imports are fine – but to be sure, check for alts, & whether they’re used in tf.keras

Refer to code at bottom of my other answer for an example benchmarking setup. The list above is based mainly on the “BENCHMARKS” tables in the other answer.


LIMITATIONS of the above DO’s & DON’T’s:

  • This question’s titled “Why is TF2 much slower than TF1?”, and while its body concerns training explicitly, the matter isn’t limited to it; inference, too, is subject to major speed differences, even within the same TF version, import, data format, etc. – see this answer.
  • RNNs are likely to notably change the data grid in the other answer, as they’ve been improved in TF2
  • Models primarily used Conv1D and Dense – no RNNs, sparse data/targets, 4/5D inputs, & other configs
  • Input data limited to numpy and tf.data.Dataset, while many other formats exist; see other answer
  • GPU was used; results will differ on a CPU. In fact, when I asked the question, my CUDA wasn’t properly configured, and some of the results were CPU-based.

Why did TF2 sacrifice the most practical quality, speed, for eager execution? It hasn’t, clearly – graph is still available. But if the question is “why eager at all”:

  • Superior debugging: you’ve likely come across multitudes of questions asking “how do I get intermediate layer outputs” or “how do I inspect weights”; with eager, it’s (almost) as simple as .__dict__. Graph, in contrast, requires familiarity with special backend functions – greatly complicating the entire process of debugging & introspection.
  • Faster prototyping: per ideas similar to above; faster understanding = more time left for actual DL.

HOW TO ENABLE/DISABLE EAGER?

tf.enable_eager_execution()  # TF1; must be done before any model/tensor creation
tf.compat.v1.disable_eager_execution() # TF2; above holds

ADDITIONAL INFO:

  • Careful with _on_batch() methods in TF2; according to the TF dev, they still use a slower implementation, but not intentionally – i.e. it’s to be fixed. See other answer for details.

REQUESTS TO TENSORFLOW DEVS:

  1. Please fix train_on_batch(), and the performance aspect of calling fit() iteratively; custom train loops are important to many, especially to me.
  2. Add documentation / docstring mention of these performance differences for users’ knowledge.
  3. Improve general execution speed to keep peeps from hopping to Pytorch.

ACKNOWLEDGEMENTS: Thanks to


UPDATES:

  • 11/14/19 – found a model (in my real application) that that runs slower on TF2 for all* configurations w/ Numpy input data. Differences ranged 13-19%, averaging 17%. Differences between keras and tf.keras, however, were more dramatic: 18-40%, avg. 32% (both TF1 & 2). (* – except Eager, for which TF2 OOM’d)

  • 11/17/19 – devs updated on_batch() methods in a recent commit, stating to have improved speed – to be released in TF 2.1, or available now as tf-nightly. As I’m unable to get latter running, will delay benching until 2.1.

  • 2/20/20 – prediction performance is also worth benching; in TF2, for example, CPU prediction times can involve periodic spikes

回答 1

解答:旨在提供对该问题的详细图形/硬件级别描述-包括TF2与TF1训练循环,输入数据处理器以及Eager与Graph模式的执行。有关问题摘要和解决方案的指南,请参见我的其他答案。


性能验证:有时一个更快,有时另一个更快,具体取决于配置。就TF2与TF1而言,它们的平均水平差不多,但是确实存在基于配置的重大差异,并且TF1比TF2更为常见。请参阅下面的“标记”。


EAGER VS. GRAPH:这可以说是整个答案的关键:根据我的测试,TF2的渴望比TF1的渴望。细节进一步下降。

两者之间的根本区别是:Graph 主动设置计算网络,并在“提示”时执行-而Eager在创建时执行所有操作。但故事只从这里开始:

  • 渴望并不是没有Graph,实际上可能主要是 Graph,这与预期相反。它主要是执行图 -包括模型和优化器权重,占图的很大一部分。

  • 渴望在执行时重建自己图的一部分 ; Graph未完全构建的直接结果-请参阅分析器结果。这具有计算开销。

  • 渴望慢与脾气暴躁的输入 ; 根据此Git注释和代码,Eager中的Numpy输入包括将张量从CPU复制到GPU的开销成本。遍历源代码,数据处理差异很明显;渴望直接通过Numpy,而图则通过张量,然后求和为Numpy。不确定确切的过程,但后者应涉及GPU级别的优化

  • TF2 Eager 比TF1 Eager -这是…意外。请参阅下面的基准测试结果。差异从可以忽略不计到显着,但是是一致的。不确定为什么会这样-如果TF开发人员澄清了,将会更新答案。


TF2与TF1:引用TF开发人员Q. Scott Zhu的相关部分的回复 -附上我的强调和改写:

急切地,运行时需要执行ops并为python代码的每一行返回数值。单步执行的性质使其运行缓慢

在TF2中,Keras利用tf.function构建其图形进行训练,评估和预测。我们称它们为模型的“执行功能”。在TF1中,“执行功能”是FuncGraph,它与TF功能共享一些公共组件,但是实现方式不同。

在此过程中,我们以某种方式为train_on_batch(),test_on_batch()和预报_on_batch()留下了错误的实现。它们在数值上仍然是正确的,但是x_on_batch的执行函数是纯python函数,而不是tf.function包装的python函数。这会导致缓慢

在TF2中,我们将所有输入数据转换为tf.data.Dataset,通过它我们可以统一执行函数来处理单一类型的输入。数据集转换中可能会有一些开销,我认为这是一次性的开销,而不是每次批处理的开销

带有上一段的最后一句和下段的最后一句:

为了克服急切模式下的缓慢性,我们提供了@ tf.function,它将把python函数变成图形。当像np数组一样输入数值时,tf.function的主体将转换为静态图,进行优化,并返回最终值,该值很快,并且应具有与TF1图模式相似的性能。

我不同意-根据我的分析结果,该结果表明Eager的输入数据处理比Graph的处理要慢得多。另外,tf.data.Dataset尤其不确定,但是Eager确实反复调用了多个相同的数据转换方法-请参阅事件探查器。

最后,开发人员的链接提交:支持Keras v2循环的大量更改


训练循环:取决于(1)渴望与图表;(2)输入数据格式,训练将在一个独特的训练循环中进行-在TF2中_select_training_loop()training.py,其中之一:

training_v2.Loop()
training_distributed.DistributionMultiWorkerTrainingLoop(
              training_v2.Loop()) # multi-worker mode
# Case 1: distribution strategy
training_distributed.DistributionMultiWorkerTrainingLoop(
            training_distributed.DistributionSingleWorkerTrainingLoop())
# Case 2: generator-like. Input is Python generator, or Sequence object,
# or a non-distributed Dataset or iterator in eager execution.
training_generator.GeneratorOrSequenceTrainingLoop()
training_generator.EagerDatasetOrIteratorTrainingLoop()
# Case 3: Symbolic tensors or Numpy array-like. This includes Datasets and iterators 
# in graph mode (since they generate symbolic tensors).
training_generator.GeneratorLikeTrainingLoop() # Eager
training_arrays.ArrayLikeTrainingLoop() # Graph

每个人对资源分配的处理方式不同,并会对性能和功能造成影响。


火车循环:fitvs train_on_batchkerasvstf.keras:四个循环都使用不同的火车循环,尽管可能不是每种可能的组合。kerasfit,例如,使用的形式fit_loop,例如training_arrays.fit_loop(),其train_on_batch可以使用K.function()tf.keras具有更复杂的层次结构,在上一节中进行了部分描述。


训练循环:文档 -有关某些不同执行方法的相关源文档字符串

与其他TensorFlow操作不同,我们不会将python数值输入转换为张量。此外,将为每个不同的python数值生成一个新图

function 为每个唯一的输入形状和数据类型集实例化一个单独的图

一个tf.function对象可能需要映射到后台的多个计算图。这应该仅在性能上可见(跟踪图的计算和内存成本非零


输入数据处理器:与上述类似,根据运行时配置(执行模式,数据格式,分发策略)设置的内部标志,视情况选择处理器。最简单的情况是使用Eager,它可以直接与Numpy数组一起使用。有关某些特定示例,请参见此答案


模型大小,数据大小:

  • 是决定性的;没有任何一种配置能在所有型号和数据尺寸上脱颖而出。
  • 相对于模型大小的数据大小很重要;对于小型数据和模型,数据传输(例如,CPU至GPU)的开销可能占主导。同样,小型的开销处理器在每个数据转换时间对大型数据的运行速度上较慢(请参见 convert_to_tensor“配置文件”)
  • 速度因火车循环和输入数据处理器处理资源的方式而异。

基准:磨碎的肉。- Word文档Excel电子表格


术语

  • 减去%的数字都是
  • %计算为(1 - longer_time / shorter_time)*100; 理由:我们对哪个因素比另一个因素更快感兴趣shorter / longer实际上是非线性关系,对直接比较没有用
  • 百分号确定:
    • TF2 vs TF1:+如果TF2更快
    • GvE(图表vs.渴望):+如果图表更快
  • TF2 = TensorFlow 2.0.0 + Keras 2.3.1; TF1 = TensorFlow 1.14.0 + Keras 2.2.5

简介


PROFILER-说明:Spyder 3.3.6 IDE分析器。

  • 有些功能在其他嵌套中重复;因此,很难找到“数据处理”和“训练”功能之间的确切间隔,因此会有一些重叠-在最后一个结果中很明显。

  • 计算的wrt运行时数减去构建时间的百分比

  • 通过将所有(唯一)运行时间相加得出的构建时间来计算,这些运行时间称为1或2次
  • 通过累加所有(唯一的)运行时间(与迭代的次数和它们的嵌套的运行时间相同)计算出的训练时间
  • 不幸的是,函数是根据其原始名称进行_func = func概要分析的(即,将概要分析为func),这会混入构建时间-因此需要将其排除在外

测试环境

  • 底部执行的代码带有最少的后台任务运行
  • GPU是“热身” W /定时重复前几次反复,在提出这个帖子
  • 从源代码构建的CUDA 10.0.130,cuDNN 7.6.0,TensorFlow 1.14.0和TensorFlow 2.0.0,以及Anaconda
  • Python 3.7.4,Spyder 3.3.6 IDE
  • GTX 1070,Windows 10、24 GB DDR4 2.4 MHz RAM,i7-7700HQ 2.8 GHz CPU

方法

  • 基准“小”,“中”和“大”模型和数据大小
  • 固定每个模型大小的参数数,与输入数据大小无关
  • “较大”模型具有更多参数和层
  • “较大”的数据具有更长的序列,但相同batch_sizenum_channels
  • 模型只使用Conv1DDense“可学习”层; 每个TF版本的符号都避免了RNN。差异
  • 始终在基准循环之外运行一列火车,以省略模型和优化器图的构建
  • 不使用稀疏数据(例如layers.Embedding())或稀疏目标(例如SparseCategoricalCrossEntropy()

局限性:“完整”的答案将解释所有可能的火车循环和迭代器,但这肯定超出了我的时间能力,不存在薪水或一般必要性。结果仅与方法学一样好-用开放的心态进行解释。


代码

import numpy as np
import tensorflow as tf
import random
from termcolor import cprint
from time import time

from tensorflow.keras.layers import Input, Dense, Conv1D
from tensorflow.keras.layers import Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
#from keras.layers import Input, Dense, Conv1D
#from keras.layers import Dropout, GlobalAveragePooling1D
#from keras.models import Model 
#from keras.optimizers import Adam
#import keras.backend as K

#tf.compat.v1.disable_eager_execution()
#tf.enable_eager_execution()

def reset_seeds(reset_graph_with_backend=None, verbose=1):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        if verbose:
            print("KERAS AND TENSORFLOW GRAPHS RESET")

    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    if verbose:
        print("RANDOM SEEDS RESET")

print("TF version: {}".format(tf.__version__))
reset_seeds()

def timeit(func, iterations, *args, _verbose=0, **kwargs):
    t0 = time()
    for _ in range(iterations):
        func(*args, **kwargs)
        print(end='.'*int(_verbose))
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_model_small(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 40, strides=4, padding='same')(ipt)
    x     = GlobalAveragePooling1D()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_medium(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ipt
    for filters in [64, 128, 256, 256, 128, 64]:
        x  = Conv1D(filters, 20, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_large(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(64,  400, strides=4, padding='valid')(ipt)
    x     = Conv1D(128, 200, strides=1, padding='valid')(x)
    for _ in range(40):
        x = Conv1D(256,  12, strides=1, padding='same')(x)
    x     = Conv1D(512,  20, strides=2, padding='valid')(x)
    x     = Conv1D(1028, 10, strides=2, padding='valid')(x)
    x     = Conv1D(256,   1, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)    
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.randint(0, 2, (batch_shape[0], 1))

def make_data_tf(batch_shape, n_batches, iters):
    data = np.random.randn(n_batches, *batch_shape),
    trgt = np.random.randint(0, 2, (n_batches, batch_shape[0], 1))
    return tf.data.Dataset.from_tensor_slices((data, trgt))#.repeat(iters)

batch_shape_small  = (32, 140,   30)
batch_shape_medium = (32, 1400,  30)
batch_shape_large  = (32, 14000, 30)

batch_shapes = batch_shape_small, batch_shape_medium, batch_shape_large
make_model_fns = make_model_small, make_model_medium, make_model_large
iterations = [200, 100, 50]
shape_names = ["Small data",  "Medium data",  "Large data"]
model_names = ["Small model", "Medium model", "Large model"]

def test_all(fit=False, tf_dataset=False):
    for model_fn, model_name, iters in zip(make_model_fns, model_names, iterations):
        for batch_shape, shape_name in zip(batch_shapes, shape_names):
            if (model_fn is make_model_large) and (batch_shape is batch_shape_small):
                continue
            reset_seeds(reset_graph_with_backend=K)
            if tf_dataset:
                data = make_data_tf(batch_shape, iters, iters)
            else:
                data = make_data(batch_shape)
            model = model_fn(batch_shape)

            if fit:
                if tf_dataset:
                    model.train_on_batch(data.take(1))
                    t0 = time()
                    model.fit(data, steps_per_epoch=iters)
                    print("Time/iter: %.4f sec" % ((time() - t0) / iters))
                else:
                    model.train_on_batch(*data)
                    timeit(model.fit, iters, *data, _verbose=1, verbose=0)
            else:
                model.train_on_batch(*data)
                timeit(model.train_on_batch, iters, *data, _verbose=1)
            cprint(">> {}, {} done <<\n".format(model_name, shape_name), 'blue')
            del model

test_all(fit=True, tf_dataset=False)

THIS ANSWER: aims to provide a detailed, graph/hardware-level description of the issue – including TF2 vs. TF1 train loops, input data processors, and Eager vs. Graph mode executions. For an issue summary & resolution guidelines, see my other answer.


PERFORMANCE VERDICT: sometimes one is faster, sometimes the other, depending on configuration. As far as TF2 vs TF1 goes, they’re about on par on average, but significant config-based differences do exist, and TF1 trumps TF2 more often than vice versa. See “BENCHMARKING” below.


EAGER VS. GRAPH: the meat of this entire answer for some: TF2’s eager is slower than TF1’s, according to my testing. Details further down.

The fundamental difference between the two is: Graph sets up a computational network proactively, and executes when ‘told to’ – whereas Eager executes everything upon creation. But the story only begins here:

  • Eager is NOT devoid of Graph, and may in fact be mostly Graph, contrary to expectation. What it largely is, is executed Graph – this includes model & optimizer weights, comprising a great portion of the graph.

  • Eager rebuilds part of own graph at execution; direct consequence of Graph not being fully built — see profiler results. This has a computational overhead.

  • Eager is slower w/ Numpy inputs; per this Git comment & code, Numpy inputs in Eager include the overhead cost of copying tensors from CPU to GPU. Stepping through source code, data handling differences are clear; Eager directly passes Numpy, while Graph passes tensors which then evaluate to Numpy; uncertain of the exact process, but latter should involve GPU-level optimizations

  • TF2 Eager is slower than TF1 Eager – this is… unexpected. See benchmarking results below. Differences span from negligible to significant, but are consistent. Unsure why it’s the case – if a TF dev clarifies, will update answer.


TF2 vs. TF1: quoting relevant portions of a TF dev’s, Q. Scott Zhu’s, response – w/ bit of my emphasis & rewording:

In eager, the runtime needs to execute the ops and return the numerical value for every line of python code. The nature of single step execution causes it to be slow.

In TF2, Keras leverages tf.function to build its graph for training, eval and prediction. We call them “execution function” for the model. In TF1, the “execution function” was a FuncGraph, which shared some common component as TF function, but has a different implementation.

During the process, we somehow left an incorrect implementation for train_on_batch(), test_on_batch() and predict_on_batch(). They are still numerically correct, but the execution function for x_on_batch is a pure python function, rather than a tf.function wrapped python function. This will cause slowness

In TF2, we convert all input data into a tf.data.Dataset, by which we can unify our execution function to handle the single type of the inputs. There might be some overhead in the dataset conversion, and I think this is a one-time only overhead, rather than a per-batch cost

With the last sentence of last paragraph above, and last clause of below paragraph:

To overcome the slowness in eager mode, we have @tf.function, which will turn a python function into a graph. When feed numerical value like np array, the body of the tf.function is converted into static graph, being optimized, and return the final value, which is fast and should have similar performance as TF1 graph mode.

I disagree – per my profiling results, which show Eager’s input data processing to be substantially slower than Graph’s. Also, unsure about tf.data.Dataset in particular, but Eager does repeatedly call multiple of the same data conversion methods – see profiler.

Lastly, dev’s linked commit: Significant number of changes to support the Keras v2 loops.


Train Loops: depending on (1) Eager vs. Graph; (2) input data format, training in will proceed with a distinct train loop – in TF2, _select_training_loop(), training.py, one of:

training_v2.Loop()
training_distributed.DistributionMultiWorkerTrainingLoop(
              training_v2.Loop()) # multi-worker mode
# Case 1: distribution strategy
training_distributed.DistributionMultiWorkerTrainingLoop(
            training_distributed.DistributionSingleWorkerTrainingLoop())
# Case 2: generator-like. Input is Python generator, or Sequence object,
# or a non-distributed Dataset or iterator in eager execution.
training_generator.GeneratorOrSequenceTrainingLoop()
training_generator.EagerDatasetOrIteratorTrainingLoop()
# Case 3: Symbolic tensors or Numpy array-like. This includes Datasets and iterators 
# in graph mode (since they generate symbolic tensors).
training_generator.GeneratorLikeTrainingLoop() # Eager
training_arrays.ArrayLikeTrainingLoop() # Graph

Each handles resource allocation differently, and bears consequences on performance & capability.


Train Loops: fit vs train_on_batch, keras vs. tf.keras: each of the four uses different train loops, though perhaps not in every possible combination. kerasfit, for example, uses a form of fit_loop, e.g. training_arrays.fit_loop(), and its train_on_batch may use K.function(). tf.keras has a more sophisticated hierarchy described in part in previous section.


Train Loops: documentation — relevant source docstring on some of the different execution methods:

Unlike other TensorFlow operations, we don’t convert python numerical inputs to tensors. Moreover, a new graph is generated for each distinct python numerical value

function instantiates a separate graph for every unique set of input shapes and datatypes.

A single tf.function object might need to map to multiple computation graphs under the hood. This should be visible only as performance (tracing graphs has a nonzero computational and memory cost)


Input data processors: similar to above, the processor is selected case-by-case, depending on internal flags set according to runtime configurations (execution mode, data format, distribution strategy). The simplest case’s with Eager, which works directly w/ Numpy arrays. For some specific examples, see this answer.


MODEL SIZE, DATA SIZE:

  • Is decisive; no single configuration crowned itself atop all model & data sizes.
  • Data size relative to model size is important; for small data & model, data transfer (e.g. CPU to GPU) overhead can dominate. Likewise, small overhead processors can run slower on large data per data conversion time dominating (see convert_to_tensor in “PROFILER”)
  • Speed differs per train loops’ and input data processors’ differing means of handling resources.

BENCHMARKS: the grinded meat. — Word DocumentExcel Spreadsheet


Terminology:

  • %-less numbers are all seconds
  • % computed as (1 - longer_time / shorter_time)*100; rationale: we’re interested by what factor one is faster than the other; shorter / longer is actually a non-linear relation, not useful for direct comparison
  • % sign determination:
    • TF2 vs TF1: + if TF2 is faster
    • GvE (Graph vs. Eager): + if Graph is faster
  • TF2 = TensorFlow 2.0.0 + Keras 2.3.1; TF1 = TensorFlow 1.14.0 + Keras 2.2.5

PROFILER:


PROFILER – Explanation: Spyder 3.3.6 IDE profiler.

  • Some functions are repeated in nests of others; hence, it’s hard to track down the exact separation between “data processing” and “training” functions, so there will be some overlap – as pronounced in the very last result.

  • % figures computed w.r.t. runtime minus build time

  • Build time computed by summing all (unique) runtimes which were called 1 or 2 times
  • Train time computed by summing all (unique) runtimes which were called the same # of times as the # of iterations, and some of their nests’ runtimes
  • Functions are profiled according to their original names, unfortunately (i.e. _func = func will profile as func), which mixes in build time – hence the need to exclude it

TESTING ENVIRONMENT:

  • Executed code at bottom w/ minimal background tasks running
  • GPU was “warmed up” w/ a few iterations before timing iterations, as suggested in this post
  • CUDA 10.0.130, cuDNN 7.6.0, TensorFlow 1.14.0, & TensorFlow 2.0.0 built from source, plus Anaconda
  • Python 3.7.4, Spyder 3.3.6 IDE
  • GTX 1070, Windows 10, 24GB DDR4 2.4-MHz RAM, i7-7700HQ 2.8-GHz CPU

METHODOLOGY:

  • Benchmark ‘small’, ‘medium’, & ‘large’ model & data sizes
  • Fix # of parameters for each model size, independent of input data size
  • “Larger” model has more parameters and layers
  • “Larger” data has a longer sequence, but same batch_size and num_channels
  • Models only use Conv1D, Dense ‘learnable’ layers; RNNs avoided per TF-version implem. differences
  • Always ran one train fit outside of benchmarking loop, to omit model & optimizer graph building
  • Not using sparse data (e.g. layers.Embedding()) or sparse targets (e.g. SparseCategoricalCrossEntropy()

LIMITATIONS: a “complete” answer would explain every possible train loop & iterator, but that’s surely beyond my time ability, nonexistent paycheck, or general necessity. The results are only as good as the methodology – interpret with an open mind.


CODE:

import numpy as np
import tensorflow as tf
import random
from termcolor import cprint
from time import time

from tensorflow.keras.layers import Input, Dense, Conv1D
from tensorflow.keras.layers import Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
#from keras.layers import Input, Dense, Conv1D
#from keras.layers import Dropout, GlobalAveragePooling1D
#from keras.models import Model 
#from keras.optimizers import Adam
#import keras.backend as K

#tf.compat.v1.disable_eager_execution()
#tf.enable_eager_execution()

def reset_seeds(reset_graph_with_backend=None, verbose=1):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        if verbose:
            print("KERAS AND TENSORFLOW GRAPHS RESET")

    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    if verbose:
        print("RANDOM SEEDS RESET")

print("TF version: {}".format(tf.__version__))
reset_seeds()

def timeit(func, iterations, *args, _verbose=0, **kwargs):
    t0 = time()
    for _ in range(iterations):
        func(*args, **kwargs)
        print(end='.'*int(_verbose))
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_model_small(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 40, strides=4, padding='same')(ipt)
    x     = GlobalAveragePooling1D()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_medium(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ipt
    for filters in [64, 128, 256, 256, 128, 64]:
        x  = Conv1D(filters, 20, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_large(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(64,  400, strides=4, padding='valid')(ipt)
    x     = Conv1D(128, 200, strides=1, padding='valid')(x)
    for _ in range(40):
        x = Conv1D(256,  12, strides=1, padding='same')(x)
    x     = Conv1D(512,  20, strides=2, padding='valid')(x)
    x     = Conv1D(1028, 10, strides=2, padding='valid')(x)
    x     = Conv1D(256,   1, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)    
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.randint(0, 2, (batch_shape[0], 1))

def make_data_tf(batch_shape, n_batches, iters):
    data = np.random.randn(n_batches, *batch_shape),
    trgt = np.random.randint(0, 2, (n_batches, batch_shape[0], 1))
    return tf.data.Dataset.from_tensor_slices((data, trgt))#.repeat(iters)

batch_shape_small  = (32, 140,   30)
batch_shape_medium = (32, 1400,  30)
batch_shape_large  = (32, 14000, 30)

batch_shapes = batch_shape_small, batch_shape_medium, batch_shape_large
make_model_fns = make_model_small, make_model_medium, make_model_large
iterations = [200, 100, 50]
shape_names = ["Small data",  "Medium data",  "Large data"]
model_names = ["Small model", "Medium model", "Large model"]

def test_all(fit=False, tf_dataset=False):
    for model_fn, model_name, iters in zip(make_model_fns, model_names, iterations):
        for batch_shape, shape_name in zip(batch_shapes, shape_names):
            if (model_fn is make_model_large) and (batch_shape is batch_shape_small):
                continue
            reset_seeds(reset_graph_with_backend=K)
            if tf_dataset:
                data = make_data_tf(batch_shape, iters, iters)
            else:
                data = make_data(batch_shape)
            model = model_fn(batch_shape)

            if fit:
                if tf_dataset:
                    model.train_on_batch(data.take(1))
                    t0 = time()
                    model.fit(data, steps_per_epoch=iters)
                    print("Time/iter: %.4f sec" % ((time() - t0) / iters))
                else:
                    model.train_on_batch(*data)
                    timeit(model.fit, iters, *data, _verbose=1, verbose=0)
            else:
                model.train_on_batch(*data)
                timeit(model.train_on_batch, iters, *data, _verbose=1)
            cprint(">> {}, {} done <<\n".format(model_name, shape_name), 'blue')
            del model

test_all(fit=True, tf_dataset=False)

如何在Python中打破一系列链接方法?

问题:如何在Python中打破一系列链接方法?

我有以下代码行(不要怪罪命名约定,它们不是我的):

subkeyword = Session.query(
    Subkeyword.subkeyword_id, Subkeyword.subkeyword_word
).filter_by(
    subkeyword_company_id=self.e_company_id
).filter_by(
    subkeyword_word=subkeyword_word
).filter_by(
    subkeyword_active=True
).one()

我不喜欢它的外观(不太可读),但是在这种情况下,我没有更好的主意将行数限制为79个字符。有没有更好的方法来破解它(最好没有反斜杠)?

I have a line of the following code (don’t blame for naming conventions, they are not mine):

subkeyword = Session.query(
    Subkeyword.subkeyword_id, Subkeyword.subkeyword_word
).filter_by(
    subkeyword_company_id=self.e_company_id
).filter_by(
    subkeyword_word=subkeyword_word
).filter_by(
    subkeyword_active=True
).one()

I don’t like how it looks like (not too readable) but I don’t have any better idea to limit lines to 79 characters in this situation. Is there a better way of breaking it (preferably without backslashes)?


回答 0

您可以使用其他括号:

subkeyword = (
        Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
        .filter_by(subkeyword_company_id=self.e_company_id)
        .filter_by(subkeyword_word=subkeyword_word)
        .filter_by(subkeyword_active=True)
        .one()
    )

You could use additional parenthesis:

subkeyword = (
        Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
        .filter_by(subkeyword_company_id=self.e_company_id)
        .filter_by(subkeyword_word=subkeyword_word)
        .filter_by(subkeyword_active=True)
        .one()
    )

回答 1

在这种情况下,最好使用连续行字符代替括号。随着方法名称变长以及方法开始采用参数,对这种样式的需求变得更加明显:

subkeyword = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id)          \
                    .filter_by(subkeyword_word=subkeyword_word)                  \
                    .filter_by(subkeyword_active=True)                           \
                    .one()

PEP 8旨在以一种常识的方式进行解释,并兼顾实用性和美观性。很高兴违反任何导致难看或难以阅读代码的PEP 8准则。

话虽如此,如果您经常发现自己与PEP 8不符,则可能表明存在一些可读性问题超出了对空白的选择:-)

This is a case where a line continuation character is preferred to open parentheses. The need for this style becomes more obvious as method names get longer and as methods start taking arguments:

subkeyword = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id)          \
                    .filter_by(subkeyword_word=subkeyword_word)                  \
                    .filter_by(subkeyword_active=True)                           \
                    .one()

PEP 8 is intend to be interpreted with a measure of common-sense and an eye for both the practical and the beautiful. Happily violate any PEP 8 guideline that results in ugly or hard to read code.

That being said, if you frequently find yourself at odds with PEP 8, it may be a sign that there are readability issues that transcend your choice of whitespace :-)


回答 2

我个人的选择是:

子关键字= Session.query(
    Subkeyword.subkeyword_id,
    Subkeyword.subkeyword_word,
)。过滤(
    subkeyword_company_id = self.e_company_id,
    subkeyword_word = subkeyword_word,
    subkeyword_active =真实,
)。之一()

My personal choice would be:

subkeyword = Session.query(
    Subkeyword.subkeyword_id,
    Subkeyword.subkeyword_word,
).filter_by(
    subkeyword_company_id=self.e_company_id,
    subkeyword_word=subkeyword_word,
    subkeyword_active=True,
).one()

回答 3

只需存储中间结果/对象并在其上调用下一个方法,例如

q = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
q = q.filter_by(subkeyword_company_id=self.e_company_id)
q = q.filter_by(subkeyword_word=subkeyword_word)
q = q.filter_by(subkeyword_active=True)
subkeyword = q.one()

Just store the intermediate result/object and invoke the next method on it, e.g.

q = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
q = q.filter_by(subkeyword_company_id=self.e_company_id)
q = q.filter_by(subkeyword_word=subkeyword_word)
q = q.filter_by(subkeyword_active=True)
subkeyword = q.one()

回答 4

根据Python语言参考,
您可以使用反斜杠。
或简单地打破它。如果括号未配对,则python不会将其视为一行。在这种情况下,以下行的缩进无关紧要。

According to Python Language Reference
You can use a backslash.
Or simply break it. If a bracket is not paired, python will not treat that as a line. And under such circumstance, the indentation of following lines doesn’t matter.


回答 5

它与其他人提供的解决方案有些不同,但我的最爱,因为它有时会导致漂亮的元编程。

base = [Subkeyword.subkeyword_id, Subkeyword_word]
search = {
    'subkeyword_company_id':self.e_company_id,
    'subkeyword_word':subkeyword_word,
    'subkeyword_active':True,
    }
subkeyword = Session.query(*base).filter_by(**search).one()

这是构建搜索的好方法。浏览条件列表,从复杂的查询表单(或基于字符串的有关用户正在寻找的内容的推论)中进行挖掘,然后将字典分解到过滤器中。

It’s a bit of a different solution than provided by others but a favorite of mine since it leads to nifty metaprogramming sometimes.

base = [Subkeyword.subkeyword_id, Subkeyword_word]
search = {
    'subkeyword_company_id':self.e_company_id,
    'subkeyword_word':subkeyword_word,
    'subkeyword_active':True,
    }
subkeyword = Session.query(*base).filter_by(**search).one()

This is a nice technique for building searches. Go through a list of conditionals to mine from your complex query form (or string-based deductions about what the user is looking for), then just explode the dictionary into the filter.


回答 6

您似乎在使用SQLAlchemy,如果为true,则sqlalchemy.orm.query.Query.filter_by()方法采用多个关键字参数,因此您可以这样编写:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id,
                               subkeyword_word=subkeyword_word,
                               subkeyword_active=True) \
                    .one()

但这会更好:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word)
subkeyword = subkeyword.filter_by(subkeyword_company_id=self.e_company_id,
                                  subkeyword_word=subkeyword_word,
                                  subkeyword_active=True)
subkeuword = subkeyword.one()

You seems using SQLAlchemy, if it is true, sqlalchemy.orm.query.Query.filter_by() method takes multiple keyword arguments, so you could write like:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id,
                               subkeyword_word=subkeyword_word,
                               subkeyword_active=True) \
                    .one()

But it would be better:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word)
subkeyword = subkeyword.filter_by(subkeyword_company_id=self.e_company_id,
                                  subkeyword_word=subkeyword_word,
                                  subkeyword_active=True)
subkeuword = subkeyword.one()

回答 7

我喜欢将参数缩进两个块,将语句缩进一个块,如下所示:

for image_pathname in image_directory.iterdir():
    image = cv2.imread(str(image_pathname))
    input_image = np.resize(
            image, (height, width, 3)
        ).transpose((2,0,1)).reshape(1, 3, height, width)
    net.forward_all(data=input_image)
    segmentation_index = net.blobs[
            'argmax'
        ].data.squeeze().transpose(1,2,0).astype(np.uint8)
    segmentation = np.empty(segmentation_index.shape, dtype=np.uint8)
    cv2.LUT(segmentation_index, label_colours, segmentation)
    prediction_pathname = prediction_directory / image_pathname.name
    cv2.imwrite(str(prediction_pathname), segmentation)

I like to indent the arguments by two blocks, and the statement by one block, like these:

for image_pathname in image_directory.iterdir():
    image = cv2.imread(str(image_pathname))
    input_image = np.resize(
            image, (height, width, 3)
        ).transpose((2,0,1)).reshape(1, 3, height, width)
    net.forward_all(data=input_image)
    segmentation_index = net.blobs[
            'argmax'
        ].data.squeeze().transpose(1,2,0).astype(np.uint8)
    segmentation = np.empty(segmentation_index.shape, dtype=np.uint8)
    cv2.LUT(segmentation_index, label_colours, segmentation)
    prediction_pathname = prediction_directory / image_pathname.name
    cv2.imwrite(str(prediction_pathname), segmentation)

给定一百万个数字的字符串,返回所有重复的3位数字

问题:给定一百万个数字的字符串,返回所有重复的3位数字

几个月前,我在纽约接受了一家对冲基金公司的采访,不幸的是,我没有获得数据/软件工程师的实习机会。(他们还要求解决方案使用Python。)

我几乎搞砸了第一次面试的问题…

问题:给定一百万个数字的字符串(例如,Pi),编写一个函数/程序,该函数/程序返回所有重复的3位数字,并且重复次数大于1

例如:如果字符串为:123412345123456则函数/程序将返回:

123 - 3 times
234 - 3 times
345 - 2 times

在面试失败后,他们没有给我解决方案,但他们确实告诉我,解决方案的时间复杂度恒定为1000,因为所有可能的结果都介于:

000-> 999

现在我正在考虑它,我认为不可能提出一个恒定时间算法。是吗?

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem…

Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3 digit numbers and number of repetition greater than 1

For example: if the string was: 123412345123456 then the function/program would return:

123 - 3 times
234 - 3 times
345 - 2 times

They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:

000 –> 999

Now that I’m thinking about it, I don’t think it’s possible to come up with a constant time algorithm. Is it?


回答 0

您轻轻松松下手,您可能不想在对量子点不了解基本算法的对冲基金中工作:-)

在这种情况下,如果您需要至少访问一次每个元素,则无法处理任意大小的数据结构O(1)。在这种情况下,您可以期望的最好是字符串的长度。O(n)n

虽然,顺便说一句,标称O(n)算法O(1)对一个固定的输入大小,这样,在技术上,他们可能已经在这里正确的。但是,这通常不是人们使用复杂度分析的方式。

在我看来,您可能会在很多方面给他们留下深刻的印象。

首先,通知他们,它是不是能够做到这一点的O(1),除非你使用上面的“犯罪嫌疑人”说理给出。

其次,通过提供Pythonic代码来展示您的精英技能,例如:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

输出:

[(123, 3), (234, 3), (345, 2)]

尽管您当然可以将输出格式修改为所需的任何格式。

最后,通过告诉他们解决方案几乎没有问题O(n),因为上面的代码在不到半秒钟的时间内即可提供一百万个数字字符串的结果。它似乎也线性地缩放,因为一个10,000,000个字符的字符串需要3.5秒,而一个100,000,000个字符的字符串需要36秒。

而且,如果他们需要的更好,则可以采用多种方法并行化此类内容,从而可以大大加快这种处理速度。

当然,由于GIL的缘故,不在单个 Python解释器中,但是您可以将字符串拆分成类似的字符(vv为了正确处理边界区域,必须用表示的重叠):

    vv
123412  vv
    123451
        5123456

您可以将它们种出以分开工作,然后再合并结果。

输入的拆分和输出的合并很可能会用小字符串(甚至可能是百万位数字的字符串)淹没任何节省的时间,但是,对于更大的数据集,这很可能会有所作为。当然,我通常的口号是:“不要猜测”


此口头禅也适用于其他可能性,例如完全绕过Python并使用可能更快的其他语言。

例如,以下C代码,在相同的硬件作为较早Python代码运行,处理一个在0.6秒万位,大致为Python代码处理的相同的时间量之一百万。换句话说,速度快:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

You got off lightly, you probably don’t want to be working for a hedge fund where the quants don’t understand basic algorithms :-)

There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.

Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that’s not usually how people use complexity analysis.

It appears to me you could have impressed them in a number of ways.

First, by informing them that it’s not possible to do it in O(1), unless you use the “suspect” reasoning given above.

Second, by showing your elite skills by providing Pythonic code such as:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

This outputs:

[(123, 3), (234, 3), (345, 2)]

though you could, of course, modify the output format to anything you desire.

And, finally, by telling them there’s almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.

And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.

Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):

    vv
123412  vv
    123451
        5123456

You can farm these out to separate workers and combine the results afterwards.

The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of “measure, don’t guess” applies here, of course.


This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.

For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

回答 1

固定时间是不可能的。所有一百万个数字都需要至少被查看一次,因此这是时间复杂度O(n),在这种情况下,n =一百万。

对于简单的O(n)解决方案,创建一个大小为1000的数组,该数组表示每个可能的3位数字的出现次数。一次前进1位数字,第一个索引== 0,最后一个索引== 999997,并递增array [3位数字]以创建直方图(每个可能的3位数字出现的次数)。然后输出计数> 1的数组内容。

Constant time isn’t possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.

For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.


回答 2

一百万对于我在下面给出的答案来说很小。只期望您必须能够不间断地在面试中运行解决方案,然后以下操作将在不到两秒钟的时间内完成并给出所需的结果:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

希望访问者可以使用标准库collections.Counter类。

并行执行版本

我为此写了一篇博客文章,并提供了更多解释。

A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.

Parallel execution version

I wrote a blog post on this with more explanation.


回答 3

简单的O(n)解决方案是对每个3位数字进行计数:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

这将搜索全部100万个数字1000次。

仅遍历数字一次:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

时序显示,仅对索引进行一次迭代是使用的两倍count

The simple O(n) solution would be to count each 3-digit number:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

This would search through all 1 million digits 1000 times.

Traversing the digits only once:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

Timing shows that iterating only once over the index is twice as fast as using count.


回答 4

这是“共识” O(n)算法的NumPy实现:遍历所有三元组和bin。通过遇到“ 385”,将bin加到bin [3,8,5](这是一个O(1)操作)中来完成合并。垃圾箱排列成一个10x10x10立方体。由于合并已完全矢量化,因此代码中没有循环。

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

毫不奇怪,在大型数据集上,NumPy比@Daniel的纯Python解决方案要快一点。样本输出:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

Here is a NumPy implementation of the “consensus” O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say “385”, adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

Unsurprisingly, NumPy is a bit faster than @Daniel’s pure Python solution on large data sets. Sample output:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

回答 5

我将解决以下问题:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

应用于示例字符串,将生成:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

该解决方案在O(n)中运行,因为n是提供的字符串的长度,并且我认为这是您可以获得的最佳结果。

I would solve the problem as follows:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

Applied to your example string, this yields:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.


回答 6

根据我的理解,您无法在固定时间内获得解决方案。至少需要通过一百万个数字(假设它是一个字符串)。您可以对百万个长度数字的位数进行三位数的滚动迭代,如果哈希键已经存在,则将其增加1;如果哈希密钥不存在,则创建一个新的哈希键(由值1初始化)。词典。

该代码将如下所示:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

您可以筛选出项值大于1的键。

As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn’t exists already in the dictionary.

The code will look something like this:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

You can filter down to the keys which have item value greater than 1.


回答 7

如另一个答案中所述,您不能在固定时间内执行此算法,因为您必须查看至少n位数字。线性时间是最快的。

但是,该算法可以在O(1)空间中完成。您只需要存储每个3位数字的计数,因此您需要一个包含1000个条目的数组。然后,您可以输入号码。

我的猜测是,当面试官给您解决方案时,他们会误以为是,或者当他们说“恒定空间”时,您会误以为“恒定时间”。

As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.

However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.

My guess is that either the interviewer misspoke when they gave you the solution, or you misheard “constant time” when they said “constant space.”


回答 8

这是我的答案:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

数组查找方法非常快(甚至比@ paul-panzer的numpy方法还快!)。当然,它作弊是因为它在完成后并未在技术上完成,因为它正在返回生成器。它也不必检查每次迭代是否已经存在该值,这可能会有所帮助。

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

Here’s my answer:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

The array lookup method is very fast (even faster than @paul-panzer’s numpy method!). Of course, it cheats since it isn’t technicailly finished after it completes, because it’s returning a generator. It also doesn’t have to check every iteration if the value already exists, which is likely to help a lot.

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

回答 9

图片作为答案:

看起来像一个滑动窗口。

Image as answer:

Looks like a sliding window.


回答 10

这是我的解决方案:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

在for循环中具有一些创造力(例如,带有True / False / None的附加查找列表),您应该可以摆脱最后一行,因为您只想创建一个字典,直到我们访问该点为止。希望能帮助到你 :)

Here is my solution:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point. Hope it helps :)


回答 11

-从C角度讲。-您可以得到一个int 3-d数组结果[10] [10] [10]; -从第0个位置转到第n-4个位置,其中n是字符串数组的大小。-在每个位置上,检查当前,下一个和下一个下一个。-将cntr增加为resutls [current] [next] [next的下一个] ++;-打印的值

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-现在是O(n)时间,不涉及比较。-您可以在此处运行一些并行的东西,方法是对数组进行分区并计算分区之间的匹配项。

-Telling from the perspective of C. -You can have an int 3-d array results[10][10][10]; -Go from 0th location to n-4th location, where n being the size of the string array. -On each location, check the current, next and next’s next. -Increment the cntr as resutls[current][next][next’s next]++; -Print the values of

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-It is O(n) time, there is no comparisons involved. -You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.


回答 12

inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count

创建简单的python网络服务的最佳方法

问题:创建简单的python网络服务的最佳方法

我已经使用python多年了,但是我对python Web编程的经验很少。我想创建一个非常简单的Web服务,该服务公开来自现有python脚本的一些功能以供公司使用。它可能会在csv中返回结果。什么是最快的方法?如果它影响您的建议,那么我很可能会在此之后添加更多功能。

I’ve been using python for years, but I have little experience with python web programming. I’d like to create a very simple web service that exposes some functionality from an existing python script for use within my company. It will likely return the results in csv. What’s the quickest way to get something up? If it affects your suggestion, I will likely be adding more functionality to this, down the road.


回答 0

看看werkzeug。Werkzeug最初是WSGI应用程序各种实用程序的简单集合,现已成为最高级的WSGI实用程序模块之一。它包括功能强大的调试器,功能齐全的请求和响应对象,用于处理实体标签的HTTP实用程序,高速缓存控制标头,HTTP日期,cookie处理,文件上传,功能强大的URL路由系统和一堆由社区提供的附加模块。

它包括许多可与http配合使用的出色工具,并且具有可以在不同环境(cgi,fcgi,apache / mod_wsgi或用于调试的普通简单python服务器)中与wsgi一起使用的优点。

Have a look at werkzeug. Werkzeug started as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility modules. It includes a powerful debugger, full featured request and response objects, HTTP utilities to handle entity tags, cache control headers, HTTP dates, cookie handling, file uploads, a powerful URL routing system and a bunch of community contributed addon modules.

It includes lots of cool tools to work with http and has the advantage that you can use it with wsgi in different environments (cgi, fcgi, apache/mod_wsgi or with a plain simple python server for debugging).


回答 1

web.py可能是最简单的Web框架。“裸” CGI比较简单,但是在提供一项实际上可以做某事的服务时,您完全是一个人。

“你好,世界!” 根据web.py是不是比裸CGI版本更长的时间,但它增加了URL映射,HTTP命令的区别,和查询参数解析为免费

import web

urls = (
    '/(.*)', 'hello'
)
app = web.application(urls, globals())

class hello:        
    def GET(self, name):
        if not name: 
            name = 'world'
        return 'Hello, ' + name + '!'

if __name__ == "__main__":
    app.run()

web.py is probably the simplest web framework out there. “Bare” CGI is simpler, but you’re completely on your own when it comes to making a service that actually does something.

“Hello, World!” according to web.py isn’t much longer than an bare CGI version, but it adds URL mapping, HTTP command distinction, and query parameter parsing for free:

import web

urls = (
    '/(.*)', 'hello'
)
app = web.application(urls, globals())

class hello:        
    def GET(self, name):
        if not name: 
            name = 'world'
        return 'Hello, ' + name + '!'

if __name__ == "__main__":
    app.run()

回答 2

在线获取Python脚本的最简单方法是使用CGI:

#!/usr/bin/python

print "Content-type: text/html"
print

print "<p>Hello world.</p>"

将该代码放入驻留在Web服务器CGI目录中的脚本中,使其可执行并运行。cgi当您需要接受用户的参数时,该模块具有许多有用的实用程序。

The simplest way to get a Python script online is to use CGI:

#!/usr/bin/python

print "Content-type: text/html"
print

print "<p>Hello world.</p>"

Put that code in a script that lives in your web server CGI directory, make it executable, and run it. The cgi module has a number of useful utilities when you need to accept parameters from the user.


回答 3

原始CGI有点痛苦,Django有点重量级。关于它,有许多更简单,更轻便的框架,例如CherryPy。值得一看。

Raw CGI is kind of a pain, Django is kind of heavyweight. There are a number of simpler, lighter frameworks about, e.g. CherryPy. It’s worth looking around a bit.


回答 4

看一下WSGI参考实现。您已经在Python库中拥有它。这很简单。

Look at the WSGI reference implementation. You already have it in your Python libraries. It’s quite simple.


回答 5

如果您指的是“ Web服务”,则其他程序SimpleXMLRPCServer可能访问了某些 适合您的东西。自版本2.2起,所有Python安装中均包含该功能。

对于简单的人类可访问的东西,我通常使用Python的SimpleHTTPServer,它也随每次安装一起提供。显然,您还可以通过客户端程序访问SimpleHTTPServer。

If you mean with “Web Service” something accessed by other Programms SimpleXMLRPCServer might be right for you. It is included with every Python install since Version 2.2.

For Simple human accessible things I usually use Pythons SimpleHTTPServer which also comes with every install. Obviously you also could access SimpleHTTPServer by client programs.


回答 6

如果您拥有一个好的Web框架,生活将会很简单。Django中的Web服务很简单。定义模型,编写返回CSV文档的视图函数。跳过模板。

Life is simple if you get a good web framework. Web services in Django are easy. Define your model, write view functions that return your CSV documents. Skip the templates.


回答 7

如果您用SOAP / WSDL表示“ Web服务”,则可能需要看一下 使用Python和SOAPpy生成WSDL

If you mean “web service” in SOAP/WSDL sense, you might want to look at Generating a WSDL using Python and SOAPpy


回答 8

也许是扭曲的 http://twistedmatrix.com/trac/


在Python中转义HTML的最简单方法是什么?

问题:在Python中转义HTML的最简单方法是什么?

cgi.escape似乎是一种可能的选择。它运作良好吗?有什么更好的东西吗?

cgi.escape seems like one possible choice. Does it work well? Is there something that is considered better?


回答 0

cgi.escape很好 它逃脱了:

  • <&lt;
  • >&gt;
  • &&amp;

对于所有HTML而言,这就足够了。

编辑:如果您有非ASCII字符,您还想转义,以便包含在使用不同编码的另一个编码文档中,如Craig所说,只需使用:

data.encode('ascii', 'xmlcharrefreplace')

不要忘了解码dataunicode第一,使用任何编码它编码的。

但是根据我的经验,如果您unicode从头开始一直都在工作,那么这种编码是没有用的。只需在文档头中指定的编码末尾进行编码(utf-8以实现最大兼容性)。

例:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;

另外值得一提的(感谢Greg)是额外的quote参数cgi.escape。将其设置为时Truecgi.escape还转义双引号字符("),因此您可以在XML / HTML属性中使用结果值。

编辑:请注意,cgi.escape已在Python 3.2中弃用,转而使用html.escape,它的功能相同,但quote默认情况下为True。

cgi.escape is fine. It escapes:

  • < to &lt;
  • > to &gt;
  • & to &amp;

That is enough for all HTML.

EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:

data.encode('ascii', 'xmlcharrefreplace')

Don’t forget to decode data to unicode first, using whatever encoding it was encoded.

However in my experience that kind of encoding is useless if you just work with unicode all the time from start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).

Example:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;

Also worth of note (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in a XML/HTML attribute.

EDIT: Note that cgi.escape has been deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.


回答 1

在Python 3.2中,新 html,引入模块,该模块用于从HTML标记转义保留字符。

它具有一个功能escape()

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'

In Python 3.2 a new html module was introduced, which is used for escaping reserved characters from HTML markup.

It has one function escape():

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'

回答 2

如果您希望在URL中转义HTML:

这可能不是OP想要的(问题并没有明确指出转义是在哪种上下文中使用的),但是Python的本机库urllib有一种方法可以转义需要安全包含在URL中的HTML实体。

以下是一个示例:

#!/usr/bin/python
from urllib import quote

x = '+<>^&'
print quote(x) # prints '%2B%3C%3E%5E%26'

在这里找到文件

If you wish to escape HTML in a URL:

This is probably NOT what the OP wanted (the question doesn’t clearly indicate in which context the escaping is meant to be used), but Python’s native library urllib has a method to escape HTML entities that need to be included in a URL safely.

The following is an example:

#!/usr/bin/python
from urllib import quote

x = '+<>^&'
print quote(x) # prints '%2B%3C%3E%5E%26'

Find docs here


回答 3

还有出色的markupsafe软件包

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

markupsafe程序包经过精心设计,并且可能是逃避转义的最通用,最Python化的方法,恕我直言,因为:

  1. return(Markup)是从unicode派生的类(即isinstance(escape('str'), unicode) == True
  2. 它可以正确处理unicode输入
  3. 它适用于Python(2.6、2.7、3.3和pypy)
  4. 它尊重对象(即具有__html__属性的对象)和模板重载(__html_format__)的自定义方法。

There is also the excellent markupsafe package.

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

The markupsafe package is well engineered, and probably the most versatile and Pythonic way to go about escaping, IMHO, because:

  1. the return (Markup) is a class derived from unicode (i.e. isinstance(escape('str'), unicode) == True
  2. it properly handles unicode input
  3. it works in Python (2.6, 2.7, 3.3, and pypy)
  4. it respects custom methods of objects (i.e. objects with a __html__ property) and template overloads (__html_format__).

回答 4

cgi.escape 从转义HTML标记和字符实体的有限意义上讲,应该可以逃脱HTML。

但是,您可能还必须考虑编码问题:如果要引用的HTML在特定的编码中包含非ASCII字符,那么还必须注意在引用时要合理地表示这些字符。也许您可以将它们转换为实体。否则,您应确保在“源” HTML和嵌入页面之间进行正确的编码转换,以避免损坏非ASCII字符。

cgi.escape should be good to escape HTML in the limited sense of escaping the HTML tags and character entities.

But you might have to also consider encoding issues: if the HTML you want to quote has non-ASCII characters in a particular encoding, then you would also have to take care that you represent those sensibly when quoting. Perhaps you could convert them to entities. Otherwise you should ensure that the correct encoding translations are done between the “source” HTML and the page it’s embedded in, to avoid corrupting the non-ASCII characters.


回答 5

没有库,纯python,可以安全地将文本转义为html文本:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).encode('ascii', 'xmlcharrefreplace')

No libraries, pure python, safely escapes text into html text:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).encode('ascii', 'xmlcharrefreplace')

回答 6

cgi.escape 扩展的

此版本进行了改进cgi.escape。它还保留空格和换行符。返回一个unicode字符串。

def escape_html(text):
    """escape strings for display in HTML"""
    return cgi.escape(text, quote=True).\
           replace(u'\n', u'<br />').\
           replace(u'\t', u'&emsp;').\
           replace(u'  ', u' &nbsp;')

例如

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'

cgi.escape extended

This version improves cgi.escape. It also preserves whitespace and newlines. Returns a unicode string.

def escape_html(text):
    """escape strings for display in HTML"""
    return cgi.escape(text, quote=True).\
           replace(u'\n', u'<br />').\
           replace(u'\t', u'&emsp;').\
           replace(u'  ', u' &nbsp;')

for example

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'

回答 7

不是最简单的方法,但仍然很简单。与cgi.escape模块的主要区别-如果您已经&amp;在文本中使用了它,它仍然可以正常工作。从评论中可以看到:

cgi.escape版本

def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

正则表达式版本

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""
def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences. 
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&gt;',
        '>': '&lt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39'    # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)

Not the easiest way, but still straightforward. The main difference from cgi.escape module – it still will work properly if you already have &amp; in your text. As you see from comments to it:

cgi.escape version

def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

regex version

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""
def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences. 
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&gt;',
        '>': '&lt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39'    # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)

回答 8

对于Python 2.7中的旧代码,可以通过BeautifulSoup4做到:

>>> bs4.dammit import EntitySubstitution
>>> esub = EntitySubstitution()
>>> esub.substitute_html("r&d")
'r&amp;d'

For legacy code in Python 2.7, can do it via BeautifulSoup4:

>>> bs4.dammit import EntitySubstitution
>>> esub = EntitySubstitution()
>>> esub.substitute_html("r&d")
'r&amp;d'

python中的属性文件(类似于Java属性)

问题:python中的属性文件(类似于Java属性)

给定以下格式(.properties.ini):

propertyName1=propertyValue1
propertyName2=propertyValue2
...
propertyNameN=propertyValueN

对于Java,有一个Properties类,该类提供了解析/与上述格式交互的功能。

python标准库(2.x)中有类似的东西吗?

如果没有,我还有什么其他选择?

Given the following format (.properties or .ini):

propertyName1=propertyValue1
propertyName2=propertyValue2
...
propertyNameN=propertyValueN

For Java there is the Properties class that offers functionality to parse / interact with the above format.

Is there something similar in python‘s standard library (2.x) ?

If not, what other alternatives do I have ?


回答 0

对于.ini文件,有一个ConfigParser模块,它提供与.ini文件兼容的格式。

无论如何,没有任何可用于解析完整的.properties文件的文件,当我必须这样做时,我只是使用jython(我在谈论脚本)。

For .ini files there is the ConfigParser module that provides a format compatible with .ini files.

Anyway there’s nothing available for parsing complete .properties files, when I have to do that I simply use jython (I’m talking about scripting).


回答 1

我能够使用它ConfigParser,没有人显示如何执行此操作的任何示例,因此这里是属性文件的简单python阅读器和属性文件的示例。请注意,扩展名仍然.properties,但是我必须添加一个节标题,类似于您在.ini文件中看到的标题……有点混蛋,但是可以工作。

python文件: PythonPropertyReader.py

#!/usr/bin/python    
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read('ConfigFile.properties')

print config.get('DatabaseSection', 'database.dbname');

属性文件: ConfigFile.properties

[DatabaseSection]
database.dbname=unitTest
database.user=root
database.password=

有关更多功能,请阅读:https : //docs.python.org/2/library/configparser.html

I was able to get this to work with ConfigParser, no one showed any examples on how to do this, so here is a simple python reader of a property file and example of the property file. Note that the extension is still .properties, but I had to add a section header similar to what you see in .ini files… a bit of a bastardization, but it works.

The python file: PythonPropertyReader.py

#!/usr/bin/python    
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read('ConfigFile.properties')

print config.get('DatabaseSection', 'database.dbname');

The property file: ConfigFile.properties

[DatabaseSection]
database.dbname=unitTest
database.user=root
database.password=

For more functionality, read: https://docs.python.org/2/library/configparser.html


回答 2

Java属性文件通常也是有效的python代码。您可以将myconfig.properties文件重命名为myconfig.py。然后像这样导入文件

import myconfig

并直接访问属性

print myconfig.propertyName1

A java properties file is often valid python code as well. You could rename your myconfig.properties file to myconfig.py. Then just import your file, like this

import myconfig

and access the properties directly

print myconfig.propertyName1

回答 3

我知道这是一个非常老的问题,但是我现在需要它,因此我决定实现自己的解决方案,一个纯python解决方案,它涵盖了大多数用例(不是全部):

def load_properties(filepath, sep='=', comment_char='#'):
    """
    Read the file passed as parameter as a properties file.
    """
    props = {}
    with open(filepath, "rt") as f:
        for line in f:
            l = line.strip()
            if l and not l.startswith(comment_char):
                key_value = l.split(sep)
                key = key_value[0].strip()
                value = sep.join(key_value[1:]).strip().strip('"') 
                props[key] = value 
    return props

您可以sep将’:’ 更改为以下格式的文件:

key : value

该代码可以正确解析以下行:

url = "http://my-host.com"
name = Paul = Pablo
# This comment line will be ignored

您将获得以下命令:

{"url": "http://my-host.com", "name": "Paul = Pablo" }

I know that this is a very old question, but I need it just now and I decided to implement my own solution, a pure python solution, that covers most uses cases (not all):

def load_properties(filepath, sep='=', comment_char='#'):
    """
    Read the file passed as parameter as a properties file.
    """
    props = {}
    with open(filepath, "rt") as f:
        for line in f:
            l = line.strip()
            if l and not l.startswith(comment_char):
                key_value = l.split(sep)
                key = key_value[0].strip()
                value = sep.join(key_value[1:]).strip().strip('"') 
                props[key] = value 
    return props

You can change the sep to ‘:’ to parse files with format:

key : value

The code parses correctly lines like:

url = "http://my-host.com"
name = Paul = Pablo
# This comment line will be ignored

You’ll get a dict with:

{"url": "http://my-host.com", "name": "Paul = Pablo" }

回答 4

如果可以选择文件格式,我建议使用.ini和Python的ConfigParser,如上所述。如果您需要与Java .properties文件兼容,那么我已经为它编写了一个名为jprops的库。我们使用的是pyjavaproperties,但是遇到各种限制后,我最终实现了自己的。它完全支持.properties格式,包括unicode支持和对转义序列的更好支持。Jprops还可以解析任何类似文件的对象,而pyjavaproperties仅适用于磁盘上的实际文件。

If you have an option of file formats I suggest using .ini and Python’s ConfigParser as mentioned. If you need compatibility with Java .properties files I have written a library for it called jprops. We were using pyjavaproperties, but after encountering various limitations I ended up implementing my own. It has full support for the .properties format, including unicode support and better support for escape sequences. Jprops can also parse any file-like object while pyjavaproperties only works with real files on disk.


回答 5

如果您没有多行属性并且需求非常简单,那么可以使用几行代码为您解决:

档案t.properties

a=b
c=d
e=f

Python代码:

with open("t.properties") as f:
    l = [line.split("=") for line in f.readlines()]
    d = {key.strip(): value.strip() for key, value in l}

if you don’t have multi line properties and a very simple need, a few lines of code can solve it for you:

File t.properties:

a=b
c=d
e=f

Python code:

with open("t.properties") as f:
    l = [line.split("=") for line in f.readlines()]
    d = {key.strip(): value.strip() for key, value in l}

回答 6

这不完全是属性,但是Python确实有一个很好的库来解析配置文件。另请参见此食谱:java.util.Properties的python替代品

This is not exactly properties but Python does have a nice library for parsing configuration files. Also see this recipe: A python replacement for java.util.Properties.


回答 7

这是我的项目的链接:https : //sourceforge.net/projects/pyproperties/。它是一个库,其中包含用于处理Python 3.x的* .properties文件的方法。

但这不是基于java.util.Properties

Here is link to my project: https://sourceforge.net/projects/pyproperties/. It is a library with methods for working with *.properties files for Python 3.x.

But it is not based on java.util.Properties


回答 8

这个是java.util.Propeties的一对一替换

从文档中:

  def __parse(self, lines):
        """ Parse a list of lines and create
        an internal property dictionary """

        # Every line in the file must consist of either a comment
        # or a key-value pair. A key-value pair is a line consisting
        # of a key which is a combination of non-white space characters
        # The separator character between key-value pairs is a '=',
        # ':' or a whitespace character not including the newline.
        # If the '=' or ':' characters are found, in the line, even
        # keys containing whitespace chars are allowed.

        # A line with only a key according to the rules above is also
        # fine. In such case, the value is considered as the empty string.
        # In order to include characters '=' or ':' in a key or value,
        # they have to be properly escaped using the backslash character.

        # Some examples of valid key-value pairs:
        #
        # key     value
        # key=value
        # key:value
        # key     value1,value2,value3
        # key     value1,value2,value3 \
        #         value4, value5
        # key
        # This key= this value
        # key = value1 value2 value3

        # Any line that starts with a '#' is considerered a comment
        # and skipped. Also any trailing or preceding whitespaces
        # are removed from the key/value.

        # This is a line parser. It parses the
        # contents like by line.

This is a one-to-one replacement of java.util.Propeties

From the doc:

  def __parse(self, lines):
        """ Parse a list of lines and create
        an internal property dictionary """

        # Every line in the file must consist of either a comment
        # or a key-value pair. A key-value pair is a line consisting
        # of a key which is a combination of non-white space characters
        # The separator character between key-value pairs is a '=',
        # ':' or a whitespace character not including the newline.
        # If the '=' or ':' characters are found, in the line, even
        # keys containing whitespace chars are allowed.

        # A line with only a key according to the rules above is also
        # fine. In such case, the value is considered as the empty string.
        # In order to include characters '=' or ':' in a key or value,
        # they have to be properly escaped using the backslash character.

        # Some examples of valid key-value pairs:
        #
        # key     value
        # key=value
        # key:value
        # key     value1,value2,value3
        # key     value1,value2,value3 \
        #         value4, value5
        # key
        # This key= this value
        # key = value1 value2 value3

        # Any line that starts with a '#' is considerered a comment
        # and skipped. Also any trailing or preceding whitespaces
        # are removed from the key/value.

        # This is a line parser. It parses the
        # contents like by line.

回答 9

您可以在ConfigParser.RawConfigParser.readfp此处定义的类似文件的对象中使用-> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.readfp

定义一个覆盖的类 readline在属性文件的实际内容之前添加节名称。

我已将其打包到返回dict定义的所有属性的类中。

import ConfigParser

class PropertiesReader(object):

    def __init__(self, properties_file_name):
        self.name = properties_file_name
        self.main_section = 'main'

        # Add dummy section on top
        self.lines = [ '[%s]\n' % self.main_section ]

        with open(properties_file_name) as f:
            self.lines.extend(f.readlines())

        # This makes sure that iterator in readfp stops
        self.lines.append('')

    def readline(self):
        return self.lines.pop(0)

    def read_properties(self):
        config = ConfigParser.RawConfigParser()

        # Without next line the property names will be lowercased
        config.optionxform = str

        config.readfp(self)
        return dict(config.items(self.main_section))

if __name__ == '__main__':
    print PropertiesReader('/path/to/file.properties').read_properties()

You can use a file-like object in ConfigParser.RawConfigParser.readfp defined here -> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.readfp

Define a class that overrides readline that adds a section name before the actual contents of your properties file.

I’ve packaged it into the class that returns a dict of all the properties defined.

import ConfigParser

class PropertiesReader(object):

    def __init__(self, properties_file_name):
        self.name = properties_file_name
        self.main_section = 'main'

        # Add dummy section on top
        self.lines = [ '[%s]\n' % self.main_section ]

        with open(properties_file_name) as f:
            self.lines.extend(f.readlines())

        # This makes sure that iterator in readfp stops
        self.lines.append('')

    def readline(self):
        return self.lines.pop(0)

    def read_properties(self):
        config = ConfigParser.RawConfigParser()

        # Without next line the property names will be lowercased
        config.optionxform = str

        config.readfp(self)
        return dict(config.items(self.main_section))

if __name__ == '__main__':
    print PropertiesReader('/path/to/file.properties').read_properties()

回答 10

我已经用过了,这个库非常有用

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print(p)
print(p.items())
print(p['name3'])
p['name3'] = 'changed = value'

i have used this, this library is very useful

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print(p)
print(p.items())
print(p['name3'])
p['name3'] = 'changed = value'

回答 11

这就是我在项目中所做的事情:我只创建了另一个名为properties.py的.py文件,其中包含我在项目中使用的所有常见变量/属性,并且在任何文件中都需要引用这些变量,

from properties import *(or anything you need)

当我经常更改开发人员的位置并且某些常见变量与本地环境有关时,使用此方法来保持svn的和平。对我来说效果很好,但不确定是否可以在正式的开发环境中建议使用此方法。

This is what I’m doing in my project: I just create another .py file called properties.py which includes all common variables/properties I used in the project, and in any file need to refer to these variables, put

from properties import *(or anything you need)

Used this method to keep svn peace when I was changing dev locations frequently and some common variables were quite relative to local environment. Works fine for me but not sure this method would be suggested for formal dev environment etc.


回答 12

import json
f=open('test.json')
x=json.load(f)
f.close()
print(x)

test.json的内容:{“主机”:“ 127.0.0.1”,“用户”:“ jms”}

import json
f=open('test.json')
x=json.load(f)
f.close()
print(x)

Contents of test.json: {“host”: “127.0.0.1”, “user”: “jms”}


回答 13

我创建了一个与Java的Properties类几乎相似的python模块(实际上就像Spring中的PropertyPlaceholderConfigurer一样,它允许您使用$ {variable-reference}来引用已定义的property)

编辑:您可以通过运行命令(当前已针对python 3测试)来安装此软件包。
pip install property

该项目托管在GitHub上

示例:(详细文档可在此处找到)

假设您在my_file.properties文件中定义了以下属性

foo = I am awesome
bar = ${chocolate}-bar
chocolate = fudge

加载上述属性的代码

from properties.p import Property

prop = Property()
# Simply load it into a dictionary
dic_prop = prop.load_property_files('my_file.properties')

I have created a python module that is almost similar to the Properties class of Java ( Actually it is like the PropertyPlaceholderConfigurer in spring which lets you use ${variable-reference} to refer to already defined property )

EDIT : You may install this package by running the command(currently tested for python 3).
pip install property

The project is hosted on GitHub

Example : ( Detailed documentation can be found here )

Let’s say you have the following properties defined in my_file.properties file

foo = I am awesome
bar = ${chocolate}-bar
chocolate = fudge

Code to load the above properties

from properties.p import Property

prop = Property()
# Simply load it into a dictionary
dic_prop = prop.load_property_files('my_file.properties')

回答 14

如果您需要以简单的方式从属性文件中的部分读取所有值:

您的config.properties文件布局:

[SECTION_NAME]  
key1 = value1  
key2 = value2  

您的代码:

   import configparser

   config = configparser.RawConfigParser()
   config.read('path_to_config.properties file')

   details_dict = dict(config.items('SECTION_NAME'))

这将为您提供一个字典,其中的键与配置文件中的键及其相应值相同。

details_dict 是:

{'key1':'value1', 'key2':'value2'}

现在获取key1的值: details_dict['key1']

将所有内容放到仅从配置文件读取该部分一次的方法中(在程序运行期间第一次调用该方法)。

def get_config_dict():
    if not hasattr(get_config_dict, 'config_dict'):
        get_config_dict.config_dict = dict(config.items('SECTION_NAME'))
    return get_config_dict.config_dict

现在调用上面的函数并获取所需键的值:

config_details = get_config_dict()
key_1_value = config_details['key1'] 

————————————————– ———–

扩展上述方法,自动逐节阅读,然后按节名和键名进行访问。

def get_config_section():
    if not hasattr(get_config_section, 'section_dict'):
        get_config_section.section_dict = dict()

        for section in config.sections():
            get_config_section.section_dict[section] = 
                             dict(config.items(section))

    return get_config_section.section_dict

要访问:

config_dict = get_config_section()

port = config_dict['DB']['port'] 

(此处“ DB”是配置文件中的节名称,“端口”是“ DB”节下的键。)

If you need to read all values from a section in properties file in a simple manner:

Your config.properties file layout :

[SECTION_NAME]  
key1 = value1  
key2 = value2  

You code:

   import configparser

   config = configparser.RawConfigParser()
   config.read('path_to_config.properties file')

   details_dict = dict(config.items('SECTION_NAME'))

This will give you a dictionary where keys are same as in config file and their corresponding values.

details_dict is :

{'key1':'value1', 'key2':'value2'}

Now to get key1’s value : details_dict['key1']

Putting it all in a method which reads that section from config file only once(the first time the method is called during a program run).

def get_config_dict():
    if not hasattr(get_config_dict, 'config_dict'):
        get_config_dict.config_dict = dict(config.items('SECTION_NAME'))
    return get_config_dict.config_dict

Now call the above function and get the required key’s value :

config_details = get_config_dict()
key_1_value = config_details['key1'] 

————————————————————-

Extending the approach mentioned above, reading section by section automatically and then accessing by section name followed by key name.

def get_config_section():
    if not hasattr(get_config_section, 'section_dict'):
        get_config_section.section_dict = dict()

        for section in config.sections():
            get_config_section.section_dict[section] = 
                             dict(config.items(section))

    return get_config_section.section_dict

To access:

config_dict = get_config_section()

port = config_dict['DB']['port'] 

(here ‘DB’ is a section name in config file and ‘port’ is a key under section ‘DB’.)


回答 15

下面的两行代码显示了如何使用Python List Comprehension加载“ java样式”属性文件。

split_properties=[line.split("=") for line in open('/<path_to_property_file>)]
properties={key: value for key,value in split_properties }

请查看以下帖子以了解详细信息 https://ilearnonlinesite.wordpress.com/2017/07/24/reading-property-file-in-python-using-comprehension-and-generators/

Below 2 lines of code shows how to use Python List Comprehension to load ‘java style’ property file.

split_properties=[line.split("=") for line in open('/<path_to_property_file>)]
properties={key: value for key,value in split_properties }

Please have a look at below post for details https://ilearnonlinesite.wordpress.com/2017/07/24/reading-property-file-in-python-using-comprehension-and-generators/


回答 16

您可以将参数“ fromfile_prefix_chars”与argparse一起使用,以从配置文件中读取内容,如下所示-

临时

parser = argparse.ArgumentParser(fromfile_prefix_chars='#')
parser.add_argument('--a')
parser.add_argument('--b')
args = parser.parse_args()
print(args.a)
print(args.b)

配置文件

--a
hello
--b
hello dear

运行命令

python temp.py "#config"

you can use parameter “fromfile_prefix_chars” with argparse to read from config file as below—

temp.py

parser = argparse.ArgumentParser(fromfile_prefix_chars='#')
parser.add_argument('--a')
parser.add_argument('--b')
args = parser.parse_args()
print(args.a)
print(args.b)

config file

--a
hello
--b
hello dear

Run command

python temp.py "#config"

回答 17

我使用ConfigParser进行了如下操作。该代码假定在放置BaseTest的同一目录中有一个名为config.prop的文件:

配置文件

[CredentialSection]
app.name=MyAppName

BaseTest.py:

import unittest
import ConfigParser

class BaseTest(unittest.TestCase):
    def setUp(self):
        __SECTION = 'CredentialSection'
        config = ConfigParser.ConfigParser()
        config.readfp(open('config.prop'))
        self.__app_name = config.get(__SECTION, 'app.name')

    def test1(self):
        print self.__app_name % This should print: MyAppName

I did this using ConfigParser as follows. The code assumes that there is a file called config.prop in the same directory where BaseTest is placed:

config.prop

[CredentialSection]
app.name=MyAppName

BaseTest.py:

import unittest
import ConfigParser

class BaseTest(unittest.TestCase):
    def setUp(self):
        __SECTION = 'CredentialSection'
        config = ConfigParser.ConfigParser()
        config.readfp(open('config.prop'))
        self.__app_name = config.get(__SECTION, 'app.name')

    def test1(self):
        print self.__app_name % This should print: MyAppName

回答 18

这就是我编写的用于解析文件并将其设置为env变量的内容,该变量会跳过注释,并且非关键值行添加了开关来指定hg:d

  • -h或–help打印用法摘要
  • -c指定用于标识注释的字符
  • -s prop文件中键和值之间的分隔符
  • 并指定需要解析的属性文件,例如:python EnvParamSet.py -c#-s = env.properties

    import pipes
    import sys , getopt
    import os.path
    
    class Parsing :
    
            def __init__(self , seprator , commentChar , propFile):
            self.seprator = seprator
            self.commentChar = commentChar
            self.propFile  = propFile
    
        def  parseProp(self):
            prop = open(self.propFile,'rU')
            for line in prop :
                if line.startswith(self.commentChar)==False and  line.find(self.seprator) != -1  :
                    keyValue = line.split(self.seprator)
                    key =  keyValue[0].strip() 
                    value = keyValue[1].strip() 
                            print("export  %s=%s" % (str (key),pipes.quote(str(value))))
    
    
    
    
    class EnvParamSet:
    
        def main (argv):
    
            seprator = '='
            comment =  '#'
    
            if len(argv)  is 0:
                print "Please Specify properties file to be parsed "
                sys.exit()
            propFile=argv[-1] 
    
    
            try :
                opts, args = getopt.getopt(argv, "hs:c:f:", ["help", "seprator=","comment=", "file="])
            except getopt.GetoptError,e:
                print str(e)
                print " possible  arguments  -s <key value sperator > -c < comment char >    <file> \n  Try -h or --help "
                sys.exit(2)
    
    
            if os.path.isfile(args[0])==False:
                print "File doesnt exist "
                sys.exit()
    
    
            for opt , arg  in opts :
                if opt in ("-h" , "--help"):
                    print " hg:d  \n -h or --help print usage summary \n -c Specify char that idetifes comment  \n -s Sperator between key and value in prop file \n  specify file  "
                    sys.exit()
                elif opt in ("-s" , "--seprator"):
                    seprator = arg 
                elif opt in ("-c"  , "--comment"):
                    comment  = arg
    
            p = Parsing( seprator, comment , propFile)
            p.parseProp()
    
        if __name__ == "__main__":
                main(sys.argv[1:])

This is what i had written to parse file and set it as env variables which skips comments and non key value lines added switches to specify hg:d

  • -h or –help print usage summary
  • -c Specify char that identifies comment
  • -s Separator between key and value in prop file
  • and specify properties file that needs to be parsed eg : python EnvParamSet.py -c # -s = env.properties

    import pipes
    import sys , getopt
    import os.path
    
    class Parsing :
    
            def __init__(self , seprator , commentChar , propFile):
            self.seprator = seprator
            self.commentChar = commentChar
            self.propFile  = propFile
    
        def  parseProp(self):
            prop = open(self.propFile,'rU')
            for line in prop :
                if line.startswith(self.commentChar)==False and  line.find(self.seprator) != -1  :
                    keyValue = line.split(self.seprator)
                    key =  keyValue[0].strip() 
                    value = keyValue[1].strip() 
                            print("export  %s=%s" % (str (key),pipes.quote(str(value))))
    
    
    
    
    class EnvParamSet:
    
        def main (argv):
    
            seprator = '='
            comment =  '#'
    
            if len(argv)  is 0:
                print "Please Specify properties file to be parsed "
                sys.exit()
            propFile=argv[-1] 
    
    
            try :
                opts, args = getopt.getopt(argv, "hs:c:f:", ["help", "seprator=","comment=", "file="])
            except getopt.GetoptError,e:
                print str(e)
                print " possible  arguments  -s <key value sperator > -c < comment char >    <file> \n  Try -h or --help "
                sys.exit(2)
    
    
            if os.path.isfile(args[0])==False:
                print "File doesnt exist "
                sys.exit()
    
    
            for opt , arg  in opts :
                if opt in ("-h" , "--help"):
                    print " hg:d  \n -h or --help print usage summary \n -c Specify char that idetifes comment  \n -s Sperator between key and value in prop file \n  specify file  "
                    sys.exit()
                elif opt in ("-s" , "--seprator"):
                    seprator = arg 
                elif opt in ("-c"  , "--comment"):
                    comment  = arg
    
            p = Parsing( seprator, comment , propFile)
            p.parseProp()
    
        if __name__ == "__main__":
                main(sys.argv[1:])
    

回答 19

Lightbend发布了Typesafe Config库,该库可解析属性文件以及一些基于JSON的扩展。Lightbend的库仅用于JVM,但似乎已被广泛采用,并且现在有许多语言的端口,包括Python:https : //github.com/chimpler/pyhocon

Lightbend has released the Typesafe Config library, which parses properties files and also some JSON-based extensions. Lightbend’s library is only for the JVM, but it seems to be widely adopted and there are now ports in many languages, including Python: https://github.com/chimpler/pyhocon


回答 20

您可以使用以下函数,它是@mvallebr的修改代码。它尊重属性文件的注释,忽略空的新行,并允许检索单个键值。

def getProperties(propertiesFile ="/home/memin/.config/customMemin/conf.properties", key=''):
    """
    Reads a .properties file and returns the key value pairs as dictionary.
    if key value is specified, then it will return its value alone.
    """
    with open(propertiesFile) as f:
        l = [line.strip().split("=") for line in f.readlines() if not line.startswith('#') and line.strip()]
        d = {key.strip(): value.strip() for key, value in l}

        if key:
            return d[key]
        else:
            return d

You can use the following function, which is the modified code of @mvallebr. It respects the properties file comments, ignores empty new lines, and allows retrieving a single key value.

def getProperties(propertiesFile ="/home/memin/.config/customMemin/conf.properties", key=''):
    """
    Reads a .properties file and returns the key value pairs as dictionary.
    if key value is specified, then it will return its value alone.
    """
    with open(propertiesFile) as f:
        l = [line.strip().split("=") for line in f.readlines() if not line.startswith('#') and line.strip()]
        d = {key.strip(): value.strip() for key, value in l}

        if key:
            return d[key]
        else:
            return d

回答 21

这对我有用。

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print p
print p.items()
print p['name3']

this works for me.

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print p
print p.items()
print p['name3']

回答 22

我遵循configparser方法,对我来说效果很好。创建了一个PropertyReader文件,并在其中使用了配置解析器以准备与每个部分相对应的属性。

**使用Python 2.7

PropertyReader.py文件的内容:

#!/usr/bin/python
import ConfigParser

class PropertyReader:

def readProperty(self, strSection, strKey):
    config = ConfigParser.RawConfigParser()
    config.read('ConfigFile.properties')
    strValue = config.get(strSection,strKey);
    print "Value captured for "+strKey+" :"+strValue
    return strValue

读取架构文件的内容:

from PropertyReader import *

class ReadSchema:

print PropertyReader().readProperty('source1_section','source_name1')
print PropertyReader().readProperty('source2_section','sn2_sc1_tb')

.properties文件的内容:

[source1_section]
source_name1:module1
sn1_schema:schema1,schema2,schema3
sn1_sc1_tb:employee,department,location
sn1_sc2_tb:student,college,country

[source2_section]
source_name1:module2
sn2_schema:schema4,schema5,schema6
sn2_sc1_tb:employee,department,location
sn2_sc2_tb:student,college,country

I followed configparser approach and it worked quite well for me. Created one PropertyReader file and used config parser there to ready property to corresponding to each section.

**Used Python 2.7

Content of PropertyReader.py file:

#!/usr/bin/python
import ConfigParser

class PropertyReader:

def readProperty(self, strSection, strKey):
    config = ConfigParser.RawConfigParser()
    config.read('ConfigFile.properties')
    strValue = config.get(strSection,strKey);
    print "Value captured for "+strKey+" :"+strValue
    return strValue

Content of read schema file:

from PropertyReader import *

class ReadSchema:

print PropertyReader().readProperty('source1_section','source_name1')
print PropertyReader().readProperty('source2_section','sn2_sc1_tb')

Content of .properties file:

[source1_section]
source_name1:module1
sn1_schema:schema1,schema2,schema3
sn1_sc1_tb:employee,department,location
sn1_sc2_tb:student,college,country

[source2_section]
source_name1:module2
sn2_schema:schema4,schema5,schema6
sn2_sc1_tb:employee,department,location
sn2_sc2_tb:student,college,country

回答 23

在python模块中创建一个字典并将所有内容存储到其中并访问它,例如:

dict = {
       'portalPath' : 'www.xyx.com',
       'elementID': 'submit'}

现在访问它,您只需执行以下操作:

submitButton = driver.find_element_by_id(dict['elementID'])

create a dictionary in your python module and store everything into it and access it, for example:

dict = {
       'portalPath' : 'www.xyx.com',
       'elementID': 'submit'}

Now to access it you can simply do:

submitButton = driver.find_element_by_id(dict['elementID'])