Tag Archives: Python

Plotting in a non-blocking way with Matplotlib

Question: Plotting in a non-blocking way with Matplotlib

I have been playing with Numpy and matplotlib in the last few days. I am having problems trying to make matplotlib plot a function without blocking execution. I know there are already many threads here on SO asking similar questions, and I’ve googled quite a lot but haven’t managed to make this work.

I have tried using show(block=False) as some people suggest, but all I get is a frozen window. If I simply call show(), the result is plotted properly but execution is blocked until the window is closed. From other threads I’ve read, I suspect that whether show(block=False) works or not depends on the backend. Is this correct? My backend is Qt4Agg. Could you have a look at my code and tell me if you see something wrong? Here is my code. Thanks for any help.

from math import *
from matplotlib import pyplot as plt
print plt.get_backend()



def main():
    x = range(-50, 51, 1)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4

        y = [Xi**pow for Xi in x]
        print y

        plt.plot(x, y)
        plt.draw()
        #plt.show()             #this plots correctly, but blocks execution.
        plt.show(block=False)   #this creates an empty frozen window.
        _ = raw_input("Press [enter] to continue.")


if __name__ == '__main__':
    main()

PS. I forgot to say that I would like to update the existing window every time I plot something, instead of creating a new one.


Answer 0

I spent a long time looking for solutions, and found this answer.

It looks like, in order to get what you (and I) want, you need the combination of plt.ion(), plt.show() (not with block=False) and, most importantly, plt.pause(.001) (or whatever time you want). The pause is needed because the GUI events happen while the main code is sleeping, including drawing. It’s possible that this is implemented by picking up time from a sleeping thread, so maybe IDEs mess with that—I don’t know.

Here’s an implementation that works for me on python 3.5:

import numpy as np
from matplotlib import pyplot as plt

def main():
    plt.axis([-50,50,0,10000])
    plt.ion()
    plt.show()

    x = np.arange(-50, 51)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4
        y = [Xi**pow for Xi in x]
        plt.plot(x, y)
        plt.draw()
        plt.pause(0.001)
        input("Press [enter] to continue.")

if __name__ == '__main__':
    main()

Answer 1


A simple trick that works for me is the following:

  1. Use the block=False argument inside show: plt.show(block=False)
  2. Use another plt.show() at the end of the .py script.

Example:

import matplotlib.pyplot as plt

plt.imshow(add_something)
plt.xlabel("x")
plt.ylabel("y")

plt.show(block=False)

#more code here (e.g. do calculations and use print to see them on the screen)

plt.show()

Note: plt.show() is the last line of my script.


Answer 2

You can avoid blocking execution by writing the plot to an array, then displaying the array in a different thread. Here is an example of generating and displaying plots simultaneously using pf.screen from pyformulas 0.2.8:

import pyformulas as pf
import matplotlib.pyplot as plt
import numpy as np
import time

fig = plt.figure()

canvas = np.zeros((480,640))
screen = pf.screen(canvas, 'Sinusoid')

start = time.time()
while True:
    now = time.time() - start

    x = np.linspace(now-2, now, 100)
    y = np.sin(2*np.pi*x) + np.sin(3*np.pi*x)
    plt.xlim(now-2,now+1)
    plt.ylim(-3,3)
    plt.plot(x, y, c='black')

    # If we haven't already shown or saved the plot, then we need to draw the figure first...
    fig.canvas.draw()

    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)  # np.fromstring is deprecated
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))

    screen.update(image)

#screen.close()

Result:

Disclaimer: I’m the maintainer for pyformulas.

Reference: Matplotlib: save plot to numpy array


Answer 3

A lot of these answers are super inflated and from what I can find, the answer isn’t all that difficult to understand.

You can use plt.ion() if you want, but I found using plt.draw() just as effective.

For my specific project I’m plotting images, but you can use plot() or scatter() or whatever instead of figimage(), it doesn’t matter.

plt.figimage(image_to_show)
plt.draw()
plt.pause(0.001)

Or

fig = plt.figure()
...
fig.figimage(image_to_show)
fig.canvas.draw()
plt.pause(0.001)

If you’re using an actual figure.
I used @krs013’s and @Default Picture’s answers to figure this out.
Hopefully this saves someone from having to launch every single figure on a separate thread, or from having to read these novels just to figure this out.


Answer 4

Live Plotting

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
# plt.axis([x[0], x[-1], -1, 1])      # disable autoscaling
for point in x:
    plt.plot(point, np.sin(2 * point), '.', color='b')
    plt.draw()
    plt.pause(0.01)
# plt.clf()                           # clear the current figure

If the amount of data is too much, you can lower the update rate with a simple counter (initialised to 0 before the loop):

cnt += 1
if (cnt == 10):       # update plot each 10 points
    plt.draw()
    plt.pause(0.01)
    cnt = 0

Holding Plot after Program Exit

This was my actual problem, for which I couldn’t find a satisfactory answer: I wanted plots that didn’t close after the script finished (as in MATLAB).

If you think about it, after the script is finished, the program is terminated and there is no logical way to keep the plot open this way, so there are two options:

  1. block the script from exiting (that’s plt.show() and not what I want)
  2. run the plot on a separate thread (too complicated)

This wasn’t satisfactory for me, so I found another solution outside the box:

SaveToFile and View in external viewer

For this, saving and viewing should both be fast, and the viewer shouldn’t lock the file and should update its content automatically.

Selecting Format for Saving

Vector-based formats are both small and fast:

  • SVG is good, but I couldn’t find a good viewer for it except the web browser, which by default needs manual refresh
  • PDF can support vector formats and there are lightweight viewers which support live updating

Fast Lightweight Viewer with Live Update

For PDF there are several good options

  • On Windows I use SumatraPDF which is free, fast and light (only uses 1.8MB RAM for my case)

  • On Linux there are several options such as Evince (GNOME) and Okular (KDE)

Sample Code & Results

Sample code for outputting a plot to a file:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)
plt.plot(x, y)
plt.savefig("fig.pdf")

After the first run, open the output file in one of the viewers mentioned above and enjoy.

Here is a screenshot of VSCode alongside SumatraPDF; the process is also fast enough for a semi-live update rate (I can get near 10 Hz on my setup, just using time.sleep() between intervals).
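
As a sketch of the semi-live loop implied above (the data, update interval and figure handling here are my own illustrative choices, not from the original answer):

import time
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
for i in range(100):
    plt.clf()                # redraw the figure from scratch on every pass
    plt.plot(x, np.sin(2 * x + 0.1 * i))
    plt.savefig("fig.pdf")   # the external viewer picks up the new content
    time.sleep(0.1)          # roughly the 10 Hz rate mentioned above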


Answer 5

Iggy’s answer was the easiest for me to follow, but when executing a subsequent subplot command I got the following error, which was not there when I was just doing show:

MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

In order to avoid this error, it helps to close (or clear) the plot after the user hits enter.

Here’s the code that worked for me:

def plt_show():
    '''Text-blocking version of plt.show()
    Use this instead of plt.show()'''
    plt.draw()
    plt.pause(0.001)
    input("Press enter to continue...")
    plt.close()
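
A hypothetical usage sketch (the plotted data is invented for illustration); the helper simply replaces each blocking plt.show() call:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [1, 4, 9])
plt_show()  # draws, waits for enter, then closes the figure

plt.plot([1, 2, 3], [9, 4, 1])
plt_show()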

Answer 6

The Python package drawnow allows updating a plot in real time in a non-blocking way.
It also works with a webcam and OpenCV for example to plot measures for each frame.
See the original post.
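
For reference, a minimal sketch of the typical drawnow pattern (based on the package’s documented usage; the data here is invented): wrap the drawing calls in a function and pass it to drawnow() inside the loop:

import matplotlib.pyplot as plt
from drawnow import drawnow

x, y = [], []

def make_fig():
    plt.scatter(x, y)  # re-issued on every drawnow() call

plt.ion()  # interactive mode, so drawing does not block
for i in range(10):
    x.append(i)
    y.append(i ** 2)
    drawnow(make_fig)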


Avoid failing on a single package when installing from requirements.txt

Question: Avoid failing on a single package when installing from requirements.txt

I am installing packages from requirements.txt

pip install -r requirements.txt

The requirements.txt file reads:

Pillow
lxml
cssselect
jieba
beautifulsoup
nltk

lxml is the only package failing to install and this leads to everything failing (expected results as pointed out by larsks in the comments). However, after lxml fails pip still runs through and downloads the rest of the packages.

From what I understand the pip install -r requirements.txt command will fail if any of the packages listed in the requirements.txt fail to install.

Is there any argument I can pass when running pip install -r requirements.txt to tell it to install what it can and skip the packages that it cannot, or to exit as soon as it sees something fail?


Answer 0

Running each line with pip install may be a workaround.

cat requirements.txt | xargs -n 1 pip install

Note: the -a parameter (as in xargs -a requirements.txt, which would avoid the cat) is not available under macOS, so good old cat is more portable.


Answer 1

This solution handles empty lines, whitespace-only lines, # comment lines, and whitespace-then-# comment lines in your requirements.txt.

cat requirements.txt | sed -e '/^\s*#.*$/d' -e '/^\s*$/d' | xargs -n 1 pip install

Hat tip to this answer for the sed magic.
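
For comparison, a pure-Python sketch of the same idea (skip comment and blank lines, install one requirement at a time, keep going on failures); the file name requirements.txt is assumed:

import subprocess
import sys

with open("requirements.txt") as f:
    for line in f:
        req = line.strip()
        if not req or req.startswith("#"):
            continue  # skip blank lines and comment lines
        # subprocess.call returns the exit code instead of raising,
        # so a failing package does not stop the loop
        subprocess.call([sys.executable, "-m", "pip", "install", req])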


Answer 2

For Windows:

pip version >=18

import sys
from pip._internal import main as pip_main

def install(package):
    pip_main(['install', package])

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            install(line)

pip version <18

import sys
import pip

def install(package):
    pip.main(['install', package])

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            install(line)

Answer 3

The xargs solution works but can have portability issues (BSD/GNU) and/or be cumbersome if you have comments or blank lines in your requirements file.

As for the use case where such a behavior would be required: I use, for instance, two separate requirements files, one listing only core dependencies that always need to be installed, and another with non-core dependencies that are not needed in 90% of use cases. That is the equivalent of the Recommends section of a debian package.

I use the following shell script (requires sed) to install optional dependencies:

#!/bin/bash
# bash (not plain sh) is needed for the [[ ... ]] tests below

while read dependency; do
    dependency_stripped="$(echo "${dependency}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    # Skip comments
    if [[ $dependency_stripped == \#* ]]; then
        continue
    # Skip blank lines
    elif [ -z "$dependency_stripped" ]; then
        continue
    else
        if pip install "$dependency_stripped"; then
            echo "$dependency_stripped is installed"
        else
            echo "Could not install $dependency_stripped, skipping"
        fi
    fi
done < recommends.txt

Answer 4

Thanks to Etienne Prothon for the Windows cases.

But after upgrading to pip 18, the pip package doesn’t expose main publicly, so you may need to change the code like this:

 # This code install line by line a list of pip package 
 import sys
 from pip._internal import main as pip_main

 def install(package):
    pip_main(['install', package])

 if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            install(line)

Answer 5

For Windows:

import os
from pip.__main__ import _main as main

error_log = open('error_log.txt', 'w')

def install(package):
    try:
        main(['install'] + [str(package)])
    except Exception as e:
        error_log.write(str(e))

if __name__ == '__main__':
    f = open('requirements1.txt', 'r')
    for line in f:
        install(line)
    f.close()
    error_log.close()
  1. Create a local directory, and put your requirements.txt file in it.
  2. Copy the code above and save it as a python file in the same directory. Remember to use .py extension, for instance, install_packages.py
  3. Run this file using a cmd: python install_packages.py
  4. All the packages mentioned will be installed in one go without stopping at all. :)

You can add other parameters to the install function, like: main(['install'] + [str(package)] + ['--upgrade'])


How to use if/else in a dictionary comprehension?

Question: How to use if/else in a dictionary comprehension?

Does there exist a way in Python 2.7+ to make something like the following?

{ something_if_true if condition else something_if_false for key, value in dict_.items() }

I know you can make anything with just ‘if’:

{ something_if_true for key, value in dict_.items() if condition}

Answer 0

You’ve already got it: A if test else B is a valid Python expression. The only problem with your dict comprehension as shown is that the place for an expression in a dict comprehension must have two expressions, separated by a colon:

{ (some_key if condition else default_key):(something_if_true if condition
          else something_if_false) for key, value in dict_.items() }

The final if clause acts as a filter, which is different from having the conditional expression.
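
As a concrete illustration of the pattern (the dict and the conditions are invented for this example):

dict_ = {'a': 1, 'b': -2, 'c': 3}

# Rename the keys of negative entries and store their absolute value.
result = {(k if v >= 0 else 'neg_' + k): (v if v >= 0 else -v)
          for k, v in dict_.items()}

print(result)  # {'a': 1, 'neg_b': 2, 'c': 3}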


Answer 1

@Marcin’s answer covers it all, but just in case someone wants to see an actual example, I add two below:

Let’s say you have the following dictionary of sets

d = {'key1': {'a', 'b', 'c'}, 'key2': {'foo', 'bar'}, 'key3': {'so', 'sad'}}

and you want to create a new dictionary whose keys indicate whether the string 'a' is contained in the values or not, you can use

dout = {"a_in_values_of_{}".format(k) if 'a' in v else "a_not_in_values_of_{}".format(k): v for k, v in d.items()}

which yields

{'a_in_values_of_key1': {'a', 'b', 'c'},
 'a_not_in_values_of_key2': {'bar', 'foo'},
 'a_not_in_values_of_key3': {'sad', 'so'}}

Now let’s suppose you have two dictionaries like this

d1 = {'bad_key1': {'a', 'b', 'c'}, 'bad_key2': {'foo', 'bar'}, 'bad_key3': {'so', 'sad'}}
d2 = {'good_key1': {'foo', 'bar', 'xyz'}, 'good_key2': {'a', 'b', 'c'}}

and you want to replace the keys in d1 by the keys of d2 if their respective values are identical, you could do

# here we assume that the values in d2 are unique
# Python 2
dout2 = {d2.keys()[d2.values().index(v1)] if v1 in d2.values() else k1: v1 for k1, v1 in d1.items()}

# Python 3
dout2 = {list(d2.keys())[list(d2.values()).index(v1)] if v1 in d2.values() else k1: v1 for k1, v1 in d1.items()}

which gives

{'bad_key2': {'bar', 'foo'},
 'bad_key3': {'sad', 'so'},
 'good_key2': {'a', 'b', 'c'}}

Answer 2

In case you have different conditions to evaluate for keys and values, @Marcin’s answer is the way to go.

If you have the same condition for keys and values, you’re better off building (key, value) tuples in a generator expression and feeding it into dict():

dict((modify_k(k), modify_v(v)) if condition else (k, v) for k, v in dct.items())

It’s easier to read and the condition is only evaluated once per key-value pair.

Example, borrowing @Cleb’s dictionary of sets:

d = {'key1': {'a', 'b', 'c'}, 'key2': {'foo', 'bar'}, 'key3': {'so', 'sad'}}

Assume you want to suffix only the keys whose value contains a, and in that case replace the value with the length of the set. Otherwise, the key-value pair should stay unchanged.

dict((f"{k}_a", len(v)) if "a" in v else (k, v) for k, v in d.items())
# {'key1_a': 3, 'key2': {'bar', 'foo'}, 'key3': {'sad', 'so'}}

Answer 3

Another example in using if/else in dictionary comprehension

I am working on a data-entry desktop application for my own office work, and it is common for such an application to collect all entries from the input widgets and dump them into a dictionary for further processing, such as validation, or editing (where selected data from a file must be returned to the entry widgets), etc.

The first round using traditional coding (8 lines):

entries = {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

a_dic, b_dic = {}, {}

for field, value in entries.items():
    if field == 'ther':
        for k,v in value.items():
            b_dic[k] = v
        a_dic[field] = b_dic
    else:
        a_dic[field] = value
    
print(a_dic)
# {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

In the second round I tried a dictionary comprehension, but the loop is still there (6 lines):

entries = {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

a_dic = {}  # the output dict (missing in the original snippet)

for field, value in entries.items():
    if field == 'ther':
        b_dic = {k:v for k,v in value.items()}
        a_dic[field] = b_dic
    else:
        a_dic[field] = value
    
print(a_dic)
# {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

Finally, with a one-line dictionary comprehension statement (1 line):

entries = {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

a_dic = {field:{k:v for k,v in value.items()} if field == 'ther' 
        else value for field, value in entries.items()}
    
print(a_dic)
# {'name': 'Material Name', 'maxt': 'Max Working Temperature', 'ther': {100: 1.1, 200: 1.2}}

I use python 3.8.3


“RuntimeError: make sure the Graphviz executables are on your system’s path” after installing Graphviz 2.38

Question: “RuntimeError: make sure the Graphviz executables are on your system’s path” after installing Graphviz 2.38

I downloaded the Graphviz 2.38 MSI version and installed it under the folder C:\Python34, then ran pip install Graphviz, and everything went well. To the system’s path I added C:\Python34\bin. When I tried to run a test script, at the line filename=dot.render(filename='test'), I got the message

 RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

I tried to put "C:\Python34\bin\dot.exe" in the system’s path, but it didn’t work; I even created a new environment variable "GRAPHVIZ_DOT" with value "C:\Python34\bin\dot.exe", still not working. I tried to uninstall Graphviz and pip uninstall graphviz, then reinstall it and pip install again, but nothing works.

The whole traceback message is:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 220, in render
    proc = subprocess.Popen(cmd, startupinfo=STARTUPINFO)
  File "C:\Python34\lib\subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "C:\Python34\lib\subprocess.py", line 1112, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Documents\Kissmetrics\curves and lines\eventNodes.py", line 56, in <module>
    filename=dot.render(filename='test')
  File "C:\Python34\lib\site-packages\graphviz\files.py", line 225, in render
    'are on your systems\' path' % cmd)
RuntimeError: failed to execute ['dot', '-Tpdf', '-O', 'test'], make sure the Graphviz executables are on your systems' path

Does anybody have any experience with it?


Answer 0

import os
os.environ["PATH"] += os.pathsep + 'D:/Program Files (x86)/Graphviz2.38/bin/'

On Windows, just add these two lines at the beginning, replacing ‘D:/Program Files (x86)/Graphviz2.38/bin/’ with the path to your Graphviz bin folder.

That solves the problem.


Answer 1

You should install the graphviz package in your system (not just the python package). On Ubuntu you should try:

sudo apt-get install graphviz

Answer 2

This one solved the problem for me on MAC:

  brew install graphviz

Answer 3

For Windows:

  1. Install windows package from: https://graphviz.gitlab.io/_pages/Download/Download_windows.html
  2. Install python graphviz package
  3. Add C:\Program Files (x86)\Graphviz2.38\bin to User path
  4. Add C:\Program Files (x86)\Graphviz2.38\bin\dot.exe to System Path

This worked for me!


Answer 4

Try using:

conda install python-graphviz

The graphviz executables sit on a different path from your conda directory if you use pip install graphviz.


Answer 5

OSX Sierra, Python 2.7, Graphviz 2.38

Using BOTH pip install graphviz and conda install graphviz resolves the problem.

pip alone gives the same path problem as yours, and conda alone gives an import error.


Answer 6

Just add the path below to your (system) PATH environment variable on Windows:

C:\Program Files (x86)\Graphviz2.38\bin

There you can find the .exe files.

If that does not work, find the Graphviz2.38/bin folder in your Program Files, not in the python lib.

Then add it to your PATH.

It’s important to find the folder where the .exe files exist.


Answer 7

Step 1: Install Graphviz binary

Windows:

  1. Download Graphviz from http://www.graphviz.org/download/
  2. Add the paths below to the PATH environment variable (adjust for your installed graphviz version):
    • C:\Program Files (x86)\Graphviz2.38\bin
    • C:\Program Files (x86)\Graphviz2.38\bin\dot.exe
  3. Close any opened Juypter notebook and the command prompt
  4. Restart Jupyter / cmd prompt and test

Linux:

  1. sudo apt-get update
  2. sudo apt-get install graphviz
  3. or build it manually from http://www.graphviz.org/download/

Step 2: Install graphviz module for python

pip:

  • pip install graphviz

conda:

  • conda install graphviz

Answer 8

Try conda install graphviz. I had the same problem and resolved it with the mentioned command on macOS.


Answer 9

Using conda install graphviz and conda install python-graphviz to install GraphViz on Windows 10, the path needed for me was C:/ProgramData/Anaconda3/Library/bin/graphviz/. I.e. adding

import os
os.environ["PATH"] += os.pathsep + 'C:/ProgramData/Anaconda3/Library/bin/graphviz/'

solved the issue for me.


Answer 10

conda install python-graphviz

For Windows, install the Python Graphviz which will include the executables in the path.


Answer 11

On Ubuntu Linux this solved it for me:

pip install graphviz
sudo apt-get install graphviz

You could also try conda install -c conda-forge graphviz instead of pip if using Anaconda.


Answer 12

When solving this issue for myself, I used this GitHub tutorial, which analysed the cause of this issue. Reading between the lines, it says it needs the system Graphviz as well as the python graphviz package. So in addition to conda install graphviz, we would need to run:

conda install -c conda-forge python-graphviz

Then restart the kernel; it works like a charm.


Answer 13

1) Graphviz: download and unzip it to a particular place on the system (pip alone does not install the binaries on Windows) and include the bin folder in the path, either via ‘set environment variables’ in Windows, or manually in each program:

import os
os.environ["PATH"] += os.pathsep + 'C:/GraphViz/bin'

2) Then plot the model:

import xgboost as xgb
import matplotlib.pyplot as plt

# params, d_train and evallist are assumed to be defined earlier
clf = xgb.train(params, d_train, 1000, evals=evallist, early_stopping_rounds=10)
xgb.plot_tree(clf)
plt.rcParams['figure.figsize'] = [50, 10]
plt.show()

Answer 14

After you’ve installed the package (link if you haven’t), add the path to dot.exe as a new system variable.

Default path is:

C:\Program Files (x86)\Graphviz2.38\bin\dot.exe


Answer 15

I had the same issue on Linux with Jupyter.

To solve it, I added the dot directory to the python sys.path.

First: check that dot is installed.

Then: find its path with whereis dot -> /local/notebook/miniconda2/envs/ik2/bin/dot

Finally, in the python script: sys.path.append("/local/notebook/miniconda2/envs/ik2/bin/dot")


Answer 16

First use pip install, then download the separate package from http://www.graphviz.org/Download_windows.php and add the install location to the environment path; then it works.


Answer 17

I had the same error message on Mac OS (El Capitan), using the PyCharm IDE. I had installed Graphviz using brew, as recommended in RZK’s answer, and installed the graphviz python package using PyCharm (I could check Graphviz was installed correctly by trying dot -V in a terminal and getting: dot - graphviz version 2.40.1 (20161225.0304)). Yet I was still getting the error message when trying to call Graphviz from PyCharm.

I had to add the path /usr/local/bin in PyCharm options, as recommended in the answer to this question to resolve the problem.


Answer 18

This alone was still showing the path issue:

pip install graphviz

So this worked for me:

sudo apt-get install graphviz

Answer 19

I’m on macOS Catalina 10.15.3, and I had a similar error: ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH

Fixed it with:

pip3 install graphviz AND brew install graphviz

Note: the pip3 install will only return the success message Successfully installed graphviz-0.13.2, so we still need to run brew install to get graphviz 2.42.3 (as of 10 Mar 2020, 6 PM).


Answer 20

For Linux users who don’t have root access and hence can’t use the sudo command as suggested in other answers…

First, activate your conda virtual-environment (if you want to use one) by:

source activate virtual-env-name

Then install graphviz, even if you have already done it using pip:

conda install graphviz

then copy the result of the following command:

whereis dot

In my case, its output is:

/home/nader/anaconda2/bin/dot

and add it to your PATH variable. Just run the command below

nano ~/.bashrc

and add these lines to the end of the opened file:

PATH="/home/username/anaconda2/bin/dot:$PATH"
export PATH

now press Ctrl+O and then Ctrl+X to save and exit.

Problem should be solved by now.

Pycharm users, please note: Pycharm does not always see the PATH variable the same as your terminal. This solution does not work for Pycharm, and maybe other IDEs. But you can fix this by adding this line of code:

os.environ["PATH"] += os.pathsep + '/home/nader/anaconda2/bin'

to your python program. Do not forget to

import os

first :)

Edit: If you don’t want to use conda, you can still install graphviz from here without any root permissions and add the bin folder to your PATH variable. I didn’t test this.


Answer 21

  1. Install the windows package from https://graphviz.gitlab.io/_pages/Download/Download_windows.html (download the msi file)
  2. Add C:\Program Files (x86)\Graphviz2.38\bin to the User path environment variable
  3. Add C:\Program Files (x86)\Graphviz2.38\bin\dot.exe to the System Path
  4. Restart your python notebook.

It will work.


Answer 22

Add graphviz to the System Path

  1. Windows – Edit the System Environment Variables.
  2. Choose Environment Variables.
  3. Select Path – New
  4. Add the Path of graphviz

Ex: C:\Users\AppData\Local\Continuum\anaconda3\Library\bin\graphviz


Answer 23

macOS Mojave 10.14, Python 3.6

Using pip install graphviz gave good feedback in the terminal, but led to this error when I tried to make a graph in a Jupyter notebook. I then ran brew install graphviz, which gave an error in the terminal. Then I ran conda install graphviz and the graph worked.

From @Leighton’s comment: pip alone gives the same path problem as yours, and conda alone gives an import error.


Answer 24

import os
os.environ["PATH"] += os.pathsep + "/Macintosh HD/anaconda3/lib/python3.7/site-packages/sphinx/templates/graphviz"

This solved the PATH issue on MAC for me!


Answer 25

If you are not using Conda but vanilla Python, ‘brew install graphviz’ works.


Answer 26

#Write this on anaconda prompt in admin mode
conda install -c anaconda graphviz
conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz

#check dot -V in windows' cmd prompt
C:\WINDOWS\system32>dot -V
dot - graphviz version 2.38.0 (20140413.2041)
(this means graphviz installed successfully)

#Add path to system and user env variables
PATH
C:\Anaconda3\pkgs\graphviz-2.38-hfd603c8_2\Library\bin
(search bin folder of graphviz and then copy n paste path in env variables)

#Re-run all cmds in jupyter notebook
#if error occurs (less chances)
#then
#Restart anaconda and again run all cmds in jupyter notebook
eg.
import graphviz as gp
with open("tree.dot") as f:
    dot_read=f.read()
display(gp.Source(dot_read))

Answer 27

Try typing the following commands in the anaconda prompt one by one; this worked for me.

Source: https://anaconda.org/conda-forge/python-graphviz

conda install -c conda-forge python-graphviz
conda install -c conda-forge/label/broken python-graphviz
conda install -c conda-forge/label/cf201901 python-graphviz
conda install -c conda-forge/label/cf202003 python-graphviz 

Answer 28

Try doing this in python (e.g. a Jupyter cell):

import sys
!conda install --yes --prefix {sys.prefix} graphviz
import graphviz


How many concurrent requests does a single Flask process receive?

Question: How many concurrent requests does a single Flask process receive?

I’m building an app with Flask, but I don’t know much about WSGI and its HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?

I do mean concurrent requests, and not requests per second or anything else.


Answer 0

When running the development server (which is what you get by running app.run()), you get a single synchronous process, which means at most 1 request is being processed at a time.

By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.

It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that’s best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which managing its own concurrency. These processes don’t use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but objects can be ‘paused’ when they are waiting on external processes to finish (think database queries or waiting on network I/O).

This means, if you’re using one of Gunicorn’s async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn’s design page and notes on how gevent works on its intro page.
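
As a sketch, the same choice can also be written in Gunicorn’s Python config file rather than passed as command-line flags (started with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app is a placeholder for your WSGI app):

# gunicorn.conf.py
workers = 4                # with the default sync worker: 4 concurrent requests
# worker_class = "gevent"  # uncomment for async workers, each of which can
#                          # juggle many requests at once via coroutines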


Answer 1

Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:

app.run(host="your.host", port=4321, threaded=True)

Another option, as per what we can see in the werkzeug docs, is to use the processes parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle:

  • threaded – should the process handle each request in a separate thread?
  • processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.

Something like:

app.run(host="your.host", port=4321, processes=3) #up to 3 processes

More info on the run() method here, and in the blog post that led me to find the solution and the API references.


Note: the Flask docs on the run() method indicate that using it in a production environment is discouraged because (quote): “While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well.”

However, they do point to their Deployment Options page for the recommended ways to do this when going for production.


Answer 2

Flask will process one request per thread at the same time. If you have 2 processes with 4 threads each, that’s 8 concurrent requests.

Flask doesn’t spawn or manage threads or processes. That’s the responsibility of the WSGI gateway (e.g. gunicorn).


Answer 3

No, you can definitely handle more than that.

It’s important to remember that deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.

Namely, the CPU can only execute a very limited set of instructions, and it can’t execute more than one instruction per clock tick (many instructions even take more than 1 tick).

Therefore, most concurrency we talk about in computer science is software concurrency. In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.

These “things” can be processes, which are units of code that get run concurrently in the sense that each process thinks its running in its own world with its own, non-shared memory.

Another example is threads, which are units of code inside processes that allow concurrency as well.

The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.

The actual request limit depends on HTTP server chosen, I/O, OS, hardware, network connection etc.

Good luck!

*instructions are the very basic commands the CPU can run. examples – add two numbers, jump from one instruction to another


Django - makemigrations - No changes detected

Question: Django - makemigrations - No changes detected

I was trying to create migrations within an existing app using the makemigrations command but it outputs “No changes detected”.

Usually I create new apps using the startapp command but did not use it for this app when I created it.

After debugging, I found that it is not creating migration because the migrations package/folder is missing from an app.

Would it be better if it creates the folder if it is not there or am I missing something?


Answer 0

To create initial migrations for an app, run makemigrations and specify the app name. The migrations folder will be created.

./manage.py makemigrations <myapp>

Your app must be included in INSTALLED_APPS first (inside settings.py).


Answer 1

My problem (and so my solution) was different from those described above.

I wasn’t using models.py file, but created a models directory and created the my_model.py file there, where I put my model. Django couldn’t find my model so it wrote that there are no migrations to apply.

My solution was: in the my_app/models/__init__.py file I added this line: from .my_model import MyModel
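
For illustration, the resulting layout would look roughly like this (directory and class names follow the answer; treat the contents as a sketch):

# my_app/models/__init__.py
# Re-export the model so Django's model discovery can find it.
from .my_model import MyModel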


Answer 2

There are multiple possible reasons for django not detecting what to migrate during the makemigrations command.

  1. migration folder You need a migrations package in your app.
  2. INSTALLED_APPS You need your app to be specified in the INSTALLED_APPS dict.
  3. Verbosity Start by running makemigrations -v 3 for verbosity. This might shed some light on the problem.
  4. Full path In INSTALLED_APPS it is recommended to specify the full module app config path ‘apply.apps.MyAppConfig’.
  5. –settings You might want to make sure the correct settings file is set: manage.py makemigrations --settings mysite.settings
  6. specify app name Explicitly put the app name in manage.py makemigrations myapp; that narrows down the migrations to the app alone and helps you isolate the problem.
  7. model meta Check you have the right app_label in your model meta.

  8. Debug django Debug the django core script. The makemigrations command is pretty much straightforward. Here’s how to do it in pycharm: change your script definition accordingly (ex: makemigrations --traceback myapp).

Multiple databases:

  • Db Router When working with a django db router, the router class (your custom router class) needs to implement the allow_syncdb method (renamed allow_migrate in Django 1.7+).

makemigrations always creates migrations for model changes, but if allow_migrate() returns False, any migration operations for the model will be silently skipped on that database (see the sketch below).
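
For reference, a minimal sketch of such a router (the class, app and database names are hypothetical):

class MyAppRouter:
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only allow migrations for 'myapp' on the 'default' database;
        # returning None means "no opinion" about other apps.
        if app_label == 'myapp':
            return db == 'default'
        return None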


回答 3

我已经读过许多关于这个问题的答案,通常说它们只是makemigrations以其他方式运行。但是对我来说,问题出在Meta模型的子类中。

我有一个应用程序配置说label = <app name>(在apps.py文件中models.pyviews.py等等)。如果您的元类碰巧没有与应用程序标签相同的标签(例如,因为您将一个太大的应用程序拆分为多个应用程序),则不会检测到任何更改(也不会显示任何有用的错误消息)。因此,在我的模型课中,我现在有:

class ModelClassName(models.Model):

    class Meta:
        app_label = '<app name>' # <-- this label was wrong before.

    field_name = models.FloatField()
    ...

在此处运行Django 1.10。

I’ve read many answers to this question often stating to simply run makemigrations in some other ways. But to me, the problem was in the Meta subclass of models.

I have an app config that says label = <app name> (in the apps.py file, beside models.py, views.py etc). If by any chance your meta class doesn’t have the same label as the app label (for instance because you are splitting one too big app into multiple ones), no changes are detected (and no helpful error message whatsoever). So in my model class I have now:

class ModelClassName(models.Model):

    class Meta:
        app_label = '<app name>' # <-- this label was wrong before.

    field_name = models.FloatField()
    ...

Running Django 1.10 here.


回答 4

这是一个评论,但可能应该是一个答案。

确保您的应用程序名称位于settings.py中,INSTALLED_APPS否则无论执行什么操作都不会运行迁移。

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'blog',
]

然后运行:

./manage.py makemigrations blog

It is a comment but should probably be an answer.

Make sure that your app name is in settings.py INSTALLED_APPS otherwise no matter what you do it will not run the migrations.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'blog',
]

Then run:

./manage.py makemigrations blog

回答 5

我还有另一个这里未描述的问题,这让我发疯。

class MyModel(models.Model):
    name = models.CharField(max_length=64, null=True)  # works
    language_code = models.CharField(max_length=2, default='en')  # works
    is_dumb = models.BooleanField(default=False),  # doesn't work

在复制和粘贴的一行中,我有一个尾随的“,”。is_dumb所在的行不会使用’./manage.py makemigrations’创建模型迁移,但也不会引发错误。删除“后,”它按预期方式工作。

因此,在复制粘贴时请多加注意:-)

I had another problem not described here, which drove me nuts.

class MyModel(models.Model):
    name = models.CharField(max_length=64, null=True)  # works
    language_code = models.CharField(max_length=2, default='en')  # works
    is_dumb = models.BooleanField(default=False),  # doesn't work

I had a trailing ‘,’ in one line perhaps from copy&paste. The line with is_dumb doesn’t created a model migration with ‘./manage.py makemigrations’ but also didn’t throw an error. After removing the ‘,’ it worked as expected.

So be careful when you do copy&paste :-)


回答 6

有时./manage.py makemigrations有优于./manage.py makemigrations <myapp>因为它可以处理应用程序之间的某些冲突。

这些场合默默发生,花了几个小时swearing才能理解恐惧的真正含义No changes detected消息。

因此,使用以下命令是一个更好的选择:

./manage.py makemigrations <myapp1> <myapp2> ... <myappN>

There are sometimes when ./manage.py makemigrations is superior to ./manage.py makemigrations <myapp> because it can handle certain conflicts between apps.

Those occasions occur silently and it takes several hours of swearing to understand the real meaning of the dreaded No changes detected message.

Therefore, it is a far better choice to make use of the following command:

./manage.py makemigrations <myapp1> <myapp2> ... <myappN>


回答 7

我从django外部复制了一个表,默认将Meta类设置为“ managed = false”。例如:

class Rssemailsubscription(models.Model):
    id = models.CharField(primary_key=True, max_length=36)
    ...
    area = models.FloatField('Area (Sq. KM)', null=True)

    class Meta:
        managed = False
        db_table = 'RSSEmailSubscription'

通过将managed更改为True,makemigrations开始接受更改。

I had copied a table in from outside of django and the Meta class defaulted to “managed = false”. For example:

class Rssemailsubscription(models.Model):
    id = models.CharField(primary_key=True, max_length=36)
    ...
    area = models.FloatField('Area (Sq. KM)', null=True)

    class Meta:
        managed = False
        db_table = 'RSSEmailSubscription'

By changing manged to True, makemigrations started picking up changes.


回答 8

  1. 确保您的应用在settings.py中的installed_apps中被提及
  2. 确保您的模型类扩展了模型。
  1. Make sure your app is mentioned in installed_apps in settings.py
  2. Make sure you model class extends models.Model

回答 9

我这样做来解决了这个问题:

  1. 擦除“ db.sqlite3”文件。这里的问题是您当前的数据库将被删除,因此您将不得不重新制作它。
  2. 在已编辑的应用程序的迁移文件夹中,删除上次更新的文件。请记住,第一个创建的文件是:“ 0001_initial.py”。例如:我创建了一个新类,并通过“ makemigrations”和“ migrate”过程进行了注册,现在创建了一个名为“ 0002_auto_etc.py”的新文件;删除它。
  3. 转到pycache ”文件夹(位于migrations文件夹内),然后删除文件“ 0002_auto_etc.pyc”。
  4. 最后,转到控制台并使用“ python manage.py makemigrations”和“ python manage.py migration”。

I solved that problem by doing this:

  1. Erase the “db.sqlite3” file. The issue here is that your current data base will be erased, so you will have to remake it again.
  2. Inside the migrations folder of your edited app, erase the last updated file. Remember that the first created file is: “0001_initial.py”. For example: I made a new class and register it by the “makemigrations” and “migrate” procedure, now a new file called “0002_auto_etc.py” was created; erase it.
  3. Go to the “pycache” folder (inside the migrations folder) and erase the file “0002_auto_etc.pyc”.
  4. Finally, go to the console and use “python manage.py makemigrations” and “python manage.py migrate”.

回答 10

我忘了提出正确的论点:

class LineInOffice(models.Model):   # here
    addressOfOffice = models.CharField("Корхоная жош",max_length= 200)   #and here
    ...

在models.py中,然后就开始消除烦人的

在应用程序“ myApp”中未检测到更改

I forgot to put correct arguments:

class LineInOffice(models.Model):   # here
    addressOfOffice = models.CharField("Корхоная жош",max_length= 200)   #and here
    ...

in models.py and then it started to drop that annoying

No changes detected in app ‘myApp ‘


回答 11

另一个可能的原因是,如果您在其他文件中(而不是在程序包中)定义了某些模型,而在其他任何地方都没有引用过。

对我来说,简单地增加from .graph_model import *,以admin.py(其中graph_model.py为新文件)解决了这一问题。

Another possible reason is if you had some models defined in another file (not in a package) and haven’t referenced that anywhere else.

For me, simply adding from .graph_model import * to admin.py (where graph_model.py was the new file) fixed the problem.


回答 12

我的问题比上面的答案要简单得多,并且可能是一个更普遍的原因,只要您的项目已经建立并可以运行。在我的一个已经使用了很长时间的应用程序中,迁移似乎很困难,因此,我急着做了以下事情:

rm -r */migrations/*
rm db.sqlite3
python3 manage.py makemigrations
No changes detected

哇?

我错误地还删除了所有__init__.py文件:(-进入后,一切又恢复正常了:

touch ads1/migrations/__init__.py

对于我的每个应用程序,都可以makemigrations再次使用。

事实证明,我已经手动通过复制另一个创建了一个新的应用程序,忘了把__init__.pymigrations文件夹和confinved我,一切都是靠不住的-我的领先使得它与更坏rm -r如上所述。

希望这可以帮助某人在几个小时内发誓“未检测到更改”错误。

My problem was much simpler than the above answers and probably a far more common reason as long as your project is already set up and working. In one of my applications that had been working for a long time, migrations seemed wonky, so in a hurry, I did the following:

rm -r */migrations/*
rm db.sqlite3
python3 manage.py makemigrations
No changes detected

Whaat??

I had mistakenly also removed all the __init__.py files :( – Everything was working again after I went in and:

touch ads1/migrations/__init__.py

For each of my applications then the makemigrations worked again.

It turns out that I had manually created a new application by copying another and forgot to put the __init__.py in the migrations folder and that confinved me that everything was wonky – leading my making it worse with an rm -r as described above.

Hope this helps someone from swearing at the “No changes detected” error for a few hours.


回答 13

解决方案是您必须将您的应用程序包含在INSTALLED_APPS中。

我错过了,发现了同样的问题。

指定我的应用名称后,迁移成功

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'boards',
]

请注意,我最后提到的木板是我的应用名称。

The solution is you have to include your app in INSTALLED_APPS.

I missed it and I found this same issue.

after specifying my app name migration became successful

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'boards',
]

please note I mentioned boards in last, which is my app name.


回答 14

INSTALLED_APPS = [

'blog.apps.BlogConfig',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',

]

确保“ blog.apps.BlogConfig”(此设置包含在settings.py中,以便进行应用迁移)

然后运行python3 manage.py makemigrations博客或您的应用名称

INSTALLED_APPS = [

'blog.apps.BlogConfig',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',

]

make sure ‘blog.apps.BlogConfig’, (this is included in your settings.py in order to make your app migrations)

then run python3 manage.py makemigrations blog or your app name


回答 15

您可能还会遇到的一个非常愚蠢的问题是class Meta在模型中定义两个。在这种情况下,运行时不会应用对第一个所做的任何更改makemigrations

class Product(models.Model):
    somefield = models.CharField(max_length=255)
    someotherfield = models.CharField(max_length=255)

    class Meta:
        indexes = [models.Index(fields=["somefield"], name="somefield_idx")]

    def somefunc(self):
        pass

    # Many lines...

    class Meta:
        indexes = [models.Index(fields=["someotherfield"], name="someotherfield_idx")]

A very dumb issue you can have as well is to define two class Meta in your model. In that case, any change to the first one won’t be applied when running makemigrations.

class Product(models.Model):
    somefield = models.CharField(max_length=255)
    someotherfield = models.CharField(max_length=255)

    class Meta:
        indexes = [models.Index(fields=["somefield"], name="somefield_idx")]

    def somefunc(self):
        pass

    # Many lines...

    class Meta:
        indexes = [models.Index(fields=["someotherfield"], name="someotherfield_idx")]

回答 16

我知道这是一个老问题,但是我整天都在同一个问题上作斗争,而我的解决方案很简单。

我的目录结构类似于…

apps/
   app/
      __init__.py
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

而且由于直到我遇到问题的所有其他模型都被导入到其他地方,最后又从main_app那里导入了INSTALLED_APPS,所以我很幸运,他们都可以使用。

但是由于我只添加了一个appINSTALLED_APPS而不是app_sub*当我最终添加了一个其他未导入的新模型文件时添加的,因此Django完全忽略了它。

我的解决方法是像这样将models.py文件添加到每个文件的基本目录中app

apps/
   app/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

然后添加from apps.app.app_sub1 import *等等到每个app关卡models.py文件中。

Bleh …这花了我很长时间才弄清楚,我在任何地方都找不到解决方案…我什至转到了Google搜索结果的第2页。

希望这对某人有帮助!

I know this is an old question but I fought with this same issue all day and my solution was a simple one.

I had my directory structure something along the lines of…

apps/
   app/
      __init__.py
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

And since all the other models up until the one I had a problem with were being imported somewhere else that ended up importing from main_app which was registered in the INSTALLED_APPS, I just got lucky that they all worked.

But since I only added each app to INSTALLED_APPS and not the app_sub* when I finally added a new models file that wasn’t imported ANYWHERE else, Django totally ignored it.

My fix was adding a models.py file to the base directory of each app like this…

apps/
   app/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app_sub1/
           __init__.py
           models.py
      app_sub2/
           __init__.py
           models.py
      app_sub3/
           __init__.py
           models.py
   app2/
      __init__.py
      models.py <<<<<<<<<<--------------------------
      app2_sub1/
           __init__.py
           models.py
      app2_sub2/
           __init__.py
           models.py
      app2_sub3/
           __init__.py
           models.py
    main_app/
      __init__.py
      models.py

and then add from apps.app.app_sub1 import * and so on to each of the app level models.py files.

Bleh… this took me SO long to figure out and I couldn’t find the solution anywhere… I even went to page 2 of the google results.

Hope this helps someone!


回答 17

根据官方文档中的“迁移”部分,我在django 3.0中遇到了类似的问题,运行它足以更新我的表结构:

python manage.py makemigrations
python manage.py migrate

但是输出始终是相同的:执行“ makemigrations”脚本后,模型“未检测到更改”。我要在数据库上更新的模型在models.py上出现语法错误:

field_model : models.CharField(max_length=255, ...)

代替:

field_model = models.CharField(max_length=255, ...)

解决了这个愚蠢的错误,使用这些命令可以顺利进行迁移。也许这可以帮助某人。

I had a similar issue with django 3.0, according migrations section in the official documentation, running this was enough to update my table structure:

python manage.py makemigrations
python manage.py migrate

But the output was always the same: ‘no change detected’ about my models after I executed ‘makemigrations’ script. I had a syntax error on models.py at the model I wanted to update on db:

field_model : models.CharField(max_length=255, ...)

instead of:

field_model = models.CharField(max_length=255, ...)

Solving this stupid mistake, with those command the migration was done without problems. Maybe this helps someone.


回答 18

您应该添加polls.apps.PollsConfigINSTALLED_APPSsetting.py

You should add polls.apps.PollsConfig to INSTALLED_APPS in setting.py


回答 19

就我而言,我忘记插入类参数

错误:

class AccountInformation():

正确

class AccountInformation(models.Model):

In my case i forgot to insert the class arguments

Wrong:

class AccountInformation():

Correct

class AccountInformation(models.Model):

回答 20

就我而言,我首先向模型添加了一个字段,而Django说没有任何变化。

比起我决定更改模型的“表名”,makemigrations起作用了。比起将表名改回默认值,新字段也在那里。

django迁移系统中有一个“ bug”,有时看不到新字段。可能与日期字段有关。

In my case, I first added a field to the model, and Django said there’re no changes.

Than I decided to change the “table name” of the model, makemigrations worked. Than I changed table name back to default, and the new field was also there.

There is a “bug” in django migration system, sometimes it doesn’t see the new field. Might be related with date field.


回答 21

可能的原因可能是删除了现有的db文件和migrations文件夹,您可以使用python manage.py makemigrations <app_name>来工作。我曾经遇到过类似的问题。

The possible reason could be deletion of the existing db file and migrations folder you can use python manage.py makemigrations <app_name> this should work. I once faced a similar problem.


回答 22

另一种边缘情况和解决方案:

我添加了一个布尔字段,并同时添加了一个引用该属性的@property,其名称相同(doh)。注释了该属性,并且迁移看到并添加了新字段。重命名该属性,一切都很好。

One more edge case and solution:

I added a boolean field, and at the same time added an @property referencing it, with the same name (doh). Commented the property and migration sees and adds the new field. Renamed the property and all is good.


回答 23

如果您拥有managed = True模型内部元数据,则需要将其删除并进行迁移。然后再次运行迁移,它将检测到新更新。

If you have the managed = True in yout model Meta, you need to remove it and do a migration. Then run the migrations again, it will detect the new updates.


回答 24

将新模型添加到django api应用程序并运行python manage.py makemigrations该工具时,未检测到任何新模型。

奇怪的是,旧模型确实被选中了makemigrations,但这是因为它们在urlpatterns链中并且该工具以某种方式检测到了它们。因此,请注意该行为。

问题是因为与models包相对应的目录结构具有子包,并且所有__init__.py文件均为空。他们必须在每个子文件夹和模型中 显式导入所有必需的类,以__init__.py供Django使用该makemigrations工具将其提取。

models
  ├── __init__.py          <--- empty
  ├── patient
     ├── __init__.py      <--- empty
     ├── breed.py
     └── ...
  ├── timeline
     ├── __init__.py      <-- empty
     ├── event.py
     └── ...

When adding new models to the django api application and running the python manage.py makemigrations the tool did not detect any new models.

The strange thing was that the old models did got picked by makemigrations, but this was because they were referenced in the urlpatterns chain and the tool somehow detected them. So keep an eye on that behavior.

The problem was because the directory structure corresponding to the models package had subpackages and all the __init__.py files were empty. They must explicitly import all the required classes in each subfolder and in the models __init__.py for Django to pick them up with the makemigrations tool.

models
  ├── __init__.py          <--- empty
  ├── patient
  │   ├── __init__.py      <--- empty
  │   ├── breed.py
  │   └── ...
  ├── timeline
  │   ├── __init__.py      <-- empty
  │   ├── event.py
  │   └── ...

回答 25

尝试在admin.py中注册模型,这是一个示例:-admin.site.register(YourModelHere)

您可以执行以下操作:1. 1. admin.site.register(YourModelHere)#在admin.py中2.重新加载页面并重试3.按下CTRL-S并保存4.可能有错误,特别是检查模型.py和admin.py5。或者,在结束时,只需重新启动服务器即可。

Try registering your model in admin.py, here’s an example:- admin.site.register(YourModelHere)

You can do the following things:- 1. admin.site.register(YourModelHere) # In admin.py 2. Reload the page and try again 3. Hit CTRL-S and save 4. There might be an error, specially check models.py and admin.py 5. Or, at the end of it all just restart the server


回答 26

这可能希望对其他人有所帮助,因为我最终花了数小时试图将其消除。

如果你有一个函数模型由同一个名字,这将删除值。事后看来很明显,但仍然如此。

因此,如果您有这样的事情:

class Foobar(models.Model):
    [...]
    something = models.BooleanField(default=False)

    [...]
    def something(self):
        return [some logic]

在这种情况下,该功能将覆盖上面的设置,使其对变为“不可见” makemigrations

This might hopefully help someone else, as I ended up spending hours trying to chase this down.

If you have a function within your model by the same name, this will remove the value. Pretty obvious in hindsight, but nonetheless.

So, if you have something like this:

class Foobar(models.Model):
    [...]
    something = models.BooleanField(default=False)

    [...]
    def something(self):
        return [some logic]

In that case, the function will override the setting above, making it “invisible” to makemigrations.


回答 27

您可以做的最好的事情是,删除现有数据库。就我而言,我使用的是phpMyAdmin SQL数据库,因此我手动删除了上面创建的数据库。

删除后: 我在PhpMyAdmin中创建数据库,并且不添加任何表。

再次运行以下命令:

python manage.py makemigrations

python manage.py migrate

这些命令之后:您可以看到django在数据库中自动创建了其他必要的表(大约有10个表)。

python manage.py makemigrations <app_name>

python manage.py migrate

最后:在上述命令之后,您创建的所有模型(表)都将直接导入数据库。

希望这会有所帮助。

The Best Thing You can do is, Delete the existing database. In my case, I were using phpMyAdmin SQL database, so I manually delete the created database overthere.

After Deleting: I create database in PhpMyAdmin, and doesn,t add any tables.

Again run the following Commands:

python manage.py makemigrations

python manage.py migrate

After These Commands: You can see django has automatically created other necessary tables in Database(Approx there are 10 tables).

python manage.py makemigrations <app_name>

python manage.py migrate

And Lastly: After above commands all the model(table) you have created are directly imported to the database.

Hope this will help.


回答 28

我对此错误的问题是,我包括了:

class Meta:
   abstract = True

我要为其创建的内部模型。

My problem with this error, was that I had included:

class Meta:
   abstract = True

Inside model that I wanted to creation migrate for.


回答 29

创建一个名为的新应用时,我遇到了另一个问题deals。我想在该应用中分离模型,所以我有2个名为deals.py和的模型文件dealers.py。运行时,python manage.py makemigrations我得到了:No changes detected

我继续前进,并将__init__.py其放在我的模型文件所在的同一目录(交易和经销商)中

from .deals import *
from .dealers import *

然后makemigrations命令起作用了。

原来,如果您不打算在任何地方导入模型,或者模型文件名不是 models.py不会检测到模型。

我遇到的另一个问题是我在settings.py以下位置编写应用程序的方式:

我有:

apps.deals

它应该已经包含了根项目文件夹:

cars.apps.deals

I had a different issue while creating a new app called deals. I wanted to separate the models inside that app so I had 2 model files named deals.py and dealers.py. When running python manage.py makemigrations I got: No changes detected.

I went ahead and inside the __init__.py which lives on the same directory where my model files lived (deals and dealer) I did

from .deals import *
from .dealers import *

And then the makemigrations command worked.

Turns out that if you are not importing the models anywhere OR your models file name isn’t models.py then the models wont be detected.

Another issue that happened to me is the way I wrote the app in settings.py:

I had:

apps.deals

It should’ve been including the root project folder:

cars.apps.deals

了解__getitem__方法

问题:了解__getitem__方法

我已经阅读了__getitem__Python文档中的大多数文档,但仍然无法理解它的含义。

因此,我所能理解的__getitem__就是用于实现类似的调用self[key]。但是它有什么用?

可以说我以这种方式定义了一个python类:

class Person:
    def __init__(self,name,age):
        self.name = name
        self.age = age

    def __getitem__(self,key):
        print ("Inside `__getitem__` method!")
        return getattr(self,key)

p = Person("Subhayan",32)
print (p["age"])

这将返回预期的结果。但是为什么要__getitem__首先使用?我还听说过Python __getitem__内部调用。但是为什么要这样做呢?

有人可以详细解释一下吗?

I have gone through most of the documentation of __getitem__ in the Python docs, but I am still unable to grasp the meaning of it.

So all I can understand is that __getitem__ is used to implement calls like self[key]. But what is the use of it?

Lets say I have a python class defined in this way:

class Person:
    def __init__(self,name,age):
        self.name = name
        self.age = age

    def __getitem__(self,key):
        print ("Inside `__getitem__` method!")
        return getattr(self,key)

p = Person("Subhayan",32)
print (p["age"])

This returns the results as expected. But why use __getitem__ in the first place? I have also heard that Python calls __getitem__ internally. But why does it do it?

Can someone please explain this in more detail?


回答 0

马聪(Cong Ma)很好地解释了__getitem__用途-但我想举一个可能有用的示例。想象一个为建筑物建模的类。在建筑物数据中,它包含许多属性,包括对占据每个楼层的公司的描述:

如果不使用,__getitem__我们将有一个像这样的类:

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def occupy(self, floor_number, data):
          self._floors[floor_number] = data
     def get_floor_data(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1.occupy(0, 'Reception')
building1.occupy(1, 'ABC Corp')
building1.occupy(2, 'DEF Inc')
print( building1.get_floor_data(2) )

但是,我们可以使用__getitem__(及其对应的__setitem__)来使用Building类“ nicer”。

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def __setitem__(self, floor_number, data):
          self._floors[floor_number] = data
     def __getitem__(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'
print( building1[2] )

是否__setitem__像这样使用,实际上取决于您计划如何抽象化数据-在这种情况下,我们决定将建筑物视为楼层的容器(并且您还可以为建筑物实现迭代器,甚至可以实现切片功能-即一次获取多个楼层的数据-这取决于您的需求。

Cong Ma does a good job of explaining what __getitem__ is used for – but I want to give you an example which might be useful. Imagine a class which models a building. Within the data for the building it includes a number of attributes, including descriptions of the companies that occupy each floor :

Without using __getitem__ we would have a class like this :

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def occupy(self, floor_number, data):
          self._floors[floor_number] = data
     def get_floor_data(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1.occupy(0, 'Reception')
building1.occupy(1, 'ABC Corp')
building1.occupy(2, 'DEF Inc')
print( building1.get_floor_data(2) )

We could however use __getitem__ (and its counterpart __setitem__) to make the usage of the Building class ‘nicer’.

class Building(object):
     def __init__(self, floors):
         self._floors = [None]*floors
     def __setitem__(self, floor_number, data):
          self._floors[floor_number] = data
     def __getitem__(self, floor_number):
          return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'
print( building1[2] )

Whether you use __setitem__ like this really depends on how you plan to abstract your data – in this case we have decided to treat a building as a container of floors (and you could also implement an iterator for the Building, and maybe even the ability to slice – i.e. get more than one floor’s data at a time – it depends on what you need.


回答 1

[]通过键或索引获取项目的语法仅仅是语法糖。

在评估a[i]Python调用时a.__getitem__(i)(或type(a).__getitem__(a, i),但是这种区别是关于继承模型的,在这里并不重要)。即使的类a可能未明确定义此方法,它也通常是从祖先类继承的。

此处列出了所有(Python 2.7)特殊方法名称及其语义:https : //docs.python.org/2.7/reference/datamodel.html#special-method-names

The [] syntax for getting item by key or index is just syntax sugar.

When you evaluate a[i] Python calls a.__getitem__(i) (or type(a).__getitem__(a, i), but this distinction is about inheritance models and is not important here). Even if the class of a may not explicitly define this method, it is usually inherited from an ancestor class.

All the (Python 2.7) special method names and their semantics are listed here: https://docs.python.org/2.7/reference/datamodel.html#special-method-names


回答 2

magic方法__getitem__基本上用于访问列表项,字典条目,数组元素等。它对于快速查找实例属性非常有用。

在这里,我用一个示例类Person来显示这一点,该类可以通过“名称”,“年龄”和“出生日期”(出生日期)实例化。该__getitem__方法以一种可以访问索引实例属性的方式编写,例如,名字或姓氏,日期,日期,月份或年份等。

import copy

# Constants that can be used to index date of birth's Date-Month-Year
D = 0; M = 1; Y = -1

class Person(object):
    def __init__(self, name, age, dob):
        self.name = name
        self.age = age
        self.dob = dob

    def __getitem__(self, indx):
        print ("Calling __getitem__")
        p = copy.copy(self)

        p.name = p.name.split(" ")[indx]
        p.dob = p.dob[indx] # or, p.dob = p.dob.__getitem__(indx)
        return p

假设一个用户输入如下:

p = Person(name = 'Jonab Gutu', age = 20, dob=(1, 1, 1999))

借助__getitem__方法,用户可以访问索引属性。例如,

print p[0].name # print first (or last) name
print p[Y].dob  # print (Date or Month or ) Year of the 'date of birth'

The magic method __getitem__ is basically used for accessing list items, dictionary entries, array elements etc. It is very useful for a quick lookup of instance attributes.

Here I am showing this with an example class Person that can be instantiated by ‘name’, ‘age’, and ‘dob’ (date of birth). The __getitem__ method is written in a way that one can access the indexed instance attributes, such as first or last name, day, month or year of the dob, etc.

import copy

# Constants that can be used to index date of birth's Date-Month-Year
D = 0; M = 1; Y = -1

class Person(object):
    def __init__(self, name, age, dob):
        self.name = name
        self.age = age
        self.dob = dob

    def __getitem__(self, indx):
        print ("Calling __getitem__")
        p = copy.copy(self)

        p.name = p.name.split(" ")[indx]
        p.dob = p.dob[indx] # or, p.dob = p.dob.__getitem__(indx)
        return p

Suppose one user input is as follows:

p = Person(name = 'Jonab Gutu', age = 20, dob=(1, 1, 1999))

With the help of __getitem__ method, the user can access the indexed attributes. e.g.,

print p[0].name # print first (or last) name
print p[Y].dob  # print (Date or Month or ) Year of the 'date of birth'

为什么TensorFlow 2比TensorFlow 1慢得多?

问题:为什么TensorFlow 2比TensorFlow 1慢得多?

许多用户都将其作为切换到Pytorch的原因,但是我还没有找到牺牲/最渴望的实用质量,速度和执行力的理由/解释。

以下是代码基准测试性能,即TF1与TF2的对比-TF1的运行速度提高了47%至276%

我的问题是:在图形或硬件级别上,什么导致如此显着的下降?


寻找详细的答案-已经熟悉广泛的概念。相关的Git

规格:CUDA 10.0.130,cuDNN 7.4.2,Python 3.7.4,Windows 10,GTX 1070


基准测试结果


UPDATE:禁用每下面的代码不会急于执行没有帮助。但是,该行为是不一致的:有时以图形方式运行会有所帮助,而其他时候其运行速度要比 Eager

由于TF开发人员没有出现在任何地方,因此我将自己进行调查-可以跟踪相关Github问题的进展。

更新2:分享大量实验结果,并附有解释;应该在今天完成。


基准代码

# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

使用的功能

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))

It’s been cited by many users as the reason for switching to Pytorch, but I’ve yet to find a justification / explanation for sacrificing the most important practical quality, speed, for eager execution.

Below is code benchmarking performance, TF1 vs. TF2 – with TF1 running anywhere from 47% to 276% faster.

My question is: what is it, at the graph or hardware level, that yields such a significant slowdown?


Looking for a detailed answer – am already familiar with broad concepts. Relevant Git

Specs: CUDA 10.0.130, cuDNN 7.4.2, Python 3.7.4, Windows 10, GTX 1070


Benchmark results:


UPDATE: Disabling Eager Execution per below code does not help. The behavior, however, is inconsistent: sometimes running in graph mode helps considerably, other times it runs slower relative to Eager.

As TF devs don’t appear around anywhere, I’ll be investigating this matter myself – can follow progress in the linked Github issue.

UPDATE 2: tons of experimental results to share, along explanations; should be done today.


Benchmark code:

# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

Functions used:

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))

回答 0

2020年2月18日更新:我每晚排练 2.1和2.1;结果好坏参半。除了一个配置(模型和数据大小)外,其他配置的运行速度都快于TF2和TF1的最佳配置。速度较慢且急剧下降的是大型-尤其是。在图形执行中(慢1.6倍至2.5倍)。

此外,对于我测试的大型模型,Graph和Eager之间存在极大的可重复性差异-无法通过随机性/计算并行性来解释这一差异。我目前无法按时间限制显示这些声明的可重现代码,因此我强烈建议您针对自己的模型进行测试。

尚未针对这些问题打开Git问题,但我确实对原始内容发表了评论-尚未回复。取得进展后,我将更新答案。


VERDICT:它是不是,如果你知道自己在做什么。但是,如果您不这样做,则可能会花费大量成本-平均而言,需要进行几次GPU升级,而在最坏的情况下,则需要多个GPU。


答案:旨在提供对该问题的高级描述,以及有关如何根据您的需求决定培训配置的指南。有关详细的低级描述(包括所有基准测试结果和所使用的代码),请参阅我的其他答案。

如果我学到更多信息,我将更新我的答案,并提供更多信息-可以为该问题添加书签/“加上星号”以供参考。


问题摘要:正如TensorFlow开发人员Q. Scott Zhu 确认的那样,TF2专注于Eager执行和带有Keras的紧密集成的开发,这涉及到TF源的全面更改-包括图形级。好处:大大扩展了处理,分发,调试和部署功能。但是,其中一些成本是速度。

但是,这个问题要复杂得多。不仅仅是TF1和TF2-导致火车速度显着差异的因素包括:

  1. TF2与TF1
  2. 渴望与图表模式
  3. kerastf.keras
  4. numpyvs. tf.data.Datasetvs ….
  5. train_on_batch()fit()
  6. GPU与CPU
  7. model(x)vs. model.predict(x)vs ….

不幸的是,以上几乎没有一个是彼此独立的,并且每个相对于另一个可以至少使执行时间加倍。幸运的是,您可以确定哪些是系统上最有效的方法,并提供一些捷径-正如我将要展示的。


我该怎么办?当前,唯一的方法是-针对您的特定模型,数据和硬件进行实验。没有任何单一的配置总是最好的工作-但也做的,并没有对简化搜索:

>>做:

  • train_on_batch()++ numpy+ tf.kerasTF1 +热切/图
  • train_on_batch()+ numpy+ tf.keras+ + TF2图
  • fit()++ numpy+ tf.kerasTF1 / TF2 +图表+大型模型和数据

>>不要:

  • fit()+ numpy+ keras用于中小型模型和数据
  • fit()++ numpy+ tf.kerasTF1 / TF2 +渴望
  • train_on_batch()+ numpy+ keras+ + TF1伊格

  • [主要] tf.python.keras;它的运行速度可以降低10到100倍,并且带有许多错误;更多信息

    • 这包括layersmodelsoptimizers,和相关的“乱用”的使用进口; ops,utils和相关的“私有”导入都可以-但可以肯定的是,请检查alt以及它们是否用于tf.keras

请参阅其他答案底部的代码,以获取基准测试设置示例。上面的列表主要基于其他答案中的“ BENCHMARKS”表。


上述注意事项的局限性

  • 这个问题的标题是“为什么TF2比TF1慢得多?”,尽管它的主体明确地涉及训练,但问题并不局限于此。即使在相同的TF版本,导入,数据格式等中,推理也将受到主要速度差异的影响-参见此答案
  • RNN在TF2中得到了改进,很可能会明显改变其他答案中的数据网格。
  • 模型主要用于Conv1DDense-不RNNs,稀疏数据/目标,4 / 5D输入,和其他CONFIGS
  • 输入数据限制为numpytf.data.Dataset,同时存在许多其他格式;查看其他答案
  • 使用了GPU;结果在CPU上有所不同。实际上,当我问这个问题时,我的CUDA配置不正确,并且某些结果是基于CPU的。

为什么TF2为急切执行而牺牲了最实用的质量,速度?显然,它还没有-图形仍然可用。但是如果问题是“为什么要渴望”:

  • 出色的调试:您可能会遇到许多问题,询问“如何获得中间层输出”或“如何检查权重”;渴望,它(几乎)很简单.__dict__。相比之下,Graph需要熟悉特殊的后端功能-极大地增加了调试和自省的整个过程。
  • 更快的原型制作:与上述类似的想法;更快的理解=剩下更多的时间用于实际DL。

如何启用/禁用EAGER?

tf.enable_eager_execution()  # TF1; must be done before any model/tensor creation
tf.compat.v1.disable_eager_execution() # TF2; above holds

附加信息

  • 仔细_on_batch()研究TF2中的方法;根据TF开发人员的说法,他们仍然使用较慢的实现方式,但不是故意的 -即必须解决。有关详细信息,请参见其他答案。

张力流需求

  1. 请修复train_on_batch(),以及fit()迭代调用的性能方面;定制火车循环对许多人尤其是我来说很重要。
  2. 添加有关这些性能差异的文档/文档字符串,以供用户了解。
  3. 提高一般执行速度,以防止窥视现象跳入Pytorch。

致谢:感谢


更新

  • 191114日 -找到了一个模型(在我的实际应用程序中),该模型在TF2上针对所有*配置(带有Numpy输入数据)的速度较慢。差异范围为13-19%,平均为17%。但是,keras和之间的tf.keras差异更为明显:平均18-40%。32%(TF1和2)。(*-渴望者(TF2 OOM’d为此)

  • 11/17/19 -devs on_batch()最近的一次提交中更新了方法,指出已提高了速度-将在TF 2.1中发布,或现在以形式提供tf-nightly。由于我无法让后者运行,因此将替补席推迟到2.1。

  • 2/20/20-预测性能也值得借鉴;例如,在TF2中,CPU预测时间可能涉及周期性的峰值

UPDATE 2/18/2020: I’ve benched 2.1 and 2.1-nightly; the results are mixed. All but one configs (model & data size) are as fast as or much faster than the best of TF2 & TF1. The one that’s slower, and slower dramatically, is Large-Large – esp. in Graph execution (1.6x to 2.5x slower).

Furthermore, there are extreme reproducibility differences between Graph and Eager for a large model I tested – one not explainable via randomness/compute-parallelism. I can’t currently present reproducible code for these claims per time constraints, so instead I strongly recommend testing this for your own models.

Haven’t opened a Git issue on these yet, but I did comment on the original – no response yet. I’ll update the answer(s) once progress is made.


VERDICT: it isn’t, IF you know what you’re doing. But if you don’t, it could cost you, lots – by a few GPU upgrades on average, and by multiple GPUs worst-case.


THIS ANSWER: aims to provide a high-level description of the issue, as well as guidelines for how to decide on the training configuration specific to your needs. For a detailed, low-level description, which includes all benchmarking results + code used, see my other answer.

I’ll be updating my answer(s) w/ more info if I learn any – can bookmark / “star” this question for reference.


ISSUE SUMMARY: as confirmed by a TensorFlow developer, Q. Scott Zhu, TF2 focused development on Eager execution & tight integration w/ Keras, which involved sweeping changes in TF source – including at graph-level. Benefits: greatly expanded processing, distribution, debug, and deployment capabilities. The cost of some of these, however, is speed.

The matter, however, is fairly more complex. It isn’t just TF1 vs. TF2 – factors yielding significant differences in train speed include:

  1. TF2 vs. TF1
  2. Eager vs. Graph mode
  3. keras vs. tf.keras
  4. numpy vs. tf.data.Dataset vs. …
  5. train_on_batch() vs. fit()
  6. GPU vs. CPU
  7. model(x) vs. model.predict(x) vs. …

Unfortunately, almost none of the above are independent of the other, and each can at least double execution time relative to another. Fortunately, you can determine what’ll work best systematically, and with a few shortcuts – as I’ll be showing.


WHAT SHOULD I DO? Currently, the only way is – experiment for your specific model, data, and hardware. No single configuration will always work best – but there are do’s and don’t’s to simplify your search:

>> DO:

  • train_on_batch() + numpy + tf.keras + TF1 + Eager/Graph
  • train_on_batch() + numpy + tf.keras + TF2 + Graph
  • fit() + numpy + tf.keras + TF1/TF2 + Graph + large model & data

>> DON’T:

  • fit() + numpy + keras for small & medium models and data
  • fit() + numpy + tf.keras + TF1/TF2 + Eager
  • train_on_batch() + numpy + keras + TF1 + Eager

  • [Major] tf.python.keras; it can run 10-100x slower, and w/ plenty of bugs; more info

    • This includes layers, models, optimizers, & related “out-of-box” usage imports; ops, utils, & related ‘private’ imports are fine – but to be sure, check for alts, & whether they’re used in tf.keras

Refer to code at bottom of my other answer for an example benchmarking setup. The list above is based mainly on the “BENCHMARKS” tables in the other answer.


LIMITATIONS of the above DO’s & DON’T’s:

  • This question’s titled “Why is TF2 much slower than TF1?”, and while its body concerns training explicitly, the matter isn’t limited to it; inference, too, is subject to major speed differences, even within the same TF version, import, data format, etc. – see this answer.
  • RNNs are likely to notably change the data grid in the other answer, as they’ve been improved in TF2
  • Models primarily used Conv1D and Dense – no RNNs, sparse data/targets, 4/5D inputs, & other configs
  • Input data limited to numpy and tf.data.Dataset, while many other formats exist; see other answer
  • GPU was used; results will differ on a CPU. In fact, when I asked the question, my CUDA wasn’t properly configured, and some of the results were CPU-based.

Why did TF2 sacrifice the most practical quality, speed, for eager execution? It hasn’t, clearly – graph is still available. But if the question is “why eager at all”:

  • Superior debugging: you’ve likely come across multitudes of questions asking “how do I get intermediate layer outputs” or “how do I inspect weights”; with eager, it’s (almost) as simple as .__dict__. Graph, in contrast, requires familiarity with special backend functions – greatly complicating the entire process of debugging & introspection.
  • Faster prototyping: per ideas similar to above; faster understanding = more time left for actual DL.

HOW TO ENABLE/DISABLE EAGER?

tf.enable_eager_execution()  # TF1; must be done before any model/tensor creation
tf.compat.v1.disable_eager_execution() # TF2; above holds

ADDITIONAL INFO:

  • Careful with _on_batch() methods in TF2; according to the TF dev, they still use a slower implementation, but not intentionally – i.e. it’s to be fixed. See other answer for details.

REQUESTS TO TENSORFLOW DEVS:

  1. Please fix train_on_batch(), and the performance aspect of calling fit() iteratively; custom train loops are important to many, especially to me.
  2. Add documentation / docstring mention of these performance differences for users’ knowledge.
  3. Improve general execution speed to keep peeps from hopping to Pytorch.

ACKNOWLEDGEMENTS: Thanks to


UPDATES:

  • 11/14/19 – found a model (in my real application) that that runs slower on TF2 for all* configurations w/ Numpy input data. Differences ranged 13-19%, averaging 17%. Differences between keras and tf.keras, however, were more dramatic: 18-40%, avg. 32% (both TF1 & 2). (* – except Eager, for which TF2 OOM’d)

  • 11/17/19 – devs updated on_batch() methods in a recent commit, stating to have improved speed – to be released in TF 2.1, or available now as tf-nightly. As I’m unable to get latter running, will delay benching until 2.1.

  • 2/20/20 – prediction performance is also worth benching; in TF2, for example, CPU prediction times can involve periodic spikes

回答 1

解答:旨在提供对该问题的详细图形/硬件级别描述-包括TF2与TF1训练循环,输入数据处理器以及Eager与Graph模式的执行。有关问题摘要和解决方案的指南,请参见我的其他答案。


性能验证:有时一个更快,有时另一个更快,具体取决于配置。就TF2与TF1而言,它们的平均水平差不多,但是确实存在基于配置的重大差异,并且TF1比TF2更为常见。请参阅下面的“标记”。


EAGER VS. GRAPH:这可以说是整个答案的关键:根据我的测试,TF2的渴望比TF1的渴望。细节进一步下降。

两者之间的根本区别是:Graph 主动设置计算网络,并在“提示”时执行-而Eager在创建时执行所有操作。但故事只从这里开始:

  • 渴望并不是没有Graph,实际上可能主要是 Graph,这与预期相反。它主要是执行图 -包括模型和优化器权重,占图的很大一部分。

  • 渴望在执行时重建自己图的一部分 ; Graph未完全构建的直接结果-请参阅分析器结果。这具有计算开销。

  • 渴望慢与脾气暴躁的输入 ; 根据此Git注释和代码,Eager中的Numpy输入包括将张量从CPU复制到GPU的开销成本。遍历源代码,数据处理差异很明显;渴望直接通过Numpy,而图则通过张量,然后求和为Numpy。不确定确切的过程,但后者应涉及GPU级别的优化

  • TF2 Eager 比TF1 Eager -这是…意外。请参阅下面的基准测试结果。差异从可以忽略不计到显着,但是是一致的。不确定为什么会这样-如果TF开发人员澄清了,将会更新答案。


TF2与TF1:引用TF开发人员Q. Scott Zhu的相关部分的回复 -附上我的强调和改写:

急切地,运行时需要执行ops并为python代码的每一行返回数值。单步执行的性质使其运行缓慢

在TF2中,Keras利用tf.function构建其图形进行训练,评估和预测。我们称它们为模型的“执行功能”。在TF1中,“执行功能”是FuncGraph,它与TF功能共享一些公共组件,但是实现方式不同。

在此过程中,我们以某种方式为train_on_batch(),test_on_batch()和预报_on_batch()留下了错误的实现。它们在数值上仍然是正确的,但是x_on_batch的执行函数是纯python函数,而不是tf.function包装的python函数。这会导致缓慢

在TF2中,我们将所有输入数据转换为tf.data.Dataset,通过它我们可以统一执行函数来处理单一类型的输入。数据集转换中可能会有一些开销,我认为这是一次性的开销,而不是每次批处理的开销

带有上一段的最后一句和下段的最后一句:

为了克服急切模式下的缓慢性,我们提供了@ tf.function,它将把python函数变成图形。当像np数组一样输入数值时,tf.function的主体将转换为静态图,进行优化,并返回最终值,该值很快,并且应具有与TF1图模式相似的性能。

我不同意-根据我的分析结果,该结果表明Eager的输入数据处理比Graph的处理要慢得多。另外,tf.data.Dataset尤其不确定,但是Eager确实反复调用了多个相同的数据转换方法-请参阅事件探查器。

最后,开发人员的链接提交:支持Keras v2循环的大量更改


训练循环:取决于(1)渴望与图表;(2)输入数据格式,训练将在一个独特的训练循环中进行-在TF2中_select_training_loop()training.py,其中之一:

training_v2.Loop()
training_distributed.DistributionMultiWorkerTrainingLoop(
              training_v2.Loop()) # multi-worker mode
# Case 1: distribution strategy
training_distributed.DistributionMultiWorkerTrainingLoop(
            training_distributed.DistributionSingleWorkerTrainingLoop())
# Case 2: generator-like. Input is Python generator, or Sequence object,
# or a non-distributed Dataset or iterator in eager execution.
training_generator.GeneratorOrSequenceTrainingLoop()
training_generator.EagerDatasetOrIteratorTrainingLoop()
# Case 3: Symbolic tensors or Numpy array-like. This includes Datasets and iterators 
# in graph mode (since they generate symbolic tensors).
training_generator.GeneratorLikeTrainingLoop() # Eager
training_arrays.ArrayLikeTrainingLoop() # Graph

每个人对资源分配的处理方式不同,并会对性能和功能造成影响。


火车循环:fitvs train_on_batchkerasvstf.keras:四个循环都使用不同的火车循环,尽管可能不是每种可能的组合。kerasfit,例如,使用的形式fit_loop,例如training_arrays.fit_loop(),其train_on_batch可以使用K.function()tf.keras具有更复杂的层次结构,在上一节中进行了部分描述。


训练循环:文档 -有关某些不同执行方法的相关源文档字符串

与其他TensorFlow操作不同,我们不会将python数值输入转换为张量。此外,将为每个不同的python数值生成一个新图

function 为每个唯一的输入形状和数据类型集实例化一个单独的图

一个tf.function对象可能需要映射到后台的多个计算图。这应该仅在性能上可见(跟踪图的计算和内存成本非零


输入数据处理器:与上述类似,根据运行时配置(执行模式,数据格式,分发策略)设置的内部标志,视情况选择处理器。最简单的情况是使用Eager,它可以直接与Numpy数组一起使用。有关某些特定示例,请参见此答案


模型大小,数据大小:

  • 是决定性的;没有任何一种配置能在所有型号和数据尺寸上脱颖而出。
  • 相对于模型大小的数据大小很重要;对于小型数据和模型,数据传输(例如,CPU至GPU)的开销可能占主导。同样,小型的开销处理器在每个数据转换时间对大型数据的运行速度上较慢(请参见 convert_to_tensor“配置文件”)
  • 速度因火车循环和输入数据处理器处理资源的方式而异。

基准:磨碎的肉。- Word文档Excel电子表格


术语

  • 减去%的数字都是
  • %计算为(1 - longer_time / shorter_time)*100; 理由:我们对哪个因素比另一个因素更快感兴趣shorter / longer实际上是非线性关系,对直接比较没有用
  • 百分号确定:
    • TF2 vs TF1:+如果TF2更快
    • GvE(图表vs.渴望):+如果图表更快
  • TF2 = TensorFlow 2.0.0 + Keras 2.3.1; TF1 = TensorFlow 1.14.0 + Keras 2.2.5

简介


PROFILER-说明:Spyder 3.3.6 IDE分析器。

  • 有些功能在其他嵌套中重复;因此,很难找到“数据处理”和“训练”功能之间的确切间隔,因此会有一些重叠-在最后一个结果中很明显。

  • 计算的wrt运行时数减去构建时间的百分比

  • 通过将所有(唯一)运行时间相加得出的构建时间来计算,这些运行时间称为1或2次
  • 通过累加所有(唯一的)运行时间(与迭代的次数和它们的嵌套的运行时间相同)计算出的训练时间
  • 不幸的是,函数是根据其原始名称进行_func = func概要分析的(即,将概要分析为func),这会混入构建时间-因此需要将其排除在外

测试环境

  • 底部执行的代码带有最少的后台任务运行
  • GPU是“热身” W /定时重复前几次反复,在提出这个帖子
  • 从源代码构建的CUDA 10.0.130,cuDNN 7.6.0,TensorFlow 1.14.0和TensorFlow 2.0.0,以及Anaconda
  • Python 3.7.4,Spyder 3.3.6 IDE
  • GTX 1070,Windows 10、24 GB DDR4 2.4 MHz RAM,i7-7700HQ 2.8 GHz CPU

方法

  • 基准“小”,“中”和“大”模型和数据大小
  • 固定每个模型大小的参数数,与输入数据大小无关
  • “较大”模型具有更多参数和层
  • “较大”的数据具有更长的序列,但相同batch_sizenum_channels
  • 模型只使用Conv1DDense“可学习”层; 每个TF版本的符号都避免了RNN。差异
  • 始终在基准循环之外运行一列火车,以省略模型和优化器图的构建
  • 不使用稀疏数据(例如layers.Embedding())或稀疏目标(例如SparseCategoricalCrossEntropy()

局限性:“完整”的答案将解释所有可能的火车循环和迭代器,但这肯定超出了我的时间能力,不存在薪水或一般必要性。结果仅与方法学一样好-用开放的心态进行解释。


代码

import numpy as np
import tensorflow as tf
import random
from termcolor import cprint
from time import time

from tensorflow.keras.layers import Input, Dense, Conv1D
from tensorflow.keras.layers import Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
#from keras.layers import Input, Dense, Conv1D
#from keras.layers import Dropout, GlobalAveragePooling1D
#from keras.models import Model 
#from keras.optimizers import Adam
#import keras.backend as K

#tf.compat.v1.disable_eager_execution()
#tf.enable_eager_execution()

def reset_seeds(reset_graph_with_backend=None, verbose=1):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        if verbose:
            print("KERAS AND TENSORFLOW GRAPHS RESET")

    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    if verbose:
        print("RANDOM SEEDS RESET")

print("TF version: {}".format(tf.__version__))
reset_seeds()

def timeit(func, iterations, *args, _verbose=0, **kwargs):
    t0 = time()
    for _ in range(iterations):
        func(*args, **kwargs)
        print(end='.'*int(_verbose))
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_model_small(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 40, strides=4, padding='same')(ipt)
    x     = GlobalAveragePooling1D()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_medium(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ipt
    for filters in [64, 128, 256, 256, 128, 64]:
        x  = Conv1D(filters, 20, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_large(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(64,  400, strides=4, padding='valid')(ipt)
    x     = Conv1D(128, 200, strides=1, padding='valid')(x)
    for _ in range(40):
        x = Conv1D(256,  12, strides=1, padding='same')(x)
    x     = Conv1D(512,  20, strides=2, padding='valid')(x)
    x     = Conv1D(1028, 10, strides=2, padding='valid')(x)
    x     = Conv1D(256,   1, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)    
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.randint(0, 2, (batch_shape[0], 1))

def make_data_tf(batch_shape, n_batches, iters):
    data = np.random.randn(n_batches, *batch_shape),
    trgt = np.random.randint(0, 2, (n_batches, batch_shape[0], 1))
    return tf.data.Dataset.from_tensor_slices((data, trgt))#.repeat(iters)

batch_shape_small  = (32, 140,   30)
batch_shape_medium = (32, 1400,  30)
batch_shape_large  = (32, 14000, 30)

batch_shapes = batch_shape_small, batch_shape_medium, batch_shape_large
make_model_fns = make_model_small, make_model_medium, make_model_large
iterations = [200, 100, 50]
shape_names = ["Small data",  "Medium data",  "Large data"]
model_names = ["Small model", "Medium model", "Large model"]

def test_all(fit=False, tf_dataset=False):
    for model_fn, model_name, iters in zip(make_model_fns, model_names, iterations):
        for batch_shape, shape_name in zip(batch_shapes, shape_names):
            if (model_fn is make_model_large) and (batch_shape is batch_shape_small):
                continue
            reset_seeds(reset_graph_with_backend=K)
            if tf_dataset:
                data = make_data_tf(batch_shape, iters, iters)
            else:
                data = make_data(batch_shape)
            model = model_fn(batch_shape)

            if fit:
                if tf_dataset:
                    model.train_on_batch(data.take(1))
                    t0 = time()
                    model.fit(data, steps_per_epoch=iters)
                    print("Time/iter: %.4f sec" % ((time() - t0) / iters))
                else:
                    model.train_on_batch(*data)
                    timeit(model.fit, iters, *data, _verbose=1, verbose=0)
            else:
                model.train_on_batch(*data)
                timeit(model.train_on_batch, iters, *data, _verbose=1)
            cprint(">> {}, {} done <<\n".format(model_name, shape_name), 'blue')
            del model

test_all(fit=True, tf_dataset=False)

THIS ANSWER: aims to provide a detailed, graph/hardware-level description of the issue – including TF2 vs. TF1 train loops, input data processors, and Eager vs. Graph mode executions. For an issue summary & resolution guidelines, see my other answer.


PERFORMANCE VERDICT: sometimes one is faster, sometimes the other, depending on configuration. As far as TF2 vs TF1 goes, they’re about on par on average, but significant config-based differences do exist, and TF1 trumps TF2 more often than vice versa. See “BENCHMARKING” below.


EAGER VS. GRAPH: the meat of this entire answer for some: TF2’s eager is slower than TF1’s, according to my testing. Details further down.

The fundamental difference between the two is: Graph sets up a computational network proactively, and executes when ‘told to’ – whereas Eager executes everything upon creation. But the story only begins here:

  • Eager is NOT devoid of Graph, and may in fact be mostly Graph, contrary to expectation. What it largely is, is executed Graph – this includes model & optimizer weights, comprising a great portion of the graph.

  • Eager rebuilds part of own graph at execution; direct consequence of Graph not being fully built — see profiler results. This has a computational overhead.

  • Eager is slower w/ Numpy inputs; per this Git comment & code, Numpy inputs in Eager include the overhead cost of copying tensors from CPU to GPU. Stepping through source code, data handling differences are clear; Eager directly passes Numpy, while Graph passes tensors which then evaluate to Numpy; uncertain of the exact process, but latter should involve GPU-level optimizations

  • TF2 Eager is slower than TF1 Eager – this is… unexpected. See benchmarking results below. Differences span from negligible to significant, but are consistent. Unsure why it’s the case – if a TF dev clarifies, will update answer.


TF2 vs. TF1: quoting relevant portions of a TF dev’s, Q. Scott Zhu’s, response – w/ bit of my emphasis & rewording:

In eager, the runtime needs to execute the ops and return the numerical value for every line of python code. The nature of single step execution causes it to be slow.

In TF2, Keras leverages tf.function to build its graph for training, eval and prediction. We call them “execution function” for the model. In TF1, the “execution function” was a FuncGraph, which shared some common component as TF function, but has a different implementation.

During the process, we somehow left an incorrect implementation for train_on_batch(), test_on_batch() and predict_on_batch(). They are still numerically correct, but the execution function for x_on_batch is a pure python function, rather than a tf.function wrapped python function. This will cause slowness

In TF2, we convert all input data into a tf.data.Dataset, by which we can unify our execution function to handle the single type of the inputs. There might be some overhead in the dataset conversion, and I think this is a one-time only overhead, rather than a per-batch cost

With the last sentence of last paragraph above, and last clause of below paragraph:

To overcome the slowness in eager mode, we have @tf.function, which will turn a python function into a graph. When feed numerical value like np array, the body of the tf.function is converted into static graph, being optimized, and return the final value, which is fast and should have similar performance as TF1 graph mode.

I disagree – per my profiling results, which show Eager’s input data processing to be substantially slower than Graph’s. Also, unsure about tf.data.Dataset in particular, but Eager does repeatedly call multiple of the same data conversion methods – see profiler.

Lastly, dev’s linked commit: Significant number of changes to support the Keras v2 loops.


Train Loops: depending on (1) Eager vs. Graph; (2) input data format, training in will proceed with a distinct train loop – in TF2, _select_training_loop(), training.py, one of:

training_v2.Loop()
training_distributed.DistributionMultiWorkerTrainingLoop(
              training_v2.Loop()) # multi-worker mode
# Case 1: distribution strategy
training_distributed.DistributionMultiWorkerTrainingLoop(
            training_distributed.DistributionSingleWorkerTrainingLoop())
# Case 2: generator-like. Input is Python generator, or Sequence object,
# or a non-distributed Dataset or iterator in eager execution.
training_generator.GeneratorOrSequenceTrainingLoop()
training_generator.EagerDatasetOrIteratorTrainingLoop()
# Case 3: Symbolic tensors or Numpy array-like. This includes Datasets and iterators 
# in graph mode (since they generate symbolic tensors).
training_generator.GeneratorLikeTrainingLoop() # Eager
training_arrays.ArrayLikeTrainingLoop() # Graph

Each handles resource allocation differently, and bears consequences on performance & capability.


Train Loops: fit vs train_on_batch, keras vs. tf.keras: each of the four uses different train loops, though perhaps not in every possible combination. kerasfit, for example, uses a form of fit_loop, e.g. training_arrays.fit_loop(), and its train_on_batch may use K.function(). tf.keras has a more sophisticated hierarchy described in part in previous section.


Train Loops: documentation — relevant source docstring on some of the different execution methods:

Unlike other TensorFlow operations, we don’t convert python numerical inputs to tensors. Moreover, a new graph is generated for each distinct python numerical value

function instantiates a separate graph for every unique set of input shapes and datatypes.

A single tf.function object might need to map to multiple computation graphs under the hood. This should be visible only as performance (tracing graphs has a nonzero computational and memory cost)


Input data processors: similar to above, the processor is selected case-by-case, depending on internal flags set according to runtime configurations (execution mode, data format, distribution strategy). The simplest case’s with Eager, which works directly w/ Numpy arrays. For some specific examples, see this answer.


MODEL SIZE, DATA SIZE:

  • Is decisive; no single configuration crowned itself atop all model & data sizes.
  • Data size relative to model size is important; for small data & model, data transfer (e.g. CPU to GPU) overhead can dominate. Likewise, small overhead processors can run slower on large data per data conversion time dominating (see convert_to_tensor in “PROFILER”)
  • Speed differs per train loops’ and input data processors’ differing means of handling resources.

BENCHMARKS: the grinded meat. — Word DocumentExcel Spreadsheet


Terminology:

  • %-less numbers are all seconds
  • % computed as (1 - longer_time / shorter_time)*100; rationale: we’re interested by what factor one is faster than the other; shorter / longer is actually a non-linear relation, not useful for direct comparison
  • % sign determination:
    • TF2 vs TF1: + if TF2 is faster
    • GvE (Graph vs. Eager): + if Graph is faster
  • TF2 = TensorFlow 2.0.0 + Keras 2.3.1; TF1 = TensorFlow 1.14.0 + Keras 2.2.5

PROFILER:


PROFILER – Explanation: Spyder 3.3.6 IDE profiler.

  • Some functions are repeated in nests of others; hence, it’s hard to track down the exact separation between “data processing” and “training” functions, so there will be some overlap – as pronounced in the very last result.

  • % figures computed w.r.t. runtime minus build time

  • Build time computed by summing all (unique) runtimes which were called 1 or 2 times
  • Train time computed by summing all (unique) runtimes which were called the same # of times as the # of iterations, and some of their nests’ runtimes
  • Functions are profiled according to their original names, unfortunately (i.e. _func = func will profile as func), which mixes in build time – hence the need to exclude it

TESTING ENVIRONMENT:

  • Executed code at bottom w/ minimal background tasks running
  • GPU was “warmed up” w/ a few iterations before timing iterations, as suggested in this post
  • CUDA 10.0.130, cuDNN 7.6.0, TensorFlow 1.14.0, & TensorFlow 2.0.0 built from source, plus Anaconda
  • Python 3.7.4, Spyder 3.3.6 IDE
  • GTX 1070, Windows 10, 24GB DDR4 2.4-MHz RAM, i7-7700HQ 2.8-GHz CPU

METHODOLOGY:

  • Benchmark ‘small’, ‘medium’, & ‘large’ model & data sizes
  • Fix # of parameters for each model size, independent of input data size
  • “Larger” model has more parameters and layers
  • “Larger” data has a longer sequence, but same batch_size and num_channels
  • Models only use Conv1D, Dense ‘learnable’ layers; RNNs avoided per TF-version implem. differences
  • Always ran one train fit outside of benchmarking loop, to omit model & optimizer graph building
  • Not using sparse data (e.g. layers.Embedding()) or sparse targets (e.g. SparseCategoricalCrossEntropy()

LIMITATIONS: a “complete” answer would explain every possible train loop & iterator, but that’s surely beyond my time ability, nonexistent paycheck, or general necessity. The results are only as good as the methodology – interpret with an open mind.


CODE:

import numpy as np
import tensorflow as tf
import random
from termcolor import cprint
from time import time

from tensorflow.keras.layers import Input, Dense, Conv1D
from tensorflow.keras.layers import Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
#from keras.layers import Input, Dense, Conv1D
#from keras.layers import Dropout, GlobalAveragePooling1D
#from keras.models import Model 
#from keras.optimizers import Adam
#import keras.backend as K

#tf.compat.v1.disable_eager_execution()
#tf.enable_eager_execution()

def reset_seeds(reset_graph_with_backend=None, verbose=1):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        if verbose:
            print("KERAS AND TENSORFLOW GRAPHS RESET")

    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    if verbose:
        print("RANDOM SEEDS RESET")

print("TF version: {}".format(tf.__version__))
reset_seeds()

def timeit(func, iterations, *args, _verbose=0, **kwargs):
    t0 = time()
    for _ in range(iterations):
        func(*args, **kwargs)
        print(end='.'*int(_verbose))
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_model_small(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 40, strides=4, padding='same')(ipt)
    x     = GlobalAveragePooling1D()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_medium(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ipt
    for filters in [64, 128, 256, 256, 128, 64]:
        x  = Conv1D(filters, 20, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_large(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(64,  400, strides=4, padding='valid')(ipt)
    x     = Conv1D(128, 200, strides=1, padding='valid')(x)
    for _ in range(40):
        x = Conv1D(256,  12, strides=1, padding='same')(x)
    x     = Conv1D(512,  20, strides=2, padding='valid')(x)
    x     = Conv1D(1028, 10, strides=2, padding='valid')(x)
    x     = Conv1D(256,   1, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)    
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.randint(0, 2, (batch_shape[0], 1))

def make_data_tf(batch_shape, n_batches, iters):
    data = np.random.randn(n_batches, *batch_shape),
    trgt = np.random.randint(0, 2, (n_batches, batch_shape[0], 1))
    return tf.data.Dataset.from_tensor_slices((data, trgt))#.repeat(iters)

batch_shape_small  = (32, 140,   30)
batch_shape_medium = (32, 1400,  30)
batch_shape_large  = (32, 14000, 30)

batch_shapes = batch_shape_small, batch_shape_medium, batch_shape_large
make_model_fns = make_model_small, make_model_medium, make_model_large
iterations = [200, 100, 50]
shape_names = ["Small data",  "Medium data",  "Large data"]
model_names = ["Small model", "Medium model", "Large model"]

def test_all(fit=False, tf_dataset=False):
    for model_fn, model_name, iters in zip(make_model_fns, model_names, iterations):
        for batch_shape, shape_name in zip(batch_shapes, shape_names):
            if (model_fn is make_model_large) and (batch_shape is batch_shape_small):
                continue
            reset_seeds(reset_graph_with_backend=K)
            if tf_dataset:
                data = make_data_tf(batch_shape, iters, iters)
            else:
                data = make_data(batch_shape)
            model = model_fn(batch_shape)

            if fit:
                if tf_dataset:
                    model.train_on_batch(data.take(1))
                    t0 = time()
                    model.fit(data, steps_per_epoch=iters)
                    print("Time/iter: %.4f sec" % ((time() - t0) / iters))
                else:
                    model.train_on_batch(*data)
                    timeit(model.fit, iters, *data, _verbose=1, verbose=0)
            else:
                model.train_on_batch(*data)
                timeit(model.train_on_batch, iters, *data, _verbose=1)
            cprint(">> {}, {} done <<\n".format(model_name, shape_name), 'blue')
            del model

test_all(fit=True, tf_dataset=False)

如何在Python中打破一系列链接方法?

问题:如何在Python中打破一系列链接方法?

我有以下代码行(不要怪罪命名约定,它们不是我的):

subkeyword = Session.query(
    Subkeyword.subkeyword_id, Subkeyword.subkeyword_word
).filter_by(
    subkeyword_company_id=self.e_company_id
).filter_by(
    subkeyword_word=subkeyword_word
).filter_by(
    subkeyword_active=True
).one()

我不喜欢它的外观(不太可读),但是在这种情况下,我没有更好的主意将行数限制为79个字符。有没有更好的方法来破解它(最好没有反斜杠)?

I have a line of the following code (don’t blame for naming conventions, they are not mine):

subkeyword = Session.query(
    Subkeyword.subkeyword_id, Subkeyword.subkeyword_word
).filter_by(
    subkeyword_company_id=self.e_company_id
).filter_by(
    subkeyword_word=subkeyword_word
).filter_by(
    subkeyword_active=True
).one()

I don’t like how it looks like (not too readable) but I don’t have any better idea to limit lines to 79 characters in this situation. Is there a better way of breaking it (preferably without backslashes)?


回答 0

您可以使用其他括号:

subkeyword = (
        Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
        .filter_by(subkeyword_company_id=self.e_company_id)
        .filter_by(subkeyword_word=subkeyword_word)
        .filter_by(subkeyword_active=True)
        .one()
    )

You could use additional parenthesis:

subkeyword = (
        Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
        .filter_by(subkeyword_company_id=self.e_company_id)
        .filter_by(subkeyword_word=subkeyword_word)
        .filter_by(subkeyword_active=True)
        .one()
    )

回答 1

在这种情况下,最好使用连续行字符代替括号。随着方法名称变长以及方法开始采用参数,对这种样式的需求变得更加明显:

subkeyword = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id)          \
                    .filter_by(subkeyword_word=subkeyword_word)                  \
                    .filter_by(subkeyword_active=True)                           \
                    .one()

PEP 8旨在以一种常识的方式进行解释,并兼顾实用性和美观性。很高兴违反任何导致难看或难以阅读代码的PEP 8准则。

话虽如此,如果您经常发现自己与PEP 8不符,则可能表明存在一些可读性问题超出了对空白的选择:-)

This is a case where a line continuation character is preferred to open parentheses. The need for this style becomes more obvious as method names get longer and as methods start taking arguments:

subkeyword = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id)          \
                    .filter_by(subkeyword_word=subkeyword_word)                  \
                    .filter_by(subkeyword_active=True)                           \
                    .one()

PEP 8 is intend to be interpreted with a measure of common-sense and an eye for both the practical and the beautiful. Happily violate any PEP 8 guideline that results in ugly or hard to read code.

That being said, if you frequently find yourself at odds with PEP 8, it may be a sign that there are readability issues that transcend your choice of whitespace :-)


回答 2

我个人的选择是:

子关键字= Session.query(
    Subkeyword.subkeyword_id,
    Subkeyword.subkeyword_word,
)。过滤(
    subkeyword_company_id = self.e_company_id,
    subkeyword_word = subkeyword_word,
    subkeyword_active =真实,
)。之一()

My personal choice would be:

subkeyword = Session.query(
    Subkeyword.subkeyword_id,
    Subkeyword.subkeyword_word,
).filter_by(
    subkeyword_company_id=self.e_company_id,
    subkeyword_word=subkeyword_word,
    subkeyword_active=True,
).one()

回答 3

只需存储中间结果/对象并在其上调用下一个方法,例如

q = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
q = q.filter_by(subkeyword_company_id=self.e_company_id)
q = q.filter_by(subkeyword_word=subkeyword_word)
q = q.filter_by(subkeyword_active=True)
subkeyword = q.one()

Just store the intermediate result/object and invoke the next method on it, e.g.

q = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
q = q.filter_by(subkeyword_company_id=self.e_company_id)
q = q.filter_by(subkeyword_word=subkeyword_word)
q = q.filter_by(subkeyword_active=True)
subkeyword = q.one()

回答 4

根据Python语言参考,
您可以使用反斜杠。
或简单地打破它。如果括号未配对,则python不会将其视为一行。在这种情况下,以下行的缩进无关紧要。

According to Python Language Reference
You can use a backslash.
Or simply break it. If a bracket is not paired, python will not treat that as a line. And under such circumstance, the indentation of following lines doesn’t matter.


回答 5

它与其他人提供的解决方案有些不同,但我的最爱,因为它有时会导致漂亮的元编程。

base = [Subkeyword.subkeyword_id, Subkeyword_word]
search = {
    'subkeyword_company_id':self.e_company_id,
    'subkeyword_word':subkeyword_word,
    'subkeyword_active':True,
    }
subkeyword = Session.query(*base).filter_by(**search).one()

这是构建搜索的好方法。浏览条件列表,从复杂的查询表单(或基于字符串的有关用户正在寻找的内容的推论)中进行挖掘,然后将字典分解到过滤器中。

It’s a bit of a different solution than provided by others but a favorite of mine since it leads to nifty metaprogramming sometimes.

base = [Subkeyword.subkeyword_id, Subkeyword_word]
search = {
    'subkeyword_company_id':self.e_company_id,
    'subkeyword_word':subkeyword_word,
    'subkeyword_active':True,
    }
subkeyword = Session.query(*base).filter_by(**search).one()

This is a nice technique for building searches. Go through a list of conditionals to mine from your complex query form (or string-based deductions about what the user is looking for), then just explode the dictionary into the filter.


回答 6

您似乎在使用SQLAlchemy,如果为true,则sqlalchemy.orm.query.Query.filter_by()方法采用多个关键字参数,因此您可以这样编写:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id,
                               subkeyword_word=subkeyword_word,
                               subkeyword_active=True) \
                    .one()

但这会更好:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word)
subkeyword = subkeyword.filter_by(subkeyword_company_id=self.e_company_id,
                                  subkeyword_word=subkeyword_word,
                                  subkeyword_active=True)
subkeuword = subkeyword.one()

You seems using SQLAlchemy, if it is true, sqlalchemy.orm.query.Query.filter_by() method takes multiple keyword arguments, so you could write like:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id,
                               subkeyword_word=subkeyword_word,
                               subkeyword_active=True) \
                    .one()

But it would be better:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word)
subkeyword = subkeyword.filter_by(subkeyword_company_id=self.e_company_id,
                                  subkeyword_word=subkeyword_word,
                                  subkeyword_active=True)
subkeuword = subkeyword.one()

回答 7

我喜欢将参数缩进两个块,将语句缩进一个块,如下所示:

for image_pathname in image_directory.iterdir():
    image = cv2.imread(str(image_pathname))
    input_image = np.resize(
            image, (height, width, 3)
        ).transpose((2,0,1)).reshape(1, 3, height, width)
    net.forward_all(data=input_image)
    segmentation_index = net.blobs[
            'argmax'
        ].data.squeeze().transpose(1,2,0).astype(np.uint8)
    segmentation = np.empty(segmentation_index.shape, dtype=np.uint8)
    cv2.LUT(segmentation_index, label_colours, segmentation)
    prediction_pathname = prediction_directory / image_pathname.name
    cv2.imwrite(str(prediction_pathname), segmentation)

I like to indent the arguments by two blocks, and the statement by one block, like these:

for image_pathname in image_directory.iterdir():
    image = cv2.imread(str(image_pathname))
    input_image = np.resize(
            image, (height, width, 3)
        ).transpose((2,0,1)).reshape(1, 3, height, width)
    net.forward_all(data=input_image)
    segmentation_index = net.blobs[
            'argmax'
        ].data.squeeze().transpose(1,2,0).astype(np.uint8)
    segmentation = np.empty(segmentation_index.shape, dtype=np.uint8)
    cv2.LUT(segmentation_index, label_colours, segmentation)
    prediction_pathname = prediction_directory / image_pathname.name
    cv2.imwrite(str(prediction_pathname), segmentation)

给定一百万个数字的字符串,返回所有重复的3位数字

问题:给定一百万个数字的字符串,返回所有重复的3位数字

几个月前,我在纽约接受了一家对冲基金公司的采访,不幸的是,我没有获得数据/软件工程师的实习机会。(他们还要求解决方案使用Python。)

我几乎搞砸了第一次面试的问题…

问题:给定一百万个数字的字符串(例如,Pi),编写一个函数/程序,该函数/程序返回所有重复的3位数字,并且重复次数大于1

例如:如果字符串为:123412345123456则函数/程序将返回:

123 - 3 times
234 - 3 times
345 - 2 times

在面试失败后,他们没有给我解决方案,但他们确实告诉我,解决方案的时间复杂度恒定为1000,因为所有可能的结果都介于:

000-> 999

现在我正在考虑它,我认为不可能提出一个恒定时间算法。是吗?

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem…

Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3 digit numbers and number of repetition greater than 1

For example: if the string was: 123412345123456 then the function/program would return:

123 - 3 times
234 - 3 times
345 - 2 times

They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:

000 –> 999

Now that I’m thinking about it, I don’t think it’s possible to come up with a constant time algorithm. Is it?


回答 0

您轻轻松松下手,您可能不想在对量子点不了解基本算法的对冲基金中工作:-)

在这种情况下,如果您需要至少访问一次每个元素,则无法处理任意大小的数据结构O(1)。在这种情况下,您可以期望的最好是字符串的长度。O(n)n

虽然,顺便说一句,标称O(n)算法O(1)对一个固定的输入大小,这样,在技术上,他们可能已经在这里正确的。但是,这通常不是人们使用复杂度分析的方式。

在我看来,您可能会在很多方面给他们留下深刻的印象。

首先,通知他们,它是不是能够做到这一点的O(1),除非你使用上面的“犯罪嫌疑人”说理给出。

其次,通过提供Pythonic代码来展示您的精英技能,例如:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

输出:

[(123, 3), (234, 3), (345, 2)]

尽管您当然可以将输出格式修改为所需的任何格式。

最后,通过告诉他们解决方案几乎没有问题O(n),因为上面的代码在不到半秒钟的时间内即可提供一百万个数字字符串的结果。它似乎也线性地缩放,因为一个10,000,000个字符的字符串需要3.5秒,而一个100,000,000个字符的字符串需要36秒。

而且,如果他们需要的更好,则可以采用多种方法并行化此类内容,从而可以大大加快这种处理速度。

当然,由于GIL的缘故,不在单个 Python解释器中,但是您可以将字符串拆分成类似的字符(vv为了正确处理边界区域,必须用表示的重叠):

    vv
123412  vv
    123451
        5123456

您可以将它们种出以分开工作,然后再合并结果。

输入的拆分和输出的合并很可能会用小字符串(甚至可能是百万位数字的字符串)淹没任何节省的时间,但是,对于更大的数据集,这很可能会有所作为。当然,我通常的口号是:“不要猜测”


此口头禅也适用于其他可能性,例如完全绕过Python并使用可能更快的其他语言。

例如,以下C代码,在相同的硬件作为较早Python代码运行,处理一个在0.6秒万位,大致为Python代码处理的相同的时间量之一百万。换句话说,速度快:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

You got off lightly, you probably don’t want to be working for a hedge fund where the quants don’t understand basic algorithms :-)

There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.

Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that’s not usually how people use complexity analysis.

It appears to me you could have impressed them in a number of ways.

First, by informing them that it’s not possible to do it in O(1), unless you use the “suspect” reasoning given above.

Second, by showing your elite skills by providing Pythonic code such as:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

This outputs:

[(123, 3), (234, 3), (345, 2)]

though you could, of course, modify the output format to anything you desire.

And, finally, by telling them there’s almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.

And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.

Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):

    vv
123412  vv
    123451
        5123456

You can farm these out to separate workers and combine the results afterwards.

The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of “measure, don’t guess” applies here, of course.


This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.

For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

回答 1

固定时间是不可能的。所有一百万个数字都需要至少被查看一次,因此这是时间复杂度O(n),在这种情况下,n =一百万。

对于简单的O(n)解决方案,创建一个大小为1000的数组,该数组表示每个可能的3位数字的出现次数。一次前进1位数字,第一个索引== 0,最后一个索引== 999997,并递增array [3位数字]以创建直方图(每个可能的3位数字出现的次数)。然后输出计数> 1的数组内容。

Constant time isn’t possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.

For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.


回答 2

一百万对于我在下面给出的答案来说很小。只期望您必须能够不间断地在面试中运行解决方案,然后以下操作将在不到两秒钟的时间内完成并给出所需的结果:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

希望访问者可以使用标准库collections.Counter类。

并行执行版本

我为此写了一篇博客文章,并提供了更多解释。

A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.

Parallel execution version

I wrote a blog post on this with more explanation.


回答 3

简单的O(n)解决方案是对每个3位数字进行计数:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

这将搜索全部100万个数字1000次。

仅遍历数字一次:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

时序显示,仅对索引进行一次迭代是使用的两倍count

The simple O(n) solution would be to count each 3-digit number:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

This would search through all 1 million digits 1000 times.

Traversing the digits only once:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

Timing shows that iterating only once over the index is twice as fast as using count.


回答 4

这是“共识” O(n)算法的NumPy实现:遍历所有三元组和bin。通过遇到“ 385”,将bin加到bin [3,8,5](这是一个O(1)操作)中来完成合并。垃圾箱排列成一个10x10x10立方体。由于合并已完全矢量化,因此代码中没有循环。

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

毫不奇怪,在大型数据集上,NumPy比@Daniel的纯Python解决方案要快一点。样本输出:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

Here is a NumPy implementation of the “consensus” O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say “385”, adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

Unsurprisingly, NumPy is a bit faster than @Daniel’s pure Python solution on large data sets. Sample output:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

回答 5

我将解决以下问题:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

应用于示例字符串,将生成:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

该解决方案在O(n)中运行,因为n是提供的字符串的长度,并且我认为这是您可以获得的最佳结果。

I would solve the problem as follows:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

Applied to your example string, this yields:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.


回答 6

根据我的理解,您无法在固定时间内获得解决方案。至少需要通过一百万个数字(假设它是一个字符串)。您可以对百万个长度数字的位数进行三位数的滚动迭代,如果哈希键已经存在,则将其增加1;如果哈希密钥不存在,则创建一个新的哈希键(由值1初始化)。词典。

该代码将如下所示:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

您可以筛选出项值大于1的键。

As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn’t exists already in the dictionary.

The code will look something like this:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

You can filter down to the keys which have item value greater than 1.


回答 7

如另一个答案中所述,您不能在固定时间内执行此算法,因为您必须查看至少n位数字。线性时间是最快的。

但是,该算法可以在O(1)空间中完成。您只需要存储每个3位数字的计数,因此您需要一个包含1000个条目的数组。然后,您可以输入号码。

我的猜测是,当面试官给您解决方案时,他们会误以为是,或者当他们说“恒定空间”时,您会误以为“恒定时间”。

As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.

However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.

My guess is that either the interviewer misspoke when they gave you the solution, or you misheard “constant time” when they said “constant space.”


回答 8

这是我的答案:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

数组查找方法非常快(甚至比@ paul-panzer的numpy方法还快!)。当然,它作弊是因为它在完成后并未在技术上完成,因为它正在返回生成器。它也不必检查每次迭代是否已经存在该值,这可能会有所帮助。

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

Here’s my answer:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

The array lookup method is very fast (even faster than @paul-panzer’s numpy method!). Of course, it cheats since it isn’t technicailly finished after it completes, because it’s returning a generator. It also doesn’t have to check every iteration if the value already exists, which is likely to help a lot.

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

回答 9

图片作为答案:

看起来像一个滑动窗口。

Image as answer:

Looks like a sliding window.


回答 10

这是我的解决方案:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

在for循环中具有一些创造力(例如,带有True / False / None的附加查找列表),您应该可以摆脱最后一行,因为您只想创建一个字典,直到我们访问该点为止。希望能帮助到你 :)

Here is my solution:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point. Hope it helps :)


回答 11

-从C角度讲。-您可以得到一个int 3-d数组结果[10] [10] [10]; -从第0个位置转到第n-4个位置,其中n是字符串数组的大小。-在每个位置上,检查当前,下一个和下一个下一个。-将cntr增加为resutls [current] [next] [next的下一个] ++;-打印的值

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-现在是O(n)时间,不涉及比较。-您可以在此处运行一些并行的东西,方法是对数组进行分区并计算分区之间的匹配项。

-Telling from the perspective of C. -You can have an int 3-d array results[10][10][10]; -Go from 0th location to n-4th location, where n being the size of the string array. -On each location, check the current, next and next’s next. -Increment the cntr as resutls[current][next][next’s next]++; -Print the values of

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-It is O(n) time, there is no comparisons involved. -You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.


回答 12

inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count