Tag Archives: Python

Threads in a PyQt application: Use Qt threads or Python threads?

Question: Threads in a PyQt application: Use Qt threads or Python threads?


I’m writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I’d like to outsource the web connection to a separate worker thread.

[Yes, I know, now I have two problems.]

Anyway, the application uses PyQt4, so I’d like to know which is the better choice: use Qt’s threads, or use the Python threading module? What are the advantages/disadvantages of each? Or do you have a totally different suggestion?

Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request, as Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I’d still like a more in-depth answer to the general question:

What are the advantages and disadvantages of using PyQt4’s (i.e. Qt’s) threads over native Python threads (from the threading module)?


Edit 2: Thanks, all, for your answers. Although there’s no 100% agreement, there seems to be widespread consensus that the answer is “use Qt”, since the advantage of that is integration with the rest of the library, while causing no real disadvantages.

For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.

There were several answers I considered for the bounty; in the end I chose abbot’s for the very relevant external reference; it was, however, a close call.

Thanks again.


Answer 0


This was discussed not too long ago on the PyQt mailing list. Quoting Giovanni Bajo’s comments on the subject:

It’s mostly the same. The main difference is that QThreads are better integrated with Qt (asynchronous signals/slots, event loop, etc.). Also, you can’t use Qt from a Python thread (you can’t, for instance, post an event to the main thread through QApplication.postEvent): you need a QThread for that to work.

A general rule of thumb might be to use QThreads if you’re going to interact somehow with Qt, and use Python threads otherwise.

And an earlier comment on this subject from PyQt’s author: “they are both wrappers around the same native thread implementations”. And both implementations use the GIL in the same way.
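As an illustration of that integration, here is a minimal sketch (assuming PyQt4 with new-style signals; the URL and the slot name are made up for the example) of a worker QThread that performs the blocking download and hands the result back to the GUI thread via a signal:

import urllib2
from PyQt4 import QtCore

class DownloadThread(QtCore.QThread):
    # custom signal; emitting it from the worker delivers a queued,
    # thread-safe call into the GUI thread
    data_ready = QtCore.pyqtSignal(str)

    def __init__(self, url, parent=None):
        super(DownloadThread, self).__init__(parent)
        self.url = url

    def run(self):
        # blocking I/O is fine here: it runs in the worker thread
        data = urllib2.urlopen(self.url).read()
        self.data_ready.emit(data)

# In the GUI thread:
#   thread = DownloadThread('http://example.com/data')
#   thread.data_ready.connect(on_data_ready)  # queued connection across threads
#   thread.start()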


Answer 1


Python’s threads will be simpler and safer, and since this is an I/O-based application, they are able to bypass the GIL. That said, have you considered non-blocking I/O using Twisted, or non-blocking sockets with select?

EDIT: more on threads

Python threads

Python’s threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter only ever executes bytecode in one thread at a time, switching threads after a fixed-size block of bytecode instructions. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.

Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.
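For instance, a minimal sketch of that caveat (the names are made up): even counter += 1 compiles to several bytecode instructions, so two threads can interleave between the read and the write unless you guard it with a lock:

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    # counter += 1 is a read-modify-write, hence not atomic: guard it
    with lock:
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print counter  # reliably 10 only because of the lock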

QT threads

When Python hands off control to a 3rd party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, Python will use the GIL. This can make using 3rd party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library because it adds uncertainty as to where and when control is in the hands of the module vs the interpreter.

QT threads operate with the GIL released. QT threads are able to execute QT library code (and other compiled module code that does not acquire the GIL) concurrently. However, the Python code executed within the context of a QT thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.

In the end, both QT threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies.)

Non-blocking I/O

Threads add extraordinary complexity to your application, especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much less difficult to reason about than threads.

With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.

One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.

It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).
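For comparison, here is a minimal sketch of the plain sockets/select approach mentioned at the top of this answer, using only the standard library (an echo server; the port and buffer size are arbitrary):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setblocking(0)                 # never block in accept()
server.bind(('localhost', 8000))
server.listen(5)

inputs = [server]
while inputs:
    # block only inside select(), never in the socket calls themselves
    readable, _, _ = select.select(inputs, [], [])
    for s in readable:
        if s is server:
            conn, _ = s.accept()
            conn.setblocking(0)
            inputs.append(conn)
        else:
            data = s.recv(1024)
            if data:
                s.send(data)          # echo the data back
            else:                     # empty read means the peer closed
                inputs.remove(s)
                s.close()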


Answer 2


The advantage of QThread is that it’s integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.

If you are accessing an HTTP server, you should consider QNetworkAccessManager.
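That last suggestion often removes the need for a worker thread entirely. A minimal sketch (assuming PyQt4; the URL and handler are made up): QNetworkAccessManager performs the request asynchronously on the Qt event loop and signals when the reply is ready:

from PyQt4 import QtCore, QtNetwork

manager = QtNetwork.QNetworkAccessManager()

def handle_reply(reply):
    # called back on the GUI thread once the response has arrived
    print reply.readAll()
    reply.deleteLater()

manager.finished.connect(handle_reply)
manager.get(QtNetwork.QNetworkRequest(QtCore.QUrl('http://example.com/data')))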


Answer 3


I asked myself the same question when I was working on PyTalk.

If you are using Qt, you need to use QThread to be able to use the Qt framework, and especially the signal/slot system.

With the signal/slot engine, you will be able to talk from one thread to another and with every part of your project.

Moreover, there is no real performance question about this choice, since both are C++ bindings.

Here is my experience of PyQt and threads.

I encourage you to use QThread.


Answer 4


Jeff has some good points. Only the main thread can do any GUI updates. If you do need to update the GUI from within a thread, Qt 4’s queued connection signals make it easy to send data across threads, and they will automatically be used if you’re using QThread; I’m not sure if they will be if you’re using Python threads, although it’s easy to add a parameter to connect().


Answer 5


I can’t really recommend either, but I can try describing differences between CPython and Qt threads.

First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread; however, only the thread currently holding the Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while a thread doesn’t hold the GIL).

On the other hand, we have Qt threads, which are basically a common layer over system threads, have no Global Interpreter Lock, and are thus capable of running concurrently. I’m not sure how PyQt deals with it; however, unless your Qt threads call Python code, they should be able to run concurrently (bar the various extra locks that might be implemented in various structures).

For extra fine-tuning, you can modify the number of bytecode instructions that are interpreted before switching ownership of the GIL. Lower values mean more context switching (and possibly higher responsiveness) but lower performance per individual thread (context switches have their cost; if you try switching every few instructions, it doesn’t help speed).
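In CPython 2 this knob is sys.setcheckinterval (a sketch; 100 instructions is the default):

import sys

print sys.getcheckinterval()  # default: 100 bytecode instructions
# Switch threads more often: potentially snappier GUI,
# but more context-switch overhead per individual thread.
sys.setcheckinterval(20)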

Hope it helps with your problems :)


Answer 6


I can’t comment on the exact differences between Python and PyQt threads, but I’ve been doing what you’re attempting to do using QThread and QNetworkAccessManager, and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you’re trying to solve, the latter will help.


Have the same README both in Markdown and reStructuredText

Question: Have the same README both in Markdown and reStructuredText


I have a project hosted on GitHub. For this I have written my README using the Markdown syntax in order to have it nicely formatted on GitHub.

As my project is in Python I also plan to upload it to PyPi. The syntax used for READMEs on PyPi is reStructuredText.

I would like to avoid having to handle two READMEs containing roughly the same content; so I searched for a markdown to RST (or the other way around) translator, but couldn’t find any.

The other solution I see is to perform a markdown/HTML and then an HTML/RST translation. I found some resources for this here and here, so I guess it should be possible.

Would you have any idea that could fit better with what I want to do?


Answer 0


I would recommend Pandoc, the “swiss-army knife for converting files from one markup format into another” (check out the diagram of supported conversions at the bottom of the page, it is quite impressive). Pandoc allows markdown to reStructuredText translation directly. There is also an online editor here which lets you try it out, so you could simply use the online editor to convert your README files.
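For a one-off conversion, it is a single shell command (a sketch, assuming pandoc is on your PATH):

pandoc README.md --from=markdown --to=rst -o README.rst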


Answer 1


As @Chris suggested, you can use Pandoc to convert Markdown to RST. This can be simply automated using the pypandoc module and some magic in setup.py:

from setuptools import setup
try:
    from pypandoc import convert
    read_md = lambda f: convert(f, 'rst')
except ImportError:
    print("warning: pypandoc module not found, could not convert Markdown to RST")
    read_md = lambda f: open(f, 'r').read()

setup(
    # name, version, ...
    long_description=read_md('README.md'),
    install_requires=[]
)

This will automatically convert README.md to RST for the long description used on PyPI. When pypandoc is not available, it just reads README.md without the conversion, so that others aren’t forced to install pypandoc when they just want to build the module, not upload it to PyPI.

So you can write in Markdown as usual and not care about the RST mess anymore. ;)


Answer 2


2019 Update

The PyPI Warehouse now supports rendering Markdown as well! You just need to update your package configuration and add long_description_content_type='text/markdown' to it, e.g.:

setup(
    name='an_example_package',
    # other arguments omitted
    long_description=long_description,
    long_description_content_type='text/markdown'
)

Therefore, there is no need to keep the README in two formats any longer.

You can find more information about it in the documentation.

Old answer:

The Markup library used by GitHub supports reStructuredText. This means you can write a README.rst file.

They even support syntax-specific color highlighting using the code and code-block directives (Example).


Answer 3


PyPI now supports Markdown for long descriptions!

In setup.py, set long_description to a Markdown string, add long_description_content_type="text/markdown" and make sure you’re using recent tooling (setuptools 38.6.0+, twine 1.11+).

See Dustin Ingram’s blog post for more details.


Answer 4


For my requirements I didn’t want to install Pandoc on my computer. I used Docverter, a document conversion server with an HTTP interface that uses Pandoc for this.

import requests
r = requests.post(url='http://c.docverter.com/convert',
                  data={'to':'rst','from':'markdown'},
                  files={'input_files[]':open('README.md','rb')})
if r.ok:
    print r.content

Answer 5


You might also be interested in the fact that it is possible to write in a common subset so that your document comes out the same way when rendered as markdown or rendered as reStructuredText: https://gist.github.com/dupuy/1855764


Answer 6


I ran into this problem and solved it with the two following bash scripts.

Note that I have LaTeX bundled into my Markdown.

#!/usr/bin/env bash

if [ $# -lt 1 ]; then
  echo "$0 file.md"
  exit;
fi

filename=$(basename "$1")
extension="${filename##*.}"
filename="${filename%.*}"

if [ "$extension" = "md" ]; then
  rst=".rst"
  pandoc $1 -o $filename$rst
fi

It’s also useful to convert to HTML. md2html:

#!/usr/bin/env bash

if [ $# -lt 1 ]; then
  echo "$0 file.md <style.css>"
  exit;
fi

filename=$(basename "$1")
extension="${filename##*.}"
filename="${filename%.*}"

if [ "$extension" = "md" ]; then
  html=".html"
  if [ -z $2 ]; then
    # if no css
    pandoc -s -S --mathjax --highlight-style pygments $1 -o $filename$html
  else
    pandoc -s -S --mathjax --highlight-style pygments -c $2 $1 -o $filename$html
  fi
fi

I hope that helps.


Answer 7


Using the pandoc tool suggested by others, I created a md2rst utility to create the rst files. Even though this solution means you have both an md and an rst, it seemed to be the least invasive and would allow for whatever future markdown support is added. I prefer it over altering setup.py, and maybe you would as well:

#!/usr/bin/env python

'''
Recursively and destructively creates a .rst file for all Markdown
files in the target directory and below.

Created to deal with PyPa without changing anything in setup based on
the idea that getting proper Markdown support later is worth waiting
for rather than forcing a pandoc dependency in sample packages and such.

Vote for
(https://bitbucket.org/pypa/pypi/issue/148/support-markdown-for-readmes)

'''

import sys, os, re

markdown_sufs = ('.md','.markdown','.mkd')
markdown_regx = '\.(md|markdown|mkd)$'

target = '.'
if len(sys.argv) >= 2: target = sys.argv[1]

md_files = []
for root, dirnames, filenames in os.walk(target):
    for name in filenames:
        if name.endswith(markdown_sufs):
            md_files.append(os.path.join(root, name))

for md in md_files:
    bare = re.sub(markdown_regx,'',md)
    cmd='pandoc --from=markdown --to=rst "{}" -o "{}.rst"'
    print(cmd.format(md,bare))
    os.system(cmd.format(md,bare))

Flask vs webapp2 (for Google App Engine)

Question: Flask vs webapp2 (for Google App Engine)


I’m starting a new Google App Engine application and am currently considering two frameworks: Flask and webapp2. I’m rather satisfied with the built-in webapp framework that I used for my previous App Engine application, so I think webapp2 will be even better and I won’t have any problems with it.

However, there are a lot of good reviews of Flask; I really like its approach and everything that I’ve read so far in the documentation, and I want to try it out. But I’m a bit concerned about the limitations that I could face down the road with Flask.

So, the question is: do you know of any problems, performance issues, or limitations (e.g. routing system, built-in authorization mechanism, etc.) that Flask could bring into a Google App Engine application? By “problem” I mean something that I can’t work around in several lines of code (or any reasonable amount of code and effort), or something that is completely impossible.

And as a follow-up question: are there any killer features in Flask that you think could blow my mind and make me use it despite any problems that I can face?


Answer 0


Disclaimer: I’m the author of tipfy and webapp2.

A big advantage of sticking with webapp (or its natural evolution, webapp2) is that you don’t have to create your own versions for existing SDK handlers for your framework of your choice.

For example, deferred uses a webapp handler. To use it in a pure Flask view, using werkzeug.Request and werkzeug.Response, you’ll need to implement deferred for it (like I did here for tipfy).

The same happens for other handlers: blobstore (Werkzeug still doesn’t support range requests, so you’ll need to use WebOb even if you create your own handler — see tipfy.appengine.blobstore), mail, XMPP and so on, or others that are included in the SDK in the future.

And the same happens for libraries created with App Engine in mind, like ProtoRPC, which is based on webapp and would need a port or adapter to work with other frameworks, if you don’t want to mix webapp and your-framework-of-choice handlers in the same app.

So, even if you choose a different framework, you’ll end up a) using webapp in some special cases, or b) having to create and maintain your own versions of specific SDK handlers or features, if you use them.

I much prefer Werkzeug over WebOb, but after over a year of porting and maintaining versions of the SDK handlers that work natively with tipfy, I realized that this is a lost cause: to support GAE for the long term, it is best to stay close to webapp/WebOb. It makes support for SDK libraries a breeze, maintenance becomes a lot easier, it is more future-proof as new libraries and SDK features will work out of the box, and there’s the benefit of a large community working around the same App Engine tools.

A specific webapp2 defense is summarized here. Add to that the fact that webapp2 can be used outside of App Engine and is easily customized to look like popular micro-frameworks, and you have a good set of compelling reasons to go for it. Also, webapp2 has a big chance of being included in a future SDK release (this is extra-official, don’t quote me :-), which would push it forward and bring new developers and contributions.

That said, I’m a big fan of Werkzeug and the Pocoo guys and borrowed a lot from Flask and others (web.py, Tornado), but — and, you know, I’m biased — the above webapp2 benefits should be taken into account.


Answer 1


Your question is extremely broad, but there appear to be no big problems using Flask on Google App Engine.

This mailing list thread links to several templates:

http://flask.pocoo.org/mailinglist/archive/2011/3/27/google-app-engine/#4f95bab1627a24922c60ad1d0a0a8e44

And here is a tutorial specific to the Flask / App Engine combination:

http://www.franciscosouza.com/2010/08/flying-with-flask-on-google-app-engine/

Also, see App Engine – Difficulty Accessing Twitter Data – Flask, Flask message flashing fails across redirects, and How do I manage third-party Python libraries with Google App Engine? (virtualenv? pip?) for issues people have had with Flask and Google App Engine.


Answer 2


For me the decision for webapp2 was easy when I discovered that flask is not an object-oriented framework (from the beginning), while webapp2 is a purely object-oriented framework. webapp2 uses method-based dispatching as the standard for all RequestHandlers (that is what the flask documentation calls it, and flask has implemented it since v0.7 in MethodViews). While in flask MethodViews are an add-on, they are a core design principle for webapp2, so your software design will look different depending on which framework you use. Both frameworks nowadays use jinja2 templates and are fairly identical in features.

I prefer to add security checks to a base-class RequestHandler and inherit from it. This is also good for utility functions, etc. As you can see in link [3], for example, you can override methods to prevent dispatching a request.

If you are an OO person, or if you need to design a REST server, I would recommend webapp2. If you prefer simple functions with decorators as handlers for multiple request types, or you are uncomfortable with OO inheritance, then choose flask. I think both frameworks avoid the complexity and dependencies of much bigger frameworks like Pyramid. A minimal sketch of webapp2’s dispatching is shown after the links below.

  1. http://flask.pocoo.org/docs/0.10/views/#method-based-dispatching
  2. https://webapp-improved.appspot.com/guide/handlers.html
  3. https://webapp-improved.appspot.com/guide/handlers.html#overriding-dispatch
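For reference, here is a minimal sketch of webapp2’s method-based dispatching (a made-up hello-world handler, assuming the webapp2 package):

import webapp2

class HelloHandler(webapp2.RequestHandler):
    def get(self):
        # GET requests are dispatched to get(), POST requests to post(), etc.
        self.response.write('Hello, webapp2!')

app = webapp2.WSGIApplication([('/', HelloHandler)], debug=True)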

Answer 3


I think Google App Engine officially supports the Flask framework. There is sample code and a tutorial here -> https://console.developers.google.com/start/appengine?_ga=1.36257892.596387946.1427891855


Answer 4


I didn’t try webapp2, and I found that tipfy was a bit difficult to use since it required setup scripts and builds that configure your Python installation to something other than the default. For these and other reasons I haven’t made my largest project depend on a framework; I use the plain webapp instead and add the library called beaker to get session capability. Django already has built-in translations for words common to many use cases, so when building a localized application Django was the right choice for my largest project. The two other frameworks I actually deployed with projects to a production environment were GAEframework.com and web2py, and generally it seems that adding a framework which changes its template engine can lead to incompatibilities between old and new versions.

So my experience is that I’m reluctant to add a framework to my projects unless it solves the more advanced use cases (file upload, multi-auth, and an admin UI are three examples of more advanced use cases that no framework for GAE handles well at the moment).


TemplateDoesNotExist - Django error

Question: TemplateDoesNotExist - Django error


I’m using Django REST Framework, and I keep getting an error:

Exception Type: TemplateDoesNotExist
Exception Value: rest_framework/api.html

I don’t know where I’m going wrong. This is the first time I’m trying my hand at REST Framework. This is the code.

views.py

import socket, json
from modules.data.models import *
from modules.utils import *
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from modules.actions.serializers import ActionSerializer


@api_view(['POST'])
@check_field_exists_wrapper("installation")
def api_actions(request, format = None):

    action_type = request.POST['action_type']
    if action_type == "Shutdown" : 
        send_message = '1'
        print "Shutting Down the system..."
    elif action_type == "Enable" : 
        send_message = '1'
        print "Enabling the system..."
    elif action_type == "Disable" : 
        send_message = '1'
        print "Disabling the system..."
    elif action_type == "Restart" : 
        send_message = '1'
        print "Restarting the system..."

    if action_type in ["Shutdown", "Enable", "Disable"] : PORT = 6000
    else : PORT = 6100

    controllers_list = Controller.objects.filter(installation_id = kwargs['installation_id'])

    for controller_obj in controllers_list:
        ip = controller_obj.ip
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((ip, PORT))
            s.send(send_message)
            s.close()
        except Exception as e:
            print("Exception when sending " + action_type +" command: "+str(e))

    return Response(status = status.HTTP_200_OK)

models.py

class Controller(models.Model):
    id = models.IntegerField(primary_key = True)
    name = models.CharField(max_length = 255, unique = True)
    ip = models.CharField(max_length = 255, unique = True)
    installation_id = models.ForeignKey('Installation')

serializers.py

from django.forms import widgets
from rest_framework import serializers
from modules.data.models import *

class ActionSerializer(serializers.ModelSerializer):
    class Meta:
        model = Controller
        fields = ('id', 'name', 'ip', 'installation_id')

urls.py

from django.conf.urls import patterns, url
from rest_framework.urlpatterns import format_suffix_patterns

urlpatterns = patterns('modules.actions.views',
    url(r'^$','api_actions',name='api_actions'),
)

Answer 0


Make sure you have rest_framework listed in your settings.py INSTALLED_APPS.
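That is, the relevant fragment of settings.py should look something like this (a sketch; the surrounding apps are placeholders):

INSTALLED_APPS = (
    'django.contrib.admin',
    # ... your other apps ...
    'rest_framework',
)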


Answer 1


For me, rest_framework/api.html was actually missing on the filesystem due to a corrupt installation or some other unknown reason. Reinstalling djangorestframework fixed the problem:

$ pip install --upgrade djangorestframework

Answer 2


Please note that the DRF attempts to return data in the same format that was requested. From your browser, this is most likely HTML. To specify an alternative response, use the ?format= parameter. For example: ?format=json.

The TemplateDoesNotExist error occurs most commonly when you are visiting an API endpoint in your browser and you do not have the rest_framework included in your list of installed apps, as described by other respondents.

If you do not have DRF included in your list of apps, but don’t want to use the HTML Admin DRF page, try using an alternative format to ‘side-step’ this error message.

More info from the docs here: http://www.django-rest-framework.org/topics/browsable-api/#formats


Answer 3


Not your case, but another possible reason is customized loaders for Django. For example, if you have this in your settings (since Django 1.8):

TEMPLATES = [
{
    ...
    'OPTIONS': {
        'context_processors': [
            'django.template.context_processors.debug',
            'django.template.context_processors.request',
            'django.contrib.auth.context_processors.auth',
            'django.contrib.messages.context_processors.messages'
        ],
        'loaders': [
            'django.template.loaders.filesystem.Loader',
        ],
        ...
    }
}]

Django will not look at application folders for templates, because for that you have to explicitly add django.template.loaders.app_directories.Loader to loaders.

Notice that, by default, django.template.loaders.app_directories.Loader is included in loaders.
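That is, the loaders list above would become (a sketch):

'loaders': [
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',  # also search app template dirs
],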


Answer 4


I ran into the same error message. In my case, it was due to setting the backend to Jinja2. In my settings file:

TEMPLATES = [
{
    'BACKEND': 'django.template.backends.jinja2.Jinja2',
...

Changing this back to the default fixed the problem:

TEMPLATES = [
{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
...

Still not sure if there is a way to use the Jinja2 backend with rest_framework.


Efficiently updating a database using the SQLAlchemy ORM

Question: Efficiently updating a database using the SQLAlchemy ORM


I’m starting a new application and looking at using an ORM — in particular, SQLAlchemy.

Say I’ve got a column ‘foo’ in my database and I want to increment it. In straight sqlite, this is easy:

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update table stuff set foo = foo + 1')

I figured out the SQLAlchemy SQL-builder equivalent:

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)

This is slightly slower, but there’s not much in it.

Here’s my best guess for a SQLAlchemy ORM approach:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that’s because it has to bring all the data into memory before it can work with it.

Is there any way to generate the efficient SQL using SQLAlchemy’s ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?


Answer 0


SQLAlchemy’s ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don’t affect the objects that are in your session.

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says: go fetch all the objects from the database, modify all the objects, and then, when it’s time to flush the changes to the database, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This will execute as one query, as you would expect, and because (at least with the default session configuration) the session expires all of its data on commit, you don’t have any stale data issues.

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren’t using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
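That is (a sketch of the same update with synchronization disabled):

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1}, synchronize_session=False)
session.commit()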


Answer 1


session.query(Clients).filter(Clients.id == client_id_list).update({'status': status})
session.commit()

Try this =)


Answer 2


There are several ways to UPDATE using sqlalchemy

1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()

2) session.query(Stuff).\
       update({"foo": (Stuff.foo + 1)})
   session.commit()

3) conn = engine.connect()
   # update() lives on the Table; values() takes keyword arguments
   stmt = Stuff.__table__.update().\
       values(foo=Stuff.foo + 1)
   conn.execute(stmt)

Answer 3


Here’s an example of how to solve the same problem without having to map the fields manually:

from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgres://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)

Base = declarative_base()


class Media(Base):
  __tablename__ = 'media'
  id = Column(Integer, primary_key=True)
  title = Column(String, nullable=False)
  slug = Column(String, nullable=False)
  type = Column(String, nullable=False)

  def update(self):
    s = session()
    mapped_values = {}
    for item in Media.__dict__.iteritems():
      field_name = item[0]
      field_type = item[1]
      is_column = isinstance(field_type, InstrumentedAttribute)
      if is_column:
        mapped_values[field_name] = getattr(self, field_name)

    s.query(Media).filter(Media.id == self.id).update(mapped_values)
    s.commit()

So to update a Media instance, you can do something like this:

media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()

Answer 4


Without testing, I’d try:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(IIRC, commit() works without flush()).

I’ve found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.

[Please note comment below – this did not speed things up at all].


Answer 5


If it is because of the overhead in terms of creating objects, then it probably can’t be sped up at all with SA.

If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (I.e., getting a Company object also gets all of the related People objects.)


Remove duplicates from a list of lists

Question: Remove duplicates from a list of lists


I have a list of lists in Python:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

And I want to remove duplicate elements from it. If it were a normal list, not a list of lists, I could use set. But unfortunately a list is not hashable, so I can’t make a set of lists, only of tuples. So I could turn all the lists into tuples, use set, and convert back to lists. But this isn’t fast.

How can this be done in the most efficient way?

The result for the above list should be:

k = [[5, 6, 2], [1, 2], [3], [4]]

I don’t care about preserving order.

Note: this question is similar but not quite what I need. I searched SO but didn’t find an exact duplicate.


Benchmarking:

import itertools, time


class Timer(object):
    def __init__(self, name=None):
        self.name = name

    def __enter__(self):
        self.tstart = time.time()

    def __exit__(self, type, value, traceback):
        if self.name:
            print '[%s]' % self.name,
        print 'Elapsed: %s' % (time.time() - self.tstart)


k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [5, 2], [6], [8], [9]] * 5
N = 100000

print len(k)

with Timer('set'):
    for i in xrange(N):
        kt = [tuple(i) for i in k]
        skt = set(kt)
        kk = [list(i) for i in skt]


with Timer('sort'):
    for i in xrange(N):
        ks = sorted(k)
        dedup = [ks[i] for i in xrange(len(ks)) if i == 0 or ks[i] != ks[i-1]]


with Timer('groupby'):
    for i in xrange(N):
        k = sorted(k)
        dedup = list(k for k, _ in itertools.groupby(k))

with Timer('loop in'):
    for i in xrange(N):
        new_k = []
        for elem in k:
            if elem not in new_k:
                new_k.append(elem)

The “loop in” (quadratic) method is the fastest of all for short lists. For long lists it’s faster than everything except the groupby method. Does this make sense?

For the short list (the one in the code), 100000 iterations:

[set] Elapsed: 1.3900001049
[sort] Elapsed: 0.891000032425
[groupby] Elapsed: 0.780999898911
[loop in] Elapsed: 0.578000068665

For the longer list (the one in the code duplicated 5 times):

[set] Elapsed: 3.68700003624
[sort] Elapsed: 3.43799996376
[groupby] Elapsed: 1.03099989891
[loop in] Elapsed: 1.85900020599

Answer 0

>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> import itertools
>>> k.sort()
>>> list(k for k,_ in itertools.groupby(k))
[[1, 2], [3], [4], [5, 6, 2]]

itertools often offers the fastest and most powerful solutions to this kind of problem, and is well worth getting intimately familiar with!-)

Edit: as I mention in a comment, normal optimization efforts are focused on large inputs (the big-O approach) because it’s so much easier that it offers good returns on efforts. But sometimes (essentially for “tragically crucial bottlenecks” in deep inner loops of code that’s pushing the boundaries of performance limits) one may need to go into much more detail, providing probability distributions, deciding which performance measures to optimize (maybe the upper bound or the 90th centile is more important than an average or median, depending on one’s apps), performing possibly-heuristic checks at the start to pick different algorithms depending on input data characteristics, and so forth.

Careful measurements of “point” performance (code A vs code B for a specific input) are a part of this extremely costly process, and standard library module timeit helps here. However, it’s easier to use it at a shell prompt. For example, here’s a short module to showcase the general approach for this problem, save it as nodup.py:

import itertools

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

def doset(k, map=map, list=list, set=set, tuple=tuple):
  return map(list, set(map(tuple, k)))

def dosort(k, sorted=sorted, xrange=xrange, len=len):
  ks = sorted(k)
  return [ks[i] for i in xrange(len(ks)) if i == 0 or ks[i] != ks[i-1]]

def dogroupby(k, sorted=sorted, groupby=itertools.groupby, list=list):
  ks = sorted(k)
  return [i for i, _ in itertools.groupby(ks)]

def donewk(k):
  newk = []
  for i in k:
    if i not in newk:
      newk.append(i)
  return newk

# sanity check that all functions compute the same result and don't alter k
if __name__ == '__main__':
  savek = list(k)
  for f in doset, dosort, dogroupby, donewk:
    resk = f(k)
    assert k == savek
    print '%10s %s' % (f.__name__, sorted(resk))

Note the sanity check (performed when you just do python nodup.py) and the basic hoisting technique (make constant global names local to each function for speed) to put things on equal footing.

Now we can run checks on the tiny example list:

$ python -mtimeit -s'import nodup' 'nodup.doset(nodup.k)'
100000 loops, best of 3: 11.7 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dosort(nodup.k)'
100000 loops, best of 3: 9.68 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dogroupby(nodup.k)'
100000 loops, best of 3: 8.74 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.donewk(nodup.k)'
100000 loops, best of 3: 4.44 usec per loop

confirming that the quadratic approach has small-enough constants to make it attractive for tiny lists with few duplicated values. With a short list without duplicates:

$ python -mtimeit -s'import nodup' 'nodup.donewk([[i] for i in range(12)])'
10000 loops, best of 3: 25.4 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dogroupby([[i] for i in range(12)])'
10000 loops, best of 3: 23.7 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.doset([[i] for i in range(12)])'
10000 loops, best of 3: 31.3 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dosort([[i] for i in range(12)])'
10000 loops, best of 3: 25 usec per loop

the quadratic approach isn’t bad, but the sort and groupby ones are better. Etc, etc.

If (as the obsession with performance suggests) this operation is at a core inner loop of your pushing-the-boundaries application, it’s worth trying the same set of tests on other representative input samples, possibly detecting some simple measure that could heuristically let you pick one or the other approach (but the measure must be fast, of course).

It’s also well worth considering keeping a different representation for k — why does it have to be a list of lists rather than a set of tuples in the first place? If the duplicate removal task is frequent, and profiling shows it to be the program’s performance bottleneck, keeping a set of tuples all the time and getting a list of lists from it only if and where needed, might be faster overall, for example.
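
To make that last suggestion concrete, here is a minimal sketch (my own illustration, not part of the original answer) of keeping the data as a set of tuples at all times and materializing a list of lists only on demand:

k = {(1, 2), (4,), (5, 6, 2), (3,)}   # canonical duplicate-free storage

k.add((1, 2))                         # re-adding a duplicate is a no-op

as_lists = [list(t) for t in k]       # build the list of lists only where needed
print(as_lists)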


回答 1

手动执行此操作,创建一个新k列表并添加到目前为止未找到的条目:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
new_k = []
for elem in k:
    if elem not in new_k:
        new_k.append(elem)
k = new_k
print k
# prints [[1, 2], [4], [5, 6, 2], [3]]

易于理解,并且保留了每个元素第一次出现的顺序(如果需要的话,这很有用),但是由于要为每个元素搜索整个new_k,我猜其复杂度是二次的。

Doing it manually, creating a new k list and adding entries not found so far:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
new_k = []
for elem in k:
    if elem not in new_k:
        new_k.append(elem)
k = new_k
print k
# prints [[1, 2], [4], [5, 6, 2], [3]]

Simple to comprehend, and you preserve the order of the first occurrence of each element should that be useful, but I guess it’s quadratic in complexity as you’re searching the whole of new_k for each element.


回答 2

>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> k = sorted(k)
>>> k
[[1, 2], [1, 2], [3], [4], [4], [5, 6, 2]]
>>> dedup = [k[i] for i in range(len(k)) if i == 0 or k[i] != k[i-1]]
>>> dedup
[[1, 2], [3], [4], [5, 6, 2]]

我不知道它是否一定更快,但是您不必使用元组和集合。

>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> k = sorted(k)
>>> k
[[1, 2], [1, 2], [3], [4], [4], [5, 6, 2]]
>>> dedup = [k[i] for i in range(len(k)) if i == 0 or k[i] != k[i-1]]
>>> dedup
[[1, 2], [3], [4], [5, 6, 2]]

I don’t know if it’s necessarily faster, but you don’t have to use tuples and sets.


回答 3

到目前为止,所有基于set的解决方案都需要在迭代之前创建一个完整的set。

通过迭代列表的列表并把元素添加到一个“seen”set中,可以使其变为惰性求值并同时保留顺序:仅当某个列表在该跟踪set中找不到时才产生它。

此unique_everseen食谱可在itertools docs中找到,也可以在第三方toolz库中使用:

from toolz import unique

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

# lazy iterator
res = map(list, unique(map(tuple, k)))

print(list(res))

[[1, 2], [4], [5, 6, 2], [3]]

请注意,tuple转换是必需的,因为列表不可散列。

All the set-related solutions to this problem thus far require creating an entire set before iteration.

It is possible to make this lazy, and at the same time preserve order, by iterating the list of lists and adding to a “seen” set. Then only yield a list if it is not found in this tracker set.

This unique_everseen recipe is available in the itertools docs. It’s also available in the 3rd party toolz library:

from toolz import unique

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

# lazy iterator
res = map(list, unique(map(tuple, k)))

print(list(res))

[[1, 2], [4], [5, 6, 2], [3]]

Note that tuple conversion is necessary because lists are not hashable.
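
For reference, here is a simplified version of that unique_everseen recipe (adapted from the itertools docs; treat the details as an illustration rather than a verbatim copy), which avoids the third-party dependency:

def unique_everseen(iterable, key=None):
    # remember what has been seen so far; yield each element lazily on first sight
    seen = set()
    for element in iterable:
        marker = key(element) if key is not None else element
        if marker not in seen:
            seen.add(marker)
            yield element

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
print([list(t) for t in unique_everseen(k, key=tuple)])
# [[1, 2], [4], [5, 6, 2], [3]]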


回答 4

甚至您的“长”列表也很短。另外,您选择它们时是否考虑了与实际数据匹配?性能将随这些数据的实际形态而变化。例如,您将一个简短列表一遍又一遍地重复以构成更长的列表,这意味着二次解法在您的基准测试中表现为线性,但在现实中并非如此。

对于真正大的列表,set代码是最好的选择:它是线性的(尽管比较耗空间)。sort和groupby方法为O(n log n),逐项循环的方法显然是二次的,所以您知道当n变得很大时它们各自如何扩展。如果这就是您要分析的数据的实际大小,那么谁在乎呢?它实在太小了。

顺便说一句,如果我不构造中间列表来创建set,就会看到明显的加速,也就是说,如果我将

kt = [tuple(i) for i in k]
skt = set(kt)

换成

skt = set(tuple(i) for i in k)

真正的解决方案可能取决于更多信息:您确定列表列表确实是您所需要的表示形式吗?

Even your “long” list is pretty short. Also, did you choose them to match the actual data? Performance will vary with what these data actually look like. For example, you have a short list repeated over and over to make a longer list. This means that the quadratic solution is linear in your benchmarks, but not in reality.

For actually-large lists, the set code is your best bet—it’s linear (although space-hungry). The sort and groupby methods are O(n log n) and the looping method is obviously quadratic, so you know how these will scale as n gets really big. If this is the real size of the data you are analyzing, then who cares? It’s tiny.

Incidentally, I’m seeing a noticeable speedup if I don’t form an intermediate list to make the set, that is to say if I replace

kt = [tuple(i) for i in k]
skt = set(kt)

with

skt = set(tuple(i) for i in k)

The real solution may depend on more information: Are you sure that a list of lists is really the representation you need?


回答 5

可以使用元组列表和{}(集合推导式)来删除重复项:

>>> [list(tupl) for tupl in {tuple(item) for item in k }]
[[1, 2], [5, 6, 2], [3], [4]]
>>> 

A list of tuples and {} (a set comprehension) can be used to remove duplicates:

>>> [list(tupl) for tupl in {tuple(item) for item in k }]
[[1, 2], [5, 6, 2], [3], [4]]
>>> 

回答 6

创建一个以元组为键的字典,然后打印键。

  • 创建以元组为键,索引为值的字典
  • 打印字典键列表

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

dict_tuple = {tuple(item): index for index, item in enumerate(k)}

print [list(itm) for itm in dict_tuple.keys()]

# prints [[1, 2], [5, 6, 2], [3], [4]]

Create a dictionary with tuple as the key, and print the keys.

  • create dictionary with tuple as key and index as value
  • print list of keys of dictionary

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

dict_tuple = {tuple(item): index for index, item in enumerate(k)}

print [list(itm) for itm in dict_tuple.keys()]

# prints [[1, 2], [5, 6, 2], [3], [4]]
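
As a side note (my addition, assuming Python 3.7+, where plain dicts preserve insertion order), the same dict idea collapses to a one-liner that also keeps first-seen order:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

# dict.fromkeys keeps only the first occurrence of each tuple key, in order
dedup = [list(t) for t in dict.fromkeys(map(tuple, k))]
print(dedup)  # [[1, 2], [4], [5, 6, 2], [3]]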

回答 7

这应该可以工作。

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

k_cleaned = []
for ele in k:
    if set(ele) not in [set(x) for x in k_cleaned]:
        k_cleaned.append(ele)
print(k_cleaned)

# output: [[1, 2], [4], [5, 6, 2], [3]]

This should work.

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

k_cleaned = []
for ele in k:
    if set(ele) not in [set(x) for x in k_cleaned]:
        k_cleaned.append(ele)
print(k_cleaned)

# output: [[1, 2], [4], [5, 6, 2], [3]]

回答 8

奇怪的是,以上答案删除了“重复项”,但是如果我也想删除重复的值怎么办?以下内容应该有用,并且不会在内存中创建新对象!

def dictRemoveDuplicates():
    a = [[1,'somevalue1'],[1,'somevalue2'],[2,'somevalue1'],[3,'somevalue4'],[5,'somevalue5'],[5,'somevalue1'],[5,'somevalue1'],[5,'somevalue8'],[6,'somevalue9'],[6,'somevalue0'],[6,'somevalue1'],[7,'somevalue7']]

    print(a)
    temp = 0
    position = -1
    for pageNo, item in a:
        position += 1
        if pageNo != temp:
            # first entry with a new key: remember it and move on
            temp = pageNo
            continue
        else:
            # a repeated key: zero out this entry and the one before it
            a[position] = 0
            a[position - 1] = 0
    a = [x for x in a if x != 0]
    print(a)

dictRemoveDuplicates()

输出是:

[[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'], [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'], [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'], [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
[[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]

Strangely, the answers above removes the ‘duplicates’ but what if I want to remove the duplicated value also?? The following should be useful and does not create a new object in memory!

def dictRemoveDuplicates():
    a = [[1,'somevalue1'],[1,'somevalue2'],[2,'somevalue1'],[3,'somevalue4'],[5,'somevalue5'],[5,'somevalue1'],[5,'somevalue1'],[5,'somevalue8'],[6,'somevalue9'],[6,'somevalue0'],[6,'somevalue1'],[7,'somevalue7']]

    print(a)
    temp = 0
    position = -1
    for pageNo, item in a:
        position += 1
        if pageNo != temp:
            # first entry with a new key: remember it and move on
            temp = pageNo
            continue
        else:
            # a repeated key: zero out this entry and the one before it
            a[position] = 0
            a[position - 1] = 0
    a = [x for x in a if x != 0]
    print(a)

dictRemoveDuplicates()

and the o/p is:

[[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'], [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'], [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'], [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
[[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]

回答 9

另一个可能更通用,更简单的解决方案是创建一个由对象的字符串版本作为键的字典,并在最后获取values():

>>> dict([(unicode(a),a) for a in [["A", "A"], ["A", "A"], ["A", "B"]]]).values()
[['A', 'B'], ['A', 'A']]

要注意的是,这仅适用于字符串表示形式是足够好的唯一键的对象(对于大多数本机对象而言都是如此)。

Another probably more generic and simpler solution is to create a dictionary keyed by the string version of the objects and getting the values() at the end:

>>> dict([(unicode(a),a) for a in [["A", "A"], ["A", "A"], ["A", "B"]]]).values()
[['A', 'B'], ['A', 'A']]

The catch is that this only works for objects whose string representation is a good-enough unique key (which is true for most native objects).


python3和python3m可执行文件之间的区别

问题:python3和python3m可执行文件之间的区别

/usr/bin/python3 和 /usr/bin/python3m 这两个可执行文件之间有什么区别?

我在Ubuntu 13.04上观察到了它们,但是Google建议它们也存在于其他发行版中。

这两个文件具有相同的md5sum,但似乎不是符号链接或硬链接;ls -li 显示这两个文件的inode编号不同,而且用 find -xdev -samefile /usr/bin/python3.3 测试也没有返回任何其他文件。

有人在AskUbuntu上问了类似的问题,但我想更多地了解这两个文件之间的区别。

What is the difference between the /usr/bin/python3 and /usr/bin/python3m executables?

I am observing them on Ubuntu 13.04, but Google suggests that they exist on other distributions too.

The two files have the same md5sum, but do not seem to be symbolic links or hard links; the two files have different inode numbers returned by ls -li and testing find -xdev -samefile /usr/bin/python3.3 does not return any other files.

Someone asked a similar question on AskUbuntu, but I wanted to find out more about the difference between the two files.


回答 0

这要归功于chepner,他指出我其实已经有了指向答案的链接。

Python实现可以在文件名标签中适当地包含其他标志。例如,在POSIX系统上,这些标志也将有助于文件名:

--with-pydebug(标志:d)

--with-pymalloc(标志:m)

--with-wide-unicode(标志:u)

通过PEP 3149

关于m标志,这是Pymalloc的含义:

Pymalloc是由Vladimir Marangozov编写的专用对象分配器,它是Python 2.1中新增的一项功能。Pymalloc旨在比系统malloc()更快,并且对于Python程序典型的分配模式而言,具有较少的内存开销。分配器使用C的malloc()函数获取较大的内存池,然后从这些池执行较小的内存请求。

通过Python 2.3的新功能

最后,这两个文件可能在某些系统上被硬链接。虽然两个文件在我的Ubuntu 13.04系统上具有不同的inode编号(因此是不同的文件),但两年前comp.lang.python帖子显示它们曾经被硬链接过。

Credit for this goes to chepner for pointing out that I already had the link to the solution.

Python implementations MAY include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name:

--with-pydebug (flag: d)

--with-pymalloc (flag: m)

--with-wide-unicode (flag: u)

via PEP 3149.

Regarding the m flag specifically, this is what Pymalloc is:

Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a feature added to Python 2.1. Pymalloc is intended to be faster than the system malloc() and to have less memory overhead for allocation patterns typical of Python programs. The allocator uses C’s malloc() function to get large pools of memory and then fulfills smaller memory requests from these pools.

via What’s New in Python 2.3

Finally, the two files may be hardlinked on some systems. While the two files have different inode numbers on my Ubuntu 13.04 system (thus are different files), a comp.lang.python post from two years ago shows that they once were hardlinked.
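
If you want to check which of these flags your own interpreter was built with, Python 3.2+ on POSIX exposes them directly (a quick sketch; the output varies by build, and newer builds may report an empty string since the m flag was dropped in Python 3.8):

import sys
import sysconfig

print(sys.abiflags)                          # e.g. 'm' for a pymalloc build
print(sysconfig.get_config_var('ABIFLAGS'))  # the same information from the build config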


在Java中调用Python?

问题:在Java中调用Python?

我想知道是否可以使用jython从Java代码调用python函数,还是仅用于从python调用Java代码?

I am wondering if it is possible to call python functions from java code using jython, or is it only for calling java code from python?


回答 0

Jython:适用于Java平台的Python - http://www.jython.org/index.html

您可以使用Jython从Java代码轻松调用python函数,只要您的python代码本身能在jython下运行,即没有使用某些不受支持的C扩展。

如果这对您有用,那肯定是您能得到的最简单的解决方案。否则,您可以使用来自新的Java6解释器支持的org.python.util.PythonInterpreter。

一个随手写的简单例子,但我希望它能工作:(为简洁起见未做错误检查)

PythonInterpreter interpreter = new PythonInterpreter();
interpreter.exec("import sys\nsys.path.append('pathToModules if they are not there by default')\nimport yourModule");
// execute a function that takes a string and returns a string
PyObject someFunc = interpreter.get("funcName");
PyObject result = someFunc.__call__(new PyString("Test!"));
String realResult = (String) result.__tojava__(String.class);

Jython: Python for the Java Platform – http://www.jython.org/index.html

You can easily call python functions from Java code with Jython. That is as long as your python code itself runs under jython, i.e. doesn’t use some c-extensions that aren’t supported.

If that works for you, it’s certainly the simplest solution you can get. Otherwise you can use org.python.util.PythonInterpreter from the new Java6 interpreter support.

A simple example from the top of my head – but should work I hope: (no error checking done for brevity)

PythonInterpreter interpreter = new PythonInterpreter();
interpreter.exec("import sys\nsys.path.append('pathToModules if they are not there by default')\nimport yourModule");
// execute a function that takes a string and returns a string
PyObject someFunc = interpreter.get("funcName");
PyObject result = someFunc.__call__(new PyString("Test!"));
String realResult = (String) result.__tojava__(String.class);

回答 1

嘿,我想我会输入我的答案,尽管已经很晚了。我想首先要考虑一些重要的事情,即您希望在java和python之间建立多强的连接。

首先 ,您是否只想调用函数,或者您是否真的希望python代码更改Java对象中的数据?这个非常重要。如果您只想调用带或不带参数的python代码,那并不是很难。如果您的参数是基元,那么它将变得更加容易。但是,如果您想让Java类在python中实现成员函数,这些成员函数会更改java对象的数据,那么这并不是那么容易或直接的。

其次,我们谈论的是cpython,还是jython就行?我会说cpython才是王道!我认为这正是python如此强大的原因:拥有如此高的抽象层次,却能在需要时访问c、c++。想象一下如果Java也能这样。如果jython就够用,这个问题甚至不值得问,因为那样的话一切都很容易。

因此,我使用以下方法,并从容易到困难列出了它们:

Java到Jython

优点:轻而易举。实际引用Java对象

缺点:没有CPython,非常慢!

从Java使用Jython非常简单,如果这真的够用,那就太好了。但是它非常慢,而且没有cpython!没有cpython的日子值得过吗?我不这么认为!您可以轻松地用python代码为您的java对象实现成员函数。

通过Pyro从Java到Jython到CPython

Pyro是python的远程对象模块。您在cpython解释器上有一些对象,您可以向其发送通过序列化传输的对象,也可以通过此方法返回对象。请注意,如果您从jython发送一个序列化的python对象,然后调用某些函数来更改其成员中的数据,那么您将在java中看不到这些更改。您只需要记住从pyro发送回想要的数据。我相信这是进入cpython的最简单方法!您不需要任何jni或jna或swig或…。您不需要了解任何c或c ++。酷吧?

优点:访问cpython,不像以下方法那样困难

缺点:无法直接从python更改java对象的成员数据。有点间接,(jython是中间人)。

通过JNI/JNA/SWIG从Java到C/C++,再通过嵌入式解释器到Python(也许使用BOOST库?)

天哪,这种方法不适合胆小的人。我可以告诉您,我花了很长时间才用一种体面的方法实现它。您想这么做的主要原因是,可以运行对java对象拥有完全控制权的cpython代码。在决定尝试让java(像黑猩猩)和python(像马)杂交之前,有一些重大事项需要考虑。首先,如果解释器崩溃,您的程序也就完了!另外别让我吐槽并发问题!此外还有大量样板代码;我相信我已经找到了将样板最小化的最佳配置,但仍然很多!那么该怎么做:把C++当作您的中间人,您的对象实际上就是c++对象!很好,现在您知道了。只需像您的程序是用cpp而不是java编写的那样编写您的对象,其中包含您想从两个世界访问的数据。然后,您可以使用名为swig的包装器生成器(http://www.swig.org/Doc1.3/Java.html)使Java可以访问它,并编译一个dll,在java中通过System.load(此处为dll名称)加载。先让这一步工作起来,然后再进行困难的部分!要使用python,您需要嵌入一个解释器。首先,我建议先写一些hello解释器程序,或参考教程“在C/C++中嵌入python”。一旦可以工作,就该让马和黑猩猩跳舞了!您可以通过[boost][3]将c++对象发送给python。我知道我没有给你鱼,只是告诉你在哪里能找到鱼。编译时需要注意以下几点。

编译boost时,您需要编译一个共享库。并且您需要包含并链接jdk中所需的内容,即jawt.lib、jvm.lib(启动应用程序时,您的路径中还需要客户端jvm.dll),以及python27.lib之类和boost_python-vc100-mt-1_55.lib。然后包含Python/include、jdk/include、boost,并且只使用共享库(dll),否则boost会闹脾气。是的,我知道,有很多环节可能出岔子。所以请确保一步一步地把每件事做好,然后再把它们组合起来。

Hey I thought I would enter my answer to this even though its late. I think there are some important things to consider first with how strong you wish to have the linking between java and python.

Firstly Do you only want to call functions or do you actually want python code to change the data in your java objects? This is very important. If you only want to call some python code with or without arguments, then that is not very difficult. If your arguments are primitives it makes it even more easy. However if you want to have java class implement member functions in python, which change the data of the java object, then this is not so easy or straight forward.

Secondly are we talking cpython or will jython do? I would say cpython is where its at! I would advocate this is why python is so kool! Having such high abstractions however access to c,c++ when needed. Imagine if you could have that in java. This question is not even worth asking if jython is ok because then it is easy anyway.

So I have played with the following methods, and listed them from easy to difficult:

Java to Jython

Advantages: Trivially easy. Have actual references to java objects

Disadvantages: No CPython, Extremely Slow!

Jython from java is so easy, and if this is really enough then great. However it is very slow and no cpython! Is life worth living without cpython? I don’t think so! You can easily have python code implementing your member functions for your java objects.

Java to Jython to CPython via Pyro

Pyro is the remote object module for python. You have some object on a cpython interpreter, and you can send it objects which are transferred via serialization and it can also return objects via this method. Note that if you send a serialized python object from jython and then call some functions which change the data in its members, then you will not see those changes in java. You just need to remember to send back the data which you want from pyro. This I believe is the easiest way to get to cpython! You do not need any jni or jna or swig or …. You don’t need to know any c, or c++. kool huh?

Advantages: Access to cpython, not as difficult as following methods

Disadvantages: Cannot change the member data of java objects directly from python. Is somewhat indirect, (jython is middle man).

Java to C/C++ via JNI/JNA/SWIG to Python via Embedded interpreter (maybe using BOOST Libraries?)

OMG this method is not for the faint of heart. And I can tell you it has taken me very long to achieve this with a decent method. The main reason you would want to do this is so that you can run cpython code which has full rein over your java objects. There are major things to consider before deciding to try and breed java (which is like a chimp) with python (which is like a horse). Firstly, if you crash the interpreter, that’s lights out for your program! And don’t get me started on concurrency issues! In addition, there is a lot of boilerplate; I believe I have found the best configuration to minimize it, but it is still a lot! So how to go about this: consider that C++ is your middle man; your objects are actually c++ objects! Good that you know that now. Just write your object as if your program were in cpp, not java, with the data you want to access from both worlds. Then you can use the wrapper generator called swig (http://www.swig.org/Doc1.3/Java.html) to make this accessible to java and compile a dll which you load with System.load(dll name here) in java. Get this working first, then move on to the hard part! To get to python you need to embed an interpreter. Firstly I suggest doing some hello interpreter programs or this tutorial Embedding Python in C/C++. Once you have that working, it’s time to make the horse and the monkey dance! You can send your c++ object to python via [boost][3]. I know I have not given you the fish, merely told you where to find the fish. Some pointers to note for this when compiling.

When you compile boost you will need to compile a shared library. And you need to include and link to the stuff you need from the jdk, i.e. jawt.lib, jvm.lib (you will also need the client jvm.dll in your path when launching the application), as well as python27.lib or whatever, and boost_python-vc100-mt-1_55.lib. Then include Python/include, jdk/include, boost, and only use shared libraries (dlls), otherwise boost has a teary. And yeah, full on, I know. There are so many ways in which this can go sour. So make sure you get each thing done block by block, then put them together.


回答 2

在Java中包含python代码并不明智。用flask或其他Web框架包装您的python代码,使其成为微服务。使您的Java程序能够调用此微服务(例如,通过REST)。

相信我,这很简单,可以为您节省很多问题。而且代码是松散耦合的,因此它们是可伸缩的。

于2020年3月24日更新:根据@stx的评论,上述方法不适用于客户端和服务器之间的海量数据传输。这是我推荐的另一种方法:使用Rust连接Python和Java(也可以使用C / C ++)。 https://medium.com/@shmulikamar/https-medium-com-shmulikamar-connecting-python-and-java-with-rust-11c256a1dfb0

It’s not smart to have python code inside java. Wrap your python code with flask or other web framework to make it as a microservice. Make your java program able to call this microservice (e.g. via REST).

Believe me, this is much simpler and will save you tons of issues. And the code is loosely coupled, so it is scalable.

Updated on Mar 24th 2020: According to @stx’s comment, the above approach is not suitable for massive data transfer between client and server. Here is another approach I recommended: Connecting Python and Java with Rust(C/C++ also ok). https://medium.com/@shmulikamar/https-medium-com-shmulikamar-connecting-python-and-java-with-rust-11c256a1dfb0
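
To illustrate the microservice route, here is a minimal sketch of the Python side (assuming Flask is installed; the endpoint name and payload shape are invented for the example). The Java side would then issue an ordinary HTTP POST to it:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/compute', methods=['POST'])  # hypothetical endpoint
def compute():
    data = request.get_json()
    result = sum(data['values'])          # stand-in for the real Python logic
    return jsonify({'result': result})

if __name__ == '__main__':
    app.run(port=5000)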


回答 3

有几个答案提到您可以使用JNI或JNA来访问cpython,但我不建议您从头开始,因为已经有了用于从java访问cpython的开源库。例如:

Several of the answers mention that you can use JNI or JNA to access cpython but I would not recommend starting from scratch because there are already open source libraries for accessing cpython from java. For example:


回答 4

这里是一个库,可让您一次编写python脚本并确定在运行时使用哪种集成方法(Jython,CPython / PyPy(通过Jep和Py4j)):

https://github.com/subes/invesdwin-context-python

由于每种方法都有其自身的优点/缺点,如链接中所述。

Here a library that lets you write your python scripts once and decide which integration method (Jython, CPython/PyPy via Jep and Py4j) to use at runtime:

https://github.com/subes/invesdwin-context-python

Since each method has its own benefits/drawbacks as explained in the link.


回答 5

这取决于您所说的python函数是什么意思。如果它们是用cpython编写的,则不能直接调用,必须使用JNI;但如果它们是用Jython编写的,就可以轻松地从Java调用它们,因为jython最终会生成Java字节码。

现在,当我说用cpython或jython编写时,这没有多大意义,因为python是python,并且除非您使用依赖于cpython或java的特定库,否则大多数代码都可以在两种实现上运行。

请参阅此处如何在Java中使用Python解释器。

It depends on what do you mean by python functions? if they were written in cpython you can not directly call them you will have to use JNI, but if they were written in Jython you can easily call them from java, as jython ultimately generates java byte code.

Now when I say written in cpython or jython it doesn’t make much sense because python is python and most code will run on both implementations unless you are using specific libraries which relies on cpython or java.

see here how to use Python interpreter in Java.


回答 6

根据您的要求,诸如XML-RPC之类的选项可能会很有用,它几乎可以用于以任何支持该协议的语言远程调用函数。

Depending on your requirements, options like XML-RPC could be useful; it can be used to remotely call functions in virtually any language supporting the protocol.
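
For concreteness, a minimal sketch of the Python side using only the standard library (xmlrpc.server in Python 3; the exposed function is just an example). Any Java XML-RPC client, such as Apache XML-RPC, could then call it:

from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    # example function callable from any XML-RPC client, including Java
    return a + b

server = SimpleXMLRPCServer(('localhost', 8000))
server.register_function(add, 'add')
server.serve_forever()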


回答 7

GraalVM是一个不错的选择。我已经用GraalVM完成过Java + Javascript组合的微服务设计(在Java中通过反射调用Javascript)。他们最近增加了对python的支持,我想尝试一下,尤其是考虑到其社区这些年来已经发展得如此壮大。

GraalVM is a good choice. I’ve done Java+Javascript combination with GraalVM for microservice design (Java with Javascript reflection). They recently added support for python, I’d give it a try especially with how big its community has grown over the years.


回答 8

您可以使用Java Native Interface从Java调用任何语言

You can call any language from java using Java Native Interface


回答 9

Jython有一些限制:

有许多差异。首先,Jython程序不能使用用C编写的CPython扩展模块。这些模块通常具有扩展名为.so,.pyd或.dll的文件。如果要使用这样的模块,则应寻找用纯Python或Java编写的等效模块。尽管在技术上支持此类扩展是可行的-IronPython这样做-在Jython中尚无计划这样做。

使用Jython将我的Python脚本作为JAR文件分发吗?

您只需使用Runtime或ProcessBuilder从Java调用python脚本(或bash或Perl脚本),然后将输出传递回Java:

在Java中运行bash shell脚本

在Java中运行命令行

java runtime.getruntime()从执行命令行程序获取输出

Jython has some limitations:

There are a number of differences. First, Jython programs cannot use CPython extension modules written in C. These modules usually have files with the extension .so, .pyd or .dll. If you want to use such a module, you should look for an equivalent written in pure Python or Java. Although it is technically feasible to support such extensions – IronPython does so – there are no plans to do so in Jython.

Distributing my Python scripts as JAR files with Jython?

you can simply call python scripts (or bash or Perl scripts) from Java using Runtime or ProcessBuilder and pass output back to Java:

Running a bash shell script in java

Running Command Line in Java

java runtime.getruntime() getting output from executing a command line program


回答 10

这里对当前的选项给出了相当不错的概述,其中一些已在其他答案中提到。在他们决定实现Python 3.x之前,Jython都不可用,而许多其他项目都是从python一侧出发、希望访问java的。不过仍有一些选择;举一个尚未被提到的:gRPC。

This gives a pretty good overview of the current options, some of which are named in other answers. Jython is not usable until they decide to implement Python 3.x, and many of the other projects are coming from the python side and want to access java. But there are a few options still; to name something which has not been named yet: gRPC.


如何在Python的日志记录工具中添加自定义日志级别

问题:如何在Python的日志记录工具中添加自定义日志级别

我想为我的应用程序使用loglevel TRACE(5),因为我认为这debug()还不够。另外log(5, msg)不是我想要的。如何将自定义日志级别添加到Python记录器?

我有mylogger.py以下内容:

import logging

@property
def log(obj):
    myLogger = logging.getLogger(obj.__class__.__name__)
    return myLogger

在我的代码中,我通过以下方式使用它:

class ExampleClass(object):
    from mylogger import log

    def __init__(self):
        '''The constructor with the logger'''
        self.log.debug("Init runs")

现在我想打电话 self.log.trace("foo bar")

在此先感谢您的帮助。

编辑(2016年12月8日):我更改了pfa的公认答案,即IMHO,这是基于Eric S的非常好的建议的出色解决方案。

I’d like to have loglevel TRACE (5) for my application, as I don’t think that debug() is sufficient. Additionally log(5, msg) isn’t what I want. How can I add a custom loglevel to a Python logger?

I’ve a mylogger.py with the following content:

import logging

@property
def log(obj):
    myLogger = logging.getLogger(obj.__class__.__name__)
    return myLogger

In my code I use it in the following way:

class ExampleClass(object):
    from mylogger import log

    def __init__(self):
        '''The constructor with the logger'''
        self.log.debug("Init runs")

Now I’d like to call self.log.trace("foo bar")

Thanks in advance for your help.

Edit (Dec 8th 2016): I changed the accepted answer to pfa’s which is, IMHO, an excellent solution based on the very good proposal from Eric S.


回答 0

@Eric S.

Eric S.的回答很好,但是我通过实验得知,无论日志级别设置为什么,这都会导致以新调试级别记录的消息总是被打印出来。因此,如果您的新级别号是9,而您调用了setLevel(50),则较低级别的消息将被错误地打印。

为了防止这种情况的发生,您需要在“ debugv”函数内的另一行检查是否实际启用了相关日志记录级别。

修正后的示例,会检查相关日志级别是否已启用:

import logging
DEBUG_LEVELV_NUM = 9 
logging.addLevelName(DEBUG_LEVELV_NUM, "DEBUGV")
def debugv(self, message, *args, **kws):
    if self.isEnabledFor(DEBUG_LEVELV_NUM):
        # Yes, logger takes its '*args' as 'args'.
        self._log(DEBUG_LEVELV_NUM, message, args, **kws) 
logging.Logger.debugv = debugv

如果查看Python 2.7的logging/__init__.py中class Logger的代码,就会发现这正是所有标准日志函数(.critical、.debug等)的做法。

由于声誉不足,我显然无法回复其他人的回答……希望Eric看到后会更新他的帖子。=)

@Eric S.

Eric S.’s answer is excellent, but I learned by experimentation that this will always cause messages logged at the new debug level to be printed — regardless of what the log level is set to. So if you make a new level number of 9, if you call setLevel(50), the lower level messages will erroneously be printed.

To prevent that from happening, you need another line inside the “debugv” function to check if the logging level in question is actually enabled.

Fixed example that checks if the logging level is enabled:

import logging
DEBUG_LEVELV_NUM = 9 
logging.addLevelName(DEBUG_LEVELV_NUM, "DEBUGV")
def debugv(self, message, *args, **kws):
    if self.isEnabledFor(DEBUG_LEVELV_NUM):
        # Yes, logger takes its '*args' as 'args'.
        self._log(DEBUG_LEVELV_NUM, message, args, **kws) 
logging.Logger.debugv = debugv

If you look at the code for class Logger in logging.__init__.py for Python 2.7, this is what all the standard log functions do (.critical, .debug, etc.).

I apparently can’t post replies to others’ answers for lack of reputation… hopefully Eric will update his post if he sees this. =)


回答 1

我采用了“避免看到lambda”的那个答案,并且不得不修改添加log_at_my_log_level的位置。我也遇到了Paul提到的问题:我认为那样行不通。难道不需要把logger作为log_at_my_log_level的第一个参数吗?下面这样对我有用:

import logging
DEBUG_LEVELV_NUM = 9 
logging.addLevelName(DEBUG_LEVELV_NUM, "DEBUGV")
def debugv(self, message, *args, **kws):
    # Yes, logger takes its '*args' as 'args'.
    self._log(DEBUG_LEVELV_NUM, message, args, **kws) 
logging.Logger.debugv = debugv

I took the “avoid seeing lambda” answer and had to modify where the log_at_my_log_level was being added. I too saw the problem that Paul did; I don’t think that version works. Don’t you need logger as the first arg in log_at_my_log_level? This worked for me:

import logging
DEBUG_LEVELV_NUM = 9 
logging.addLevelName(DEBUG_LEVELV_NUM, "DEBUGV")
def debugv(self, message, *args, **kws):
    # Yes, logger takes its '*args' as 'args'.
    self._log(DEBUG_LEVELV_NUM, message, args, **kws) 
logging.Logger.debugv = debugv

回答 2

将所有现有答案与大量使用经验相结合,我想我已经列出了确保完全无缝使用新级别所需做的所有事情。下面的步骤假定您要添加一个名为TRACE、值为logging.DEBUG - 5 == 5的新级别:

  1. 需要调用logging.addLevelName(logging.DEBUG - 5, 'TRACE'),在内部注册新级别,以便可以按名称引用它。
  2. 为了保持一致性,需要将新级别作为属性添加到logging模块自身:logging.TRACE = logging.DEBUG - 5。
  3. 需要向logging模块添加一个名为trace的方法,其行为应与debug、info等一致。
  4. 需要向当前配置的记录器类添加一个名为trace的方法。由于不能100%保证它就是logging.Logger,请改用logging.getLoggerClass()。

下面的方法说明了所有步骤:

def addLoggingLevel(levelName, levelNum, methodName=None):
    """
    Comprehensively adds a new logging level to the `logging` module and the
    currently configured logging class.

    `levelName` becomes an attribute of the `logging` module with the value
    `levelNum`. `methodName` becomes a convenience method for both `logging`
    itself and the class returned by `logging.getLoggerClass()` (usually just
    `logging.Logger`). If `methodName` is not specified, `levelName.lower()` is
    used.

    To avoid accidental clobberings of existing attributes, this method will
    raise an `AttributeError` if the level name is already an attribute of the
    `logging` module or if the method name is already present.

    Example
    -------
    >>> addLoggingLevel('TRACE', logging.DEBUG - 5)
    >>> logging.getLogger(__name__).setLevel("TRACE")
    >>> logging.getLogger(__name__).trace('that worked')
    >>> logging.trace('so did this')
    >>> logging.TRACE
    5

    """
    if not methodName:
        methodName = levelName.lower()

    if hasattr(logging, levelName):
       raise AttributeError('{} already defined in logging module'.format(levelName))
    if hasattr(logging, methodName):
       raise AttributeError('{} already defined in logging module'.format(methodName))
    if hasattr(logging.getLoggerClass(), methodName):
       raise AttributeError('{} already defined in logger class'.format(methodName))

    # This method was inspired by the answers to Stack Overflow post
    # http://stackoverflow.com/q/2183233/2988730, especially
    # http://stackoverflow.com/a/13638084/2988730
    def logForLevel(self, message, *args, **kwargs):
        if self.isEnabledFor(levelNum):
            self._log(levelNum, message, args, **kwargs)
    def logToRoot(message, *args, **kwargs):
        logging.log(levelNum, message, *args, **kwargs)

    logging.addLevelName(levelNum, levelName)
    setattr(logging, levelName, levelNum)
    setattr(logging.getLoggerClass(), methodName, logForLevel)
    setattr(logging, methodName, logToRoot)

Combining all of the existing answers with a bunch of usage experience, I think that I have come up with a list of all the things that need to be done to ensure completely seamless usage of the new level. The steps below assume that you are adding a new level TRACE with value logging.DEBUG - 5 == 5:

  1. logging.addLevelName(logging.DEBUG - 5, 'TRACE') needs to be invoked to get the new level registered internally so that it can be referenced by name.
  2. The new level needs to be added as an attribute to logging itself for consistency: logging.TRACE = logging.DEBUG - 5.
  3. A method called trace needs to be added to the logging module. It should behave just like debug, info, etc.
  4. A method called trace needs to be added to the currently configured logger class. Since this is not 100% guaranteed to be logging.Logger, use logging.getLoggerClass() instead.

All the steps are illustrated in the method below:

def addLoggingLevel(levelName, levelNum, methodName=None):
    """
    Comprehensively adds a new logging level to the `logging` module and the
    currently configured logging class.

    `levelName` becomes an attribute of the `logging` module with the value
    `levelNum`. `methodName` becomes a convenience method for both `logging`
    itself and the class returned by `logging.getLoggerClass()` (usually just
    `logging.Logger`). If `methodName` is not specified, `levelName.lower()` is
    used.

    To avoid accidental clobberings of existing attributes, this method will
    raise an `AttributeError` if the level name is already an attribute of the
    `logging` module or if the method name is already present.

    Example
    -------
    >>> addLoggingLevel('TRACE', logging.DEBUG - 5)
    >>> logging.getLogger(__name__).setLevel("TRACE")
    >>> logging.getLogger(__name__).trace('that worked')
    >>> logging.trace('so did this')
    >>> logging.TRACE
    5

    """
    if not methodName:
        methodName = levelName.lower()

    if hasattr(logging, levelName):
       raise AttributeError('{} already defined in logging module'.format(levelName))
    if hasattr(logging, methodName):
       raise AttributeError('{} already defined in logging module'.format(methodName))
    if hasattr(logging.getLoggerClass(), methodName):
       raise AttributeError('{} already defined in logger class'.format(methodName))

    # This method was inspired by the answers to Stack Overflow post
    # http://stackoverflow.com/q/2183233/2988730, especially
    # http://stackoverflow.com/a/13638084/2988730
    def logForLevel(self, message, *args, **kwargs):
        if self.isEnabledFor(levelNum):
            self._log(levelNum, message, args, **kwargs)
    def logToRoot(message, *args, **kwargs):
        logging.log(levelNum, message, *args, **kwargs)

    logging.addLevelName(levelNum, levelName)
    setattr(logging, levelName, levelNum)
    setattr(logging.getLoggerClass(), methodName, logForLevel)
    setattr(logging, methodName, logToRoot)

回答 3

这个问题比较老,但是我只是处理相同的主题,并发现了一种与已经提到的类似的方法,对我来说似乎更干净。这已经在3.4上进行了测试,因此我不确定所使用的方法是否在较早的版本中存在:

from logging import getLoggerClass, addLevelName, setLoggerClass, NOTSET

VERBOSE = 5

class MyLogger(getLoggerClass()):
    def __init__(self, name, level=NOTSET):
        super().__init__(name, level)

        addLevelName(VERBOSE, "VERBOSE")

    def verbose(self, msg, *args, **kwargs):
        if self.isEnabledFor(VERBOSE):
            self._log(VERBOSE, msg, args, **kwargs)

setLoggerClass(MyLogger)

This question is rather old, but I just dealt with the same topic and found a way similiar to those already mentioned which appears a little cleaner to me. This was tested on 3.4, so I’m not sure whether the methods used exist in older versions:

from logging import getLoggerClass, addLevelName, setLoggerClass, NOTSET

VERBOSE = 5

class MyLogger(getLoggerClass()):
    def __init__(self, name, level=NOTSET):
        super().__init__(name, level)

        addLevelName(VERBOSE, "VERBOSE")

    def verbose(self, msg, *args, **kwargs):
        if self.isEnabledFor(VERBOSE):
            self._log(VERBOSE, msg, args, **kwargs)

setLoggerClass(MyLogger)
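
A short usage sketch (my addition): the key point is that setLoggerClass must run before the first getLogger call for the loggers you care about:

import logging
# MyLogger defined and registered via setLoggerClass as above

logging.basicConfig(level=VERBOSE)
log = logging.getLogger(__name__)   # returns a MyLogger instance
log.verbose('a VERBOSE-level message')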

回答 4

是谁开创了使用内部方法(self._log)这种坏做法,为什么每个答案都基于它?!pythonic的解决方案是改用self.log,这样您就不必弄乱任何内部实现:

import logging

SUBDEBUG = 5
logging.addLevelName(SUBDEBUG, 'SUBDEBUG')

def subdebug(self, message, *args, **kws):
    self.log(SUBDEBUG, message, *args, **kws) 
logging.Logger.subdebug = subdebug

logging.basicConfig()
l = logging.getLogger()
l.setLevel(SUBDEBUG)
l.subdebug('test')
l.setLevel(logging.DEBUG)
l.subdebug('test')

Who started the bad practice of using internal methods (self._log) and why is each answer based on that?! The pythonic solution would be to use self.log instead so you don’t have to mess with any internal stuff:

import logging

SUBDEBUG = 5
logging.addLevelName(SUBDEBUG, 'SUBDEBUG')

def subdebug(self, message, *args, **kws):
    self.log(SUBDEBUG, message, *args, **kws) 
logging.Logger.subdebug = subdebug

logging.basicConfig()
l = logging.getLogger()
l.setLevel(SUBDEBUG)
l.subdebug('test')
l.setLevel(logging.DEBUG)
l.subdebug('test')

回答 5

我发现直接在logger对象上创建一个传递给log()函数的新属性更容易。我认为logger模块正是为此提供了addLevelName()和log()。因此不需要子类,也不需要新方法。

import logging

@property
def log(obj):
    logging.addLevelName(5, 'TRACE')
    myLogger = logging.getLogger(obj.__class__.__name__)
    setattr(myLogger, 'trace', lambda *args: myLogger.log(5, *args))
    return myLogger

现在

mylogger.trace('This is a trace message')

应该能按预期工作。

I find it easier to create a new attribute for the logger object that passes the log() function. I think the logger module provides the addLevelName() and the log() for this very reason. Thus no subclasses or new method needed.

import logging

@property
def log(obj):
    logging.addLevelName(5, 'TRACE')
    myLogger = logging.getLogger(obj.__class__.__name__)
    setattr(myLogger, 'trace', lambda *args: myLogger.log(5, *args))
    return myLogger

now

mylogger.trace('This is a trace message')

should work as expected.


回答 6

虽然我们已经有了很多正确的答案,但我认为以下做法更加pythonic:

import logging

from functools import partial, partialmethod

logging.TRACE = 5
logging.addLevelName(logging.TRACE, 'TRACE')
logging.Logger.trace = partialmethod(logging.Logger.log, logging.TRACE)
logging.trace = partial(logging.log, logging.TRACE)

如果要在代码上使用mypy,建议添加# type: ignore以抑制关于添加属性的警告。

While we have already plenty of correct answers, the following is in my opinion more pythonic:

import logging

from functools import partial, partialmethod

logging.TRACE = 5
logging.addLevelName(logging.TRACE, 'TRACE')
logging.Logger.trace = partialmethod(logging.Logger.log, logging.TRACE)
logging.trace = partial(logging.log, logging.TRACE)

If you want to use mypy on your code, it is recommended to add # type: ignore to suppress warnings from adding attribute.
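
A brief usage check for the snippet above (my addition, assuming those four lines have already been executed):

import logging

logging.basicConfig(level=logging.TRACE)
log = logging.getLogger(__name__)
log.trace('via the Logger.trace partialmethod')  # bound call to Logger.log(TRACE, ...)
logging.trace('via the module-level partial')    # logs through the root logger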


回答 7

我认为您必须子类化Logger类,并添加一个名为trace的方法,它基本上以低于DEBUG的级别调用Logger.log。我还没有尝试过,但这就是文档所指示的。

I think you’ll have to subclass the Logger class and add a method called trace which basically calls Logger.log with a level lower than DEBUG. I haven’t tried this but this is what the docs indicate.


回答 8

创建自定义记录器的提示:

  1. 不要使用_log,而使用log(这样就不必检查isEnabledFor)
  2. 应该由logging模块来创建自定义记录器的实例,因为getLogger中有一些魔法,因此您需要通过setLoggerClass来设置类
  3. 如果您不存储任何内容,则无需为记录器类定义__init__
# Lower than debug which is 10
TRACE = 5
class MyLogger(logging.Logger):
    def trace(self, msg, *args, **kwargs):
        self.log(TRACE, msg, *args, **kwargs)

使用此记录器时,请先调用setLoggerClass(MyLogger),使其成为getLogger返回的默认记录器:

logging.setLoggerClass(MyLogger)
log = logging.getLogger(__name__)
# ...
log.trace("something specific")

您需要在handler和log自身上调用setFormatter、setHandler以及setLevel(TRACE),才能真正看到这个低级别的trace输出

Tips for creating a custom logger:

  1. Do not use _log, use log (you don’t have to check isEnabledFor)
  2. the logging module should be the one creating instance of the custom logger since it does some magic in getLogger, so you will need to set the class via setLoggerClass
  3. You do not need to define __init__ for the logger, class if you are not storing anything
# Lower than debug which is 10
TRACE = 5
class MyLogger(logging.Logger):
    def trace(self, msg, *args, **kwargs):
        self.log(TRACE, msg, *args, **kwargs)

When calling this logger use setLoggerClass(MyLogger) to make this the default logger from getLogger

logging.setLoggerClass(MyLogger)
log = logging.getLogger(__name__)
# ...
log.trace("something specific")

You will need to setFormatter, setHandler, and setLevel(TRACE) on the handler and on the log itself to actually see this low level trace


回答 9

这对我有用:

import logging
logging.basicConfig(
    format='  %(levelname)-8.8s %(funcName)s: %(message)s',
)
logging.NOTE = 32  # positive yet important
logging.addLevelName(logging.NOTE, 'NOTE')      # new level
logging.addLevelName(logging.CRITICAL, 'FATAL') # rename existing

log = logging.getLogger(__name__)
log.note = lambda msg, *args: log._log(logging.NOTE, msg, args)
log.note('school\'s out for summer! %s', 'dude')
log.fatal('file not found.')

lambda / funcName问题已通过@marqueed指出的logger._log解决。我认为使用lambda看起来更干净一些,但是缺点是它不能接受关键字参数。我自己从来没有用过,所以没什么大不了的。

  NOTE     setup: school's out for summer! dude
  FATAL    setup: file not found.

This worked for me:

import logging
logging.basicConfig(
    format='  %(levelname)-8.8s %(funcName)s: %(message)s',
)
logging.NOTE = 32  # positive yet important
logging.addLevelName(logging.NOTE, 'NOTE')      # new level
logging.addLevelName(logging.CRITICAL, 'FATAL') # rename existing

log = logging.getLogger(__name__)
log.note = lambda msg, *args: log._log(logging.NOTE, msg, args)
log.note('school\'s out for summer! %s', 'dude')
log.fatal('file not found.')

The lambda/funcName issue is fixed with logger._log as @marqueed pointed out. I think using lambda looks a bit cleaner, but the drawback is that it can’t take keyword arguments. I’ve never used that myself, so no biggie.

  NOTE     setup: school's out for summer! dude
  FATAL    setup: file not found.

回答 10

以我的经验,这是对楼主问题的完整解决方案……为了避免看到“lambda”作为发出消息的函数,需要更深入一些:

MY_LEVEL_NUM = 25
logging.addLevelName(MY_LEVEL_NUM, "MY_LEVEL_NAME")
def log_at_my_log_level(self, message, *args, **kws):
    # Yes, logger takes its '*args' as 'args'.
    self._log(MY_LEVEL_NUM, message, args, **kws)
logger.log_at_my_log_level = log_at_my_log_level

我从未尝试过使用独立的记录器类,但我认为基本思想是相同的(使用_log)。

In my experience, this is the full solution to the op’s problem… to avoid seeing “lambda” as the function in which the message is emitted, go deeper:

MY_LEVEL_NUM = 25
logging.addLevelName(MY_LEVEL_NUM, "MY_LEVEL_NAME")
def log_at_my_log_level(self, message, *args, **kws):
    # Yes, logger takes its '*args' as 'args'.
    self._log(MY_LEVEL_NUM, message, args, **kws)
logger.log_at_my_log_level = log_at_my_log_level

I’ve never tried working with a standalone logger class, but I think the basic idea is the same (use _log).


回答 11

在Mad Physicist(疯狂物理学家)的示例基础上补充,以使文件名和行号正确无误:

def logToRoot(message, *args, **kwargs):
    if logging.root.isEnabledFor(levelNum):
        logging.root._log(levelNum, message, args, **kwargs)

Addition to Mad Physicist’s example, to get the file name and line number correct:

def logToRoot(message, *args, **kwargs):
    if logging.root.isEnabledFor(levelNum):
        logging.root._log(levelNum, message, args, **kwargs)

回答 12

基于置顶的答案,我写了一个可以自动创建新日志级别的小方法:

def set_custom_logging_levels(config={}):
    """
        Assign custom levels for logging
            config: is a dict, like
            {
                'EVENT_NAME': EVENT_LEVEL_NUM,
            }
        EVENT_LEVEL_NUM must not collide with a level the logging module already has:
        logging.DEBUG       = 10
        logging.INFO        = 20
        logging.WARNING     = 30
        logging.ERROR       = 40
        logging.CRITICAL    = 50
    """
    assert isinstance(config, dict), "Configuration must be a dict"

    def get_level_func(level_name, level_num):
        def _blank(self, message, *args, **kws):
            if self.isEnabledFor(level_num):
                # Yes, logger takes its '*args' as 'args'.
                self._log(level_num, message, args, **kws) 
        _blank.__name__ = level_name.lower()
        return _blank

    for level_name, level_num in config.items():
        logging.addLevelName(level_num, level_name.upper())
        setattr(logging.Logger, level_name.lower(), get_level_func(level_name, level_num))

配置可能像这样:

new_log_levels = {
    # level_num is in logging.INFO section, that's why it 21, 22, etc..
    "FOO":      21,
    "BAR":      22,
}

Based on the pinned answer, I wrote a little method which automatically creates new logging levels:

def set_custom_logging_levels(config={}):
    """
        Assign custom levels for logging
            config: is a dict, like
            {
                'EVENT_NAME': EVENT_LEVEL_NUM,
            }
        EVENT_LEVEL_NUM must not collide with a level the logging module already has:
        logging.DEBUG       = 10
        logging.INFO        = 20
        logging.WARNING     = 30
        logging.ERROR       = 40
        logging.CRITICAL    = 50
    """
    assert isinstance(config, dict), "Configuration must be a dict"

    def get_level_func(level_name, level_num):
        def _blank(self, message, *args, **kws):
            if self.isEnabledFor(level_num):
                # Yes, logger takes its '*args' as 'args'.
                self._log(level_num, message, args, **kws) 
        _blank.__name__ = level_name.lower()
        return _blank

    for level_name, level_num in config.items():
        logging.addLevelName(level_num, level_name.upper())
        setattr(logging.Logger, level_name.lower(), get_level_func(level_name, level_num))

config may look something like this:

new_log_levels = {
    # level_num is in logging.INFO section, that's why it 21, 22, etc..
    "FOO":      21,
    "BAR":      22,
}

回答 13

作为向Logger类添加额外方法的替代方案,我建议直接使用Logger.log(level, msg)方法。

import logging

TRACE = 5
logging.addLevelName(TRACE, 'TRACE')
FORMAT = '%(levelname)s:%(name)s:%(lineno)d:%(message)s'


logging.basicConfig(format=FORMAT)
l = logging.getLogger()
l.setLevel(TRACE)
l.log(TRACE, 'trace message')
l.setLevel(logging.DEBUG)
l.log(TRACE, 'disabled trace message')

As alternative to adding an extra method to the Logger class I would recommend using the Logger.log(level, msg) method.

import logging

TRACE = 5
logging.addLevelName(TRACE, 'TRACE')
FORMAT = '%(levelname)s:%(name)s:%(lineno)d:%(message)s'


logging.basicConfig(format=FORMAT)
l = logging.getLogger()
l.setLevel(TRACE)
l.log(TRACE, 'trace message')
l.setLevel(logging.DEBUG)
l.log(TRACE, 'disabled trace message')

回答 14

我很困惑; 至少在python 3.5中,它可以正常工作:

import logging


TRACE = 5
"""more detail than debug"""

logging.basicConfig()
logging.addLevelName(TRACE,"TRACE")
logger = logging.getLogger('')
logger.debug("n")
logger.setLevel(logging.DEBUG)
logger.debug("y1")
logger.log(TRACE,"n")
logger.setLevel(TRACE)
logger.log(TRACE,"y2")
    

输出:

DEBUG:root:y1

TRACE:root:y2

I’m confused; with python 3.5, at least, it just works:

import logging


TRACE = 5
"""more detail than debug"""

logging.basicConfig()
logging.addLevelName(TRACE,"TRACE")
logger = logging.getLogger('')
logger.debug("n")
logger.setLevel(logging.DEBUG)
logger.debug("y1")
logger.log(TRACE,"n")
logger.setLevel(TRACE)
logger.log(TRACE,"y2")
    

output:

DEBUG:root:y1

TRACE:root:y2


回答 15

万一有人想要一种自动的方式来动态地向日志记录模块(或其副本)添加新的日志记录级别,我创建了此函数,扩展了@pfa的答案:

def add_level(log_name,custom_log_module=None,log_num=None,
                log_call=None,
                   lower_than=None, higher_than=None, same_as=None,
              verbose=True):
    '''
    Function to dynamically add a new log level to a given custom logging module.
    <custom_log_module>: the logging module. If not provided, then a copy of
        <logging> module is used
    <log_name>: the logging level name
    <log_num>: the logging level num. If not provided, then function checks
        <lower_than>,<higher_than> and <same_as>, at the order mentioned.
        One of those three parameters must hold a string of an already existent
        logging level name.
    In case a level is overwritten and <verbose> is True, then a message in WARNING
        level of the custom logging module is established.
    '''
    if custom_log_module is None:
        import imp
        custom_log_module = imp.load_module('custom_log_module',
                                            *imp.find_module('logging'))
    log_name = log_name.upper()
    def cust_log(par, message, *args, **kws):
        # Yes, logger takes its '*args' as 'args'.
        if par.isEnabledFor(log_num):
            par._log(log_num, message, args, **kws)
    available_level_nums = [key for key in custom_log_module._levelNames
                            if isinstance(key,int)]

    available_levels = {key:custom_log_module._levelNames[key]
                             for key in custom_log_module._levelNames
                            if isinstance(key,str)}
    if log_num is None:
        try:
            if lower_than is not None:
                log_num = available_levels[lower_than]-1
            elif higher_than is not None:
                log_num = available_levels[higher_than]+1
            elif same_as is not None:
                log_num = available_levels[same_as]
            else:
                raise Exception('Information about the '+
                                'log_num should be provided')
        except KeyError:
            raise Exception('Non existent logging level name')
    if log_num in available_level_nums and verbose:
        custom_log_module.warn('Changing ' +
                                  custom_log_module._levelNames[log_num] +
                                  ' to '+log_name)
    custom_log_module.addLevelName(log_num, log_name)

    if log_call is None:
        log_call = log_name.lower()

    setattr(custom_log_module.Logger, log_call, cust_log)
    return custom_log_module

In case anyone wants an automated way to add a new logging level to the logging module (or a copy of it) dynamically, I have created this function, expanding @pfa’s answer:

def add_level(log_name,custom_log_module=None,log_num=None,
                log_call=None,
                   lower_than=None, higher_than=None, same_as=None,
              verbose=True):
    '''
    Function to dynamically add a new log level to a given custom logging module.
    <custom_log_module>: the logging module. If not provided, then a copy of
        <logging> module is used
    <log_name>: the logging level name
    <log_num>: the logging level num. If not provided, then function checks
        <lower_than>,<higher_than> and <same_as>, at the order mentioned.
        One of those three parameters must hold a string of an already existent
        logging level name.
    In case a level is overwritten and <verbose> is True, then a message in WARNING
        level of the custom logging module is established.
    '''
    if custom_log_module is None:
        import imp
        custom_log_module = imp.load_module('custom_log_module',
                                            *imp.find_module('logging'))
    log_name = log_name.upper()
    def cust_log(par, message, *args, **kws):
        # Yes, logger takes its '*args' as 'args'.
        if par.isEnabledFor(log_num):
            par._log(log_num, message, args, **kws)
    available_level_nums = [key for key in custom_log_module._levelNames
                            if isinstance(key,int)]

    available_levels = {key:custom_log_module._levelNames[key]
                             for key in custom_log_module._levelNames
                            if isinstance(key,str)}
    if log_num is None:
        try:
            if lower_than is not None:
                log_num = available_levels[lower_than]-1
            elif higher_than is not None:
                log_num = available_levels[higher_than]+1
            elif same_as is not None:
                log_num = available_levels[same_as]
            else:
                raise Exception('Information about the '+
                                'log_num should be provided')
        except KeyError:
            raise Exception('Non existent logging level name')
    if log_num in available_level_nums and verbose:
        custom_log_module.warn('Changing ' +
                                  custom_log_module._levelNames[log_num] +
                                  ' to '+log_name)
    custom_log_module.addLevelName(log_num, log_name)

    if log_call is None:
        log_call = log_name.lower()

    setattr(custom_log_module.Logger, log_call, cust_log)
    return custom_log_module

如何消除matplotlib中子图之间的间隙?

问题:如何消除matplotlib中子图之间的间隙?

下面的代码在子图之间产生了间隙。如何消除子图之间的间隙,使图像形成一个紧密的网格?

import matplotlib.pyplot as plt

for i in range(16):
    i = i + 1
    ax1 = plt.subplot(4, 4, i)
    plt.axis('on')
    ax1.set_xticklabels([])
    ax1.set_yticklabels([])
    ax1.set_aspect('equal')
    plt.subplots_adjust(wspace=None, hspace=None)
plt.show()

The code below produces gaps between the subplots. How do I remove the gaps between the subplots and make the image a tight grid?

import matplotlib.pyplot as plt

for i in range(16):
    i = i + 1
    ax1 = plt.subplot(4, 4, i)
    plt.axis('on')
    ax1.set_xticklabels([])
    ax1.set_yticklabels([])
    ax1.set_aspect('equal')
    plt.subplots_adjust(wspace=None, hspace=None)
plt.show()

回答 0

您可以使用gridspec来控制轴之间的间距。这里有更多信息

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

plt.figure(figsize = (4,4))
gs1 = gridspec.GridSpec(4, 4)
gs1.update(wspace=0.025, hspace=0.05) # set the spacing between axes. 

for i in range(16):
   # i = i + 1 # grid spec indexes from 0
    ax1 = plt.subplot(gs1[i])
    plt.axis('on')
    ax1.set_xticklabels([])
    ax1.set_yticklabels([])
    ax1.set_aspect('equal')

plt.show()

You can use gridspec to control the spacing between axes. There’s more information here.

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

plt.figure(figsize = (4,4))
gs1 = gridspec.GridSpec(4, 4)
gs1.update(wspace=0.025, hspace=0.05) # set the spacing between axes. 

for i in range(16):
   # i = i + 1 # grid spec indexes from 0
    ax1 = plt.subplot(gs1[i])
    plt.axis('on')
    ax1.set_xticklabels([])
    ax1.set_yticklabels([])
    ax1.set_aspect('equal')

plt.show()


回答 1

问题在于使用了aspect='equal',它阻止子图拉伸到任意纵横比并填满所有空白空间。

通常,这可以工作:

import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])

plt.subplots_adjust(wspace=0, hspace=0)

结果是这样的:

但是,使用aspect='equal',如以下代码所示:

import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect('equal')

plt.subplots_adjust(wspace=0, hspace=0)

这是我们得到的:

第二种情况的区别在于,您已将x轴和y轴强制设置为具有相同数量的单位/像素。由于默认情况下轴从0变为1(即在绘制任何东西之前),因此使用aspect='equal'强制每个轴为正方形。由于该图不是正方形,因此pyplot会在水平轴之间增加额外的间距。

要解决此问题,可以将图形设置为具有正确的宽高比。我们将在这里使用面向对象的pyplot接口,我认为它通常是更好的:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8,8)) # Notice the equal aspect ratio
ax = [fig.add_subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect('equal')

fig.subplots_adjust(wspace=0, hspace=0)

结果如下:

The problem is the use of aspect='equal', which prevents the subplots from stretching to an arbitrary aspect ratio and filling up all the empty space.

Normally, this would work:

import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])

plt.subplots_adjust(wspace=0, hspace=0)

The result is this:

However, with aspect='equal', as in the following code:

import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect('equal')

plt.subplots_adjust(wspace=0, hspace=0)

This is what we get:

The difference in this second case is that you’ve forced the x- and y-axes to have the same number of units/pixel. Since the axes go from 0 to 1 by default (i.e., before you plot anything), using aspect='equal' forces each axis to be a square. Since the figure is not a square, pyplot adds in extra spacing between the axes horizontally.

To get around this problem, you can set your figure to have the correct aspect ratio. We’re going to use the object-oriented pyplot interface here, which I consider to be superior in general:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8,8)) # Notice the equal aspect ratio
ax = [fig.add_subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect('equal')

fig.subplots_adjust(wspace=0, hspace=0)

Here’s the result:


回答 2

在不完全依赖gridspec的情况下,也可以通过将wspace和hspace设置为零来消除间隙:

import matplotlib.pyplot as plt

plt.clf()
f, axarr = plt.subplots(4, 4, gridspec_kw = {'wspace':0, 'hspace':0})

for i, ax in enumerate(f.axes):
    ax.grid('on', linestyle='--')
    ax.set_xticklabels([])
    ax.set_yticklabels([])

plt.show()
plt.close()

结果如下:

Without resorting to gridspec entirely, the following might also be used to remove the gaps by setting wspace and hspace to zero:

import matplotlib.pyplot as plt

plt.clf()
f, axarr = plt.subplots(4, 4, gridspec_kw = {'wspace':0, 'hspace':0})

for i, ax in enumerate(f.axes):
    ax.grid('on', linestyle='--')
    ax.set_xticklabels([])
    ax.set_yticklabels([])

plt.show()
plt.close()

Resulting in:


回答 3

你试过了plt.tight_layout()吗?

使用plt.tight_layout()的效果与不使用它的对比:

或者:类似这样的东西(使用add_axes

left=[0.1,0.3,0.5,0.7]
width=[0.2,0.2, 0.2, 0.2]
rectLS=[]
for x in left:
   for y in left:
       rectLS.append([x, y, 0.2, 0.2])
axLS=[]
fig=plt.figure()
axLS.append(fig.add_axes(rectLS[0]))
for i in [1,2,3]:
     axLS.append(fig.add_axes(rectLS[i],sharey=axLS[-1]))    
axLS.append(fig.add_axes(rectLS[4]))
for i in [1,2,3]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))
axLS.append(fig.add_axes(rectLS[8]))
for i in [5,6,7]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))     
axLS.append(fig.add_axes(rectLS[12]))
for i in [9,10,11]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))

如果您不需要共享轴,则只需 axLS=map(fig.add_axes, rectLS)

Have you tried plt.tight_layout()?

with plt.tight_layout(), and without it:

Or: something like this (use add_axes)

left=[0.1,0.3,0.5,0.7]
width=[0.2,0.2, 0.2, 0.2]
rectLS=[]
for x in left:
   for y in left:
       rectLS.append([x, y, 0.2, 0.2])
axLS=[]
fig=plt.figure()
axLS.append(fig.add_axes(rectLS[0]))
for i in [1,2,3]:
     axLS.append(fig.add_axes(rectLS[i],sharey=axLS[-1]))    
axLS.append(fig.add_axes(rectLS[4]))
for i in [1,2,3]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))
axLS.append(fig.add_axes(rectLS[8]))
for i in [5,6,7]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))     
axLS.append(fig.add_axes(rectLS[12]))
for i in [9,10,11]:
     axLS.append(fig.add_axes(rectLS[i+4],sharex=axLS[i],sharey=axLS[-1]))

If you don’t need to share axes, then simply axLS=map(fig.add_axes, rectLS)


回答 4

对于最新的matplotlib版本,您可能需要尝试Constrained Layout。但是它对plt.subplot()不起作用,因此您需要改用plt.subplots():

fig, axs = plt.subplots(4, 4, constrained_layout=True)

With recent matplotlib versions you might want to try Constrained Layout. This does not work with plt.subplot() however, so you need to use plt.subplots() instead:

fig, axs = plt.subplots(4, 4, constrained_layout=True)
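
A slightly fuller sketch (my addition; constrained layout keeps some padding by default, and set_constrained_layout_pads lets you shrink it; the exact behavior depends on your matplotlib version):

import matplotlib.pyplot as plt

fig, axs = plt.subplots(4, 4, constrained_layout=True)
fig.set_constrained_layout_pads(w_pad=0, h_pad=0, wspace=0, hspace=0)

for ax in axs.flat:
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_aspect('equal')

plt.show()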