分类目录归档:知识问答

什么是Python缓冲区类型?

问题:什么是Python缓冲区类型?

bufferpython中有一个类型,但是我不知道如何使用它。

Python文档中,描述为:

buffer(object[, offset[, size]])

object参数必须是支持缓冲区调用接口的对象(例如字符串,数组和缓冲区)。将创建一个引用该对象参数的新缓冲区对象。缓冲区对象将从对象的开头(或指定的偏移量)开始是一个切片。切片将延伸到对象的末尾(或具有由size参数指定的长度)。

There is a buffer type in python, but I don’t know how can I use it.

In the Python doc the description is:

buffer(object[, offset[, size]])

The object argument must be an object that supports the buffer call interface (such as strings, arrays, and buffers). A new buffer object will be created which references the object argument. The buffer object will be a slice from the beginning of object (or from the specified offset). The slice will extend to the end of object (or will have a length given by the size argument).


回答 0

用法示例:

>>> s = 'Hello world'
>>> t = buffer(s, 6, 5)
>>> t
<read-only buffer for 0x10064a4b0, size 5, offset 6 at 0x100634ab0>
>>> print t
world

在这种情况下,缓冲区是一个子字符串,从位置6开始,长度为5,并且不占用额外的存储空间-它引用字符串的一部分。

这对于像这样的短字符串不是很有用,但是在使用大量数据时可能是必需的。这个例子使用了可变的bytearray

>>> s = bytearray(1000000)   # a million zeroed bytes
>>> t = buffer(s, 1)         # slice cuts off the first byte
>>> s[1] = 5                 # set the second element in s
>>> t[0]                     # which is now also the first element in t!
'\x05'

如果您想对数据有多个视图并且不想(或不能)在内存中保存多个副本,这将非常有用。

请注意,尽管您可以在Python 2.7中使用它,但是buffer已经被memoryviewPython 3中的更好名称所代替。

还要注意,如果不深入研究C API,就不能为自己的对象实现缓冲区接口,即,不能在纯Python中做到这一点。

An example usage:

>>> s = 'Hello world'
>>> t = buffer(s, 6, 5)
>>> t
<read-only buffer for 0x10064a4b0, size 5, offset 6 at 0x100634ab0>
>>> print t
world

The buffer in this case is a sub-string, starting at position 6 with length 5, and it doesn’t take extra storage space – it references a slice of the string.

This isn’t very useful for short strings like this, but it can be necessary when using large amounts of data. This example uses a mutable bytearray:

>>> s = bytearray(1000000)   # a million zeroed bytes
>>> t = buffer(s, 1)         # slice cuts off the first byte
>>> s[1] = 5                 # set the second element in s
>>> t[0]                     # which is now also the first element in t!
'\x05'

This can be very helpful if you want to have more than one view on the data and don’t want to (or can’t) hold multiple copies in memory.

Note that buffer has been replaced by the better named memoryview in Python 3, though you can use either in Python 2.7.

Note also that you can’t implement a buffer interface for your own objects without delving into the C API, i.e. you can’t do it in pure Python.


回答 1

我认为在将python与本机库接口时,缓冲区非常有用。(Guido van Rossum buffer此邮件列表帖子中进行了解释)。

例如,numpy似乎使用缓冲区进行有效的数据存储:

import numpy
a = numpy.ndarray(1000000)

a.data是:

<read-write buffer for 0x1d7b410, size 8000000, offset 0 at 0x1e353b0>

I think buffers are e.g. useful when interfacing python to native libraries. (Guido van Rossum explains buffer in this mailinglist post).

For example, numpy seems to use buffer for efficient data storage:

import numpy
a = numpy.ndarray(1000000)

the a.data is a:

<read-write buffer for 0x1d7b410, size 8000000, offset 0 at 0x1e353b0>

在Python中获取对象的全限定类名

问题:在Python中获取对象的全限定类名

出于记录目的,我想检索Python对象的完全限定的类名。(对于完全限定,我的意思是类名称,包括软件包和模块名称。)

我知道x.__class__.__name__,但是有一种简单的方法来获取软件包和模块吗?

For logging purposes I want to retrieve the fully qualified class name of a Python object. (With fully qualified I mean the class name including the package and module name.)

I know about x.__class__.__name__, but is there a simple method to get the package and module?


回答 0

随着以下程序

#! /usr/bin/env python

import foo

def fullname(o):
  # o.__module__ + "." + o.__class__.__qualname__ is an example in
  # this context of H.L. Mencken's "neat, plausible, and wrong."
  # Python makes no guarantees as to whether the __module__ special
  # attribute is defined, so we take a more circumspect approach.
  # Alas, the module name is explicitly excluded from __qualname__
  # in Python 3.

  module = o.__class__.__module__
  if module is None or module == str.__class__.__module__:
    return o.__class__.__name__  # Avoid reporting __builtin__
  else:
    return module + '.' + o.__class__.__name__

bar = foo.Bar()
print fullname(bar)

Bar定义为

class Bar(object):
  def __init__(self, v=42):
    self.val = v

输出是

$ ./prog.py
foo.Bar

With the following program

#! /usr/bin/env python

import foo

def fullname(o):
  # o.__module__ + "." + o.__class__.__qualname__ is an example in
  # this context of H.L. Mencken's "neat, plausible, and wrong."
  # Python makes no guarantees as to whether the __module__ special
  # attribute is defined, so we take a more circumspect approach.
  # Alas, the module name is explicitly excluded from __qualname__
  # in Python 3.

  module = o.__class__.__module__
  if module is None or module == str.__class__.__module__:
    return o.__class__.__name__  # Avoid reporting __builtin__
  else:
    return module + '.' + o.__class__.__name__

bar = foo.Bar()
print fullname(bar)

and Bar defined as

class Bar(object):
  def __init__(self, v=42):
    self.val = v

the output is

$ ./prog.py
foo.Bar

回答 1

提供的答案不涉及嵌套类。尽管直到Python 3.3(PEP 3155)才可用__qualname__,但您确实想使用该类。最终(3.4?PEP 395),__qualname__模块也将存在,以处理模块被重命名的情况(即,将其重命名为__main__)。

The provided answers don’t deal with nested classes. Though it’s not available until Python 3.3 (PEP 3155), you really want to use __qualname__ of the class. Eventually (3.4? PEP 395), __qualname__ will also exist for modules to deal with cases where the module is renamed (i.e. when it is renamed to __main__).


回答 2

考虑使用inspect具有以下功能的模块getmodule

>>>import inspect
>>>import xml.etree.ElementTree
>>>et = xml.etree.ElementTree.ElementTree()
>>>inspect.getmodule(et)
<module 'xml.etree.ElementTree' from 
        'D:\tools\python2.5.2\lib\xml\etree\ElementTree.pyc'>

Consider using the inspect module which has functions like getmodule which might be what are looking for:

>>>import inspect
>>>import xml.etree.ElementTree
>>>et = xml.etree.ElementTree.ElementTree()
>>>inspect.getmodule(et)
<module 'xml.etree.ElementTree' from 
        'D:\tools\python2.5.2\lib\xml\etree\ElementTree.pyc'>

回答 3

这是基于格雷格·培根(Greg Bacon)出色答案的答案,但还要进行一些额外的检查:

__module__可以是None(根据文档),也str可以是类似的类型__builtin__(您可能不想在日志或其他内容中出现)。以下检查这两种可能性:

def fullname(o):
    module = o.__class__.__module__
    if module is None or module == str.__class__.__module__:
        return o.__class__.__name__
    return module + '.' + o.__class__.__name__

(可能有一种更好的检查方法__builtin__。以上内容仅取决于以下事实:str始终可用,并且其模块始终为__builtin__

Here’s one based on Greg Bacon’s excellent answer, but with a couple of extra checks:

__module__ can be None (according to the docs), and also for a type like str it can be __builtin__ (which you might not want appearing in logs or whatever). The following checks for both those possibilities:

def fullname(o):
    module = o.__class__.__module__
    if module is None or module == str.__class__.__module__:
        return o.__class__.__name__
    return module + '.' + o.__class__.__name__

(There might be a better way to check for __builtin__. The above just relies on the fact that str is always available, and its module is always __builtin__)


回答 4

对于python3.7我使用:

".".join([obj.__module__, obj.__name__])

获得:

package.subpackage.ClassName

For python3.7 I use:

".".join([obj.__module__, obj.__name__])

Getting:

package.subpackage.ClassName

回答 5

__module__ 会成功的

尝试:

>>> import re
>>> print re.compile.__module__
re

该站点建议这__package__可能适用于Python 3.0。但是,此处给出的示例在我的Python 2.5.2控制台下不起作用。

__module__ would do the trick.

Try:

>>> import re
>>> print re.compile.__module__
re

This site suggests that __package__ might work for Python 3.0; However, the examples given there won’t work under my Python 2.5.2 console.


回答 6

这是一个hack,但是我支持2.6,只需要简单一些即可:

>>> from logging.handlers import MemoryHandler as MH
>>> str(MH).split("'")[1]

'logging.handlers.MemoryHandler'

This is a hack but I’m supporting 2.6 and just need something simple:

>>> from logging.handlers import MemoryHandler as MH
>>> str(MH).split("'")[1]

'logging.handlers.MemoryHandler'

回答 7

有些人(例如https://stackoverflow.com/a/16763814/5766934)认为__qualname__比更好__name__。这是显示区别的示例:

$ cat dummy.py 
class One:
    class Two:
        pass

$ python3.6
>>> import dummy
>>> print(dummy.One)
<class 'dummy.One'>
>>> print(dummy.One.Two)
<class 'dummy.One.Two'>
>>> def full_name_with_name(klass):
...     return f'{klass.__module__}.{klass.__name__}'
>>> def full_name_with_qualname(klass):
...     return f'{klass.__module__}.{klass.__qualname__}'
>>> print(full_name_with_name(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_name(dummy.One.Two))  # Wrong
dummy.Two
>>> print(full_name_with_qualname(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_qualname(dummy.One.Two))  # Correct
dummy.One.Two

请注意,它对于buildins也可以正常工作:

>>> print(full_name_with_qualname(print))
builtins.print
>>> import builtins
>>> builtins.print
<built-in function print>

Some people (e.g. https://stackoverflow.com/a/16763814/5766934) arguing that __qualname__ is better than __name__. Here is an example that shows the difference:

$ cat dummy.py 
class One:
    class Two:
        pass

$ python3.6
>>> import dummy
>>> print(dummy.One)
<class 'dummy.One'>
>>> print(dummy.One.Two)
<class 'dummy.One.Two'>
>>> def full_name_with_name(klass):
...     return f'{klass.__module__}.{klass.__name__}'
>>> def full_name_with_qualname(klass):
...     return f'{klass.__module__}.{klass.__qualname__}'
>>> print(full_name_with_name(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_name(dummy.One.Two))  # Wrong
dummy.Two
>>> print(full_name_with_qualname(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_qualname(dummy.One.Two))  # Correct
dummy.One.Two

Note, it also works correctly for buildins:

>>> print(full_name_with_qualname(print))
builtins.print
>>> import builtins
>>> builtins.print
<built-in function print>

回答 8

由于本主题的兴趣是获取完全限定的名称,因此在将相对导入与同一软件包中现有的主模块一起使用时,会出现一个陷阱。例如,使用以下模块设置:

$ cat /tmp/fqname/foo/__init__.py
$ cat /tmp/fqname/foo/bar.py
from baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/baz.py
class Baz: pass
$ cat /tmp/fqname/main.py
import foo.bar
from foo.baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/hum.py
import bar
import foo.bar

这是显示不同导入同一模块的结果的输出:

$ export PYTHONPATH=/tmp/fqname
$ python /tmp/fqname/main.py
foo.baz
foo.baz
$ python /tmp/fqname/foo/bar.py
baz
$ python /tmp/fqname/foo/hum.py
baz
foo.baz

当嗡嗡声使用相对路径导入bar时,bar会看到 Baz.__module__只是“ baz”,但是在第二次使用全名的导入中,bar却看到与“ foo.baz”相同。

如果要在某处保留标准名称,则最好避免这些类的相对导入。

Since the interest of this topic is to get fully qualified names, here is a pitfall that occurs when using relative imports along with the main module existing in the same package. E.g., with the below module setup:

$ cat /tmp/fqname/foo/__init__.py
$ cat /tmp/fqname/foo/bar.py
from baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/baz.py
class Baz: pass
$ cat /tmp/fqname/main.py
import foo.bar
from foo.baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/hum.py
import bar
import foo.bar

Here is the output showing the result of importing the same module differently:

$ export PYTHONPATH=/tmp/fqname
$ python /tmp/fqname/main.py
foo.baz
foo.baz
$ python /tmp/fqname/foo/bar.py
baz
$ python /tmp/fqname/foo/hum.py
baz
foo.baz

When hum imports bar using relative path, bar sees Baz.__module__ as just “baz”, but in the second import that uses full name, bar sees the same as “foo.baz”.

If you are persisting the fully-qualified names somewhere, it is better to avoid relative imports for those classes.


回答 9

这里没有答案对我有用。就我而言,我使用的是Python 2.7,并且知道我只会使用newstyle object类。

def get_qualified_python_name_from_class(model):
    c = model.__class__.__mro__[0]
    name = c.__module__ + "." + c.__name__
    return name

None of the answers here worked for me. In my case, I was using Python 2.7 and knew that I would only be working with newstyle object classes.

def get_qualified_python_name_from_class(model):
    c = model.__class__.__mro__[0]
    name = c.__module__ + "." + c.__name__
    return name

构造tkinter应用程序的最佳方法?

问题:构造tkinter应用程序的最佳方法?

以下是我典型的python tkinter程序的整体结构。

def funA():
    def funA1():
        def funA12():
            # stuff

    def funA2():
        # stuff

def funB():
    def funB1():
        # stuff

    def funB2():
        # stuff

def funC():
    def funC1():
        # stuff

    def funC2():
        # stuff


root = tk.Tk()

button1 = tk.Button(root, command=funA)
button1.pack()
button2 = tk.Button(root, command=funB)
button2.pack()
button3 = tk.Button(root, command=funC)
button3.pack()

funA funB并在用户单击按钮1、2、3时funC打开另一个Toplevel带有窗口小部件的窗口。

我想知道这是否是编写python tkinter程序的正确方法吗?当然,即使我这样写也可以,但这是最好的方法吗?这听起来很愚蠢,但是当我看到其他人编写的代码时,他们的代码并没有弄乱一堆函数,而且大多数情况下它们都有类。

有没有作为良好实践应遵循的特定结构?开始编写python程序之前,我应该如何计划?

我知道编程中没有最佳实践之类的东西,我也不需要。在我自己学习Python时,我只想一些建议和解释就可以使我保持正确的方向。

The following is the overall structure of my typical python tkinter program.

def funA():
    def funA1():
        def funA12():
            # stuff

    def funA2():
        # stuff

def funB():
    def funB1():
        # stuff

    def funB2():
        # stuff

def funC():
    def funC1():
        # stuff

    def funC2():
        # stuff


root = tk.Tk()

button1 = tk.Button(root, command=funA)
button1.pack()
button2 = tk.Button(root, command=funB)
button2.pack()
button3 = tk.Button(root, command=funC)
button3.pack()

funA funB and funC will bring up another Toplevel windows with widgets when user click on button 1, 2, 3.

I am wondering if this is the right way to write a python tkinter program? Sure, it will work even if I write this way, but is it the best way? It sounds stupid but when I see the codes other people written, their code is not messed up with bunch of functions and mostly they have classes.

Is there any specific structure that we should follow as good practice? How should I plan before start writing a python program?

I know there is no such thing as best practice in programming and I am not asking for it either. I just want some advice and explanations to keep me on the right direction as I am learning Python by myself.


回答 0

我主张一种面向对象的方法。这是我开始的模板:

# Use Tkinter for python 2, tkinter for python 3
import tkinter as tk

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.parent = parent

        <create the rest of your GUI here>

if __name__ == "__main__":
    root = tk.Tk()
    MainApplication(root).pack(side="top", fill="both", expand=True)
    root.mainloop()

需要注意的重要事项是:

  • 我不使用通配符导入。我将软件包导入为“ tk”,这要求我在所有命令前加上tk.。这样可以防止全局命名空间污染,并且在使用Tkinter类,ttk类或您自己的某些类时,使代码完全显而易见。

  • 主要应用是一类。这为您的所有回调和私有函数提供了私有命名空间,并且通常使组织代码更容易。在过程样式中,您必须自上而下进行编码,在使用函数之前定义函数等。使用此方法,您无需真正地在最后一步之前创建主窗口。我更喜欢从中继承,tk.Frame因为我通常从创建框架开始,但这绝不是必需的。

如果您的应用程序具有其他顶级窗口,建议您将每个窗口都设为一个单独的类,并从继承tk.Toplevel。这为您提供了上述所有相同的优点-窗口是原子的,它们具有自己的命名空间,并且代码井井有条。另外,一旦代码开始变大,就可以很容易地将每个模块放入自己的模块中。

最后,您可能要考虑对接口的每个主要部分使用类。例如,如果您要创建一个带有工具栏,导航窗格,状态栏和主区域的应用程序,则可以使每个类成为一个类。这使您的主要代码非常小,易于理解:

class Navbar(tk.Frame): ...
class Toolbar(tk.Frame): ...
class Statusbar(tk.Frame): ...
class Main(tk.Frame): ...

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.statusbar = Statusbar(self, ...)
        self.toolbar = Toolbar(self, ...)
        self.navbar = Navbar(self, ...)
        self.main = Main(self, ...)

        self.statusbar.pack(side="bottom", fill="x")
        self.toolbar.pack(side="top", fill="x")
        self.navbar.pack(side="left", fill="y")
        self.main.pack(side="right", fill="both", expand=True)

由于所有这些实例共享一个公共父对象,因此该父对象实际上成为了模型-视图-控制器体系结构的“控制器”部分。因此,例如,主窗口可以通过调用在状态栏上放置一些内容self.parent.statusbar.set("Hello, world")。这使您可以在组件之间定义一个简单的接口,从而有助于保持最小的耦合。

I advocate an object oriented approach. This is the template that I start out with:

# Use Tkinter for python 2, tkinter for python 3
import tkinter as tk

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.parent = parent

        <create the rest of your GUI here>

if __name__ == "__main__":
    root = tk.Tk()
    MainApplication(root).pack(side="top", fill="both", expand=True)
    root.mainloop()

The important things to notice are:

  • I don’t use a wildcard import. I import the package as “tk”, which requires that I prefix all commands with tk.. This prevents global namespace pollution, plus it makes the code completely obvious when you are using Tkinter classes, ttk classes, or some of your own.

  • The main application is a class. This gives you a private namespace for all of your callbacks and private functions, and just generally makes it easier to organize your code. In a procedural style you have to code top-down, defining functions before using them, etc. With this method you don’t since you don’t actually create the main window until the very last step. I prefer inheriting from tk.Frame just because I typically start by creating a frame, but it is by no means necessary.

If your app has additional toplevel windows, I recommend making each of those a separate class, inheriting from tk.Toplevel. This gives you all of the same advantages mentioned above — the windows are atomic, they have their own namespace, and the code is well organized. Plus, it makes it easy to put each into its own module once the code starts to get large.

Finally, you might want to consider using classes for every major portion of your interface. For example, if you’re creating an app with a toolbar, a navigation pane, a statusbar, and a main area, you could make each one of those classes. This makes your main code quite small and easy to understand:

class Navbar(tk.Frame): ...
class Toolbar(tk.Frame): ...
class Statusbar(tk.Frame): ...
class Main(tk.Frame): ...

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.statusbar = Statusbar(self, ...)
        self.toolbar = Toolbar(self, ...)
        self.navbar = Navbar(self, ...)
        self.main = Main(self, ...)

        self.statusbar.pack(side="bottom", fill="x")
        self.toolbar.pack(side="top", fill="x")
        self.navbar.pack(side="left", fill="y")
        self.main.pack(side="right", fill="both", expand=True)

Since all of those instances share a common parent, the parent effectively becomes the “controller” part of a model-view-controller architecture. So, for example, the main window could place something on the statusbar by calling self.parent.statusbar.set("Hello, world"). This allows you to define a simple interface between the components, helping to keep coupling to a minimun.


回答 1

将每个顶级窗口放入自己的单独类中,可以使代码重用并更好地组织代码。窗口中存在的任何按钮和相关方法都应在此类内定义。这是一个示例(从此处获取):

import tkinter as tk

class Demo1:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.button1 = tk.Button(self.frame, text = 'New Window', width = 25, command = self.new_window)
        self.button1.pack()
        self.frame.pack()
    def new_window(self):
        self.newWindow = tk.Toplevel(self.master)
        self.app = Demo2(self.newWindow)

class Demo2:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.quitButton = tk.Button(self.frame, text = 'Quit', width = 25, command = self.close_windows)
        self.quitButton.pack()
        self.frame.pack()
    def close_windows(self):
        self.master.destroy()

def main(): 
    root = tk.Tk()
    app = Demo1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

另请参阅:

希望有帮助。

Putting each of your top-level windows into it’s own separate class gives you code re-use and better code organization. Any buttons and relevant methods that are present in the window should be defined inside this class. Here’s an example (taken from here):

import tkinter as tk

class Demo1:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.button1 = tk.Button(self.frame, text = 'New Window', width = 25, command = self.new_window)
        self.button1.pack()
        self.frame.pack()
    def new_window(self):
        self.newWindow = tk.Toplevel(self.master)
        self.app = Demo2(self.newWindow)

class Demo2:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.quitButton = tk.Button(self.frame, text = 'Quit', width = 25, command = self.close_windows)
        self.quitButton.pack()
        self.frame.pack()
    def close_windows(self):
        self.master.destroy()

def main(): 
    root = tk.Tk()
    app = Demo1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

Also see:

Hope that helps.


回答 2

这不是一个坏结构。它会很好地工作。但是,当某人单击按钮或其他内容时,您必须在函数中具有执行命令的功能

因此,您可以为这些编写类,然后在该类中具有处理按钮单击等命令的方法。

这是一个例子:

import tkinter as tk

class Window1:
    def __init__(self, master):
        pass
        # Create labels, entries,buttons
    def button_click(self):
        pass
        # If button is clicked, run this method and open window 2


class Window2:
    def __init__(self, master):
        #create buttons,entries,etc

    def button_method(self):
        #run this when button click to close window
        self.master.destroy()

def main(): #run mianloop 
    root = tk.Tk()
    app = Window1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

通常带有多个窗口的tk程序是多个大类,并且在__init__所有条目中创建标签等,然后每种方法都将处理按钮单击事件

只要有可读性,实际上就没有正确的方法,只要它对您有用并且可以完成工作,并且您可以轻松地解释它,因为如果您不能轻松地解释您的程序,那么可能会有更好的方法。

看一看《Tkinter中的思考》

This isn’t a bad structure; it will work just fine. However, you do have to have functions in a function to do commands when someone clicks on a button or something

So what you could do is write classes for these then have methods in the class that handle commands for the button clicks and such.

Here’s an example:

import tkinter as tk

class Window1:
    def __init__(self, master):
        pass
        # Create labels, entries,buttons
    def button_click(self):
        pass
        # If button is clicked, run this method and open window 2


class Window2:
    def __init__(self, master):
        #create buttons,entries,etc

    def button_method(self):
        #run this when button click to close window
        self.master.destroy()

def main(): #run mianloop 
    root = tk.Tk()
    app = Window1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

Usually tk programs with multiple windows are multiple big classes and in the __init__ all the entries, labels etc are created and then each method is to handle button click events

There isn’t really a right way to do it, whatever works for you and gets the job done as long as its readable and you can easily explain it because if you cant easily explain your program, there probably is a better way to do it.

Take a look at Thinking in Tkinter.


回答 3

OOP应该是方法,frame应该是类变量而不是实例变量

from Tkinter import *
class App:
  def __init__(self, master):
    frame = Frame(master)
    frame.pack()
    self.button = Button(frame, 
                         text="QUIT", fg="red",
                         command=frame.quit)
    self.button.pack(side=LEFT)
    self.slogan = Button(frame,
                         text="Hello",
                         command=self.write_slogan)
    self.slogan.pack(side=LEFT)
  def write_slogan(self):
    print "Tkinter is easy to use!"

root = Tk()
app = App(root)
root.mainloop()

参考:http : //www.python-course.eu/tkinter_buttons.php

OOP should be the approach and frame should be a class variable instead of instance variable.

from Tkinter import *
class App:
  def __init__(self, master):
    frame = Frame(master)
    frame.pack()
    self.button = Button(frame, 
                         text="QUIT", fg="red",
                         command=frame.quit)
    self.button.pack(side=LEFT)
    self.slogan = Button(frame,
                         text="Hello",
                         command=self.write_slogan)
    self.slogan.pack(side=LEFT)
  def write_slogan(self):
    print "Tkinter is easy to use!"

root = Tk()
app = App(root)
root.mainloop()

Reference: http://www.python-course.eu/tkinter_buttons.php


回答 4

使用类对应用程序进行组织可以使您和与您一起工作的其他人轻松调试问题并轻松改进应用程序。

您可以像这样轻松地组织您的应用程序:

class hello(Tk):
    def __init__(self):
        super(hello, self).__init__()
        self.btn = Button(text = "Click me", command=close)
        self.btn.pack()
    def close():
        self.destroy()

app = hello()
app.mainloop()

Organizing your application using class make it easy to you and others who work with you to debug problems and improve the app easily.

You can easily organize your application like this:

class hello(Tk):
    def __init__(self):
        super(hello, self).__init__()
        self.btn = Button(text = "Click me", command=close)
        self.btn.pack()
    def close():
        self.destroy()

app = hello()
app.mainloop()

回答 5

学习如何构建程序的最佳方法可能是阅读他人的代码,尤其是如果这是许多人都为之贡献的大型程序。在查看了许多项目的代码之后,您应该了解共识样式应该是什么。

作为一种语言,Python的特殊之处在于,在如何格式化代码方面存在一些严格的指导原则。第一个是所谓的“ Python禅”:

  • 美丽胜于丑陋。
  • 显式胜于隐式。
  • 简单胜于复杂。
  • 复杂胜于复杂。
  • 扁平比嵌套更好。
  • 稀疏胜于密集。
  • 可读性很重要。
  • 特殊情况还不足以打破规则。
  • 尽管实用性胜过纯度。
  • 错误绝不能默默传递。
  • 除非明确地保持沉默。
  • 面对模棱两可的想法,拒绝猜测的诱惑。
  • 应该有一种-最好只有一种-显而易见的方法。
  • 尽管除非您是荷兰人,否则一开始这种方式可能并不明显。
  • 现在总比没有好。
  • 虽然从未往往比了。
  • 如果实现难以解释,那是个坏主意。
  • 如果实现易于解释,则可能是个好主意。
  • 命名空间是一个很棒的主意-让我们做更多这些吧!

在更实际的水平上,有Python的样式指南PEP8

考虑到这些,我会说您的代码风格并不适合,特别是嵌套函数。通过使用类或将它们移到单独的模块中,找到一种解决方案。这将使程序的结构更容易理解。

Probably the best way to learn how to structure your program is by reading other people’s code, especially if it’s a large program to which many people have contributed. After looking at the code of many projects, you should get an idea of what the consensus style should be.

Python, as a language, is special in that there are some strong guidelines as to how you should format your code. The first is the so-called “Zen of Python”:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren’t special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one– and preferably only one –obvious way to do it.
  • Although that way may not be obvious at first unless you’re Dutch.
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it’s a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea — let’s do more of those!

On a more practical level, there is PEP8, the style guide for Python.

With those in mind, I would say that your code style doesn’t really fit, particularly the nested functions. Find a way to flatten those out, either by using classes or moving them into separate modules. This will make the structure of your program much easier to understand.


回答 6

我个人不使用面向对象的方法,主要是因为a)只会妨碍;b)您永远不会将其作为模块重复使用。

但是这里没有讨论的是必须使用线程或多处理。总是。否则您的应用程序将很糟糕。

只需做一个简单的测试:启动一个窗口,然后获取一些URL或其他内容。所做的更改是在网络请求发生时不会更新您的用户界面。意思是,您的应用程序窗口将被破坏。取决于您所使用的操作系统,但是大多数情况下,它不会重绘,在窗口上拖动的任何内容都将粘贴在它上面,直到该过程返回到TK mainloop。

I personally do not use the objected oriented approach, mostly because it a) only get in the way; b) you will never reuse that as a module.

but something that is not discussed here, is that you must use threading or multiprocessing. Always. otherwise your application will be awful.

just do a simple test: start a window, and then fetch some URL or anything else. changes are your UI will not be updated while the network request is happening. Meaning, your application window will be broken. depend on the OS you are on, but most times, it will not redraw, anything you drag over the window will be plastered on it, until the process is back to the TK mainloop.


键盘中断与python的多处理池

问题:键盘中断与python的多处理池

如何使用python的多处理池处理KeyboardInterrupt事件?这是一个简单的示例:

from multiprocessing import Pool
from time import sleep
from sys import exit

def slowly_square(i):
    sleep(1)
    return i*i

def go():
    pool = Pool(8)
    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        # **** THIS PART NEVER EXECUTES. ****
        pool.terminate()
        print "You cancelled the program!"
        sys.exit(1)
    print "\nFinally, here are the results: ", results

if __name__ == "__main__":
    go()

当运行上面的代码时,KeyboardInterrupt按时会引发^C,但是该过程只是在此时挂起,我必须在外部将其杀死。

我希望能够随时按下^C并使所有进程正常退出。

How can I handle KeyboardInterrupt events with python’s multiprocessing Pools? Here is a simple example:

from multiprocessing import Pool
from time import sleep
from sys import exit

def slowly_square(i):
    sleep(1)
    return i*i

def go():
    pool = Pool(8)
    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        # **** THIS PART NEVER EXECUTES. ****
        pool.terminate()
        print "You cancelled the program!"
        sys.exit(1)
    print "\nFinally, here are the results: ", results

if __name__ == "__main__":
    go()

When running the code above, the KeyboardInterrupt gets raised when I press ^C, but the process simply hangs at that point and I have to kill it externally.

I want to be able to press ^C at any time and cause all of the processes to exit gracefully.


回答 0

这是一个Python错误。等待threading.Condition.wait()中的条件时,从不发送KeyboardInterrupt。复制:

import threading
cond = threading.Condition(threading.Lock())
cond.acquire()
cond.wait(None)
print "done"

直到wait()返回,才会传递KeyboardInterrupt异常,并且它永远不会返回,因此中断永远不会发生。KeyboardInterrupt几乎应该可以中断条件等待。

请注意,如果指定了超时,则不会发生这种情况。cond.wait(1)将立即收到中断。因此,一种解决方法是指定超时。为此,请更换

    results = pool.map(slowly_square, range(40))

    results = pool.map_async(slowly_square, range(40)).get(9999999)

或类似。

This is a Python bug. When waiting for a condition in threading.Condition.wait(), KeyboardInterrupt is never sent. Repro:

import threading
cond = threading.Condition(threading.Lock())
cond.acquire()
cond.wait(None)
print "done"

The KeyboardInterrupt exception won’t be delivered until wait() returns, and it never returns, so the interrupt never happens. KeyboardInterrupt should almost certainly interrupt a condition wait.

Note that this doesn’t happen if a timeout is specified; cond.wait(1) will receive the interrupt immediately. So, a workaround is to specify a timeout. To do that, replace

    results = pool.map(slowly_square, range(40))

with

    results = pool.map_async(slowly_square, range(40)).get(9999999)

or similar.


回答 1

从我最近发现的情况来看,最好的解决方案是设置工作进程完全忽略SIGINT,并将所有清理代码限制在父进程中。这可以解决空闲和繁忙的工作进程的问题,并且在子进程中不需要错误处理代码。

import signal

...

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

def main()
    pool = multiprocessing.Pool(size, init_worker)

    ...

    except KeyboardInterrupt:
        pool.terminate()
        pool.join()

解释和完整的示例代码分别位于http://noswap.com/blog/python-multiprocessing-keyboardinterrupt/http://github.com/jreese/multiprocessing-keyboardinterrupt

From what I have recently found, the best solution is to set up the worker processes to ignore SIGINT altogether, and confine all the cleanup code to the parent process. This fixes the problem for both idle and busy worker processes, and requires no error handling code in your child processes.

import signal

...

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

def main()
    pool = multiprocessing.Pool(size, init_worker)

    ...

    except KeyboardInterrupt:
        pool.terminate()
        pool.join()

Explanation and full example code can be found at http://noswap.com/blog/python-multiprocessing-keyboardinterrupt/ and http://github.com/jreese/multiprocessing-keyboardinterrupt respectively.


回答 2

由于某些原因,仅Exception可正常处理从基类继承的异常。作为一种变通方法,你可能会重新提高你KeyboardInterrupt作为一个Exception实例:

from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception): pass

def f(x):
    try:
        time.sleep(x)
        return x
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

def main():
    p = Pool(processes=4)
    try:
        print 'starting the pool map'
        print p.map(f, range(10))
        p.close()
        print 'pool map complete'
    except KeyboardInterrupt:
        print 'got ^C while pool mapping, terminating the pool'
        p.terminate()
        print 'pool is terminated'
    except Exception, e:
        print 'got exception: %r, terminating the pool' % (e,)
        p.terminate()
        print 'pool is terminated'
    finally:
        print 'joining pool processes'
        p.join()
        print 'join complete'
    print 'the end'

if __name__ == '__main__':
    main()

通常,您将获得以下输出:

staring the pool map
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pool map complete
joining pool processes
join complete
the end

因此,如果您点击^C,您将获得:

staring the pool map
got ^C while pool mapping, terminating the pool
pool is terminated
joining pool processes
join complete
the end

For some reasons, only exceptions inherited from the base Exception class are handled normally. As a workaround, you may re-raise your KeyboardInterrupt as an Exception instance:

from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception): pass

def f(x):
    try:
        time.sleep(x)
        return x
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

def main():
    p = Pool(processes=4)
    try:
        print 'starting the pool map'
        print p.map(f, range(10))
        p.close()
        print 'pool map complete'
    except KeyboardInterrupt:
        print 'got ^C while pool mapping, terminating the pool'
        p.terminate()
        print 'pool is terminated'
    except Exception, e:
        print 'got exception: %r, terminating the pool' % (e,)
        p.terminate()
        print 'pool is terminated'
    finally:
        print 'joining pool processes'
        p.join()
        print 'join complete'
    print 'the end'

if __name__ == '__main__':
    main()

Normally you would get the following output:

staring the pool map
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pool map complete
joining pool processes
join complete
the end

So if you hit ^C, you will get:

staring the pool map
got ^C while pool mapping, terminating the pool
pool is terminated
joining pool processes
join complete
the end

回答 3

通常这种简单的结构工程CtrlC上池:

def signal_handle(_signal, frame):
    print "Stopping the Jobs."

signal.signal(signal.SIGINT, signal_handle)

如几篇类似文章所述:

无需尝试即可在Python中捕获键盘中断

Usually this simple structure works for CtrlC on Pool :

def signal_handle(_signal, frame):
    print "Stopping the Jobs."

signal.signal(signal.SIGINT, signal_handle)

As was stated in few similar posts:

Capture keyboardinterrupt in Python without try-except


回答 4

投票表决的答案不能解决核心问题,但具有类似的副作用。

多重处理库的作者Jesse Noller解释了multiprocessing.Pool在旧博客中使用CTRL + C时如何正确处理。

import signal
from multiprocessing import Pool


def initializer():
    """Ignore CTRL+C in the worker process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


pool = Pool(initializer=initializer)

try:
    pool.map(perform_download, dowloads)
except KeyboardInterrupt:
    pool.terminate()
    pool.join()

The voted answer does not tackle the core issue but a similar side effect.

Jesse Noller, the author of the multiprocessing library, explains how to correctly deal with CTRL+C when using multiprocessing.Pool in a old blog post.

import signal
from multiprocessing import Pool


def initializer():
    """Ignore CTRL+C in the worker process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


pool = Pool(initializer=initializer)

try:
    pool.map(perform_download, dowloads)
except KeyboardInterrupt:
    pool.terminate()
    pool.join()

回答 5

似乎有两个问题使多处理过程变得异常烦人。第一个(由Glenn指出)是您需要使用map_async超时而不是map为了获得即时响应(即,不要完成对整个列表的处理)。第二点(Andrey指出)是,多处理不会捕获不继承自Exception(例如SystemExit)的异常。所以这是我的解决方案,涉及这两个方面:

import sys
import functools
import traceback
import multiprocessing

def _poolFunctionWrapper(function, arg):
    """Run function under the pool

    Wrapper around function to catch exceptions that don't inherit from
    Exception (which aren't caught by multiprocessing, so that you end
    up hitting the timeout).
    """
    try:
        return function(arg)
    except:
        cls, exc, tb = sys.exc_info()
        if issubclass(cls, Exception):
            raise # No worries
        # Need to wrap the exception with something multiprocessing will recognise
        import traceback
        print "Unhandled exception %s (%s):\n%s" % (cls.__name__, exc, traceback.format_exc())
        raise Exception("Unhandled exception: %s (%s)" % (cls.__name__, exc))

def _runPool(pool, timeout, function, iterable):
    """Run the pool

    Wrapper around pool.map_async, to handle timeout.  This is required so as to
    trigger an immediate interrupt on the KeyboardInterrupt (Ctrl-C); see
    http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool

    Further wraps the function in _poolFunctionWrapper to catch exceptions
    that don't inherit from Exception.
    """
    return pool.map_async(functools.partial(_poolFunctionWrapper, function), iterable).get(timeout)

def myMap(function, iterable, numProcesses=1, timeout=9999):
    """Run the function on the iterable, optionally with multiprocessing"""
    if numProcesses > 1:
        pool = multiprocessing.Pool(processes=numProcesses, maxtasksperchild=1)
        mapFunc = functools.partial(_runPool, pool, timeout)
    else:
        pool = None
        mapFunc = map
    results = mapFunc(function, iterable)
    if pool is not None:
        pool.close()
        pool.join()
    return results

It seems there are two issues that make exceptions while multiprocessing annoying. The first (noted by Glenn) is that you need to use map_async with a timeout instead of map in order to get an immediate response (i.e., don’t finish processing the entire list). The second (noted by Andrey) is that multiprocessing doesn’t catch exceptions that don’t inherit from Exception (e.g., SystemExit). So here’s my solution that deals with both of these:

import sys
import functools
import traceback
import multiprocessing

def _poolFunctionWrapper(function, arg):
    """Run function under the pool

    Wrapper around function to catch exceptions that don't inherit from
    Exception (which aren't caught by multiprocessing, so that you end
    up hitting the timeout).
    """
    try:
        return function(arg)
    except:
        cls, exc, tb = sys.exc_info()
        if issubclass(cls, Exception):
            raise # No worries
        # Need to wrap the exception with something multiprocessing will recognise
        import traceback
        print "Unhandled exception %s (%s):\n%s" % (cls.__name__, exc, traceback.format_exc())
        raise Exception("Unhandled exception: %s (%s)" % (cls.__name__, exc))

def _runPool(pool, timeout, function, iterable):
    """Run the pool

    Wrapper around pool.map_async, to handle timeout.  This is required so as to
    trigger an immediate interrupt on the KeyboardInterrupt (Ctrl-C); see
    http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool

    Further wraps the function in _poolFunctionWrapper to catch exceptions
    that don't inherit from Exception.
    """
    return pool.map_async(functools.partial(_poolFunctionWrapper, function), iterable).get(timeout)

def myMap(function, iterable, numProcesses=1, timeout=9999):
    """Run the function on the iterable, optionally with multiprocessing"""
    if numProcesses > 1:
        pool = multiprocessing.Pool(processes=numProcesses, maxtasksperchild=1)
        mapFunc = functools.partial(_runPool, pool, timeout)
    else:
        pool = None
        mapFunc = map
    results = mapFunc(function, iterable)
    if pool is not None:
        pool.close()
        pool.join()
    return results

回答 6

我发现目前最好的解决方案是不使用multiprocessing.pool功能,而是使用自己的池功能。我提供了一个使用apply_async演示该错误的示例,以及一个示例,展示了如何避免完全使用池功能。

http://www.bryceboe.com/2010/08/26/python-multiprocessing-and-keyboardinterrupt/

I found, for the time being, the best solution is to not use the multiprocessing.pool feature but rather roll your own pool functionality. I provided an example demonstrating the error with apply_async as well as an example showing how to avoid using the pool functionality altogether.

http://www.bryceboe.com/2010/08/26/python-multiprocessing-and-keyboardinterrupt/


回答 7

我是Python的新手。我到处都在寻找答案,却偶然发现了这个以及其他一些博客和YouTube视频。我试图将粘贴作者的代码复制到上面,并在Windows 7 64位的python 2.7.13上重现它。这接近我想要实现的目标。

我使我的子进程忽略ControlC,并使父进程终止。似乎绕过子进程确实为我避免了这个问题。

#!/usr/bin/python

from multiprocessing import Pool
from time import sleep
from sys import exit


def slowly_square(i):
    try:
        print "<slowly_square> Sleeping and later running a square calculation..."
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print "<child processor> Don't care if you say CtrlC"
        pass


def go():
    pool = Pool(8)

    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        pool.terminate()
        pool.close()
        print "You cancelled the program!"
        exit(1)
    print "Finally, here are the results", results


if __name__ == '__main__':
    go()

从头开始的那部分pool.terminate()似乎永远不会执行。

I’m a newbie in Python. I was looking everywhere for answer and stumble upon this and a few other blogs and youtube videos. I have tried to copy paste the author’s code above and reproduce it on my python 2.7.13 in windows 7 64- bit. It’s close to what I wanna achieve.

I made my child processes to ignore the ControlC and make the parent process terminate. Looks like bypassing the child process does avoid this problem for me.

#!/usr/bin/python

from multiprocessing import Pool
from time import sleep
from sys import exit


def slowly_square(i):
    try:
        print "<slowly_square> Sleeping and later running a square calculation..."
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print "<child processor> Don't care if you say CtrlC"
        pass


def go():
    pool = Pool(8)

    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        pool.terminate()
        pool.close()
        print "You cancelled the program!"
        exit(1)
    print "Finally, here are the results", results


if __name__ == '__main__':
    go()

The part starting at pool.terminate() never seems to execute.


回答 8

您可以尝试使用Pool对象的apply_async方法,如下所示:

import multiprocessing
import time
from datetime import datetime


def test_func(x):
    time.sleep(2)
    return x**2


def apply_multiprocessing(input_list, input_function):
    pool_size = 5
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=10)

    try:
        jobs = {}
        for value in input_list:
            jobs[value] = pool.apply_async(input_function, [value])

        results = {}
        for value, result in jobs.items():
            try:
                results[value] = result.get()
            except KeyboardInterrupt:
                print "Interrupted by user"
                pool.terminate()
                break
            except Exception as e:
                results[value] = e
        return results
    except Exception:
        raise
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    iterations = range(100)
    t0 = datetime.now()
    results1 = apply_multiprocessing(iterations, test_func)
    t1 = datetime.now()
    print results1
    print "Multi: {}".format(t1 - t0)

    t2 = datetime.now()
    results2 = {i: test_func(i) for i in iterations}
    t3 = datetime.now()
    print results2
    print "Non-multi: {}".format(t3 - t2)

输出:

100
Multiprocessing run time: 0:00:41.131000
100
Non-multiprocessing run time: 0:03:20.688000

此方法的优点是中断之前处理的结果将返回到结果字典中:

>>> apply_multiprocessing(range(100), test_func)
Interrupted by user
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

You can try using the apply_async method of a Pool object, like this:

import multiprocessing
import time
from datetime import datetime


def test_func(x):
    time.sleep(2)
    return x**2


def apply_multiprocessing(input_list, input_function):
    pool_size = 5
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=10)

    try:
        jobs = {}
        for value in input_list:
            jobs[value] = pool.apply_async(input_function, [value])

        results = {}
        for value, result in jobs.items():
            try:
                results[value] = result.get()
            except KeyboardInterrupt:
                print "Interrupted by user"
                pool.terminate()
                break
            except Exception as e:
                results[value] = e
        return results
    except Exception:
        raise
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    iterations = range(100)
    t0 = datetime.now()
    results1 = apply_multiprocessing(iterations, test_func)
    t1 = datetime.now()
    print results1
    print "Multi: {}".format(t1 - t0)

    t2 = datetime.now()
    results2 = {i: test_func(i) for i in iterations}
    t3 = datetime.now()
    print results2
    print "Non-multi: {}".format(t3 - t2)

Output:

100
Multiprocessing run time: 0:00:41.131000
100
Non-multiprocessing run time: 0:03:20.688000

An advantage of this method is that results processed before interruption will be returned in the results dictionary:

>>> apply_multiprocessing(range(100), test_func)
Interrupted by user
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

回答 9

奇怪的是,您似乎也必须处理KeyboardInterrupt孩子中的孩子。我本来希望它能像写的那样工作…尝试更改slowly_square为:

def slowly_square(i):
    try:
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print 'You EVIL bastard!'
        return 0

那应该可以按您预期的那样工作。

Strangely enough it looks like you have to handle the KeyboardInterrupt in the children as well. I would have expected this to work as written… try changing slowly_square to:

def slowly_square(i):
    try:
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print 'You EVIL bastard!'
        return 0

That should work as you expected.


从Matplotlib的颜色图中获取单个颜色

问题:从Matplotlib的颜色图中获取单个颜色

cmap例如,如果您有一个Colormap :

cmap = matplotlib.cm.get_cmap('Spectral')

如何从0到1之间获得特定的颜色,其中0是地图中的第一种颜色,而1是地图中的最后一种颜色?

理想情况下,我可以通过执行以下操作来获得地图中的中间颜色:

>>> do_some_magic(cmap, 0.5) # Return an RGBA tuple
(0.1, 0.2, 0.3, 1.0)

If you have a Colormap cmap, for example:

cmap = matplotlib.cm.get_cmap('Spectral')

How can you get a particular colour out of it between 0 and 1, where 0 is the first colour in the map and 1 is the last colour in the map?

Ideally, I would be able to get the middle colour in the map by doing:

>>> do_some_magic(cmap, 0.5) # Return an RGBA tuple
(0.1, 0.2, 0.3, 1.0)

回答 0

您可以使用下面的代码来执行此操作,而问题中的代码实际上与所需的代码非常接近,您所要做的就是调用cmap您拥有的对象。

import matplotlib

cmap = matplotlib.cm.get_cmap('Spectral')

rgba = cmap(0.5)
print(rgba) # (0.99807766255210428, 0.99923106502084169, 0.74602077638401709, 1.0)

对于[0.0,1.0]范围之外的值,它将分别返回底色和底色。默认情况下,这是该范围内的最小和最大颜色(即0.0和1.0)。可以使用cmap.set_under()和更改默认设置cmap.set_over()

对于“特殊”数字(例如)np.nannp.inf默认值是使用0.0值,可以使用cmap.set_bad()类似于“低于”和“高于”的方式更改此值。

最后,您可能需要对数据进行规范化以使其符合范围[0.0, 1.0]matplotlib.colors.Normalize只需使用下面的小示例所示,即可完成此操作,在该示例中,参数vminvmax描述应分别映射到0.0和1.0的数字。

import matplotlib

norm = matplotlib.colors.Normalize(vmin=10.0, vmax=20.0)

print(norm(15.0)) # 0.5

对数归一化器(matplotlib.colors.LogNorm)也可用于值范围较大的数据范围。

(感谢Joe Kingtontcaswell提出了有关如何改善答案的建议。)

You can do this with the code below, and the code in your question was actually very close to what you needed, all you have to do is call the cmap object you have.

import matplotlib

cmap = matplotlib.cm.get_cmap('Spectral')

rgba = cmap(0.5)
print(rgba) # (0.99807766255210428, 0.99923106502084169, 0.74602077638401709, 1.0)

For values outside of the range [0.0, 1.0] it will return the under and over colour (respectively). This, by default, is the minimum and maximum colour within the range (so 0.0 and 1.0). This default can be changed with cmap.set_under() and cmap.set_over().

For “special” numbers such as np.nan and np.inf the default is to use the 0.0 value, this can be changed using cmap.set_bad() similarly to under and over as above.

Finally it may be necessary for you to normalize your data such that it conforms to the range [0.0, 1.0]. This can be done using matplotlib.colors.Normalize simply as shown in the small example below where the arguments vmin and vmax describe what numbers should be mapped to 0.0 and 1.0 respectively.

import matplotlib

norm = matplotlib.colors.Normalize(vmin=10.0, vmax=20.0)

print(norm(15.0)) # 0.5

A logarithmic normaliser (matplotlib.colors.LogNorm) is also available for data ranges with a large range of values.

(Thanks to both Joe Kington and tcaswell for suggestions on how to improve the answer.)


回答 1

为了获得rgba整数值而不是float值,我们可以

rgba = cmap(0.5,bytes=True)

因此,为了简化基于Ffisegydd的答案的代码,代码将如下所示:

#import colormap
from matplotlib import cm

#normalize item number values to colormap
norm = matplotlib.colors.Normalize(vmin=0, vmax=1000)

#colormap possible values = viridis, jet, spectral
rgba_color = cm.jet(norm(400),bytes=True) 

#400 is one of value between 0 and 1000

In order to get rgba integer value instead of float value, we can do

rgba = cmap(0.5,bytes=True)

So to simplify the code based on answer from Ffisegydd, the code would be like this:

#import colormap
from matplotlib import cm

#normalize item number values to colormap
norm = matplotlib.colors.Normalize(vmin=0, vmax=1000)

#colormap possible values = viridis, jet, spectral
rgba_color = cm.jet(norm(400),bytes=True) 

#400 is one of value between 0 and 1000

回答 2

为了建立在Ffisegyddamaliammr的解决方案的基础上,这是一个示例,其中我们为自定义颜色图制作CSV表示形式:

#! /usr/bin/env python3
import matplotlib
import numpy as np 

vmin = 0.1
vmax = 1000

norm = matplotlib.colors.Normalize(np.log10(vmin), np.log10(vmax))
lognum = norm(np.log10([.5, 2., 10, 40, 150,1000]))

cdict = {
    'red':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 1, 1),
        (lognum[3], 0.8, 0.8),
        (lognum[4], .7, .7),
    (lognum[5], .7, .7)
    ),
    'green':
    (
        (0., .6, .6),
        (lognum[0], 0.8, 0.8),
        (lognum[1], 1, 1),
        (lognum[2], 1, 1),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
    (lognum[5], 0, 0)
    ),
    'blue':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 0, 0),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
    (lognum[5], 1, 1)
    )
}


mycmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256)   
norm = matplotlib.colors.LogNorm(vmin, vmax)
colors = {}
count = 0
step_size = 0.001
for value in np.arange(vmin, vmax+step_size, step_size):
    count += 1
    print("%d/%d %f%%" % (count, vmax*(1./step_size), 100.*count/(vmax*(1./step_size))))
    rgba = mycmap(norm(value), bytes=True)
    color = (rgba[0], rgba[1], rgba[2])
    if color not in colors.values():
        colors[value] = color

print ("value, red, green, blue")
for value in sorted(colors.keys()):
    rgb = colors[value]
    print("%s, %s, %s, %s" % (value, rgb[0], rgb[1], rgb[2]))

To build on the solutions from Ffisegydd and amaliammr, here’s an example where we make CSV representation for a custom colormap:

#! /usr/bin/env python3
import matplotlib
import numpy as np 

vmin = 0.1
vmax = 1000

norm = matplotlib.colors.Normalize(np.log10(vmin), np.log10(vmax))
lognum = norm(np.log10([.5, 2., 10, 40, 150,1000]))

cdict = {
    'red':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 1, 1),
        (lognum[3], 0.8, 0.8),
        (lognum[4], .7, .7),
    (lognum[5], .7, .7)
    ),
    'green':
    (
        (0., .6, .6),
        (lognum[0], 0.8, 0.8),
        (lognum[1], 1, 1),
        (lognum[2], 1, 1),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
    (lognum[5], 0, 0)
    ),
    'blue':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 0, 0),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
    (lognum[5], 1, 1)
    )
}


mycmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256)   
norm = matplotlib.colors.LogNorm(vmin, vmax)
colors = {}
count = 0
step_size = 0.001
for value in np.arange(vmin, vmax+step_size, step_size):
    count += 1
    print("%d/%d %f%%" % (count, vmax*(1./step_size), 100.*count/(vmax*(1./step_size))))
    rgba = mycmap(norm(value), bytes=True)
    color = (rgba[0], rgba[1], rgba[2])
    if color not in colors.values():
        colors[value] = color

print ("value, red, green, blue")
for value in sorted(colors.keys()):
    rgb = colors[value]
    print("%s, %s, %s, %s" % (value, rgb[0], rgb[1], rgb[2]))

回答 3

为了完整起见,这些是我到目前为止遇到的cmap选择:

重音,重音,蓝调,蓝调,BrBG,BrBG_r,BuGn,BuGn_r,BuPu,BuPu_r,CMRmap,CMRmap_r,Dark2,Dark2_r,GnBu,GnBu_r,Greens,Greens_r,Greys,Greys_r,Orange,Rr,OrRd,OrRd PRGn_r,成对,成对_r,Pastel1,Pastel1_r,Pastel2,Pastel2_r,PiYG,PiYG_r,PuBu,PuBuGn,PuBuGn_r,PuBu_r,PuOr,PuOr_r,PuRd,PuRd_r,Puror,PurOr_r,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu,RdBu RdYlBu,RdYlBu_r,RdYlGn,RdYlGn_r,Reds,Reds_r,Set1,Set1_r,Set2,Set2_r,Set3,Set3_r,Spectral,Spectral_r,Wistia,Wistia_r,YlGn,YlGnBu,YlGnBr_r,YlGnBr_r,YlGnBr_r,YlGnBr_r,YlGnBr_r,YlGnBr afmhot_r,秋季,autumn_r,二进制,binary_r,骨骼,bone_r,brg,brg_r,bwr,bwr_r,cividis,cividis_r,cool,cool_r,coolwarm,coolwarm_r,铜,copper_r,cubehelix,cubehelix_r,标志,flag_r,gist_eargist_gray,gist_gray_r,gist_heat,gist_heat_r,gist_ncar,gist_ncar_r,gist_rainbow,gist_rainbow_r,gist_stern,gist_stern_r,gist_yarg,gist_yarg_r,gnuplots,gn_lotv,gnuplot,gnuplot2,gnuplot2,gnuplot2,gnuplot2, jet_r,岩浆,岩浆_r,nipy_spectral,nipy_spectral_r,海洋,ocean_r,粉红色,pink_r,等离子,plasma_r,棱镜,prism_r,彩虹,rainbow_r,地震,地震_r,弹​​簧,spring_r,夏季,summer_r,tab10,tab10_r,tab20,tab20_r, tab20b,tab20b_r,tab20c,tab20c_r,terrain,terrain_r,twilight,twilight_r,twilight_shifted,twilight_shifted_r,viridis,viridis_r,冬天,winter_rgray_r,hot,hot_r,hsv,hsv_r,地狱,inferno_r,喷射,jet_r,岩浆,岩浆_r,nipy_spectral,nipy_spectral_r,海洋,ocean_r,粉红色,pink_r,等离子,plasma_r,棱镜,prism_r,彩虹,rainbow_r,地震,地震_r,春天,spring_r,夏天,summer_r,tab10,tab10_r,tab20,tab20_r,tab20b,tab20b_r,tab20c,tab20c_r,terrain,terrain_r,twilight,twilight_r,twilight_shifted,twilight_shifted_r,viridis,viridis_r,冬天,winter_rgray_r,hot,hot_r,hsv,hsv_r,地狱,inferno_r,喷射,jet_r,岩浆,岩浆_r,nipy_spectral,nipy_spectral_r,海洋,ocean_r,粉红色,pink_r,等离子,plasma_r,棱镜,prism_r,彩虹,rainbow_r,地震,地震_r,春天,spring_r,夏天,summer_r,tab10,tab10_r,tab20,tab20_r,tab20b,tab20b_r,tab20c,tab20c_r,terrain,terrain_r,twilight,twilight_r,twilight_shifted,twilight_shifted_r,viridis,viridis_r,冬天,winter_rviridis,viridis_r,冬天,winter_rviridis,viridis_r,冬天,winter_r

For completeness these are the cmap choices I encountered so far:

Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r, viridis, viridis_r, winter, winter_r


Python ElementTree模块:使用方法“ find”,“ findall”时,如何忽略XML文件的命名空间以找到匹配的元素

问题:Python ElementTree模块:使用方法“ find”,“ findall”时,如何忽略XML文件的命名空间以找到匹配的元素

我想使用“ findall”方法在ElementTree模块中找到源xml文件的某些元素。

但是,源xml文件(test.xml)具有命名空间。我截断一部分xml文件作为示例:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

示例python代码如下:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Return None
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return <Element '{http://www.test.com}DEAL_LEVEL/PAID_OFF' at 0xb78b90>

尽管它可以工作,但是因为有一个命名空间“ {http://www.test.com}”,但是在每个标签前面添加一个命名空间非常不方便。

使用“ find”,“ findall”等方法时,如何忽略命名空间?

I want to use the method of “findall” to locate some elements of the source xml file in the ElementTree module.

However, the source xml file (test.xml) has namespace. I truncate part of xml file as sample:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

The sample python code is below:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Return None
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return <Element '{http://www.test.com}DEAL_LEVEL/PAID_OFF' at 0xb78b90>

Although it can works, because there is a namespace “{http://www.test.com}”, it’s very inconvenient to add a namespace in front of each tag.

How can I ignore the namespace when using the method of “find”, “findall” and so on?


回答 0

最好不要解析XML文档本身,而是先解析它,然后修改结果中的标记。这样,您可以处理多个命名空间和命名空间别名:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    prefix, has_namespace, postfix = el.tag.partition('}')
    if has_namespace:
        el.tag = postfix  # strip all namespaces
root = it.root

这是基于此处的讨论:http : //bugs.python.org/issue18304

更新: rpartition而不是partition确保你得到的标签名postfix,即使没有命名空间。因此,您可以将其压缩:

for _, el in it:
    _, _, el.tag = el.tag.rpartition('}') # strip ns

Instead of modifying the XML document itself, it’s best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    prefix, has_namespace, postfix = el.tag.partition('}')
    if has_namespace:
        el.tag = postfix  # strip all namespaces
root = it.root

This is based on the discussion here: http://bugs.python.org/issue18304

Update: rpartition instead of partition makes sure you get the tag name in postfix even if there is no namespace. Thus you could condense it:

for _, el in it:
    _, _, el.tag = el.tag.rpartition('}') # strip ns

回答 1

如果您在解析前从xml中删除xmlns属性,则树中的每个标记都将没有命名空间。

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

If you remove the xmlns attribute from the xml before parsing it then there won’t be a namespace prepended to each tag in the tree.

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

回答 2

到目前为止,答案明确地将命名空间值放在脚本中。对于更通用的解决方案,我宁愿从xml中提取命名空间:

import re
def get_namespace(element):
  m = re.match('\{.*\}', element.tag)
  return m.group(0) if m else ''

并在查找方法中使用它:

namespace = get_namespace(tree.getroot())
print tree.find('./{0}parent/{0}version'.format(namespace)).text

The answers so far explicitely put the namespace value in the script. For a more generic solution, I would rather extract the namespace from the xml:

import re
def get_namespace(element):
  m = re.match('\{.*\}', element.tag)
  return m.group(0) if m else ''

And use it in find method:

namespace = get_namespace(tree.getroot())
print tree.find('./{0}parent/{0}version'.format(namespace)).text

回答 3

这是对nonagon答案的扩展,它也剥离了命名空间的属性:

from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

UPDATE:已添加,list()以便迭代器可以工作(Python 3所需)

Here’s an extension to nonagon’s answer, which also strips namespaces off attributes:

from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

UPDATE: added list() so the iterator works (needed for Python 3)


回答 4

改善ericspod的答案:

无需全局更改解析模式,我们可以将其包装在支持with构造的对象中。

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
            self.oldcreate = expat.ParserCreate
            expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
            expat.ParserCreate = self.oldcreate

然后可以按如下方式使用

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
     tree = ET.parse("test.xml")

这种方式的优点在于,它不会更改with块之外无关代码的任何行为。我使用了ericspod的版本(在此同时也使用了expat)在不相关的库中出现错误之后,最终创建了该代码。

Improving on the answer by ericspod:

Instead of changing the parse mode globally we can wrap this in an object supporting the with construct.

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
            self.oldcreate = expat.ParserCreate
            expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
            expat.ParserCreate = self.oldcreate

This can then be used as follows

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
     tree = ET.parse("test.xml")

The beauty of this way is that it does not change any behaviour for unrelated code outside the with block. I ended up creating this after getting errors in unrelated libraries after using the version by ericspod which also happened to use expat.


回答 5

您也可以使用优雅的字符串格式构造:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

或者,如果您确定PAID_OFF仅出现在树的一级中:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)

You can use the elegant string formatting construct as well:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

or, if you’re sure that PAID_OFF only appears in one level in tree:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)

回答 6

如果不使用ElementTree,则cElementTree可以通过替换来强制Expat忽略命名空间处理ParserCreate()

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree尝试通过调用来使用Expat,ParserCreate()但没有提供不提供命名空间分隔符字符串的选项,以上代码将导致其被忽略,但被警告可能会破坏其他情况。

If you’re using ElementTree and not cElementTree you can force Expat to ignore namespace processing by replacing ParserCreate():

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree tries to use Expat by calling ParserCreate() but provides no option to not provide a namespace separator string, the above code will cause it to be ignore but be warned this could break other things.


回答 7

我为此可能会迟到,但我认为这re.sub不是一个好的解决方案。

但是,该重写xml.parsers.expat不适用于Python 3.x版本,

罪魁祸首是xml/etree/ElementTree.py源代码的底部

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

真是可悲。

解决的办法是先摆脱它。

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

在Python 3.6上测试。

try如果在代码的某处重新加载或导入模块两次而遇到一些奇怪的错误,例如try 语句,则很有用

  • 超过最大递归深度
  • AttributeError:XMLParser

顺便说一句,etree源代码看起来真的很乱。

I might be late for this but I dont think re.sub is a good solution.

However the rewrite xml.parsers.expat does not work for Python 3.x versions,

The main culprit is the xml/etree/ElementTree.py see bottom of the source code

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

Which is kinda sad.

The solution is to get rid of it first.

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

Tested on Python 3.6.

Try try statement is useful in case somewhere in your code you reload or import a module twice you get some strange errors like

  • maximum recursion depth exceeded
  • AttributeError: XMLParser

btw damn the etree source code looks really messy.


回答 8

让我们结合nonagon的答案mzjn对一个相关问题的答案

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

使用此功能,我们:

  1. 创建一个迭代器以获取命名空间和已解析的树对象

  2. 遍历创建的迭代器以获取命名空间命令,我们以后可以传入每个命名空间find()findall()调用iMom0的命名空间。

  3. 返回解析树的根元素对象和命名空间。

我认为这是最好的方法,因为无论源XML还是解析后的xml.etree.ElementTree输出都不会受到任何操纵。

我还要感谢Barny的回答,因为它提供了这个难题的重要组成部分(您可以从迭代器获得解析的根)。在此之前,我实际上在应用程序中遍历了两次XML树(一次获取命名空间,第二次获取根)。

Let’s combine nonagon’s answer with mzjn’s answer to a related question:

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

Using this function we:

  1. Create an iterator to get both namespaces and a parsed tree object.

  2. Iterate over the created iterator to get the namespaces dict that we can later pass in each find() or findall() call as sugested by iMom0.

  3. Return the parsed tree’s root element object and namespaces.

I think this is the best approach all around as there’s no manipulation either of a source XML or resulting parsed xml.etree.ElementTree output whatsoever involved.

I’d like also to credit barny’s answer with providing an essential piece of this puzzle (that you can get the parsed root from the iterator). Until that I actually traversed XML tree twice in my application (once to get namespaces, second for a root).


Python / SciPy的峰值发现算法

问题:Python / SciPy的峰值发现算法

我可以通过找到一阶导数的零交叉点或类似的东西自己写点东西,但是它似乎包含在标准库中,具有足够的通用性。有人知道吗?

我的特定应用是2D阵列,但通常将其用于查找FFT等中的峰值。

具体来说,在这些类型的问题中,有多个强峰值,然后是由噪声引起的许多较小的“峰值”,应将其忽略。这些仅仅是示例;不是我的实际数据:

一维峰:

二维峰:

峰值查找算法将找到这些峰的位置(而不仅仅是它们的值),理想情况下,可能会使用二次插值或其他方法找到真正的样本间峰,而不仅仅是具有最大值的索引。

通常,您只关心几个强峰,因此选择它们是因为它们高于某个阈值,或者因为它们是有序列表的前n个峰(按振幅排序)。

正如我说的,我自己会写这样的东西。我只是问是否有一个已知的运作良好的功能或软件包。

更新:

翻译了一个MATLAB脚本,它在1-D情况下工作得很好,但可能会更好。

更新的更新:

sixtenbe 为一维案例创建了更好的版本

I can write something myself by finding zero-crossings of the first derivative or something, but it seems like a common-enough function to be included in standard libraries. Anyone know of one?

My particular application is a 2D array, but usually it would be used for finding peaks in FFTs, etc.

Specifically, in these kinds of problems, there are multiple strong peaks, and then lots of smaller “peaks” that are just caused by noise that should be ignored. These are just examples; not my actual data:

1-dimensional peaks:

2-dimensional peaks:

The peak-finding algorithm would find the location of these peaks (not just their values), and ideally would find the true inter-sample peak, not just the index with maximum value, probably using quadratic interpolation or something.

Typically you only care about a few strong peaks, so they’d either be chosen because they’re above a certain threshold, or because they’re the first n peaks of an ordered list, ranked by amplitude.

As I said, I know how to write something like this myself. I’m just asking if there’s a pre-existing function or package that’s known to work well.

Update:

I translated a MATLAB script and it works decently for the 1-D case, but could be better.

Updated update:

sixtenbe created a better version for the 1-D case.


回答 0

scipy.signal.find_peaks顾名思义,该功能对此有用。但是,要理解以及它的参数是非常重要的widththresholddistance 和高于一切prominence,以获得良好的峰值提取。

根据我的测试和文档,突出的概念是“有用的概念”,用于保持良好的峰值,并丢弃嘈杂的峰值。

什么是(地形)突出?它是“从山顶下降到更高地形所需的最低高度”,如下所示:

这个想法是:

突出程度越高,峰越“重要”。

测试:

我故意使用了一个(嘈杂的)频率变化正弦曲线,因为它显示了很多困难。我们可以看到该width参数在这里不是很有用,因为如果您将最小值设置width得太高,则它将无法跟踪高频部分中非常接近的峰值。如果设置width得太低,则信号左侧会出现许多不需要的峰值。同样的问题distancethreshold仅与直接邻居比较,在这里没有用。prominence是提供最佳解决方案的一种。请注意,您可以结合使用许多这些参数!

码:

import numpy as np
import matplotlib.pyplot as plt 
from scipy.signal import find_peaks

x = np.sin(2*np.pi*(2**np.linspace(2,10,1000))*np.arange(1000)/48000) + np.random.normal(0, 1, 1000) * 0.15
peaks, _ = find_peaks(x, distance=20)
peaks2, _ = find_peaks(x, prominence=1)      # BEST!
peaks3, _ = find_peaks(x, width=20)
peaks4, _ = find_peaks(x, threshold=0.4)     # Required vertical distance to its direct neighbouring samples, pretty useless
plt.subplot(2, 2, 1)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.subplot(2, 2, 2)
plt.plot(peaks2, x[peaks2], "ob"); plt.plot(x); plt.legend(['prominence'])
plt.subplot(2, 2, 3)
plt.plot(peaks3, x[peaks3], "vg"); plt.plot(x); plt.legend(['width'])
plt.subplot(2, 2, 4)
plt.plot(peaks4, x[peaks4], "xk"); plt.plot(x); plt.legend(['threshold'])
plt.show()

The function scipy.signal.find_peaks, as its name suggests, is useful for this. But it’s important to understand well its parameters width, threshold, distance and above all prominence to get a good peak extraction.

According to my tests and the documentation, the concept of prominence is “the useful concept” to keep the good peaks, and discard the noisy peaks.

What is (topographic) prominence? It is “the minimum height necessary to descend to get from the summit to any higher terrain”, as it can be seen here:

The idea is:

The higher the prominence, the more “important” the peak is.

Test:

I used a (noisy) frequency-varying sinusoid on purpose because it shows many difficulties. We can see that the width parameter is not very useful here because if you set a minimum width too high, then it won’t be able to track very close peaks in the high frequency part. If you set width too low, you would have many unwanted peaks in the left part of the signal. Same problem with distance. threshold only compares with the direct neighbours, which is not useful here. prominence is the one that gives the best solution. Note that you can combine many of these parameters!

Code:

import numpy as np
import matplotlib.pyplot as plt 
from scipy.signal import find_peaks

x = np.sin(2*np.pi*(2**np.linspace(2,10,1000))*np.arange(1000)/48000) + np.random.normal(0, 1, 1000) * 0.15
peaks, _ = find_peaks(x, distance=20)
peaks2, _ = find_peaks(x, prominence=1)      # BEST!
peaks3, _ = find_peaks(x, width=20)
peaks4, _ = find_peaks(x, threshold=0.4)     # Required vertical distance to its direct neighbouring samples, pretty useless
plt.subplot(2, 2, 1)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.subplot(2, 2, 2)
plt.plot(peaks2, x[peaks2], "ob"); plt.plot(x); plt.legend(['prominence'])
plt.subplot(2, 2, 3)
plt.plot(peaks3, x[peaks3], "vg"); plt.plot(x); plt.legend(['width'])
plt.subplot(2, 2, 4)
plt.plot(peaks4, x[peaks4], "xk"); plt.plot(x); plt.legend(['threshold'])
plt.show()

回答 1

我正在寻找一个类似的问题,并且我发现一些最佳参考来自化学(来自质谱数据中的峰)。有关峰发现算法的详尽综述,请阅读本章。这是我所遇到的关于峰发现技术的最清晰的评论之一。(小波最适合在嘈杂的数据中找到此类峰。)。

看来您的峰清晰地定义了,并且没有隐藏在噪音中。在这种情况下,我建议您使用平滑的savtizky-golay导数来查找峰(如果仅区分上面的数据,则会有一些误报。)。这是一种非常有效的技术,非常容易实现(您确实需要带有基本操作的矩阵类)。如果您只是找到一阶SG导数的零交叉,我想您会很高兴的。

I’m looking at a similar problem, and I’ve found some of the best references come from chemistry (from peaks finding in mass-spec data). For a good thorough review of peaking finding algorithms read this. This is one of the best clearest reviews of peak finding techniques that I’ve run across. (Wavelets are the best for finding peaks of this sort in noisy data.).

It looks like your peaks are clearly defined and aren’t hidden in the noise. That being the case I’d recommend using smooth savtizky-golay derivatives to find the peaks (If you just differentiate the data above you’ll have a mess of false positives.). This is a very effective technique and is pretty easy to implemented (you do need a matrix class w/ basic operations). If you simply find the zero crossing of the first S-G derivative I think you’ll be happy.


回答 2

scipy中有一个名为的功能scipy.signal.find_peaks_cwt,听起来像很适合您的需求,但是我没有经验,所以我不推荐。

http://docs.scipy.org/doc/scipy/reference/generation/scipy.signal.find_peaks_cwt.html

There is a function in scipy named scipy.signal.find_peaks_cwt which sounds like is suitable for your needs, however I don’t have experience with it so I cannot recommend..

http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html


回答 3

对于那些不确定在Python中使用哪种峰值查找算法的人,这里是替代方法的快速概述:https : //github.com/MonsieurV/py-findpeaks

想要自己等同于MatLab findpeaks函数,我发现Marcos Duarte 的detect_peaks函数是一个不错的选择。

相当容易使用:

import numpy as np
from vector import vector, plot_peaks
from libs import detect_peaks
print('Detect peaks with minimum height and distance filters.')
indexes = detect_peaks.detect_peaks(vector, mph=7, mpd=2)
print('Peaks are: %s' % (indexes))

这会给你:

For those not sure about which peak-finding algorithms to use in Python, here a rapid overview of the alternatives: https://github.com/MonsieurV/py-findpeaks

Wanting myself an equivalent to the MatLab findpeaks function, I’ve found that the detect_peaks function from Marcos Duarte is a good catch.

Pretty easy to use:

import numpy as np
from vector import vector, plot_peaks
from libs import detect_peaks
print('Detect peaks with minimum height and distance filters.')
indexes = detect_peaks.detect_peaks(vector, mph=7, mpd=2)
print('Peaks are: %s' % (indexes))

Which will give you:


回答 4

以可靠的方式检测频谱中的峰值已经进行了很多研究,例如80年代对音乐/音频信号的正弦建模的所有工作。在文献中查找“正弦建模”。

如果您的信号像示例一样干净,那么简单的“给我振幅大于N个邻居的东西”应该可以正常工作。如果您有嘈杂的信号,一种简单而有效的方法就是及时查看峰值并进行跟踪:然后检测频谱线而不是频谱峰值。IOW,您可以在信号的滑动窗口上计算FFT,以获得时间上的一组频谱(也称为频谱图)。然后,您可以查看频谱峰值随时间的变化(即在连续的窗口中)。

Detecting peaks in a spectrum in a reliable way has been studied quite a bit, for example all the work on sinusoidal modelling for music/audio signals in the 80ies. Look for “Sinusoidal Modeling” in the literature.

If your signals are as clean as the example, a simple “give me something with an amplitude higher than N neighbours” should work reasonably well. If you have noisy signals, a simple but effective way is to look at your peaks in time, to track them: you then detect spectral lines instead of spectral peaks. IOW, you compute the FFT on a sliding window of your signal, to get a set of spectrum in time (also called spectrogram). You then look at the evolution of the spectral peak in time (i.e. in consecutive windows).


回答 5

我认为您所寻找的不是SciPy提供的。在这种情况下,我将自己编写代码。

scipy.interpolate的样条曲线插值和平滑效果非常好,可能对拟合峰然后找到最大值的位置很有帮助。

I do not think that what you are looking for is provided by SciPy. I would write the code myself, in this situation.

The spline interpolation and smoothing from scipy.interpolate are quite nice and might be quite helpful in fitting peaks and then finding the location of their maximum.


回答 6

有一些标准的统计功能和方法可以找到数据的异常值,这可能是第一种情况。使用导数将解决您的第二个问题。但是,我不确定是否可以解决连续函数和采样数据的方法。

There are standard statistical functions and methods for finding outliers to data, which is probably what you need in the first case. Using derivatives would solve your second. I’m not sure for a method which solves both continuous functions and sampled data, however.


回答 7

首先,如果没有进一步说明,“峰值”的定义是模糊的。例如,对于以下系列,您将5-4-5称为一个峰还是两个峰?

1-2-1-2-1-1-5-4-5-1-1-5-1

在这种情况下,您至少需要两个阈值:1)仅在高阈值之上,极值才能注册为峰值;2)较低的阈值,以使极小值被其以下的小数值分隔开将成为两个峰值。

峰值检测是极值理论文献中一个经过充分研究的主题,也称为“极值的聚类”。它的典型应用包括基于连续读取环境变量来识别危险事件,例如分析风速以检测风暴事件。

First things first, the definition of “peak” is vague if without further specifications. For example, for the following series, would you call 5-4-5 one peak or two?

1-2-1-2-1-1-5-4-5-1-1-5-1

In this case, you’ll need at least two thresholds: 1) a high threshold only above which can an extreme value register as a peak; and 2) a low threshold so that extreme values separated by small values below it will become two peaks.

Peak detection is a well-studied topic in Extreme Value Theory literature, also known as “declustering of extreme values”. Its typical applications include identifying hazard events based on continuous readings of environmental variables e.g. analysing wind speed to detect storm events.


Python速度测试-时差-毫秒

问题:Python速度测试-时差-毫秒

为了快速测试一段代码,在Python中进行两次比较的正确方法是什么?我尝试阅读API文档。我不确定我是否了解timedelta。

到目前为止,我有以下代码:

from datetime import datetime

tstart = datetime.now()
print t1

# code to speed test

tend = datetime.now()
print t2
# what am I missing?
# I'd like to print the time diff here

What is the proper way to compare 2 times in Python in order to speed test a section of code? I tried reading the API docs. I’m not sure I understand the timedelta thing.

So far I have this code:

from datetime import datetime

tstart = datetime.now()
print t1

# code to speed test

tend = datetime.now()
print t2
# what am I missing?
# I'd like to print the time diff here

回答 0

datetime.timedelta 只是两个日期时间之间的差…所以就像一段时间,以天/秒/微秒为单位

>>> import datetime
>>> a = datetime.datetime.now()
>>> b = datetime.datetime.now()
>>> c = b - a

>>> c
datetime.timedelta(0, 4, 316543)
>>> c.days
0
>>> c.seconds
4
>>> c.microseconds
316543

请注意,它c.microseconds仅返回timedelta的微秒部分!出于计时目的,请始终使用c.total_seconds()

您可以使用datetime.timedelta进行各种数学运算,例如:

>>> c / 10
datetime.timedelta(0, 0, 431654)

不过,查看CPU时间而不是墙上时钟时间可能更有用……虽然这取决于操作系统,但在类Unix系统下,请检查“ time”命令。

datetime.timedelta is just the difference between two datetimes … so it’s like a period of time, in days / seconds / microseconds

>>> import datetime
>>> a = datetime.datetime.now()
>>> b = datetime.datetime.now()
>>> c = b - a

>>> c
datetime.timedelta(0, 4, 316543)
>>> c.days
0
>>> c.seconds
4
>>> c.microseconds
316543

Be aware that c.microseconds only returns the microseconds portion of the timedelta! For timing purposes always use c.total_seconds().

You can do all sorts of maths with datetime.timedelta, eg:

>>> c / 10
datetime.timedelta(0, 0, 431654)

It might be more useful to look at CPU time instead of wallclock time though … that’s operating system dependant though … under Unix-like systems, check out the ‘time’ command.


回答 1

从Python 2.7开始,有了timedelta.total_seconds()方法。因此,要获得经过的毫秒数:

>>> import datetime
>>> a = datetime.datetime.now()
>>> b = datetime.datetime.now()
>>> delta = b - a
>>> print delta
0:00:05.077263
>>> int(delta.total_seconds() * 1000) # milliseconds
5077

Since Python 2.7 there’s the timedelta.total_seconds() method. So, to get the elapsed milliseconds:

>>> import datetime
>>> a = datetime.datetime.now()
>>> b = datetime.datetime.now()
>>> delta = b - a
>>> print delta
0:00:05.077263
>>> int(delta.total_seconds() * 1000) # milliseconds
5077

回答 2

您可能要改用timeit模块

You might want to use the timeit module instead.


回答 3

您还可以使用:

import time

start = time.clock()
do_something()
end = time.clock()
print "%.2gs" % (end-start)

或者您可以使用python分析器

You could also use:

import time

start = time.clock()
do_something()
end = time.clock()
print "%.2gs" % (end-start)

Or you could use the python profilers.


回答 4

我知道这很晚了,但实际上我真的很喜欢使用:

import time
start = time.time()

##### your timed code here ... #####

print "Process time: " + (time.time() - start)

time.time()从纪元开始,您可以得到秒数。因为这是标准时间(以秒为单位),所以您可以简单地从结束时间中减去开始时间来获得处理时间(以秒为单位)。time.clock()对基准测试非常有用,但是如果您想知道过程花费了多长时间,我发现它毫无用处。例如,说“我的过程需要10秒”比说“我的过程需要10个处理器时钟单位”要直观得多。

>>> start = time.time(); sum([each**8.3 for each in range(1,100000)]) ; print (time.time() - start)
3.4001404476250935e+45
0.0637760162354
>>> start = time.clock(); sum([each**8.3 for each in range(1,100000)]) ; print (time.clock() - start)
3.4001404476250935e+45
0.05

在上面的第一个示例中,显示的时间time.clock()为0.05,而time.time()为0.06377

>>> start = time.clock(); time.sleep(1) ; print "process time: " + (time.clock() - start)
process time: 0.0
>>> start = time.time(); time.sleep(1) ; print "process time: " + (time.time() - start)
process time: 1.00111794472

在第二个示例中,即使进程睡眠了一秒钟,处理器时间也以某种方式显示为“ 0”。time.time()正确显示多于1秒。

I know this is late, but I actually really like using:

import time
start = time.time()

##### your timed code here ... #####

print "Process time: " + (time.time() - start)

time.time() gives you seconds since the epoch. Because this is a standardized time in seconds, you can simply subtract the start time from the end time to get the process time (in seconds). time.clock() is good for benchmarking, but I have found it kind of useless if you want to know how long your process took. For example, it’s much more intuitive to say “my process takes 10 seconds” than it is to say “my process takes 10 processor clock units”

>>> start = time.time(); sum([each**8.3 for each in range(1,100000)]) ; print (time.time() - start)
3.4001404476250935e+45
0.0637760162354
>>> start = time.clock(); sum([each**8.3 for each in range(1,100000)]) ; print (time.clock() - start)
3.4001404476250935e+45
0.05

In the first example above, you are shown a time of 0.05 for time.clock() vs 0.06377 for time.time()

>>> start = time.clock(); time.sleep(1) ; print "process time: " + (time.clock() - start)
process time: 0.0
>>> start = time.time(); time.sleep(1) ; print "process time: " + (time.time() - start)
process time: 1.00111794472

In the second example, somehow the processor time shows “0” even though the process slept for a second. time.time() correctly shows a little more than 1 second.


回答 5

以下代码应显示时间说明…

from datetime import datetime

tstart = datetime.now()

# code to speed test

tend = datetime.now()
print tend - tstart

The following code should display the time detla…

from datetime import datetime

tstart = datetime.now()

# code to speed test

tend = datetime.now()
print tend - tstart

回答 6

您可以简单地打印出差异:

print tend - tstart

You could simply print the difference:

print tend - tstart

回答 7

我不是Python程序员,但我确实知道如何使用Google,这就是我发现的内容:您使用“-”运算符。要完成您的代码:

from datetime import datetime

tstart = datetime.now()

# code to speed test

tend = datetime.now()
print tend - tstart

此外,看起来您可以使用strftime()函数格式化时间跨度计算以呈现时间,但是这会让您感到高兴。

I am not a Python programmer, but I do know how to use Google and here’s what I found: you use the “-” operator. To complete your code:

from datetime import datetime

tstart = datetime.now()

# code to speed test

tend = datetime.now()
print tend - tstart

Additionally, it looks like you can use the strftime() function to format the timespan calculation in order to render the time however makes you happy.


回答 8

time.time()/ datetime可以快速使用,但并不总是100%精确。出于这个原因,我喜欢使用其中一个std lib 分析器(尤其是hotshot)来找出问题所在。

time.time() / datetime is good for quick use, but is not always 100% precise. For that reason, I like to use one of the std lib profilers (especially hotshot) to find out what’s what.


回答 9

您可能需要研究配置文件模块。您会更好地了解减速的位置,并且大部分工作将完全自动化。

You may want to look into the profile modules. You’ll get a better read out of where your slowdowns are, and much of your work will be full-on automated.


回答 10

这是一个模仿Matlab / Octave tic toc函数的自定义函数。

使用示例:

time_var = time_me(); # get a variable with the current timestamp

... run operation ...

time_me(time_var); # print the time difference (e.g. '5 seconds 821.12314 ms')

功能:

def time_me(*arg):
    if len(arg) != 0: 
        elapsedTime = time.time() - arg[0];
        #print(elapsedTime);
        hours = math.floor(elapsedTime / (60*60))
        elapsedTime = elapsedTime - hours * (60*60);
        minutes = math.floor(elapsedTime / 60)
        elapsedTime = elapsedTime - minutes * (60);
        seconds = math.floor(elapsedTime);
        elapsedTime = elapsedTime - seconds;
        ms = elapsedTime * 1000;
        if(hours != 0):
            print ("%d hours %d minutes %d seconds" % (hours, minutes, seconds)) 
        elif(minutes != 0):
            print ("%d minutes %d seconds" % (minutes, seconds))
        else :
            print ("%d seconds %f ms" % (seconds, ms))
    else:
        #print ('does not exist. here you go.');
        return time.time()

Here is a custom function that mimic’s Matlab’s/Octave’s tic toc functions.

Example of use:

time_var = time_me(); # get a variable with the current timestamp

... run operation ...

time_me(time_var); # print the time difference (e.g. '5 seconds 821.12314 ms')

Function :

def time_me(*arg):
    if len(arg) != 0: 
        elapsedTime = time.time() - arg[0];
        #print(elapsedTime);
        hours = math.floor(elapsedTime / (60*60))
        elapsedTime = elapsedTime - hours * (60*60);
        minutes = math.floor(elapsedTime / 60)
        elapsedTime = elapsedTime - minutes * (60);
        seconds = math.floor(elapsedTime);
        elapsedTime = elapsedTime - seconds;
        ms = elapsedTime * 1000;
        if(hours != 0):
            print ("%d hours %d minutes %d seconds" % (hours, minutes, seconds)) 
        elif(minutes != 0):
            print ("%d minutes %d seconds" % (minutes, seconds))
        else :
            print ("%d seconds %f ms" % (seconds, ms))
    else:
        #print ('does not exist. here you go.');
        return time.time()

回答 11

您可以像这样使用timeit测试名为module.py的脚本。

$ python -mtimeit -s 'import module'

You could use timeit like this to test a script named module.py

$ python -mtimeit -s 'import module'

回答 12

《箭头》:Python的更好日期和时间

import arrow
start_time = arrow.utcnow()
end_time = arrow.utcnow()
(end_time - start_time).total_seconds()  # senconds
(end_time - start_time).total_seconds() * 1000  # milliseconds

Arrow: Better dates & times for Python

import arrow
start_time = arrow.utcnow()
end_time = arrow.utcnow()
(end_time - start_time).total_seconds()  # senconds
(end_time - start_time).total_seconds() * 1000  # milliseconds

如何将NumPy数组标准化到一定范围内?

问题:如何将NumPy数组标准化到一定范围内?

在对音频或图像阵列进行一些处理之后,需要先在一定范围内对其进行标准化,然后才能将其写回到文件中。可以这样完成:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

有没有那么繁琐,方便的函数方式来做到这一点?matplotlib.colors.Normalize()似乎无关。

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize() doesn’t seem to be related.


回答 0

audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

使用/=*=可以消除中间的临时阵列,从而节省了一些内存。乘法比除法便宜,所以

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

比…快一点

image /= image.max()/255.0    # Uses 1+image.size divisions

由于我们在这里使用基本的numpy方法,因此我认为这是尽可能有效的numpy解决方案。


就地操作不会更改容器数组的dtype。由于所需的标准化值是浮点型,因此在执行就地操作之前,audioand image数组需要具有浮点dtype。如果它们还不是浮点dtype,则需要使用进行转换astype。例如,

image = image.astype('float64')
audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

is marginally faster than

image /= image.max()/255.0    # Uses 1+image.size divisions

Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.


In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have floating-point point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you’ll need to convert them using astype. For example,

image = image.astype('float64')

回答 1

如果数组同时包含正数和负数,我将使用:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the parenthesis before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)        

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1

如果数组包含nan,则一种解决方案是将其删除为:

def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)

但是,根据上下文,您可能需要nan不同的对待。例如,插值,用例如0代替,或引发错误。

最后,值得一提的是,即使不是OP的问题,也要标准化

e = (a - np.mean(a)) / np.std(a)

If the array contains both positive and negative data, I’d go with:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the parenthesis before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)        

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1

If the array contains nan, one solution could be to just remove them as:

def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)

However, depending on the context you might want to treat nan differently. E.g. interpolate the value, replacing in with e.g. 0, or raise an error.

Finally, worth mentioning even if it’s not OP’s question, standardization:

e = (a - np.mean(a)) / np.std(a)

回答 2

您也可以使用重新缩放sklearn。优势在于,除了对数据进行均值居中之外,还可以调整标准差的归一化,并且可以在任一轴上,通过要素或按记录进行校准。

from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )

关键词参数axiswith_meanwith_std是自我解释,并且在默认状态显示。如果该参数copy设置为,则执行就地操作False这里的文件

You can also rescale using sklearn. The advantages are that you can adjust normalize the standard deviation, in addition to mean-centering the data, and that you can do this on either axis, by features, or by records.

from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )

The keyword arguments axis, with_mean, with_std are self explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False. Documentation here.


回答 3

您可以使用“ i”版本(如idiv中的imul ..),它看起来还不错:

image /= (image.max()/255.0)

在另一种情况下,您可以编写一个函数来通过colums标准化n维数组:

def normalize_columns(arr):
    rows, cols = arr.shape
    for col in xrange(cols):
        arr[:,col] /= abs(arr[:,col]).max()

You can use the “i” (as in idiv, imul..) version, and it doesn’t look half bad:

image /= (image.max()/255.0)

For the other case you can write a function to normalize an n-dimensional array by colums:

def normalize_columns(arr):
    rows, cols = arr.shape
    for col in xrange(cols):
        arr[:,col] /= abs(arr[:,col]).max()

回答 4

您正在尝试最小-最大比例缩放audio介于-1和+1 image之间以及0和255之间的值。

使用sklearn.preprocessing.minmax_scale,应该可以轻松解决您的问题。

例如:

audio_scaled = minmax_scale(audio, feature_range=(-1,1))

shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)

注意:不要与将向量的范数(长度)缩放到某个值(通常为1)的操作相混淆,该操作通常也称为归一化。

You are trying to min-max scale the values of audio between -1 and +1 and image between 0 and 255.

Using sklearn.preprocessing.minmax_scale, should easily solve your problem.

e.g.:

audio_scaled = minmax_scale(audio, feature_range=(-1,1))

and

shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)

note: Not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.


回答 5

一个简单的解决方案是使用sklearn.preprocessing库提供的缩放器。

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)

错误X_rec-X将为零。您可以根据需要调整feature_range,甚至可以使用标准缩放器sk.StandardScaler()

A simple solution is using the scalers offered by the sklearn.preprocessing library.

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)

The error X_rec-X will be zero. You can adjust the feature_range for your needs, or even use a standart scaler sk.StandardScaler()


回答 6

我尝试按照此操作,但出现了错误

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

numpy我试图正常化阵列是一个integer数组。似乎他们不赞成在版本>中进行类型转换1.10,而您必须使用它numpy.true_divide()来解决该问题。

arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)

img是一个PIL.Image对象。

I tried following this, and got the error

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

The numpy array I was trying to normalize was an integer array. It seems they deprecated type casting in versions > 1.10, and you have to use numpy.true_divide() to resolve that.

arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)

img was an PIL.Image object.


在Python中显示带有两位小数的浮点数

问题:在Python中显示带有两位小数的浮点数

我有一个带浮点参数的函数(通常是整数或具有一位有效数字的十进制数),我需要在字符串中输出具有两位小数位的值(5-> 5.00、5.5-> 5.50等)。如何在Python中做到这一点?

I have a function taking float arguments (generally integers or decimals with one significant digit), and I need to output the values in a string with two decimal places (5 -> 5.00, 5.5 -> 5.50, etc). How can I do this in Python?


回答 0

您可以为此使用字符串格式运算符:

>>> '%.2f' % 1.234
'1.23'
>>> '%.2f' % 5.0
'5.00'

运算符的结果是一个字符串,因此您可以将其存储在变量中,进行打印等。

You could use the string formatting operator for that:

>>> '%.2f' % 1.234
'1.23'
>>> '%.2f' % 5.0
'5.00'

The result of the operator is a string, so you can store it in a variable, print etc.


回答 1

由于这篇文章可能会在这里出现一段时间,因此我们还要指出python 3语法:

"{:.2f}".format(5)

Since this post might be here for a while, lets also point out python 3 syntax:

"{:.2f}".format(5)

回答 2

f字符串格式:

这是Python 3.6中的新功能-照常将字符串放在引号中,并f'...以与r'...原始字符串相同的方式加上前缀。然后,将任何要放入字符串,变量,数字,大括号内的内容放入其中-Python会f'some string text with a {variable} or {number} within that text'像以前的字符串格式化方法那样进行求值,只是该方法更具可读性。

>>> a = 3.141592
>>> print(f'My number is {a:.2f} - look at the nice rounding!')

My number is 3.14 - look at the nice rounding!

您可以在此示例中看到,我们以与以前的字符串格式化方法相似的方式用小数位格式化。

NB a可以是数字,变量甚至是表达式,例如f'{3*my_func(3.14):02f}'

展望未来,使用新代码,我更喜欢f字符串而不是常见的%s或str.format()方法,因为f字符串可以更容易阅读,并且通常更快

f-string formatting:

This was new in Python 3.6 – the string is placed in quotation marks as usual, prepended with f'... in the same way you would r'... for a raw string. Then you place whatever you want to put within your string, variables, numbers, inside braces f'some string text with a {variable} or {number} within that text' – and Python evaluates as with previous string formatting methods, except that this method is much more readable.

>>> foobar = 3.141592
>>> print(f'My number is {foobar:.2f} - look at the nice rounding!')

My number is 3.14 - look at the nice rounding!

You can see in this example we format with decimal places in similar fashion to previous string formatting methods.

NB foobar can be an number, variable, or even an expression eg f'{3*my_func(3.14):02f}'.

Going forward, with new code I prefer f-strings over common %s or str.format() methods as f-strings can be far more readable, and are often much faster.


回答 3

字符串格式:

print "%.2f" % 5

String formatting:

print "%.2f" % 5

回答 4

使用python字符串格式。

>>> "%0.2f" % 3
'3.00'

Using python string formatting.

>>> "%0.2f" % 3
'3.00'

回答 5

字符串格式:

a = 6.789809823
print('%.2f' %a)

要么

print ("{0:.2f}".format(a)) 

舍入函数可以使用:

print(round(a, 2))

round()的好处是,我们可以将结果存储到另一个变量中,然后将其用于其他目的。

b = round(a, 2)
print(b)

String Formatting:

a = 6.789809823
print('%.2f' %a)

OR

print ("{0:.2f}".format(a)) 

Round Function can be used:

print(round(a, 2))

Good thing about round() is that, we can store this result to another variable, and then use it for other purposes.

b = round(a, 2)
print(b)

回答 6

最短的Python 3语法:

n = 5
print(f'{n:.2f}')

Shortest Python 3 syntax:

n = 5
print(f'{n:.2f}')

回答 7

如果您实际上想更改数字本身,而不是只显示不同的数字,请使用format()

将其格式化为2位小数:

format(value, '.2f')

例:

>>> format(5.00000, '.2f')
'5.00'

If you actually want to change the number itself instead of only displaying it differently use format()

Format it to 2 decimal places:

format(value, '.2f')

example:

>>> format(5.00000, '.2f')
'5.00'

回答 8

我知道这是一个古老的问题,但我一直在努力寻找答案。这是我想出的:

Python 3:

>>> num_dict = {'num': 0.123, 'num2': 0.127}
>>> "{0[num]:.2f}_{0[num2]:.2f}".format(num_dict) 
0.12_0.13

I know it is an old question, but I was struggling finding the answer myself. Here is what I have come up with:

Python 3:

>>> num_dict = {'num': 0.123, 'num2': 0.127}
>>> "{0[num]:.2f}_{0[num2]:.2f}".format(num_dict) 
0.12_0.13

回答 9

使用Python 3语法:

print('%.2f' % number)

Using Python 3 syntax:

print('%.2f' % number)

回答 10

如果要在调用输入时获得一个小数点后两位数限制的浮点值,

看看这个〜

a = eval(format(float(input()), '.2f'))   # if u feed 3.1415 for 'a'.
print(a)                                  # output 3.14 will be printed.

If you want to get a floating point value with two decimal places limited at the time of calling input,

Check this out ~

a = eval(format(float(input()), '.2f'))   # if u feed 3.1415 for 'a'.
print(a)                                  # output 3.14 will be printed.