Tag archive: Python

Why does the expression 0 < 0 == 0 return False in Python?

Question: Why does the expression 0 < 0 == 0 return False in Python?

Looking into Queue.py in Python 2.6, I found this construct a bit strange:

def full(self):
    """Return True if the queue is full, False otherwise
    (not reliable!)."""
    self.mutex.acquire()
    n = 0 < self.maxsize == self._qsize()
    self.mutex.release()
    return n

If maxsize is 0 the queue is never full.

My question is: how does this work in that case? How is 0 < 0 == 0 considered False?

>>> 0 < 0 == 0
False
>>> (0) < (0 == 0)
True
>>> (0 < 0) == 0
True
>>> 0 < (0 == 0)
True

Answer 0

I believe Python has special case handling for sequences of relational operators to make range comparisons easy to express. It’s much nicer to be able to say 0 < x <= 5 than to say (0 < x) and (x <= 5).

These are called chained comparisons, and they are described in the Python documentation.

With the other cases you talk about, the parentheses force one relational operator to be applied before the other, so they are no longer chained comparisons. And since True and False have integer values (1 and 0), you get the answers you do from the parenthesized versions.
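
To make the difference concrete, here is a minimal sketch. The function names in_range_chained and in_range_explicit are made up for this illustration; only the comparison expressions come from the discussion above.

```python
def in_range_chained(x):
    # Chained form: Python reads this as (0 < x) and (x <= 5).
    return 0 < x <= 5


def in_range_explicit(x):
    # The spelled-out equivalent of the chained form.
    return (0 < x) and (x <= 5)


# The two forms agree for every input tried here.
for value in (-1, 0, 3, 5, 6):
    assert in_range_chained(value) == in_range_explicit(value)

# Parentheses break the chain: (0 < 0) is False, and False == 0 is True.
chained = 0 < 0 == 0
parenthesized = (0 < 0) == 0
print(chained, parenthesized)  # False True
```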


Answer 1

Because

(0 < 0) and (0 == 0)

is False. You can chain together comparison operators and they are automatically expanded out into the pairwise comparisons.


EDIT — clarification about True and False in Python

In Python True and False are just instances of bool, which is a subclass of int. In other words, True really is just 1.

The point of this is that you can use the result of a boolean comparison exactly like an integer. This leads to confusing things like

>>> (1==1)+(1==1)
2
>>> (2<1)<1
True

But these will only happen if you parenthesise the comparisons so that they are evaluated first. Otherwise Python will expand out the comparison operators.
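
The same point can be checked directly. This is only a small sketch for Python 3; the variable names total and nested are mine.

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
assert issubclass(bool, int)
assert True == 1 and False == 0

# Parenthesized comparisons produce bools, which then take part in
# ordinary integer arithmetic and comparisons.
total = (1 == 1) + (1 == 1)   # True + True
nested = (2 < 1) < 1          # False < 1, i.e. 0 < 1
print(total, nested)  # 2 True
```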


Answer 2

The strange behavior you're experiencing comes from Python's ability to chain conditions. Since it finds that 0 is not less than 0, it decides the entire expression evaluates to False. As soon as you break this apart into separate conditions, you're changing the functionality. Your original statement a < b == c is essentially testing a < b and b == c.

Another example:

>>> 1 < 5 < 3
False

>>> (1 < 5) < 3
True

Answer 3

>>> 0 < 0 == 0
False

This is a chained comparison. It returns true if each pairwise comparison in turn is true. It is the equivalent to (0 < 0) and (0 == 0)

>>> (0) < (0 == 0)
True

This is equivalent to 0 < True which evaluates to True.

>>> (0 < 0) == 0
True

This is equivalent to False == 0 which evaluates to True.

>>> 0 < (0 == 0)
True

Equivalent to 0 < True which, as above, evaluates to True.


Answer 4

Looking at the disassembly (the bytecode), it is obvious why 0 < 0 == 0 is False.

Here is an analysis of this expression:

>>> import dis

>>> def f():
...     0 < 0 == 0

>>> dis.dis(f)
  2      0 LOAD_CONST               1 (0)
         3 LOAD_CONST               1 (0)
         6 DUP_TOP
         7 ROT_THREE
         8 COMPARE_OP               0 (<)
        11 JUMP_IF_FALSE_OR_POP    23
        14 LOAD_CONST               1 (0)
        17 COMPARE_OP               2 (==)
        20 JUMP_FORWARD             2 (to 25)
   >>   23 ROT_TWO
        24 POP_TOP
   >>   25 POP_TOP
        26 LOAD_CONST               0 (None)
        29 RETURN_VALUE

Notice offsets 0-8: these instructions check whether 0 < 0, which obviously pushes False onto the Python stack.

Now notice offset 11: JUMP_IF_FALSE_OR_POP 23. This means that if 0 < 0 is False, execution jumps to offset 23.

Now, 0 < 0 is False, so the jump is taken. That leaves False on the stack as the value of the whole expression 0 < 0 == 0, even though the == 0 part is never checked.

So, to conclude, the answer is as stated in the other answers to this question: 0 < 0 == 0 has a special meaning. The compiler evaluates it as two terms, 0 < 0 and 0 == 0. As with any boolean expression joined by and, if the first term fails, the second one isn't even checked.

I hope this clears things up a bit, and I really hope that the method I used to analyse this unexpected behavior will encourage others to try the same in the future.
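
The listing above comes from an old Python 2 interpreter, and the exact opcodes and offsets differ between CPython versions. As a rough sketch for trying the same method on Python 3, assuming CPython and using function arguments instead of literal constants, you can inspect the instruction names rather than rely on offsets:

```python
import dis


def f(a, b, c):
    # A chained comparison on arguments, so the compiler cannot simplify it.
    return a < b == c


# Collect the opcode names; exact opcodes and offsets vary by CPython version.
opnames = [instr.opname for instr in dis.get_instructions(f)]

# One comparison instruction is compiled for each link of the chain.
compare_count = opnames.count("COMPARE_OP")
print(compare_count)

# Print the full disassembly to see the conditional jump between them.
dis.dis(f)
```

The name of the conditional jump between the two comparisons depends on the interpreter version, so it is better to read the printed listing than to hard-code an opcode name.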


Answer 5

As others mentioned, x comparison_operator y comparison_operator z is syntactic sugar for (x comparison_operator y) and (y comparison_operator z), with the bonus that y is only evaluated once.

So your expression 0 < 0 == 0 is really (0 < 0) and (0 == 0), which evaluates to False and True which is just False.


Answer 6

Maybe this excerpt from the docs can help:

These are the so-called “rich comparison” methods, and are called for comparison operators in preference to __cmp__() below. The correspondence between operator symbols and method names is as follows: x<y calls x.__lt__(y), x<=y calls x.__le__(y), x==y calls x.__eq__(y), x!=y and x<>y call x.__ne__(y), x>y calls x.__gt__(y), and x>=y calls x.__ge__(y).

A rich comparison method may return the singleton NotImplemented if it does not implement the operation for a given pair of arguments. By convention, False and True are returned for a successful comparison. However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an if statement), Python will call bool() on the value to determine if the result is true or false.

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. Accordingly, when defining __eq__(), one should also define __ne__() so that the operators will behave as expected. See the paragraph on __hash__() for some important notes on creating hashable objects which support custom comparison operations and are usable as dictionary keys.

There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, __lt__() and __gt__() are each other’s reflection, __le__() and __ge__() are each other’s reflection, and __eq__() and __ne__() are their own reflection.

Arguments to rich comparison methods are never coerced.

Those are the comparison methods, but since you are chaining comparisons you should also know that:

Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Formally, if a, b, c, …, y, z are expressions and op1, op2, …, opN are comparison operators, then a op1 b op2 c … y opN z is equivalent to a op1 b and b op2 c and … y opN z, except that each expression is evaluated at most once.
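
The "evaluated at most once" rule and the short-circuiting can be observed with a small tracing helper. This is a sketch only; traced and calls are hypothetical names introduced for the demonstration.

```python
calls = []


def traced(name, value):
    # Record that this operand was evaluated, then return its value.
    calls.append(name)
    return value


# Chained form: the middle operand is evaluated only once.
calls.clear()
result = traced("x", 1) < traced("y", 2) <= traced("z", 3)
chained_calls = list(calls)
print(result, chained_calls)  # True ['x', 'y', 'z']

# Spelled out with `and`, the middle operand is evaluated twice.
calls.clear()
result_and = traced("x", 1) < traced("y", 2) and traced("y", 2) <= traced("z", 3)
and_calls = list(calls)
print(result_and, and_calls)  # True ['x', 'y', 'y', 'z']

# Short-circuit: when the first comparison is false, z is never evaluated.
calls.clear()
short = traced("x", 5) < traced("y", 2) <= traced("z", 3)
short_calls = list(calls)
print(short, short_calls)  # False ['x', 'y']
```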


Answer 7

Here it is, in all its glory.

>>> class showme(object):
...   def __init__(self, name, value):
...     self.name, self.value = name, value
...   def __repr__(self):
...     return "<showme %s:%s>" % (self.name, self.value)
...   def __cmp__(self, other):
...     print "cmp(%r, %r)" % (self, other)
...     if type(other) == showme:
...       return cmp(self.value, other.value)
...     else:
...       return cmp(self.value, other)
... 
>>> showme(1,0) < showme(2,0) == showme(3,0)
cmp(<showme 1:0>, <showme 2:0>)
False
>>> (showme(1,0) < showme(2,0)) == showme(3,0)
cmp(<showme 1:0>, <showme 2:0>)
cmp(<showme 3:0>, False)
True
>>> showme(1,0) < (showme(2,0) == showme(3,0))
cmp(<showme 2:0>, <showme 3:0>)
cmp(<showme 1:0>, True)
True
>>> 
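
The session above relies on __cmp__ and the print statement, which exist only in Python 2. A rough Python 3 adaptation, assuming we log calls to __lt__ and __eq__ instead (the class name ShowMe and the log list are my own naming), might look like this:

```python
log = []


class ShowMe:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def __repr__(self):
        return "<ShowMe %s:%s>" % (self.name, self.value)

    @staticmethod
    def _value(other):
        # Compare against the wrapped value, or against a plain object.
        return other.value if isinstance(other, ShowMe) else other

    def __lt__(self, other):
        log.append("lt(%r, %r)" % (self, other))
        return self.value < self._value(other)

    def __eq__(self, other):
        log.append("eq(%r, %r)" % (self, other))
        return self.value == self._value(other)


# Chained: only the first comparison runs, because it is already False.
log.clear()
chained = ShowMe(1, 0) < ShowMe(2, 0) == ShowMe(3, 0)
chained_log = list(log)
print(chained, chained_log)

# Parenthesized: the bool from the first comparison is then compared
# with the third object, so a second comparison method is called.
log.clear()
grouped = (ShowMe(1, 0) < ShowMe(2, 0)) == ShowMe(3, 0)
grouped_log = list(log)
print(grouped, grouped_log)
```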

Answer 8

I think Python is doing its weird "between" magic here, the same way 1 < 2 < 3 means 2 is between 1 and 3.

In this case, I think it's checking that [middle 0] is greater than [left 0] and equal to [right 0]. The middle 0 is not greater than the left 0, so it evaluates to False.


How do I know if I can disable SQLALCHEMY_TRACK_MODIFICATIONS?

Question: How do I know if I can disable SQLALCHEMY_TRACK_MODIFICATIONS?

Every time I run my app that uses Flask-SQLAlchemy I get the following warning that the SQLALCHEMY_TRACK_MODIFICATIONS option will be disabled.

/home/david/.virtualenvs/flask-sqlalchemy/lib/python3.5/site-packages/flask_sqlalchemy/__init__.py:800: UserWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future.  Set it to True to suppress this warning.
  warnings.warn('SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future.  Set it to True to suppress this warning.')

I tried to find out what this option does, but the Flask-SQLAlchemy documentation isn’t clear about what uses this tracking.

SQLALCHEMY_TRACK_MODIFICATIONS

If set to True (the default) Flask-SQLAlchemy will track modifications of objects and emit signals. This requires extra memory and can be disabled if not needed.

How do I find out if my project requires SQLALCHEMY_TRACK_MODIFICATIONS = True or if I can safely disable this feature and save memory on my server?


Answer 0

Most likely your application doesn't use the Flask-SQLAlchemy event system, so you're probably safe to turn it off. You'll need to audit the code to verify: you're looking for anything that hooks into models_committed or before_models_committed. If you do find that you're using the Flask-SQLAlchemy event system, you probably should update the code to use SQLAlchemy's built-in event system instead.

To turn off the Flask-SQLAlchemy event system (and disable the warning), just add:

SQLALCHEMY_TRACK_MODIFICATIONS = False

to your app config until the default is changed (most likely in Flask-SQLAlchemy v3).


Background: here's what the warning is telling you.

Flask-SQLAlchemy has its own event notification system that gets layered on top of SQLAlchemy. To do this, it tracks modifications to the SQLAlchemy session. This takes extra resources, so the option SQLALCHEMY_TRACK_MODIFICATIONS allows you to disable the modification tracking system. Currently the option defaults to True, but in the future, that default will change to False, thereby disabling the event system.

As far as I understand, the rationale for the change is three-fold:

  1. Not many people use Flask-SQLAlchemy’s event system, but most people don’t realize they can save system resources by disabling it. So a saner default is to disable it and those who want it can turn it on.

  2. The event system in Flask-SQLAlchemy has been rather buggy (see issues linked to in the pull request mentioned below), requiring additional maintenance for a feature that few people use.

  3. In v0.7, SQLAlchemy itself added a powerful event system including the ability to create custom events. Ideally, the Flask-SQLAlchemy event system should do nothing more than create a few custom SQLAlchemy event hooks and listeners, and then let SQLAlchemy itself manage the event trigger.

You can see more in the discussion around the pull request that started triggering this warning.
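
For the audit step, a simple text search over your project is usually enough. The following is only a sketch using the standard library: the helper name find_signal_usages is hypothetical, and it merely looks for the two signal names mentioned above, so every hit still needs a manual look.

```python
import re
from pathlib import Path

# The two Flask-SQLAlchemy signal names mentioned in this answer.
SIGNAL_PATTERN = re.compile(r"\b(before_models_committed|models_committed)\b")


def find_signal_usages(root):
    """Return (path, line_number, line) tuples for every match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if SIGNAL_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling find_signal_usages(".") from the project root and getting an empty list suggests, but does not prove, that nothing in your own code depends on the tracking. Third-party extensions installed elsewhere are not covered by this search.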


Answer 1

Jeff Widman’s detailed explanation is simply perfect.

Since I had some copy-and-paste fights before getting this right, I'd like to make it easier for the next person in my shoes.

In your code, immediately after:

app = Flask(__name__)

If you want to enable track modifications simply add:

app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = True

Otherwise, if you are not using this feature, you may want to change the value to False so as not to waste system resources. This still silences the warning, since you are explicitly setting the config either way.

Here's the same snippet with the value False:

app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

Thanks to Jeff Widman for this added suggestion and details.


Answer 2

The above answers look good. However, I wanted to point out this line in the Flask-SQLAlchemy documentation because I was still getting these warnings after setting SQLALCHEMY_TRACK_MODIFICATIONS = False in my application config.

On this page: http://flask-sqlalchemy.pocoo.org/2.3/config/

The following configuration values exist for Flask-SQLAlchemy. Flask-SQLAlchemy loads these values from your main Flask config which can be populated in various ways. Note that some of those cannot be modified after the engine was created so make sure to configure as early as possible and to not modify them at runtime.

In other words, make sure to set up your app.config before creating your Flask-SQLAlchemy database.

For example, if you are configuring your application to set SQLALCHEMY_TRACK_MODIFICATIONS = False:

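from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Set the config value before creating the SQLAlchemy object.
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)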

What is the EAFP principle in Python?

Question: What is the EAFP principle in Python?

What is meant by “using the EAFP principle” in Python? Could you provide any examples?


Answer 0

From the glossary:

Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

An example would be an attempt to access a dictionary key.

EAFP:

try:
    x = my_dict["key"]
except KeyError:
    pass  # handle the missing key here

LBYL:

if "key" in my_dict:
    x = my_dict["key"]
else:
    pass  # handle the missing key here

The LBYL version has to search the key inside the dictionary twice, and might also be considered slightly less readable.
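
For something you can paste and run, here is a sketch of both styles wrapped in functions. The names lookup_eafp, lookup_lbyl and settings are made up for the illustration.

```python
def lookup_eafp(mapping, key, default=None):
    # EAFP: just try the lookup and handle the failure.
    try:
        return mapping[key]
    except KeyError:
        return default


def lookup_lbyl(mapping, key, default=None):
    # LBYL: check first, then look up (two dictionary searches on a hit).
    if key in mapping:
        return mapping[key]
    return default


settings = {"host": "localhost"}
print(lookup_eafp(settings, "host"), lookup_eafp(settings, "port", 8080))  # localhost 8080
print(lookup_lbyl(settings, "host"), lookup_lbyl(settings, "port", 8080))  # localhost 8080
```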


Answer 1

I’ll try to explain it with another example.

Here we’re trying to access the file and print the contents in console.

LBYL (Look Before You Leap):

We might want to check whether we can access the file and, if we can, open it and print the contents. If we can't access the file, we hit the else branch. This is a race condition because the access check happens first: by the time we reach with open(my_file) as f:, we may no longer be able to access the file due to a permission change (for example, another process gains an exclusive file lock). The code would then raise an error that we don't catch, because we assumed the file was accessible.

import os

my_file = "/path/to/my/file.txt"

# Race condition
if os.access(my_file, os.R_OK):
    with open(my_file) as f:
        print(f.read())
else:
    print("File can't be accessed")

EAFP (Easier to Ask for Forgiveness than Permission):

In this example, we just try to open the file; if we can't, an IOError is raised. If we can, we open the file and print the contents. So instead of asking first, we simply try. If it works, great! If it doesn't, we catch the error and handle it.

# No race condition
try:
    f = open(my_file)
except IOError as e:
    print("File can't be accessed")
else:
    with f:
        print(f.read())
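
If you want to try the EAFP version without a real file at hand, the sketch below uses a temporary directory. The function name read_text_eafp is hypothetical, and it catches OSError, of which IOError is an alias in Python 3.

```python
import os
import tempfile


def read_text_eafp(path):
    # EAFP: attempt the open and handle failure; there is no separate
    # check that could go stale between the check and the open.
    try:
        f = open(path)
    except OSError:  # IOError is an alias of OSError in Python 3
        return None
    else:
        with f:
            return f.read()


# Demonstrate with a temporary file and with a path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "file.txt")
    with open(existing, "w") as out:
        out.write("hello")
    present = read_text_eafp(existing)
    missing = read_text_eafp(os.path.join(tmp, "no_such_file.txt"))

print(present, missing)  # hello None
```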

Answer 2

I call it "optimistic programming". The idea is that most of the time people will do the right thing, and errors should be few. So code first for the "right thing" to happen, and then catch the errors if it doesn't.

My feeling is that if a user is going to be making mistakes, they should be the one to suffer the time consequences. People who use the tool the right way are sped through.


What is the Python buffer type?

Question: What is the Python buffer type?

There is a buffer type in Python, but I don't know how I can use it.

In the Python doc the description is:

buffer(object[, offset[, size]])

The object argument must be an object that supports the buffer call interface (such as strings, arrays, and buffers). A new buffer object will be created which references the object argument. The buffer object will be a slice from the beginning of object (or from the specified offset). The slice will extend to the end of object (or will have a length given by the size argument).


Answer 0

An example usage:

>>> s = 'Hello world'
>>> t = buffer(s, 6, 5)
>>> t
<read-only buffer for 0x10064a4b0, size 5, offset 6 at 0x100634ab0>
>>> print t
world

The buffer in this case is a sub-string, starting at position 6 with length 5, and it doesn’t take extra storage space – it references a slice of the string.

This isn’t very useful for short strings like this, but it can be necessary when using large amounts of data. This example uses a mutable bytearray:

>>> s = bytearray(1000000)   # a million zeroed bytes
>>> t = buffer(s, 1)         # slice cuts off the first byte
>>> s[1] = 5                 # set the second element in s
>>> t[0]                     # which is now also the first element in t!
'\x05'

This can be very helpful if you want to have more than one view on the data and don’t want to (or can’t) hold multiple copies in memory.

Note that buffer has been replaced by the better named memoryview in Python 3, though you can use either in Python 2.7.

Note also that you can’t implement a buffer interface for your own objects without delving into the C API, i.e. you can’t do it in pure Python.
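
Since buffer no longer exists in Python 3, here is a sketch of the bytearray example above rewritten with memoryview. Note one difference: indexing a memoryview over bytes returns an integer rather than a one-character string.

```python
s = bytearray(1000000)   # a million zeroed bytes
t = memoryview(s)[1:]    # a view starting at the second byte; nothing is copied
s[1] = 5                 # change the underlying bytearray
first = t[0]             # the view sees the change
print(first, len(t))     # 5 999999

# Because s is mutable, the view is writable as well.
t[1] = 7
print(s[2])              # 7
```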


Answer 1

I think buffers are useful, for example, when interfacing Python with native libraries. (Guido van Rossum explains buffer in a mailing list post.)

For example, numpy seems to use buffer for efficient data storage:

import numpy
a = numpy.ndarray(1000000)

a.data is then:

<read-write buffer for 0x1d7b410, size 8000000, offset 0 at 0x1e353b0>

Get the fully qualified class name of an object in Python

Question: Get the fully qualified class name of an object in Python

For logging purposes I want to retrieve the fully qualified class name of a Python object. (By fully qualified I mean the class name including the package and module name.)

I know about x.__class__.__name__, but is there a simple method to get the package and module?


Answer 0

With the following program

#! /usr/bin/env python

import foo

def fullname(o):
  # o.__module__ + "." + o.__class__.__qualname__ is an example in
  # this context of H.L. Mencken's "neat, plausible, and wrong."
  # Python makes no guarantees as to whether the __module__ special
  # attribute is defined, so we take a more circumspect approach.
  # Alas, the module name is explicitly excluded from __qualname__
  # in Python 3.

  module = o.__class__.__module__
  if module is None or module == str.__class__.__module__:
    return o.__class__.__name__  # Avoid reporting __builtin__
  else:
    return module + '.' + o.__class__.__name__

bar = foo.Bar()
print fullname(bar)

and Bar defined as

class Bar(object):
  def __init__(self, v=42):
    self.val = v

the output is

$ ./prog.py
foo.Bar

Answer 1

The provided answers don’t deal with nested classes. Though it’s not available until Python 3.3 (PEP 3155), you really want to use __qualname__ of the class. Eventually (3.4? PEP 395), __qualname__ will also exist for modules to deal with cases where the module is renamed (i.e. when it is renamed to __main__).


Answer 2

Consider using the inspect module, which has functions like getmodule that might be what you are looking for:

>>> import inspect
>>> import xml.etree.ElementTree
>>> et = xml.etree.ElementTree.ElementTree()
>>> inspect.getmodule(et)
<module 'xml.etree.ElementTree' from 
        'D:\tools\python2.5.2\lib\xml\etree\ElementTree.pyc'>

Answer 3

Here’s one based on Greg Bacon’s excellent answer, but with a couple of extra checks:

__module__ can be None (according to the docs), and also for a type like str it can be __builtin__ (which you might not want appearing in logs or whatever). The following checks for both those possibilities:

def fullname(o):
    module = o.__class__.__module__
    if module is None or module == str.__class__.__module__:
        return o.__class__.__name__
    return module + '.' + o.__class__.__name__

(There might be a better way to check for __builtin__. The above just relies on the fact that str is always available, and its module is always __builtin__)
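
On Python 3 the built-in types live in a module named builtins rather than __builtin__, and __qualname__ is available for nested classes. A hedged adaptation of the function above for Python 3, with Outer and Inner as made-up example classes, could look like this:

```python
import fractions


def fullname(o):
    cls = type(o)
    module = cls.__module__
    # In Python 3 the built-in types live in the "builtins" module.
    if module is None or module == "builtins":
        return cls.__qualname__
    # __qualname__ keeps the enclosing class names for nested classes.
    return module + "." + cls.__qualname__


class Outer:
    class Inner:
        pass


print(fullname(42))                     # int
print(fullname(fractions.Fraction(1)))  # fractions.Fraction
print(fullname(Outer.Inner()))          # ends with Outer.Inner; the prefix is the defining module
```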


Answer 4

For Python 3.7 I use (where obj is the class itself, since instances have no __name__):

".".join([obj.__module__, obj.__name__])

Getting:

package.subpackage.ClassName

Answer 5

__module__ would do the trick.

Try:

>>> import re
>>> print re.compile.__module__
re

This site suggests that __package__ might work for Python 3.0; however, the examples given there won't work under my Python 2.5.2 console.


Answer 6

This is a hack but I’m supporting 2.6 and just need something simple:

>>> from logging.handlers import MemoryHandler as MH
>>> str(MH).split("'")[1]

'logging.handlers.MemoryHandler'

Answer 7

Some people (e.g. https://stackoverflow.com/a/16763814/5766934) argue that __qualname__ is better than __name__. Here is an example that shows the difference:

$ cat dummy.py 
class One:
    class Two:
        pass

$ python3.6
>>> import dummy
>>> print(dummy.One)
<class 'dummy.One'>
>>> print(dummy.One.Two)
<class 'dummy.One.Two'>
>>> def full_name_with_name(klass):
...     return f'{klass.__module__}.{klass.__name__}'
>>> def full_name_with_qualname(klass):
...     return f'{klass.__module__}.{klass.__qualname__}'
>>> print(full_name_with_name(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_name(dummy.One.Two))  # Wrong
dummy.Two
>>> print(full_name_with_qualname(dummy.One))  # Correct
dummy.One
>>> print(full_name_with_qualname(dummy.One.Two))  # Correct
dummy.One.Two

Note that it also works correctly for builtins:

>>> print(full_name_with_qualname(print))
builtins.print
>>> import builtins
>>> builtins.print
<built-in function print>
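
The comparison above can be folded into one small helper. This is only a minimal sketch, not a standard-library function: the name full_name is hypothetical, and falling back to __name__ when __qualname__ is missing is an assumption made here for objects that lack it.

```python
def full_name(obj):
    """Return 'module.QualifiedName' for a class or function (hypothetical helper)."""
    module = getattr(obj, "__module__", None)
    # Fall back to __name__ for objects that do not define __qualname__.
    name = getattr(obj, "__qualname__", None) or obj.__name__
    # Assumption: objects without a module are reported by name only.
    return name if module is None else f"{module}.{name}"


class One:
    class Two:
        pass
```

With the nested class above, full_name(One.Two) keeps the One.Two part, whereas a version built on __name__ alone would drop the outer class.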

Answer 8

Since the interest of this topic is to get fully qualified names, here is a pitfall that occurs when using relative imports along with the main module existing in the same package. E.g., with the below module setup:

$ cat /tmp/fqname/foo/__init__.py
$ cat /tmp/fqname/foo/bar.py
from baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/baz.py
class Baz: pass
$ cat /tmp/fqname/main.py
import foo.bar
from foo.baz import Baz
print Baz.__module__
$ cat /tmp/fqname/foo/hum.py
import bar
import foo.bar

Here is the output showing the result of importing the same module differently:

$ export PYTHONPATH=/tmp/fqname
$ python /tmp/fqname/main.py
foo.baz
foo.baz
$ python /tmp/fqname/foo/bar.py
baz
$ python /tmp/fqname/foo/hum.py
baz
foo.baz

When hum imports bar using a relative path (a Python 2 implicit relative import), bar sees Baz.__module__ as just “baz”, but in the second import, which uses the full name, bar sees the same class as “foo.baz”.

If you are persisting the fully-qualified names somewhere, it is better to avoid relative imports for those classes.
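
If such names are persisted, one way to catch a mismatch early is to resolve each stored name back to an object before saving it. The following is a minimal sketch, assuming Python 3 and names of the form module.path.QualName; resolve_name and round_trips are hypothetical helpers, not something from the answer.

```python
import importlib


def resolve_name(dotted):
    """Import the longest importable module prefix, then walk the remaining attributes."""
    parts = dotted.split(".")
    for cut in range(len(parts), 0, -1):
        module_name = ".".join(parts[:cut])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # Not a module; try a shorter prefix.
        for attr in parts[cut:]:
            obj = getattr(obj, attr)  # Raises AttributeError if the name is stale.
        return obj
    raise ImportError(f"cannot resolve {dotted!r}")


def round_trips(cls):
    """True if the class can be found again from its module and qualified name."""
    return resolve_name(f"{cls.__module__}.{cls.__qualname__}") is cls
```

A class whose __module__ was recorded as a bare “baz” would fail this check in any process where only foo.baz is importable, which is the situation described above.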


Answer 9

None of the answers here worked for me. In my case, I was using Python 2.7 and knew that I would only be working with new-style classes (those that derive from object).

def get_qualified_python_name_from_class(model):
    c = model.__class__.__mro__[0]
    name = c.__module__ + "." + c.__name__
    return name

Best way to structure a tkinter application?

Question: Best way to structure a tkinter application?

The following is the overall structure of my typical python tkinter program.

import tkinter as tk

def funA():
    def funA1():
        def funA12():
            pass  # stuff

    def funA2():
        pass  # stuff

def funB():
    def funB1():
        pass  # stuff

    def funB2():
        pass  # stuff

def funC():
    def funC1():
        pass  # stuff

    def funC2():
        pass  # stuff


root = tk.Tk()

button1 = tk.Button(root, command=funA)
button1.pack()
button2 = tk.Button(root, command=funB)
button2.pack()
button3 = tk.Button(root, command=funC)
button3.pack()

root.mainloop()

funA, funB and funC each bring up another Toplevel window with widgets when the user clicks button 1, 2 or 3.

I am wondering whether this is the right way to write a python tkinter program. Sure, it will work even if I write it this way, but is it the best way? It sounds stupid, but when I look at code other people have written, it is not cluttered with a bunch of functions, and mostly they use classes.

Is there any specific structure that we should follow as good practice? How should I plan before I start writing a python program?

I know there is no such thing as best practice in programming and I am not asking for it either. I just want some advice and explanations to keep me headed in the right direction as I am learning Python by myself.


Answer 0

I advocate an object-oriented approach. This is the template that I start out with:

# Use Tkinter for python 2, tkinter for python 3
import tkinter as tk

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.parent = parent

        # <create the rest of your GUI here>

if __name__ == "__main__":
    root = tk.Tk()
    MainApplication(root).pack(side="top", fill="both", expand=True)
    root.mainloop()

The important things to notice are:

  • I don’t use a wildcard import. I import the package as “tk”, which requires that I prefix all commands with tk.. This prevents global namespace pollution, plus it makes the code completely obvious when you are using Tkinter classes, ttk classes, or some of your own.

  • The main application is a class. This gives you a private namespace for all of your callbacks and private functions, and just generally makes it easier to organize your code. In a procedural style you have to code top-down, defining functions before using them, etc. With this method you don’t since you don’t actually create the main window until the very last step. I prefer inheriting from tk.Frame just because I typically start by creating a frame, but it is by no means necessary.

If your app has additional toplevel windows, I recommend making each of those a separate class, inheriting from tk.Toplevel. This gives you all of the same advantages mentioned above — the windows are atomic, they have their own namespace, and the code is well organized. Plus, it makes it easy to put each into its own module once the code starts to get large.

Finally, you might want to consider using classes for every major portion of your interface. For example, if you’re creating an app with a toolbar, a navigation pane, a statusbar, and a main area, you could make each one of those classes. This makes your main code quite small and easy to understand:

class Navbar(tk.Frame): ...
class Toolbar(tk.Frame): ...
class Statusbar(tk.Frame): ...
class Main(tk.Frame): ...

class MainApplication(tk.Frame):
    def __init__(self, parent, *args, **kwargs):
        tk.Frame.__init__(self, parent, *args, **kwargs)
        self.statusbar = Statusbar(self, ...)
        self.toolbar = Toolbar(self, ...)
        self.navbar = Navbar(self, ...)
        self.main = Main(self, ...)

        self.statusbar.pack(side="bottom", fill="x")
        self.toolbar.pack(side="top", fill="x")
        self.navbar.pack(side="left", fill="y")
        self.main.pack(side="right", fill="both", expand=True)

Since all of those instances share a common parent, the parent effectively becomes the “controller” part of a model-view-controller architecture. So, for example, the main window could place something on the statusbar by calling self.parent.statusbar.set("Hello, world"). This allows you to define a simple interface between the components, helping to keep coupling to a minimum.


Answer 1

Putting each of your top-level windows into its own separate class gives you code reuse and better code organization. Any buttons and relevant methods that are present in the window should be defined inside this class. Here’s an example (taken from here):

import tkinter as tk

class Demo1:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.button1 = tk.Button(self.frame, text = 'New Window', width = 25, command = self.new_window)
        self.button1.pack()
        self.frame.pack()
    def new_window(self):
        self.newWindow = tk.Toplevel(self.master)
        self.app = Demo2(self.newWindow)

class Demo2:
    def __init__(self, master):
        self.master = master
        self.frame = tk.Frame(self.master)
        self.quitButton = tk.Button(self.frame, text = 'Quit', width = 25, command = self.close_windows)
        self.quitButton.pack()
        self.frame.pack()
    def close_windows(self):
        self.master.destroy()

def main(): 
    root = tk.Tk()
    app = Demo1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

Hope that helps.


Answer 2

This isn’t a bad structure; it will work just fine. However, you do have to nest functions inside functions to run commands when someone clicks a button or similar.

So what you could do is write classes for these windows, then give each class methods that handle the commands for button clicks and such.

Here’s an example:

import tkinter as tk

class Window1:
    def __init__(self, master):
        self.master = master
        # Create labels, entries, buttons

    def button_click(self):
        pass
        # If the button is clicked, run this method and open window 2


class Window2:
    def __init__(self, master):
        self.master = master
        # Create buttons, entries, etc.

    def button_method(self):
        # Run this when the button is clicked to close the window
        self.master.destroy()

def main():  # run mainloop
    root = tk.Tk()
    app = Window1(root)
    root.mainloop()

if __name__ == '__main__':
    main()

Usually tk programs with multiple windows consist of several big classes. All the entries, labels, etc. are created in __init__, and each method then handles a button click event.

There isn’t really a right way to do it: use whatever works for you and gets the job done, as long as it’s readable and you can easily explain it. If you can’t easily explain your program, there is probably a better way to do it.

Take a look at Thinking in Tkinter.


Answer 3

OOP should be the approach. In the example below, frame is a local variable in __init__ rather than an instance attribute, because nothing outside __init__ needs it.

from Tkinter import *
class App:
  def __init__(self, master):
    frame = Frame(master)
    frame.pack()
    self.button = Button(frame, 
                         text="QUIT", fg="red",
                         command=frame.quit)
    self.button.pack(side=LEFT)
    self.slogan = Button(frame,
                         text="Hello",
                         command=self.write_slogan)
    self.slogan.pack(side=LEFT)
  def write_slogan(self):
    print "Tkinter is easy to use!"

root = Tk()
app = App(root)
root.mainloop()

Reference: http://www.python-course.eu/tkinter_buttons.php


Answer 4

Organizing your application using classes makes it easy for you and others who work with you to debug problems and improve the app.

You can easily organize your application like this:

from tkinter import Tk, Button

class hello(Tk):
    def __init__(self):
        super(hello, self).__init__()
        # The callback must be the bound method, not a bare name
        self.btn = Button(self, text="Click me", command=self.close)
        self.btn.pack()

    def close(self):
        self.destroy()

app = hello()
app.mainloop()

Answer 5

Probably the best way to learn how to structure your program is by reading other people’s code, especially if it’s a large program to which many people have contributed. After looking at the code of many projects, you should get an idea of what the consensus style should be.

Python, as a language, is special in that there are some strong guidelines as to how you should format your code. The first is the so-called “Zen of Python”:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren’t special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one– and preferably only one –obvious way to do it.
  • Although that way may not be obvious at first unless you’re Dutch.
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it’s a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea — let’s do more of those!

On a more practical level, there is PEP8, the style guide for Python.

With those in mind, I would say that your code style doesn’t really fit, particularly the nested functions. Find a way to flatten those out, either by using classes or moving them into separate modules. This will make the structure of your program much easier to understand.


Answer 6

I personally do not use the object-oriented approach, mostly because a) it only gets in the way; b) you will never reuse that as a module.

But something that is not discussed here is that you must use threading or multiprocessing. Always. Otherwise your application will be awful.

Just do a simple test: start a window, and then fetch some URL or anything else. Chances are your UI will not be updated while the network request is happening, meaning your application window will appear broken. It depends on the OS you are on, but most times the window will not redraw, and anything you drag over it will be plastered on it until the process is back in the Tk mainloop.
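
A common way to keep the window responsive is to run the slow call on a worker thread and let the Tk main loop poll a queue for the result. The following is a minimal sketch of that idea, assuming Python 3; run_in_background, slow_task and the 100 ms polling interval are illustrative choices, not part of the answer.

```python
import queue
import threading
import time


def run_in_background(work, results):
    """Start work() on a daemon thread and put its return value on the results queue."""
    def runner():
        results.put(work())

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread


def slow_task():
    """Stand-in for a network request (hypothetical)."""
    time.sleep(2)
    return "done"


def main():
    # Imported here so the helpers above can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    label = tk.Label(root, text="idle")
    label.pack()
    results = queue.Queue()

    def poll():
        # Runs on the Tk thread; only this thread touches widgets.
        try:
            label.config(text=results.get_nowait())
        except queue.Empty:
            root.after(100, poll)  # Nothing yet; check again in 100 ms.

    def start():
        label.config(text="working...")
        run_in_background(slow_task, results)
        root.after(100, poll)

    tk.Button(root, text="Fetch", command=start).pack()
    root.mainloop()
```

Calling main() opens the window. The worker thread never touches a widget; it only hands its result to the queue, and the main loop picks it up on its next poll.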


Keyboard Interrupts with python’s multiprocessing Pool

Question: Keyboard Interrupts with python’s multiprocessing Pool

How can I handle KeyboardInterrupt events with python’s multiprocessing Pools? Here is a simple example:

from multiprocessing import Pool
from time import sleep
from sys import exit

def slowly_square(i):
    sleep(1)
    return i*i

def go():
    pool = Pool(8)
    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        # **** THIS PART NEVER EXECUTES. ****
        pool.terminate()
        print "You cancelled the program!"
        exit(1)
    print "\nFinally, here are the results: ", results

if __name__ == "__main__":
    go()

When running the code above, the KeyboardInterrupt gets raised when I press ^C, but the process simply hangs at that point and I have to kill it externally.

I want to be able to press ^C at any time and cause all of the processes to exit gracefully.


Answer 0

This is a Python bug. When waiting for a condition in threading.Condition.wait(), KeyboardInterrupt is never sent. Repro:

import threading
cond = threading.Condition(threading.Lock())
cond.acquire()
cond.wait(None)
print "done"

The KeyboardInterrupt exception won’t be delivered until wait() returns, and it never returns, so the interrupt never happens. KeyboardInterrupt should almost certainly interrupt a condition wait.

Note that this doesn’t happen if a timeout is specified; cond.wait(1) will receive the interrupt immediately. So, a workaround is to specify a timeout. To do that, replace

    results = pool.map(slowly_square, range(40))

with

    results = pool.map_async(slowly_square, range(40)).get(9999999)

or similar.
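
On Python 3 the same workaround can be wrapped in a small helper. This is a sketch under assumptions: map_interruptibly and the one-hour default timeout are illustrative, and the pool is passed in so the caller decides how it is created.

```python
def map_interruptibly(pool, func, items, timeout=3600):
    """Like pool.map, but waits with a timeout so Ctrl+C reaches the parent."""
    try:
        # get() with a timeout is the workaround described above.
        return pool.map_async(func, items).get(timeout)
    except KeyboardInterrupt:
        pool.terminate()  # Stop the workers instead of letting them finish.
        pool.join()
        raise
```

It would be called as map_interruptibly(pool, slowly_square, range(40)) in place of pool.map. If the work takes longer than the timeout, get() raises multiprocessing.TimeoutError, so the value should comfortably exceed the expected run time.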


Answer 1

From what I have recently found, the best solution is to set up the worker processes to ignore SIGINT altogether, and confine all the cleanup code to the parent process. This fixes the problem for both idle and busy worker processes, and requires no error handling code in your child processes.

import multiprocessing
import signal

...

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

def main():
    pool = multiprocessing.Pool(size, init_worker)

    try:
        ...

    except KeyboardInterrupt:
        pool.terminate()
        pool.join()

Explanation and full example code can be found at http://noswap.com/blog/python-multiprocessing-keyboardinterrupt/ and http://github.com/jreese/multiprocessing-keyboardinterrupt respectively.
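
For reference, here is a filled-in variant of that outline for Python 3. It is a sketch rather than the linked author's code: the pool size of 4, the slowly_square task borrowed from the question, and the map_async(...).get(3600) call are assumptions chosen for illustration.

```python
import multiprocessing
import signal
import time


def init_worker():
    """Run once in each worker: ignore SIGINT so only the parent sees Ctrl+C."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def slowly_square(i):
    time.sleep(1)
    return i * i


def main():
    pool = multiprocessing.Pool(4, init_worker)
    try:
        # Waiting with a timeout keeps the parent responsive to Ctrl+C.
        results = pool.map_async(slowly_square, range(40)).get(3600)
        print(results)
    except KeyboardInterrupt:
        print("Cancelled; terminating workers")
        pool.terminate()
    else:
        pool.close()
    pool.join()
```

main() should be called under an if __name__ == "__main__": guard so that worker processes do not re-run it when they import the module.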


Answer 2

For some reason, only exceptions inherited from the base Exception class are handled normally. As a workaround, you may re-raise your KeyboardInterrupt as an Exception instance:

from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception): pass

def f(x):
    try:
        time.sleep(x)
        return x
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

def main():
    p = Pool(processes=4)
    try:
        print 'starting the pool map'
        print p.map(f, range(10))
        p.close()
        print 'pool map complete'
    except KeyboardInterrupt:
        print 'got ^C while pool mapping, terminating the pool'
        p.terminate()
        print 'pool is terminated'
    except Exception, e:
        print 'got exception: %r, terminating the pool' % (e,)
        p.terminate()
        print 'pool is terminated'
    finally:
        print 'joining pool processes'
        p.join()
        print 'join complete'
    print 'the end'

if __name__ == '__main__':
    main()

Normally you would get the following output:

starting the pool map
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pool map complete
joining pool processes
join complete
the end

So if you hit ^C, you will get:

starting the pool map
got ^C while pool mapping, terminating the pool
pool is terminated
joining pool processes
join complete
the end
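
The try/except inside f can also be factored into a reusable decorator. This is a minimal Python 3 sketch of the same idea; reraise_keyboard_interrupt is a hypothetical name, and the decorator is assumed to be applied to a module-level function so that the pool can still pickle it by name.

```python
import functools


class KeyboardInterruptError(Exception):
    """Ordinary exception standing in for KeyboardInterrupt inside a worker."""


def reraise_keyboard_interrupt(func):
    """Decorator (hypothetical helper): convert KeyboardInterrupt into KeyboardInterruptError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyboardInterrupt:
            # Raise something derived from Exception so it is handled normally.
            raise KeyboardInterruptError() from None
    return wrapper
```

Decorating the worker function with @reraise_keyboard_interrupt then replaces the explicit try/except in its body.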

Answer 3

Usually this simple structure works for Ctrl+C on a Pool:

import signal

def signal_handle(_signal, frame):
    print "Stopping the Jobs."

signal.signal(signal.SIGINT, signal_handle)

As was stated in a few similar posts:

Capture keyboardinterrupt in Python without try-except


Answer 4

The top-voted answer does not tackle the core issue but a similar side effect.

Jesse Noller, the author of the multiprocessing library, explains how to correctly deal with CTRL+C when using multiprocessing.Pool in an old blog post.

import signal
from multiprocessing import Pool


def initializer():
    """Ignore CTRL+C in the worker process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


pool = Pool(initializer=initializer)

try:
    pool.map(perform_download, downloads)
except KeyboardInterrupt:
    pool.terminate()
    pool.join()

Answer 5

It seems there are two issues that make exceptions while multiprocessing annoying. The first (noted by Glenn) is that you need to use map_async with a timeout instead of map in order to get an immediate response (i.e., don’t finish processing the entire list). The second (noted by Andrey) is that multiprocessing doesn’t catch exceptions that don’t inherit from Exception (e.g., SystemExit). So here’s my solution that deals with both of these:

import sys
import functools
import traceback
import multiprocessing

def _poolFunctionWrapper(function, arg):
    """Run function under the pool

    Wrapper around function to catch exceptions that don't inherit from
    Exception (which aren't caught by multiprocessing, so that you end
    up hitting the timeout).
    """
    try:
        return function(arg)
    except:
        cls, exc, tb = sys.exc_info()
        if issubclass(cls, Exception):
            raise # No worries
        # Need to wrap the exception with something multiprocessing will recognise
        import traceback
        print "Unhandled exception %s (%s):\n%s" % (cls.__name__, exc, traceback.format_exc())
        raise Exception("Unhandled exception: %s (%s)" % (cls.__name__, exc))

def _runPool(pool, timeout, function, iterable):
    """Run the pool

    Wrapper around pool.map_async, to handle timeout.  This is required so as to
    trigger an immediate interrupt on the KeyboardInterrupt (Ctrl-C); see
    http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool

    Further wraps the function in _poolFunctionWrapper to catch exceptions
    that don't inherit from Exception.
    """
    return pool.map_async(functools.partial(_poolFunctionWrapper, function), iterable).get(timeout)

def myMap(function, iterable, numProcesses=1, timeout=9999):
    """Run the function on the iterable, optionally with multiprocessing"""
    if numProcesses > 1:
        pool = multiprocessing.Pool(processes=numProcesses, maxtasksperchild=1)
        mapFunc = functools.partial(_runPool, pool, timeout)
    else:
        pool = None
        mapFunc = map
    results = mapFunc(function, iterable)
    if pool is not None:
        pool.close()
        pool.join()
    return results

Answer 6

I found, for the time being, the best solution is to not use the multiprocessing.pool feature but rather roll your own pool functionality. I provided an example demonstrating the error with apply_async as well as an example showing how to avoid using the pool functionality altogether.

http://www.bryceboe.com/2010/08/26/python-multiprocessing-and-keyboardinterrupt/


Answer 7

I’m new to Python. I looked everywhere for an answer and stumbled upon this question, along with a few other blogs and YouTube videos. I copied the author’s code above and tried to reproduce it with Python 2.7.13 on 64-bit Windows 7. It’s close to what I want to achieve.

I made the child processes ignore Ctrl-C and had the parent process terminate the pool. Letting the child processes swallow the interrupt seems to avoid this problem for me.

#!/usr/bin/python

from multiprocessing import Pool
from time import sleep
from sys import exit


def slowly_square(i):
    try:
        print "<slowly_square> Sleeping and later running a square calculation..."
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print "<child processor> Don't care if you say CtrlC"
        pass


def go():
    pool = Pool(8)

    try:
        results = pool.map(slowly_square, range(40))
    except KeyboardInterrupt:
        pool.terminate()
        pool.close()
        print "You cancelled the program!"
        exit(1)
    print "Finally, here are the results", results


if __name__ == '__main__':
    go()

The part starting at pool.terminate() never seems to execute.


Answer 8

You can try using the apply_async method of a Pool object, like this:

import multiprocessing
import time
from datetime import datetime


def test_func(x):
    time.sleep(2)
    return x**2


def apply_multiprocessing(input_list, input_function):
    pool_size = 5
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=10)

    try:
        jobs = {}
        for value in input_list:
            jobs[value] = pool.apply_async(input_function, [value])

        results = {}
        for value, result in jobs.items():
            try:
                results[value] = result.get()
            except KeyboardInterrupt:
                print "Interrupted by user"
                pool.terminate()
                break
            except Exception as e:
                results[value] = e
        return results
    except Exception:
        raise
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    iterations = range(100)
    t0 = datetime.now()
    results1 = apply_multiprocessing(iterations, test_func)
    t1 = datetime.now()
    print results1
    print "Multi: {}".format(t1 - t0)

    t2 = datetime.now()
    results2 = {i: test_func(i) for i in iterations}
    t3 = datetime.now()
    print results2
    print "Non-multi: {}".format(t3 - t2)

Output:

100
Multiprocessing run time: 0:00:41.131000
100
Non-multiprocessing run time: 0:03:20.688000

An advantage of this method is that results processed before interruption will be returned in the results dictionary:

>>> apply_multiprocessing(range(100), test_func)
Interrupted by user
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Answer 9

Strangely enough it looks like you have to handle the KeyboardInterrupt in the children as well. I would have expected this to work as written… try changing slowly_square to:

def slowly_square(i):
    try:
        sleep(1)
        return i * i
    except KeyboardInterrupt:
        print 'You EVIL bastard!'
        return 0

That should work as you expected.


Getting individual colors from a color map in matplotlib

Question: Getting individual colors from a color map in matplotlib

If you have a Colormap cmap, for example:

cmap = matplotlib.cm.get_cmap('Spectral')

How can you get a particular colour out of it between 0 and 1, where 0 is the first colour in the map and 1 is the last colour in the map?

Ideally, I would be able to get the middle colour in the map by doing:

>>> do_some_magic(cmap, 0.5) # Return an RGBA tuple
(0.1, 0.2, 0.3, 1.0)

Answer 0

The code in your question was actually very close to what you need: all you have to do is call the cmap object you have, as in the code below.

import matplotlib

cmap = matplotlib.cm.get_cmap('Spectral')  # on Matplotlib >= 3.9 use matplotlib.colormaps['Spectral']

rgba = cmap(0.5)
print(rgba) # (0.99807766255210428, 0.99923106502084169, 0.74602077638401709, 1.0)

For values outside of the range [0.0, 1.0] it will return the under and over colour (respectively). This, by default, is the minimum and maximum colour within the range (so 0.0 and 1.0). This default can be changed with cmap.set_under() and cmap.set_over().

For “special” numbers such as np.nan and np.inf the default is to use the 0.0 value; this can be changed using cmap.set_bad(), similarly to under and over above.

Finally it may be necessary for you to normalize your data such that it conforms to the range [0.0, 1.0]. This can be done using matplotlib.colors.Normalize simply as shown in the small example below where the arguments vmin and vmax describe what numbers should be mapped to 0.0 and 1.0 respectively.

import matplotlib

norm = matplotlib.colors.Normalize(vmin=10.0, vmax=20.0)

print(norm(15.0)) # 0.5
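
The linear mapping described above can also be sketched in plain Python. This is only an illustration of the arithmetic, assuming no clipping; the helper name normalize is made up here and is not part of matplotlib.

```python
def normalize(value, vmin, vmax):
    """Linearly map value so that vmin -> 0.0 and vmax -> 1.0 (no clipping).

    Hypothetical helper for illustration; it is not matplotlib's Normalize.
    """
    if vmax == vmin:
        raise ValueError("vmin and vmax must differ")
    return (value - vmin) / (vmax - vmin)


print(normalize(15.0, 10.0, 20.0))  # 0.5
print(normalize(10.0, 10.0, 20.0))  # 0.0
print(normalize(25.0, 10.0, 20.0))  # 1.5, outside [0, 1] because nothing is clipped
```

A result outside [0.0, 1.0], such as the last one, is the kind of value that would pick the under or over colour when passed to a colormap.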

A logarithmic normaliser (matplotlib.colors.LogNorm) is also available for data ranges with a large range of values.

(Thanks to both Joe Kington and tcaswell for suggestions on how to improve the answer.)


Answer 1

To get the RGBA values as integers instead of floats, we can do:

rgba = cmap(0.5,bytes=True)

So, building on the answer from Ffisegydd, the simplified code looks like this:

# import the colormap module and the normalizer
import matplotlib.colors
from matplotlib import cm

# normalize item number values to the colormap's [0, 1] range
norm = matplotlib.colors.Normalize(vmin=0, vmax=1000)

# possible colormaps include viridis, jet and Spectral
# 400 is a sample value between 0 and 1000
rgba_color = cm.jet(norm(400), bytes=True)

Answer 2

To build on the solutions from Ffisegydd and amaliammr, here’s an example where we make a CSV representation of a custom colormap:

#! /usr/bin/env python3
import matplotlib
import numpy as np 

vmin = 0.1
vmax = 1000

norm = matplotlib.colors.Normalize(np.log10(vmin), np.log10(vmax))
lognum = norm(np.log10([.5, 2., 10, 40, 150,1000]))

cdict = {
    'red':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 1, 1),
        (lognum[3], 0.8, 0.8),
        (lognum[4], .7, .7),
        (lognum[5], .7, .7)
    ),
    'green':
    (
        (0., .6, .6),
        (lognum[0], 0.8, 0.8),
        (lognum[1], 1, 1),
        (lognum[2], 1, 1),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
        (lognum[5], 0, 0)
    ),
    'blue':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 0, 0),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
        (lognum[5], 1, 1)
    )
}


mycmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256)   
norm = matplotlib.colors.LogNorm(vmin, vmax)
colors = {}
count = 0
step_size = 0.001
for value in np.arange(vmin, vmax+step_size, step_size):
    count += 1
    print("%d/%d %f%%" % (count, vmax*(1./step_size), 100.*count/(vmax*(1./step_size))))
    rgba = mycmap(norm(value), bytes=True)
    color = (rgba[0], rgba[1], rgba[2])
    if color not in colors.values():
        colors[value] = color

print ("value, red, green, blue")
for value in sorted(colors.keys()):
    rgb = colors[value]
    print("%s, %s, %s, %s" % (value, rgb[0], rgb[1], rgb[2]))

Answer 3

For completeness these are the cmap choices I encountered so far:

Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r, viridis, viridis_r, winter, winter_r


Python ElementTree module: How to ignore the namespace of XML files to locate matching elements when using the methods “find” and “findall”

Question: Python ElementTree module: How to ignore the namespace of XML files to locate matching elements when using the methods “find” and “findall”

I want to use the “findall” method of the ElementTree module to locate some elements in a source XML file.

However, the source XML file (test.xml) has a namespace. Here is a truncated part of the file as a sample:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

The sample python code is below:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Return None
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return <Element '{http://www.test.com}DEAL_LEVEL/PAID_OFF' at 0xb78b90>

Although this works, it’s very inconvenient to add the namespace “{http://www.test.com}” in front of each tag.

How can I ignore the namespace when using methods such as “find” and “findall”?


Answer 0

Instead of modifying the XML document itself, it’s best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    prefix, has_namespace, postfix = el.tag.partition('}')
    if has_namespace:
        el.tag = postfix  # strip all namespaces
root = it.root

This is based on the discussion here: http://bugs.python.org/issue18304

Update: rpartition instead of partition makes sure you get the tag name in postfix even if there is no namespace. Thus you could condense it:

for _, el in it:
    _, _, el.tag = el.tag.rpartition('}') # strip ns
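
As a minimal end-to-end sketch, the rpartition variant can be applied to a shortened copy of the sample file from the question. The function name parse_without_namespaces and the XML_TEXT constant are made up for this illustration, and the XML declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Shortened copy of the sample document from the question.
XML_TEXT = """<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""


def parse_without_namespaces(xml_text):
    """Parse xml_text and strip the '{uri}' prefix from every element tag."""
    it = ET.iterparse(StringIO(xml_text))
    for _, el in it:
        # rpartition returns the whole tag as the last item when '}' is absent
        _, _, el.tag = el.tag.rpartition('}')
    return it.root


root = parse_without_namespaces(XML_TEXT)
matches = root.findall("DEAL_LEVEL/PAID_OFF")
print([el.text for el in matches])  # ['N']
```

After the tags are rewritten, the un-namespaced path from the question matches as originally intended.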

Answer 1

If you remove the xmlns attribute from the xml before parsing it then there won’t be a namespace prepended to each tag in the tree.

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

Answer 2

The answers so far explicitly put the namespace value in the script. For a more generic solution, I would rather extract the namespace from the XML:

import re
def get_namespace(element):
    # return the '{uri}' prefix of the element's tag, or '' if there is none
    m = re.match(r'\{.*\}', element.tag)
    return m.group(0) if m else ''

Then use it in the find method:

namespace = get_namespace(tree.getroot())
print(tree.find('./{0}parent/{0}version'.format(namespace)).text)
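
For a concrete run, here is a small sketch that applies the same idea to a shortened copy of the sample XML from the question. The XML_TEXT constant and the DEAL_LEVEL/PAID_OFF path are taken from the question rather than from this answer, so treat the pairing as an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
XML_TEXT = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""


def get_namespace(element):
    """Return the '{uri}' prefix of the element's tag, or '' if there is none."""
    m = re.match(r'\{.*\}', element.tag)
    return m.group(0) if m else ''


root = ET.fromstring(XML_TEXT)
namespace = get_namespace(root)
print(namespace)  # {http://www.test.com}

# Prefix every step of the path with the extracted namespace.
path = '{0}DEAL_LEVEL/{0}PAID_OFF'.format(namespace)
print([el.text for el in root.findall(path)])  # ['N']
```

This only covers documents where all elements share the root's namespace; mixed-namespace documents would need a prefix per element.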

Answer 3

Here’s an extension to nonagon’s answer, which also strips namespaces off attributes:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

UPDATE: added list() so the iterator works (needed for Python 3)


Answer 4

Improving on the answer by ericspod:

Instead of changing the parse mode globally we can wrap this in an object supporting the with construct.

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
        self.oldcreate = expat.ParserCreate
        expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
        expat.ParserCreate = self.oldcreate

This can then be used as follows

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
    tree = ET.parse("test.xml")

The beauty of this way is that it does not change any behaviour for unrelated code outside the with block. I ended up creating this after getting errors in unrelated libraries after using the version by ericspod which also happened to use expat.


Answer 5

You can use the elegant string formatting construct as well:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

or, if you’re sure that PAID_OFF only appears in one level in tree:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)
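
As a related sketch that goes beyond this answer: on Python 3.8 and later, ElementTree paths accept a {*} wildcard that matches a tag in any namespace, so the URI does not have to be spelled out at all. The example below uses a shortened copy of the question's XML (XML_TEXT is a name made up here) and falls back to the string-formatting approach on older interpreters.

```python
import sys
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
XML_TEXT = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""

root = ET.fromstring(XML_TEXT)

if sys.version_info >= (3, 8):
    # '{*}tag' matches 'tag' in any namespace, or in none
    matches = root.findall("{*}DEAL_LEVEL/{*}PAID_OFF")
else:
    # older versions: spell out the namespace as in the answer above
    ns = "http://www.test.com"
    matches = root.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" % (ns, ns))

print([el.text for el in matches])  # ['N']
```

The wildcard leaves the parsed tree untouched, which avoids the tag-rewriting step used in other answers.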

Answer 6

If you’re using ElementTree and not cElementTree you can force Expat to ignore namespace processing by replacing ParserCreate():

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree tries to use Expat by calling ParserCreate() but provides no option to omit the namespace separator string. The above code causes the separator to be ignored, but be warned that this could break other things.


Answer 7

I might be late to this, but I don’t think re.sub is a good solution.

However, the xml.parsers.expat override does not work on Python 3.x.

The main culprit is xml/etree/ElementTree.py; see the bottom of its source code:

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

Which is kinda sad.

The solution is to get rid of it first.

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

Tested on Python 3.6.

The try statement is useful in case your code reloads or imports the module twice somewhere, which otherwise produces strange errors like:

  • maximum recursion depth exceeded
  • AttributeError: XMLParser

btw damn the etree source code looks really messy.


Answer 8

Let’s combine nonagon’s answer with mzjn’s answer to a related question:

from pathlib import Path
from typing import Dict, Tuple
import xml.etree.ElementTree as ET

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

Using this function we:

  1. Create an iterator to get both namespaces and a parsed tree object.

  2. Iterate over the created iterator to get the namespaces dict that we can later pass in each find() or findall() call, as suggested by iMom0.

  3. Return the parsed tree’s root element object and namespaces.

I think this is the best approach overall, as it involves no manipulation of either the source XML or the resulting parsed xml.etree.ElementTree output.

I’d also like to credit barny’s answer for providing an essential piece of this puzzle (that you can get the parsed root from the iterator). Until then, I actually traversed the XML tree twice in my application (once to get the namespaces, and a second time for the root).
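
A possible usage sketch, run against a shortened copy of the question's XML: the file is replaced by an in-memory StringIO, and the prefix "d" is made up here to give the default namespace (which is reported with an empty prefix) an explicit name for the lookup.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Shortened copy of the sample document from the question.
XML_TEXT = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""


def parse_xml(source):
    """Return (root, namespaces), where namespaces maps prefix -> URI."""
    xml_iter = ET.iterparse(source, events=["start-ns"])
    # each "start-ns" event carries a (prefix, uri) pair
    namespaces = dict(pair for _, pair in xml_iter)
    return xml_iter.root, namespaces


root, namespaces = parse_xml(StringIO(XML_TEXT))
print(namespaces)  # {'': 'http://www.test.com'}

# Map the default namespace to an explicit, hypothetical prefix "d".
ns = {"d": namespaces[""]}
print([el.text for el in root.findall("d:DEAL_LEVEL/d:PAID_OFF", ns)])  # ['N']
```

The tree itself keeps its namespaced tags; only the lookup path changes.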


Peak-finding algorithm for Python/SciPy

Question: Peak-finding algorithm for Python/SciPy

I can write something myself by finding zero-crossings of the first derivative or something, but it seems like a common-enough function to be included in standard libraries. Anyone know of one?

My particular application is a 2D array, but usually it would be used for finding peaks in FFTs, etc.

Specifically, in these kinds of problems, there are multiple strong peaks, and then lots of smaller “peaks” that are just caused by noise that should be ignored. These are just examples; not my actual data:

1-dimensional peaks:

2-dimensional peaks:

The peak-finding algorithm would find the location of these peaks (not just their values), and ideally would find the true inter-sample peak, not just the index with maximum value, probably using quadratic interpolation or something.

Typically you only care about a few strong peaks, so they’d either be chosen because they’re above a certain threshold, or because they’re the first n peaks of an ordered list, ranked by amplitude.

As I said, I know how to write something like this myself. I’m just asking if there’s a pre-existing function or package that’s known to work well.

Update:

I translated a MATLAB script and it works decently for the 1-D case, but could be better.

Updated update:

sixtenbe created a better version for the 1-D case.


Answer 0

The function scipy.signal.find_peaks, as its name suggests, is useful for this. But to get good peak extraction, it’s important to understand its parameters width, threshold, distance and, above all, prominence.

According to my tests and the documentation, the concept of prominence is “the useful concept” to keep the good peaks, and discard the noisy peaks.

What is (topographic) prominence? It is “the minimum height necessary to descend to get from the summit to any higher terrain”, as it can be seen here:

The idea is:

The higher the prominence, the more “important” the peak is.
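
To make the definition concrete, here is a rough pure-Python sketch of how prominence can be computed for a 1-D signal. The helper names local_maxima and prominence and the sample data are made up for this illustration; this is not SciPy's implementation, only the same idea written out by hand.

```python
def local_maxima(y):
    """Indices of samples strictly greater than both direct neighbours."""
    return [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1]]


def prominence(y, peak):
    """Height of y[peak] above the higher of its two surrounding valleys."""
    # Walk left until a sample higher than the peak (or the edge) is reached,
    # remembering the lowest value seen on the way.
    left_min = y[peak]
    i = peak - 1
    while i >= 0 and y[i] <= y[peak]:
        left_min = min(left_min, y[i])
        i -= 1
    # Same walk to the right.
    right_min = y[peak]
    i = peak + 1
    while i < len(y) and y[i] <= y[peak]:
        right_min = min(right_min, y[i])
        i += 1
    return y[peak] - max(left_min, right_min)


signal = [0, 1, 0.8, 1.1, 0, 5, 0, 3, 2.5, 3.2, 0]
peaks = local_maxima(signal)
print(peaks)  # [1, 3, 5, 7, 9]

# Keep only peaks that rise at least 1 above their surroundings.
strong = [p for p in peaks if prominence(signal, p) >= 1]
print(strong)  # [3, 5, 9]
```

The small bumps at indices 1 and 7 are local maxima, but they sit right next to slightly higher neighbours, so their prominence is low and they are dropped.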

Test:

I used a (noisy) frequency-varying sinusoid on purpose because it shows many difficulties. We can see that the width parameter is not very useful here because if you set a minimum width too high, then it won’t be able to track very close peaks in the high frequency part. If you set width too low, you would have many unwanted peaks in the left part of the signal. Same problem with distance. threshold only compares with the direct neighbours, which is not useful here. prominence is the one that gives the best solution. Note that you can combine many of these parameters!

Code:

import numpy as np
import matplotlib.pyplot as plt 
from scipy.signal import find_peaks

x = np.sin(2*np.pi*(2**np.linspace(2,10,1000))*np.arange(1000)/48000) + np.random.normal(0, 1, 1000) * 0.15
peaks, _ = find_peaks(x, distance=20)
peaks2, _ = find_peaks(x, prominence=1)      # BEST!
peaks3, _ = find_peaks(x, width=20)
peaks4, _ = find_peaks(x, threshold=0.4)     # Required vertical distance to its direct neighbouring samples, pretty useless
plt.subplot(2, 2, 1)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.subplot(2, 2, 2)
plt.plot(peaks2, x[peaks2], "ob"); plt.plot(x); plt.legend(['prominence'])
plt.subplot(2, 2, 3)
plt.plot(peaks3, x[peaks3], "vg"); plt.plot(x); plt.legend(['width'])
plt.subplot(2, 2, 4)
plt.plot(peaks4, x[peaks4], "xk"); plt.plot(x); plt.legend(['threshold'])
plt.show()
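
As a rough sketch of combining the find_peaks parameters discussed above, prominence and distance can be passed together. The toy signal and the values 0.3 and 10 are assumptions chosen for illustration, not values from the answer:

```python
import numpy as np
from scipy.signal import find_peaks

# Deterministic toy signal (an assumption for illustration):
# three bumps of different heights plus a small, fast ripple.
t = np.linspace(0, 1, 500)
x = (np.exp(-((t - 0.2) / 0.02) ** 2)
     + 0.6 * np.exp(-((t - 0.5) / 0.02) ** 2)
     + 0.8 * np.exp(-((t - 0.8) / 0.02) ** 2)
     + 0.02 * np.sin(2 * np.pi * 60 * t))

# Without any condition, every ripple maximum counts as a peak.
all_peaks, _ = find_peaks(x)

# Combining conditions: a minimum prominence AND a minimum spacing in samples.
strong_peaks, props = find_peaks(x, prominence=0.3, distance=10)

print(len(all_peaks), "local maxima in total")
print("kept peaks at t =", t[strong_peaks])
print("their prominences:", props["prominences"])
```

The returned properties dictionary contains the measured prominences, which helps when choosing a sensible cutoff for your own data.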

Answer 1

I’m looking at a similar problem, and I’ve found that some of the best references come from chemistry (from peak finding in mass-spec data). For a good, thorough review of peak-finding algorithms, read this. It is one of the clearest reviews of peak-finding techniques that I’ve run across. (Wavelets are the best for finding peaks of this sort in noisy data.)

It looks like your peaks are clearly defined and aren’t hidden in the noise. That being the case, I’d recommend using smooth Savitzky-Golay derivatives to find the peaks. (If you just differentiate the data above, you’ll have a mess of false positives.) This is a very effective technique and is pretty easy to implement (you do need a matrix class with basic operations). If you simply find the zero crossings of the first S-G derivative, I think you’ll be happy.
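
A minimal sketch of the Savitzky-Golay idea, using scipy.signal.savgol_filter rather than a hand-written matrix implementation. The synthetic data, the window length, the polynomial order and the 0.2 height cutoff (used only to ignore zero crossings on the flat, noisy baseline) are all assumptions for illustration:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic data (an assumption): two clear peaks plus mild noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
clean = np.exp(-((t - 0.3) / 0.05) ** 2) + 0.7 * np.exp(-((t - 0.7) / 0.05) ** 2)
y = clean + rng.normal(0, 0.02, t.size)

# Smoothed signal and smoothed first derivative from the same S-G fit.
smooth = savgol_filter(y, window_length=101, polyorder=3)
dy = savgol_filter(y, window_length=101, polyorder=3, deriv=1)

# A maximum is where the derivative changes from positive to non-positive.
crossings = np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0]

# Keep only crossings that sit on an actual peak, not on the baseline.
peaks = crossings[smooth[crossings] > 0.2]

print("peaks near t =", t[peaks])
```

The window length controls the trade-off: a wider window suppresses more noise in the derivative but flattens narrow peaks.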


Answer 2

There is a function in SciPy named scipy.signal.find_peaks_cwt which sounds like it is suitable for your needs; however, I don’t have experience with it, so I cannot recommend it.

http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
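
For orientation only, here is a small sketch of how find_peaks_cwt is called. The half sine wave and the range of widths are assumptions for illustration; on real data the widths should roughly cover the expected peak widths in samples:

```python
import numpy as np
from scipy.signal import find_peaks_cwt

# Toy data (an assumption): half a sine wave with one broad maximum at pi/2.
xs = np.arange(0, np.pi, 0.05)
data = np.sin(xs)

# widths lists the wavelet widths (in samples) that are tried.
peak_idx = find_peaks_cwt(data, widths=np.arange(1, 10))

print("peak indices:", peak_idx)
print("peak positions:", xs[peak_idx])
```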


Answer 3

For those not sure which peak-finding algorithm to use in Python, here is a quick overview of the alternatives: https://github.com/MonsieurV/py-findpeaks

Wanting an equivalent to the MATLAB findpeaks function myself, I’ve found that the detect_peaks function from Marcos Duarte is a good catch.

Pretty easy to use:

import numpy as np
from vector import vector, plot_peaks
from libs import detect_peaks
print('Detect peaks with minimum height and distance filters.')
# mph: minimum peak height; mpd: minimum peak distance (in samples)
indexes = detect_peaks.detect_peaks(vector, mph=7, mpd=2)
print('Peaks are: %s' % (indexes))

Which will give you:


Answer 4

Detecting peaks in a spectrum in a reliable way has been studied quite a bit, for example in all the work on sinusoidal modelling for music/audio signals in the 1980s. Look for “Sinusoidal Modeling” in the literature.

If your signals are as clean as the example, a simple “give me something with an amplitude higher than its N neighbours” should work reasonably well. If you have noisy signals, a simple but effective way is to look at your peaks over time and track them: you then detect spectral lines instead of spectral peaks. In other words, you compute the FFT on a sliding window of your signal to get a set of spectra over time (also called a spectrogram). You then look at the evolution of the spectral peaks over time (i.e. in consecutive windows).
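
The “higher than its N neighbours” rule for clean data can be sketched with scipy.signal.argrelextrema, whose order argument is the number of neighbours compared on each side. The hand-made spectrum below is hypothetical:

```python
import numpy as np
from scipy.signal import argrelextrema

# Hypothetical, hand-made magnitude spectrum (one value per frequency bin).
spectrum = np.array([0, 1, 3, 7, 3, 1, 0, 2, 1, 0, 1, 5, 9, 5, 1, 0], dtype=float)

# A bin is a peak if it is strictly greater than its N neighbours on each side.
N = 3
(peak_bins,) = argrelextrema(spectrum, np.greater, order=N)

print(peak_bins.tolist())  # [3, 12]
```

The small bump at bin 7 is rejected because a larger value lies within three bins of it. For the noisy case, the same test could be applied to each column of a spectrogram and the resulting bins tracked across consecutive windows.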


Answer 5

I do not think that what you are looking for is provided by SciPy. In this situation I would write the code myself.

The spline interpolation and smoothing from scipy.interpolate are quite nice and might be helpful for fitting the peaks and then finding the locations of their maxima.
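
One way to sketch the spline approach is to fit a quartic spline and look for the roots of its derivative. The sampled sine and s=0 (pure interpolation) are assumptions for illustration; noisy data would need a positive smoothing factor s, tuned by hand:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Coarsely sampled toy data (an assumption): maxima at pi/2 and 5*pi/2.
x = np.linspace(0, 10, 50)
y = np.sin(x)

# k=4 so that the first derivative is a cubic spline,
# which is the only degree for which roots() is available.
spline = UnivariateSpline(x, y, k=4, s=0)
d1 = spline.derivative(1)
d2 = spline.derivative(2)

# Stationary points are roots of the first derivative;
# maxima are those with a negative second derivative.
stationary = d1.roots()
maxima = stationary[d2(stationary) < 0]

print("maxima at x =", maxima)
print("peak values:", spline(maxima))
```

Because the maxima come from the fitted curve, they are not restricted to the sample positions.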


Answer 6

There are standard statistical functions and methods for finding outliers in data, which is probably what you need in the first case. Using derivatives would solve your second. However, I’m not sure of a method that handles both continuous functions and sampled data.


Answer 7

First things first: the definition of “peak” is vague without further specification. For example, in the following series, would you call 5-4-5 one peak or two?

1-2-1-2-1-1-5-4-5-1-1-5-1

In this case, you’ll need at least two thresholds: 1) a high threshold, above which an extreme value can register as a peak; and 2) a low threshold, so that extreme values separated by small values below it become two peaks.

Peak detection is a well-studied topic in the Extreme Value Theory literature, where it is also known as “declustering of extreme values”. Its typical applications include identifying hazard events based on continuous readings of environmental variables, e.g. analysing wind speed to detect storm events.
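
The two-threshold idea above can be sketched in a few lines. The function name decluster_peaks and the threshold values are hypothetical, and this is only one simple reading of the rule, not a standard library routine:

```python
def decluster_peaks(values, high, low):
    """Return one peak index per cluster of exceedances.

    A cluster opens at the first value above `high` and closes as soon as
    a value drops below `low`; the cluster's first maximum is its peak.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    peaks = []
    best = None  # index of the largest value in the currently open cluster
    for i, v in enumerate(values):
        if v < low:
            if best is not None:  # the open cluster ends here
                peaks.append(best)
                best = None
        elif v > high and (best is None or v > values[best]):
            best = i
    if best is not None:  # the series ended inside a cluster
        peaks.append(best)
    return peaks


series = [1, 2, 1, 2, 1, 1, 5, 4, 5, 1, 1, 5, 1]

# The 4 does not drop below low=2, so 5-4-5 counts as one peak.
print(decluster_peaks(series, high=4.5, low=2))    # [6, 11]

# With low=4.5 the 4 separates the two 5s into two peaks.
print(decluster_peaks(series, high=4.5, low=4.5))  # [6, 8, 11]
```

The answer to “one peak or two?” therefore depends entirely on where the low threshold sits relative to the dip.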