Tag archive: Python

In Tensorflow, get the names of all the Tensors in a graph

Question: In Tensorflow, get the names of all the Tensors in a graph

I am creating neural nets with Tensorflow and skflow; for some reason I want to get the values of some inner tensors for a given input, so I am using myClassifier.get_layer_value(input, "tensorName"), myClassifier being a skflow.estimators.TensorFlowEstimator.

However, I find it difficult to find the correct syntax of the tensor name, even knowing its name (and I’m getting confused between operation and tensors), so I’m using tensorboard to plot the graph and look for the name.

Is there a way to enumerate all the tensors in a graph without using tensorboard?


Answer 0

You can do

[n.name for n in tf.get_default_graph().as_graph_def().node]

Also, if you are prototyping in an IPython notebook, you can show the graph directly in notebook, see show_graph function in Alexander’s Deep Dream notebook


Answer 1

There is a way to do it slightly faster than in Yaroslav’s answer by using get_operations. Here is a quick example:

import tensorflow as tf

a = tf.constant(1.3, name='const_a')
b = tf.Variable(3.1, name='variable_b')
c = tf.add(a, b, name='addition')
d = tf.multiply(c, a, name='multiply')

for op in tf.get_default_graph().get_operations():
    print(str(op.name))

Answer 2

I’ll try to summarize the answers:

To get all nodes: (type tensorflow.core.framework.node_def_pb2.NodeDef)

all_nodes = [n for n in tf.get_default_graph().as_graph_def().node]

To get all ops: (type tensorflow.python.framework.ops.Operation)

all_ops = tf.get_default_graph().get_operations()

To get all variables: (type tensorflow.python.ops.resource_variable_ops.ResourceVariable)

all_vars = tf.global_variables()

To get all tensors: (type tensorflow.python.framework.ops.Tensor)

all_tensors = [tensor for op in tf.get_default_graph().get_operations() for tensor in op.values()]

To get the graph in Tensorflow 2, instead of tf.get_default_graph() you need to instantiate a tf.function first and access the graph attribute, for example:

graph = func.get_concrete_function().graph

where func is a tf.function


Answer 3

tf.all_variables() (renamed tf.global_variables() in later TensorFlow 1.x releases) can get you the information you want.

Also, this commit made today in TensorFlow Learn provides a function get_variable_names in estimator that you can use to retrieve all variable names easily.


Answer 4

I think this will do too:

print(tf.contrib.graph_editor.get_tensors(tf.get_default_graph()))

But compared with Salvador and Yaroslav’s answers, I don’t know which one is better.


Answer 5

The accepted answer only gives you a list of strings with the names. I prefer a different approach, which gives you (almost) direct access to the tensors:

graph = tf.get_default_graph()
list_of_tuples = [op.values() for op in graph.get_operations()]

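list_of_tuples now contains every tensor, each within a tuple. You could also adapt it to get the first output tensor of each operation directly:

graph = tf.get_default_graph()
# skip operations that produce no output tensors, which would otherwise raise an IndexError
list_of_tensors = [op.values()[0] for op in graph.get_operations() if op.values()]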

Answer 6

Since the OP asked for the list of the tensors instead of the list of operations/nodes, the code should be slightly different:

graph = tf.get_default_graph()    
tensors_per_node = [node.values() for node in graph.get_operations()]
tensor_names = [tensor.name for tensors in tensors_per_node for tensor in tensors]
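
To make the operation/tensor distinction concrete: in a TensorFlow graph, a tensor's name is the name of the operation that produces it, followed by a colon and the output index (for example, the output of the op named addition is addition:0). The helper below is a hypothetical, pure-Python sketch (split_tensor_name is not part of TensorFlow) that splits such a name back into its two parts:

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:index' into (op_name, index).

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    op_name, sep, index = tensor_name.rpartition(':')
    if not sep or not index.isdigit():
        raise ValueError('not a tensor name: %r' % tensor_name)
    return op_name, int(index)


print(split_tensor_name('addition:0'))  # ('addition', 0)
print(split_tensor_name('model/classifier/dense/MatMul:0'))
```

An operation name such as addition carries no :index suffix, which is likely why passing it where a tensor name is expected fails.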

Answer 7

Previous answers are good, I’d just like to share a utility function I wrote to select Tensors from a graph:

def get_graph_op(graph, and_conds=None, op='and', or_conds=None):
    """Selects nodes' names in the graph if:
    - The name contains all items in and_conds
    - OR/AND depending on op
    - The name contains any item in or_conds

    Conditions starting with a "!" are negated.
    Returns all ops if no optional arguments are given.

    Args:
        graph (tf.Graph): The graph containing sought tensors
        and_conds (list(str), optional): Defaults to None.
            "and" conditions
        op (str, optional): Defaults to 'and'. 
            How to link the and_conds and or_conds:
            with an 'and' or an 'or'
        or_conds (list(str), optional): Defaults to None.
            "or conditions"

    Returns:
        list(str): list of relevant tensor names
    """
    assert op in {'and', 'or'}

    if and_conds is None:
        and_conds = ['']
    if or_conds is None:
        or_conds = ['']

    node_names = [n.name for n in graph.as_graph_def().node]

    ands = {
        n for n in node_names
        if all(
            cond in n if not cond.startswith('!')
            else cond[1:] not in n
            for cond in and_conds
        )}

    ors = {
        n for n in node_names
        if any(
            cond in n if not cond.startswith('!')
            else cond[1:] not in n
            for cond in or_conds
        )}

    if op == 'and':
        return [
            n for n in node_names
            if n in ands.intersection(ors)
        ]
    elif op == 'or':
        return [
            n for n in node_names
            if n in ands.union(ors)
        ]

So if you have a graph with ops:

['model/classifier/dense/kernel',
'model/classifier/dense/kernel/Assign',
'model/classifier/dense/kernel/read',
'model/classifier/dense/bias',
'model/classifier/dense/bias/Assign',
'model/classifier/dense/bias/read',
'model/classifier/dense/MatMul',
'model/classifier/dense/BiasAdd',
'model/classifier/ArgMax/dimension',
'model/classifier/ArgMax']

Then running

get_graph_op(tf.get_default_graph(), ['dense', '!kernel'], 'or', ['Assign'])

returns:

['model/classifier/dense/kernel/Assign',
'model/classifier/dense/bias',
'model/classifier/dense/bias/Assign',
'model/classifier/dense/bias/read',
'model/classifier/dense/MatMul',
'model/classifier/dense/BiasAdd']
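
The selection logic does not depend on TensorFlow itself, so it can be tried on a plain list of names. The following is a simplified sketch, not the function above: filter_names is a hypothetical name, it takes the names directly instead of a graph, and it treats a condition as negated only when it starts with "!".

```python
def filter_names(names, and_conds=None, op='and', or_conds=None):
    """Pure-Python sketch of the name filter; hypothetical helper."""
    assert op in {'and', 'or'}
    # An empty string is contained in every name, so it acts as "no condition".
    and_conds = and_conds or ['']
    or_conds = or_conds or ['']

    def matches(cond, name):
        # A leading "!" negates the condition.
        if cond.startswith('!'):
            return cond[1:] not in name
        return cond in name

    def keep(name):
        in_ands = all(matches(c, name) for c in and_conds)
        in_ors = any(matches(c, name) for c in or_conds)
        return (in_ands and in_ors) if op == 'and' else (in_ands or in_ors)

    return [n for n in names if keep(n)]


names = [
    'model/classifier/dense/kernel',
    'model/classifier/dense/kernel/Assign',
    'model/classifier/dense/bias',
    'model/classifier/dense/MatMul',
    'model/classifier/ArgMax',
]
print(filter_names(names, ['dense', '!kernel'], 'or', ['Assign']))
```

With this shortened list, the call keeps kernel/Assign (it matches the "or" condition) plus bias and MatMul (they contain dense and not kernel).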

Answer 8

This worked for me:

for n in tf.get_default_graph().as_graph_def().node:
    print('\n',n)

Cosine Similarity between 2 Number Lists

Question: Cosine Similarity between 2 Number Lists

I need to calculate the cosine similarity between two lists, let’s say for example list 1 which is dataSetI and list 2 which is dataSetII. I cannot use anything such as numpy or a statistics module. I must use common modules (math, etc) (and as few modules as possible, at that, to reduce time spent).

Let’s say dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The lengths of the lists are always equal.

Of course, the cosine similarity is between 0 and 1, and for the sake of it, it will be rounded to the third or fourth decimal with format(round(cosine, 3)).

Thank you very much in advance for helping.


Answer 0

You should try SciPy. It has a bunch of useful scientific routines, for example “routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices.” It uses the superfast optimized NumPy for its number crunching. See here for installing.

Note that spatial.distance.cosine computes the distance, and not the similarity. So, you must subtract the value from 1 to get the similarity.

from scipy import spatial

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)

Answer 1

Another version based on numpy only:

from numpy import dot
from numpy.linalg import norm

a = [3, 45, 7, 2]    # dataSetI from the question
b = [2, 54, 13, 15]  # dataSetII from the question
cos_sim = dot(a, b)/(norm(a)*norm(b))

Answer 2

You can use the cosine_similarity function from sklearn.metrics.pairwise (docs):

In [23]: from sklearn.metrics.pairwise import cosine_similarity

In [24]: cosine_similarity([[1, 0, -1]], [[-1,-1, 0]])
Out[24]: array([[-0.5]])

Answer 3

I don’t suppose performance matters much here, but I can’t resist. The zip() function completely recopies both vectors (more of a matrix transpose, actually) just to get the data in “Pythonic” order. It would be interesting to time the nuts-and-bolts implementation:

import math
def cosine_similarity(v1,v2):
    "compute cosine similarity of v1 to v2: (v1 dot v2)/(||v1||*||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)

v1,v2 = [3, 45, 7, 2], [2, 54, 13, 15]
print(v1, v2, cosine_similarity(v1,v2))

Output: [3, 45, 7, 2] [2, 54, 13, 15] 0.972284251712

That goes through the C-like noise of extracting elements one-at-a-time, but does no bulk array copying and gets everything important done in a single for loop, and uses a single square root.

ETA: Updated print call to be a function. (The original was Python 2.7, not 3.3. The current runs under Python 2.7 with a from __future__ import print_function statement.) The output is the same, either way.

CPython 2.7.3 on 3.0GHz Core 2 Duo:

>>> timeit.timeit("cosine_similarity(v1,v2)",setup="from __main__ import cosine_similarity, v1, v2")
2.4261788514654654
>>> timeit.timeit("cosine_measure(v1,v2)",setup="from __main__ import cosine_measure, v1, v2")
8.794677709375264

So, the unpythonic way is about 3.6 times faster in this case.
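
The timings above are for CPython 2.7, where zip() builds a full list of tuples. On Python 3 zip() is lazy, so the gap may look different. Here is a minimal sketch for re-running the comparison on your own machine; cosine_zip is a hypothetical zip-based counterpart written for this illustration, and no timing results are claimed here:

```python
import math
import timeit


def cosine_loop(v1, v2):
    # Index-based single pass, as in the answer above.
    sumxx = sumxy = sumyy = 0
    for i in range(len(v1)):
        x, y = v1[i], v2[i]
        sumxx += x * x
        sumyy += y * y
        sumxy += x * y
    return sumxy / math.sqrt(sumxx * sumyy)


def cosine_zip(v1, v2):
    # zip-based variant (hypothetical counterpart for comparison).
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)


v1, v2 = [3, 45, 7, 2], [2, 54, 13, 15]
for fn in (cosine_loop, cosine_zip):
    seconds = timeit.timeit(lambda: fn(v1, v2), number=100000)
    print(fn.__name__, round(fn(v1, v2), 6), seconds)
```

Both functions should agree on the similarity value; only the elapsed time is expected to differ between them and between interpreters.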


Answer 4

Without using any imports,

math.sqrt(x)

can be replaced with

x ** .5

Without numpy.dot() you have to write your own dot function, here using a generator expression:

def dot(A,B): 
    return (sum(a*b for a,b in zip(A,B)))

And then it’s just a simple matter of applying the cosine similarity formula:

def cosine_similarity(a,b):
    return dot(a,b) / ( (dot(a,a) **.5) * (dot(b,b) ** .5) )
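
Applied to the lists from the question, the two functions above can be used as follows. This is a small usage sketch; the rounding step mirrors the format(round(cosine, 3)) call mentioned in the question:

```python
def dot(A, B):
    return sum(a * b for a, b in zip(A, B))


def cosine_similarity(a, b):
    return dot(a, b) / ((dot(a, a) ** .5) * (dot(b, b) ** .5))


dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

cosine = cosine_similarity(dataSetI, dataSetII)
print(format(round(cosine, 3)))  # prints 0.972
```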

Answer 5

I did a benchmark based on several answers to the question, and the following snippet appears to be the best choice:

import math
import operator


def dot_product2(v1, v2):
    return sum(map(operator.mul, v1, v2))


def vector_cos5(v1, v2):
    prod = dot_product2(v1, v2)
    len1 = math.sqrt(dot_product2(v1, v1))
    len2 = math.sqrt(dot_product2(v2, v2))
    return prod / (len1 * len2)

It surprised me that the scipy-based implementation is not the fastest one. I profiled it and found that cosine in scipy spends a lot of time converting a vector from a Python list to a NumPy array.


Answer 6

import math

def dot_product(v1, v2):
    # zip is lazy in Python 3, so itertools.izip (Python 2 only) is not needed
    return sum(map(lambda x: x[0] * x[1], zip(v1, v2)))

def cosine_measure(v1, v2):
    prod = dot_product(v1, v2)
    len1 = math.sqrt(dot_product(v1, v1))
    len2 = math.sqrt(dot_product(v2, v2))
    return prod / (len1 * len2)

You can round it after computing:

cosine = format(round(cosine_measure(v1, v2), 3))

If you want it really short, you can use this one-liner:

from functools import reduce
from math import sqrt

def cosine_measure(v1, v2):
    # accumulate (sum of x*y, sum of x*x, sum of y*y) in a single pass over the pairs
    return (lambda t: t[0] / sqrt(t[1] * t[2]))(reduce(lambda x, y: (x[0] + y[0] * y[1], x[1] + y[0]**2, x[2] + y[1]**2), zip(v1, v2), (0, 0, 0)))

Answer 7

You can do this in Python using a simple function. It expects each vector as a mapping from index to value, so the lists are converted with dict(enumerate(...)) before the call:

import math

def get_cosine(vec1, vec2):
    # only keys present in both vectors contribute to the dot product
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum([vec1[x] * vec2[x] for x in intersection])
    sum1 = sum([vec1[x]**2 for x in vec1.keys()])
    sum2 = sum([vec2[x]**2 for x in vec2.keys()])
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    if not denominator:
        return 0.0
    else:
        return round(float(numerator) / denominator, 3)

dataSet1 = [3, 45, 7, 2]
dataSet2 = [2, 54, 13, 15]
get_cosine(dict(enumerate(dataSet1)), dict(enumerate(dataSet2)))

Answer 8

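Using numpy, compare one list of numbers to multiple lists (a matrix):

import numpy as np

def cosine_similarity(vector, matrix):
    # vector and matrix are expected to be numpy arrays; note that [::-1] reverses the order of the results
    return ( np.sum(vector*matrix,axis=1) / ( np.sqrt(np.sum(matrix**2,axis=1)) * np.sqrt(np.sum(vector**2)) ) )[::-1]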

Answer 9

You can use this simple function to calculate the cosine similarity:

import math

def cosine_similarity(a, b):
    return sum([i*j for i,j in zip(a, b)])/(math.sqrt(sum([i*i for i in a]))* math.sqrt(sum([i*i for i in b])))

Answer 10

If you happen to be using PyTorch already, you should go with their CosineSimilarity implementation.

Suppose you have two n-dimensional numpy.ndarrays, v1 and v2, i.e. their shapes are both (n,). Here’s how you get their cosine similarity:

import torch
import torch.nn as nn

cos = nn.CosineSimilarity()
cos(torch.tensor([v1]), torch.tensor([v2])).item()

Or suppose you have two numpy.ndarrays w1 and w2, whose shapes are both (m, n). The following gets you a list of cosine similarities, each being the cosine similarity between a row in w1 and the corresponding row in w2:

cos(torch.tensor(w1), torch.tensor(w2)).tolist()

Answer 11

All the answers are great for situations where you cannot use NumPy. If you can, here is another approach:

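import numpy as np

EPSILON = 1e-07  # small constant that keeps the denominator away from zero

def cosine(x, y):
    dot_products = np.dot(x, y.T)
    norm_products = np.linalg.norm(x) * np.linalg.norm(y)
    return dot_products / (norm_products + EPSILON)

Also bear in mind that EPSILON = 1e-07 is there to guard the division against a zero denominator.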
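
The same zero-denominator concern applies without NumPy. Below is a minimal standard-library sketch, assuming you would rather return 0.0 for a zero vector than add a small constant. safe_cosine is a hypothetical name, and returning 0.0 is a convention chosen here, not something the answer prescribes:

```python
import math


def safe_cosine(x, y):
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_product = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm_product == 0:
        return 0.0  # undefined mathematically; 0.0 is an assumed convention
    return dot / norm_product


print(safe_cosine([0, 0, 0], [1, 2, 3]))      # prints 0.0
print(round(safe_cosine([1, 0], [1, 0]), 6))  # prints 1.0
```

Unlike the EPSILON approach, this leaves the result for non-zero vectors unchanged, at the cost of an explicit branch.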


Why does += behave unexpectedly on lists?

Question: Why does += behave unexpectedly on lists?

The += operator in python seems to be operating unexpectedly on lists. Can anyone tell me what is going on here?

class foo:  
     bar = []
     def __init__(self,x):
         self.bar += [x]


class foo2:
     bar = []
     def __init__(self,x):
          self.bar = self.bar + [x]

f = foo(1)
g = foo(2)
print f.bar
print g.bar 

f.bar += [3]
print f.bar
print g.bar

f.bar = f.bar + [4]
print f.bar
print g.bar

f = foo2(1)
g = foo2(2)
print f.bar 
print g.bar 

OUTPUT

[1, 2]
[1, 2]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]
[1]
[2]

foo += bar seems to affect every instance of the class, whereas foo = foo + bar seems to behave in the way I would expect things to behave.

The += operator is called a “compound assignment operator”.


Answer 0

The general answer is that += tries to call the __iadd__ special method, and if that isn’t available it tries to use __add__ instead. So the issue is with the difference between these special methods.

The __iadd__ special method is for an in-place addition, that is, it mutates the object that it acts on. The __add__ special method returns a new object and is also used for the standard + operator.

So when the += operator is used on an object which has an __iadd__ defined the object is modified in place. Otherwise it will instead try to use the plain __add__ and return a new object.

That is why for mutable types like lists += changes the object’s value, whereas for immutable types like tuples, strings and integers a new object is returned instead (a += b becomes equivalent to a = a + b).

For types that support both __iadd__ and __add__ you therefore have to be careful which one you use. a += b will call __iadd__ and mutate a, whereas a = a + b will create a new object and assign it to a. They are not the same operation!

>>> a1 = a2 = [1, 2]
>>> b1 = b2 = [1, 2]
>>> a1 += [3]          # Uses __iadd__, modifies a1 in-place
>>> b1 = b1 + [3]      # Uses __add__, creates new list, assigns it to b1
>>> a2
[1, 2, 3]              # a1 and a2 are still the same list
>>> b2
[1, 2]                 # whereas only b1 was changed

For immutable types (where you don’t have an __iadd__) a += b and a = a + b are equivalent. This is what lets you use += on immutable types, which might seem a strange design decision until you consider that otherwise you couldn’t use += on immutable types like numbers!
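
The fallback can be observed directly with two small classes. In this sketch, WithIadd and WithoutIadd are hypothetical names made up for the illustration; only the first defines __iadd__:

```python
class WithoutIadd:
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Returns a brand-new object; self is left untouched.
        return type(self)(self.items + list(other))


class WithIadd(WithoutIadd):
    def __iadd__(self, other):
        # Mutates self and returns it, so the name stays bound to the same object.
        self.items.extend(other)
        return self


a = alias_a = WithIadd([1, 2])
a += [3]
print(a is alias_a, alias_a.items)   # True [1, 2, 3]

b = alias_b = WithoutIadd([1, 2])
b += [3]                             # falls back to __add__
print(b is alias_b, alias_b.items)   # False [1, 2]
```

In the second case b is rebound to a new object, which is the same thing that happens with tuples, strings and numbers.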


Answer 1

For the general case, see Scott Griffith’s answer. When dealing with lists like you are, though, the += operator is a shorthand for someListObject.extend(iterableObject). See the documentation of extend().

The extend function will append all elements of the parameter to the list.

When doing foo += something you’re modifying the list foo in place, thus you don’t change the reference that the name foo points to, but you’re changing the list object directly. With foo = foo + something, you’re actually creating a new list.

This example code will explain it:

>>> l = []
>>> id(l)
13043192
>>> l += [3]
>>> id(l)
13043192
>>> l = l + [3]
>>> id(l)
13059216

Note how the reference changes when you reassign the new list to l.

As bar is a class variable instead of an instance variable, modifying in place will affect all instances of that class. But when redefining self.bar, the instance will have a separate instance variable self.bar without affecting the other class instances.
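
One visible consequence of += behaving like extend() is that it accepts any iterable, whereas + on a list requires another list. A short sketch:

```python
items = [1, 2]
alias = items

items += (3, 4)                # like items.extend((3, 4)); a tuple is fine
print(items, alias is items)   # [1, 2, 3, 4] True

try:
    items = items + (5, 6)     # list + tuple is not supported
except TypeError as exc:
    print('TypeError:', exc)

print(alias)                   # [1, 2, 3, 4]
```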


Answer 2

The problem here is, bar is defined as a class attribute, not an instance variable.

In foo, the class attribute is modified in the init method, that’s why all instances are affected.

In foo2, an instance variable is defined using the (empty) class attribute, and every instance gets its own bar.

The “correct” implementation would be:

class foo:
    def __init__(self, x):
        self.bar = [x]

Of course, class attributes are completely legal. In fact, you can access and modify them without creating an instance of the class like this:

class foo:
    bar = []

foo.bar = [x]

Answer 3

There are two things involved here:

1. class attributes and instance attributes
2. difference between the operators + and += for lists

+ operator calls the __add__ method on a list. It takes all the elements from its operands and makes a new list containing those elements maintaining their order.

+= operator calls __iadd__ method on the list. It takes an iterable and appends all the elements of the iterable to the list in place. It does not create a new list object.

In class foo the statement self.bar += [x] does not build a new list. It translates to

self.bar = self.bar.__iadd__([x])  # modifies the class attribute's list in place, then binds self.bar to that same list

which modifies the list in place and acts like the list method extend.

In class foo2, on the contrary, the assignment statement in the init method

self.bar = self.bar + [x]  

can be deconstructed as:
The instance has no attribute bar (there is a class attribute of the same name, though) so it accesses the class attribute bar and creates a new list by appending x to it. The statement translates to:

self.bar = self.bar.__add__([x]) # bar on the rhs is the class attribute

Then it creates an instance attribute bar and assigns the newly created list to it. Note that bar on the rhs of the assignment is different from the bar on the lhs.

For instances of class foo, bar is a class attribute and not instance attribute. Hence any change to the class attribute bar will be reflected for all instances.

On the contrary, each instance of the class foo2 has its own instance attribute bar which is different from the class attribute of the same name bar.

f = foo2(4)
print f.bar # accessing the instance attribute. prints [4]  
print f.__class__.bar # accessing the class attribute. prints []  

Hope this clears things.


Answer 4

Although much time has passed and many correct things were said, there is no answer which bundles both effects.

You have 2 effects:

  1. a “special”, maybe unnoticed behaviour of lists with += (as stated by Scott Griffiths)
  2. the fact that class attributes as well as instance attributes are involved (as stated by Can Berk Büder)

In class foo, the __init__ method modifies the class attribute. This is because self.bar += [x] translates to self.bar = self.bar.__iadd__([x]). __iadd__() is for in-place modification, so it modifies the list and returns a reference to it.

Note that the instance dict is modified although this would normally not be necessary as the class dict already contains the same assignment. So this detail goes almost unnoticed, except if you do a foo.bar = [] afterwards. Here the instance’s bar stays the same thanks to the said fact.

In class foo2, however, the class’s bar is used, but not touched. Instead, a [x] is added to it, forming a new object, as self.bar.__add__([x]) is called here, which doesn’t modify the object. The result is then put into the instance dict, giving the instance the new list as its own attribute, while the class’s attribute stays unmodified.

The distinction between ... = ... + ... and ... += ... also affects the assignments afterwards:

f = foo(1) # adds 1 to the class's bar and assigns f.bar to this as well.
g = foo(2) # adds 2 to the class's bar and assigns g.bar to this as well.
# Here, foo.bar, f.bar and g.bar refer to the same object.
print f.bar # [1, 2]
print g.bar # [1, 2]

f.bar += [3] # adds 3 to this object
print f.bar # As these still refer to the same object,
print g.bar # the output is the same.

f.bar = f.bar + [4] # Construct a new list with the values of the old ones, 4 appended.
print f.bar # Print the new one
print g.bar # Print the old one.

f = foo2(1) # Here a new list is created on every call.
g = foo2(2)
print f.bar # So these all only have one element.
print g.bar 

You can verify the identity of the objects with print id(foo.bar), id(f.bar), id(g.bar) (don’t forget the additional ()s if you are on Python3).

BTW: The += operator is called “augmented assignment” and is generally intended to do in-place modifications as far as possible.
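
The two effects described above can be reproduced in a few lines. This is a minimal Python 3 sketch; the class names Foo and Foo2 are assumed stand-ins for the foo and foo2 classes from the question, which are not shown in this answer.

```python
class Foo:
    bar = []  # class attribute, shared by every instance

    def __init__(self, x):
        # Augmented assignment: list.__iadd__ mutates the shared list in place,
        # then that same list object is bound in the instance dict.
        self.bar += [x]


class Foo2:
    bar = []

    def __init__(self, x):
        # Binary +: list.__add__ builds a new list; the class attribute is untouched.
        self.bar = self.bar + [x]


f, g = Foo(1), Foo(2)
print(Foo.bar, f.bar is Foo.bar, g.bar is Foo.bar)  # [1, 2] True True

f2, g2 = Foo2(1), Foo2(2)
print(Foo2.bar, f2.bar, g2.bar)  # [] [1] [2]

# The instance dict is written in both cases.
print("bar" in vars(f), "bar" in vars(f2))  # True True
```

In both classes the instance ends up with its own "bar" entry; the difference is only whether that entry points at the shared class list or at a fresh one.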


回答 5

其他答案似乎几乎涵盖了所有内容,尽管它似乎值得引用和参考增值作业PEP 203

他们(增强的赋值运算符)实现了与普通二进制形式相同的运算符,不同之处在于,当左侧对象支持该操作时“就地”执行该操作,并且左侧仅被评估一次。

Python中的增强赋值背后的想法是,它不仅是编写将二进制运算的结果存储在其左操作数中的通用实践的简便方法,而且还是一种用于所讨论的左操作数的方法。知道它应该“自己”运行,而不是创建自己的修改副本。

The other answers would seem to pretty much have it covered, though it seems worth quoting and referring to the Augmented Assignments PEP 203:

They [the augmented assignment operators] implement the same operator as their normal binary form, except that the operation is done `in-place’ when the left-hand side object supports it, and that the left-hand side is only evaluated once.

The idea behind augmented assignment in Python is that it isn’t just an easier way to write the common practice of storing the result of a binary operation in its left-hand operand, but also a way for the left-hand operand in question to know that it should operate `on itself’, rather than creating a modified copy of itself.
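
The phrase "when the left-hand side object supports it" in the PEP quote can be illustrated by comparing a list, which defines __iadd__, with a tuple, which does not. This is a small sketch for illustration, not part of the PEP itself.

```python
a = [1, 2]
alias = a
a += [3]  # list defines __iadd__, so the existing object is mutated
print(alias is a, alias)  # True [1, 2, 3]

t = (1, 2)
t_alias = t
t += (3,)  # tuple has no __iadd__, so this falls back to t = t + (3,)
print(t_alias is t, t_alias, t)  # False (1, 2) (1, 2, 3)

print(hasattr(list, "__iadd__"), hasattr(tuple, "__iadd__"))  # True False
```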


回答 6

>>> elements=[[1],[2],[3]]
>>> subset=[]
>>> subset+=elements[0:1]
>>> subset
[[1]]
>>> elements
[[1], [2], [3]]
>>> subset[0][0]='change'
>>> elements
[['change'], [2], [3]]

>>> a=[1,2,3,4]
>>> b=a
>>> a+=[5]
>>> a,b
([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
>>> a=[1,2,3,4]
>>> b=a
>>> a=a+[5]
>>> a,b
([1, 2, 3, 4, 5], [1, 2, 3, 4])

回答 7

>>> a = 89
>>> id(a)
4434330504
>>> a = 89 + 1
>>> print(a)
90
>>> id(a)
4430689552  # this is different from before!

>>> test = [1, 2, 3]
>>> id(test)
48638344L
>>> test2 = test
>>> id(test)
48638344L
>>> test2 += [4]
>>> id(test)
48638344L
>>> print(test, test2)
([1, 2, 3, 4], [1, 2, 3, 4])
>>> id(test2)
48638344L # same ID: test2 and test are the same list

我们看到,当我们尝试修改不可变对象(在这种情况下为整数)时,Python只是给了我们一个不同的对象。另一方面,我们能够对可变对象(列表)进行更改,并使其始终保持不变。

参考:https://medium.com/@tyastropheus/tricky-python-i-memory-management-for-mutable-immutable-objects-21507d1e5b95

另请参阅以下网址以了解浅拷贝和深拷贝

https://www.geeksforgeeks.org/copy-python-deep-copy-shallow-copy/

>>> a = 89
>>> id(a)
4434330504
>>> a = 89 + 1
>>> print(a)
90
>>> id(a)
4430689552  # this is different from before!

>>> test = [1, 2, 3]
>>> id(test)
48638344L
>>> test2 = test
>>> id(test)
48638344L
>>> test2 += [4]
>>> id(test)
48638344L
>>> print(test, test2)
([1, 2, 3, 4], [1, 2, 3, 4])
>>> id(test2)
48638344L # same ID: test2 and test are the same list

We see that when we attempt to modify an immutable object (an integer in this case), Python simply gives us a different object instead. On the other hand, we are able to make changes to a mutable object (a list) and have it remain the same object throughout.

ref: https://medium.com/@tyastropheus/tricky-python-i-memory-management-for-mutable-immutable-objects-21507d1e5b95

See also the URL below for an explanation of shallow copy and deep copy:

https://www.geeksforgeeks.org/copy-python-deep-copy-shallow-copy/


在运行时检查Python模块版本

问题:在运行时检查Python模块版本

许多第三方Python模块都有一个属性,该属性保存该模块的版本信息(通常是module.VERSION或module.__version__),但是有些则没有。

此类模块的特定示例是libxslt和libxml2。

我需要检查这些模块在运行时是否使用了正确的版本。有没有办法做到这一点?

潜在的解决方案是在运行时读取源代码,对其进行哈希处理,然后将其与已知版本的哈希进行比较,但这很讨厌。

有更好的解决方案吗?

Many third-party Python modules have an attribute which holds the version information for the module (usually something like module.VERSION or module.__version__), however some do not.

Particular examples of such modules are libxslt and libxml2.

I need to check that the correct version of these modules is being used at runtime. Is there a way to do this?

A potential solution would be to read in the source at runtime, hash it, and then compare it to the hash of the known version, but that’s nasty.

Is there a better solution?


回答 0

我会远离哈希。使用的libxslt版本可能包含某种补丁,但不会影响您的使用。

作为一种替代方法,我建议您不要在运行时检查(不知道这是否很困难)。对于我编写的具有外部依赖性(第3方库)的python东西,我编写了一个脚本,用户可以运行该脚本来检查其python安装,以查看是否安装了适当的模块版本。

对于没有定义的“版本”属性的模块,您可以检查其包含的接口(类和方法),并查看它们是否与期望的接口匹配。然后,在您正在使用的实际代码中,假设第3方模块具有您期望的接口。

I’d stay away from hashing. The version of libxslt being used might contain some type of patch that doesn’t affect your use of it.

As an alternative, I’d like to suggest that you don’t check at run time (don’t know if that’s a hard requirement or not). For the python stuff I write that has external dependencies (3rd party libraries), I write a script that users can run to check their python install to see if the appropriate versions of modules are installed.

For the modules that don’t have a defined ‘version’ attribute, you can inspect the interfaces they contain (classes and methods) and see if they match the interface you expect. Then in the actual code that you’re working on, assume that the 3rd party modules have the interface you expect.
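
An interface check of this kind could look roughly like the sketch below. The helper name has_interface and the REQUIRED mapping are made up for illustration, and the standard-library json module stands in for the third-party module being checked.

```python
import inspect
import json  # stand-in for the third-party module under inspection


def has_interface(module, required):
    """Return True if `module` exposes every callable named in `required`
    and each one accepts the listed parameter names."""
    for func_name, param_names in required.items():
        func = getattr(module, func_name, None)
        if not callable(func):
            return False
        params = inspect.signature(func).parameters
        if any(name not in params for name in param_names):
            return False
    return True


# Hypothetical requirement: the code relies on dumps(obj, indent=...) and loads(s).
REQUIRED = {"dumps": ["obj", "indent"], "loads": ["s"]}

print(has_interface(json, REQUIRED))
print(has_interface(json, {"dumps": ["no_such_parameter"]}))
```

Note that inspect.signature may not work for every function implemented in C, so a check like this is best kept to the handful of entry points the calling code really depends on.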


回答 1

使用pkg_resources。从PyPI安装的所有内容至少应具有版本号。

>>> import pkg_resources
>>> pkg_resources.get_distribution("blogofile").version
'0.7.1'

Use pkg_resources. Anything installed from PyPI at least should have a version number.

>>> import pkg_resources
>>> pkg_resources.get_distribution("blogofile").version
'0.7.1'

回答 2

一些想法:

  1. 尝试检查所需版本中存在的功能或不存在的功能。
  2. 如果没有函数差异,请检查函数参数和签名。
  3. 如果无法从函数签名中找出问题,请在导入时设置一些存根调用并检查其行为。

Some ideas:

  1. Try checking for functions that exist or don’t exist in your needed versions.
  2. If there are no function differences, inspect function arguments and signatures.
  3. If you can’t figure it out from function signatures, set up some stub calls at import time and check their behavior.

回答 3

您可以使用

pip freeze

以需求格式查看已安装的软件包。

You can use

pip freeze

to see the installed packages in requirements format.


回答 4

您可以importlib_metadata为此使用库。

如果您使用的是python < 3.8,请首先使用以下命令进行安装:

pip install importlib_metadata

从python开始,3.8它就包含在python的标准库中。

然后,要检查软件包的版本(在本示例中为lxml),请运行:

>>> from importlib_metadata import version
>>> version('lxml')
'4.3.1'

请记住,这仅适用于从PyPI安装的软件包。同样,您必须将包名称作为version方法的参数传递,而不是此包提供的模块名称(尽管它们通常是相同的)。

If you’re on python >=3.8 you can use a module from the built-in library for that. To check a package’s version (in this example lxml) run:

>>> from importlib.metadata import version
>>> version('lxml')
'4.3.1'

This functionality has been ported to older versions of python (<3.8) as well, but you need to install a separate library first:

pip install importlib_metadata

and then to check a package’s version (in this example lxml) run:

>>> from importlib_metadata import version
>>> version('lxml')
'4.3.1'

Keep in mind that this works only for packages installed from PyPI. Also, you must pass a package name as an argument to the version method, rather than a module name that this package provides (although they’re usually the same).
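
Both variants can be combined so the same code runs on old and new interpreters. The sketch below assumes the importlib_metadata backport is installed on Python < 3.8; the helper name installed_version is made up for illustration.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
except ImportError:
    from importlib_metadata import PackageNotFoundError, version  # backport


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("pip"))
print(installed_version("surely-not-an-installed-distribution"))
```

Returning None instead of letting the exception propagate makes it easy to turn a missing dependency into a friendly error message at startup.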


回答 5

我发现使用各种可用的工具(包括此其他答案中pkg_resources提到的最好的一种)非常不可靠,因为它们中的大多数都不能涵盖所有情况。例如

  • 内置模块
  • 未安装但仅添加到python路径的模块(例如,通过您的IDE)
  • 可以使用同一模块的两个版本(在python路径中取代已安装的一个)

由于我们需要一种可靠的方法来获取任何软件包,模块或子模块的版本,因此我最终编写了getversion。使用起来非常简单:

from getversion import get_module_version
import foo
version, details = get_module_version(foo)

有关详细信息,请参见文档

I found it quite unreliable to use the various tools available (including the best one pkg_resources mentioned by this other answer), as most of them do not cover all cases. For example

  • built-in modules
  • modules not installed but just added to the python path (by your IDE for example)
  • two versions of the same module available (one in python path superseding the one installed)

Since we needed a reliable way to get the version of any package, module or submodule, I ended up writing getversion. It is quite simple to use:

from getversion import get_module_version
import foo
version, details = get_module_version(foo)

See the documentation for details.


回答 6

对于不提供__version__的模块,以下方法虽然难看,但可以使用:

#!/usr/bin/env python3.6
import sys
import os
import subprocess
import re

sp = subprocess.run(["pip3", "show", "numpy"], stdout=subprocess.PIPE)
ver = sp.stdout.decode('utf-8').strip().split('\n')[1]
res = re.search(r'^Version: (.*)$', ver)
print(res.group(1))

或者

#!/usr/bin/env python3.7
import sys
import os
import subprocess
import re

sp = subprocess.run(["pip3", "show", "numpy"], capture_output=True)
ver = sp.stdout.decode('utf-8').strip().split('\n')[1]
res = re.search(r'^Version: (.*)$', ver)
print(res.group(1))

For modules which do not provide __version__ the following is ugly but works:

#!/usr/bin/env python3.6
import sys
import os
import subprocess
import re

sp = subprocess.run(["pip3", "show", "numpy"], stdout=subprocess.PIPE)
ver = sp.stdout.decode('utf-8').strip().split('\n')[1]
res = re.search(r'^Version: (.*)$', ver)
print(res.group(1))

or

#!/usr/bin/env python3.7
import sys
import os
import subprocess
import re

sp = subprocess.run(["pip3", "show", "numpy"], capture_output=True)
ver = sp.stdout.decode('utf-8').strip().split('\n')[1]
res = re.search(r'^Version: (.*)$', ver)
print(res.group(1))
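
The scripts above assume that the Version line is always the second line of the pip show output. A slightly more defensive sketch searches the whole output instead and calls pip through the running interpreter. The helper names parse_version and pip_show_version are made up for illustration, and the sample text is invented rather than real pip output.

```python
import re
import subprocess
import sys


def parse_version(pip_show_output):
    """Find the 'Version:' line wherever it appears in `pip show` output."""
    match = re.search(r"^Version:\s*(.+)$", pip_show_output, re.MULTILINE)
    return match.group(1).strip() if match else None


def pip_show_version(dist_name):
    """Run `pip show` with the current interpreter's pip; None if not installed."""
    proc = subprocess.run(
        [sys.executable, "-m", "pip", "show", dist_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode != 0:
        return None
    return parse_version(proc.stdout.decode("utf-8"))


# Invented sample text, used here so the parsing can be shown without running pip.
sample = "Name: numpy\nVersion: 1.16.2\nSummary: array processing\n"
print(parse_version(sample))  # 1.16.2
```

Using sys.executable -m pip avoids picking up a pip3 on the PATH that belongs to a different Python installation.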

在matplotlib中的次要y轴上添加y轴标签

问题:在matplotlib中的次要y轴上添加y轴标签

我可以使用将y标签添加到左侧的y轴plt.ylabel,但是如何将其添加到辅助y轴呢?

table = sql.read_frame(query,connection)

table[0].plot(color=colors[0],ylim=(0,100))
table[1].plot(secondary_y=True,color=colors[1])
plt.ylabel('$')

I can add a y label to the left y-axis using plt.ylabel, but how can I add it to the secondary y-axis?

table = sql.read_frame(query,connection)

table[0].plot(color=colors[0],ylim=(0,100))
table[1].plot(secondary_y=True,color=colors[1])
plt.ylabel('$')

回答 0

最好的方法是直接与axes对象进行交互

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10, 0.1)
y1 = 0.05 * x**2
y2 = -1 *y1

fig, ax1 = plt.subplots()

ax2 = ax1.twinx()
ax1.plot(x, y1, 'g-')
ax2.plot(x, y2, 'b-')

ax1.set_xlabel('X data')
ax1.set_ylabel('Y1 data', color='g')
ax2.set_ylabel('Y2 data', color='b')

plt.show()

The best way is to interact with the axes object directly

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10, 0.1)
y1 = 0.05 * x**2
y2 = -1 *y1

fig, ax1 = plt.subplots()

ax2 = ax1.twinx()
ax1.plot(x, y1, 'g-')
ax2.plot(x, y2, 'b-')

ax1.set_xlabel('X data')
ax1.set_ylabel('Y1 data', color='g')
ax2.set_ylabel('Y2 data', color='b')

plt.show()


回答 1

有一个简单的解决方案,无需折腾matplotlib:只用pandas。

调整原始示例:

table = sql.read_frame(query,connection)

ax = table[0].plot(color=colors[0],ylim=(0,100))
ax2 = table[1].plot(secondary_y=True,color=colors[1], ax=ax)

ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')

基本上,当secondary_y=True给定选项时(即使ax=ax也传递了),它会pandas.plot返回不同的轴,我们将使用这些轴来设置标签。

我知道很早以前就已经回答了,但是我认为这种方法值得。

There is a straightforward solution without messing with matplotlib: just pandas.

Tweaking the original example:

table = sql.read_frame(query,connection)

ax = table[0].plot(color=colors[0],ylim=(0,100))
ax2 = table[1].plot(secondary_y=True,color=colors[1], ax=ax)

ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')

Basically, when the secondary_y=True option is given (even though ax=ax is passed too), pandas.plot returns a different axes object, which we use to set the label.

I know this was answered long ago, but I think this approach is worth mentioning.


回答 2

我目前无法使用Python,但凭记忆大概是这样:

fig = plt.figure()

axes1 = fig.add_subplot(111)
# set props for left y-axis here

axes2 = axes1.twinx()   # mirror them
axes2.set_ylabel(...)

I don’t have access to Python right now, but off the top of my head:

fig = plt.figure()

axes1 = fig.add_subplot(111)
# set props for left y-axis here

axes2 = axes1.twinx()   # mirror them
axes2.set_ylabel(...)

Python如何管理int和long?

问题:Python如何管理int和long?

有人知道Python如何在内部管理int和long类型吗?

  • 它会动态选择合适的类型吗?
  • 一个整数的限制是多少?
  • 我正在使用Python 2.6,与以前的版本有所不同吗?

我应该如何理解以下代码?

>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>

更新:

>>> print type(0x7fffffff)
<type 'int'>
>>> print type(0x80000000)
<type 'long'>

Does anybody know how Python manages the int and long types internally?

  • Does it choose the right type dynamically?
  • What is the limit for an int?
  • I am using Python 2.6. Is it different from previous versions?

How should I understand the code below?

>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>

Update:

>>> print type(0x7fffffff)
<type 'int'>
>>> print type(0x80000000)
<type 'long'>

回答 0

int和long在几个版本之前已被“统一”。在此之前,可以通过数学运算使int溢出。

3.x通过完全消除long而仅具有int来进一步提高了此功能。

  • Python 2:sys.maxint包含Python int可以容纳的最大值。
    • 在64位Python 2.7上,大小为24个字节。可用sys.getsizeof()检查。
  • Python 3:sys.maxsize包含Python int可以达到的最大字节数。
    • 在32位上这将是千兆字节级,在64位上则是EB级。
    • 如此大的int的值约为2的(8 * sys.maxsize)次方。

int and long were “unified” a few versions back. Before that it was possible to overflow an int through math ops.

3.x has further advanced this by eliminating long altogether and only having int.

  • Python 2: sys.maxint contains the maximum value a Python int can hold.
    • On a 64-bit Python 2.7, the size is 24 bytes. Check with sys.getsizeof().
  • Python 3: sys.maxsize contains the maximum size in bytes a Python int can be.
    • This will be gigabytes in 32 bits, and exabytes in 64 bits.
    • Such a large int would have a value on the order of 2 to the power of (8 * sys.maxsize).
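
A quick way to see that Python 3 integers are not capped at the native word size is to step past sys.maxsize. This is a minimal sketch, assuming a CPython 3 interpreter.

```python
import sys

# sys.maxsize is the largest value of the native signed word:
# 2**31 - 1 on a 32-bit build, 2**63 - 1 on a 64-bit build.
print(sys.maxsize == 2 ** sys.maxsize.bit_length() - 1)  # True

big = sys.maxsize + 1  # one past the native word: no overflow, no type change
print(type(big) is int, big > sys.maxsize)  # True True

print(type(2 ** 200) is int)  # True
```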

回答 1

PEP应该有所帮助。

最重要的是,在python版本> 2.4中,您真的不必担心它

This PEP should help.

Bottom line is that you really shouldn’t have to worry about it in python versions > 2.4


回答 2

在我的机器上:

>>> print type(1<<30)
<type 'int'>
>>> print type(1<<31)
<type 'long'>
>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'long'>

Python使用整数(32位带符号整数,我不知道它们是否是C整数)适合适合32位的值,但是对于任何东西,它都会自动切换为长整数(任意大的位数,即bignums)更大。我猜想这将加快速度以提供较小的值,同时通过无缝过渡到bignum避免任何溢出。

On my machine:

>>> print type(1<<30)
<type 'int'>
>>> print type(1<<31)
<type 'long'>
>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'long'>

Python uses ints (32 bit signed integers, I don’t know if they are C ints under the hood or not) for values that fit into 32 bit, but automatically switches to longs (arbitrarily large number of bits – i.e. bignums) for anything larger. I’m guessing this speeds things up for smaller values while avoiding any overflows with a seamless transition to bignums.


回答 3

有趣。在我的64位(i7 Ubuntu)盒子上:

>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'int'>

猜猜它在更大的机器上提升到64位整数。

Interesting. On my 64-bit (i7 Ubuntu) box:

>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'int'>

Guess it steps up to 64 bit ints on a larger machine.


回答 4

Python 2.7.9自动提升数字。对于不确定使用int()或long()的情况。

>>> a = int("123")
>>> type(a)
<type 'int'>
>>> a = int("111111111111111111111111111111111111111111111111111")
>>> type(a)
<type 'long'>

Python 2.7.9 auto-promotes numbers, which helps when you are unsure whether to use int() or long().

>>> a = int("123")
>>> type(a)
<type 'int'>
>>> a = int("111111111111111111111111111111111111111111111111111")
>>> type(a)
<type 'long'>

回答 5

Python 2将根据值的大小自动设置类型。最大值指南可在下面找到。

Python 2中默认int的最大值为sys.maxint(32位构建上为2147483647,大多数64位构建上为9223372036854775807),任何高于此值的值都会成为long

例如:

>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>

在Python 3中,长数据类型已被删除,所有整数值都由Int类处理。Int的默认大小将取决于您的CPU体系结构。

例如:

  • 32位系统,整数的默认数据类型为’Int32′
  • 64位系统,整数的默认数据类型将为’Int64′

每种类型的最小值/最大值可在下面找到:

  • Int8:[-128,127]
  • Int16:[-32768,32767]
  • Int32:[-2147483648,2147483647]
  • Int64:[-9223372036854775808,9223372036854775807]
  • Int128:[-170141183460469231731687303715884105728,170141183460469231731687303715884105727]
  • UInt8:[0,255]
  • UInt16:[0,65535]
  • UInt32:[0,4294967295]
  • UInt64:[0,18446744073709551615]
  • UInt128:[0,340282366920938463463374607431768211455]

如果您的int大小超过上述限制,Python将自动分配更多内存来处理更大的值。在Python 2中,它会转换为“long”;在Python 3中,它仍然是int,因为唯一的int类型具有任意精度。

示例:在32位构建的Python 2上,int的最大值默认为2147483647。如果分配的值为2147483648或更大,则类型将更改为long。

有多种方法可以检查int的大小及其内存分配。注意:在Python 3中,无论值有多大,使用内置的type()函数总是会返回<class 'int'>。

Python 2 will automatically set the type based on the size of the value. A guide of max values can be found below.

The max value of the default int in Python 2 is sys.maxint (2147483647 on a 32-bit build, 9223372036854775807 on most 64-bit builds); anything above that will be a long.

For example:

>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>

In Python 3 the long datatype has been removed and all integer values are handled by the Int class. The default size of Int will depend on your CPU architecture.

For example:

  • 32 bit systems the default datatype for integers will be ‘Int32’
  • 64 bit systems the default datatype for integers will be ‘Int64’

The min/max values of each type can be found below:

  • Int8: [-128,127]
  • Int16: [-32768,32767]
  • Int32: [-2147483648,2147483647]
  • Int64: [-9223372036854775808,9223372036854775807]
  • Int128: [-170141183460469231731687303715884105728,170141183460469231731687303715884105727]
  • UInt8: [0,255]
  • UInt16: [0,65535]
  • UInt32: [0,4294967295]
  • UInt64: [0,18446744073709551615]
  • UInt128: [0,340282366920938463463374607431768211455]

If the size of your int exceeds the limits mentioned above, Python will automatically allocate more memory to handle the larger value. In Python 2 the value is converted into a ‘long’; in Python 3 it stays an int, because the single int type has arbitrary precision.

Example: on a 32-bit build of Python 2, the max value of an int is 2147483647 by default. If a value of 2147483648 or more is assigned, the type changes to long.

There are different ways to check the size of an int and its memory allocation. Note: in Python 3, the built-in type() function will always return <class 'int'> no matter how large the value is.
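
Two such checks are int.bit_length(), which reports how many bits the value needs, and sys.getsizeof(), which reports how many bytes the object occupies. The sketch below assumes CPython 3; the exact byte counts vary between builds, so none are shown.

```python
import sys

for value in (1, 2 ** 31 - 1, 2 ** 31, 2 ** 63, 2 ** 200):
    # The type name stays 'int' while the bit length and memory footprint grow.
    print(type(value).__name__, value.bit_length(), sys.getsizeof(value))

small_size = sys.getsizeof(1)
large_size = sys.getsizeof(2 ** 200)
print(large_size > small_size)  # True
```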


回答 6

从python 3.x开始,统一整数库比旧版本更加智能。在(i7 Ubuntu)盒子上,我得到了以下信息:

>>> import math
>>> type(math.factorial(30))
<class 'int'>

有关实现的详细信息,请参见Include/longintrepr.h, Objects/longobject.c and Modules/mathmodule.c文件。最后一个文件是动态模块(编译为so文件)。该代码很好地遵循。

From Python 3.x onward, the unified integer implementation is even smarter than in older versions. On my (i7 Ubuntu) box I got the following:

>>> import math
>>> type(math.factorial(30))
<class 'int'>

For implementation details, refer to the Include/longintrepr.h, Objects/longobject.c and Modules/mathmodule.c files. The last file is a dynamic module (compiled to an .so file). The code is well commented and easy to follow.


回答 7

只是为了继续这里给出的所有答案,尤其是@James Lanes

整数类型的大小可以通过以下公式表示:

总范围=(2 ^位系统)

下限=-(2 ^位系统)* 0.5

上限=((2 ^位系统)* 0.5)-1

Just to add to all the answers given here, especially @James Lanes:

The size of the integer type can be expressed by these formulas:

total range = (2 ^ bit system)

lower limit = -(2 ^ bit system)*0.5

upper limit = ((2 ^ bit system)*0.5) - 1
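
These formulas translate directly into code. In this sketch the function names signed_range and unsigned_range are made up for illustration, and "bit system" is read as the bit width of a fixed-size integer type.

```python
def signed_range(bits):
    """Return (lower, upper) for a two's-complement integer of `bits` bits."""
    half = 2 ** (bits - 1)  # same as (2 ** bits) * 0.5, kept in exact integers
    return -half, half - 1


def unsigned_range(bits):
    """Return (lower, upper) for an unsigned integer of `bits` bits."""
    return 0, 2 ** bits - 1


print(signed_range(8))     # (-128, 127)
print(signed_range(32))    # (-2147483648, 2147483647)
print(unsigned_range(16))  # (0, 65535)
```

Using 2 ** (bits - 1) rather than multiplying by 0.5 keeps the arithmetic in integers, which matters for widths such as 64 or 128 where a float would lose precision.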


回答 8

它之所以能管理它们,是因为int和long是同级类定义。它们具有用于+,-,*,/等的适当方法,这些方法将产生适当类的结果。

例如

>>> a=1<<30
>>> type(a)
<type 'int'>
>>> b=a*2
>>> type(b)
<type 'long'>

在这种情况下,该类int具有一个__mul__方法(实现*的方法),该方法可long在需要时创建结果。

It manages them because int and long are sibling class definitions. They have appropriate methods for +, -, *, /, etc., that will produce results of the appropriate class.

For example

>>> a=1<<30
>>> type(a)
<type 'int'>
>>> b=a*2
>>> type(b)
<type 'long'>

In this case, the class int has a __mul__ method (the one that implements *) which creates a long result when required.


如何抑制熊猫未来警告?

问题:如何抑制熊猫未来警告?

当我运行程序时,Pandas每次都会发出如下“未来警告”。

D:\Python\lib\site-packages\pandas\core\frame.py:3581: FutureWarning: rename with inplace=True  will return None from pandas 0.11 onward
  " from pandas 0.11 onward", FutureWarning) 

我得到了消息,但我只是想一次又一次地停止Pandas显示这样的消息,是否可以设置任何buildin参数,以使Pandas不会弹出“未来警告”?

When I run the program, Pandas gives ‘Future warning’ like below every time.

D:\Python\lib\site-packages\pandas\core\frame.py:3581: FutureWarning: rename with inplace=True  will return None from pandas 0.11 onward
  " from pandas 0.11 onward", FutureWarning) 

I get the message, but I just want to stop Pandas from showing it again and again. Is there any built-in parameter that I can set so that Pandas does not pop up the ‘Future warning’?


回答 0

github上发现了这个…

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas

Found this on github

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas
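
If silencing FutureWarning for the whole process is too broad, the same filter can be limited to a block with warnings.catch_warnings(). The sketch below uses only the standard library; legacy_call is a made-up stand-in for a library call that emits a FutureWarning, and the outer recording context exists only to make the effect observable.

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for a library function that emits a FutureWarning.
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning in this outer scope
    legacy_call()
    seen_without_filter = len(caught)

    with warnings.catch_warnings():  # filters are restored when this block exits
        warnings.simplefilter(action="ignore", category=FutureWarning)
        result = legacy_call()
    seen_with_filter = len(caught) - seen_without_filter

print(seen_without_filter, seen_with_filter, result)  # 1 0 42
```

In real code only the inner with block is needed, wrapped around the call that triggers the warning.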

回答 1

@bdiamante的答案可能只会部分帮助您。如果您在取消警告后仍然收到消息,那是因为pandas库本身正在打印消息。除非您自己编辑Pandas源代码,否则您将无能为力。也许内部有一个抑制它们的选项,或者是一种覆盖事物的方法,但是我找不到。


对于那些需要知道为什么的人…

假设您要确保干净的工作环境。在脚本的顶部,放pd.reset_option('all')。使用Pandas 0.23.4,您将获得以下信息:

>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)

C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning: html.border has been deprecated, use display.html.border instead
(currently both are identical)

  warnings.warn(d.msg, FutureWarning)

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning:
: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

  warnings.warn(d.msg, FutureWarning)

>>>

按照@bdiamante的建议,您可以使用该warnings库。现在,诚如其言,警告已被删除。但是,仍然存在一些令人讨厌的消息:

>>> import warnings
>>> warnings.simplefilter(action='ignore', category=FutureWarning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)


: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

>>>

实际上,禁用所有警告会产生相同的输出:

>>> import warnings
>>> warnings.simplefilter(action='ignore', category=Warning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)


: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

>>>

从标准库的角度来看,这些不是真正的警告。熊猫实施自己的警告系统。grep -rn在警告消息上运行表明,该pandas警告系统已在core/config_init.py以下位置实现:

$ grep -rn "html.border has been deprecated"
core/config_init.py:207:html.border has been deprecated, use display.html.border instead

进一步的追踪表明,我没有时间这样做。而且您可能也不是。希望这可以使您免于跌倒,或者可以激发某人找出如何真正压制这些消息的方法!

@bdiamante’s answer may only partially help you. If you still get a message after you’ve suppressed warnings, it’s because the pandas library itself is printing the message. There’s not much you can do about it unless you edit the Pandas source code yourself. Maybe there’s an option internally to suppress them, or a way to override things, but I couldn’t find one.


For those who need to know why…

Suppose that you want to ensure a clean working environment. At the top of your script, you put pd.reset_option('all'). With Pandas 0.23.4, you get the following:

>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)

C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning: html.border has been deprecated, use display.html.border instead
(currently both are identical)

  warnings.warn(d.msg, FutureWarning)

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning:
: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

  warnings.warn(d.msg, FutureWarning)

>>>

Following @bdiamante’s advice, you use the warnings library. Now, true to its word, the warnings have been removed. However, several pesky messages remain:

>>> import warnings
>>> warnings.simplefilter(action='ignore', category=FutureWarning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)


: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

>>>

In fact, disabling all warnings produces the same output:

>>> import warnings
>>> warnings.simplefilter(action='ignore', category=Warning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)


: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

>>>

In the standard library sense, these aren’t true warnings. Pandas implements its own warnings system. Running grep -rn on the warning messages shows that the pandas warning system is implemented in core/config_init.py:

$ grep -rn "html.border has been deprecated"
core/config_init.py:207:html.border has been deprecated, use display.html.border instead

Further chasing shows that I don’t have time for this. And you probably don’t either. Hopefully this saves you from falling down the rabbit hole or perhaps inspires someone to figure out how to truly suppress these messages!


回答 2

警告很烦人。如其他答案所述,您可以使用以下方法抑制它们:

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

但是,如果要一一处理它们,并且要管理更大的代码库,将很难找到引起警告的代码行。由于警告与错误不同,因此代码回溯不会附带警告。为了跟踪类似错误的警告,您可以在代码顶部编写以下代码:

import warnings
warnings.filterwarnings("error")

但是,如果代码库更大,并且正在导入一堆其他库/程序包,则各种警告将开始作为错误发出。为了仅将某些类型的警告(在您的情况下为FutureWarning)引发为错误,您可以编写:

import warnings
warnings.simplefilter(action='error', category=FutureWarning)

Warnings are annoying. As mentioned in other answers, you can suppress them using:

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

But if you want to handle them one by one and you are managing a bigger codebase, it will be difficult to find the line of code that is causing the warning, since warnings, unlike errors, don’t come with a code traceback. In order to trace warnings like errors, you can write this at the top of the code:

import warnings
warnings.filterwarnings("error")

But if the codebase is bigger and it imports a bunch of other libraries/packages, then all sorts of warnings will start to be raised as errors. In order to raise only a certain type of warning (in your case, FutureWarning) as an error, you can write:

import warnings
warnings.simplefilter(action='error', category=FutureWarning)
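
Once a FutureWarning is raised as an error, it carries a traceback like any other exception, which is what makes the offending call easy to find. The sketch below shows this with the standard library only; legacy_call is a made-up stand-in for the code that triggers the warning.

```python
import traceback
import warnings


def legacy_call():
    # Hypothetical stand-in for the code path that emits the FutureWarning.
    warnings.warn("this behaviour will change", FutureWarning)


raised = False
frame_names = []
with warnings.catch_warnings():
    warnings.simplefilter(action="error", category=FutureWarning)
    try:
        legacy_call()
    except FutureWarning as exc:
        raised = True
        # Collect the function names on the call stack that led to the warning.
        frame_names = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]

print(raised, frame_names)
```

The list of frame names includes legacy_call, so the traceback points at the function that needs updating.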

如何沿一个轴获取numpy数组中最大元素的索引

问题:如何沿一个轴获取numpy数组中最大元素的索引

我有一个二维的NumPy数组。我知道如何获取轴上的最大值:

>>> a = array([[1,2,3],[4,3,1]])
>>> amax(a,axis=0)
array([4, 3, 3])

如何获得最大元素的索引?所以我想作为输出array([1,1,0])

I have a 2 dimensional NumPy array. I know how to get the maximum values over axes:

>>> a = array([[1,2,3],[4,3,1]])
>>> amax(a,axis=0)
array([4, 3, 3])

How can I get the indices of the maximum elements? I would like as output array([1,1,0]) instead.


回答 0

>>> a.argmax(axis=0)

array([1, 1, 0])

回答 1

>>> import numpy as np
>>> a = np.array([[1,2,3],[4,3,1]])
>>> i,j = np.unravel_index(a.argmax(), a.shape)
>>> a[i,j]
4

回答 2

argmax()将仅返回每一行的第一个匹配项。 http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html

如果您需要对任意形状的数组执行此操作,则此方法比unravel更好:

import numpy as np
a = np.array([[1,2,3], [4,3,1]])  # Can be of any shape
indices = np.where(a == a.max())

您还可以更改条件:

indices = np.where(a >= 1.5)

上面以您要求的形式为您提供了结果。另外,您可以通过以下方式将其转换为x,y坐标列表:

x_y_coords = list(zip(indices[0], indices[1]))

argmax() will only return the first occurrence for each row. http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html

If you ever need to do this for a shaped array, this works better than unravel:

import numpy as np
a = np.array([[1,2,3], [4,3,1]])  # Can be of any shape
indices = np.where(a == a.max())

You can also change your conditions:

indices = np.where(a >= 1.5)

The above gives you results in the form that you asked for. Alternatively, you can convert to a list of x,y coordinates by:

x_y_coords = list(zip(indices[0], indices[1]))
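
The difference between argmax, which reports only the first maximum, and a comparison against a.max(), which reports every maximum, shows up when there are ties. This is a small sketch assuming NumPy is installed; the array named tied is invented for the example.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 3, 1]])

# Row index of the maximum in each column (the output asked for in the question).
print(a.argmax(axis=0).tolist())  # [1, 1, 0]

# Every position that holds the global maximum, as (row, column) pairs.
rows, cols = np.where(a == a.max())
print(list(zip(rows.tolist(), cols.tolist())))  # [(1, 0)]

# With ties, argmax keeps only the first occurrence along the axis.
tied = np.array([[5, 1], [5, 0]])
print(tied.argmax(axis=0).tolist())  # [0, 0]
print(np.argwhere(tied == tied.max()).tolist())  # [[0, 0], [1, 0]]
```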

回答 3

v = alli.max()
index = alli.argmax()
x, y = index // 8, index % 8  # integer division; assumes alli has 8 columns

模拟函数引发异常以测试except块

问题:模拟函数引发异常以测试except块

我有一个foo调用另一个函数(bar)的函数()。如果调用bar()引发一个HttpError,如果状态代码为404,我想特别处理它,否则重新引发。

我正在尝试围绕此foo函数编写一些单元测试,以模拟对的调用bar()。不幸的是,我无法得到模拟调用bar()以引发被我的代码except块捕获的异常。

这是说明我问题的代码:

import unittest
import mock
from apiclient.errors import HttpError


class FooTests(unittest.TestCase):
    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnResultOfBar_whenBarSucceeds(self, barMock):
        barMock.return_value = True
        result = foo()
        self.assertTrue(result)  # passes

    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnNone_whenBarRaiseHttpError404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 404}), 'not found')
        result = foo()
        self.assertIsNone(result)  # fails, test raises HttpError

    @mock.patch('my_tests.bar')
    def test_foo_shouldRaiseHttpError_whenBarRaiseHttpErrorNot404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 500}), 'error')
        with self.assertRaises(HttpError):  # passes
            foo()

def foo():
    try:
        result = bar()
        return result
    except HttpError as error:
        if error.resp.status == 404:
            print '404 - %s' % error.message
            return None
        raise

def bar():
    raise NotImplementedError()

我跟着模拟文档这不能不说您应该设置side_effect一个的Mock情况下,以一个Exception班有嘲笑功能引发错误。

我还查看了其他一些与StackOverflow相关的问与答,看起来我在做他们正在做的相同事情,以使他们的模拟引发Exception。

为什么设置side_effectbarMock不引起预期Exception得到提升?如果我做的事情很奇怪,我应该如何在except块中测试逻辑?

I have a function (foo) which calls another function (bar). If invoking bar() raises an HttpError, I want to handle it specially if the status code is 404, otherwise re-raise.

I am trying to write some unit tests around this foo function, mocking out the call to bar(). Unfortunately, I am unable to get the mocked call to bar() to raise an Exception which is caught by my except block.

Here is my code which illustrates my problem:

import unittest
import mock
from apiclient.errors import HttpError


class FooTests(unittest.TestCase):
    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnResultOfBar_whenBarSucceeds(self, barMock):
        barMock.return_value = True
        result = foo()
        self.assertTrue(result)  # passes

    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnNone_whenBarRaiseHttpError404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 404}), 'not found')
        result = foo()
        self.assertIsNone(result)  # fails, test raises HttpError

    @mock.patch('my_tests.bar')
    def test_foo_shouldRaiseHttpError_whenBarRaiseHttpErrorNot404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 500}), 'error')
        with self.assertRaises(HttpError):  # passes
            foo()

def foo():
    try:
        result = bar()
        return result
    except HttpError as error:
        if error.resp.status == 404:
            print '404 - %s' % error.message
            return None
        raise

def bar():
    raise NotImplementedError()

I followed the Mock docs which say that you should set the side_effect of a Mock instance to an Exception class to have the mocked function raise the error.

I also looked at some other related StackOverflow Q&As, and it looks like I am doing the same thing they are doing to cause an Exception to be raised by their mock.

Why is setting the side_effect of barMock not causing the expected Exception to be raised? If I am doing something weird, how should I go about testing logic in my except block?


Answer 0

您的模拟正在引发异常,但是该error.resp.status值丢失了。而不是使用return_value,只是告诉Mockstatus是一个属性:

barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')

将其他关键字参数Mock()设置为结果对象的属性。

我将您的foobar定义放在my_tests模块中,并添加到HttpError类中,这样我也可以使用它,然后您的测试可以成功进行:

>>> from my_tests import foo, HttpError
>>> import mock
>>> with mock.patch('my_tests.bar') as barMock:
...     barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')
...     result = foo()
... 
404 - 
>>> result is None
True

您甚至可以看到print '404 - %s' % error.message生产线运行,但是我想您想在error.content那里使用它。HttpError()无论如何,这是第二个参数设置的属性。

Your mock is raising the exception just fine, but the error.resp.status value is missing. Rather than use return_value, just tell Mock that status is an attribute:

barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')

Additional keyword arguments to Mock() are set as attributes on the resulting object.
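The difference between the two spellings can be checked in isolation. The following is a minimal sketch, assuming Python 3, where the `mock` package used in the question is available in the standard library as `unittest.mock`; the names `resp_ok` and `resp_wrong` are made up for illustration.

```python
from unittest import mock

# Keyword arguments passed to Mock() become plain attributes on the mock.
resp_ok = mock.Mock(status=404)

# return_value only configures what *calling* the mock returns;
# .status here is an auto-created child Mock, not the integer 404.
resp_wrong = mock.Mock(return_value={'status': 404})

print(resp_ok.status)                            # 404
print(resp_wrong())                              # {'status': 404}
print(isinstance(resp_wrong.status, mock.Mock))  # True
```

Because `resp_wrong.status` is itself a mock, a comparison such as `error.resp.status == 404` evaluates to false, so the 404 branch is skipped and the exception is re-raised.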

I put your foo and bar definitions in a my_tests module, added in the HttpError class so I could use it too, and your test then runs successfully:

>>> from my_tests import foo, HttpError
>>> import mock
>>> with mock.patch('my_tests.bar') as barMock:
...     barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')
...     result = foo()
... 
404 - 
>>> result is None
True

You can even see the print '404 - %s' % error.message line run, but I think you wanted to use error.content there instead; that’s the attribute HttpError() sets from the second argument, at any rate.
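For readers who want to try the fix without installing `apiclient`, here is a self-contained sketch under some assumptions: it targets Python 3 (`unittest.mock`), the `HttpError` class below is a hypothetical stand-in that only mimics the `resp` and `content` attributes discussed above, and the helper `call_foo_with_bar_raising` is invented for this example. It patches `bar` through `foo.__globals__` so that it runs regardless of the module name; in a real test module, `mock.patch('my_tests.bar')` as in the question does the same job.

```python
from unittest import mock


class HttpError(Exception):
    """Hypothetical stand-in for apiclient.errors.HttpError."""

    def __init__(self, resp, content):
        super().__init__(content)
        self.resp = resp
        self.content = content


def bar():
    raise NotImplementedError()


def foo():
    try:
        return bar()
    except HttpError as error:
        if error.resp.status == 404:
            return None
        raise


def call_foo_with_bar_raising(status):
    """Call foo() while bar is replaced by a mock raising HttpError(status)."""
    error = HttpError(mock.Mock(status=status), 'simulated failure')
    bar_mock = mock.Mock(side_effect=error)
    # Temporarily swap the global name 'bar' that foo() looks up.
    with mock.patch.dict(foo.__globals__, {'bar': bar_mock}):
        return foo()


print(call_foo_with_bar_raising(404))  # None
```

Calling the helper with any other status, for example 500, lets the `HttpError` propagate, which is what the third test in the question expects.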


In Python, when should I use a function instead of a method?

Question: In Python, when should I use a function instead of a method?

Python的Zen指出,只有一种方法可以做事情-但我经常遇到决定何时使用函数以及何时使用方法的问题。

让我们举一个简单的例子-ChessBoard对象。假设我们需要某种方式使董事会上所有合法的King举动均可用。我们是否编写ChessBoard.get_king_moves()或get_king_moves(chess_board)?

这是我看过的一些相关问题:

我得到的答案基本上没有定论:

为什么Python使用方法来实现某些功能(例如list.index()),却使用其他方法(例如len(list))呢?

主要原因是历史。函数用于那些对一组类型通用的操作,即使对于根本没有方法的对象(例如元组),这些操作也可以使用。使用Python的功能特性(map(),apply()等)时,具有可以轻松应用于对象的不定形集合的函数也很方便。

实际上,将len(),max(),min()实现为内置函数实际上比将它们实现为每种类型的方法要少。人们可能会质疑个别情况,但这是Python的一部分,现在进行这样的基本更改为时已晚。必须保留功能以避免大量代码损坏。

尽管很有趣,但是上面并没有真正说明采用哪种策略。

这是原因之一-使用自定义方法,开发人员可以自由选择其他方法名称,例如getLength(),length(),getlength()或其他名称。Python强制执行严格的命名,以便可以使用通用函数len()。

稍微有趣一点。我认为函数在某种意义上是接口的Pythonic版本。

最后,来自Guido本人

谈论能力/接口使我想到了一些“流氓”特殊方法名称。在《语言参考》中,它说:“类可以通过定义具有特殊名称的方法来实现某些由特殊语法调用的操作(例如算术运算或下标和切片)。” 但是,所有这些带有特殊名称的方法(例如__len__或)__unicode__似乎都是为内置函数的利益提供的,而不是为了支持语法。大概在基于接口的Python中,这些方法将在ABC上变成常规命名的方法,因此 __len__将成为

class container:
  ...
  def len(self):
    raise NotImplemented

虽然,再想一想,我不明白为什么所有的句法运算都不会仅仅在特定的ABC上调用适当的通常命名的方法。“ <”举例来说,大概会调用“ object.lessthan”(或者是“ comparable.lessthan“)。因此,另一个好处是能够使Python摆脱这种乱七八糟的名字,对我而言这似乎是HCI的改进

嗯 我不确定我是否同意(图:-)。

我首先要解释“ Python基本原理”的两个方面。

首先,出于HCI的原因,我选择了len(x)而不是x.len()(def __len__()后来出现了)。实际上,两个HCI相互交织在一起:

(a)对于某些运算,前缀表示法比后缀读得更好-前缀(和infix!)操作在数学中具有悠久的传统,喜欢在视觉上帮助数学家思考问题的表示法。比较与我们改写像公式简单x*(a+b)x*a + x*b使用原始OO符号做同样的事情的笨拙。

(b)当我读到说的代码时,len(x)知道那是在问某物的长度。这告诉我两件事:结果是整数,参数是某种容器。相反,当我阅读本文时x.len(),我必须已经知道这x是一种实现接口或从具有standard的类继承的容器len()。当未实现映射的类具有get()keys() 方法,或者不是文件的某些具有方法时,我们有时会感到困惑write()

用另一种方式说同样的事情,我将’len’视为内置 操作。我不想失去那个。我不能肯定地说出您是否是那样的意思,但是“ def len(self):…”当然听起来像您想将其降级为普通方法。我对此坚决为-1。

我答应解释的Python基本原理的第二点是为什么我选择了特殊的外观__special__而不是仅仅 选择外观的原因special。我期待类可能要覆盖的许多操作,一些标准(例如__add____getitem__),某些不是那么标准(例如,泡菜__reduce__很长一段时间都不支持C代码)。我不希望这些特殊操作使用普通的方法名称,因为那样的话,预先存在的类或用户没有为所有特殊方法存储百科全书的用户编写的类可能会意外地定义它们并非要实现的操作,可能会造成灾难性的后果。伊万·科斯蒂奇(IvanKrstić)在他的信息中对此进行了更为简洁的解释,在我将所有这些内容写完之后,这些信息才得以体现。

—Guido van Rossum(主页:http ://www.python.org/~guido/ )

我对此的理解是,在某些情况下,前缀表示法更有意义(即,从语言的角度来看,Duck.quack比quack(Duck)更有意义。)而且,该函数还允许使用“接口”。

在这种情况下,我的猜测是仅基于Guido的第一点实现get_king_moves。但这仍然存在很多悬而未决的问题,例如使用类似的push和pop方法实现堆栈和队列类-它们应该是函数还是方法?(在这里我会猜测功能,因为我真的很想发信号通知推送界面)

TLDR:有人可以解释决定何时使用函数还是方法的策略是什么?

The Zen of Python states that there should only be one way to do things, yet I frequently run into the problem of deciding when to use a function versus when to use a method.

Let’s take a trivial example: a ChessBoard object. Let’s say we need some way to get all the legal King moves available on the board. Do we write ChessBoard.get_king_moves() or get_king_moves(chess_board)?

Here are some related questions I looked at:

The answers I got were largely inconclusive:

Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?

The major reason is history. Functions were used for those operations that were generic for a group of types and which were intended to work even for objects that didn’t have methods at all (e.g. tuples). It is also convenient to have a function that can readily be applied to an amorphous collection of objects when you use the functional features of Python (map(), apply() et al).

In fact, implementing len(), max(), min() as a built-in function is actually less code than implementing them as methods for each type. One can quibble about individual cases but it’s a part of Python, and it’s too late to make such fundamental changes now. The functions have to remain to avoid massive code breakage.

While interesting, the above doesn’t really say much as to what strategy to adopt.

This is one of the reasons – with custom methods, developers would be free to choose a different method name, like getLength(), length(), getlength() or whatsoever. Python enforces strict naming so that the common function len() can be used.

Slightly more interesting. My take is that functions are, in a sense, the Pythonic version of interfaces.

Lastly, from Guido himself:

Talking about the Abilities/Interfaces made me think about some of our “rogue” special method names. In the Language Reference, it says, “A class can implement certain operations that are invoked by special syntax (such as arithmetic operations or subscripting and slicing) by defining methods with special names.” But there are all these methods with special names like __len__ or __unicode__ which seem to be provided for the benefit of built-in functions, rather than for support of syntax. Presumably in an interface-based Python, these methods would turn into regularly-named methods on an ABC, so that __len__ would become

class container:
  ...
  def len(self):
    raise NotImplemented

Though, thinking about it some more, I don’t see why all syntactic operations wouldn’t just invoke the appropriate normally-named method on a specific ABC. “<”, for instance, would presumably invoke “object.lessthan” (or perhaps “comparable.lessthan”). So another benefit would be the ability to wean Python away from this mangled-name oddness, which seems to me an HCI improvement.

Hm. I’m not sure I agree (figure that :-).

There are two bits of “Python rationale” that I’d like to explain first.

First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:

(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.

(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.

Saying the same thing in another way, I see ‘len’ as a built-in operation. I’d hate to lose that. I can’t say for sure whether you meant that or not, but ‘def len(self): …’ certainly sounds like you want to demote it to an ordinary method. I’m strongly -1 on that.

The second bit of Python rationale I promised to explain is the reason why I chose special methods to look __special__ and not merely special. I was anticipating lots of operations that classes might want to override, some standard (e.g. __add__ or __getitem__), some not so standard (e.g. pickle’s __reduce__ for a long time had no support in C code at all). I didn’t want these special operations to use ordinary method names, because then pre-existing classes, or classes written by users without an encyclopedic memory for all the special methods, would be liable to accidentally define operations they didn’t mean to implement, with possibly disastrous consequences. Ivan Krstić explained this more concisely in his message, which arrived after I’d written all this up.

--Guido van Rossum (home page: http://www.python.org/~guido/)

My understanding of this is that in certain cases one notation simply reads better (e.g., Duck.quack makes more sense than quack(Duck) from a linguistic standpoint), and again, functions allow for “interfaces”.

In such a case, my guess would be to implement get_king_moves based solely on Guido’s first point. But that still leaves a lot of open questions regarding, say, implementing a stack and a queue class with similar push and pop methods: should they be functions or methods? (Here I would guess functions, because I really want to signal a push-pop interface.)

TLDR: Can someone explain what the strategy for deciding when to use functions vs. methods should be?


Answer 0

我的一般规则是- 是在对象上执行还是由对象执行操作?

如果是由对象完成的,则应该是成员操作。如果它也可以应用于其他事物,或者由对象的其他事物完成,那么它应该是一个函数(或其他事物的成员)。

引入编程时,传统上(尽管实现不正确)以现实世界中的对象(例如汽车)来描述对象。您提到了一只鸭子,所以让我们开始吧。

class duck:
    def __init__(self): pass
    def eat(self, o): pass
    def crap(self): pass
    def die(self): pass
    # ....

在“对象是真实事物”类比的上下文中,为对象可以执行的任何操作添加类方法是“正确的”。所以说我想杀死一只鸭子,是否要在鸭子上添加.kill()?不,据我所知,动物不会自杀。因此,如果我想杀死一只鸭子,我应该这样做:

def kill(o):
    # dog and nyancat are assumed to be classes defined elsewhere
    if isinstance(o, duck):
        o.die()
    elif isinstance(o, dog):
        print("WHY????")
        o.die()
    elif isinstance(o, nyancat):
        raise Exception("NYAN " * 9001)
    else:
        print("can't kill it.")

远离这种类比,为什么我们要使用方法和类?因为我们要包含数据并希望以某种方式构造我们的代码,以便将来可以重用和扩展它。这使我们想到了面向对象设计非常重要的封装概念。

封装原理实际上就是它的含义:作为设计人员,您应该隐藏有关实现和类内部的所有内容,对于任何用户或其他开发人员而言,都不一定要访问它。因为我们处理类的实例,所以这简化为“ 对该实例至关重要的操作”。如果操作不是实例特定的,则它不应是成员函数。

TL; DR:@Bryan说了什么。如果它在实例上运行并且需要访问类实例内部的数据,则它应该是成员函数。

My general rule is this – is the operation performed on the object or by the object?

If it is done by the object, it should be a member operation. If it could apply to other things too, or is done by something else to the object, then it should be a function (or perhaps a member of something else).

When introducing programming, it is traditional (albeit incorrect in terms of implementation) to describe objects in terms of real-world objects such as cars. You mention a duck, so let’s go with that.

class duck:
    def __init__(self): pass
    def eat(self, o): pass
    def crap(self): pass
    def die(self): pass
    # ....

In the context of the “objects are real things” analogy, it is “correct” to add a method to the class for anything which the object can do. So say I want to kill off a duck, do I add a .kill() to the duck? No… as far as I know animals do not commit suicide. Therefore if I want to kill a duck I should do this:

def kill(o):
    # dog and nyancat are assumed to be classes defined elsewhere
    if isinstance(o, duck):
        o.die()
    elif isinstance(o, dog):
        print("WHY????")
        o.die()
    elif isinstance(o, nyancat):
        raise Exception("NYAN " * 9001)
    else:
        print("can't kill it.")

Moving away from this analogy, why do we use methods and classes? Because we want to contain data and hopefully structure our code in a manner such that it will be reusable and extensible in the future. This brings us to the notion of encapsulation which is so dear to OO design.

The encapsulation principle is really what this comes down to: as a designer you should hide everything about the implementation and class internals which it is not absolutely necessary for any user or other developer to access. Because we deal with instances of classes, this reduces to “what operations are crucial on this instance”. If an operation is not instance-specific, then it should not be a member function.

TL;DR: what @Bryan said. If it operates on an instance and needs to access data which is internal to the class instance, it should be a member function.


Answer 1

在需要以下情况时,请使用类:

1)从实现细节中隔离调用代码-利用抽象封装

2)当您想替代其他对象时-利用多态性

3)当您想为相似的对象重用代码时-利用继承

将函数用于对许多不同的对象类型有意义的调用-例如,内置的lenrepr函数适用于多种对象。

话虽如此,选择有时取决于口味。考虑一下最适合常规通话的方式和可读性。例如,这将是更好的(x.sin()**2 + y.cos()**2).sqrt()还是sqrt(sin(x)**2 + cos(y)**2)

Use a class when you want to:

1) Isolate calling code from implementation details — taking advantage of abstraction and encapsulation.

2) When you want to be substitutable for other objects — taking advantage of polymorphism.

3) When you want to reuse code for similar objects — taking advantage of inheritance.

Use a function for calls that make sense across many different object types — for example, the builtin len and repr functions apply to many kinds of objects.

That being said, the choice sometimes comes down to a matter of taste. Think in terms of what is most convenient and readable for typical calls. For example, which would be better: (x.sin()**2 + y.cos()**2).sqrt() or sqrt(sin(x)**2 + cos(y)**2)?
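As an illustration of the point about len and repr, the sketch below shows how a class can opt in to those generic functions by defining the corresponding special methods. The `Deck` class and the `describe` function are hypothetical names invented for this example, not part of any library.

```python
class Deck:
    """Hypothetical container used only for illustration."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        # Lets the built-in len() work on Deck instances.
        return len(self._cards)

    def __repr__(self):
        # Lets the built-in repr() produce a readable description.
        return "Deck({!r})".format(self._cards)


def describe(container):
    """Works for any object that supports len() and repr()."""
    return "{} has {} item(s)".format(repr(container), len(container))


print(describe(Deck(['K', 'Q'])))  # Deck(['K', 'Q']) has 2 item(s)
print(describe((1, 2, 3)))         # (1, 2, 3) has 3 item(s)
```

The function `describe` never needs to know which type it was given, which is the sense in which a function suits calls that make sense across many object types.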


Answer 2

这是一条简单的经验法则:如果代码作用于对象的单个实例,请使用一种方法。甚至更好:除非有充分的理由将其编写为函数,否则请使用一种方法。

在您的特定示例中,您希望它看起来像这样:

chessboard = Chessboard()
...
chessboard.get_king_moves()

不要过度考虑。始终使用方法,直到您对自己说“将此方法定义为没有意义”为止,在这种情况下,您可以创建函数。

Here’s a simple rule of thumb: if the code acts upon a single instance of an object, use a method. Even better: use a method unless there is a compelling reason to write it as a function.

In your specific example, you want it to look like this:

chessboard = Chessboard()
...
chessboard.get_king_moves()

Don’t overthink it. Always use methods until the point comes where you say to yourself “it doesn’t make sense to make this a method”, in which case you can make a function.
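To make the method form concrete, here is a rough sketch of what such a class might look like. Everything in it is an assumption made for illustration: the board tracks only a single king, squares are `(column, row)` tuples in the range 0 to 7, and the returned squares ignore other pieces and check, so they are not full legal moves.

```python
class Chessboard:
    """Hypothetical, heavily simplified board that tracks only one king."""

    SIZE = 8

    def __init__(self, king_square=(4, 0)):
        # (column, row), both in range(SIZE); an assumption of this sketch.
        self._king_square = king_square

    def get_king_moves(self):
        """Return the squares one step away that are still on the board."""
        col, row = self._king_square
        moves = []
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                if (dc, dr) == (0, 0):
                    continue  # staying in place is not a move
                c, r = col + dc, row + dr
                if 0 <= c < self.SIZE and 0 <= r < self.SIZE:
                    moves.append((c, r))
        return moves


chessboard = Chessboard()
print(len(chessboard.get_king_moves()))  # 5
```

The method reads the instance’s private `_king_square`, which is the kind of dependence on a single instance that the rule of thumb above points to.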


Answer 3

我通常认为一个物体像一个人。

属性是人物的姓名,身高,鞋子的大小等。

方法功能是人可以执行的操作。

如果该操作只能由任何其他人完成,而又不需要该特定人独有的任何东西(并且无需更改该特定人的任何东西),那么它就是一个函数,应该这样编写。

如果某项操作正在对该人进行操作(例如进餐,散步等),或者需要该人独特的操作(例如跳舞,写书等),则应采用一种方法

当然,将其转换为您正在使用的特定对象并不总是一件容易的事,但是我发现这是思考它的好方法。

I usually think of an object like a person.

Attributes are the person’s name, height, shoe size, etc.

Methods and functions are operations that the person can perform.

If the operation could be done by just any ol’ person, without requiring anything unique to this one specific person (and without changing anything on this one specific person), then it’s a function and should be written as such.

If an operation is acting upon the person (e.g. eating, walking, …) or requires something unique to this person to get involved (like dancing, writing a book, …), then it should be a method.

Of course, it is not always trivial to translate this into the specific object you’re working with, but I find it is a good way to think of it.


Answer 4

通常,我使用类来为某件事实现一组逻辑功能,以便在程序的其余部分中,我可以对事物进行推理,而不必担心构成其实现的所有小问题。

凡是是那核心抽象的一部分“你可以用做什么事情 ”通常应该是一个方法。这通常包括可以改变一切的事情,作为内部数据状态通常被认为是私人,而不是“你可以用做什么逻辑思想的一部分的事情 ”。

当您进行更高级别的操作时,特别是如果它们涉及多个事物,我发现它们通常最自然地表示为函数,前提是它们可以从事物的公共抽象中构建而无需特殊的内部访问(除非它们re方法)。这具有很大的优势,当我决定完全重写我工作方式的内部结构(无需更改接口)时,我只有一小部分核心方法可以重写,然后使用这些方法编写所有外部函数将工作。我发现坚持认为与类X有关的所有操作都是类X上的方法会导致类过于复杂。

这取决于我正在编写的代码。对于某些程序,我将它们建模为对象的集合,这些对象的相互作用引起了程序的行为。在这里,最重要的功能紧密耦合到单个对象,因此是在方法中实现的,其中包含实用功能。对于其他程序,最重要的东西是一组操作数据的函数,而类仅用于实现由这些函数操纵的自然“鸭子类型”。

Generally I use classes to implement a logical set of capabilities for some thing, so that in the rest of my program I can reason about the thing, not having to worry about all the little concerns that make up its implementation.

Anything that’s part of that core abstraction of “what you can do with a thing” should usually be a method. This generally includes everything that can alter a thing, as the internal data state is usually considered private and not part of the logical idea of “what you can do with a thing”.

When you come to higher level operations, especially if they involve multiple things, I find they are usually most naturally expressed as functions, if they can be built out of the public abstraction of a thing without needing special access to the internals (unless they’re methods of some other object). This has the big advantage that when I decide to completely rewrite the internals of how my thing works (without changing the interface), I just have a small core set of methods to rewrite, and then all the external functions written in terms of those methods will Just Work. I find that insisting that all operations to do with class X are methods on class X leads to over-complicated classes.

It depends on the code I’m writing though. For some programs I model them as a collection of objects whose interactions give rise to the behavior of the program; here most important functionality is closely coupled to a single object, and so is implemented in methods, with a scattering of utility functions. For other programs the most important stuff is a set of functions that manipulate data, and classes are in use only to implement the natural “duck types” that are manipulated by the functions.
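The split between a small core of methods and external functions written in terms of them might look like the sketch below. The `Stack` class and the `drain` function are hypothetical names chosen for this example; the only assumption `drain` makes is that its argument supports `len()` and `pop()`.

```python
class Stack:
    """Hypothetical minimal stack; the list inside is an implementation detail."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def drain(stack):
    """Higher-level operation built only from the public interface."""
    drained = []
    while len(stack) > 0:
        drained.append(stack.pop())
    return drained


s = Stack()
for value in (1, 2, 3):
    s.push(value)
print(drain(s))  # [3, 2, 1]
```

If the internals of `Stack` were later rewritten, say to use a linked list, only `push`, `pop` and `__len__` would need to change, and `drain` would keep working unmodified.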


Answer 5

您可能会说,“面对模棱两可,拒绝猜测的诱惑”。

但是,这甚至不是猜测。您绝对可以确保两种方法的结果相同,因为它们可以解决您的问题。

我相信,采用多种方式实现目标只是一件好事。与其他用户一样,我要谦虚地告诉您,就语言而言,采用“味道更好” /感觉更直观的方法。

You may say that, “in the face of ambiguity, refuse the temptation to guess”.

However, it’s not even a guess. You’re absolutely sure that the outcomes of both approaches are the same in that they solve your problem.

I believe it is only a good thing to have multiple ways to accomplish goals. I’d humbly tell you, as other users did already, to employ whichever “tastes better” / feels more intuitive, in terms of language.