I am creating neural nets with Tensorflow and skflow; for some reason I want to get the values of some inner tensors for a given input, so I am using myClassifier.get_layer_value(input, "tensorName"), myClassifier being a skflow.estimators.TensorFlowEstimator.
However, I find it difficult to work out the correct syntax for a tensor's name, even when I know the name (and I keep getting confused between operations and tensors), so I'm using TensorBoard to plot the graph and look for the name.
Is there a way to enumerate all the tensors in a graph without using tensorboard?
Answer 0
You can do
[n.name for n in tf.get_default_graph().as_graph_def().node]
Also, if you are prototyping in an IPython notebook, you can show the graph directly in the notebook; see the show_graph function in Alexander's Deep Dream notebook.
There is a way to do it slightly faster than in Yaroslav’s answer by using get_operations. Here is a quick example:
import tensorflow as tf
a = tf.constant(1.3, name='const_a')
b = tf.Variable(3.1, name='variable_b')
c = tf.add(a, b, name='addition')
d = tf.multiply(c, a, name='multiply')
for op in tf.get_default_graph().get_operations():
    print(str(op.name))
To get all nodes: (type tensorflow.core.framework.node_def_pb2.NodeDef)
all_nodes = [n for n in tf.get_default_graph().as_graph_def().node]
To get all ops: (type tensorflow.python.framework.ops.Operation)
all_ops = tf.get_default_graph().get_operations()
To get all variables: (type tensorflow.python.ops.resource_variable_ops.ResourceVariable)
all_vars = tf.global_variables()
To get all tensors: (type tensorflow.python.framework.ops.Tensor)
all_tensors = [tensor for op in tf.get_default_graph().get_operations() for tensor in op.values()]
To get the graph in TensorFlow 2, instead of tf.get_default_graph() you need to instantiate a tf.function first and access its graph attribute, for example:
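A minimal sketch of that, assuming a trivial traced function (double_plus_one and its body are made up for illustration):

import tensorflow as tf

@tf.function
def double_plus_one(x):
    return x * 2 + 1

# Tracing the function for a concrete input builds a graph we can inspect.
concrete = double_plus_one.get_concrete_function(tf.constant(1.0))
print([op.name for op in concrete.graph.get_operations()])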
tf.all_variables() can get you the information you want.
Also, a commit made today in TensorFlow Learn provides a function get_variable_names on the estimator that you can use to retrieve all variable names easily.
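For instance, assuming myClassifier is the skflow.estimators.TensorFlowEstimator from the question, usage would look like this (a sketch, not verified against that exact commit):

print(myClassifier.get_variable_names())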
The accepted answer only gives you a list of strings with the names. I prefer a different approach, which gives you (almost) direct access to the tensors:
graph = tf.get_default_graph()
list_of_tuples = [op.values() for op in graph.get_operations()]
list_of_tuples now contains every tensor, each within a tuple. You could also adapt it to get the tensors directly:
graph = tf.get_default_graph()
list_of_tuples = [op.values()[0] for op in graph.get_operations()]
Answer 6
Since the OP asked for the list of the tensors instead of the list of operations/nodes, the code should be slightly different:
graph = tf.get_default_graph()
tensors_per_node = [node.values() for node in graph.get_operations()]
tensor_names = [tensor.name for tensors in tensors_per_node for tensor in tensors]
Answer 7
Previous answers are good; I'd just like to share a utility function I wrote to select tensors from a graph:
def get_graph_op(graph, and_conds=None, op='and', or_conds=None):
    """Selects nodes' names in the graph if:
    - The name contains all items in and_conds
    - OR/AND depending on op
    - The name contains any item in or_conds

    Conditions starting with a "!" are negated.
    Returns all ops if no optional arguments are given.

    Args:
        graph (tf.Graph): The graph containing sought tensors
        and_conds (list(str), optional): Defaults to None.
            "and" conditions
        op (str, optional): Defaults to 'and'.
            How to link the and_conds and or_conds:
            with an 'and' or an 'or'
        or_conds (list(str), optional): Defaults to None.
            "or" conditions

    Returns:
        list(str): list of relevant tensor names
    """
    assert op in {'and', 'or'}

    if and_conds is None:
        and_conds = ['']
    if or_conds is None:
        or_conds = ['']

    node_names = [n.name for n in graph.as_graph_def().node]

    ands = {
        n for n in node_names
        if all(
            cond in n if '!' not in cond
            else cond[1:] not in n
            for cond in and_conds
        )}

    ors = {
        n for n in node_names
        if any(
            cond in n if '!' not in cond
            else cond[1:] not in n
            for cond in or_conds
        )}

    if op == 'and':
        return [
            n for n in node_names
            if n in ands.intersection(ors)
        ]
    elif op == 'or':
        return [
            n for n in node_names
            if n in ands.union(ors)
        ]
I need to calculate the cosine similarity between two lists, let's say for example list 1 which is dataSetI and list 2 which is dataSetII. I cannot use anything such as numpy or a statistics module. I must use common modules (math, etc.) (and as few modules as possible, at that, to reduce the time spent).
Let’s say dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The length of the lists are always equal.
Of course, the cosine similarity is between 0 and 1, and for the sake of it, it will be rounded to the third or fourth decimal with format(round(cosine, 3)).
You should try SciPy. It has a bunch of useful scientific routines, for example "routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices." It uses the superfast optimized NumPy for its number crunching. See here for installing.
Note that spatial.distance.cosine computes the distance, and not the similarity. So, you must subtract the value from 1 to get the similarity.
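A sketch of that, using the two lists from the question:

from scipy import spatial

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
# spatial.distance.cosine returns the distance, so subtract it from 1
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
print(format(round(result, 3)))  # 0.972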
I don’t suppose performance matters much here, but I can’t resist. The zip() function completely recopies both vectors (more of a matrix transpose, actually) just to get the data in “Pythonic” order. It would be interesting to time the nuts-and-bolts implementation:
import math

def cosine_similarity(v1, v2):
    "compute cosine similarity of v1 to v2: (v1 dot v2)/(||v1||*||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)

v1, v2 = [3, 45, 7, 2], [2, 54, 13, 15]
print(v1, v2, cosine_similarity(v1, v2))
Output: [3, 45, 7, 2] [2, 54, 13, 15] 0.972284251712
That goes through the C-like noise of extracting elements one-at-a-time, but does no bulk array copying and gets everything important done in a single for loop, and uses a single square root.
ETA: Updated print call to be a function. (The original was Python 2.7, not 3.3. The current runs under Python 2.7 with a from __future__ import print_function statement.) The output is the same, either way.
The result surprised me: the implementation based on scipy is not the fastest one. I profiled it and found that cosine in scipy takes a lot of time to cast a vector from a Python list to a NumPy array.
Suppose you have two n-dimensional numpy.ndarrays, v1 and v2, i.e. their shapes are both (n,). Here’s how you get their cosine similarity:
import torch
import torch.nn as nn
cos = nn.CosineSimilarity()
cos(torch.tensor([v1]), torch.tensor([v2])).item()
Or suppose you have two numpy.ndarrays w1 and w2, whose shapes are both (m, n). The following gets you a list of cosine similarities, each being the cosine similarity between a row in w1 and the corresponding row in w2:
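A sketch completing that (w1 and w2 are the (m, n) arrays just described; imports as in the block above):

import torch
import torch.nn as nn

cos = nn.CosineSimilarity(dim=1)  # compare along the feature dimension
similarities = cos(torch.tensor(w1), torch.tensor(w2)).tolist()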
The general answer is that += tries to call the __iadd__ special method, and if that isn’t available it tries to use __add__ instead. So the issue is with the difference between these special methods.
The __iadd__ special method is for an in-place addition, that is, it mutates the object that it acts on. The __add__ special method returns a new object and is also used for the standard + operator.
So when the += operator is used on an object which has an __iadd__ defined the object is modified in place. Otherwise it will instead try to use the plain __add__ and return a new object.
That is why for mutable types like lists += changes the object’s value, whereas for immutable types like tuples, strings and integers a new object is returned instead (a += b becomes equivalent to a = a + b).
For types that support both __iadd__ and __add__ you therefore have to be careful which one you use. a += b will call __iadd__ and mutate a, whereas a = a + b will create a new object and assign it to a. They are not the same operation!
>>> a1 = a2 = [1, 2]
>>> b1 = b2 = [1, 2]
>>> a1 += [3] # Uses __iadd__, modifies a1 in-place
>>> b1 = b1 + [3] # Uses __add__, creates new list, assigns it to b1
>>> a2
[1, 2, 3] # a1 and a2 are still the same list
>>> b2
[1, 2] # whereas only b1 was changed
For immutable types (where you don’t have an __iadd__) a += b and a = a + b are equivalent. This is what lets you use += on immutable types, which might seem a strange design decision until you consider that otherwise you couldn’t use += on immutable types like numbers!
For the general case, see Scott Griffith’s answer. When dealing with lists like you are, though, the += operator is a shorthand for someListObject.extend(iterableObject). See the documentation of extend().
The extend function will append all elements of the parameter to the list.
When doing foo += something you’re modifying the list foo in place, thus you don’t change the reference that the name foo points to, but you’re changing the list object directly. With foo = foo + something, you’re actually creating a new list.
This example code will explain it:
>>> l = []
>>> id(l)
13043192
>>> l += [3]
>>> id(l)
13043192
>>> l = l + [3]
>>> id(l)
13059216
Note how the reference changes when you reassign the new list to l.
As bar is a class variable instead of an instance variable, modifying in place will affect all instances of that class. But when redefining self.bar, the instance will have a separate instance variable self.bar without affecting the other class instances.
Two things are involved here:
1. class attributes and instance attributes
2. the difference between the operators + and += for lists
+ operator calls the __add__ method on a list. It takes all the elements from its operands and makes a new list containing those elements maintaining their order.
+= operator calls __iadd__ method on the list. It takes an iterable and appends all the elements of the iterable to the list in place. It does not create a new list object.
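For reference, the classes from the question (reconstructed here from the statements below, since the original snippet isn't included) look roughly like this:

class foo:
    bar = []  # class attribute shared by all instances
    def __init__(self, x):
        self.bar += [x]  # in-place: mutates the class attribute

class foo2:
    bar = []  # class attribute of the same name
    def __init__(self, x):
        self.bar = self.bar + [x]  # creates a new list, bound as an instance attribute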
In class foo the statement self.bar += [x] is not an assignment statement but actually translates to
self.bar.__iadd__([x]) # modifies the class attribute
which modifies the list in place and acts like the list method extend.
In class foo2, on the contrary, the assignment statement in the init method
self.bar = self.bar + [x]
can be deconstructed as:
The instance has no attribute bar (there is a class attribute of the same name, though) so it accesses the class attribute bar and creates a new list by appending x to it. The statement translates to:
self.bar = self.bar.__add__([x]) # bar on the lhs is the class attribute
Then it creates an instance attribute bar and assigns the newly created list to it. Note that bar on the rhs of the assignment is different from the bar on the lhs.
For instances of class foo, bar is a class attribute and not an instance attribute. Hence any change to the class attribute bar will be reflected for all instances.
On the contrary, each instance of the class foo2 has its own instance attribute bar which is different from the class attribute of the same name bar.
f = foo2(4)
print f.bar # accessing the instance attribute. prints [4]
print f.__class__.bar # accessing the class attribute. prints []
Although much time has passed and many correct things were said, there is no answer which bundles both effects.
You have 2 effects:
a “special”, maybe unnoticed behaviour of lists with += (as stated by Scott Griffiths)
the fact that class attributes as well as instance attributes are involved (as stated by Can Berk Büder)
In class foo, the __init__ method modifies the class attribute. It is because self.bar += [x] translates to self.bar = self.bar.__iadd__([x]). __iadd__() is for inplace modification, so it modifies the list and returns a reference to it.
Note that the instance dict is modified although this would normally not be necessary as the class dict already contains the same assignment. So this detail goes almost unnoticed – except if you do a foo.bar = [] afterwards. Here the instance's bar stays the same thanks to the said fact.
In class foo2, however, the class's bar is used, but not touched. Instead, a [x] is added to it, forming a new object, as self.bar.__add__([x]) is called here, which doesn't modify the object. The result is then put into the instance dict, giving the instance the new list as an instance attribute, while the class's attribute stays unmodified.
The distinction between ... = ... + ... and ... += ... affects as well the assignments afterwards:
f = foo(1) # adds 1 to the class's bar and assigns f.bar to this as well.
g = foo(2) # adds 2 to the class's bar and assigns g.bar to this as well.
# Here, foo.bar, f.bar and g.bar refer to the same object.
print f.bar # [1, 2]
print g.bar # [1, 2]
f.bar += [3] # adds 3 to this object
print f.bar # As these still refer to the same object,
print g.bar # the output is the same.
f.bar = f.bar + [4] # Construct a new list with the values of the old ones, 4 appended.
print f.bar # Print the new one
print g.bar # Print the old one.
f = foo2(1) # Here a new list is created on every call.
g = foo2(2)
print f.bar # So these all only have one element.
print g.bar
You can verify the identity of the objects with print id(foo), id(f), id(g) (don’t forget the additional ()s if you are on Python3).
BTW: The += operator is called “augmented assignment” and generally is intended to do inplace modifications as far as possible.
The other answers would seem to pretty much have it covered, though it seems worth quoting and referring to the Augmented Assignments PEP 203:
They [the augmented assignment operators] implement the same operator
as their normal binary form, except that the operation is done
`in-place’ when the left-hand side object supports it, and that the
left-hand side is only evaluated once.
…
The idea behind augmented
assignment in Python is that it isn’t just an easier way to write the
common practice of storing the result of a binary operation in its
left-hand operand, but also a way for the left-hand operand in
question to know that it should operate `on itself’, rather than
creating a modified copy of itself.
>>> a = 89
>>> id(a)
4434330504
>>> a = 89 + 1
>>> print(a)
90
>>> id(a)
4430689552 # this is different from before!
>>> test = [1, 2, 3]
>>> id(test)
48638344L
>>> test2 = test
>>> id(test)
48638344L
>>> test2 += [4]
>>> id(test)
48638344L
>>> print(test, test2) # [1, 2, 3, 4] [1, 2, 3, 4]
([1, 2, 3, 4], [1, 2, 3, 4])
>>> id(test2)
48638344L # ID is the same here
We see that when we attempt to modify an immutable object (an integer in this case), Python simply gives us a different object instead. On the other hand, we are able to make changes to a mutable object (a list) and have it remain the same object throughout.
Many third-party Python modules have an attribute which holds the version information for the module (usually something like module.VERSION or module.__version__), however some do not.
Particular examples of such modules are libxslt and libxml2.
I need to check that the correct version of these modules are being used at runtime. Is there a way to do this?
A potential solution would be to read in the source at runtime, hash it, and then compare it to the hash of the known version, but that's nasty.
I'd stay away from hashing. The version of libxslt being used might contain some type of patch that doesn't affect your use of it.
As an alternative, I’d like to suggest that you don’t check at run time (don’t know if that’s a hard requirement or not). For the python stuff I write that has external dependencies (3rd party libraries), I write a script that users can run to check their python install to see if the appropriate versions of modules are installed.
For the modules that don’t have a defined ‘version’ attribute, you can inspect the interfaces it contains (classes and methods) and see if they match the interface they expect. Then in the actual code that you’re working on, assume that the 3rd party modules have the interface you expect.
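A minimal sketch of that idea (the expected attribute names here are hypothetical, not the real libxml2 interface):

def check_interface(module, expected_names):
    """Return the expected attributes that are missing from module."""
    return [name for name in expected_names if not hasattr(module, name)]

import libxml2  # one of the modules from the question
missing = check_interface(libxml2, ['parseDoc', 'parseFile'])  # hypothetical names
if missing:
    raise RuntimeError('libxml2 lacks expected attributes: %s' % missing)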
If you're on Python >= 3.8 you can use a module from the standard library for that. To check a package's version (in this example lxml) run:
>>> from importlib.metadata import version
>>> version('lxml')
'4.3.1'
This functionality has been ported to older versions of python (<3.8) as well, but you need to install a separate library first:
pip install importlib_metadata
and then to check a package’s version (in this example lxml) run:
>>> from importlib_metadata import version
>>> version('lxml')
'4.3.1'
Keep in mind that this works only for packages installed from PyPI. Also, you must pass a package name as an argument to the version function, rather than a module name that this package provides (although they're usually the same).
I found it quite unreliable to use the various tools available (including the best one pkg_resources mentioned by this other answer), as most of them do not cover all cases. For example
built-in modules
modules not installed but just added to the python path (by your IDE for example)
two versions of the same module available (one in python path superseding the one installed)
Since we needed a reliable way to get the version of any package, module or submodule, I ended up writing getversion. It is quite simple to use:
from getversion import get_module_version
import foo
version, details = get_module_version(foo)
Basically, when the secondary_y=True option is given (even though ax=ax is passed too), pandas.plot returns a different axes object, which we use to set the labels.
I know this was answered long ago, but I think this approach is worth sharing.
Answer 2
I can't use Python at the moment, but off the top of my head:
fig = plt.figure()
axes1 = fig.add_subplot(111)  # set props for left y-axis here
axes2 = axes1.twinx()  # mirror them
axes2.set_ylabel(...)
Python uses ints (32 bit signed integers, I don’t know if they are C ints under the hood or not) for values that fit into 32 bit, but automatically switches to longs (arbitrarily large number of bits – i.e. bignums) for anything larger. I’m guessing this speeds things up for smaller values while avoiding any overflows with a seamless transition to bignums.
In Python 3 the long datatype has been removed and all integer values are handled by the Int class. The default size of Int will depend on your CPU architecture.
For example:
32 bit systems the default datatype for integers will be ‘Int32’
64 bit systems the default datatype for integers will be ‘Int64’
The min/max values of each type can be found below:
If the size of your Int exceeds the limits mentioned above, Python will automatically change its type and allocate more memory to handle this increase in min/max values. Where in Python 2 it would convert into 'long', it now just converts into the next size of Int.
Example: If you are using a 32 bit operating system, your max value of an Int will be 2147483647 by default. If a value of 2147483648 or more is assigned, the type will be changed to Int64.
There are different ways to check the size of the int and its memory allocation.
Note: In Python 3, using the built-in type() method will always return <class 'int'> no matter what size Int you are using.
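For example, here are a couple of ways to inspect an int (a minimal sketch):

import sys

n = 2147483648           # one past the old 32-bit maximum
print(type(n))           # <class 'int'>, whatever the magnitude
print(n.bit_length())    # bits needed to represent the value: 32
print(sys.getsizeof(n))  # bytes of memory allocated for the object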
As of Python 3.x, the unified integer implementation is even smarter than in older versions. On my (i7 Ubuntu) box I got the following:
>>> type(math.factorial(30))
<class 'int'>
For implementation details, refer to the Include/longintrepr.h, Objects/longobject.c and Modules/mathmodule.c files. The last file is a dynamic module (compiled to a .so file). The code is well commented and easy to follow.
It manages them because int and long are sibling class definitions. They have appropriate methods for +, -, *, /, etc., that will produce results of the appropriate class.
When I run the program, Pandas gives ‘Future warning’ like below every time.
D:\Python\lib\site-packages\pandas\core\frame.py:3581: FutureWarning: rename with inplace=True will return None from pandas 0.11 onward
" from pandas 0.11 onward", FutureWarning)
I get the message, but I just want to stop Pandas from showing it again and again; is there any built-in parameter that I can set to stop Pandas from popping up the 'Future warning'?
@bdiamante’s answer may only partially help you. If you still get a message after you’ve suppressed warnings, it’s because the pandas library itself is printing the message. There’s not much you can do about it unless you edit the Pandas source code yourself. Maybe there’s an option internally to suppress them, or a way to override things, but I couldn’t find one.
For those who need to know why…
Suppose that you want to ensure a clean working environment. At the top of your script, you put pd.reset_option('all'). With Pandas 0.23.4, you get the following:
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)
C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning: html.bord
er has been deprecated, use display.html.border instead
(currently both are identical)
warnings.warn(d.msg, FutureWarning)
: boolean
use_inf_as_null had been deprecated and will be removed in a future
version. Use `use_inf_as_na` instead.
C:\projects\stackoverflow\venv\lib\site-packages\pandas\core\config.py:619: FutureWarning:
: boolean
use_inf_as_null had been deprecated and will be removed in a future
version. Use `use_inf_as_na` instead.
warnings.warn(d.msg, FutureWarning)
>>>
Following @bdiamante's advice, you use the warnings library. Now, true to its word, the warnings have been removed. However, several pesky messages remain:
>>> import warnings
>>> warnings.simplefilter(action='ignore', category=FutureWarning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)
: boolean
use_inf_as_null had been deprecated and will be removed in a future
version. Use `use_inf_as_na` instead.
>>>
In fact, disabling all warnings produces the same output:
>>> import warnings
>>> warnings.simplefilter(action='ignore', category=Warning)
>>> import pandas as pd
>>> pd.reset_option('all')
html.border has been deprecated, use display.html.border instead
(currently both are identical)
: boolean
use_inf_as_null had been deprecated and will be removed in a future
version. Use `use_inf_as_na` instead.
>>>
In the standard library sense, these aren’t true warnings. Pandas implements its own warnings system. Running grep -rn on the warning messages shows that the pandas warning system is implemented in core/config_init.py:
$ grep -rn "html.border has been deprecated"
core/config_init.py:207:html.border has been deprecated, use display.html.border instead
Further chasing shows that I don’t have time for this. And you probably don’t either. Hopefully this saves you from falling down the rabbit hole or perhaps inspires someone to figure out how to truly suppress these messages!
But if you want to handle them one by one and you are managing a bigger codebase, it will be difficult to find the line of code which is causing the warning, since warnings, unlike errors, don't come with a traceback. In order to trace warnings like errors, you can write this at the top of the code:
import warnings
warnings.filterwarnings("error")
But if the codebase is bigger and it is importing a bunch of other libraries/packages, then all sorts of warnings will start to be raised as errors. In order to raise only a certain type of warning (in your case, FutureWarning) as an error, you can write:
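import warnings
warnings.filterwarnings("error", category=FutureWarning)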
I have a function (foo) which calls another function (bar). If invoking bar() raises an HttpError, I want to handle it specially if the status code is 404, otherwise re-raise.
I am trying to write some unit tests around this foo function, mocking out the call to bar(). Unfortunately, I am unable to get the mocked call to bar() to raise an Exception which is caught by my except block.
Here is my code which illustrates my problem:
import unittest
import mock
from apiclient.errors import HttpError

class FooTests(unittest.TestCase):
    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnResultOfBar_whenBarSucceeds(self, barMock):
        barMock.return_value = True
        result = foo()
        self.assertTrue(result)  # passes

    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnNone_whenBarRaiseHttpError404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 404}), 'not found')
        result = foo()
        self.assertIsNone(result)  # fails, test raises HttpError

    @mock.patch('my_tests.bar')
    def test_foo_shouldRaiseHttpError_whenBarRaiseHttpErrorNot404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 500}), 'error')
        with self.assertRaises(HttpError):  # passes
            foo()

def foo():
    try:
        result = bar()
        return result
    except HttpError as error:
        if error.resp.status == 404:
            print '404 - %s' % error.message
            return None
        raise

def bar():
    raise NotImplementedError()
I followed the Mock docs which say that you should set the side_effect of a Mock instance to an Exception class to have the mocked function raise the error.
I also looked at some other related StackOverflow Q&As, and it looks like I am doing the same thing they are doing to cause an Exception to be raised by their mock.
Why is setting the side_effect of barMock not causing the expected Exception to be raised? If I am doing something weird, how should I go about testing logic in my except block?
Your mock is raising the exception just fine, but the error.resp.status value is missing. Rather than use return_value, just tell Mock that status is an attribute:
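barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')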
Additional keyword arguments to Mock() are set as attributes on the resulting object.
I put your foo and bar definitions in a my_tests module, added in the HttpError class so I could use it too, and your test can then be run successfully:
>>> from my_tests import foo, HttpError
>>> import mock
>>> with mock.patch('my_tests.bar') as barMock:
...     barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')
...     result = foo()
...
404 -
>>> result is None
True
You can even see the print '404 - %s' % error.message line run, but I think you wanted to use error.content there instead; that’s the attribute HttpError() sets from the second argument, at any rate.
The Zen of Python states that there should only be one way to do things, yet frequently I run into the problem of deciding when to use a function versus when to use a method.
Let's take a trivial example: a ChessBoard object. Let's say we need some way to get all the legal King moves available on the board. Do we write ChessBoard.get_king_moves() or get_king_moves(chess_board)?
Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?
The major reason is history. Functions were used for those operations that were generic for a group of types and which were
intended to work even for objects that didn’t have methods at all
(e.g. tuples). It is also convenient to have a function that can
readily be applied to an amorphous collection of objects when you use
the functional features of Python (map(), apply() et al).
In fact, implementing len(), max(), min() as a built-in function is actually less code than implementing them as methods for each type.
One can quibble about individual cases but it’s a part of Python, and
it’s too late to make such fundamental changes now. The functions have
to remain to avoid massive code breakage.
While interesting, the above doesn’t really say much as to what strategy to adopt.
This is one of the reasons – with custom methods, developers would be
free to choose a different method name, like getLength(), length(),
getlength() or whatsoever. Python enforces strict naming so that the
common function len() can be used.
Slightly more interesting. My take is that functions are, in a sense, the Pythonic version of interfaces.
Talking about the Abilities/Interfaces made me think about some of our
“rogue” special method names. In the Language Reference, it says, “A
class can implement certain operations that are invoked by special
syntax (such as arithmetic operations or subscripting and slicing) by
defining methods with special names.” But there are all these methods
with special names like __len__ or __unicode__ which seem to be
provided for the benefit of built-in functions, rather than for
support of syntax. Presumably in an interface-based Python, these
methods would turn into regularly-named methods on an ABC, so that
__len__ would become
class container:
    ...
    def len(self):
        raise NotImplemented
Though, thinking about it some more, I don’t see why all syntactic
operations wouldn’t just invoke the appropriate normally-named method
on a specific ABC. “<“, for instance, would presumably invoke
“object.lessthan” (or perhaps “comparable.lessthan“). So another
benefit would be the ability to wean Python away from this
mangled-name oddness, which seems to me an HCI improvement.
Hm. I’m not sure I agree (figure that :-).
There are two bits of “Python rationale” that I’d like to explain
first.
First of all, I chose len(x) over x.len() for HCI reasons (def
__len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than
postfix — prefix (and infix!) operations have a long tradition in
mathematics which likes notations where the visuals help the
mathematician thinking about a problem. Compare the ease with which we
rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of
doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for
the length of something. This tells me two things: the result is an
integer, and the argument is some kind of container. To the contrary,
when I read x.len(), I have to already know that x is some kind of
container implementing an interface or inheriting from a class that
has a standard len(). Witness the confusion we occasionally have when
a class that is not implementing a mapping has a get() or keys()
method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len’ as a built-in
operation. I’d hate to lose that. I can’t say for sure whether you meant that or not, but ‘def len(self): …’ certainly sounds like you
want to demote it to an ordinary method. I’m strongly -1 on that.
The second bit of Python rationale I promised to explain is the reason
why I chose special methods to look __special__ and not merely
special. I was anticipating lots of operations that classes might want
to override, some standard (e.g. __add__ or __getitem__), some not so
standard (e.g. pickle’s __reduce__ for a long time had no support in C
code at all). I didn’t want these special operations to use ordinary
method names, because then pre-existing classes, or classes written by
users without an encyclopedic memory for all the special methods,
would be liable to accidentally define operations they didn’t mean to
implement, with possibly disastrous consequences. Ivan Krstić
explained this more concisely in his message, which arrived after I’d
written all this up.
My understanding of this is that in certain cases, prefix notation just makes more sense (i.e., Duck.quack makes more sense than quack(Duck) from a linguistic standpoint), and again, the functions allow for "interfaces".
In such a case, my guess would be to implement get_king_moves based solely on Guido's first point. But that still leaves a lot of open questions regarding, say, implementing stack and queue classes with similar push and pop methods: should they be functions or methods? (Here I would guess functions, because I really want to signal a push-pop interface.)
TLDR: Can someone explain what the strategy for deciding when to use functions vs. methods should be?
My general rule is this – is the operation performed on the object or by the object?
If it is done by the object, it should be a member operation. If it could apply to other things too, or is done by something else to the object, then it should be a function (or perhaps a member of something else).
When introducing programming, it is traditional (albeit incorrect from an implementation standpoint) to describe objects in terms of real-world objects such as cars. You mention a duck, so let's go with that.
In the context of the "objects are real things" analogy, it is "correct" to add a class method for anything which the object can do. So say I want to kill off a duck: do I add a .kill() to the duck? No… as far as I know animals do not commit suicide. Therefore if I want to kill a duck I should do this:
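Something along these lines (a hypothetical sketch; Hunter and is_alive are made-up names):

class Hunter:
    def kill(self, duck):
        # the killing is done *to* the duck by something else,
        # so it lives outside the Duck class
        duck.is_alive = False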
Moving away from this analogy, why do we use methods and classes? Because we want to contain data and hopefully structure our code in a manner such that it will be reusable and extensible in the future. This brings us to the notion of encapsulation which is so dear to OO design.
The encapsulation principle is really what this comes down to: as a designer you should hide everything about the implementation and class internals which is not absolutely necessary for any user or other developer to access. Because we deal with instances of classes, this reduces to "what operations are crucial on this instance". If an operation is not instance specific, then it should not be a member function.
TL;DR:
what @Bryan said. If it operates on an instance and needs to access data which is internal to the class instance, it should be a member function.
Use a method when you want to do any of the following:
1) Isolate calling code from implementation details — taking advantage of abstraction and encapsulation.
2) When you want to be substitutable for other objects — taking advantage of polymorphism.
3) When you want to reuse code for similar objects — taking advantage of inheritance.
Use a function for calls that make sense across many different object types — for example, the builtin len and repr functions apply to many kinds of objects.
That being said, the choice sometimes comes down to a matter of taste. Think in terms of what is most convenient and readable for typical calls. For example, which would be better: (x.sin()**2 + y.cos()**2).sqrt() or sqrt(sin(x)**2 + cos(y)**2)?
Here’s a simple rule of thumb: if the code acts upon a single instance of an object, use a method. Even better: use a method unless there is a compelling reason to write it as a function.
In your specific example, you want it to look like this:
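chessboard = ChessBoard()
# ...
moves = chessboard.get_king_moves()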
Don't overthink it. Always use methods until the point comes where you say to yourself "it doesn't make sense to make this a method", in which case you can make a function.
Think of the object as a person: attributes are the person's name, height, shoe size, etc.
Methods and functions are operations that the person can perform.
If the operation could be done by just any ol’ person, without requiring anything unique to this one specific person (and without changing anything on this one specific person), then it’s a function and should be written as such.
If an operation is acting upon the person (e.g. eating, walking, …) or requires something unique to this person to get involved (like dancing, writing a book, …), then it should be a method.
Of course, it is not always trivial to translate this into the specific object you’re working with, but I find it is a good way to think of it.
Generally I use classes to implement a logical set of capabilities for some thing, so that in the rest of my program I can reason about the thing, not having to worry about all the little concerns that make up its implementation.
Anything that’s part of that core abstraction of “what you can do with a thing” should usually be a method. This generally includes everything that can alter a thing, as the internal data state is usually considered private and not part of the logical idea of “what you can do with a thing“.
When you come to higher level operations, especially if they involve multiple things, I find they are usually most naturally expressed as functions, if they can be built out of the public abstraction of a thing without needing special access to the internals (unless they’re methods of some other object). This has the big advantage that when I decide to completely rewrite the internals of how my thing works (without changing the interface), I just have a small core set of methods to rewrite, and then all the external functions written in terms of those methods will Just Work. I find that insisting that all operations to do with class X are methods on class X leads to over-complicated classes.
It depends on the code I’m writing though. For some programs I model them as a collection of objects whose interactions give rise to the behavior of the program; here most important functionality is closely coupled to a single object, and so is implemented in methods, with a scattering of utility functions. For other programs the most important stuff is a set of functions that manipulate data, and classes are in use only to implement the natural “duck types” that are manipulated by the functions.
You may say that, “in the face of ambiguity, refuse the temptation to guess”.
However, it’s not even a guess. You’re absolutely sure that the outcomes of both approaches are the same in that they solve your problem.
I believe it is only a good thing to have multiple ways of accomplishing goals. I'd humbly tell you, as other users did already, to employ whichever "tastes better" / feels more intuitive, in terms of language.