I am using the terrific Python Requests library. I notice that the fine documentation has many examples of how to do something without explaining the why. For instance, both r.text and r.content are shown as examples of how to get the server response. But where is it explained what these properties do? For instance, when would I choose one over the other? I see that r.text returns a unicode object sometimes, and I suppose that there would be a difference for a non-text response. But where is all this documented? Note that the linked document does state:
You can also access the response body as bytes, for non-text requests:
But then it goes on to show an example of a text response! I can only suppose that the quote above means to say non-text responses instead of non-text requests, as a non-text request does not make sense in HTTP.
In short, where is the proper documentation of the library, as opposed to the (excellent) tutorial on the Python Requests site?
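To illustrate the difference described above, here is a minimal sketch (the URL is only an example): r.content is the raw body as bytes, while r.text is that body decoded to text using r.encoding.
import requests

r = requests.get('https://httpbin.org/get')   # example URL
print(type(r.content))   # <class 'bytes'> - the raw response body
print(type(r.text))      # <class 'str'>  - the body decoded using r.encoding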
How can I convert a Django QuerySet into a list of dicts? I haven’t found an answer to this so I’m wondering if I’m missing some sort of common helper function that everyone uses.
Note: the result is a QuerySet which mostly behaves like a list, but isn’t actually an instance of list. Use list(Blog.objects.values(…)) if you really need an instance of list.
The .values() method will return a result of type ValuesQuerySet, which is typically what you need.
But if you wish, you could turn ValuesQuerySet into a native Python list using Python list comprehension as illustrated in the example below.
result = Blog.objects.values() # return ValuesQuerySet object
list_result = [entry for entry in result] # converts ValuesQuerySet into Python list
return list_result
I find the above helps if you are writing unit tests and need to assert that the expected return value of a function matches the actual return value, in which case both expected_result and actual_result must be of the same type (e.g. dictionary).
You do not exactly define what the dictionaries should look like, but most likely you are referring to QuerySet.values(). From the official django documentation:
Returns a ValuesQuerySet — a QuerySet subclass that returns
dictionaries when used as an iterable, rather than model-instance
objects.
Each of those dictionaries represents an object, with the keys
corresponding to the attribute names of model objects.
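A minimal sketch of what this looks like (the Blog model and its fields are illustrative):
queryset = Blog.objects.values()              # dict-like rows for all fields
first_row = queryset[0]                       # e.g. {'id': 1, 'name': 'Beatles Blog'}
as_list = list(Blog.objects.values('name'))   # e.g. [{'name': 'Beatles Blog'}, ...]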
You can use the values() method on the dict you got from the Django model field you make the queries on, and then you can easily access each field by an index value.
Call it like this –
myList = dictOfSomeData.values()
itemNumberThree = myList[2]  # If there's a value at that index, of course...
You could define a function using model_to_dict as follows:
from django.forms.models import model_to_dict

def queryset_to_dict(qs, fields=None, exclude=None):
    my_array = []
    for x in qs:
        my_array.append(model_to_dict(x, fields=fields, exclude=exclude))
    return my_array
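For example, it might be used like this (the Blog model and its "name" field are just an illustration):
blogs = queryset_to_dict(Blog.objects.all(), fields=['name'])
# e.g. [{'name': 'Beatles Blog'}, {'name': 'Cheese Blog'}]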
And if you want to produce a column containing the name of the column with the maximum value but considering only a subset of columns then you use a variation of @ajcr’s answer:
You could use apply on the DataFrame and get the argmax() of each row via axis=1:
In [144]: df.apply(lambda x: x.argmax(), axis=1)
Out[144]:
0 Communications
1 Business
2 Communications
3 Communications
4 Business
dtype: object
Here’s a benchmark to compare how slow the apply method is relative to idxmax() for len(df) ~ 20K:
In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop
In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop
When appending only once or once every now and again, using np.append on your array should be fine. The drawback of this approach is that memory is allocated for a completely new array every time it is called. When growing an array by a significant number of samples it would be better to either pre-allocate the array (if the total size is known) or to append to a list and convert to an array afterward.
Using np.append:
import numpy as np

b = np.array([0])
for k in range(int(10e4)):
    b = np.append(b, k)
1.2 s ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using python list converting to array afterward:
d = [0]
for k in range(int(10e4)):
    d.append(k)
f = np.array(d)
13.5 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Pre-allocating numpy array:
n = int(10e4)  # same number of samples as the examples above
e = np.zeros((n,))
for k in range(n):
    e[k] = k
9.92 ms ± 752 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
When the final size is unknown, pre-allocating is difficult. I tried pre-allocating in chunks of 50, but it did not come close to using a list (a sketch of the chunked approach is shown after the timing below).
85.1 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
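For reference, the chunked pre-allocation mentioned above might look roughly like this (chunk size of 50, as described; the exact code used for the benchmark is not shown in the answer):
import numpy as np

chunk = 50
g = np.zeros((chunk,))
n_used = 0
for k in range(int(10e4)):
    if n_used == g.shape[0]:
        # grow the array by another chunk when it is full
        g = np.concatenate([g, np.zeros((chunk,))])
    g[n_used] = k
    n_used += 1
g = g[:n_used]  # trim the unused tail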
I’ve installed Python 3.4 and Python 3.6 on my local machine successfully, but am unable to install packages with pip3.
When I execute pip3 install <package>, I get the following SSL related error:
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Collecting <package>
Could not fetch URL https://pypi.python.org/simple/<package>/: There was a problem confirming the ssl certificate: Can't connect to HTTPS URL because the SSL module is not available. - skipping
Could not find a version that satisfies the requirement <package> (from versions: )
No matching distribution found for <package>
How can I fix my Python3.x install so that I can install packages with pip install <package>?
If you are on Red Hat / CentOS:
# To allow for building python ssl libs
yum install openssl-devel
# Download the source of *any* python version
cd /usr/src
wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tar.xz
tar xf Python-3.6.2.tar.xz
cd Python-3.6.2
# Configure the build w/ your installed libraries
./configure
# Install into /usr/local/bin/python3.6, don't overwrite global python bin
make altinstall
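After the build, a quick way to verify that the ssl module is available is to run the following with the freshly built interpreter (e.g. /usr/local/bin/python3.6, assuming the default altinstall location):
import ssl
print(ssl.OPENSSL_VERSION)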
If you are on Windows and use Anaconda, this worked for me:
I tried a lot of other solutions which did not work (Environment PATH Variable changes …)
The problem can be caused by DLLs in the Windows\System32 folder (e.g. libcrypto-1_1-x64.dll or libssl-1_1-x64.dll or others) placed there by other software.
I had a similar problem on OSX 10.11 due to installing memcached which installed python 3.7 on top of 3.6.
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
I spent hours on unlinking openssl, reinstalling, changing paths... and nothing helped. Changing the openssl version back to an older version did the trick:
brew switch openssl 1.0.2e
I did not see this suggestion anywhere on the internet. Hope it serves someone.
If you are on OSX and have compiled python from source:
Install openssl using brew: brew install openssl
Make sure to follow the instructions brew gives you about setting your CPPFLAGS and LDFLAGS. In my case I am using the openssl@1.1 brew formula and I need these 3 settings for the python build process to correctly link to my SSL library:
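The exact settings are not reproduced in this excerpt; the flags brew typically prints for openssl@1.1 look roughly like this (paths assume a default Homebrew prefix of /usr/local):
export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib"
export CPPFLAGS="-I/usr/local/opt/openssl@1.1/include"
export PKG_CONFIG_PATH="/usr/local/opt/openssl@1.1/lib/pkgconfig"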
Assuming the library is installed at that location.
I encountered the same problem on Windows 10. My very specific issue was due to my installation of Anaconda. I installed Anaconda, and python.exe lives under Path/to/Anaconda3/. Thus, I didn’t install python at all, because Anaconda includes python. When using pip to install packages, I got the same error report: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
The solution was the following:
1) Download python again from the official website;
2) Navigate to the directory where "Python 3.7 (64-bit).lnk" is located;
3) import ssl and exit();
4) In cmd, type "Python 3.7 (64-bit).lnk" -m pip install tensorflow, for instance.
If you are on Windows and use Anaconda you can try running “pip install …” command in Anaconda Prompt instead of cmd.exe, as user willliu1995 suggests here. This was the fastest solution for me, that does not require installation of additional components.
My issue appeared to be related to my python installation and was quickly resolved by re-installing python3 and pip. I think it started misbehaving after an OS update, but who knows (at this time I am on Mac OS 10.14.6).
While installing Anaconda, select the option to add Anaconda to the path.
or
Find these (complete) paths in your Anaconda installation folder and add them to the environment variable:
\Anaconda
\Anaconda\Library\mingw-w64\bin
\Anaconda\Library\usr\bin
\Anaconda\Library\bin
\Anaconda\Scripts
\anaconda\Library
\anaconda\condabin
Add the above paths to the “Path” system variable and the error should no longer appear :)
I was having the same issue and was able to resolve with the following steps:
sudo yum install -y libffi-devel
sudo yum install openssl-devel
cd /usr/src
sudo wget https://www.python.org/ftp/python/3.7.1/Python-3.7.1.tar.xz
sudo tar xf Python-3.7.1.tar.xz
cd Python-3.7.1
sudo ./configure --enable-optimizations
# Install into /usr/local/bin/python3.7, don't overwrite global python bin
sudo make altinstall
The ssl module is a TLS/SSL wrapper for accessing Operating System (OS) sockets (Lib/ssl.py). So when the ssl module is not available, chances are that you either don’t have the OS OpenSSL libraries installed, or those libraries were not found when you installed Python. Let’s assume it is the latter case (aka: you already have OpenSSL installed, but it was not correctly linked when installing Python).
I will also assume you are installing from source. If you are installing from a binary (i.e. the Windows .exe file) or a package (Mac .dmg, or Ubuntu apt), there is not much you can do about the installation process.
During the step of configuring your python installation, you need to specify where the OS OpenSSL will be used for linking:
Your system may be different from mine, but as you can see here I have many different openssl libraries installed. At the time of this writing, python 3.8 expects openssl 1.0.2 or 1.1:
Python requires an OpenSSL 1.0.2 or 1.1 compatible libssl with X509_VERIFY_PARAM_set1_host().
So you would need to verify which of those installed libraries you can use for linking, for example:
/usr/bin/openssl version
OpenSSL 1.0.2g 1 Mar 2016
./configure --with-openssl="/usr"
make && make install
You may need to try a few, or install a new one, to find the library that works for your Python and your OS.
I finally solved this issue. These are the details of my environment:
Version of Python to install: 3.6.8
OS: Ubuntu 16.04.6 LTS
Root access: No
Some people suggest installing libssl-dev, but it did not work for me. I followed this link and fixed it!
In short, I downloaded, extracted, built, and installed OpenSSL (openssl-1.1.1b.tar.gz). Then, I modified my .bashrc file following this link.
Next, I downloaded and extracted Python-3.6.8.tgz. I edited Modules/Setup.dist to modify the SSL path (lines around #211). I then ran ./configure --prefix=$HOME/opt/python-3.6.8, make and make install. Last, I modified my .bashrc. Notice that I did not include --enable-optimizations in ./configure.
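For reference, the SSL section of Modules/Setup.dist that needs uncommenting looks roughly like this (the SSL variable should point at the prefix where OpenSSL was installed; the /usr/local/ssl value is only the stock default):
SSL=/usr/local/ssl        # change this to your OpenSSL prefix
_ssl _ssl.c \
    -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
    -L$(SSL)/lib -lssl -lcrypto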
The huge assumption I made is that the yum version of ssl is the same as the conda version, so just renaming the shared object might work, and it did.
My other solution was to re-compile python, re-install anaconda, etc, but in the end I’m glad I didn’t need to.
Hope this helps you guys out.
In the case of using pyenv to manage python installations on Mac OS Catalina, I had to install openssl with brew first and then after that run pyenv install 3.7.8 which seemed to build the python installation using the openssl from homebrew (it even said as such in the installation output). Then pyenv global 3.7.8 and I was away.
I was able to fix this by updating the python version in this file.
pyenv: version `3.6.5' is not installed (set by /Users/taruntarun/.python-version)
Though I had the latest version installed, my command was still using the old version 3.6.5.
I was having the same problem for the last two days and only have fixed it right now.
I had tried to use the --trust-host option with DigiCert_High_Assurance_EV_Root_CA.pem, which did not work; I couldn’t install the ssl module (it says it cannot be installed for python versions greater than 2.6); setting the $PIP_CERT variable didn’t fix it either, and I had libssl1.0.2 and libssl1.0.0 installed. Also worth mentioning: I didn’t have a ~/.pip/pip.conf file, and creating it didn’t solve the bug either.
What finally solved it, was installing python3.6 with make again.
Download the Python-3.6.0.tgz from the website, run configure then make, make test and make install. Hope it works for you.
The python documentation is actually very clear, and following the instructions did the job whereas other answers I found here were not fixing this issue.
make sure you have openssl installed by running brew install openssl
extract it and move to the python directory: tar xf Python-3.6.2.tar.xz && cd Python-3.6.2
then if the python version is < 3.7, run
CPPFLAGS="-I$(brew --prefix openssl)/include" \
LDFLAGS="-L$(brew --prefix openssl)/lib" \
./configure --with-pydebug
5. finally, run make -s -j2 (-s is the silent flag, -j2 tells your machine to use 2 jobs)
I had the same issue trying to install python3.7 on an ubuntu14.04 machine.
The issue was that I had some custom folders in my PKG_CONFIG_PATH and in my LD_LIBRARY_PATH, which prevented the python build process from finding the system openssl libraries.
OK, the latest answer to this: as of now, don’t use Python 3.8, use only 3.7 or less, because most of the libraries fail to install with the above error.
I’m new to pandas and trying to figure out how to add multiple columns to a pandas DataFrame simultaneously. Any help here is appreciated. Ideally I would like to do this in one step rather than multiple repeated steps…
import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
      'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...
I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn’t actually matter if the columns of the DataFrame have the same names as the columns you are creating).
Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
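A minimal sketch of both options, using the column names from the question:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Option 1: several single-column assignments
df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

# Option 2: build a DataFrame for the right-hand side
new = pd.DataFrame({'column_new_1': np.nan,
                    'column_new_2': 'dogs',
                    'column_new_3': 3}, index=df.index)
df[['column_new_1', 'column_new_2', 'column_new_3']] = new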
5) Using a dict is a more “natural” way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):
I like this variant on @zero’s answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:
In [128]: df
Out[128]:
col_1 col_2
0 0 4
1 1 5
2 2 6
3 3 7
In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN NaN NaN
1 1.0 5.0 NaN NaN NaN
2 2.0 6.0 NaN NaN NaN
3 3.0 7.0 NaN NaN NaN
I’m not very sure what you wanted to do with [np.nan, 'dogs', 3]. Maybe you now want to set them as default values?
In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]
In [144]: df1
Out[144]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN dogs 3
1 1.0 5.0 NaN dogs 3
2 2.0 6.0 NaN dogs 3
3 3.0 7.0 NaN dogs 3
Using a list comprehension, pd.DataFrame, and pd.concat:
pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)
You can pass a list of columns to [] to select columns in that order.
If a column is not contained in the DataFrame, an exception will be raised.
Multiple columns can also be set in this manner.
You may find this useful for applying a transform (in-place) to a subset of the columns.
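For example (a small sketch, assuming the df defined in the question):
# select columns in a given order
df[['col_2', 'col_1']]

# apply a transform to a subset of the columns and assign it back in place
df[['col_1', 'col_2']] = df[['col_1', 'col_2']] * 10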
If you just want to add empty new columns, reindex will do the job
df
col_1 col_2
0 0 4
1 1 5
2 2 6
3 3 7
df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0 4 NaN NaN NaN
1 1 5 NaN NaN NaN
2 2 6 NaN NaN NaN
3 3 7 NaN NaN NaN
I want to create a Python function to test the time spent in each function and print its name with its time. How can I print the function name, and if there is another way to do so, please tell me.
import time

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.func_name, (time2 - time1) * 1000.0)
        return ret
    return wrap
Usage is very simple, just use the @timing decorator:

@timing
def do_work():
    # code
Python 3:
def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(f.__name__, (time2 - time1) * 1000.0))
        return ret
    return wrap
with timeit_context('My profiling code'):
mike = Person()
mike.think()
And the code within the with block will be timed.
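The timeit_context context manager itself is not shown in this excerpt; a minimal sketch of what it might look like:
import time
from contextlib import contextmanager

@contextmanager
def timeit_context(name):
    start_time = time.time()
    yield
    elapsed_time = time.time() - start_time
    print('[{}] finished in {:.0f} ms'.format(name, elapsed_time * 1000))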
Conclusion
Using the first method, you can easily comment out the decorator to get the normal code. However, it can only time a function. If you have some part of code that you don’t want to make into a function, then you can choose the second method.
The only case where you might be interested in writing your own timing statements is if you want to run a function only once and also want to obtain its return value.
The advantage of using the timeit module is that it lets you repeat the execution a number of times. This might be necessary because other processes might interfere with your timing accuracy. So, you should run it multiple times and look at the lowest value.
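For example (a small sketch; the statement being timed is arbitrary):
import timeit

timings = timeit.repeat('sorted(range(1000))', repeat=5, number=1000)
print(min(timings))  # use the lowest value, as suggested above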
Timeit has two big flaws: it doesn’t return the return value of the function, and it uses eval, which requires passing in extra setup code for imports. This solves both problems simply and elegantly:
import time

def timed(f):
    start = time.time()
    ret = f()
    elapsed = time.time() - start
    return ret, elapsed
timed(lambda: database.foo.execute('select count(*) from source.apachelog'))
(<sqlalchemy.engine.result.ResultProxy object at 0x7fd6c20fc690>, 4.07547402381897)
from pytimer import Timer
@Timer(average=False)
def matmul(a,b, times=100):
for i in range(times):
np.dot(a,b)
Output:
matmul:0.368434
matmul:2.839355
It can also work like a plug-in timer with namespace control (helpful if you are inserting it into a function which has a lot of code and may be called anywhere else).
timer = Timer()
def any_function():
timer.start()
for i in range(10):
timer.reset()
np.dot(np.ones((100,1000)), np.zeros((1000,500)))
timer.checkpoint('block1')
np.dot(np.ones((100,1000)), np.zeros((1000,500)))
np.dot(np.ones((100,1000)), np.zeros((1000,500)))
timer.checkpoint('block2')
np.dot(np.ones((100,1000)), np.zeros((1000,1000)))
for j in range(20):
np.dot(np.ones((100,1000)), np.zeros((1000,500)))
timer.summary()
for i in range(2):
any_function()
Output:
========Timing Summary of Default Timer========
block2:0.065062
block1:0.032529
========Timing Summary of Default Timer========
block2:0.065838
block1:0.032891
Hope it will help
Decorator method using the decorator Python library:
import decorator

@decorator
def timing(func, *args, **kwargs):
    '''Function timing wrapper
       Example of using:
       ``@timing()``
    '''
    fn = '%s.%s' % (func.__module__, func.__name__)
    timer = Timer()
    with timer:
        ret = func(*args, **kwargs)
    log.info(u'%s - %0.3f sec' % (fn, timer.duration_in_seconds()))
    return ret
from time import time
def printTime(start):
    end = time()
    duration = end - start
    if duration < 60:
        return "used: " + str(round(duration, 2)) + "s."
    else:
        mins = int(duration / 60)
        secs = round(duration % 60, 2)
        if mins < 60:
            return "used: " + str(mins) + "m " + str(secs) + "s."
        else:
            hours = int(duration / 3600)
            mins = mins % 60
            return "used: " + str(hours) + "h " + str(mins) + "m " + str(secs) + "s."
As I know, %debug magic can do debug within one cell.
However, I have function calls across multiple cells.
For example,
In [1]:
def fun1(a):
    def fun2(b):
        # I want to set a breakpoint for the following line #
        return do_some_thing_about(b)
    return fun2(a)

In [2]:
import multiprocessing as mp
pool = mp.Pool(processes=2)
results = pool.map(fun1, 1.0)
pool.close()
pool.join()
What I tried:
I tried to set %debug in the first line of cell-1. But it entered debug mode immediately, even before executing cell-2.
I tried to add %debug in the line right before the code return do_some_thing_about(b). But then the code runs forever, never stops.
What is the right way to set a breakpoint within the IPython notebook?
The %pdb magic command is good to use as well. Just say %pdb on and subsequently the pdb debugger will run on all exceptions, no matter how deep in the call stack. Very handy.
If you have a particular line that you want to debug, just raise an exception there (often you already are!) or use the %debug magic command that other folks have been suggesting.
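A small sketch of this in a notebook cell (the function names are just placeholders; the undefined call is there on purpose to trigger an exception and drop into pdb):
%pdb on

def fun2(b):
    return do_some_thing_about(b)   # raises NameError here...

fun2(1.0)                           # ...and the pdb prompt opens at the raise site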
I just discovered PixieDebugger. Even though I have not yet had the time to test it, it really seems the closest to the way we’re used to debugging in ipython with ipdb.
A native debugger is being made available as an extension to JupyterLab. Released a few weeks ago, this can be installed by getting the relevant extension, as well as xeus-python kernel (which notably comes without the magics well-known to ipykernel users):
My initial thought was to use heuristics to do the sorting, like:
There’s a ~60-40% ratio in weight bearing between the front and hind paws;
The hind paws are generally smaller in surface;
The paws are (often) spatially divided in left and right.
However, I’m a bit skeptical about my heuristics, as they would fail on me as soon as I encounter a variation I hadn’t thought of. They also won’t be able to cope with measurements from lame dogs, which probably have rules of their own.
Furthermore, the annotation suggested by Joe sometimes gets messed up and doesn’t take into account what the paw actually looks like.
Based on the answers I received on my question about peak detection within the paw, I’m hoping there are more advanced solutions to sort the paws. Especially because the pressure distribution and the progression thereof are different for each separate paw, almost like a fingerprint. I hope there’s a method that can use this to cluster my paws, rather than just sorting them in order of occurrence.
So I’m looking for a better way to sort the results with their corresponding paw.
To clarify: walk_sliced_data is a dictionary that contains ['ser_3', 'ser_2', 'sel_1', 'sel_2', 'ser_1', 'sel_3'], which are the names of the measurements. Each measurement contains another dictionary, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] (example from 'sel_1'), which represents the impacts that were extracted.
Also note that 'false' impacts, such as where the paw is partially measured (in space or time), can be ignored. They are only useful because they can help with recognizing a pattern, but won’t be analyzed.
Alright! I’ve finally managed to get something working consistently! This problem pulled me in for several days… Fun stuff! Sorry for the length of this answer, but I need to elaborate a bit on some things… (Though I may set a record for the longest non-spam stackoverflow answer ever!)
As a side note, I’m using the full dataset that Ivo provided a link to in his original question. It’s a series of rar files (one-per-dog) each containing several different experiment runs stored as ascii arrays. Rather than try to copy-paste stand-alone code examples into this question, here’s a bitbucket mercurial repository with full, stand-alone code. You can clone it with
There are essentially two ways to approach the problem, as you noted in your question. I’m actually going to use both in different ways.
Use the (temporal and spatial) order of the paw impacts to determine which paw is which.
Try to identify the “pawprint” based purely on its shape.
Basically, the first method works when the dog’s paws follow the trapezoidal-like pattern shown in Ivo’s question above, but fails whenever the paws don’t follow that pattern. It’s fairly easy to programmatically detect when it doesn’t work.
Therefore, we can use the measurements where it did work to build up a training dataset (of ~2000 paw impacts from ~30 different dogs) to recognize which paw is which, and the problem reduces to a supervised classification (With some additional wrinkles… Image recognition is a bit harder than a “normal” supervised classification problem).
Pattern Analysis
To elaborate on the first method, when a dog is walking (not running!) normally (which some of these dogs may not be), we expect paws to impact in the order of: Front Left, Hind Right, Front Right, Hind Left, Front Left, etc. The pattern may start with either the front left or front right paw.
If this were always the case, we could simply sort the impacts by initial contact time and use a modulo 4 to group them by paw.
However, even when everything is “normal”, this doesn’t work. This is due to the trapezoid-like shape of the pattern. A hind paw spatially falls behind the previous front paw.
Therefore, the hind paw impact after the initial front paw impact often falls off the sensor plate, and isn’t recorded. Similarly, the last paw impact is often not the next paw in the sequence, as the paw impact before it occurred off the sensor plate and wasn’t recorded.
Nonetheless, we can use the shape of the paw impact pattern to determine when this has happened, and whether we’ve started with a left or right front paw. (I’m actually ignoring problems with the last impact here. It’s not too hard to add it, though.)
def group_paws(data_slices, time):
# Sort slices by initial contact time
data_slices.sort(key=lambda s: s[-1].start)
# Get the centroid for each paw impact...
paw_coords = []
for x,y,z in data_slices:
paw_coords.append([(item.stop + item.start) / 2.0 for item in (x,y)])
paw_coords = np.array(paw_coords)
# Make a vector between each successive impact...
dx, dy = np.diff(paw_coords, axis=0).T
#-- Group paws -------------------------------------------
paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
paw_number = np.arange(len(paw_coords))
# Did we miss the hind paw impact after the first
# front paw impact? If so, first dx will be positive...
if dx[0] > 0:
paw_number[1:] += 1
# Are we starting with the left or right front paw...
# We assume we're starting with the left, and check dy[0].
# If dy[0] > 0 (i.e. the next paw impacts to the left), then
# it's actually the right front paw, instead of the left.
if dy[0] > 0: # Right front paw impact...
paw_number += 2
# Now we can determine the paw with a simple modulo 4..
paw_codes = paw_number % 4
paw_labels = [paw_code[code] for code in paw_codes]
return paw_labels
In spite of all of this, it frequently doesn’t work correctly. Many of the dogs in the full dataset appear to be running, and the paw impacts don’t follow the same temporal order as when the dog is walking. (Or perhaps the dog just has severe hip problems…)
Fortunately, we can still programmatically detect whether or not the paw impacts follow our expected spatial pattern:
def paw_pattern_problems(paw_labels, dx, dy):
"""Check whether or not the label sequence "paw_labels" conforms to our
expected spatial pattern of paw impacts. "paw_labels" should be a sequence
of the strings: "LH", "RH", "LF", "RF" corresponding to the different paws"""
# Check for problems... (This could be written a _lot_ more cleanly...)
problems = False
last = paw_labels[0]
for paw, dy, dx in zip(paw_labels[1:], dy, dx):
# Going from a left paw to a right, dy should be negative
if last.startswith('L') and paw.startswith('R') and (dy > 0):
problems = True
break
# Going from a right paw to a left, dy should be positive
if last.startswith('R') and paw.startswith('L') and (dy < 0):
problems = True
break
# Going from a front paw to a hind paw, dx should be negative
if last.endswith('F') and paw.endswith('H') and (dx > 0):
problems = True
break
# Going from a hind paw to a front paw, dx should be positive
if last.endswith('H') and paw.endswith('F') and (dx < 0):
problems = True
break
last = paw
return problems
Therefore, even though the simple spatial classification doesn’t work all of the time, we can determine when it does work with reasonable confidence.
Training Dataset
From the pattern-based classifications where it worked correctly, we can build up a very large training dataset of correctly classified paws (~2400 paw impacts from 32 different dogs!).
We can now start to look at what an “average” front left, etc, paw looks like.
To do this, we need some sort of “paw metric” that is the same dimensionality for any dog. (In the full dataset, there are both very large and very small dogs!) A paw print from an Irish elkhound will be both much wider and much “heavier” than a paw print from a toy poodle. We need to rescale each paw print so that a) they have the same number of pixels, and b) the pressure values are standardized. To do this, I resampled each paw print onto a 20×20 grid and rescaled the pressure values based on the maximum, minimum, and mean pressure value for the paw impact.
def paw_image(paw):
from scipy.ndimage import map_coordinates
ny, nx = paw.shape
# Trim off any "blank" edges around the paw...
mask = paw > 0.01 * paw.max()
y, x = np.mgrid[:ny, :nx]
ymin, ymax = y[mask].min(), y[mask].max()
xmin, xmax = x[mask].min(), x[mask].max()
# Make a 20x20 grid to resample the paw pressure values onto
numx, numy = 20, 20
xi = np.linspace(xmin, xmax, numx)
yi = np.linspace(ymin, ymax, numy)
xi, yi = np.meshgrid(xi, yi)
# Resample the values onto the 20x20 grid
coords = np.vstack([yi.flatten(), xi.flatten()])
zi = map_coordinates(paw, coords)
zi = zi.reshape((numy, numx))
# Rescale the pressure values
zi -= zi.min()
zi /= zi.max()
zi -= zi.mean() #<- Helps distinguish front from hind paws...
return zi
After all of this, we can finally take a look at what an average left front, hind right, etc paw looks like. Note that this is averaged across >30 dogs of greatly different sizes, and we seem to be getting consistent results!
However, before we do any analysis on these, we need to subtract the mean (the average paw for all legs of all dogs).
Now we can analyze the differences from the mean, which are a bit easier to recognize:
Image-based Paw Recognition
Ok… We finally have a set of patterns that we can begin to try to match the paws against. Each paw can be treated as a 400-dimensional vector (returned by the paw_image function) that can be compared to these four 400-dimensional vectors.
Unfortunately, if we just use a “normal” supervised classification algorithm (i.e. find which of the 4 patterns is closest to a particular paw print using a simple distance), it doesn’t work consistently. In fact, it doesn’t do much better than random chance on the training dataset.
This is a common problem in image recognition. Due to the high dimensionality of the input data, and the somewhat “fuzzy” nature of images (i.e. adjacent pixels have a high covariance), simply looking at the difference of an image from a template image does not give a very good measure of the similarity of their shapes.
Eigenpaws
To get around this we need to build a set of “eigenpaws” (just like “eigenfaces” in facial recognition), and describe each paw print as a combination of these eigenpaws. This is identical to principal components analysis, and basically provides a way to reduce the dimensionality of our data, so that distance is a good measure of shape.
Because we have more training images than dimensions (2400 vs 400), there’s no need to do “fancy” linear algebra for speed. We can work directly with the covariance matrix of the training data set:
def make_eigenpaws(paw_data):
"""Creates a set of eigenpaws based on paw_data.
paw_data is a numdata by numdimensions matrix of all of the observations."""
average_paw = paw_data.mean(axis=0)
paw_data -= average_paw
# Determine the eigenvectors of the covariance matrix of the data
cov = np.cov(paw_data.T)
eigvals, eigvecs = np.linalg.eig(cov)
# Sort the eigenvectors by ascending eigenvalue (largest is last)
eig_idx = np.argsort(eigvals)
sorted_eigvecs = eigvecs[:,eig_idx]
sorted_eigvals = eigvals[:,eig_idx]
# Now choose a cutoff number of eigenvectors to use
# (50 seems to work well, but it's arbitrary...)
num_basis_vecs = 50
basis_vecs = sorted_eigvecs[:,-num_basis_vecs:]
return basis_vecs
These basis_vecs are the “eigenpaws”.
To use these, we simply dot (i.e. matrix multiplication) each paw image (as a 400-dimensional vector, rather than a 20×20 image) with the basis vectors. This gives us a 50-dimensional vector (one element per basis vector) that we can use to classify the image. Instead of comparing a 20×20 image to the 20×20 image of each “template” paw, we compare the 50-dimensional, transformed image to each 50-dimensional transformed template paw. This is much less sensitive to small variations in exactly how each toe is positioned, etc, and basically reduces the dimensionality of the problem to just the relevant dimensions.
Eigenpaw-based Paw Classification
Now we can simply use the distance between the 50-dimensional vectors and the “template” vectors for each leg to classify which paw is which:
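The classification code itself isn’t reproduced in this excerpt; a rough sketch of the idea, with names and shapes assumed from the description above:
import numpy as np

def classify_paw(paw, basis_vecs, template_vecs, labels=('LF', 'RH', 'RF', 'LH')):
    """paw: standardized 20x20 image from paw_image;
    basis_vecs: 400x50 eigenpaw matrix from make_eigenpaws;
    template_vecs: 4x50 array, one transformed "template" vector per leg."""
    vec = paw.flatten().dot(basis_vecs)                  # 400-dim -> 50-dim
    dists = np.sqrt(((template_vecs - vec) ** 2).sum(axis=1))
    return labels[int(dists.argmin())]                   # nearest template wins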
There are still some problems, particularly with dogs too small to make a clear pawprint… (It works best with large dogs, as the toes are more clearly separated at the sensor’s resolution.) Also, partial pawprints aren’t recognized with this system, while they can be with the trapezoidal-pattern-based system.
However, because the eigenpaw analysis inherently uses a distance metric, we can classify the paws both ways, and fall back to the trapezoidal-pattern-based system when the eigenpaw analysis’s smallest distance from the “codebook” is over some threshold. I haven’t implemented this yet, though.
Phew… That was long! My hat is off to Ivo for having such a fun question!
Using the information purely based on duration, I think you could apply techniques from modeling kinematics; namely Inverse Kinematics. Combined with orientation, length, duration, and total weight, it gives some level of periodicity, which I would hope could be the first step in trying to solve your “sorting of paws” problem.
All that data could be used to create a list of bounded polygons (or tuples), which you could use to sort by step size then by paw-ness [index].
The __debug__ variable is handy in part because it affects every module. If I want to create another variable that works the same way, how would I do it?
The variable (let’s be original and call it ‘foo’) doesn’t have to be truly global, in the sense that if I change foo in one module, it is updated in others. I’d be fine if I could set foo before importing other modules and then they would see the same value for it.
I don’t endorse this solution in any way, shape or form. But if you add a variable to the __builtin__ module, it will be accessible as if a global from any other module that includes __builtin__ — which is all of them, by default.
a.py contains
print foo
b.py contains
import __builtin__
__builtin__.foo = 1
import a
The result is that “1” is printed.
Edit: The __builtin__ module is available as the local symbol __builtins__ — that’s the reason for the discrepancy between two of these answers. Also note that __builtin__ has been renamed to builtins in python3.
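In Python 3 the same trick uses the builtins module:
import builtins
builtins.foo = 1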
Define a module (call it “globalbaz”) and have the variables defined inside it. All the modules using this “pseudoglobal” should import the “globalbaz” module, and refer to it using “globalbaz.var_name”.
This works regardless of the place of the change, you can change the variable before or after the import. The imported module will use the latest value. (I tested this in a toy example)
For clarification, globalbaz.py looks just like this:
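The module body is not reproduced in this excerpt; a minimal sketch (the variable name is only an example):
# globalbaz.py
var_name = None   # default value; other modules read/write globalbaz.var_name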
I believe that there are plenty of circumstances in which it does make sense and it simplifies programming to have some globals that are known across several (tightly coupled) modules. In this spirit, I would like to elaborate a bit on the idea of having a module of globals which is imported by those modules which need to reference them.
When there is only one such module, I name it “g”. In it, I assign default values for every variable I intend to treat as global. In each module that uses any of them, I do not use “from g import var”, as this only results in a local variable which is initialized from g only at the time of the import. I make most references in the form g.var, and the “g.” serves as a constant reminder that I am dealing with a variable that is potentially accessible to other modules.
If the value of such a global variable is to be used frequently in some function in a module, then that function can make a local copy: var = g.var. However, it is important to realize that assignments to var are local, and global g.var cannot be updated without referencing g.var explicitly in an assignment.
Note that you can also have multiple such globals modules shared by different subsets of your modules to keep things a little more tightly controlled. The reason I use short names for my globals modules is to avoid cluttering up the code too much with occurrences of them. With only a little experience, they become mnemonic enough with only 1 or 2 characters.
It is still possible to make an assignment to, say, g.x when x was not already defined in g, and a different module can then access g.x. However, even though the interpreter permits it, this approach is not so transparent, and I do avoid it. There is still the possibility of accidentally creating a new variable in g as a result of a typo in the variable name for an assignment. Sometimes an examination of dir(g) is useful to discover any surprise names that may have arisen by such accident.
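A small sketch of this pattern (module and variable names are illustrative):
# g.py
verbose = False
retry_count = 3

# worker.py
import g

def work():
    if g.verbose:               # always reference through g. to stay global
        print('retries:', g.retry_count)
    g.retry_count = 5           # updates the value seen by every module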
You can already do this with module-level variables. Modules are the same no matter what module they’re being imported from. So you can make the variable a module-level variable in whatever module it makes sense to put it in, and access it or assign to it from other modules. It would be better to call a function to set the variable’s value, or to make it a property of some singleton object. That way if you end up needing to run some code when the variable’s changed, you can do so without breaking your module’s external interface.
It’s not usually a great way to do things — using globals seldom is — but I think this is the cleanest way to do it.
I would like to post an answer for a case in which the variable cannot be found.
Circular imports may break the module behavior.
For example:
first.py
import second
var = 1
second.py
import first
print(first.var)  # will throw an error because the order of execution happens before var gets declared
This sounds like modifying the __builtin__ namespace. To do it:
import __builtin__
__builtin__.foo = 'some-value'
Do not use the __builtins__ directly (notice the extra “s”) – apparently this can be a dictionary or a module. Thanks to ΤΖΩΤΖΙΟΥ for pointing this out, more can be found here.
Now foo is available for use everywhere.
I don’t recommend doing this generally, but the use of this is up to the programmer.
Assigning to it must be done as above, just setting foo = 'some-other-value' will only set it in the current namespace.
I use this for a couple built-in primitive functions that I felt were really missing. One example is a find function that has the same usage semantics as filter, map, reduce.
def builtin_find(f, x, d=None):
for i in x:
if f(i):
return i
return d
import __builtin__
__builtin__.find = builtin_find
Once this is run (for instance, by importing near your entry point) all your modules can use find() as though, obviously, it was built in.
find(lambda i: i < 0, [1, 3, 0, -5, -10]) # Yields -5, the first negative.
Note: You can do this, of course, with filter and another line to test for zero length, or with reduce in one sort of weird line, but I always felt it was weird.
I could achieve cross-module modifiable (or mutable) variables by using a dictionary:
# in myapp.__init__
Timeouts = {} # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60
# in myapp.mod1
from myapp import Timeouts
def wait_app_up(project_name, port):
# wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
# ...
# in myapp.test.test_mod1
from myapp import Timeouts
def test_wait_app_up_fail(self):
timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
with self.assertRaises(hlp.TimeoutException) as cm:
wait_app_up(PROJECT_NAME, PROJECT_PORT)
self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
Timeouts['WAIT_JENKINS_UP_TIMEOUT_IN_SECONDS'] = timeout_bak
When launching test_wait_app_up_fail, the actual timeout duration is 3 seconds.
I wondered if it would be possible to avoid some of the disadvantages of using global variables (see e.g. http://wiki.c2.com/?GlobalVariablesAreBad) by using a class namespace rather than a global/module namespace to pass values of variables. The following code indicates that the two methods are essentially identical. There is a slight advantage in using class namespaces as explained below.
The following code fragments also show that attributes or variables may be dynamically created and deleted in both global/module namespaces and class namespaces.
wall.py
# Note no definition of global variables
class router:
""" Empty class """
I call this module ‘wall’ since it is used to bounce variables off of. It will act as a space to temporarily define global variables and class-wide attributes of the empty class ‘router’.
This module imports wall and defines a single function sourcefn which defines a message and emits it by two different mechanisms, one via globals and one via the router class. Note that the variables wall.msg and wall.router.msg are defined here for the first time in their respective namespaces.
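The source module itself is not reproduced in this excerpt; based on the description, it might look roughly like this:
# source.py
import wall

def sourcefn():
    msg = 'Hello world!'
    wall.msg = msg          # define a module-level ("global") variable on wall
    wall.router.msg = msg   # define a class-wide attribute on the router class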
dest.py
import wall
def destfn():
if hasattr(wall, 'msg'):
print 'global: ' + wall.msg
del wall.msg
else:
print 'global: ' + 'no message'
if hasattr(wall.router, 'msg'):
print 'router: ' + wall.router.msg
del wall.router.msg
else:
print 'router: ' + 'no message'
This module defines a function destfn which uses the two different mechanisms to receive the messages emitted by source. It allows for the possibility that the variable ‘msg’ may not exist. destfn also deletes the variables once they have been displayed.
main.py
import source, dest
source.sourcefn()
dest.destfn() # variables deleted after this call
dest.destfn()
This module calls the previously defined functions in sequence. After the first call to dest.destfn the variables wall.msg and wall.router.msg no longer exist.
The output from the program is:
global: Hello world!
router: Hello world!
global: no message
router: no message
The above code fragments show that the module/global and the class/class variable mechanisms are essentially identical.
If a lot of variables are to be shared, namespace pollution can be managed either by using several wall-type modules, e.g. wall1, wall2 etc. or by defining several router-type classes in a single file. The latter is slightly tidier, so perhaps represents a marginal advantage for use of the class-variable mechanism.