Tag Archives: Python

What is the difference between “content” and “text”

Question: What is the difference between “content” and “text”


I am using the terrific Python Requests library. I notice that the fine documentation has many examples of how to do something without explaining the why. For instance, both r.text and r.content are shown as examples of how to get the server response. But where is it explained what these properties do? For instance, when would I choose one over the other? I see that r.text returns a unicode object sometimes, and I suppose that there would be a difference for a non-text response. But where is all this documented? Note that the linked document does state:

You can also access the response body as bytes, for non-text requests:

But then it goes on to show an example of a text response! I can only suppose that the quote above means to say non-text responses instead of non-text requests, as a non-text request does not make sense in HTTP.

In short, where is the proper documentation of the library, as opposed to the (excellent) tutorial on the Python Requests site?


Answer 0


The requests.Response class documentation has more details:

r.text is the content of the response in Unicode, and r.content is the content of the response in bytes.
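For a quick self-check, here is a minimal sketch of the difference (httpbin.org is used only as a convenient public test endpoint):

import requests

r = requests.get('https://httpbin.org/get')
print(type(r.content))  # <class 'bytes'> - the raw response body
print(type(r.text))     # <class 'str'>  - the body decoded using r.encoding
print(r.encoding)       # the encoding requests guessed from the response headers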


Answer 1


It seems clear from the documentation that r.content is what is meant here:

You can also access the response body as bytes, for non-text requests:

 >>> r.content

If you read further down the page, it addresses, for example, an image file.
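For instance, a minimal sketch of handling a binary response (the URL is hypothetical; any image or other binary resource behaves the same way):

import requests

r = requests.get('https://example.com/logo.png')  # hypothetical binary resource
with open('logo.png', 'wb') as f:
    f.write(r.content)  # r.content is bytes; r.text would mangle binary data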


How do I convert a Django QuerySet into a list of dicts?

Question: How do I convert a Django QuerySet into a list of dicts?


How can I convert a Django QuerySet into a list of dicts? I haven’t found an answer to this so I’m wondering if I’m missing some sort of common helper function that everyone uses.


Answer 0


Use the .values() method:

>>> Blog.objects.values()
[{'id': 1, 'name': 'Beatles Blog', 'tagline': 'All the latest Beatles news.'}]
>>> Blog.objects.values('id', 'name')
[{'id': 1, 'name': 'Beatles Blog'}]

Note: the result is a QuerySet which mostly behaves like a list, but isn’t actually an instance of list. Use list(Blog.objects.values(…)) if you really need an instance of list.
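For example, a minimal sketch (assuming the Blog model from the example above):

qs = Blog.objects.values('id', 'name')  # still a lazy QuerySet of dicts
dicts = list(qs)                        # forces evaluation: a plain list of dicts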


Answer 1


The .values() method will return a result of type ValuesQuerySet, which is typically what you need in most cases.

But if you wish, you could turn ValuesQuerySet into a native Python list using Python list comprehension as illustrated in the example below.

result = Blog.objects.values()             # return ValuesQuerySet object
list_result = [entry for entry in result]  # converts ValuesQuerySet into Python list
return list_result

I find the above helps if you are writing unit tests and need to assert that the expected return value of a function matches the actual return value, in which case both expected_result and actual_result must be of the same type (e.g. dictionary).

actual_result = some_function()
expected_result = {
    # dictionary content here ...
}
assert expected_result == actual_result

Answer 2


If you need native data types for some reason (e.g. JSON serialization) this is my quick ‘n’ dirty way to do it:

data = [{'id': blog.pk, 'name': blog.name} for blog in blogs]

As you can see building the dict inside the list is not really DRY so if somebody knows a better way …


Answer 3


You do not exactly define what the dictionaries should look like, but most likely you are referring to QuerySet.values(). From the official django documentation:

Returns a ValuesQuerySet — a QuerySet subclass that returns dictionaries when used as an iterable, rather than model-instance objects.

Each of those dictionaries represents an object, with the keys corresponding to the attribute names of model objects.


Answer 4


Type Cast to List

    import json  # needed for json.dumps below

    job_reports = JobReport.objects.filter(job_id=job_id, status=1).values('id', 'name')

    json.dumps(list(job_reports))

Answer 5


You can use the values() method on the queryset you get from the Django model you are querying, and then you can easily access each field by an index value.

Call it like this:

myList = dictOfSomeData.values()
itemNumberThree = myList[2]  # if there's a value at that index, of course...

Answer 6


You need DjangoJSONEncoder and list to turn your QuerySet into JSON; ref: Python JSON serialize a Decimal object

import json
from django.core.serializers.json import DjangoJSONEncoder


blog = Blog.objects.all().values()
json.dumps(list(blog), cls=DjangoJSONEncoder)

Answer 7


Simply put list(yourQuerySet).


Answer 8


You could define a function using model_to_dict as follows:

from django.forms.models import model_to_dict

def queryset_to_dict(qs, fields=None, exclude=None):
    my_array = []
    for x in qs:
        my_array.append(model_to_dict(x, fields=fields, exclude=exclude))
    return my_array
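Hypothetical usage, assuming the Blog model from the earlier answers:

rows = queryset_to_dict(Blog.objects.all(), fields=['id', 'name'])
# rows is now a plain list of dicts, one per Blog row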

Find the column name which has the maximum value for each row

Question: Find the column name which has the maximum value for each row


I have a DataFrame like this one:

In [7]:
frame.head()
Out[7]:
Communications and Search   Business    General Lifestyle
0   0.745763    0.050847    0.118644    0.084746
0   0.333333    0.000000    0.583333    0.083333
0   0.617021    0.042553    0.297872    0.042553
0   0.435897    0.000000    0.410256    0.153846
0   0.358974    0.076923    0.410256    0.153846

In here, I want to ask how to get column name which has maximum value for each row, the desired output is like this:

In [7]:
    frame.head()
    Out[7]:
    Communications and Search   Business    General Lifestyle   Max
    0   0.745763    0.050847    0.118644    0.084746           Communications 
    0   0.333333    0.000000    0.583333    0.083333           Business  
    0   0.617021    0.042553    0.297872    0.042553           Communications 
    0   0.435897    0.000000    0.410256    0.153846           Communications 
    0   0.358974    0.076923    0.410256    0.153846           Business 

Answer 0


You can use idxmax with axis=1 to find the column with the greatest value on each row:

>>> df.idxmax(axis=1)
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

To create the new column ‘Max’, use df['Max'] = df.idxmax(axis=1).

To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
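Putting it together, a minimal self-contained sketch (with made-up numbers in the spirit of the question's data):

import pandas as pd

df = pd.DataFrame({
    'Communications': [0.75, 0.33, 0.62],
    'Business':       [0.05, 0.58, 0.04],
})
df['Max'] = df.idxmax(axis=1)
print(df)
# the Max column now holds 'Communications' or 'Business' per row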


Answer 1


If you want to produce a column containing the name of the column with the maximum value, but considering only a subset of columns, you can use a variation of @ajcr’s answer:

df['Max'] = df[['Communications','Business']].idxmax(axis=1)

Answer 2


You could use apply on the dataframe and get the argmax() of each row via axis=1:

In [144]: df.apply(lambda x: x.argmax(), axis=1)
Out[144]:
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

Here’s a benchmark comparing how slow the apply method is relative to idxmax() for len(df) ~ 20K:

In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop

In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop
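For reference, a self-contained sketch to reproduce such a benchmark with randomly generated data; note that in recent pandas versions Series.argmax returns an integer position, so idxmax is used inside apply here:

import numpy as np
import pandas as pd
from timeit import timeit

df = pd.DataFrame(np.random.rand(20000, 4),
                  columns=['Communications', 'Business', 'General', 'Lifestyle'])

print(timeit(lambda: df.apply(lambda x: x.idxmax(), axis=1), number=3))  # slow: row-wise Python loop
print(timeit(lambda: df.idxmax(axis=1), number=3))                       # fast: vectorized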

Add a single element to an array in numpy

Question: Add a single element to an array in numpy


I have a numpy array containing:

[1, 2, 3]

I want to create an array containing:

[1, 2, 3, 1]

That is, I want to add the first element on to the end of the array.

I have tried the obvious:

np.concatenate((a, a[0]))

But I get an error saying ValueError: arrays must have same number of dimensions

I don’t understand this – the arrays are both just 1d arrays.


Answer 0


append() creates a new array, which is the old array with the element appended.

I think it’s more normal to use the proper method for adding an element:

a = numpy.append(a, a[0])
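A quick demonstration (a minimal sketch):

import numpy as np

a = np.array([1, 2, 3])
a = np.append(a, a[0])  # np.append returns a new array, so rebind the name
print(a)                # [1 2 3 1]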

Answer 1


When appending only once or once every now and again, using np.append on your array should be fine. The drawback of this approach is that memory is allocated for a completely new array every time it is called. When growing an array for a significant amount of samples it would be better to either pre-allocate the array (if the total size is known) or to append to a list and convert to an array afterward.

Using np.append:

b = np.array([0])
for k in range(int(10e4)):
    b = np.append(b, k)
1.2 s ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using a python list, converting to an array afterward:

d = [0]
for k in range(int(10e4)):
    d.append(k)
f = np.array(d)
13.5 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Pre-allocating numpy array:

n = int(10e4)  # match the sample count used in the runs above
e = np.zeros((n,))
for k in range(n):
    e[k] = k
9.92 ms ± 752 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

When the final size is unknown, pre-allocating is difficult; I tried pre-allocating in chunks of 50, but it did not come close to using a list.

85.1 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 2


a[0] isn’t an array, it’s the first element of a and therefore has no dimensions.

Try using a[0:1] instead, which will return the first element of a inside a single item array.
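That is (a minimal sketch):

import numpy as np

a = np.array([1, 2, 3])
b = np.concatenate((a, a[0:1]))  # a[0:1] is a one-element 1-d array, so dimensions match
print(b)                         # [1 2 3 1]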


Answer 3


Try this:

np.concatenate((a, np.array([a[0]])))

http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html

concatenate needs both elements to be numpy arrays; however, a[0] is not an array. That is why it does not work.


Answer 4


This command,

numpy.append(a, a[0])

does not alter the array a. Instead, it returns a new, modified array. So, if a modification is required, then the following must be used.

a = numpy.append(a, a[0])

Answer 5

t = np.array([2, 3])
t = np.append(t, [4])

Answer 6


This might be a bit overkill, but I always use the np.take function for any wrap-around indexing:

>>> a = np.array([1, 2, 3])
>>> np.take(a, range(0, len(a)+1), mode='wrap')
array([1, 2, 3, 1])

>>> np.take(a, range(-1, len(a)+1), mode='wrap')
array([3, 1, 2, 3, 1])

Answer 7


Let’s say a=[1,2,3] and you want it to be [1,2,3,1].

You may use the built-in append function

np.append(a,1)

Here 1 is an int; it could also be a string, and it may or may not already be one of the elements in the array. The result prints as [1, 2, 3, 1].


Answer 8


If you want to add an element use append()

a = numpy.append(a, 1) in this case adds the 1 at the end of the array

If you want to insert an element use insert()

a = numpy.insert(a, index, 1) in this case you can put the 1 where you desire, using index to set the position in the array.
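A combined sketch of both calls:

import numpy as np

a = np.array([1, 2, 3])
a = np.append(a, 1)     # [1 2 3 1] - 1 appended at the end
a = np.insert(a, 1, 9)  # [1 9 2 3 1] - 9 inserted before index 1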


“ssl module in Python is not available” when installing packages with pip3

Question: “ssl module in Python is not available” when installing packages with pip3


I’ve installed Python 3.4 and Python 3.6 on my local machine successfully, but am unable to install packages with pip3.

When I execute pip3 install <package>, I get the following SSL related error:

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Collecting <package>
  Could not fetch URL https://pypi.python.org/simple/<package>/: There was a problem confirming the ssl certificate: Can't connect to HTTPS URL because the SSL module is not available. - skipping
  Could not find a version that satisfies the requirement <package> (from versions: )
No matching distribution found for <package>

How can I fix my Python3.x install so that I can install packages with pip install <package>?
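A quick way to confirm the diagnosis before trying the fixes below (a minimal check; run it with the same interpreter that pip3 uses):

import ssl  # raises ImportError here if the interpreter was built without SSL support
print(ssl.OPENSSL_VERSION)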


Answer 0


Step by step guide to install Python 3.6 and pip3 in Ubuntu

  1. Install the necessary packages for Python and ssl: $ sudo apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

  2. Download and extract “Python-3.6.8.tar.xz” from https://www.python.org/ftp/python/ into your home directory.

  3. Open terminal in that directory and run: $ ./configure

  4. Build and install: $ make && sudo make install

  5. Install packages with: $ pip3 install package_name

Disclaimer: The above commands are not tested in Ubuntu 20.04 LTS.


Answer 1


If you are on Red Hat/CentOS:

# To allow for building python ssl libs
yum install openssl-devel
# Download the source of *any* python version
cd /usr/src
wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tar.xz
tar xf Python-3.6.2.tar.xz 
cd Python-3.6.2

# Configure the build w/ your installed libraries
./configure

# Install into /usr/local/bin/python3.6, don't overwrite global python bin
make altinstall

Answer 2


If you are on Windows and use Anaconda, this worked for me:

I tried a lot of other solutions which did not work (Environment PATH Variable changes …)

The problem can be caused by DLLs in the Windows\System32 folder (e.g. libcrypto-1_1-x64.dll or libssl-1_1-x64.dll or others) placed there by other software.

The fix was installing openSSL from https://slproweb.com/products/Win32OpenSSL.html, which replaces the DLLs with more recent versions.


Answer 3


I had a similar problem on OSX 10.11 due to installing memcached which installed python 3.7 on top of 3.6.

WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

I spent hours unlinking openssl, reinstalling, and changing paths… and nothing helped. Changing the openssl version back to an older one did the trick:

brew switch openssl 1.0.2e

I did not see this suggestion anywhere on the internet. Hope it serves someone.


Answer 4


I agree with the answer by mastaBlasta. It worked for me. I encountered the same problem as in the topic description.

Environment: MacOS Sierra. And I use Homebrew.

My solution:

  1. Reinstall openssl by brew uninstall openssl; brew install openssl
  2. According to the hints given by Homebrew, do the following:

    echo 'export PATH="/usr/local/opt/openssl/bin:$PATH"' >> ~/.bash_profile
    export LDFLAGS="-L/usr/local/opt/openssl/lib"
    export CPPFLAGS="-I/usr/local/opt/openssl/include"
    

Answer 5


In Ubuntu, this can help:

cd Python-3.6.2
./configure --with-ssl
make
sudo make install

Answer 6


Downgrading openssl worked for me:

brew switch openssl 1.0.2s

Answer 7


The problem is probably caused by missing libraries.

Before you install python 3.6, make sure you install all the libraries required for python.

$ sudo apt-get install build-essential checkinstall 
$ sudo apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

More information in How to Install Python 3.6.0 on Ubuntu & LinuxMint


Answer 8


If you are on OSX and have compiled python from source:

Install openssl using brew: brew install openssl

Make sure to follow the instructions brew gives you about setting your CPPFLAGS and LDFLAGS. In my case I am using the openssl@1.1 brew formula and I need these 3 settings for the python build process to correctly link to my SSL library:

export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib"
export CPPFLAGS="-I/usr/local/opt/openssl@1.1/include"
export PKG_CONFIG_PATH="/usr/local/opt/openssl@1.1/lib/pkgconfig"

Assuming the library is installed at that location.


Answer 9


I encountered the same problem on Windows 10. My very specific issue was due to my installation of Anaconda: I installed Anaconda, and python.exe lives under the path Path/to/Anaconda3/, so I never installed standalone python at all, because Anaconda includes python. When using pip to install packages, I found the same error report, pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

The solution was the following:

1) Download python again from the official website;

2) Navigate to the directory where "Python 3.7 (64-bit).lnk" is located

3) import ssl and exit()

4) In cmd, type "Python 3.7 (64-bit).lnk" -m pip install tensorflow, for instance.

Here, you’re all set.


Answer 10


If you are on Windows and use Anaconda, you can try running the “pip install …” command in Anaconda Prompt instead of cmd.exe, as user willliu1995 suggests here. This was the fastest solution for me, as it does not require installation of additional components.


Answer 11


I tried a lot of ways to solve this problem and none of them worked. I’m currently on Windows 10.

The only thing that worked was:

  • Uninstall Anaconda
  • Uninstall Python (I was using version 3.7.3)
  • Install Python again (remember to check the option to automatically add to PATH)

Then I downloaded all the libs I needed using pip… and it worked!

I don’t know why, or if the problem was somehow related to Anaconda.


Answer 12


For OSX brew users:

My issue appeared to be related to my python installation and was quickly resolved by re-installing python3 and pip. I think it started misbehaving after an OS update, but who knows (at this time I am on Mac OS 10.14.6).

brew reinstall python3 --force
# set up pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
# install a pkg successfully
pip install pandas

Answer 13


You can do either of these two:

  1. While installing Anaconda, select the option to add Anaconda to the path.

or

  2. Find these (complete) paths from your installation folder of Anaconda and add them to the environment variable:

\Anaconda

\Anaconda\Library\mingw-w64\bin

\Anaconda\Library\usr\bin

\Anaconda\Library\bin

\Anaconda\Scripts

\anaconda\Library

\anaconda\condabin

Add the above paths to the “Path” system variable and the error should no longer appear :)


Answer 14


I was having the same issue and was able to resolve it with the following steps:

sudo yum install -y libffi-devel
sudo yum install openssl-devel
cd /usr/src
sudo wget https://www.python.org/ftp/python/3.7.1/Python-3.7.1.tar.xz
sudo tar xf Python-3.7.1.tar.xz
cd Python-3.7.1
sudo ./configure --enable-optimizations
# Install into /usr/local/bin/python3.7, don't overwrite global python bin
sudo make altinstall

Depending on permissions, you may not need sudo.

Results:
Collecting setuptools
Collecting pip
Installing collected packages: setuptools, pip
Successfully installed pip-10.0.1 setuptools-39.0.1

You should now be able to run

python3.7 -V 

and

pip3.7 -V

When installing packages:

pip3.7 install pandas

or, depending on permissions, you can also add the --user flag like so:

pip3.7 install pandas --user

Answer 15


The ssl module is a TLS/SSL wrapper for accessing Operating System (OS) sockets (Lib/ssl.py). So when the ssl module is not available, chances are that you either don’t have the OS OpenSSL libraries installed, or those libraries were not found when you installed Python. Let’s assume it is the latter case (aka: you already have OpenSSL installed, but it was not correctly linked when installing Python).

I will also assume you are installing from source. If you are installing from a binary (i.e., a Windows .exe file) or a package (Mac .dmg, or Ubuntu apt), there is not much you can do about the install process.

During the step of configuring your python installation, you need to specify where the OS OpenSSL to be used for linking is located:

# python 3.8 beta
./configure --with-openssl="your_OpenSSL root"

So where will you find your installed OpenSSL directory?

# ubuntu 
locate ssl.h | grep '/openssl/ssl.h'

/home/user/.linuxbrew/Cellar/openssl/1.0.2r/include/openssl/ssl.h
/home/user/envs/py37/include/openssl/ssl.h
/home/user/miniconda3/envs/py38b3/include/openssl/ssl.h
/home/user/miniconda3/include/openssl/ssl.h
/home/user/miniconda3/pkgs/openssl-1.0.2s-h7b6447c_0/include/openssl/ssl.h
/home/user/miniconda3/pkgs/openssl-1.1.1b-h7b6447c_1/include/openssl/ssl.h
/home/user/miniconda3/pkgs/openssl-1.1.1c-h7b6447c_1/include/openssl/ssl.h
/usr/include/openssl/ssl.h

Your system may be different from mine, but as you see here I have many different installed openssl libraries. As of this writing, python 3.8 expects openssl 1.0.2 or 1.1:

Python requires an OpenSSL 1.0.2 or 1.1 compatible libssl with X509_VERIFY_PARAM_set1_host().

So you would need to verify which of those installed libraries you can use for linking, for example:

/usr/bin/openssl version

OpenSSL 1.0.2g  1 Mar 2016
./configure --with-openssl="/usr"
make && make install

You may need to try a few, or install a new one, to find the library that works for your Python and your OS.


Answer 16


In my case, using a Mac, I deleted /Applications/Python 3.7 because I already had Python 3.7 from brew install python3.

But that was the trigger of the message

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

What I did in my situation:

  1. I downloaded the macOS 64-bit installer again, and installed it.
  2. Double-clicked /Applications/Python3.6/Install Certificates.command and /Applications/Python3.6/Update Shell Profile.command.
  3. Rebooted the Mac.
  4. I am not sure, but pip.conf possibly also contributed to the success. See pip install fails.

Answer 17


I finally solved this issue. These are the details of my env:
Version of Python to install: 3.6.8
OS: Ubuntu 16.04.6 LTS
Root access: No

Some people suggest installing libssl-dev, but it did not work for me. I followed this link and fixed it!
In short, I downloaded, extracted, built, and installed OpenSSL (openssl-1.1.1b.tar.gz). Then I modified the .bashrc file following that link.
Next, I downloaded and extracted Python-3.6.8.tgz. I edited Modules/Setup.dist to modify the SSL path (lines around #211). I did ./configure --prefix=$HOME/opt/python-3.6.8, make and make install. Last, I modified my .bashrc. Notice that I did not include --enable-optimizations in ./configure.


Answer 18


On macOS, configuring Python 3.8.1 with the command below will solve the problem; I think it would also work on Linux.

./configure --enable-optimizations --with-openssl=/usr/local/opt/openssl@1.1/

Change the directory parameter based on your system.


Answer 19


If you are on OSX and in case the other solutions didn’t work for you (just like me).

You can try uninstalling python3 and upgrading pip3:

brew uninstall --ignore-dependencies python3
pip3 install --upgrade pip   

This worked for me ;)


Answer 20


(NOT on Windows!)

This made me tear my hair out for a week, so I hope this will help someone

I tried everything short of re-installing Anaconda and/or Jupyter.

Setup

  • AWS Linux
  • Manually installed Anaconda 3-5.3.0
  • Python3 (3.7) was running inside anaconda (ie, ./anaconda3/bin/python)
  • there was also /usr/bin/python and /usr/bin/python3 (but these were not being used as most of the work was done in Jupyter’s terminal)

Fix

In Jupyter’s terminal:

cp /usr/lib64/libssl.so.10 ./anaconda3/lib/libssl.so.1.0.0

cp /usr/lib64/libcrypto.so.10 ./anaconda3/lib/libcrypto.so.1.0.0

What triggered this?

So, this was all working until I tried to do a conda install conda-forge

I’m not sure what happened, but conda must have updated openssl on the box (I’m guessing) so after this, everything broke.

Basically, unknown to me, conda had updated openssl, but somehow deleted the old libraries and replaced them with libssl.so.1.1 and libcrypto.so.1.1.

Python3, I guess, was compiled to look for libssl.so.1.0.0

In the end, the key to diagnosis was this:

python -c "import ssl; print (ssl.OPENSSL_VERSION)"

gave the clue library "libssl.so.1.0.0" not found

The huge assumption I made is that the yum version of ssl is the same as the conda version, so just renaming the shared object might work, and it did.

My other solution was to re-compile python, re-install anaconda, etc, but in the end I’m glad I didn’t need to.

Hope this helps you guys out.


Answer 21


In the case of using pyenv to manage python installations on Mac OS Catalina, I had to install openssl with brew first and then run pyenv install 3.7.8, which seemed to build the python installation using the openssl from homebrew (it even said as much in the installation output). Then pyenv global 3.7.8 and I was away.


Answer 22


I was able to fix this by updating the python version in this file: pyenv: version `3.6.5` is not installed (set by /Users/taruntarun/.python-version). Though I had the latest version installed, my command was still using the old version 3.6.5.

Moving to version 3.7.3


Answer 23


I had been having the same problem for the last two days and have only fixed it just now.

I had tried using the --trust-host option with DigiCert_High_Assurance_EV_Root_CA.pem, which did not work; I couldn’t install the ssl module (it says it cannot be installed for python versions greater than 2.6); setting the $PIP_CERT variable didn’t fix it either; and I had libssl1.0.2 and libssl1.0.0 installed. Also worth mentioning: I didn’t have a ~/.pip/pip.conf file, and creating one didn’t solve the bug either.

What finally solved it was installing python3.6 with make again. Download Python-3.6.0.tgz from the website, run configure, then make, make test and make install. Hope it works for you.


Answer 24


The python documentation is actually very clear, and following the instructions did the job, whereas other answers I found here did not fix this issue.

  1. First, install python 3.x.x from source, for example version 3.6.2: https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tar.xz

  2. Make sure you have openssl installed by running brew install openssl

  3. Unzip it and move to the python directory: tar xvzf Python-3.6.2.tar.xz && cd Python-3.6.2

  4. Then, if the python version is < 3.7, run

CPPFLAGS="-I$(brew --prefix openssl)/include" \
LDFLAGS="-L$(brew --prefix openssl)/lib" \
./configure --with-pydebug

  5. Finally, run make -s -j2 (-s is the silent flag, -j2 tells your machine to use 2 jobs)


Answer 25


I had the same issue trying to install python3.7 on an ubuntu14.04 machine. The issue was that I had some custom folders in my PKG_CONFIG_PATH and in my LD_LIBRARY_PATH, which prevented the python build process from finding the system openssl libraries.

So try clearing them and see what happens:

export PKG_CONFIG_PATH=""
export LD_LIBRARY_PATH=""

Answer 26


OK, the latest answer to this: as of now, don’t use Python 3.8; use only 3.7 or less, because most of the libraries fail to install with the above error.


How can I add multiple columns to a pandas DataFrame in one assignment?

Question: How can I add multiple columns to a pandas DataFrame in one assignment?


I’m new to pandas and trying to figure out how to add multiple columns to pandas simultaneously. Any help here is appreciated. Ideally I would like to do this in one step rather than multiple repeated steps…

import pandas as pd
import numpy as np

df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...

Answer 0


I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn’t actually matter if the columns of the DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) DataFrame conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]], 
            index=df.index, 
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]], 
    index=df.index, 
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more “natural” way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use .assign() with multiple column arguments.

I like this variant on @zero’s answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don’t know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) In the end it’s hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame


Answer 1


You could use assign with a dict of column names and values.

In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
   col_1  col_2 col2_new_2  col3_new_3  col_new_1
0      0      4       dogs           3        NaN
1      1      5       dogs           3        NaN
2      2      6       dogs           3        NaN
3      3      7       dogs           3        NaN

Answer 2


With the use of concat:

In [128]: df
Out[128]: 
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]: 
   col_1  col_2 column_new_1 column_new_2 column_new_3
0    0.0    4.0          NaN          NaN          NaN
1    1.0    5.0          NaN          NaN          NaN
2    2.0    6.0          NaN          NaN          NaN
3    3.0    7.0          NaN          NaN          NaN

I’m not very sure what you wanted to do with [np.nan, 'dogs', 3]. Maybe set them as default values now?

In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]

In [144]: df1
Out[144]: 
   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3

Answer 3


Use a list comprehension, pd.DataFrame and pd.concat:

pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2','column_new_3']
        )
    ], axis=1)


Answer 4


If adding a lot of missing columns (a, b, c, …) with the same value, here 0, I did this:

    new_cols = ["a", "b", "c" ] 
    df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)

It’s based on the second variant of the accepted answer.


Answer 5


Just want to point out that option 2 in @Matthias Fripp’s answer

(2) I wouldn’t necessarily expect DataFrame to work this way, but it does

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

is already documented in pandas’ own documentation http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.


Answer 6


If you just want to add empty new columns, reindex will do the job

df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
   col_1  col_2  column_new_1  column_new_2  column_new_3
0      0      4           NaN           NaN           NaN
1      1      5           NaN           NaN           NaN
2      2      6           NaN           NaN           NaN
3      3      7           NaN           NaN           NaN

Full code example:

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')

Otherwise, go for @zero’s answer with assign.


Answer 7


I am not comfortable using “Index” and so on, so I came up with the below:

df.columns
Index(['A123', 'B123'], dtype='object')

df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])

df.rename(columns={
    'C':'C123',
    'D':'D123',
    'E':'E123'
},inplace=True)


df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')

Python time measurement function

Question: Python time measurement function


I want to create a python function to test the time spent in each function and print its name with its time. How can I print the function name, and if there is another way to do so, please tell me:

import time

def measureTime(a):
    start = time.clock() 
    a()
    elapsed = time.clock()
    elapsed = elapsed - start
    print "Time spent in (function name) is: ", elapsed

Answer 0


First and foremost, I highly suggest using a profiler or at least using timeit.

However if you wanted to write your own timing method strictly to learn, here is somewhere to get started using a decorator.

Python 2:

import time

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.func_name, (time2-time1)*1000.0)
        return ret
    return wrap

And the usage is very simple, just use the @timing decorator:

@timing
def do_work():
  #code

Python 3:

import time

def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(f.__name__, (time2-time1)*1000.0))
        return ret
    return wrap

Note I’m calling f.func_name to get the function name as a string (in Python 2), or f.__name__ in Python 3.


Answer 1

在玩完timeit模块之后,我不喜欢它的界面,与下面两种方法相比,它的界面并不那么优雅。

以下代码在Python 3中。

装饰器方法

这与@Mike的方法几乎相同。在这里,我添加kwargsfunctools包装以使其更好。

def timeit(func):
    @functools.wraps(func)
    def newfunc(*args, **kwargs):
        startTime = time.time()
        func(*args, **kwargs)
        elapsedTime = time.time() - startTime
        print('function [{}] finished in {} ms'.format(
            func.__name__, int(elapsedTime * 1000)))
    return newfunc

@timeit
def foobar():
    mike = Person()
    mike.think(30)

上下文管理器方法

from contextlib import contextmanager

@contextmanager
def timeit_context(name):
    startTime = time.time()
    yield
    elapsedTime = time.time() - startTime
    print('[{}] finished in {} ms'.format(name, int(elapsedTime * 1000)))

例如,您可以像这样使用它:

with timeit_context('My profiling code'):
    mike = Person()
    mike.think()

并且该with块内的代码将被计时。

结论

使用第一种方法,您可以轻松地注释掉装饰器以获取常规代码。但是,它只能计时一个功能。如果您有一部分代码不是使它起作用的功能,则可以选择第二种方法。

例如,现在你有

images = get_images()
bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

现在您要为bigImage = ...生产线计时。如果将其更改为功能,它将是:

images = get_images()
bigImage = None
@timeit
def foobar():
    nonlocal bigImage
    bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

看起来不太好…如果您使用的是没有nonlocal关键字的Python 2,该怎么办?

相反,使用第二种方法非常适合这里:

images = get_images()
with timeit_context('foobar'):
    bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

After playing with the timeit module, I don’t like its interface, which is not so elegant compared to the following two methods.

The following code is in Python 3.

The decorator method

This is almost the same with @Mike’s method. Here I add kwargs and functools wrap to make it better.

import functools
import time

def timeit(func):
    @functools.wraps(func)
    def newfunc(*args, **kwargs):
        startTime = time.time()
        ret = func(*args, **kwargs)
        elapsedTime = time.time() - startTime
        print('function [{}] finished in {} ms'.format(
            func.__name__, int(elapsedTime * 1000)))
        return ret
    return newfunc

@timeit
def foobar():
    mike = Person()
    mike.think(30)

The context manager method

import time
from contextlib import contextmanager

@contextmanager
def timeit_context(name):
    startTime = time.time()
    yield
    elapsedTime = time.time() - startTime
    print('[{}] finished in {} ms'.format(name, int(elapsedTime * 1000)))

For example, you can use it like:

with timeit_context('My profiling code'):
    mike = Person()
    mike.think()

And the code within the with block will be timed.

Conclusion

Using the first method, you can easily comment out the decorator to get the normal code. However, it can only time a function. If you have some part of the code that you don’t want to make into a function, then you can choose the second method.

For example, now you have

images = get_images()
bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

Now you want to time the bigImage = ... line. If you change it to a function, it will be:

images = get_images()
bigImage = None
@timeit
def foobar():
    nonlocal bigImage
    bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

Looks not so great… What if you are in Python 2, which has no nonlocal keyword?

Instead, using the second method fits here very well:

images = get_images()
with timeit_context('foobar'):
    bigImage = ImagePacker.pack(images, width=4096)
drawer.draw(bigImage)

回答 2

我看不到timeit模块有什么问题。这可能是最简单的方法。

import timeit
timeit.timeit(a, number=1)

也可以向函数发送参数。您需要做的就是使用装饰器包装函数。此处有更多说明:http://www.pythoncentral.io/time-a-python-function/

您可能对编写自己的时序语句感兴趣的唯一情况是,您只想运行一个函数并且还希望获得其返回值。

使用timeit模块的优势在于,它使您可以指定重复执行的次数。这可能是必要的,因为其他进程可能会干扰您的计时精度。因此,您应该多次运行它并查看最小值。

I don’t see what the problem with the timeit module is. This is probably the simplest way to do it.

import timeit
timeit.timeit(a, number=1)

It’s also possible to send arguments to the functions. All you need is to wrap your function up using decorators. More explanation here: http://www.pythoncentral.io/time-a-python-function/

The only case where you might be interested in writing your own timing statements is if you want to run a function only once and also want to obtain its return value.

The advantage of using the timeit module is that it lets you repeat the number of executions. This might be necessary because other processes might interfere with your timing accuracy. So, you should run it multiple times and look at the lowest value.
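
To make the "run it multiple times and look at the lowest value" advice concrete, here is a minimal sketch using timeit.repeat (the function being timed is just an example):

import timeit

def work():
    return sum(range(10000))

# Run 5 rounds of 1000 executions each and keep the fastest round
times = timeit.repeat(work, number=1000, repeat=5)
print(min(times))  # best-case time for 1000 calls, in seconds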


回答 3

Timeit有两个大缺陷:它不返回函数的返回值,并且它使用eval,这需要为导入传递额外的设置代码。下面的代码简单而优雅地解决了这两个问题:

import time

def timed(f):
  start = time.time()
  ret = f()
  elapsed = time.time() - start
  return ret, elapsed

timed(lambda: database.foo.execute('select count(*) from source.apachelog'))
(<sqlalchemy.engine.result.ResultProxy object at 0x7fd6c20fc690>, 4.07547402381897)

Timeit has two big flaws: it doesn’t return the return value of the function, and it uses eval, which requires passing in extra setup code for imports. This solves both problems simply and elegantly:

import time

def timed(f):
  start = time.time()
  ret = f()
  elapsed = time.time() - start
  return ret, elapsed

timed(lambda: database.foo.execute('select count(*) from source.apachelog'))
(<sqlalchemy.engine.result.ResultProxy object at 0x7fd6c20fc690>, 4.07547402381897)
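
Arguments work the same way: capture them in the lambda, so timed() itself stays argument-free (a small sketch with a trivial example function, assuming the timed helper above is in scope):

def add(a, b):
    return a + b

# The lambda captures the arguments; timed() just calls it
result, seconds = timed(lambda: add(2, 3))
print(result)   # 5
print(seconds)  # elapsed wall-clock time in seconds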

回答 4

有一个简单的计时工具。https://github.com/RalphMao/PyTimer

它可以像装饰器一样工作:

from pytimer import Timer
@Timer(average=False)      
def matmul(a,b, times=100):
    for i in range(times):
        np.dot(a,b)        

输出:

matmul:0.368434
matmul:2.839355

它也可以像带有命名空间控制的插件计时器一样工作(如果将其插入具有很多代码并且可以在其他任何地方调用的函数,则将很有帮助)。

import numpy as np
from pytimer import Timer

timer = Timer()
def any_function():                                       
    timer.start()                                         

    for i in range(10):                                   

        timer.reset()                                     
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        timer.checkpoint('block1')                        

        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        timer.checkpoint('block2')                        
        np.dot(np.ones((100,1000)), np.zeros((1000,1000)))

    for j in range(20):                                   
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
    timer.summary()                                       

for i in range(2):                                        
    any_function()                                        

输出:

========Timing Summary of Default Timer========
block2:0.065062
block1:0.032529
========Timing Summary of Default Timer========
block2:0.065838
block1:0.032891

希望对你有帮助

There is an easy tool for timing. https://github.com/RalphMao/PyTimer

It can work like a decorator:

from pytimer import Timer
@Timer(average=False)      
def matmul(a,b, times=100):
    for i in range(times):
        np.dot(a,b)        

Output:

matmul:0.368434
matmul:2.839355

It can also work like a plug-in timer with namespace control(helpful if you are inserting it to a function which has a lot of codes and may be called anywhere else).

import numpy as np
from pytimer import Timer

timer = Timer()
def any_function():                                       
    timer.start()                                         

    for i in range(10):                                   

        timer.reset()                                     
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        timer.checkpoint('block1')                        

        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
        timer.checkpoint('block2')                        
        np.dot(np.ones((100,1000)), np.zeros((1000,1000)))

    for j in range(20):                                   
        np.dot(np.ones((100,1000)), np.zeros((1000,500)))
    timer.summary()                                       

for i in range(2):                                        
    any_function()                                        

Output:

========Timing Summary of Default Timer========
block2:0.065062
block1:0.032529
========Timing Summary of Default Timer========
block2:0.065838
block1:0.032891

Hope it will help


回答 5

使用装饰器Python库的Decorator方法:

import decorator

# Timer and log are assumed to be the author's own helpers:
# a context-manager timer and a configured logger.
@decorator
def timing(func, *args, **kwargs):
    '''Function timing wrapper
        Example of using:
        ``@timing()``
    '''

    fn = '%s.%s' % (func.__module__, func.__name__)

    timer = Timer()
    with timer:
        ret = func(*args, **kwargs)

    log.info(u'%s - %0.3f sec' % (fn, timer.duration_in_seconds()))
    return ret

请参阅我的博客上的文章:

在mobilepro.pl博客上发布

我在Google Plus上的信息

Decorator method using decorator Python library:

import decorator

# Timer and log are assumed to be the author's own helpers:
# a context-manager timer and a configured logger.
@decorator
def timing(func, *args, **kwargs):
    '''Function timing wrapper
        Example of using:
        ``@timing()``
    '''

    fn = '%s.%s' % (func.__module__, func.__name__)

    timer = Timer()
    with timer:
        ret = func(*args, **kwargs)

    log.info(u'%s - %0.3f sec' % (fn, timer.duration_in_seconds()))
    return ret

See post on my Blog:

post on mobilepro.pl Blog

my post on Google Plus


回答 6

我的做法:

from time import time

def printTime(start):
    end = time()
    duration = end - start
    if duration < 60:
        return "used: " + str(round(duration, 2)) + "s."
    else:
        mins = int(duration / 60)
        secs = round(duration % 60, 2)
        if mins < 60:
            return "used: " + str(mins) + "m " + str(secs) + "s."
        else:
            hours = int(duration / 3600)
            mins = mins % 60
            return "used: " + str(hours) + "h " + str(mins) + "m " + str(secs) + "s."

在执行函数/循环之前设置变量start = time(),并在代码块之后立即调用printTime(start)。

然后你得到了答案。

My way of doing it:

from time import time

def printTime(start):
    end = time()
    duration = end - start
    if duration < 60:
        return "used: " + str(round(duration, 2)) + "s."
    else:
        mins = int(duration / 60)
        secs = round(duration % 60, 2)
        if mins < 60:
            return "used: " + str(mins) + "m " + str(secs) + "s."
        else:
            hours = int(duration / 3600)
            mins = mins % 60
            return "used: " + str(hours) + "h " + str(mins) + "m " + str(secs) + "s."

Set a variable as start = time() before executing the function/loops, and call printTime(start) right after the block.

and you got the answer.


在iPython Notebook中进行调试的正确方法是什么?

问题:在iPython Notebook中进行调试的正确方法是什么?

据我所知,%debug magic可以在一个单元内进行调试。

但是,我有跨多个单元格的函数调用。

例如,

In[1]: def fun1(a):
           def fun2(b):
               # I want to set a breakpoint for the following line #
               return do_some_thing_about(b)
           return fun2(a)

In[2]: import multiprocessing as mp
       pool=mp.Pool(processes=2)
       results=pool.map(fun1, [1.0])
       pool.close()
       pool.join()

我试过的

  1. 我试图在cell-1的第一行设置%debug。但是它立即进入调试模式,甚至在执行cell-2之前。

  2. 我试图在return do_some_thing_about(b)这行代码之前添加%debug。但这样一来,代码会一直运行,永远不会停止。

在ipython笔记本中设置断点的正确方法是什么?

As I know, %debug magic can do debug within one cell.

However, I have function calls across multiple cells.

For example,

In[1]: def fun1(a):
           def fun2(b):
               # I want to set a breakpoint for the following line #
               return do_some_thing_about(b)
           return fun2(a)

In[2]: import multiprocessing as mp
       pool=mp.Pool(processes=2)
       results=pool.map(fun1, [1.0])
       pool.close()
       pool.join()

What I tried:

  1. I tried to set %debug in the first line of cell-1. But it enters debug mode immediately, even before executing cell-2.

  2. I tried to add %debug in the line right before the code return do_some_thing_about(b). But then the code runs forever, never stopping.

What is the right way to set a break point within the ipython notebook?


回答 0

使用ipdb

通过以下命令安装:

pip install ipdb

用法:

In[1]: def fun1(a):
   def fun2(b):
       import ipdb; ipdb.set_trace() # debugging starts here
       return do_some_thing_about(b)
   return fun2(a)
In[2]: fun1(1)

逐行执行使用n,进入函数使用s,退出调试提示使用c。

有关可用命令的完整列表:https://appletree.or.kr/quick_reference_cards/Python/Python%20Debugger%20Cheatsheet.pdf

Use ipdb

Install it via

pip install ipdb

Usage:

In[1]: def fun1(a):
   def fun2(b):
       import ipdb; ipdb.set_trace() # debugging starts here
       return do_some_thing_about(b)
   return fun2(a)
In[2]: fun1(1)

For executing line by line use n and for step into a function use s and to exit from debugging prompt use c.

For complete list of available commands: https://appletree.or.kr/quick_reference_cards/Python/Python%20Debugger%20Cheatsheet.pdf


回答 1

您可以ipdb在jupyter内部使用以下命令:

from IPython.core.debugger import Tracer; Tracer()()

编辑:自IPython 5.1起,不推荐使用上述功能。这是新方法:

from IPython.core.debugger import set_trace

在需要断点的地方添加set_trace()。当输入字段出现时,键入help查看ipdb命令。

You can use ipdb inside jupyter with:

from IPython.core.debugger import Tracer; Tracer()()

Edit: the functions above are deprecated since IPython 5.1. This is the new approach:

from IPython.core.debugger import set_trace

Add set_trace() where you need a breakpoint. Type help for ipdb commands when the input field appears.


回答 2

您的return语句与def函数(主函数)处于同一缩进级别,您必须给它一个制表符缩进。并使用

%%debug

代替

%debug

来调试整个单元格,而不仅仅是一行。希望这对您有帮助。

Your return statement is at the same indentation level as the def (main function); you must indent it by one tab. And use

%%debug

instead of

%debug

to debug the whole cell, not only one line. Hope this will help you.


回答 3

您始终可以在任何单元格中添加它:

import pdb; pdb.set_trace()

调试器将在该行停止。例如:

In[1]: def fun1(a):
           def fun2(a):
               import pdb; pdb.set_trace() # debugging starts here
           return fun2(a)

In[2]: fun1(1)

You can always add this in any cell:

import pdb; pdb.set_trace()

and the debugger will stop on that line. For example:

In[1]: def fun1(a):
           def fun2(a):
               import pdb; pdb.set_trace() # debugging starts here
           return fun2(a)

In[2]: fun1(1)

回答 4

在Python 3.7中,您可以使用breakpoint()函数。只需输入

breakpoint()

在您希望运行时停止的任何地方输入它,然后就可以使用相同的pdb命令(r,c,n,…)或评估变量。

In Python 3.7 you can use breakpoint() function. Just enter

breakpoint()

wherever you would like runtime to stop and from there you can use the same pdb commands (r, c, n, …) or evaluate your variables.
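
A minimal sketch of breakpoint() in practice; since Python 3.7 it calls whatever sys.breakpointhook points at (pdb.set_trace by default), and setting the environment variable PYTHONBREAKPOINT=0 disables it without touching the code:

def divide(a, b):
    breakpoint()  # drops into pdb here (Python 3.7+)
    return a / b

divide(10, 2)
# At the (Pdb) prompt: "p a" / "p b" inspect values, "n" steps, "c" continues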


回答 5

只需在jupyter笔记本中键入import pdb,然后用这个cheatsheet来调试。非常方便。

c->继续,s->步进,b 12->在第12行设置断点,依此类推。

一些有用的链接: pdb上的Python官方文档Python pdb调试器示例,以更好地了解如何使用调试器命令


Just type import pdb in jupyter notebook, and then use this cheatsheet to debug. It’s very convenient.

c –> continue, s –> step, b 12 –> set break point at line 12 and so on.

Some useful links: Python Official Document on pdb, Python pdb debugger examples for better understanding how to use the debugger commands.



回答 6

得到错误后,在下一个单元格中运行%debug,仅此而已。

After you get an error, in the next cell just run %debug and that’s it.
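
A sketch of that post-mortem workflow across two notebook cells (the failing function is just an illustration):

In[1]: def broken(x):
           return 1 / (x - x)   # always divides by zero
       broken(3)                # raises ZeroDivisionError

In[2]: %debug
# ipdb opens at the failing frame: u/d move up/down the stack,
# "p x" prints a variable, "q" quits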


回答 7

%pdb魔术命令也很好用。只需执行一次%pdb on,随后pdb调试器就会在所有异常上运行,无论它在调用堆栈中有多深。非常方便。

如果您有要调试的特定行,只需在那里引发一个异常(通常您已经这么做了!),或使用其他人一直在建议的%debug魔术命令。

The %pdb magic command is good to use as well. Just say %pdb on and subsequently the pdb debugger will run on all exceptions, no matter how deep in the call stack. Very handy.

If you have a particular line that you want to debug, just raise an exception there (often you already are!) or use the %debug magic command that other folks have been suggesting.
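
For example (a sketch; once %pdb on has run, the debugger starts automatically the moment any later cell raises):

In[1]: %pdb on

In[2]: def f(x):
           return x['missing']   # raises KeyError
       f({})
# pdb opens automatically at the raising frame; no %debug needed afterwards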


回答 8

我刚刚发现了PixieDebugger。即使我还没有时间测试它,它似乎确实是最接近我们习惯的ipython加ipdb调试方式的方法。

它还有一个“评估”标签

I just discovered PixieDebugger. Even though I have not yet had the time to test it, it really seems the closest to the way we’re used to debugging in ipython with ipdb.

It also has an “evaluate” tab


回答 9

JupyterLab正在提供一个本机调试器扩展。它在几周前发布,可以通过安装相关扩展以及xeus-python内核(值得注意的是,它没有ipykernel用户熟悉的魔术命令)来使用:

jupyter labextension install @jupyterlab/debugger
conda install xeus-python -c conda-forge

这样可以实现其他IDE众所周知的可视化调试体验。

来源:Jupyter的可视调试器

A native debugger is being made available as an extension to JupyterLab. Released a few weeks ago, this can be installed by getting the relevant extension, as well as xeus-python kernel (which notably comes without the magics well-known to ipykernel users):

jupyter labextension install @jupyterlab/debugger
conda install xeus-python -c conda-forge

This enables a visual debugging experience well-known from other IDEs.

Source: A visual debugger for Jupyter


如何分类我的爪子?

问题:如何分类我的爪子?

在我之前的问题中,我得到了一个很好的答案,可以帮助我检测出爪子在哪里踩到压力板,但是现在我很难将这些结果与相应的爪子联系起来:

我手动注释了爪子(RF =右前,RH =右后,LF =左前,LH =左后)。

正如您所看到的,显然有一个重复的模式,它几乎出现在每次测量中。这是一个包含6次手动标注试验的演示文稿的链接。

我最初的想法是使用启发式进行排序,例如:

  • 前爪和后爪之间的负重比约为60-40%;
  • 后爪的表面通常较小。
  • 爪子(通常)在空间上分为左右两半。

但是,我对我的启发式方法有些怀疑,因为一旦遇到我从未想到的变化,它们就会失效。它们也将无法应付跛脚狗的测量,跛脚狗可能有自己的规则。

此外,乔建议的注释有时会弄乱,并且没有考虑到爪子的实际外观。

基于我对爪子内峰值检测问题的回答,我希望有更多高级解决方案可以对爪子进行分类。特别是因为每个单独的爪子的压力分布及其进程都不同,几乎就像指纹一样。我希望有一种方法可以用它来对我的爪子进行聚类,而不仅仅是按照发生的顺序对其进行排序。

因此,我正在寻找一种更好的方法来对结果和相应的爪进行排序。

对于想要应对挑战的任何人,我腌制(pickle)了一个字典,其中包含所有切片后的数组(按测量捆绑),这些数组包含每个爪子的压力数据,以及描述其位置(在板上的位置和时间)的切片。

澄清一下:walk_sliced_data是一个字典,其中包含[‘ser_3’,’ser_2’,’sel_1’,’sel_2’,’ser_1’,’sel_3’],这是测量的名称。每个度量都包含另一个字典[0、1、2、3、4、5、6、7、8、9、10](来自“ sel_1”的示例),代表提取的影响。

还要注意,可以忽略“假”影响,例如对脚掌进行部分测量(在空间或时间上)。它们仅是有用的,因为它们可以帮助识别模式,但不会进行分析。

对于感兴趣的任何人,我都会保留一个博客,其中包含有关该项目的所有更新!

In my previous question I got an excellent answer that helped me detect where a paw hit a pressure plate, but now I’m struggling to link these results to their corresponding paws:

I manually annotated the paws (RF=right front, RH= right hind, LF=left front, LH=left hind).

As you can see there’s clearly a repeating pattern and it comes back in almost every measurement. Here’s a link to a presentation of 6 trials that were manually annotated.

My initial thought was to use heuristics to do the sorting, like:

  • There’s a ~60-40% ratio in weight bearing between the front and hind paws;
  • The hind paws are generally smaller in surface;
  • The paws are (often) spatially divided in left and right.

However, I’m a bit skeptical about my heuristics, as they would fail on me as soon as I encounter a variation I hadn’t thought of. They also won’t be able to cope with measurements from lame dogs, which probably have rules of their own.

Furthermore, the annotation suggested by Joe sometimes gets messed up and doesn’t take into account what the paw actually looks like.

Based on the answers I received on my question about peak detection within the paw, I’m hoping there are more advanced solutions to sort the paws. Especially because the pressure distribution and the progression thereof are different for each separate paw, almost like a fingerprint. I hope there’s a method that can use this to cluster my paws, rather than just sorting them in order of occurrence.

So I’m looking for a better way to sort the results with their corresponding paw.

For anyone up to the challenge, I have pickled a dictionary with all the sliced arrays that contain the pressure data of each paw (bundled by measurement) and the slice that describes their location (location on the plate and in time).

To clarify: walk_sliced_data is a dictionary that contains ['ser_3', 'ser_2', 'sel_1', 'sel_2', 'ser_1', 'sel_3'], which are the names of the measurements. Each measurement contains another dictionary, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] (example from 'sel_1') which represents the impacts that were extracted.

Also note that ‘false’ impacts, such as where the paw is partially measured (in space or time) can be ignored. They are only useful because they can help recognizing a pattern, but won’t be analyzed.

And for anyone interested, I’m keeping a blog with all the updates regarding the project!


回答 0

好的!我终于设法让某些东西始终如一地工作了!这个问题吸引了我好几天……好玩的东西!很抱歉这个答案这么长,但是我需要详细说明一些事情……(尽管我可能创下了有史以来最长的非垃圾stackoverflow答案的记录!)

附带说明一下,我正在使用Ivo在其原始问题中提供链接的完整数据集。这是一系列rar文件(每只狗一个),每个文件包含以ascii数组形式存储的几次不同的实验运行。与其尝试将独立的代码示例复制粘贴到此问题中,不如给出一个包含完整独立代码的bitbucket mercurial存储库。您可以使用以下命令克隆它:

hg clone https://joferkington@bitbucket.org/joferkington/paw-analysis


总览

正如您在问题中指出的那样,解决问题基本上有两种方法。我实际上将以不同的方式使用两者。

  1. 使用脚掌冲击的(时间和空间)顺序确定哪个脚掌是哪个。
  2. 尝试仅根据其形状来识别“爪印”。

基本上,第一种方法在狗的爪子遵循上面Ivo问题中所示的梯形模式时有效,但只要爪子不遵循这种模式,它就会失败。以编程方式检测它何时不起作用是相当容易的。

因此,我们可以使用该方法有效时的测量结果来建立训练数据集(来自约30只不同狗的约2000次爪子冲击),以识别出哪只爪子是哪只,这样问题就归结为监督分类(带有一些额外的复杂之处……图像识别比"常规"监督分类问题要难一些)。


模式分析

为了详细说明第一种方法:当一条狗正常行走(而不是奔跑!其中一些狗可能并非如此)时,我们预期爪子按以下顺序落地:左前,右后,右前,左后,左前,等等。该模式可能从左前爪或右前爪开始。

如果总是这样,我们可以简单地按初始接触时间对冲击进行排序,并使用模4将它们按爪子分组。

但是,即使一切都“正常”,这也不起作用。这是由于图案的梯形形状。后爪在空间上位于前一个前爪的后面。

因此,最初的前爪撞击之后的后爪撞击通常会落在传感器板之外,因此不会被记录下来。同样,最后一次爪子撞击通常不是序列中的下一只爪子,因为它之前的那次爪子撞击发生在传感器板之外,没有被记录下来。

但是,我们可以使用爪子撞击模式的形状来确定何时发生了这种情况,以及我们是从左前爪还是右前爪开始的。(我在这里实际上忽略了最后一次撞击的问题。不过,添加它并不难。)

import numpy as np

def group_paws(data_slices, time):
    # Sort slices by initial contact time
    data_slices.sort(key=lambda s: s[-1].start)

    # Get the centroid for each paw impact...
    paw_coords = []
    for x,y,z in data_slices:
        paw_coords.append([(item.stop + item.start) / 2.0 for item in (x,y)])
    paw_coords = np.array(paw_coords)

    # Make a vector between each successive impact...
    dx, dy = np.diff(paw_coords, axis=0).T

    #-- Group paws -------------------------------------------
    paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
    paw_number = np.arange(len(paw_coords))

    # Did we miss the hind paw impact after the first 
    # front paw impact? If so, first dx will be positive...
    if dx[0] > 0: 
        paw_number[1:] += 1

    # Are we starting with the left or right front paw...
    # We assume we're starting with the left, and check dy[0].
    # If dy[0] > 0 (i.e. the next paw impacts to the left), then
    # it's actually the right front paw, instead of the left.
    if dy[0] > 0: # Right front paw impact...
        paw_number += 2

    # Now we can determine the paw with a simple modulo 4..
    paw_codes = paw_number % 4
    paw_labels = [paw_code[code] for code in paw_codes]

    return paw_labels

尽管如此,它经常无法正常工作。完整数据集中的许多狗似乎都在奔跑,而且爪子的撞击与狗走路时的时间顺序不同。(或者这只狗有严重的髋关节问题…)

幸运的是,我们仍然可以通过编程方式检测爪子撞击是否遵循我们预期的空间模式:

def paw_pattern_problems(paw_labels, dx, dy):
    """Check whether or not the label sequence "paw_labels" conforms to our
    expected spatial pattern of paw impacts. "paw_labels" should be a sequence
    of the strings: "LH", "RH", "LF", "RF" corresponding to the different paws"""
    # Check for problems... (This could be written a _lot_ more cleanly...)
    problems = False
    last = paw_labels[0]
    for paw, dy, dx in zip(paw_labels[1:], dy, dx):
        # Going from a left paw to a right, dy should be negative
        if last.startswith('L') and paw.startswith('R') and (dy > 0):
            problems = True
            break
        # Going from a right paw to a left, dy should be positive
        if last.startswith('R') and paw.startswith('L') and (dy < 0):
            problems = True
            break
        # Going from a front paw to a hind paw, dx should be negative
        if last.endswith('F') and paw.endswith('H') and (dx > 0):
            problems = True
            break
        # Going from a hind paw to a front paw, dx should be positive
        if last.endswith('H') and paw.endswith('F') and (dx < 0):
            problems = True
            break
        last = paw
    return problems

因此,即使简单的空间分类并不总是有效,我们仍可以合理地确定它何时有效。

训练数据集

从基于模式的分类正确运行的那些案例中,我们可以建立一个由正确分类的爪子组成的非常大的训练数据集(来自32只不同狗的约2400次爪子撞击!)。

现在,我们可以开始查看“平均”左前爪的外观,等等。

为此,我们需要某种对任何狗都具有相同维数的"爪子度量"。(在完整的数据集中,既有很大的狗,也有很小的狗!)爱尔兰埃尔克猎犬的爪印会比玩具贵宾犬的爪印宽得多、"重"得多。我们需要重新缩放每个爪印,以便a)它们具有相同的像素数,b)压力值被标准化。为此,我将每个爪印重新采样到20×20的网格上,并根据该次爪子冲击的最大、最小和平均压力值重新缩放压力值。

import numpy as np

def paw_image(paw):
    from scipy.ndimage import map_coordinates
    ny, nx = paw.shape

    # Trim off any "blank" edges around the paw...
    mask = paw > 0.01 * paw.max()
    y, x = np.mgrid[:ny, :nx]
    ymin, ymax = y[mask].min(), y[mask].max()
    xmin, xmax = x[mask].min(), x[mask].max()

    # Make a 20x20 grid to resample the paw pressure values onto
    numx, numy = 20, 20
    xi = np.linspace(xmin, xmax, numx)
    yi = np.linspace(ymin, ymax, numy)
    xi, yi = np.meshgrid(xi, yi)  

    # Resample the values onto the 20x20 grid
    coords = np.vstack([yi.flatten(), xi.flatten()])
    zi = map_coordinates(paw, coords)
    zi = zi.reshape((numy, numx))

    # Rescale the pressure values
    zi -= zi.min()
    zi /= zi.max()
    zi -= zi.mean() #<- Helps distinguish front from hind paws...
    return zi

完成所有这些操作之后,我们终于可以看看平均左前,右后等爪的外观。请注意,这是在> 30头大小相差很大的狗中得到的平均值,我们似乎获得了一致的结果!

但是,在对此进行任何分析之前,我们需要减去平均值(所有狗的所有腿的平均爪)。

现在,我们可以分析与均值的差异,这更容易识别:

基于图像的爪子识别

好的,我们终于有了一组模式,可以开始尝试与之匹配的爪子。每个爪都可以当作一个400维向量(由paw_image函数返回),可以与这四个400维向量进行比较。

不幸的是,如果我们仅使用“常规”监督分类算法(即使用简单的距离来找到4个图案中的哪个最接近特定的爪印),它就不能始终如一地工作。实际上,它并没有比训练数据集上的随机机会好得多。

这是图像识别中的常见问题。由于输入数据的高维性以及图像某种程度上"模糊"的性质(即相邻像素具有较高的协方差),仅查看图像与模板图像的差异并不能很好地衡量它们形状的相似性。

特征爪

为了解决这个问题,我们需要构建一组“特征爪”(就像面部识别中的“特征脸”一样),并将每个爪印描述为这些特征爪的组合。这与主成分分析相同,并且基本上提供了减少数据维数的方法,因此距离是衡量形状的好方法。

因为我们拥有的训练图像数多于维数(2400对400),所以不需要为了速度而做"花哨的"线性代数。我们可以直接使用训练数据集的协方差矩阵:

import numpy as np

def make_eigenpaws(paw_data):
    """Creates a set of eigenpaws based on paw_data.
    paw_data is a numdata by numdimensions matrix of all of the observations."""
    average_paw = paw_data.mean(axis=0)
    paw_data -= average_paw

    # Determine the eigenvectors of the covariance matrix of the data
    cov = np.cov(paw_data.T)
    eigvals, eigvecs = np.linalg.eig(cov)

    # Sort the eigenvectors by ascending eigenvalue (largest is last)
    eig_idx = np.argsort(eigvals)
    sorted_eigvecs = eigvecs[:,eig_idx]
    sorted_eigvals = eigvals[eig_idx]  # eigvals is 1-D, so index it directly

    # Now choose a cutoff number of eigenvectors to use
    # (50 seems to work well, but it's arbitrary...)
    num_basis_vecs = 50
    basis_vecs = sorted_eigvecs[:,-num_basis_vecs:]

    return basis_vecs

这些basis_vecs就是"特征爪"。

要使用这些特征爪,我们只需将每个爪子图像(作为400维向量,而不是20×20图像)与基向量做点积(即矩阵相乘)。这为我们提供了一个50维向量(每个基向量对应一个元素),可用于对图像进行分类。我们不是将20×20图像与每个"模板"爪子的20×20图像进行比较,而是将50维变换后的图像与每个50维变换后的模板爪子进行比较。这对每个脚趾确切位置的微小变化等不太敏感,并且基本上将问题的维数减少到仅相关的那些维度。

基于特征爪的爪子分类

现在,我们可以简单地使用每条腿的50维向量和“模板”向量之间的距离来分类哪个爪子是哪个:

import numpy as np

codebook = np.load('codebook.npy') # Template vectors for each paw
average_paw = np.load('average_paw.npy')
basis_stds = np.load('basis_stds.npy') # Needed to "whiten" the dataset...
basis_vecs = np.load('basis_vecs.npy')
paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
def classify(paw):
    paw = paw.flatten()
    paw -= average_paw
    scores = paw.dot(basis_vecs) / basis_stds
    diff = codebook - scores
    diff *= diff
    diff = np.sqrt(diff.sum(axis=1))
    return paw_code[diff.argmin()]

以下是一些结果:

仍然存在的问题

仍然存在一些问题,特别是对于太小而无法形成清晰爪印的狗……(它对大狗最有效,因为在传感器的分辨率下,脚趾分得更清楚。)此外,该系统无法识别部分爪印,而基于梯形模式的系统可以。

但是,由于特征爪分析本质上使用距离度量,因此我们可以两种方式同时分类:当特征爪分析与"密码本"的最小距离超过某个阈值时,退回到基于梯形模式的系统。不过,我还没有实现这一点。

呼……好长!向提出了这样一个有趣问题的Ivo致敬!

Alright! I’ve finally managed to get something working consistently! This problem pulled me in for several days… Fun stuff! Sorry for the length of this answer, but I need to elaborate a bit on some things… (Though I may set a record for the longest non-spam stackoverflow answer ever!)

As a side note, I’m using the full dataset that Ivo provided a link to in his original question. It’s a series of rar files (one-per-dog) each containing several different experiment runs stored as ascii arrays. Rather than try to copy-paste stand-alone code examples into this question, here’s a bitbucket mercurial repository with full, stand-alone code. You can clone it with

hg clone https://joferkington@bitbucket.org/joferkington/paw-analysis


Overview

There are essentially two ways to approach the problem, as you noted in your question. I’m actually going to use both in different ways.

  1. Use the (temporal and spatial) order of the paw impacts to determine which paw is which.
  2. Try to identify the “pawprint” based purely on its shape.

Basically, the first method works when the dog’s paws follow the trapezoidal-like pattern shown in Ivo’s question above, but fails whenever the paws don’t follow that pattern. It’s fairly easy to programmatically detect when it doesn’t work.

Therefore, we can use the measurements where it did work to build up a training dataset (of ~2000 paw impacts from ~30 different dogs) to recognize which paw is which, and the problem reduces to a supervised classification (With some additional wrinkles… Image recognition is a bit harder than a “normal” supervised classification problem).


Pattern Analysis

To elaborate on the first method, when a dog is walking (not running!) normally (which some of these dogs may not be), we expect paws to impact in the order of: Front Left, Hind Right, Front Right, Hind Left, Front Left, etc. The pattern may start with either the front left or front right paw.

If this were always the case, we could simply sort the impacts by initial contact time and use a modulo 4 to group them by paw.

However, even when everything is “normal”, this doesn’t work. This is due to the trapezoid-like shape of the pattern. A hind paw spatially falls behind the previous front paw.

Therefore, the hind paw impact after the initial front paw impact often falls off the sensor plate, and isn’t recorded. Similarly, the last paw impact is often not the next paw in the sequence, as the paw impact before it occurred off the sensor plate and wasn’t recorded.

Nonetheless, we can use the shape of the paw impact pattern to determine when this has happened, and whether we’ve started with a left or right front paw. (I’m actually ignoring problems with the last impact here. It’s not too hard to add it, though.)

import numpy as np

def group_paws(data_slices, time):
    # Sort slices by initial contact time
    data_slices.sort(key=lambda s: s[-1].start)

    # Get the centroid for each paw impact...
    paw_coords = []
    for x,y,z in data_slices:
        paw_coords.append([(item.stop + item.start) / 2.0 for item in (x,y)])
    paw_coords = np.array(paw_coords)

    # Make a vector between each successive impact...
    dx, dy = np.diff(paw_coords, axis=0).T

    #-- Group paws -------------------------------------------
    paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
    paw_number = np.arange(len(paw_coords))

    # Did we miss the hind paw impact after the first 
    # front paw impact? If so, first dx will be positive...
    if dx[0] > 0: 
        paw_number[1:] += 1

    # Are we starting with the left or right front paw...
    # We assume we're starting with the left, and check dy[0].
    # If dy[0] > 0 (i.e. the next paw impacts to the left), then
    # it's actually the right front paw, instead of the left.
    if dy[0] > 0: # Right front paw impact...
        paw_number += 2

    # Now we can determine the paw with a simple modulo 4..
    paw_codes = paw_number % 4
    paw_labels = [paw_code[code] for code in paw_codes]

    return paw_labels

In spite of all of this, it frequently doesn’t work correctly. Many of the dogs in the full dataset appear to be running, and the paw impacts don’t follow the same temporal order as when the dog is walking. (Or perhaps the dog just has severe hip problems…)

Fortunately, we can still programmatically detect whether or not the paw impacts follow our expected spatial pattern:

def paw_pattern_problems(paw_labels, dx, dy):
    """Check whether or not the label sequence "paw_labels" conforms to our
    expected spatial pattern of paw impacts. "paw_labels" should be a sequence
    of the strings: "LH", "RH", "LF", "RF" corresponding to the different paws"""
    # Check for problems... (This could be written a _lot_ more cleanly...)
    problems = False
    last = paw_labels[0]
    for paw, dy, dx in zip(paw_labels[1:], dy, dx):
        # Going from a left paw to a right, dy should be negative
        if last.startswith('L') and paw.startswith('R') and (dy > 0):
            problems = True
            break
        # Going from a right paw to a left, dy should be positive
        if last.startswith('R') and paw.startswith('L') and (dy < 0):
            problems = True
            break
        # Going from a front paw to a hind paw, dx should be negative
        if last.endswith('F') and paw.endswith('H') and (dx > 0):
            problems = True
            break
        # Going from a hind paw to a front paw, dx should be positive
        if last.endswith('H') and paw.endswith('F') and (dx < 0):
            problems = True
            break
        last = paw
    return problems

Therefore, even though the simple spatial classification doesn’t work all of the time, we can determine when it does work with reasonable confidence.

Training Dataset

From the pattern-based classifications where it worked correctly, we can build up a very large training dataset of correctly classified paws (~2400 paw impacts from 32 different dogs!).

We can now start to look at what an “average” front left, etc, paw looks like.

To do this, we need some sort of “paw metric” that is the same dimensionality for any dog. (In the full dataset, there are both very large and very small dogs!) A paw print from an Irish elkhound will be both much wider and much “heavier” than a paw print from a toy poodle. We need to rescale each paw print so that a) they have the same number of pixels, and b) the pressure values are standardized. To do this, I resampled each paw print onto a 20×20 grid and rescaled the pressure values based on the maximum, minimum, and mean pressure value for the paw impact.

import numpy as np

def paw_image(paw):
    from scipy.ndimage import map_coordinates
    ny, nx = paw.shape

    # Trim off any "blank" edges around the paw...
    mask = paw > 0.01 * paw.max()
    y, x = np.mgrid[:ny, :nx]
    ymin, ymax = y[mask].min(), y[mask].max()
    xmin, xmax = x[mask].min(), x[mask].max()

    # Make a 20x20 grid to resample the paw pressure values onto
    numx, numy = 20, 20
    xi = np.linspace(xmin, xmax, numx)
    yi = np.linspace(ymin, ymax, numy)
    xi, yi = np.meshgrid(xi, yi)  

    # Resample the values onto the 20x20 grid
    coords = np.vstack([yi.flatten(), xi.flatten()])
    zi = map_coordinates(paw, coords)
    zi = zi.reshape((numy, numx))

    # Rescale the pressure values
    zi -= zi.min()
    zi /= zi.max()
    zi -= zi.mean() #<- Helps distinguish front from hind paws...
    return zi

After all of this, we can finally take a look at what an average left front, hind right, etc paw looks like. Note that this is averaged across >30 dogs of greatly different sizes, and we seem to be getting consistent results!

However, before we do any analysis on these, we need to subtract the mean (the average paw for all legs of all dogs).

Now we can analyze the differences from the mean, which are a bit easier to recognize:

Image-based Paw Recognition

Ok… We finally have a set of patterns that we can begin to try to match the paws against. Each paw can be treated as a 400-dimensional vector (returned by the paw_image function) that can be compared to these four 400-dimensional vectors.

Unfortunately, if we just use a “normal” supervised classification algorithm (i.e. find which of the 4 patterns is closest to a particular paw print using a simple distance), it doesn’t work consistently. In fact, it doesn’t do much better than random chance on the training dataset.

This is a common problem in image recognition. Due to the high dimensionality of the input data, and the somewhat “fuzzy” nature of images (i.e. adjacent pixels have a high covariance), simply looking at the difference of an image from a template image does not give a very good measure of the similarity of their shapes.

Eigenpaws

To get around this we need to build a set of “eigenpaws” (just like “eigenfaces” in facial recognition), and describe each paw print as a combination of these eigenpaws. This is identical to principal components analysis, and basically provides a way to reduce the dimensionality of our data, so that distance is a good measure of shape.

Because we have more training images than dimensions (2400 vs 400), there’s no need to do “fancy” linear algebra for speed. We can work directly with the covariance matrix of the training data set:

import numpy as np

def make_eigenpaws(paw_data):
    """Creates a set of eigenpaws based on paw_data.
    paw_data is a numdata by numdimensions matrix of all of the observations."""
    average_paw = paw_data.mean(axis=0)
    paw_data -= average_paw

    # Determine the eigenvectors of the covariance matrix of the data
    cov = np.cov(paw_data.T)
    eigvals, eigvecs = np.linalg.eig(cov)

    # Sort the eigenvectors by ascending eigenvalue (largest is last)
    eig_idx = np.argsort(eigvals)
    sorted_eigvecs = eigvecs[:,eig_idx]
    sorted_eigvals = eigvals[eig_idx]  # eigvals is 1-D, so index it directly

    # Now choose a cutoff number of eigenvectors to use
    # (50 seems to work well, but it's arbitrary...)
    num_basis_vecs = 50
    basis_vecs = sorted_eigvecs[:,-num_basis_vecs:]

    return basis_vecs

These basis_vecs are the “eigenpaws”.

To use these, we simply dot (i.e. matrix multiplication) each paw image (as a 400-dimensional vector, rather than a 20×20 image) with the basis vectors. This gives us a 50-dimensional vector (one element per basis vector) that we can use to classify the image. Instead of comparing a 20×20 image to the 20×20 image of each “template” paw, we compare the 50-dimensional, transformed image to each 50-dimensional transformed template paw. This is much less sensitive to small variations in exactly how each toe is positioned, etc, and basically reduces the dimensionality of the problem to just the relevant dimensions.

Eigenpaw-based Paw Classification

Now we can simply use the distance between the 50-dimensional vectors and the “template” vectors for each leg to classify which paw is which:

import numpy as np

codebook = np.load('codebook.npy') # Template vectors for each paw
average_paw = np.load('average_paw.npy')
basis_stds = np.load('basis_stds.npy') # Needed to "whiten" the dataset...
basis_vecs = np.load('basis_vecs.npy')
paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
def classify(paw):
    paw = paw.flatten()
    paw -= average_paw
    scores = paw.dot(basis_vecs) / basis_stds
    diff = codebook - scores
    diff *= diff
    diff = np.sqrt(diff.sum(axis=1))
    return paw_code[diff.argmin()]

Here are some of the results:

Remaining Problems

There are still some problems, particularly with dogs too small to make a clear pawprint… (It works best with large dogs, as the toes are more clearly separated at the sensor’s resolution.) Also, partial pawprints aren’t recognized with this system, while they can be with the trapezoidal-pattern-based system.

However, because the eigenpaw analysis inherently uses a distance metric, we can classify the paws both ways, and fall back to the trapezoidal-pattern-based system when the eigenpaw analysis’s smallest distance from the “codebook” is over some threshold. I haven’t implemented this yet, though.

Phew… That was long! My hat is off to Ivo for having such a fun question!
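
A minimal sketch of that fallback idea, for anyone who wants to try it (not part of the repository above; the threshold value and the pattern_label argument are hypothetical, and the arrays are the ones loaded in the classify snippet):

import numpy as np

def classify_with_fallback(paw, pattern_label, threshold=10.0):
    """Classify via eigenpaws, but fall back to the trapezoidal-pattern
    label when the best codebook match is too far away (hypothetical cutoff)."""
    paw = paw.flatten() - average_paw
    scores = paw.dot(basis_vecs) / basis_stds
    dists = np.sqrt(((codebook - scores) ** 2).sum(axis=1))
    if dists.min() > threshold:
        return pattern_label            # eigenpaw match too weak
    return paw_code[dists.argmin()]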


回答 1

纯粹使用基于持续时间的信息,我认为您可以应用运动学建模中的技术,即逆运动学。结合方向、长度、持续时间和总重量,它可以提供一定程度的周期性,我希望这可能是尝试解决"爪子排序"问题的第一步。

所有这些数据都可以用于创建有界多边形(或元组)的列表,您可以使用这些列表按步长然后按爪子[index]进行排序。

Using the information purely based on duration, I think you could apply techniques from modeling kinematics; namely Inverse Kinematics. Combined with orientation, length, duration, and total weight it gives some level of periodicity which, I would hope could be the first step trying to solve your “sorting of paws” problem.

All that data could be used to create a list of bounded polygons (or tuples), which you could use to sort by step size then by paw-ness [index].


回答 2

您可以让运行测试的技术人员手动输入第一个(或前两个)爪子吗?该过程可能是:

  • 向技术人员显示各步骤顺序的图像,并要求他们标注第一个爪子。
  • 根据第一个爪子标记其他爪子,并允许技术人员进行更正或重新运行测试。这就可以处理跛脚或三条腿的狗。

Can you have the technician running the test manually enter the first paw (or first two)? The process might be:

  • Show tech the order of steps image and require them to annotate the first paw.
  • Label the other paws based on the first paw and allow the tech to make corrections or re-run the test. This allows for lame or 3-legged dogs.

如何制作跨模块变量?

问题:如何制作跨模块变量?

__debug__变量很方便,部分原因是它会影响每个模块。如果我想创建另一个工作方式相同的变量,我该怎么做?

这个变量(让我们有点创意,称它为"foo")不必是真正的全局变量,也就是说,不需要做到我在一个模块中更改foo后,它在其他模块中也随之更新。只要我可以在导入其他模块之前设置foo,然后它们都能看到相同的值,那就可以了。

The __debug__ variable is handy in part because it affects every module. If I want to create another variable that works the same way, how would I do it?

The variable (let’s be original and call it ‘foo’) doesn’t have to be truly global, in the sense that if I change foo in one module, it is updated in others. I’d be fine if I could set foo before importing other modules and then they would see the same value for it.


回答 0

我不以任何方式、形状或形式认可该解决方案。但是,如果您将变量添加到__builtin__模块中,那么任何其他模块都可以像访问全局变量一样访问它,因为默认情况下所有模块都包含__builtin__。

a.py包含

print foo

b.py包含

import __builtin__
__builtin__.foo = 1
import a

结果是打印了"1"。

编辑:__builtin__模块可以通过本地符号__builtins__访问,这就是其中两个答案之间存在差异的原因。另请注意,__builtin__在Python 3中已重命名为builtins。

I don’t endorse this solution in any way, shape or form. But if you add a variable to the __builtin__ module, it will be accessible as if a global from any other module that includes __builtin__ — which is all of them, by default.

a.py contains

print foo

b.py contains

import __builtin__
__builtin__.foo = 1
import a

The result is that “1” is printed.

Edit: The __builtin__ module is available as the local symbol __builtins__ — that’s the reason for the discrepancy between two of these answers. Also note that __builtin__ has been renamed to builtins in python3.
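
For completeness, the Python 3 equivalent of the example above would use the renamed builtins module (a sketch; a.py is unchanged apart from print becoming a function):

# b.py (Python 3)
import builtins
builtins.foo = 1
import a   # a.py can now simply do: print(foo)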


回答 1

如果您需要一个全局的跨模块变量,也许一个简单的模块级全局变量就足够了。

a.py:

var = 1

b.py:

import a
print a.var
import c
print a.var

c.py:

import a
a.var = 2

测试:

$ python b.py
# -> 1 2

实际示例:Django的global_settings.py(尽管在Django应用中,设置是通过导入django.conf.settings对象来使用的)。

If you need a global cross-module variable maybe just simple global module-level variable will suffice.

a.py:

var = 1

b.py:

import a
print a.var
import c
print a.var

c.py:

import a
a.var = 2

Test:

$ python b.py
# -> 1 2

Real-world example: Django’s global_settings.py (though in Django apps settings are used by importing the object django.conf.settings).


回答 2

定义一个模块(称为"globalbaz")并在其中定义变量。所有使用这个"伪全局变量"的模块都应导入"globalbaz"模块,并使用"globalbaz.var_name"来引用它。

无论更改的位置如何,此方法均有效,您可以在导入之前或之后更改变量。导入的模块将使用最新值。(我在一个玩具示例中对此进行了测试)

为了澄清起见,globalbaz.py看起来像这样:

var_name = "my_useful_string"

Define a module ( call it “globalbaz” ) and have the variables defined inside it. All the modules using this “pseudoglobal” should import the “globalbaz” module, and refer to it using “globalbaz.var_name”

This works regardless of the place of the change, you can change the variable before or after the import. The imported module will use the latest value. (I tested this in a toy example)

For clarification, globalbaz.py looks just like this:

var_name = "my_useful_string"

回答 3

我认为在很多情况下它确实有意义,并且它简化了编程,使某些全局变量在多个(紧密耦合的)模块中广为人知。本着这种精神,我想详细说明一下由需要引用全局模块的模块导入的全局模块。

当只有一个这样的模块时,我将其命名为"g"。在其中,我为每个要视为全局变量的变量分配默认值。在使用它们的每个模块中,我都不使用"from g import var",因为这只会产生一个局部变量,它仅在导入时才从g初始化。我以g.var的形式进行大多数引用,而"g."不断提醒我,我正在处理一个其他模块也可能访问的变量。

如果此类全局变量的值要在模块中的某些函数中频繁使用,则该函数可以创建本地副本:var = g.var。但是,重要的是要认识到对var的分配是本地的,并且全局g.var不能在不显式引用分配中的g.var的情况下进行更新。

请注意,您还可以让模块的不同子集共享多个这样的全局变量模块,以使事情得到更严格的控制。我对全局模块使用短名称的原因是为了避免由于出现它们而使代码过于混乱。仅凭少量经验,它们仅用1个或2个字符就变得足够易记。

当x尚未在g中定义时,仍然可以对g.x进行赋值,然后另一个模块就可以访问g.x。但是,即使解释器允许,这种做法也不那么透明,我会避免它。由于赋值语句中的变量名有错字,仍有可能意外地在g中创建新变量。有时,检查dir(g)有助于发现因这类意外而产生的任何意料之外的名称。

I believe that there are plenty of circumstances in which it does make sense and it simplifies programming to have some globals that are known across several (tightly coupled) modules. In this spirit, I would like to elaborate a bit on the idea of having a module of globals which is imported by those modules which need to reference them.

When there is only one such module, I name it “g”. In it, I assign default values for every variable I intend to treat as global. In each module that uses any of them, I do not use “from g import var”, as this only results in a local variable which is initialized from g only at the time of the import. I make most references in the form g.var, and the “g.” serves as a constant reminder that I am dealing with a variable that is potentially accessible to other modules.

If the value of such a global variable is to be used frequently in some function in a module, then that function can make a local copy: var = g.var. However, it is important to realize that assignments to var are local, and global g.var cannot be updated without referencing g.var explicitly in an assignment.

Note that you can also have multiple such globals modules shared by different subsets of your modules to keep things a little more tightly controlled. The reason I use short names for my globals modules is to avoid cluttering up the code too much with occurrences of them. With only a little experience, they become mnemonic enough with only 1 or 2 characters.

It is still possible to make an assignment to, say, g.x when x was not already defined in g, and a different module can then access g.x. However, even though the interpreter permits it, this approach is not so transparent, and I do avoid it. There is still the possibility of accidentally creating a new variable in g as a result of a typo in the variable name for an assignment. Sometimes an examination of dir(g) is useful to discover any surprise names that may have arisen by such accident.
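
A minimal sketch of the pattern described above (the file names and variables are illustrative):

# g.py -- the shared globals module, holding the defaults
verbose = False
max_retries = 3

# worker.py
import g

def fetch():
    if g.verbose:                  # always read through "g."
        print('retries:', g.max_retries)

# main.py
import g, worker
g.verbose = True                   # the change is visible to every importer
worker.fetch()                     # prints: retries: 3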


回答 4

您可以将一个模块的全局变量传递给另一个模块:

在模块A中:

import module_b
my_var=2
module_b.do_something_with_my_globals(globals())
print my_var

在模块B中:

def do_something_with_my_globals(glob): # glob is simply a dict.
    glob["my_var"]=3

You can pass the globals of one module to another:

In Module A:

import module_b
my_var=2
module_b.do_something_with_my_globals(globals())
print my_var

In Module B:

def do_something_with_my_globals(glob): # glob is simply a dict.
    glob["my_var"]=3

回答 5

全局变量通常不是一个好主意,但是您可以通过赋值给__builtins__来做到这一点:

__builtins__.foo = 'something'
print foo

同样,模块本身也是可以从任何模块访问的变量。因此,如果您定义一个名为my_globals.py的模块:

# my_globals.py
foo = 'something'

然后,您也可以在任何地方使用它:

import my_globals
print my_globals.foo

通常,使用模块而不是修改__builtins__是实现这类全局变量的更干净的方法。

Global variables are usually a bad idea, but you can do this by assigning to __builtins__:

__builtins__.foo = 'something'
print foo

Also, modules themselves are variables that you can access from any module. So if you define a module called my_globals.py:

# my_globals.py
foo = 'something'

Then you can use that from anywhere as well:

import my_globals
print my_globals.foo

Using modules rather than modifying __builtins__ is generally a cleaner way to do globals of this sort.


回答 6

您已经可以使用模块级变量做到这一点。无论从哪个模块导入,模块都是同一个。因此,您可以把该变量作为模块级变量放进任何合适的模块中,然后从其他模块访问它或对它赋值。更好的做法是调用一个函数来设置变量的值,或者使其成为某个单例对象的属性。这样,如果您最终需要在变量更改后运行一些代码,就可以在不破坏模块外部接口的情况下做到。

通常,这不是一种很好的处理方式(使用全局变量很少是好主意),但我认为这是最干净的方法。

You can already do this with module-level variables. Modules are the same no matter what module they’re being imported from. So you can make the variable a module-level variable in whatever module it makes sense to put it in, and access it or assign to it from other modules. It would be better to call a function to set the variable’s value, or to make it a property of some singleton object. That way if you end up needing to run some code when the variable’s changed, you can do so without breaking your module’s external interface.

It’s not usually a great way to do things — using globals seldom is — but I think this is the cleanest way to do it.
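
A sketch of the "call a function to set the value" idea (module and variable names are illustrative):

# config.py
_debug = False

def set_debug(value):
    """Single entry point for changes, so extra logic can be added later
    without breaking the module's external interface."""
    global _debug
    _debug = bool(value)

def debug():
    return _debug

# elsewhere
import config
config.set_debug(True)
print(config.debug())  # True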


回答 7

我想发布一个答案,说明在某些情况下会找不到这个变量。

循环导入可能会破坏模块的行为。

例如:

first.py

import second
var = 1

second.py

import first
print(first.var)  # will throw an error because the order of execution happens before var gets declared.

main.py

import first

在这个例子中,它应该很明显,但是在大型代码库中,这确实很令人困惑。

I wanted to post an answer that there is a case where the variable won’t be found.

Cyclical imports may break the module behavior.

For example:

first.py

import second
var = 1

second.py

import first
print(first.var)  # will throw an error because the order of execution happens before var gets declared.

main.py

import first

In this example it should be obvious, but in a large code-base, this can be really confusing.


回答 8

这听起来像是要修改__builtin__命名空间。做法如下:

import __builtin__
__builtin__.foo = 'some-value'

不要直接使用__builtins__(请注意额外的"s"),显然它可能是字典或模块。感谢ΤΖΩΤΖΙΟΥ指出这一点,更多内容请点击这里。

现在foo可在任何地方使用。

我不建议一般这样做,但是使用此方法取决于程序员。

对它赋值必须按照上面的方式进行;仅仅写foo = 'some-other-value'只会在当前命名空间中设置它。

This sounds like modifying the __builtin__ name space. To do it:

import __builtin__
__builtin__.foo = 'some-value'

Do not use the __builtins__ directly (notice the extra “s”) – apparently this can be a dictionary or a module. Thanks to ΤΖΩΤΖΙΟΥ for pointing this out, more can be found here.

Now foo is available for use everywhere.

I don’t recommend doing this generally, but the use of this is up to the programmer.

Assigning to it must be done as above, just setting foo = 'some-other-value' will only set it in the current namespace.


回答 9

我将此用于几个我觉得确实缺失的内置原语函数。一个例子是与filter、map、reduce具有相同用法语义的find函数。

def builtin_find(f, x, d=None):
    for i in x:
        if f(i):
            return i
    return d

import __builtin__
__builtin__.find = builtin_find

一旦运行了这段代码(例如,通过在入口点附近导入),所有模块就都可以使用find(),就好像它显然是内置的一样。

find(lambda i: i < 0, [1, 3, 0, -5, -10])  # Yields -5, the first negative.

注意:当然,您可以用filter加上另一行来测试零长度,或者用reduce写成一行奇怪的代码来做到这一点,但我始终觉得那很别扭。

I use this for a couple built-in primitive functions that I felt were really missing. One example is a find function that has the same usage semantics as filter, map, reduce.

def builtin_find(f, x, d=None):
    for i in x:
        if f(i):
            return i
    return d

import __builtin__
__builtin__.find = builtin_find

Once this is run (for instance, by importing near your entry point) all your modules can use find() as though, obviously, it was built in.

find(lambda i: i < 0, [1, 3, 0, -5, -10])  # Yields -5, the first negative.

Note: You can do this, of course, with filter and another line to test for zero length, or with reduce in one sort of weird line, but I always felt it was weird.


回答 10

我可以使用字典来实现跨模块的可修改(或可变)变量:

# in myapp.__init__
Timeouts = {} # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60

# in myapp.mod1
from myapp import Timeouts

def wait_app_up(project_name, port):
    # wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
    # ...

# in myapp.test.test_mod1
from myapp import Timeouts

def test_wait_app_up_fail(self):
    timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
    with self.assertRaises(hlp.TimeoutException) as cm:
        wait_app_up(PROJECT_NAME, PROJECT_PORT)
    self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = timeout_bak  # restore the original value

启动test_wait_app_up_fail时,实际的超时时间为3秒。

I could achieve cross-module modifiable (or mutable) variables by using a dictionary:

# in myapp.__init__
Timeouts = {} # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60

# in myapp.mod1
from myapp import Timeouts

def wait_app_up(project_name, port):
    # wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
    # ...

# in myapp.test.test_mod1
from myapp import Timeouts

def test_wait_app_up_fail(self):
    timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
    with self.assertRaises(hlp.TimeoutException) as cm:
        wait_app_up(PROJECT_NAME, PROJECT_PORT)
    self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = timeout_bak  # restore the original value

When launching test_wait_app_up_fail, the actual timeout duration is 3 seconds.
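
This works because from myapp import Timeouts binds the same dict object in every module, so mutating it is visible everywhere, whereas rebinding the name would not be. A small sketch of the distinction (module names are illustrative):

# shared.py
Timeouts = {'WAIT': 60}

# mod_a.py
from shared import Timeouts
Timeouts['WAIT'] = 3       # mutation: every importer sees 3

# mod_b.py
from shared import Timeouts
Timeouts = {'WAIT': 99}    # rebinding: only mod_b's local name changes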


回答 11

我想知道,通过使用类命名空间而不是全局/模块命名空间来传递变量值,是否有可能避免使用全局变量的某些缺点(请参见例如http://wiki.c2.com/?GlobalVariablesAreBad)。以下代码表明这两种方法本质上是相同的。如下所述,使用类命名空间有一点优势。

以下代码片段还显示,可以在全局/模块命名空间和类命名空间中动态创建和删除属性或变量。

wall.py

# Note no definition of global variables

class router:
    """ Empty class """

我称此模块为"wall"(墙),因为它是用来反弹变量的。它将充当一个空间,用于临时定义全局变量以及空类"router"的类级属性。

source.py

import wall
def sourcefn():
    msg = 'Hello world!'
    wall.msg = msg
    wall.router.msg = msg

该模块导入wall并定义了一个函数sourcefn,该函数定义一条消息并通过两种不同的机制发出它:一种通过全局变量,一种通过router类。请注意,变量wall.msg和wall.router.msg是在此处首次在其各自的命名空间中定义的。

dest.py

import wall
def destfn():

    if hasattr(wall, 'msg'):
        print 'global: ' + wall.msg
        del wall.msg
    else:
        print 'global: ' + 'no message'

    if hasattr(wall.router, 'msg'):
        print 'router: ' + wall.router.msg
        del wall.router.msg
    else:
        print 'router: ' + 'no message'

该模块定义了一个函数destfn,该函数使用两种不同的机制来接收source发出的消息。它考虑到了变量"msg"可能不存在的情况。destfn在显示这些变量后也会将它们删除。

main.py

import source, dest

source.sourcefn()

dest.destfn() # variables deleted after this call
dest.destfn()

该模块依次调用先前定义的函数。第一次调用dest.destfn之后,变量wall.msg和wall.router.msg就不再存在了。

该程序的输出为:

global: Hello world!
router: Hello world!
global: no message
router: no message

上面的代码片段表明,模块/全局机制和类/类变量机制基本相同。

如果要共享许多变量,则可以通过使用多个wall类型的模块(例如wall1,wall2等)或通过在单个文件中定义多个路由器类型的类来管理命名空间污染。后者稍微更整洁,因此也许代表了使用类变量机制的边际优势。

I wondered if it would be possible to avoid some of the disadvantages of using global variables (see e.g. http://wiki.c2.com/?GlobalVariablesAreBad) by using a class namespace rather than a global/module namespace to pass values of variables. The following code indicates that the two methods are essentially identical. There is a slight advantage in using class namespaces as explained below.

The following code fragments also show that attributes or variables may be dynamically created and deleted in both global/module namespaces and class namespaces.

wall.py

# Note no definition of global variables

class router:
    """ Empty class """

I call this module ‘wall’ since it is used to bounce variables off of. It will act as a space to temporarily define global variables and class-wide attributes of the empty class ‘router’.

source.py

import wall
def sourcefn():
    msg = 'Hello world!'
    wall.msg = msg
    wall.router.msg = msg

This module imports wall and defines a single function sourcefn which defines a message and emits it by two different mechanisms, one via globals and one via the router class. Note that the variables wall.msg and wall.router.msg are defined here for the first time in their respective namespaces.

dest.py

import wall
def destfn():

    if hasattr(wall, 'msg'):
        print 'global: ' + wall.msg
        del wall.msg
    else:
        print 'global: ' + 'no message'

    if hasattr(wall.router, 'msg'):
        print 'router: ' + wall.router.msg
        del wall.router.msg
    else:
        print 'router: ' + 'no message'

This module defines a function destfn which uses the two different mechanisms to receive the messages emitted by source. It allows for the possibility that the variable ‘msg’ may not exist. destfn also deletes the variables once they have been displayed.

main.py

import source, dest

source.sourcefn()

dest.destfn() # variables deleted after this call
dest.destfn()

This module calls the previously defined functions in sequence. After the first call to dest.destfn the variables wall.msg and wall.router.msg no longer exist.

The output from the program is:

global: Hello world!
router: Hello world!
global: no message
router: no message

The above code fragments show that the module/global and the class/class variable mechanisms are essentially identical.

If a lot of variables are to be shared, namespace pollution can be managed either by using several wall-type modules, e.g. wall1, wall2 etc. or by defining several router-type classes in a single file. The latter is slightly tidier, so perhaps represents a marginal advantage for use of the class-variable mechanism.