Tag archive: python-3.x

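FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

Question: FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison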

I am using Pandas 0.19.1 on Python 3 and get a warning on the lines of code below. I am trying to get a list of all the row numbers where the string Peter is present in column Unnamed: 5.

df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

It produces a Warning:

"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise 
comparison failed; returning scalar, but in the future will perform 
elementwise comparison 
result = getattr(x, name)(y)"

What is this FutureWarning, and should I ignore it since the code seems to work?


Answer 0

This FutureWarning is not from Pandas; it comes from numpy, and the bug also affects matplotlib and other libraries. Here is how to reproduce the warning nearer to the source of the trouble:

import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       #Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the 
future will perform elementwise comparison
False

Another way to reproduce this bug using the double equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    #FutureWarning thrown here

An example of Matplotlib affected by this FutureWarning under their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What’s going on here?

Numpy and native python disagree on what should happen when you compare a string to numpy's numeric types. Notice that the left operand is python's turf (a primitive string), the operator in the middle is python's turf, but the right operand is numpy's turf. Should the result be a Python-style scalar or a Numpy-style ndarray of booleans? Numpy says ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be elementwise comparison or Scalar if item exists in the array?

If your code or library uses the in or == operators to compare a python string to a numpy ndarray, the two are not compatible, so the comparison returns a scalar, but only for now. The warning indicates that this behavior might change in the future, so your code pukes all over the carpet if python/numpy decide to adopt the Numpy style.

Submitted Bug reports:

Numpy and Python are in a standoff; for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784

https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your versions of python and numpy, ignore the warnings and expect the behavior not to change, or convert the left and right operands of == and in so that both are a numpy type or both are a primitive python numeric type.
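As a minimal sketch of the second option, assuming an integer array and a search value that arrives as a python string, one side can be converted so that both operands share a type before == or in is applied. The names numbers and needle are illustrative only.

```python
import numpy as np

numbers = np.arange(5)          # integer array: 0, 1, 2, 3, 4
needle = "3"                    # the value arrives as a python string

# Option A: convert the string to the array's numeric type.
mask_numeric = numbers == int(needle)

# Option B: convert the array to strings and compare string to string.
mask_text = numbers.astype(str) == needle

print(mask_numeric.tolist())                 # one bool per element
print(needle in numbers.astype(str))         # membership test on matching types
```

Both comparisons are elementwise because the operand types match, so no fallback to a scalar is involved.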

Suppress the warning globally:

import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   #returns False, without Warning

Suppress the warning on a line-by-line basis:

import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   #returns False, warning is suppressed

print('x' in np.arange(10))   #returns False, Throws FutureWarning

Just suppress the warning by name, then put a loud comment next to it mentioning the current version of python and numpy, saying this code is brittle and requires these versions and put a link to here. Kick the can down the road.
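Suppressing "by name" could be sketched with the message argument of warnings.filterwarnings, which matches the start of the warning text, so other FutureWarnings stay visible. The function noisy_compare below is a hypothetical stand-in that emits the same text as the numpy comparison, so the sketch runs without numpy.

```python
import warnings

def noisy_compare():
    # Stand-in for the numpy comparison: emits the same warning text.
    warnings.warn(
        "elementwise comparison failed; returning scalar instead",
        FutureWarning,
    )
    return False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only FutureWarnings whose message starts with this text.
    warnings.filterwarnings(
        "ignore",
        message="elementwise comparison failed",
        category=FutureWarning,
    )
    result = noisy_compare()
    warnings.warn("some other future change", FutureWarning)

# Only the unrelated warning was recorded.
print(result, [str(w.message) for w in caught])
```

Compared with ignoring the whole FutureWarning category, this keeps unrelated deprecation notices from other libraries visible.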

TLDR: pandas are Jedi; numpy are the hutts; and python is the galactic empire. https://youtu.be/OZczsiCfQQk?t=3


Answer 1

I get the same error when I set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

I have never encountered such an error before. I am still trying to figure out the reason behind it (using Eric Leschinski's explanation and others).

Anyhow, the following approach solves the problem for now until I figure the reason out:

df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)

I will update this as soon as I figure out the reason for such behavior.


Answer 2

In my experience, the same warning message was caused by a TypeError.

TypeError: invalid type comparison

So you may want to check the data types in the Unnamed: 5 column:

for x in df['Unnamed: 5']:
  print(type(x))  # are they 'str' ?

Here is how I can replicate the warning message:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error

Hope it helps.
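A slightly more compact way to inspect the types, sketched here on a made-up DataFrame (the column names num3 and mixed are illustrative), is to look at the column dtype and then list the python types stored in a column.

```python
import pandas as pd

df = pd.DataFrame({"num3": [3, 3, 3], "mixed": ["Peter", 7, 2.5]})

# A numeric column reports an integer dtype; a mixed column falls back to object.
print(df["num3"].dtype, df["mixed"].dtype)

# List the python types actually stored in the mixed column.
type_names = df["mixed"].map(lambda value: type(value).__name__).tolist()
print(type_names)
```

If the column you compare against a string turns out to hold numbers, that mismatch is the likely source of the warning.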


Answer 3

Can’t beat Eric Leschinski’s awesomely detailed answer, but here’s a quick workaround to the original question that I don’t think has been mentioned yet – put the string in a list and use .isin instead of ==

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]
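One caveat worth illustrating: .isin avoids the warning, but the string "2" still does not equal the integer 2, so the selection above comes back empty. A minimal sketch on the same example data, assuming the goal is to find the row whose Number is 2:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# The string "2" never equals the integer 2, so nothing matches.
as_text = df["Number"].isin(["2"])

# Matching the column's integer type finds the row.
as_int = df["Number"].isin([2])

print(as_text.tolist(), as_int.tolist())
```

In other words, .isin silences the symptom, while matching the types is what makes the lookup return the intended rows.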

Answer 4

A quick workaround for this is to use numpy.core.defchararray. I also faced the same warning message and was able to resolve it using the above module.

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)
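The same function is also reachable through the public numpy.char namespace, which may be the safer spelling on recent NumPy releases where numpy.core is treated as private. A small sketch, assuming dataset1 and dataset2 are string arrays of equal length (the sample values are made up):

```python
import numpy as np

dataset1 = np.array(["Peter", "Joe", "Ann"])
dataset2 = np.array(["Peter", "Bob", "Ann"])

# np.char.equal compares two string arrays element by element.
resultdataset = np.char.equal(dataset1, dataset2)
print(resultdataset.tolist())
```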

Answer 5

Eric’s answer helpfully explains that the trouble comes from comparing a Pandas Series (containing a NumPy array) to a Python string. Unfortunately, his two workarounds both just suppress the warning.

To write code that doesn’t cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.

myRows = df[df['Unnamed: 5'].map( lambda x: x == 'Peter' )].index.tolist()

Answer 6

If your arrays aren’t too big or you don’t have too many of them, you might be able to get away with forcing the left hand side of == to be a string:

myRows = df[str(df['Unnamed: 5']) == 'Peter'].index.tolist()

But this is ~1.5 times slower if df['Unnamed: 5'] is a string, 25-30 times slower if df['Unnamed: 5'] is a small numpy array (length = 10), and 150-160 times slower if it’s a numpy array with length 100 (times averaged over 500 trials).

import time
from numpy import linspace, mean, zeros

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178
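Note that calling str() on a whole pandas Series produces one string for the entire column, so the comparison yields a single bool rather than a row mask. If the intent is a per-row comparison, an elementwise conversion with astype(str) is one way to sketch it; the sample data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Unnamed: 5": ["Peter", 7, "Peter", None]})

# str(series) gives one string for the whole column, so == returns one bool.
whole = str(df["Unnamed: 5"]) == "Peter"

# astype(str) converts each element, so == yields one bool per row.
mask = df["Unnamed: 5"].astype(str) == "Peter"
myRows = df[mask].index.tolist()

print(whole, myRows)
```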

Answer 7

In my case, the warning was triggered by ordinary boolean indexing, because the Series contained only np.nan. Demonstration (pandas 1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)
0    False
1    False

I think with pandas 1.0 they really want you to use the new 'string' datatype which allows for pd.NA values:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False

I do not love that they tinkered with everyday functionality such as boolean indexing.


Answer 8

I got this warning because I thought my column contained null strings, but on checking, it contained np.nan!

if df['column'] == '':

Changing my column to empty strings helped :)
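A rough sketch of both ways to handle this, assuming a column that mixes real strings, empty strings and np.nan (the data is made up): test for missing values explicitly with isna, or replace NaN with empty strings before comparing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": ["a", np.nan, "", np.nan]})

# NaN is not equal to "", so test for missing values explicitly.
missing = df["column"].isna()

# Or normalise NaN to "" first, then compare against the empty string.
empty = df["column"].fillna("") == ""

print(missing.tolist(), empty.tolist())
```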


Answer 9

I’ve compared a few of the methods possible for doing this, including pandas, several numpy methods, and a list comprehension method.

First, let’s start with a baseline:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

So our baseline is that the count should be 2, and the operation should take about 50 µs.

Now, we try the naive method:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

And here, we get the wrong answer (NotImplemented != 2), it takes us a long time, and it throws the warning.

So we’ll try another naive method:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

Again, the wrong answer (0 != 2). This is even more insidious because there is no warning afterwards (0 can be passed around just like 2).

Now, let’s try a list comprehension:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension

We get the right answer here, and it’s pretty fast!

Another possibility, pandas:

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==

Slow, but correct!

And finally, the option I’m going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal

Fast and correct!


Answer 10

I had this code which was causing the error:

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

I changed it to this:

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

to avoid the comparison that throws the warning, as stated above. I only had to handle the exception because of dfObj.loc in the for loop; maybe there is a way to tell it not to check the rows it has already changed.


What does a b prefix before a python string mean?

Question: What does a b prefix before a python string mean?

In some python source code I stumbled upon, I saw a small b before a string, as in:

b"abcdef"

I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.

What does the b stand for, and in which kind of source code is it useful? It seems to be exactly like a plain string without any prefix.


Answer 0

This is a Python 3 bytes literal. The prefix does not exist in Python 2.5 and older (a bytes literal is equivalent to a plain string in 2.x, while a plain string in 3.x is equivalent to a literal with the u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.
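A quick way to see the difference under Python 3 (a minimal sketch; the variable names are illustrative):

```python
data = b"abcdef"      # bytes literal
text = "abcdef"       # str (Unicode) literal

print(type(data).__name__, type(text).__name__)

# In Python 3 the two types never compare equal, even with the same characters.
print(data == text)

# Encoding the text gives the same bytes back.
print(text.encode("ascii") == data)
```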


Answer 1

The b prefix signifies a bytes string literal.

If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.

bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII characters to make their contents easier to read. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Conversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII characters their numeric value is used (e.g. b'A' == b'\x41').

Because a bytes object consists of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:

bytes([72, 101, 108, 108, 111])

and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
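The indexing and slicing behaviour can be checked directly; in this sketch value is simply the bytes object built from the list above.

```python
value = bytes([72, 101, 108, 108, 111])

first_item = value[0]     # indexing returns an int
first_slice = value[:1]   # slicing returns a bytes object of length 1

print(value, first_item, first_slice)
print(list(value))        # back to the list of integers
```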

bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:

strvalue = bytesvalue.decode('utf-8')

Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:

bytesvalue = strvalue.encode('utf-8')

You can also use the constructor, bytes(strvalue, encoding) to do the same.

Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
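As a small sketch of that extra argument, here are a few of the standard error handlers applied to a byte string that is not valid UTF-8 (the sample data is made up):

```python
raw = b"caf\xe9"   # "cafe" with an accented e, encoded as Latin-1; not valid UTF-8

strict_failed = False
try:
    raw.decode("utf-8")            # errors="strict" is the default
except UnicodeDecodeError:
    strict_failed = True

replaced = raw.decode("utf-8", errors="replace")   # bad byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")     # bad byte is dropped

# Encoding has its own handlers, e.g. XML character references.
ascii_safe = "caf\u00e9".encode("ascii", errors="xmlcharrefreplace")

print(strict_failed, replaced, ignored, ascii_safe)
```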

Python 2 versions 2.6 and 2.7 also support the b'..' string literal syntax, to ease writing code that works on both Python 2 and 3.

bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
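A short illustration of the difference, as a sketch under Python 3:

```python
frozen = b"Hello"
try:
    frozen[0] = 74              # bytes does not support item assignment
    mutated_in_place = True
except TypeError:
    mutated_in_place = False

buffer = bytearray(frozen)      # mutable copy of the same byte values
buffer[0] = 74                  # 74 is the ASCII code for 'J'
buffer.extend(b"!")

print(mutated_in_place, buffer, bytes(buffer))
```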


TypeError: can't use a string pattern on a bytes-like object in re.findall()

Question: TypeError: can't use a string pattern on a bytes-like object in re.findall()

I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage:

import urllib.request
import re

url = "http://www.google.com"
regex = r'<title>(,+?)</title>'
pattern  = re.compile(regex)

with urllib.request.urlopen(url) as response:
   html = response.read()

title = re.findall(pattern, html)
print(title)

And I get this unexpected error:

Traceback (most recent call last):
  File "path\to\file\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

What am I doing wrong?


Answer 0

You want to convert html (a bytes-like object) into a string using .decode, e.g. html = response.read().decode('utf-8').

See Convert bytes to a Python String


Answer 1

The problem is that your regex is a string, but html is bytes:

>>> type(html)
<class 'bytes'>

Since python doesn’t know how those bytes are encoded, it throws an exception when you try to use a string regex on them.

You can either decode the bytes to a string:

html = html.decode('ISO-8859-1')  # encoding may vary!
title = re.findall(pattern, html)  # no more error

Or use a bytes regex:

regex = rb'<title>(,+?)</title>'
#        ^

In this particular context, you can get the encoding from the response headers:

with urllib.request.urlopen(url) as response:
    encoding = response.info().get_param('charset', 'utf8')
    html = response.read().decode(encoding)

See the urlopen documentation for more details.
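Both fixes can be tried offline. The sketch below uses a hard-coded bytes payload as a stand-in for response.read(), and it uses (.+?) in the pattern on the assumption that the comma in the question's (,+?) was a typo for a dot; the rb prefix needs Python 3.3 or later.

```python
import re

# Stand-in for response.read(): a bytes payload, as urlopen returns.
html_bytes = b"<html><head><title>Example Page</title></head></html>"

# Option 1: decode first, then use a str pattern.
str_pattern = re.compile(r"<title>(.+?)</title>")
titles_text = str_pattern.findall(html_bytes.decode("utf-8"))

# Option 2: keep the bytes and use a bytes pattern (rb prefix).
bytes_pattern = re.compile(rb"<title>(.+?)</title>")
titles_bytes = bytes_pattern.findall(html_bytes)

# Mixing the two raises the TypeError from the question.
try:
    str_pattern.findall(html_bytes)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(titles_text, titles_bytes, mixed_ok)
```

Decoding is usually the more convenient choice when the result is going to be treated as text afterwards; a bytes pattern avoids having to know the encoding.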


ImportError: No module named 'encodings'

Question: ImportError: No module named 'encodings'

I recently reinstalled Ubuntu, upgraded to 16.04, and now cannot use python:

$ python manage.py runserver
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted

At this point, python itself doesn’t work

$ python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted

Even this suggestion is no longer working:

unset PYTHONHOME
unset PYTHONPATH

Every time I fix it one way, it comes back again. Several answers help to fix it temporarily, but not for good. I have reinstalled python and python3 several times. What can I do from here? Thank you.


Answer 0

For Python 3, try removing the virtual environment files and setting the environment up again.

rm -rf venv
virtualenv -p /usr/bin/python3 venv/
source venv/bin/activate
pip install -r requirements.txt

https://wiki.ubuntu.com/XenialXerus/ReleaseNotes#Python_3


Answer 1

For Windows 10 users.

I was using python3.4 on Windows 10 and then installed python3.5. I could not find the PYTHONPATH or PYTHONHOME environment variables. When I ran python in the CMD console, it kept using python3.4. After I deleted python3.4, running python in the CMD console started showing an error like the one below.

Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'

I searched to figure out my problem. The solution was simple: when you install Python 3.5, choose the custom installation and check “Add Python to environment variables” under Advanced Options.

I’m leaving this here in case someone with a similar issue visits, so that they don’t waste much of their precious time figuring it out.


回答 2

我在Windows7下也面临同样的问题。错误消息如下所示:

Fatal Python error: Py_Initialize: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Current thread 0x000011f4 (most recent call first):

我已经安装了python 2.7(现在已卸载),并且在安装python 3.6时选中了“将Python添加到高级选项中的环境变量”。结果表明,环境变量“ PYTHONHOME ”和“ PYTHONPATH ”仍然是python2.7。

最后,我通过将“ PYTHONHOME ” 修改为python3.6安装路径并删除了变量“ PYTHONPATH ” 来解决了该问题。

I was facing the same problem under Windows 7. The error message looked like this:

Fatal Python error: Py_Initialize: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Current thread 0x000011f4 (most recent call first):

I had installed Python 2.7 earlier (now uninstalled), and I checked “Add Python to environment variables” in Advanced Options while installing Python 3.6. It turned out that the environment variables PYTHONHOME and PYTHONPATH still pointed to Python 2.7.

Finally, I solved it by changing PYTHONHOME to the Python 3.6 install path and removing the PYTHONPATH variable.
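A stale PYTHONHOME like the one described above can be checked with a short script. This is only a sketch: the helper names has_encodings and check_pythonhome are made up for illustration, and it assumes you can still start some interpreter (for example with the -E flag, which tells Python to ignore the PYTHON* variables while leaving them visible in os.environ).

```python
import os
from pathlib import Path


def has_encodings(prefix):
    """Return True if an 'encodings' package exists under a Python prefix."""
    root = Path(prefix)
    # Windows installs keep the standard library in <prefix>\Lib,
    # typical Unix installs in <prefix>/lib/pythonX.Y.
    candidates = [root / "Lib" / "encodings"]
    candidates.extend(root.glob("lib/python*/encodings"))
    return any((c / "__init__.py").is_file() for c in candidates)


def check_pythonhome(environ=None):
    """Return None if PYTHONHOME is unset, else whether it looks usable."""
    environ = os.environ if environ is None else environ
    home = environ.get("PYTHONHOME")
    if home is None:
        return None
    # PYTHONHOME may be "<prefix>" or "<prefix><sep><exec_prefix>".
    prefix = home.split(os.pathsep)[0]
    return has_encodings(prefix)


if __name__ == "__main__":
    print("PYTHONHOME usable:", check_pythonhome())
    print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
```

If the first line prints False, PYTHONHOME points at a directory without a standard library, which would be consistent with the "No module named 'encodings'" error.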


回答 3

对于Windows7上的相同问题

如果您的环境变量/系统变量设置不正确,您将看到这样的错误:

Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'

Current thread 0x00001db4 (most recent call first):

解决这个问题非常简单:

  1. 当您下载Python3.x版本并运行.exe文件时,它为您提供了一个自定义系统中要安装Python位置的选项。例如,我选择了以下位置:C:\ Program Files \ Python36

  2. 然后打开系统属性,然后转到“ 高级 ”选项卡(或者您可以简单地做到这一点:转到“开始”>“搜索” 环境变量 ”>单击“编辑系统环境变量”。)在“高级”选项卡下,查找单击“环境变量”,然后单击它。将会弹出另一个名为“环境变量”的窗口。

  3. 现在,确保您的用户变量具有“路径变量”中列出的正确的Python路径。在这里的示例中,您应该看到C:\ Program Files \ Python36。如果在此处找不到它,则通过选择“路径变量”字段并单击“编辑”来添加它。

  4. 最后一步是在同一窗口中的系统变量下仔细检查PYTHONHOME和PYTHONPATH字段。您应该看到与上述相同的路径。如果没有,也添加它。

然后单击“确定”并返回到CMD终端,然后尝试检查python。现在应解决此问题。它为我工作。

For the same issue on Windows7

You will see an error like this if your environment variables/ system variables are incorrectly set:

Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'

Current thread 0x00001db4 (most recent call first):

Fixing this is really simple:

  1. When you download Python3.x version, and run the .exe file, it gives you an option to customize where in your system you want to install Python. For example, I chose this location: C:\Program Files\Python36

  2. Then open system properties and go to “Advanced” tab (Or you can simply do this: Go to Start > Search for “environment variables” > Click on “Edit the system environment variables”.) Under the “Advanced” tab, look for “Environment Variables” and click it. Another window with name “Environment Variables” will pop up.

  3. Now make sure your user variables have the correct Python path listed in “Path Variable”. In my example here, you should see C:\Program Files\Python36. If you do not find it there, add it, by selecting Path Variable field and clicking Edit.

  4. Last step is to double-check PYTHONHOME and PYTHONPATH fields under System Variables in the same window. You should see the same path as described above. If not add it there too.

Then click OK and go back to CMD terminal, and try checking for python. The issue should now be resolved. It worked for me.


回答 4

在迁移到Ubuntu 17.10的过程中出现了此错误,这解决了问题:

sudo dpkg-reconfigure python3

也许您必须关闭会话并重新连接。

I had this error during migration to Ubuntu 17.10, and this solved the problem :

sudo dpkg-reconfigure python3

Maybe you will have to close your session and reconnect.


回答 5

查看/lib/python3.5,您将看到指向python库的断开链接。将其重新创建到工作目录。

下一个错误-

./script/bin/pip3
Failed to import the site module
Traceback (most recent call last):
  File "/home/script/script/lib/python3.5/site.py", line 703, in <module>
    main()
  File "/home/script/script/lib/python3.5/site.py", line 683, in main
    paths_in_sys = addsitepackages(paths_in_sys)
  File "/home/script/script/lib/python3.5/site.py", line 282, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 204, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 173, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
  File "/home/script/script/lib/python3.5/types.py", line 166, in <module>
    import functools as _functools
  File "/home/script/script/lib/python3.5/functools.py", line 23, in <module>
    from weakref import WeakKeyDictionary
  File "/home/script/script/lib/python3.5/weakref.py", line 12, in <module>
    from _weakref import (
ImportError: cannot import name '_remove_dead_weakref'

像这样固定-https: //askubuntu.com/questions/907035/importerror-cannot-import-name-remove-dead-weakref

cd my-virtualenv-directory
virtualenv . --system-site-packages

Look in /lib/python3.5 and you will see broken links to the Python libraries. Recreate them so that they point to a working directory.

Next error –

./script/bin/pip3
Failed to import the site module
Traceback (most recent call last):
  File "/home/script/script/lib/python3.5/site.py", line 703, in <module>
    main()
  File "/home/script/script/lib/python3.5/site.py", line 683, in main
    paths_in_sys = addsitepackages(paths_in_sys)
  File "/home/script/script/lib/python3.5/site.py", line 282, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 204, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 173, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
  File "/home/script/script/lib/python3.5/types.py", line 166, in <module>
    import functools as _functools
  File "/home/script/script/lib/python3.5/functools.py", line 23, in <module>
    from weakref import WeakKeyDictionary
  File "/home/script/script/lib/python3.5/weakref.py", line 12, in <module>
    from _weakref import (
ImportError: cannot import name '_remove_dead_weakref'

fixed like this – https://askubuntu.com/questions/907035/importerror-cannot-import-name-remove-dead-weakref

cd my-virtualenv-directory
virtualenv . --system-site-packages

回答 6

更新到macOS Catalina后,我遇到了此问题“ ModuleNotFoundError:没有名为’encodings的模块”。

我在系统中安装了多个版本的Python。

从macOS系统中删除所有python版本(2.7和3.7.4)并重新安装最新的python 3.8对我来说很有效。

要从macOS中删除python,我已遵循此处的说明如何在Mac OS X 10.6.4上卸载Python 2.7?

上面的链接适用于python 2.7,但是您也可以将其用于3.7。

I was facing this issue, “ModuleNotFoundError: No module named ‘encodings’”, after updating to macOS Catalina.

I had multiple versions of Python installed on my system.

Removing all the python versions(2.7 and 3.7.4) from macOS system and reinstalling the latest python 3.8 worked for me.

To remove a python from macOS, I’ve followed the instructions from here How to uninstall Python 2.7 on a Mac OS X 10.6.4?

The above link is for Python 2.7, but you can use the same steps for 3.7 as well.


回答 7

我有一个类似的问题。我在计算机上同时安装了anaconda和python,而我的python依赖项则来自Anaconda目录。当我卸载Anaconda时,此错误开始弹出。我加了,PYTHONPATH但还是没去。我检查了python -version一下,发现它仍在沿用Python之路。我不得不手动删除Anaconda3目录,然后python开始从接收依赖PYTHONPATH
问题已解决!

I had a similar issue. I had both Anaconda and Python installed on my computer, and my Python dependencies came from the Anaconda directory. When I uninstalled Anaconda, this error started popping up. I added PYTHONPATH, but the error still didn’t go away. I checked with python --version and found out that it was still taking the Anaconda path. I had to manually delete the Anaconda3 directory, and after that Python started taking dependencies from PYTHONPATH.
Issue Solved!


回答 8

使用时将我的mac更新到macOS Catalina时遇到了相同的问题pipenv。Pipenv virtualenv为您创建并管理一个,因此@ Anoop-Malav的早期建议是相同的,只是使用pipenv根据当前目录删除虚拟环境并重置它:

pipenv --rm
pipenv shell  # recreate a virtual env with your current Pipfile

Had the same problem when updating my mac to macOS Catalina, while using pipenv. Pipenv creates and manages a virtualenv for you, so the earlier suggestion from @Anoop-Malav is the same, just using pipenv to remove the virtual environment based on the current dir and reset it:

pipenv --rm
pipenv shell  # recreate a virtual env with your current Pipfile

回答 9

在我的情况下,只需更改anaconda文件夹的权限即可:

sudo chmod -R u=rwx,g=rx,o=rx /path/to/anaconda   

In my case just changing the permissions of anaconda folder worked:

sudo chmod -R u=rwx,g=rx,o=rx /path/to/anaconda   

回答 10

因为这是google的第一个结果,所以我只想向其他有监狱问题的人添加以下信息:

Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007f079b16d740 (most recent call first):
Aborted (core dumped)

尝试将python导入监狱时,您都需要将依赖项和/usr/lib/pythonX.Y链接到[JAIL] / usr / lib /。希望这可以帮助。

Because this is the first result in google I just want to add the following information for anybody else having problems with jails:

Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007f079b16d740 (most recent call first):
Aborted (core dumped)

When attempting to import python into your jail, you need to link both the dependencies and /usr/lib/pythonX.Y into [JAIL]/usr/lib/. Hope this helps.


回答 11

只需转到File-> Settings->在Project选项卡下选择Project Interpreter->单击小齿轮图标-> Add-> System Interpreter->在下拉菜单中选择所需的python版本

这似乎对我有用

Just go to File -> Settings -> select Project Interpreter under Project tab -> click on the small gear icon -> Add -> System Interpreter -> select the python version you want in the drop down menu

this seemed to work for me


回答 12

我也可以解决这个问题。PYTHONPATH和PYTHONHOME是原因。

在终端上运行

   touch ~/.bash_profile
   open ~/.bash_profile

然后删除此文件的所有无用部分,然后保存。我不知道这样做是多么推荐!

I could also fix this. PYTHONPATH and PYTHONHOME were the cause.

Run this in a terminal:

   touch ~/.bash_profile
   open ~/.bash_profile

and then delete all useless parts of this file, and save. I do not know how recommended it is to do that !


DeprecationWarning: invalid escape sequence - what to use instead of \d?

Question: DeprecationWarning: invalid escape sequence - what to use instead of \d?

re在Python 3.6.5中遇到了模块问题。我的正则表达式中有以下模式:

'\\nRevision: (\d+)\\n'

但是,当我运行它时,我得到了DeprecationWarning

在SO上搜索了问题,但没有找到答案,实际上-我应该用什么代替\d+?只是[0-9]+还是其他?

I’ve run into a problem with the re module in Python 3.6.5. I have this pattern in my regular expression:

'\\nRevision: (\d+)\\n'

But when I run it, I’m getting a DeprecationWarning.

I searched for the problem on SO but haven’t found an answer. What should I actually use instead of \d+? Just [0-9]+, or maybe something else?


回答 0

Python 3将字符串文字解释为Unicode字符串,因此您\d被视为转义的Unicode字符。

将RegEx模式声明为原始字符串,而不是通过在前面加上r,如下所示:

r'\nRevision: (\d+)\n'

这也意味着您也可以删除转义\n符,因为这些转义符仅会被解析为换行符re

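In a normal (non-raw) Python 3 string literal, a backslash starts an escape sequence, and \d is not a recognized one. Python currently leaves the backslash in place, but since version 3.6 it emits a DeprecationWarning for such invalid escape sequences.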

Declare your RegEx pattern as a raw string instead by prepending r, as below:

r'\nRevision: (\d+)\n'

This also means you can drop the doubled backslashes in front of n: in a raw string, \n reaches re as a backslash followed by n, which re itself interprets as a newline character.
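As a quick check of the point above, the following sketch compares the raw-string pattern with a fully escaped regular string and runs it against a made-up log text (the log contents and variable names are assumptions for illustration only).

```python
import re

# Hypothetical input, just for demonstration.
log = "Author: someone\nRevision: 1234\nDate: today\n"

# Raw string: backslashes are passed to re unchanged.
pattern_raw = r'\nRevision: (\d+)\n'

# Equivalent regular string: every backslash has to be doubled.
pattern_escaped = '\\nRevision: (\\d+)\\n'

# Both literals produce exactly the same pattern text.
same_pattern = pattern_raw == pattern_escaped

match = re.search(pattern_raw, log)
revision = match.group(1)
print(same_pattern, revision)
```

Neither literal contains an invalid escape sequence, so neither should trigger the DeprecationWarning; the raw string is simply easier to read.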


multiprocessing vs multithreading vs asyncio in Python 3

Question: multiprocessing vs multithreading vs asyncio in Python 3

我发现在Python 3.4中,用于多处理/线程的库很少:多处理 vs 线程asyncio

但是我不知道使用哪个,或者是“推荐的”。他们做的是同一件事还是不同?如果是这样,则将哪一个用于什么?我想编写一个在计算机上使用多核的程序。但是我不知道我应该学习哪个图书馆。

I found that in Python 3.4 there are a few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.

But I don’t know which one to use, or which is the “recommended” one. Do they do the same thing, or are they different? If so, which one is used for what? I want to write a program that uses multiple cores on my computer, but I don’t know which library I should learn.


回答 0

它们旨在(略有)不同的目的和/或要求。CPython(典型的主线Python实现)仍然具有全局解释器锁,因此多线程应用程序(当今实现并行处理的标准方式)不是最佳选择。这就是为什么multiprocessing 可能要优先于threading。但是并不是每个问题都可以有效地分解为[几乎独立的]部分,因此可能需要大量的进程间通信。这就是为什么multiprocessing可能不被threading普遍推荐的原因。

asyncio(该技术不仅在Python中可用,其他语言和/或框架也有此技术,例如Boost.ASIO)是一种有效处理来自许多同时源的大量I / O操作而无需并行代码执行的方法。 。因此,这仅是针对特定任务的解决方案(确实是一个不错的方案!),而不是通常用于并行处理的解决方案。

They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal for CPU-bound work. That’s why multiprocessing may be preferred over threading. But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That’s why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python; other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it’s just a solution (a good one indeed!) for a particular task, not for parallel processing in general.


回答 1

[快速回答]

TL; DR

做出正确的选择:

我们介绍了最流行的并发形式。但是问题仍然存在-什么时候应该选择哪个?这实际上取决于用例。根据我的经验(和阅读),我倾向于遵循以下伪代码:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")

  • CPU限制=>多处理
  • I / O绑定,快速I / O,有限的连接数=>多线程
  • I / O受限,I / O缓慢,许多连接=> Asyncio

参考


[ 注意 ]:

  • 如果您使用的是长调用方法(即,包含在睡眠时间或惰性I / O中的方法),则最佳选择是asyncioTwistedTornado方法(协程方法),该方法可以与单个线程并发工作。
  • asyncio适用于Python3.4及更高版本。
  • 自从Python2.7开始,TornadoTwisted已经准备就绪
  • uvloop是超快速asyncio事件循环(uvloop使asyncio速度提高2-4倍)。

[更新(2019)]:

  • Japronto(GitHub)是一个基于uvloop的非常快速的管道HTTP服务器。

[Quick Answer]

TL;DR

Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudo code:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")

  • CPU Bound => Multi Processing
  • I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
  • I/O Bound, Slow I/O, Many connections => Asyncio
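
The rule of thumb above can also be written as a small runnable function. This is only a restatement of the answer's pseudo code; the function name choose_concurrency and its parameters are hypothetical.

```python
def choose_concurrency(io_bound, io_very_slow=False):
    """Return the name of the module the rule of thumb points to."""
    if io_bound:
        # Slow I/O with many connections favours asyncio,
        # fast I/O with few connections favours threads.
        return "asyncio" if io_very_slow else "threading"
    # CPU-bound work is spread over several processes.
    return "multiprocessing"


print(choose_concurrency(io_bound=True, io_very_slow=True))
print(choose_concurrency(io_bound=True))
print(choose_concurrency(io_bound=False))
```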

Reference


[NOTE]:

  • If you have long-running calls (i.e. methods that spend their time sleeping or waiting on slow I/O), the best choice is the asyncio, Twisted or Tornado approach (coroutine methods), which provides concurrency within a single thread.
  • asyncio works on Python 3.4 and later.
  • Tornado and Twisted have been available since Python 2.7.
  • uvloop is an ultra-fast asyncio event loop (its authors report that it makes asyncio 2-4x faster).

[UPDATE (2019)]:

  • Japronto (GitHub) is a very fast pipelining HTTP server based on uvloop.

回答 2

这是基本思想:

IO- BOUND吗?———>使用asyncio

它是CPU- HEAVY吗?—–>使用multiprocessing

其他吗?———————->使用threading

因此,除非您遇到IO / CPU问题,否则基本上要坚持使用线程。

This is the basic idea:

Is it IO-BOUND ? ———> USE asyncio

IS IT CPU-HEAVY ? —–> USE multiprocessing

ELSE ? ———————-> USE threading

So basically stick to threading unless you have IO/CPU problems.


回答 3

多处理中,您利用多个CPU来分配您的计算。由于每个CPU并行运行,因此您可以有效地同时运行多个任务。您可能希望对CPU绑定的任务使用多处理。一个示例将尝试计算巨大列表中所有元素的总和。如果您的计算机具有8个核心,则可以将列表“切割”为8个较小的列表,并分别在单独的核心上计算每个列表的总和,然后将这些数字相加即可。这样您将获得约8倍的加速。

穿线您不需要多个CPU。想象一个程序向网络发送大量HTTP请求。如果使用单线程程序,它将在每个请求处停止执行(块),等待响应,然后在收到响应后继续执行。这里的问题是,在等待某些外部服务器执行任务时,您的CPU并未真正在工作。同时,它实际上可以做一些有用的工作!解决方法是使用线程-您可以创建多个线程,每个线程负责从Web请求一些内容。关于线程的好处是,即使它们在一个CPU上运行,CPU也会不时地“冻结”一个线程的执行并跳转到执行另一个线程(这称为上下文切换,并且它在不确定性下不断发生)间隔)。 -使用线程。

asyncio本质上是线程化,而不是CPU,而是由您(作为程序员(或实际上是您的应用程序))决定上下文切换的时间和地点。在Python中,您可以使用await关键字来暂停协程的执行(使用async关键字定义)。

In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you’re effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be calculating the sum of all elements of a huge list. If your machine has 8 cores, you can “cut” the list into 8 smaller lists, calculate the sum of each of those lists on a separate core, and then just add up those numbers. In the ideal case you’ll get close to an 8x speedup by doing that.

In (multi)threading you don’t need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once it received one. The problem here is that your CPU isn’t really doing work while waiting for some external server to do the job; it could have done some useful work in the meantime! The fix is to use threads: you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the operating system from time to time “freezes” the execution of one thread and jumps to executing another one (this is called context switching, and it happens constantly at non-deterministic intervals). So if your task is I/O bound, use threading.

asyncio is essentially threading where not the operating system but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
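
To make the contrast between threads and asyncio concrete, here is a minimal sketch that runs the same simulated I/O task both ways. The sleeps stand in for network requests, and all function names (blocking_fetch, async_fetch, run_with_threads, run_with_asyncio) are hypothetical; asyncio.run assumes Python 3.7 or later.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(n):
    """Pretend to do blocking I/O; the thread may be switched out at any time."""
    time.sleep(0.05)
    return n * 2


async def async_fetch(n):
    """Pretend to do I/O; 'await' is the explicit point where control is handed over."""
    await asyncio.sleep(0.05)
    return n * 2


def run_with_threads(items):
    # One worker thread per item; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        return list(pool.map(blocking_fetch, items))


async def _gather_all(items):
    # gather() runs the coroutines concurrently on one thread, keeping input order.
    return await asyncio.gather(*(async_fetch(n) for n in items))


def run_with_asyncio(items):
    return asyncio.run(_gather_all(items))


thread_results = run_with_threads([1, 2, 3])
async_results = run_with_asyncio([1, 2, 3])
print(thread_results, async_results)
```

In both variants the three waits overlap instead of running back to back; the difference is who decides when to switch.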


Unpacking, extended unpacking and nested extended unpacking

Question: Unpacking, extended unpacking and nested extended unpacking

请考虑以下表达式。注意,某些表达被重复以表示“上下文”。

(这是一长串)

a, b = 1, 2                          # simple sequence assignment
a, b = ['green', 'blue']             # list assignment
a, b = 'XY'                          # string assignment
a, b = range(1,5,2)                  # any iterable will do


                                     # nested sequence assignment

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z' 

(a,b), c = "XYZ"                     # ERROR -- too many values to unpack
(a,b), c = "XY"                      # ERROR -- need more than 1 value to unpack

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'
(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack


                                     # extended sequence unpacking

a, *b = 1,2,3,4,5                    # a = 1, b = [2,3,4,5]
*a, b = 1,2,3,4,5                    # a = [1,2,3,4], b = 5
a, *b, c = 1,2,3,4,5                 # a = 1, b = [2,3,4], c = 5

a, *b = 'X'                          # a = 'X', b = []
*a, b = 'X'                          # a = [], b = 'X'
a, *b, c = "XY"                      # a = 'X', b = [], c = 'Y'
a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

a, b, *c = 1,2,3                     # a = 1, b = 2, c = [3]
a, b, c, *d = 1,2,3                  # a = 1, b = 2, c = 3, d = []

a, *b, c, *d = 1,2,3,4,5             # ERROR -- two starred expressions in assignment

(a,b), c = [1,2],'this'              # a = 1, b = 2, c = 'this'
(a,b), *c = [1,2],'this'             # a = 1, b = 2, c = ['this']

(a,b), c, *d = [1,2],'this'          # a = 1, b = 2, c = 'this', d = []
(a,b), *c, d = [1,2],'this'          # a = 1, b = 2, c = [], d = 'this'

(a,b), (c, *d) = [1,2],'this'        # a = 1, b = 2, c = 't', d = ['h', 'i', 's']

*a = 1                               # ERROR -- target must be in a list or tuple
*a = (1,2)                           # ERROR -- target must be in a list or tuple
*a, = (1,2)                          # a = [1,2]
*a, = 1                              # ERROR -- 'int' object is not iterable
*a, = [1]                            # a = [1]
*a = [1]                             # ERROR -- target must be in a list or tuple
*a, = (1,)                           # a = [1]
*a, = (1)                            # ERROR -- 'int' object is not iterable

*a, b = [1]                          # a = [], b = 1
*a, b = (1,)                         # a = [], b = 1

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
(a,b), *c = 1,2,3                    # ERROR - 'int' object is not iterable
(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]


                                     # extended sequence unpacking -- NESTED

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

*(a,b) = 1,2                         # ERROR -- target must be in a list or tuple
*(a,b), = 1,2                        # a = 1, b = 2

*(a,b) = 'XY'                        # ERROR -- target must be in a list or tuple
*(a,b), = 'XY'                       # a = 'X', b = 'Y'

*(a, b) = 'this'                     # ERROR -- target must be in a list or tuple
*(a, b), = 'this'                    # ERROR -- too many values to unpack
*(a, *b), = 'this'                   # a = 't', b = ['h', 'i', 's']

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

*(a,*b), = 1,2,3,3,4,5,6,7           # a = 1, b = [2, 3, 3, 4, 5, 6, 7]

*(a,*b), *c = 1,2,3,3,4,5,6,7        # ERROR -- two starred expressions in assignment
*(a,*b), (*c,) = 1,2,3,3,4,5,6,7     # ERROR -- 'int' object is not iterable
*(a,*b), c = 1,2,3,3,4,5,6,7         # a = 1, b = [2, 3, 3, 4, 5, 6], c = 7
*(a,*b), (*c,) = 1,2,3,4,5,'XY'      # a = 1, b = [2, 3, 4, 5], c = ['X', 'Y']

*(a,*b), c, d = 1,2,3,3,4,5,6,7      # a = 1, b = [2, 3, 3, 4, 5], c = 6, d = 7
*(a,*b), (c, d) = 1,2,3,3,4,5,6,7    # ERROR -- 'int' object is not iterable
*(a,*b), (*c, d) = 1,2,3,3,4,5,6,7   # ERROR -- 'int' object is not iterable
*(a,*b), *(c, d) = 1,2,3,3,4,5,6,7   # ERROR -- two starred expressions in assignment


*(a,b), c = 'XY', 3                  # ERROR -- need more than 1 value to unpack
*(*a,b), c = 'XY', 3                 # a = [], b = 'XY', c = 3
(a,b), c = 'XY', 3                   # a = 'X', b = 'Y', c = 3

*(a,b), c = 'XY', 3, 4               # a = 'XY', b = 3, c = 4
*(*a,b), c = 'XY', 3, 4              # a = ['XY'], b = 3, c = 4
(a,b), c = 'XY', 3, 4                # ERROR -- too many values to unpack

如何手工正确地推论这些表达式的结果?

Consider the following expressions. Note that some expressions are repeated to present the “context”.

(this is a long list)

a, b = 1, 2                          # simple sequence assignment
a, b = ['green', 'blue']             # list assignment
a, b = 'XY'                          # string assignment
a, b = range(1,5,2)                  # any iterable will do


                                     # nested sequence assignment

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z' 

(a,b), c = "XYZ"                     # ERROR -- too many values to unpack
(a,b), c = "XY"                      # ERROR -- need more than 1 value to unpack

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'
(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack


                                     # extended sequence unpacking

a, *b = 1,2,3,4,5                    # a = 1, b = [2,3,4,5]
*a, b = 1,2,3,4,5                    # a = [1,2,3,4], b = 5
a, *b, c = 1,2,3,4,5                 # a = 1, b = [2,3,4], c = 5

a, *b = 'X'                          # a = 'X', b = []
*a, b = 'X'                          # a = [], b = 'X'
a, *b, c = "XY"                      # a = 'X', b = [], c = 'Y'
a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

a, b, *c = 1,2,3                     # a = 1, b = 2, c = [3]
a, b, c, *d = 1,2,3                  # a = 1, b = 2, c = 3, d = []

a, *b, c, *d = 1,2,3,4,5             # ERROR -- two starred expressions in assignment

(a,b), c = [1,2],'this'              # a = 1, b = 2, c = 'this'
(a,b), *c = [1,2],'this'             # a = 1, b = 2, c = ['this']

(a,b), c, *d = [1,2],'this'          # a = 1, b = 2, c = 'this', d = []
(a,b), *c, d = [1,2],'this'          # a = 1, b = 2, c = [], d = 'this'

(a,b), (c, *d) = [1,2],'this'        # a = 1, b = 2, c = 't', d = ['h', 'i', 's']

*a = 1                               # ERROR -- target must be in a list or tuple
*a = (1,2)                           # ERROR -- target must be in a list or tuple
*a, = (1,2)                          # a = [1,2]
*a, = 1                              # ERROR -- 'int' object is not iterable
*a, = [1]                            # a = [1]
*a = [1]                             # ERROR -- target must be in a list or tuple
*a, = (1,)                           # a = [1]
*a, = (1)                            # ERROR -- 'int' object is not iterable

*a, b = [1]                          # a = [], b = 1
*a, b = (1,)                         # a = [], b = 1

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
(a,b), *c = 1,2,3                    # ERROR - 'int' object is not iterable
(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]


                                     # extended sequence unpacking -- NESTED

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

*(a,b) = 1,2                         # ERROR -- target must be in a list or tuple
*(a,b), = 1,2                        # a = 1, b = 2

*(a,b) = 'XY'                        # ERROR -- target must be in a list or tuple
*(a,b), = 'XY'                       # a = 'X', b = 'Y'

*(a, b) = 'this'                     # ERROR -- target must be in a list or tuple
*(a, b), = 'this'                    # ERROR -- too many values to unpack
*(a, *b), = 'this'                   # a = 't', b = ['h', 'i', 's']

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

*(a,*b), = 1,2,3,3,4,5,6,7           # a = 1, b = [2, 3, 3, 4, 5, 6, 7]

*(a,*b), *c = 1,2,3,3,4,5,6,7        # ERROR -- two starred expressions in assignment
*(a,*b), (*c,) = 1,2,3,3,4,5,6,7     # ERROR -- 'int' object is not iterable
*(a,*b), c = 1,2,3,3,4,5,6,7         # a = 1, b = [2, 3, 3, 4, 5, 6], c = 7
*(a,*b), (*c,) = 1,2,3,4,5,'XY'      # a = 1, b = [2, 3, 4, 5], c = ['X', 'Y']

*(a,*b), c, d = 1,2,3,3,4,5,6,7      # a = 1, b = [2, 3, 3, 4, 5], c = 6, d = 7
*(a,*b), (c, d) = 1,2,3,3,4,5,6,7    # ERROR -- 'int' object is not iterable
*(a,*b), (*c, d) = 1,2,3,3,4,5,6,7   # ERROR -- 'int' object is not iterable
*(a,*b), *(c, d) = 1,2,3,3,4,5,6,7   # ERROR -- two starred expressions in assignment


*(a,b), c = 'XY', 3                  # ERROR -- need more than 1 value to unpack
*(*a,b), c = 'XY', 3                 # a = [], b = 'XY', c = 3
(a,b), c = 'XY', 3                   # a = 'X', b = 'Y', c = 3

*(a,b), c = 'XY', 3, 4               # a = 'XY', b = 3, c = 4
*(*a,b), c = 'XY', 3, 4              # a = ['XY'], b = 3, c = 4
(a,b), c = 'XY', 3, 4                # ERROR -- too many values to unpack

How can I correctly deduce the result of such expressions by hand?


回答 0

对于这篇文章的篇幅,我深表歉意,但我决定选择完整性。

一旦您了解了一些基本规则,就可以将它们概括起来。我将尽力举例说明。由于您是在谈论“手工”评估,因此,我将建议一些简单的替换规则。基本上,如果所有可迭代对象的格式都相同,则可能会更容易理解表达式。

仅出于解压缩的目的,以下替换在()的右侧有效=(即,对于rvalues):

'XY' -> ('X', 'Y')
['X', 'Y'] -> ('X', 'Y')

如果发现值没有解包,则将撤消替换。(有关更多说明,请参见下文。)

另外,当您看到“裸”逗号时,请假装有一个顶级元组。在左侧和右侧都执行此操作(即,对于lvaluesrvalues):

'X', 'Y' -> ('X', 'Y')
a, b -> (a, b)

考虑到这些简单的规则,下面是一些示例:

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z'

应用上述规则,我们将转换"XY"('X', 'Y'),并用括号括住裸逗号:

((a, b), c) = (('X', 'Y'), 'Z')

这里的视觉对应关系使分配工作原理非常明显。

这是一个错误的示例:

(a,b), c = "XYZ"

按照上述替换规则,我们得到以下内容:

((a, b), c) = ('X', 'Y', 'Z')

这显然是错误的;嵌套结构不匹配。现在,让我们来看一个稍微复杂的示例的工作方式:

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'

应用上述规则,我们得到

((a, b), c) = ((1, 2), ('t', 'h', 'i', 's'))

但是现在从结构上很明显,它'this'不会被解包,而是直接分配给c。因此,我们撤消替换。

((a, b), c) = ((1, 2), 'this')

现在,让我们看一下在包装c元组时会发生什么:

(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack

成为

((a, b), (c,)) = ((1, 2), ('t', 'h', 'i', 's'))

同样,该错误是显而易见的。c不再是裸变量,而是序列内的变量,因此右侧的相应序列被解包为(c,)。但是序列的长度不同,因此会出现错误。

现在使用*操作员扩展拆箱。这有点复杂,但仍然相当简单。*开头的变量将成为一个列表,其中包含相应序列中未分配给变量名称的所有项目。从一个非常简单的示例开始:

a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

这变成

(a, *b, c) = ('X', '.', '.', '.', 'Y')

分析此问题的最简单方法是从头开始工作。'X'被分配给a'Y'分配给c。序列中的其余值将放入列表中并分配给b

Lvalue像(*a, b)(a, *b)只是上述情况的特例。*一个左值序列内不能有两个运算符,因为这会造成歧义。该值会去哪里在这样的事情(a, *b, *c, d)-在bc?一会儿我将考虑嵌套案例。

*a = 1                               # ERROR -- target must be in a list or tuple

这里的错误是不言自明的。目标(*a)必须位于一个元组中。

*a, = (1,2)                          # a = [1,2]

这是有效的,因为有一个赤裸的逗号。正在应用规则…

(*a,) = (1, 2)

由于除以外没有其他变量*a*a所以将rvalue序列中的所有值都吸收掉。如果(1, 2)用单个值替换,该怎么办?

*a, = 1                              # ERROR -- 'int' object is not iterable

变成

(*a,) = 1

同样,这里的错误是不言自明的。您不能解压缩不是序列的*a东西,而需要解压缩东西。所以我们把它放在一个序列中

*a, = [1]                            # a = [1]

相当于

(*a,) = (1,)

最后,这是一个常见的混淆点:(1)1– 一样,您需要使用逗号将元组与算术语句区分开。

*a, = (1)                            # ERROR -- 'int' object is not iterable

现在进行嵌套。实际上,此示例不在您的“嵌套”部分中;也许您没有意识到它是嵌套的?

(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]

成为

((a, b), *c) = (('X', 'Y'), 2, 3)

就像我们所期望的那样,顶级元组中的第一个值被分配,而顶级元组(23)中的其余值被分配给c

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

我已经在上面解释了为什么第一行引发错误。第二行很愚蠢,但这是它起作用的原因:

(*(a, b), c) = (1, 2, 3)

如前所述,我们从头开始。3被分配给c,然后将剩余的值被分配给具有可变*它前面,在这种情况下,(a, b)。因此,这等效于(a, b) = (1, 2),由于元素数量正确,因此碰巧可以使用。我不认为这会在工作代码中出现任何原因。同样,

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

变成

(*(a, *b), c) = ('t', 'h', 'i', 's')

从头开始工作,'s'分配给c,并('t', 'h', 'i')分配给(a, *b)。从头开始再次工作,'t'被分配给a,并被('h', 'i')分配给b作为列表。这是另一个愚蠢的示例,永远不要出现在工作代码中。

My apologies for the length of this post, but I decided to opt for completeness.

Once you know a few basic rules, it’s not hard to generalize them. I’ll do my best to explain with a few examples. Since you’re talking about evaluating these “by hand,” I’ll suggest some simple substitution rules. Basically, you might find it easier to understand an expression if all the iterables are formatted in the same way.

For the purposes of unpacking only, the following substitutions are valid on the right side of the = (i.e. for rvalues):

'XY' -> ('X', 'Y')
['X', 'Y'] -> ('X', 'Y')

If you find that a value doesn’t get unpacked, then you’ll undo the substitution. (See below for further explanation.)

Also, when you see “naked” commas, pretend there’s a top-level tuple. Do this on both the left and the right side (i.e. for lvalues and rvalues):

'X', 'Y' -> ('X', 'Y')
a, b -> (a, b)

With those simple rules in mind, here are some examples:

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z'

Applying the above rules, we convert "XY" to ('X', 'Y'), and cover the naked commas in parens:

((a, b), c) = (('X', 'Y'), 'Z')

The visual correspondence here makes it fairly obvious how the assignment works.

Here’s an erroneous example:

(a,b), c = "XYZ"

Following the above substitution rules, we get the below:

((a, b), c) = ('X', 'Y', 'Z')

This is clearly erroneous; the nested structures don’t match up. Now let’s see how it works for a slightly more complex example:

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'

Applying the above rules, we get

((a, b), c) = ((1, 2), ('t', 'h', 'i', 's'))

But now it’s clear from the structure that 'this' won’t be unpacked, but assigned directly to c. So we undo the substitution.

((a, b), c) = ((1, 2), 'this')
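
The substitution rules so far can be checked directly in an interpreter. The sketch below (variable names chosen only for illustration) shows that the original and the "normalized" form assign the same values, and that 'this' is assigned whole because it lines up with a bare name. Note that the list elements stay the integers 1 and 2.

```python
# Original form and its normalized form from the substitution rules.
(a, b), c = "XY", "Z"
original = (a, b, c)

((a, b), c) = (("X", "Y"), "Z")
normalized = (a, b, c)

# 'this' corresponds to the bare name z, so it is not unpacked.
(x, y), z = [1, 2], "this"

print(original == normalized)
print(x, y, z)
```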

Now let’s see what happens when we wrap c in a tuple:

(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack

Becomes

((a, b), (c,)) = ((1, 2), ('t', 'h', 'i', 's'))

Again, the error is obvious. c is no longer a naked variable, but a variable inside a sequence, and so the corresponding sequence on the right is unpacked into (c,). But the sequences have a different length, so there’s an error.
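
The two failing cases discussed so far can be reproduced safely by catching the exception. This is a small sketch; the helper name unpack_error is hypothetical, and the exact wording of the message may vary between Python versions.

```python
def unpack_error(func):
    """Run func and return the ValueError message, or None if it succeeds."""
    try:
        func()
    except ValueError as exc:
        return str(exc)
    return None


def nested_mismatch():
    # Three top-level values, but only two top-level targets.
    (a, b), c = "XYZ"


def wrapped_target():
    # 'this' has four characters, but (c,) has room for one.
    (a, b), (c,) = [1, 2], "this"


first_message = unpack_error(nested_mismatch)
second_message = unpack_error(wrapped_target)
print(first_message)
print(second_message)
```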

Now for extended unpacking using the * operator. This is a bit more complex, but it’s still fairly straightforward. A variable preceded by * becomes a list, which contains any items from the corresponding sequence that aren’t assigned to variable names. Starting with a fairly simple example:

a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

This becomes

(a, *b, c) = ('X', '.', '.', '.', 'Y')

The simplest way to analyze this is to work from the ends. 'X' is assigned to a and 'Y' is assigned to c. The remaining values in the sequence are put in a list and assigned to b.

Lvalues like (*a, b) and (a, *b) are just special cases of the above. You can’t have two * operators inside one lvalue sequence because it would be ambiguous. Where would the values go in something like this (a, *b, *c, d) — in b or c? I’ll consider the nested case in a moment.
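
As a quick illustration of the "work from the ends" reading, here is a short sketch. The variable names first, rest, init, last, head, middle and tail are made up for this example.

```python
a, *b, c = "X...Y"
print(a, b, c)             # X ['.', '.', '.'] Y

# The special cases (a, *b) and (*a, b):
first, *rest = [1, 2, 3]
*init, last = [1, 2, 3]
print(first, rest)         # 1 [2, 3]
print(init, last)          # [1, 2] 3

# The starred name may end up empty, but it is always a list.
head, *middle, tail = (1, 2)
print(head, middle, tail)  # 1 [] 2
```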

*a = 1                               # ERROR -- target must be in a list or tuple

Here the error is fairly self-explanatory. The target (*a) must be in a tuple.

*a, = (1,2)                          # a = [1,2]

This works because there’s a naked comma. Applying the rules…

(*a,) = (1, 2)

Since there are no variables other than *a, *a slurps up all the values in the rvalue sequence. What if you replace the (1, 2) with a single value?

*a, = 1                              # ERROR -- 'int' object is not iterable

becomes

(*a,) = 1

Again, the error here is self-explanatory. You can’t unpack something that isn’t a sequence, and *a needs something to unpack. So we put it in a sequence

*a, = [1]                            # a = [1]

Which is equivalent to

(*a,) = (1,)

Finally, this is a common point of confusion: (1) is the same as 1 — you need a comma to distinguish a tuple from an arithmetic statement.

*a, = (1)                            # ERROR -- 'int' object is not iterable

Now for nesting. Actually this example wasn’t in your “NESTED” section; perhaps you didn’t realize it was nested?

(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]

Becomes

((a, b), *c) = (('X', 'Y'), 2, 3)

The first value in the top-level tuple gets assigned, and the remaining values in the top-level tuple (2 and 3) are assigned to c — just as we should expect.

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

I’ve already explained above why the first line throws an error. The second line is silly but here’s why it works:

(*(a, b), c) = (1, 2, 3)

As previously explained, we work from the ends. 3 is assigned to c, and then the remaining values are assigned to the variable with the * preceding it, in this case, (a, b). So that’s equivalent to (a, b) = (1, 2), which happens to work because there are the right number of elements. I can’t think of any reason this would ever appear in working code. Similarly,

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

becomes

(*(a, *b), c) = ('t', 'h', 'i', 's')

Working from the ends, 's' is assigned to c, and ('t', 'h', 'i') is assigned to (a, *b). Working again from the ends, 't' is assigned to a, and ('h', 'i') is assigned to b as a list. This is another silly example that should never appear in working code.
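
For completeness, a minimal sketch that runs the two nested cases. The two-step version at the end (with the made-up name front) is only a suggestion for how the same result could be written more readably; it is not part of the original examples.

```python
(a, b), *c = "XY", 2, 3
print(a, b, c)   # X Y [2, 3]

*(a, *b), c = "this"
print(a, b, c)   # t ['h', 'i'] s

# A two-step spelling that gives the same values.
*front, c2 = "this"
a2, *b2 = front
print(a2, b2, c2)  # t ['h', 'i'] s
```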


Answer 1

I find the Python 2 tuple unpacking pretty straightforward. Each name on the left corresponds with either an entire sequence or a single item in a sequence on the right. If names correspond to single items of any sequence, then there must be enough names to cover all of the items.

Extended unpacking, however, can certainly be confusing, because it is so powerful. The reality is you should never be doing the last 10 or more valid examples you gave — if the data is that structured, it should be in a dict or a class instance, not unstructured forms like lists.

Clearly, the new syntax can be abused. The answer to your question is that you shouldn’t have to read expressions like that — they’re bad practice and I doubt they’ll be used.

Just because you can write arbitrarily complex expressions doesn’t mean you should. You could write code like map(map, iterable_of_transformations, map(map, iterable_of_transformations, iterable_of_iterables_of_iterables)) but you don’t.


Answer 2

If you think your code may be misleading, use another form to express it.

It’s like using extra brackets in expressions to avoid questions about operator precedence. It’s always a good investment to make your code readable.

I prefer to use unpacking only for simple tasks like swap.
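
The swap mentioned here is the classic one-liner; a minimal sketch with made-up values:

```python
a, b = 1, 2
a, b = b, a   # the right side is packed into a tuple first, then unpacked
print(a, b)   # 2 1
```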


What is this odd colon behavior doing?

Question: What is this odd colon behavior doing?

I am using Python 3.6.1, and I have come across something very strange. I had a simple dictionary assignment typo that took me a long time to find.

context = {}
context["a"]: 2
print(context)

Output

{}

What is the code context["a"]: 2 doing? It doesn’t raise a SyntaxError when it should IMO. At first I thought it was creating a slice. However, typing repr(context["a"]: 2) raises a SyntaxError. I also typed context["a"]: 2 in the console and the console didn’t print anything. I thought maybe it returned None, but I’m not so sure.

I’ve also thought it could be a single line if statement, but that shouldn’t be the right syntax either.

Additionally, context["a"] should raise a KeyError.

I am perplexed. What is going on?


Answer 0

You have accidentally written a syntactically correct variable annotation. That feature was introduced in Python 3.6 (see PEP 526).

Although a variable annotation is parsed as part of an annotated assignment, the assignment statement is optional:

annotated_assignment_stmt ::=  augtarget ":" expression ["=" expression]

Thus, in context["a"]: 2

  • context["a"] is the annotation target
  • 2 is the annotation itself
  • context["a"] is left uninitialised

The PEP states that “the target of the annotation can be any valid single assignment target, at least syntactically (it is up to the type checker what to do with this)”, which means that the key doesn’t need to exist to be annotated (hence no KeyError). Here’s an example from the original PEP:

d = {}
d['a']: int = 0  # Annotates d['a'] with int.
d['b']: int      # Annotates d['b'] with int.
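
To see the difference in practice, here is a small sketch based on the code from the question. The after_annotation snapshot and the "b" key are added only for illustration.

```python
context = {}

# Annotation without a value: the target is annotated, nothing is stored,
# and context["a"] is never looked up (so no KeyError).
context["a"]: 2
after_annotation = dict(context)
print(after_annotation)   # {}

# Annotation with a value behaves like a normal assignment.
context["a"]: int = 2
print(context)            # {'a': 2}

# What the typo was presumably meant to be: a plain assignment.
context["b"] = 3
print(context)            # {'a': 2, 'b': 3}
```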

Normally, the annotation expression should evaluate to a Python type — after all the main use of annotations is type hinting, but it is not enforced. The annotation can be any valid Python expression, regardless of the type or value of the result.

As you can see, at this time type hints are very permissive and rarely useful, unless you have a static type checker such as mypy.


‘dict’ object has no attribute ‘has_key’

Question: ‘dict’ object has no attribute ‘has_key’

While traversing a graph in Python, I’m receiving this error:

‘dict’ object has no attribute ‘has_key’

Here is my code:

def find_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    if not graph.has_key(start):
        return None
    for node in graph[start]:
        if node not in path:
            newpath = find_path(graph, node, end, path)
            if newpath: return newpath
    return None

The code aims to find the paths from one node to others. Code source: http://cs.mwsu.edu/~terry/courses/4883/lectures/graphs.html

Why am I getting this error and how can I fix it?


Answer 0

has_key was removed in Python 3. From the documentation:

  • Removed dict.has_key() – use the in operator instead.

Here’s an example:

if start not in graph:
    return None

Answer 1

has_key was removed in Python 3.0. You can use the in operator instead:

graph={'A':['B','C'],
   'B':['C','D']}

print('A' in graph)
>> True

print('E' in graph)
>> False

Answer 2

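In Python 3, has_key(key) is replaced by __contains__(key), the special method that the in operator calls. In everyday code, writing key in d is usually preferred over calling it directly.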

Tested in python3.7:

a = {'a':1, 'b':2, 'c':3}
print(a.__contains__('a'))
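
To show how the in operator relates to __contains__, here is a rough sketch using a hypothetical Graph wrapper class. The class, its lookups counter and the sample edges are made up for this example and are not part of the question's code.

```python
class Graph:
    """Hypothetical wrapper used only to show which method `in` calls."""

    def __init__(self, edges):
        self._edges = dict(edges)
        self.lookups = 0  # counts how often __contains__ runs

    def __contains__(self, node):
        self.lookups += 1
        return node in self._edges


g = Graph({'A': ['B', 'C'], 'B': ['C', 'D']})
print('A' in g)    # True
print('E' in g)    # False
print(g.lookups)   # 2
```

Each use of in goes through __contains__, which is why the plain operator is enough for a dict and there is no need to call the method by name.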

Answer 3

I think it is considered “more pythonic” to just use in when determining if a key already exists, as in

if start not in graph:
    return None

Answer 4

The whole code in the document will be:

graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}

def find_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        if node not in path:
            newpath = find_path(graph, node, end, path)
            if newpath: return newpath
    return None

After writing it, save the document and press F5.

After that, the code you will run in the Python IDLE shell will be:

find_path(graph, 'A', 'D')

The answer you should receive in IDLE is

['A', 'B', 'C', 'D'] 

Answer 5

Try:

if start not in graph:

For more info see ProgrammerSought


Importing from a relative path in Python

Question: Importing from a relative path in Python

I have a folder for my client code, a folder for my server code, and a folder for code that is shared between them

Proj/
    Client/
        Client.py
    Server/
        Server.py
    Common/
        __init__.py
        Common.py

How do I import Common.py from Server.py and Client.py?


Answer 0

EDIT Nov 2014 (3 years later):

Python 2.6 and 3.x support proper relative imports, where you can avoid doing anything hacky. With this method, you know you are getting a relative import rather than an absolute import. The ‘..’ means: go to the directory above me.

from ..Common import Common

One caveat: this only works if you run your Python code as a module, from outside of the package. For example:

python -m Proj
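
As a rough, self-contained check of the relative import above, the sketch below builds a throwaway copy of the layout in a temporary directory and runs the server as a module. The __init__.py files in Proj/ and Server/, the GREETING constant, and the python -m Proj.Server.Server invocation are assumptions added for illustration; they are not part of the original question.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Assumed layout: every directory is a regular package with an __init__.py.
    for pkg in ("Proj", "Proj/Common", "Proj/Server"):
        (root / pkg).mkdir(parents=True, exist_ok=True)
        (root / pkg / "__init__.py").write_text("")

    (root / "Proj/Common/Common.py").write_text("GREETING = 'hello from Common'\n")
    (root / "Proj/Server/Server.py").write_text(
        "from ..Common import Common\n"
        "print(Common.GREETING)\n"
    )

    # Run from the directory that contains Proj, i.e. from outside the package.
    result = subprocess.run(
        [sys.executable, "-m", "Proj.Server.Server"],
        cwd=root, capture_output=True, text=True,
    )
    output = result.stdout.strip()

print(output)  # hello from Common
```

Running Server.py directly as a script would typically fail with an ImportError instead, because the file then has no parent package for ‘..’ to refer to.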

Original hacky way

This method is still commonly used in some situations, where you aren’t actually ever ‘installing’ your package. For example, it’s popular with Django users.

You can add Common/ to your sys.path (the list of paths python looks at to import things):

import sys, os
sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'Common'))
import Common

os.path.dirname(__file__) just gives you the directory that your current python file is in, and then we navigate to ‘Common/’ the directory and import ‘Common’ the module.
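
A minimal sketch of the same sys.path idea that can be run anywhere: it creates a temporary Common/ directory instead of relying on __file__, and the shared() function is a made-up placeholder. It also removes the path entry again afterwards, which the snippet above does not do.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for Proj/Common/Common.py (contents are hypothetical).
    common_dir = Path(tmp) / "Proj" / "Common"
    common_dir.mkdir(parents=True)
    (common_dir / "Common.py").write_text(
        "def shared():\n    return 'shared code'\n"
    )

    # Add the Common/ directory itself, so the module is importable as "Common".
    sys.path.append(str(common_dir))
    try:
        Common = importlib.import_module("Common")
        result = Common.shared()
    finally:
        # Undo the global changes so later imports are not affected.
        sys.path.remove(str(common_dir))
        sys.modules.pop("Common", None)

print(result)  # shared code
```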


Answer 1

Funnily enough, I just ran into the same problem, and I got it working in the following way:

Using the Linux ln command, we can make things a lot simpler:

1. cd Proj/Client
2. ln -s ../Common ./

3. cd Proj/Server
4. ln -s ../Common ./

Now, if you want to import some_stuff from the file Proj/Common/Common.py into your file Proj/Client/Client.py, do it like this:

# in Proj/Client/Client.py
from Common.Common import some_stuff

The same applies to Proj/Server. It also works for the setup.py process; the same question is discussed here. Hope it helps!


Answer 2

Don’t do relative import.

From PEP8:

Relative imports for intra-package imports are highly discouraged.

Put all your code into one super package (i.e. “myapp”) and use subpackages for client, server and common code.

Update: “Python 2.6 and 3.x support proper relative imports (…)”. See Dave’s answer for more details.


Answer 3

Doing a relative import is absolutely OK! Here’s what little ol’ me does:

import os
import sys

#first change the cwd to the script path
scriptPath = os.path.realpath(os.path.dirname(sys.argv[0]))
os.chdir(scriptPath)

#append the relative location you want to import from
sys.path.append("../common")

#import your module stored in '../common' (module name only, no .py suffix)
import common

Answer 4

The default import method is already “relative”, from the PYTHONPATH (strictly speaking, sys.path). By default, it contains some system libraries along with the folder of the original source file. If you run with -m to run a module, the current directory gets added to the PYTHONPATH. So if the entry point of your program is inside of Proj, then using import Common.Common should work inside both Server.py and Client.py.

Don’t do a relative import. It won’t work how you want it to.