Python 实用宝典

Question 2

I am using pandas 0.19.1 on Python 3, and I am getting a warning on these lines of code. I'm trying to get a list of all the row numbers where the string Peter is present in column Unnamed: 5.

import pandas as pd

df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

It produces a warning:

\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)

What is this FutureWarning, and should I ignore it since the code seems to work?

Question 4

This FutureWarning isn't from pandas; it comes from NumPy, and the bug also affects matplotlib and others. Here's how to reproduce the warning closer to the source of the trouble:

import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       #Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
False

Another way to reproduce this bug using the double equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    #FutureWarning thrown here

An example of matplotlib being affected by this FutureWarning, in its quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What’s going on here?

There is a disagreement between NumPy and native Python on what should happen when you compare a string to NumPy's numeric types. Notice that the left operand is Python's turf, a primitive string, and the middle operation is Python's turf, but the right operand is NumPy's turf. Should you return a Python-style scalar or a NumPy-style ndarray of Booleans? NumPy says ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be an elementwise comparison, or a scalar saying whether the item exists in the array?

If your code or library uses the in or == operators to compare a Python string with NumPy ndarrays, the types aren't compatible, so when you try it, it returns a scalar, but only for now. The warning indicates that this behavior might change in the future, so your code pukes all over the carpet if Python/NumPy decide to adopt NumPy style.

Submitted Bug reports:

Numpy and Python are in a standoff, for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784

https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your versions of Python and NumPy, ignore the warnings, and expect the behavior not to change; or convert both the left and right operands of == and in so that they are the same kind of type, either both NumPy types or both primitive Python types.
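A minimal sketch of the second option, assuming a string comparison is what you want: cast the NumPy side with astype(str) so both operands of == and in are strings. The array and the values '3' and 'x' are made up for illustration.

```python
import numpy as np

arr = np.arange(5)

# Cast the numeric array to a string array so both operands share a type;
# == then compares element by element instead of falling back to a scalar.
mask = arr.astype(str) == '3'
print(mask)                    # [False False False  True False]

# The same idea for `in`: compare like with like.
print('3' in arr.astype(str))  # True
print('x' in arr.astype(str))  # False
```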

Suppress the warning globally:

import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   #returns False, without Warning

Suppress the warning on a line by line basis.

import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   #returns False, warning is suppressed

print('x' in np.arange(10))   #returns False, Throws FutureWarning

Just suppress the warning by name, then put a loud comment next to it mentioning the current versions of Python and NumPy, saying that this code is brittle and requires these versions, and put a link to here. Kick the can down the road.
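A sketch of "suppress it by name" that is narrower than the global filter: warnings.filterwarnings matches the start of the warning message as a regular expression, so only this particular FutureWarning is silenced, and only inside the helper. The helper name contains and the wording of the comment are illustrative. On newer NumPy versions the comparison may simply return False without warning, in which case the filter does nothing.

```python
import warnings

import numpy as np

# BRITTLE (example comment): written against the NumPy version this project
# pins. Comparing a str with a numeric ndarray emitted a FutureWarning in
# older NumPy releases. Revisit this when upgrading.
def contains(arr, value):
    """Return `value in arr`, silencing only the 'elementwise comparison
    failed' FutureWarning instead of every FutureWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="elementwise comparison failed",
            category=FutureWarning,
        )
        return value in arr

print(contains(np.arange(5), 'x'))  # False
print(contains(np.arange(5), 3))    # True
```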

TLDR: pandas are the Jedi; numpy are the Hutts; and Python is the Galactic Empire. https://youtu.be/OZczsiCfQQk?t=3

Question 6

I get the same error when I try to set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

I have never encountered such an error before. I am still trying to figure out the reason behind it (using @Eric Leschinski's explanation and others).

Anyhow, the following approach solves the problem for now until I figure the reason out:

df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)
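A self-contained version of the workaround, with an in-memory string standing in for my_file.tsv. The file contents are an assumption: a header row whose first column is literally named "0". Assigning the result of set_index instead of passing inplace=True is just a style choice here.

```python
import io

import pandas as pd

# Hypothetical stand-in for my_file.tsv: the first header cell is "0".
tsv = "0\tname\n10\tPeter\n20\tJoe\n"

# Read without index_col, then promote the "0" column to the index.
df = pd.read_csv(io.StringIO(tsv), sep='\t', header=0)
df = df.set_index('0')

print(df)
print(df.index.tolist())  # [10, 20]
```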

I will update this as soon as I figure out the reason for such behavior.

Question 8

In my case, the same warning message was caused by a TypeError:

TypeError: invalid type comparison

So you may want to check the data type of the values in column Unnamed: 5:

for x in df['Unnamed: 5']:
  print(type(x))  # are they 'str' ?

Here is how I can replicate the warning message:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error
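A sketch of checking the column before comparing, reusing the toy DataFrame above. The last comparison shows one way out when the value you have is a string: cast the column with astype(str) so both sides are strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3

# Inspect what the column actually holds before comparing it to anything.
print(df['num3'].dtype)                     # an integer dtype
print(df['num3'].map(type).value_counts())  # every value is an int

# Compare like with like: either use an int literal...
df.loc[df['num3'] == 3, 'num3'] = 4

# ...or, if the value arrives as a string, cast the column first.
df.loc[df['num3'].astype(str) == '4', 'num3'] = 5
print(df['num3'].tolist())  # [5, 5, 5]
```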

Hope it helps.

Question 10

Can’t beat Eric Leschinski’s awesomely detailed answer, but here’s a quick workaround to the original question that I don’t think has been mentioned yet – put the string in a list and use .isin instead of ==

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]
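Applied to the original question, the same trick looks like this sketch. The column name comes from the question; the mixed sample data is made up.

```python
import numpy as np
import pandas as pd

# Made-up data shaped like the question: a column named 'Unnamed: 5'
# holding strings, numbers and missing values.
df = pd.DataFrame({'Unnamed: 5': ['Peter', 3.5, np.nan, 'Peter', 'Joe']})

# .isin takes a list, so the lookup value is wrapped in one.
myRows = df[df['Unnamed: 5'].isin(['Peter'])].index.tolist()
print(myRows)  # [0, 3]
```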

Question 12

A quick workaround for this is to use numpy.core.defchararray. I faced the same warning message and was able to resolve it using that module:

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)
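A runnable sketch of that workaround with made-up inputs (dataset1 and dataset2 are hypothetical). It imports numpy.char, the public name for the same string functions as numpy.core.defchararray; numpy.core was renamed to the private numpy._core in NumPy 2.0, so the public name is the safer import. The equal function expects string arrays of the same shape on both sides, so the numeric array is cast with astype(str) first.

```python
import numpy as np

# Hypothetical inputs: one numeric array and one array of strings.
dataset1 = np.array([1, 2, 3])
dataset2 = np.array(['1', '5', '3'])

# Both operands are string arrays, so the comparison is elementwise.
resultdataset = np.char.equal(dataset1.astype(str), dataset2)
print(resultdataset)  # [ True False  True]
```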

Question 14

Eric’s answer helpfully explains that the trouble comes from comparing a Pandas Series (containing a NumPy array) to a Python string. Unfortunately, his two workarounds both just suppress the warning.

To write code that doesn’t cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.

myRows = df[df['Unnamed: 5'].map(lambda x: x == 'Peter')].index.tolist()
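A self-contained run of the map approach on made-up data that mixes strings, numbers and NaN. Each element is compared in plain Python, so the lambda yields exactly one bool per row.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, numbers and a missing value.
df = pd.DataFrame({'Unnamed: 5': ['Peter', 42, np.nan, 'Peter']})

# One Python-level comparison per element gives a boolean mask.
myRows = df[df['Unnamed: 5'].map(lambda x: x == 'Peter')].index.tolist()
print(myRows)  # [0, 3]
```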

Question 16

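If your arrays aren't too big or you don't have too many of them, you might be able to get away with forcing the left-hand side of == to be a string. Convert element by element with .astype(str); calling str() on the whole Series turns it into one string, and the comparison collapses to a single False:

myRows = df[df['Unnamed: 5'].astype(str) == 'Peter'].index.tolist()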
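Why the conversion has to be elementwise: str() applied to a whole Series returns its printed representation as one string, so comparing that with 'Peter' produces a single bool rather than a mask. A small made-up Series shows the difference between str() and .astype(str):

```python
import pandas as pd

s = pd.Series(['Peter', 'Joe', 7])

# str() renders the whole Series (index, values, dtype line) as ONE string,
# so the comparison yields a single Python bool.
whole = str(s) == 'Peter'
print(type(whole), whole)      # <class 'bool'> False

# .astype(str) converts each element, so == stays elementwise.
mask = s.astype(str) == 'Peter'
print(mask.tolist())           # [True, False, False]
print(s[mask].index.tolist())  # [0]
```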

But this is ~1.5 times slower if df['Unnamed: 5'] is a string, 25-30 times slower if df['Unnamed: 5'] is a small numpy array (length = 10), and 150-160 times slower if it’s a numpy array with length 100 (times averaged over 500 trials).

import time
from numpy import linspace, mean, zeros

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178

Question 18

In my case, the warning came from ordinary boolean indexing: the Series contained only np.nan, so pandas inferred a float64 dtype, and comparing a float Series with a string triggered the warning. Demonstration (pandas 1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)
0    False
1    False

I think with pandas 1.0 they really want you to use the new 'string' datatype which allows for pd.NA values:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False
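A sketch of the dtype inference behind this, plus one way to keep the comparison string-like without the new 'string' dtype: declare dtype=object up front. The sample Series are made up.

```python
import numpy as np
import pandas as pd

# A Series holding only np.nan is inferred as float64, so == 'Hi'
# compares floats with a string.
all_nan = pd.Series([np.nan, np.nan])
print(all_nan.dtype)  # float64

# Declaring the dtype explicitly makes the comparison object-vs-str,
# which is evaluated element by element.
as_object = pd.Series([np.nan, np.nan], dtype=object)
print((as_object == 'Hi').tolist())  # [False, False]
```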

I don't love that they tinkered with everyday functionality such as boolean indexing.

Question 20

I got this warning because I thought my column contained empty strings, but on checking, it contained np.nan!

if (df['column'] == '').all():

Changing my column to empty strings helped :)
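A sketch of that fix, assuming fillna('') is an acceptable way to turn the NaN values into empty strings (the answer does not say which method was used). Note that a Series has no single truth value, so the mask must be reduced with .all() or .any() before it goes into an if.

```python
import numpy as np
import pandas as pd

# Hypothetical column that looks empty but actually holds NaN.
df = pd.DataFrame({'column': [np.nan, np.nan]})

# Replace NaN with '' so the column holds strings and == '' is elementwise.
df['column'] = df['column'].fillna('')
mask = df['column'] == ''
print(mask.tolist())  # [True, True]

# Reduce the mask to one bool before using it in an if statement.
if mask.all():
    print('every value is empty')
```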

Question 22

I’ve compared a few of the methods possible for doing this, including pandas, several numpy methods, and a list comprehension method.

First, let’s start with a baseline:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

So our baseline is that the count should be 2, and it should take about 50 µs.

Now, we try the naive method:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

And here, we get the wrong answer (NotImplemented != 2), it takes us a long time, and it throws the warning.

So we’ll try another naive method:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

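Again, the wrong answer (0 != 2). This is even more insidious because there is no warning at all (0 can be passed around just like 2). Here x is a plain Python list, so x == 's' compares the whole list with the string and evaluates to a single False, and np.sum(False) is 0.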

Now, let’s try a list comprehension:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension

We get the right answer here, and it’s pretty fast!

Another possibility, pandas:

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==

Slow, but correct!

And finally, the option I’m going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal

Fast and correct!
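The %time lines above need IPython. As a sketch, the three approaches that returned the right count can be checked side by side in a plain script (timings omitted, since they vary by machine):

```python
import operator

import numpy as np
import pandas as pd

x = ['s', 'b', 's', 'b']

# List comprehension: one Python comparison per element.
by_list = sum(operator.eq(_x, 's') for _x in x)

# Object-dtype array: np.equal compares the Python objects elementwise.
by_object = np.sum(np.equal('s', np.array(x).astype(object)))

# pandas: Series == scalar is elementwise.
by_pandas = (pd.Series(x) == 's').sum()

print(by_list, by_object, by_pandas)  # 2 2 2
```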

Question 24

I had this code which was causing the error:

import dateutil.parser

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

I changed it to this:

import dateutil.parser

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

to avoid the comparison that throws the warning, as described above. I only needed to catch the exception because dfObj.loc is called inside the for loop; maybe there is a way to tell it not to check the rows it has already changed.
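On that last point, one possibility is to select the rows that still hold strings once, with a boolean mask, and convert them in a single assignment, so already-converted rows are never compared with a string. This is a sketch: the sample data is hypothetical, and the strings carry a UTC offset on purpose, because .timestamp() on a naive datetime uses the local timezone.

```python
import dateutil.parser
import pandas as pd

# Hypothetical data: some times are ISO strings, one was already
# converted to an integer timestamp.
dfObj = pd.DataFrame({'time': ['2020-01-01T00:00:00+00:00', 42,
                               '2020-01-02T00:00:00+00:00']})

# Mask of rows that still hold strings; converted rows are left alone.
is_str = dfObj['time'].map(lambda v: isinstance(v, str))

# Parse each remaining string once and write all results back in one step.
dfObj.loc[is_str, 'time'] = dfObj.loc[is_str, 'time'].map(
    lambda s: int(dateutil.parser.parse(s).timestamp())
)
print(dfObj['time'].tolist())
```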

Question 25

In a python source code I stumbled upon I’ve seen a small b before a string like in:

b"abcdef"

I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.

What does the b stand for and in which kind of source code is it useful as it seems to be exactly like a plain string without any prefix?

Question 26

This is a Python 3 bytes literal. This prefix is absent in Python 2.5 and older (a bytes literal is equivalent to a plain string in 2.x, while a plain string in 3.x is equivalent to a literal with the u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.

Question 27

The b prefix signifies a bytes string literal.

If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.

bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41')

Because a bytes object consists of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:

bytes([72, 101, 108, 108, 111])

and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
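The example above, written out (the name value matches the text):

```python
value = bytes([72, 101, 108, 108, 111])

print(value)            # b'Hello'
print(value[0])         # 72: indexing returns an int
print(value[:1])        # b'H': slicing returns bytes
print(list(value))      # [72, 101, 108, 108, 111]
print(b'A' == b'\x41')  # True: ASCII characters and escapes are interchangeable
```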

bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:

strvalue = bytesvalue.decode('utf-8')

Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:

bytesvalue = strvalue.encode('utf-8')

You can also use the constructor, bytes(strvalue, encoding) to do the same.

Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
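A short sketch of encoding, decoding and the errors argument. The sample text is arbitrary; b'caf\xe9' is 'café' encoded as Latin-1, which is not valid UTF-8.

```python
text = 'café'

data = text.encode('utf-8')         # b'caf\xc3\xa9'
print(data.decode('utf-8'))         # café (same codec both ways)
print(bytes(text, 'utf-8') == data)  # True: the constructor form

broken = b'caf\xe9'                 # Latin-1 bytes, not valid UTF-8

# errors='replace' substitutes U+FFFD for undecodable bytes.
print(broken.decode('utf-8', errors='replace'))
print(broken.decode('latin-1'))     # café, with the right codec
print(text.encode('ascii', errors='ignore'))  # b'caf'

try:
    broken.decode('utf-8')          # the default errors='strict' raises
except UnicodeDecodeError as exc:
    print('strict mode raised:', exc.reason)
```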

Python 2, versions 2.6 and 2.7 also support creating string literals using b'..' string literal syntax, to ease code that works on both Python 2 and 3.

bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
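A quick illustration of the difference between bytes and bytearray:

```python
frozen = b'Hello'
try:
    frozen[0] = ord('J')   # bytes do not support item assignment
except TypeError as exc:
    print('bytes:', exc)

buf = bytearray(b'Hello')
buf[0] = ord('J')          # a bytearray can be changed in place
buf.extend(b'!')
print(buf)                 # bytearray(b'Jello!')
print(bytes(buf))          # b'Jello!'
```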

Question 28

I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage:

import urllib.request
import re

url = "http://www.google.com"
regex = r'<title>(.+?)</title>'
pattern  = re.compile(regex)

with urllib.request.urlopen(url) as response:
   html = response.read()

title = re.findall(pattern, html)
print(title)

And I get this unexpected error:

Traceback (most recent call last):
  File "path\to\file\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

What am I doing wrong?

Question 29

You want to convert html (a bytes-like object) into a string using .decode, e.g. html = response.read().decode('utf-8').

See Convert bytes to a Python String

Question 30

The problem is that your regex is a string, but html is bytes:

>>> type(html)
<class 'bytes'>

Since python doesn’t know how those bytes are encoded, it throws an exception when you try to use a string regex on them.

You can either decode the bytes to a string:

html = html.decode('ISO-8859-1')  # encoding may vary!
title = re.findall(pattern, html)  # no more error

Or use a bytes regex:

regex = rb'<title>(.+?)</title>'
#        ^

In this particular context, you can get the encoding from the response headers:

with urllib.request.urlopen(url) as response:
    encoding = response.info().get_param('charset', 'utf8')
    html = response.read().decode(encoding)
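Both fixes side by side, as an offline sketch: a hard-coded bytes value stands in for response.read() (an assumption, so no network access is needed), and the pattern uses .+? so it matches any title text.

```python
import re

# Hypothetical stand-in for response.read(): raw bytes, as urlopen returns.
html = b'<html><head><title>Example Domain</title></head></html>'

# Option 1: decode first, then use a str pattern.
text = html.decode('utf-8')
print(re.findall(r'<title>(.+?)</title>', text))   # ['Example Domain']

# Option 2: keep bytes and use a bytes pattern; the matches are bytes too.
print(re.findall(rb'<title>(.+?)</title>', html))  # [b'Example Domain']
```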

See the urlopen documentation for more details.

Question 31

I recently reinstalled Ubuntu, upgraded to 16.04, and now I cannot use Python:

$ python manage.py runserver
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted

At this point, Python itself doesn't work:

$ python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted

Even this suggestion is no longer working:

unset PYTHONHOME
unset PYTHONPATH

Every time I fix it one way, it comes back again. Several answers help to fix it temporarily, but not for good. I've reinstalled python and python3 several times. What can I do from here? Thank you.

Question 32

For Python 3, try removing the virtual environment files and setting it up again:

rm -rf venv
virtualenv -p /usr/bin/python3 venv/
source venv/bin/activate
pip install -r requirements.txt

https://wiki.ubuntu.com/XenialXerus/ReleaseNotes#Python_3

Question 33

For Windows 10 users:

I was using Python 3.4 on Windows 10 and installed Python 3.5. I couldn't find the PYTHONPATH or PYTHONHOME environment variables. When I ran python in the CMD console, it kept using Python 3.4, so I deleted Python 3.4. After that, whenever I ran python in the CMD console, it showed an error like the one below:

Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'

I searched to figure out my problem, and the solution was simple: when you install Python 3.5, choose a custom install and check “Add Python to environment variables” in Advanced Options.

I'm leaving this here in case someone with a similar issue visits, so they don't waste their precious time figuring it out.

Question 34

I was facing the same problem under Windows 7. The error message looks like this:

Fatal Python error: Py_Initialize: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Current thread 0x000011f4 (most recent call first):

I had installed Python 2.7 (uninstalled now), and I checked “Add Python to environment variables” in Advanced Options while installing Python 3.6. It turned out that the environment variables “PYTHONHOME” and “PYTHONPATH” still pointed to Python 2.7.

Finally, I solved it by changing “PYTHONHOME” to the Python 3.6 install path and removing the “PYTHONPATH” variable.
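Before editing variables in the system dialog, you can check whether stale PYTHONHOME/PYTHONPATH values are the culprit. This sketch must itself be run by a Python that starts; point python_path at the interpreter that fails, and it launches that interpreter with the two variables removed from its environment. The helper name probe_interpreter is hypothetical. If the probe succeeds while a plain python call fails, the variables are the problem.

```python
import os
import subprocess
import sys

def probe_interpreter(python_path, drop_vars=("PYTHONHOME", "PYTHONPATH")):
    """Start python_path with drop_vars removed from its environment and
    report whether it can import the 'encodings' module."""
    env = {k: v for k, v in os.environ.items() if k not in drop_vars}
    result = subprocess.run(
        [python_path, "-c", "import encodings, sys; print(sys.prefix)"],
        env=env, capture_output=True, text=True,
    )
    # returncode 0 plus a printed prefix means the interpreter started fine.
    return result.returncode, result.stdout.strip(), result.stderr.strip()

if __name__ == "__main__":
    print("PYTHONHOME =", os.environ.get("PYTHONHOME"))
    print("PYTHONPATH =", os.environ.get("PYTHONPATH"))
    # Replace sys.executable with the path of the interpreter that fails.
    print(probe_interpreter(sys.executable))
```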

Question 35

For the same issue on Windows 7:

You will see an error like this if your environment variables / system variables are incorrectly set:

Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'

Current thread 0x00001db4 (most recent call first):

Fixing this is really simple:

When you download Python 3.x and run the .exe installer, it gives you the option to customize where on your system Python is installed. For example, I chose this location: C:\Program Files\Python36
Then open System Properties and go to the “Advanced” tab (or simply go to Start > search for “environment variables” > click “Edit the system environment variables”). Under the “Advanced” tab, look for “Environment Variables” and click it. Another window named “Environment Variables” will pop up.
Now make sure your user variables have the correct Python path listed in the “Path” variable. In my example, you should see C:\Program Files\Python36. If you do not find it there, add it by selecting the Path variable and clicking Edit.
The last step is to double-check the PYTHONHOME and PYTHONPATH fields under System Variables in the same window. You should see the same path as described above. If not, add it there too.

Then click OK and go back to CMD terminal, and try checking for python. The issue should now be resolved. It worked for me.

Question 36

I had this error while migrating to Ubuntu 17.10, and this solved the problem:

sudo dpkg-reconfigure python3

Maybe you will have to close your session and reconnect.

Question 37

Look at /lib/python3.5 and you will see broken links to Python libraries. Recreate them so they point to a working directory.
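A sketch for spotting those broken links; broken_links is a hypothetical helper. Point it at your environment's lib/python3.5 directory. The demo builds a throwaway directory so it runs anywhere symlinks are allowed.

```python
import os
import tempfile

def broken_links(root):
    """Yield (link, target) for every symlink under root whose target
    does not exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a
            # dangling link while os.path.islink is still True.
            if os.path.islink(path) and not os.path.exists(path):
                yield path, os.readlink(path)

# Demo on a temporary directory: one valid link, one dangling link.
with tempfile.TemporaryDirectory() as demo:
    real = os.path.join(demo, 'real.py')
    open(real, 'w').close()
    os.symlink(real, os.path.join(demo, 'good_link'))
    os.symlink(os.path.join(demo, 'gone.py'), os.path.join(demo, 'bad_link'))
    found = [os.path.basename(link) for link, _ in broken_links(demo)]

print(found)  # ['bad_link']
```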

Next error –

./script/bin/pip3
Failed to import the site module
Traceback (most recent call last):
  File "/home/script/script/lib/python3.5/site.py", line 703, in <module>
    main()
  File "/home/script/script/lib/python3.5/site.py", line 683, in main
    paths_in_sys = addsitepackages(paths_in_sys)
  File "/home/script/script/lib/python3.5/site.py", line 282, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 204, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/home/script/script/lib/python3.5/site.py", line 173, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
  File "/home/script/script/lib/python3.5/types.py", line 166, in <module>
    import functools as _functools
  File "/home/script/script/lib/python3.5/functools.py", line 23, in <module>
    from weakref import WeakKeyDictionary
  File "/home/script/script/lib/python3.5/weakref.py", line 12, in <module>
    from _weakref import (
ImportError: cannot import name '_remove_dead_weakref'

I fixed it like this (https://askubuntu.com/questions/907035/importerror-cannot-import-name-remove-dead-weakref):

cd my-virtualenv-directory
virtualenv . --system-site-packages

Question 38

I was facing the issue “ModuleNotFoundError: No module named ‘encodings’” after updating to macOS Catalina.

I had multiple versions of Python installed on my system.

Removing all the Python versions (2.7 and 3.7.4) from macOS and reinstalling the latest Python 3.8 worked for me.

To remove Python from macOS, I followed the instructions in “How to uninstall Python 2.7 on a Mac OS X 10.6.4?”

That link is for Python 2.7, but you can follow the same steps for 3.7.

Question 39

I had a similar issue. I had both Anaconda and Python installed on my computer, and my Python dependencies were coming from the Anaconda directory. When I uninstalled Anaconda, this error started popping up. I added PYTHONPATH, but the error still didn't go away. I checked with python --version and found out that it was still using the Anaconda path. I had to manually delete the Anaconda3 directory, and after that Python started taking its dependencies from PYTHONPATH.
Issue solved!

Question 40

I had the same problem after updating my Mac to macOS Catalina while using pipenv. Pipenv creates and manages a virtualenv for you, so the fix is the same as the earlier suggestion from @Anoop-Malav, just using pipenv to remove the virtual environment for the current directory and recreate it:

pipenv --rm
pipenv shell  # recreate a virtual env with your current Pipfile

Question 41

In my case, just changing the permissions of the Anaconda folder worked:

sudo chmod -R u=rwx,g=rx,o=rx /path/to/anaconda

Question 42

Because this is the first result on Google, I want to add the following information for anyone else having problems with jails:

Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007f079b16d740 (most recent call first):
Aborted (core dumped)

When setting up Python inside your jail, you need to link both its dependencies and /usr/lib/pythonX.Y into [JAIL]/usr/lib/. Hope this helps.

Question 43

In PyCharm, go to File -> Settings -> Project -> Project Interpreter, click the small gear icon -> Add -> System Interpreter, and select the Python version you want from the drop-down menu.

This seemed to work for me.

Question 44

I was also able to fix this. PYTHONPATH and PYTHONHOME were the cause.

Run this in a terminal:

   touch ~/.bash_profile
   open ~/.bash_profile

Then delete the stale parts of this file (the lines setting PYTHONPATH and PYTHONHOME) and save it. I'm not sure how advisable it is to do that!
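Before editing any profile file, it can help to confirm that these variables really are the culprit. Starting the interpreter as `python -E` makes it ignore all PYTHON* environment variables, so if `python -E` starts cleanly while plain `python` fails with the encodings error, PYTHONHOME or PYTHONPATH is the problem. The sketch below (the function name `python_location_report` is hypothetical) prints what a working interpreter sees, so you can compare it with the paths set in your profile.

```python
import os
import sys


def python_location_report():
    """Collect the settings that decide where Python looks for its standard library."""
    import encodings  # the stdlib package named in the error message

    return {
        "PYTHONHOME": os.environ.get("PYTHONHOME"),  # None when the variable is unset
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "executable": sys.executable,                 # which python binary is running
        "prefix": sys.prefix,                         # root of the installation in use
        "encodings_dir": os.path.dirname(encodings.__file__),
    }


if __name__ == "__main__":
    for key, value in python_location_report().items():
        print(f"{key:13} {value}")
```

If PYTHONHOME points somewhere other than the prefix of the interpreter you actually want, that line in your profile is the one to remove.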

Question 45

I've run into a problem with the re module in Python 3.6.5. I have this pattern in my regular expression:

'\\nRevision: (\d+)\\n'

But when I run it, I get a DeprecationWarning about an invalid escape sequence.

I searched for the problem on SO and haven't found the answer. What should I use instead of \d+? Just [0-9]+, or something else?

Question 46

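Python processes backslash escapes in ordinary string literals, and \d is not a recognized escape sequence. Python currently leaves such a backslash in the string unchanged, but since Python 3.6 it emits a DeprecationWarning for it (a SyntaxWarning since Python 3.12), because unrecognized escape sequences are slated to become an error. You don't need to replace \d+ with anything.

Declare your regex pattern as a raw string instead, by prefixing it with r:

r'\nRevision: (\d+)\n'

This also means you can drop the doubled backslashes in \\n: in a raw string, \n stays a backslash followed by n, and re itself interprets that as a newline.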
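A quick check of the points above, using only the standard library: the hand-escaped and raw versions of the pattern are the same string, \d+ keeps working, and compiling a non-raw '\d' literal triggers the warning while the raw version does not. It also touches the [0-9]+ question: for str patterns, \d matches any Unicode decimal digit unless re.ASCII is given, so [0-9]+ is not an exact substitute. The helper `escape_warnings` is hypothetical.

```python
import re
import warnings

# The question's pattern with every backslash doubled by hand, and the raw version.
escaped = '\\nRevision: (\\d+)\\n'
raw = r'\nRevision: (\d+)\n'
print(escaped == raw)  # True

text = "header\nRevision: 1234\nfooter"
print(re.search(raw, text).group(1))  # 1234

# \d is broader than [0-9] for str patterns: it matches any Unicode decimal digit.
arabic_indic_three = "\u0663"
print(bool(re.fullmatch(r"\d", arabic_indic_three)))            # True
print(bool(re.fullmatch(r"[0-9]", arabic_indic_three)))         # False
print(bool(re.fullmatch(r"\d", arabic_indic_three, re.ASCII)))  # False


def escape_warnings(source):
    """Compile a snippet of source code and return the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<snippet>", "eval")
    return [w.category for w in caught]


# DeprecationWarning on Python 3.6-3.11, SyntaxWarning on 3.12 and later.
print(escape_warnings(r"'\d'"))
print(escape_warnings(r"r'\d'"))  # [] (raw strings do not warn)
```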

Question 47

I found that in Python 3.4 there are a few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.

But I don't know which one to use, or which is the “recommended one”. Do they do the same thing, or are they different? If they differ, which one is used for what? I want to write a program that uses multiple cores on my computer, but I don't know which library I should learn.

Question 48

They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal for CPU-bound work. That's why multiprocessing may be preferred over threading. But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python; other languages and frameworks have it too, e.g. Boost.Asio) is a way to efficiently handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's a solution (a good one indeed!) for a particular task, not for parallel processing in general.

Question 49

[Quick Answer]

TL;DR

Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this rule of thumb:

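def choose_concurrency(io_bound, io_very_slow=False):
    # The same rule of thumb as a function; both inputs are your own judgement of the workload.
    if io_bound:
        if io_very_slow:
            return "Use Asyncio"
        return "Use Threads"
    return "Use Multi Processing"

print(choose_concurrency(io_bound=True, io_very_slow=True))  # Use Asyncio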

CPU Bound => Multi Processing

I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading

I/O Bound, Slow I/O, Many connections => Asyncio

Reference

[NOTE]:

If you have long-running calls (i.e. methods that contain a sleep or slow, lazy I/O), the best choice is the asyncio, Twisted, or Tornado approach (coroutines), which achieves concurrency on a single thread.
asyncio works on Python 3.4 and later.
Tornado and Twisted were already available on Python 2.7.
uvloop is an ultra-fast asyncio event loop (uvloop makes asyncio 2-4x faster).

[UPDATE (2019)]:

Japronto (GitHub) is a very fast, pipelining HTTP server based on uvloop.

Question 50

This is the basic idea:

Is it I/O-bound?  ---> Use asyncio

Is it CPU-heavy?  ---> Use multiprocessing

Otherwise         ---> Use threading

So basically stick to threading unless you have IO/CPU problems.

Question 51

In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you're effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be calculating the sum of all elements of a huge list. If your machine has 8 cores, you can “cut” the list into 8 smaller lists, calculate the sum of each of those lists on a separate core, and then just add up those numbers. In the ideal case that approaches an 8x speedup, although in practice the cost of copying the data to the worker processes eats into it.
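A minimal sketch of that chunked sum, assuming the standard-library `multiprocessing.Pool`; the helper names `split_into_chunks` and `parallel_sum` are hypothetical. Passing the built-in `sum` to `pool.map` keeps the worker function picklable regardless of the process start method. For something as cheap as addition, pickling each chunk over to a worker often costs more than it saves, so treat this as an illustration of the pattern rather than a benchmark.

```python
import multiprocessing


def split_into_chunks(numbers, n_chunks):
    """Split a list into n_chunks contiguous slices whose sizes differ by at most one."""
    size, remainder = divmod(len(numbers), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(numbers[start:end])
        start = end
    return chunks


def parallel_sum(numbers, workers=None):
    """Sum each chunk in its own worker process, then add up the partial sums."""
    workers = workers or multiprocessing.cpu_count()
    chunks = split_into_chunks(numbers, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        partial_sums = pool.map(sum, chunks)  # built-in sum pickles by name
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard is required on platforms that start workers by spawning a fresh interpreter.
    data = list(range(1_000_000))
    print(parallel_sum(data) == sum(data))  # True
```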

In (multi)threading you don't need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once it received the response. The problem here is that your CPU isn't really doing work while waiting for some external server to do the job; it could have done some useful work in the meantime! The fix is to use threads: you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the CPU from time to time “freezes” the execution of one thread and jumps to executing another one (this is called context switching, and it happens constantly at non-deterministic intervals). So if your task is I/O bound, use threading.
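A minimal sketch of the HTTP example using `concurrent.futures.ThreadPoolExecutor`, with the network call replaced by `time.sleep` so it runs offline. `fake_request` and its 0.2-second delay are hypothetical stand-ins for a real request. A thread that is sleeping or waiting on a socket releases the GIL, so the five calls overlap and should finish in roughly the time of one rather than five.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(url):
    """Hypothetical stand-in for an HTTP call: blocks for about 0.2 s, then 'responds'."""
    time.sleep(0.2)  # the thread waits here without using the CPU
    return f"response from {url}"


def fetch_all(urls, max_workers=5):
    """Run the blocking calls on a pool of threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_request, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    start = time.perf_counter()
    responses = fetch_all(urls)
    elapsed = time.perf_counter() - start
    print(f"{len(responses)} responses in {elapsed:.2f}s")
```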

asyncio is essentially threading where not the CPU but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
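The same idea with asyncio, as a minimal sketch assuming Python 3.7+ (for `asyncio.run`). `fake_request` is again a hypothetical stand-in that uses `asyncio.sleep` instead of real network I/O. Each `await` is an explicit point where the coroutine gives control back to the event loop, which then runs the other coroutines on the same thread.

```python
import asyncio


async def fake_request(url, delay):
    """Hypothetical coroutine standing in for a network call."""
    # `await` is the explicit switch point: while this coroutine sleeps,
    # the event loop runs the other ones on the same thread.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(urls, delay=0.2):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_request(u, delay) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))
    # ['response from a', 'response from b', 'response from c']
```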

Question 52

Consider the following expressions. Note that some expressions are repeated to present the “context”.

(this is a long list)

a, b = 1, 2                          # simple sequence assignment
a, b = ['green', 'blue']             # list assignment
a, b = 'XY'                          # string assignment
a, b = range(1,5,2)                  # any iterable will do


                                     # nested sequence assignment

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z' 

(a,b), c = "XYZ"                     # ERROR -- too many values to unpack
(a,b), c = "XY"                      # ERROR -- need more than 1 value to unpack

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'
(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack


                                     # extended sequence unpacking

a, *b = 1,2,3,4,5                    # a = 1, b = [2,3,4,5]
*a, b = 1,2,3,4,5                    # a = [1,2,3,4], b = 5
a, *b, c = 1,2,3,4,5                 # a = 1, b = [2,3,4], c = 5

a, *b = 'X'                          # a = 'X', b = []
*a, b = 'X'                          # a = [], b = 'X'
a, *b, c = "XY"                      # a = 'X', b = [], c = 'Y'
a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

a, b, *c = 1,2,3                     # a = 1, b = 2, c = [3]
a, b, c, *d = 1,2,3                  # a = 1, b = 2, c = 3, d = []

a, *b, c, *d = 1,2,3,4,5             # ERROR -- two starred expressions in assignment

(a,b), c = [1,2],'this'              # a = 1, b = 2, c = 'this'
(a,b), *c = [1,2],'this'             # a = 1, b = 2, c = ['this']

(a,b), c, *d = [1,2],'this'          # a = 1, b = 2, c = 'this', d = []
(a,b), *c, d = [1,2],'this'          # a = 1, b = 2, c = [], d = 'this'

(a,b), (c, *d) = [1,2],'this'        # a = 1, b = 2, c = 't', d = ['h', 'i', 's']

*a = 1                               # ERROR -- target must be in a list or tuple
*a = (1,2)                           # ERROR -- target must be in a list or tuple
*a, = (1,2)                          # a = [1,2]
*a, = 1                              # ERROR -- 'int' object is not iterable
*a, = [1]                            # a = [1]
*a = [1]                             # ERROR -- target must be in a list or tuple
*a, = (1,)                           # a = [1]
*a, = (1)                            # ERROR -- 'int' object is not iterable

*a, b = [1]                          # a = [], b = 1
*a, b = (1,)                         # a = [], b = 1

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
(a,b), *c = 1,2,3                    # ERROR - 'int' object is not iterable
(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]


                                     # extended sequence unpacking -- NESTED

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

*(a,b) = 1,2                         # ERROR -- target must be in a list or tuple
*(a,b), = 1,2                        # a = 1, b = 2

*(a,b) = 'XY'                        # ERROR -- target must be in a list or tuple
*(a,b), = 'XY'                       # a = 'X', b = 'Y'

*(a, b) = 'this'                     # ERROR -- target must be in a list or tuple
*(a, b), = 'this'                    # ERROR -- too many values to unpack
*(a, *b), = 'this'                   # a = 't', b = ['h', 'i', 's']

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

*(a,*b), = 1,2,3,3,4,5,6,7           # a = 1, b = [2, 3, 3, 4, 5, 6, 7]

*(a,*b), *c = 1,2,3,3,4,5,6,7        # ERROR -- two starred expressions in assignment
*(a,*b), (*c,) = 1,2,3,3,4,5,6,7     # ERROR -- 'int' object is not iterable
*(a,*b), c = 1,2,3,3,4,5,6,7         # a = 1, b = [2, 3, 3, 4, 5, 6], c = 7
*(a,*b), (*c,) = 1,2,3,4,5,'XY'      # a = 1, b = [2, 3, 4, 5], c = ['X', 'Y']

*(a,*b), c, d = 1,2,3,3,4,5,6,7      # a = 1, b = [2, 3, 3, 4, 5], c = 6, d = 7
*(a,*b), (c, d) = 1,2,3,3,4,5,6,7    # ERROR -- 'int' object is not iterable
*(a,*b), (*c, d) = 1,2,3,3,4,5,6,7   # ERROR -- 'int' object is not iterable
*(a,*b), *(c, d) = 1,2,3,3,4,5,6,7   # ERROR -- two starred expressions in assignment


*(a,b), c = 'XY', 3                  # ERROR -- need more than 1 value to unpack
*(*a,b), c = 'XY', 3                 # a = [], b = 'XY', c = 3
(a,b), c = 'XY', 3                   # a = 'X', b = 'Y', c = 3

*(a,b), c = 'XY', 3, 4               # a = 'XY', b = 3, c = 4
*(*a,b), c = 'XY', 3, 4              # a = ['XY'], b = 3, c = 4
(a,b), c = 'XY', 3, 4                # ERROR -- too many values to unpack

How can I correctly deduce the result of such expressions by hand?

Question 53

My apologies for the length of this post, but I decided to opt for completeness.

Once you know a few basic rules, it’s not hard to generalize them. I’ll do my best to explain with a few examples. Since you’re talking about evaluating these “by hand,” I’ll suggest some simple substitution rules. Basically, you might find it easier to understand an expression if all the iterables are formatted in the same way.

For the purposes of unpacking only, the following substitutions are valid on the right side of the = (i.e. for rvalues):

'XY' -> ('X', 'Y')
['X', 'Y'] -> ('X', 'Y')

If you find that a value doesn’t get unpacked, then you’ll undo the substitution. (See below for further explanation.)

Also, when you see “naked” commas, pretend there’s a top-level tuple. Do this on both the left and the right side (i.e. for lvalues and rvalues):

'X', 'Y' -> ('X', 'Y')
a, b -> (a, b)

With those simple rules in mind, here are some examples:

(a,b), c = "XY", "Z"                 # a = 'X', b = 'Y', c = 'Z'

Applying the above rules, we convert "XY" to ('X', 'Y'), and cover the naked commas in parens:

((a, b), c) = (('X', 'Y'), 'Z')

The visual correspondence here makes it fairly obvious how the assignment works.

Here’s an erroneous example:

(a,b), c = "XYZ"

Following the above substitution rules, we get the below:

((a, b), c) = ('X', 'Y', 'Z')

This is clearly erroneous; the nested structures don’t match up. Now let’s see how it works for a slightly more complex example:

(a,b), c, = [1,2],'this'             # a = 1, b = 2, c = 'this'

Applying the above rules, we get

((a, b), c) = ((1, 2), ('t', 'h', 'i', 's'))

But now it’s clear from the structure that 'this' won’t be unpacked, but assigned directly to c. So we undo the substitution.

((a, b), c) = ((1, 2), 'this')

Now let’s see what happens when we wrap c in a tuple:

(a,b), (c,) = [1,2],'this'           # ERROR -- too many values to unpack

Becomes

((a, b), (c,)) = ((1, 2), ('t', 'h', 'i', 's'))

Again, the error is obvious. c is no longer a naked variable, but a variable inside a sequence, and so the corresponding sequence on the right is unpacked into (c,). But the sequences have a different length, so there’s an error.

Now for extended unpacking using the * operator. This is a bit more complex, but it’s still fairly straightforward. A variable preceded by * becomes a list, which contains any items from the corresponding sequence that aren’t assigned to variable names. Starting with a fairly simple example:

a, *b, c = "X...Y"                   # a = 'X', b = ['.','.','.'], c = 'Y'

This becomes

(a, *b, c) = ('X', '.', '.', '.', 'Y')

The simplest way to analyze this is to work from the ends. 'X' is assigned to a and 'Y' is assigned to c. The remaining values in the sequence are put in a list and assigned to b.
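The “work from the ends” rule can be written as a small function and checked against the real syntax. `star_unpack` is a hypothetical model for a single starred target with `n_before` names before it and `n_after` names after it; it mirrors the behaviour described here, not how CPython implements it.

```python
def star_unpack(values, n_before, n_after):
    """Model of `a1, ..., *rest, z1, ... = values` with exactly one starred target."""
    items = list(values)  # the right-hand side is fully iterated first
    if len(items) < n_before + n_after:
        raise ValueError("not enough values to unpack")
    head = items[:n_before]                      # assigned to the leading names
    tail = items[len(items) - n_after:]          # assigned to the trailing names
    rest = items[n_before:len(items) - n_after]  # whatever is left becomes a list
    return head, rest, tail


a, *b, c = "X...Y"
head, rest, tail = star_unpack("X...Y", 1, 1)
print(head == [a], rest == b, tail == [c])  # True True True
```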

Lvalues like (*a, b) and (a, *b) are just special cases of the above. You can’t have two * operators inside one lvalue sequence because it would be ambiguous. Where would the values go in something like this (a, *b, *c, d) — in b or c? I’ll consider the nested case in a moment.

*a = 1                               # ERROR -- target must be in a list or tuple

Here the error is fairly self-explanatory. The target (*a) must be in a tuple.

*a, = (1,2)                          # a = [1,2]

This works because there’s a naked comma. Applying the rules…

(*a,) = (1, 2)

Since there are no variables other than *a, *a slurps up all the values in the rvalue sequence. What if you replace the (1, 2) with a single value?

*a, = 1                              # ERROR -- 'int' object is not iterable

becomes

(*a,) = 1

Again, the error here is self-explanatory. You can't unpack something that isn't a sequence (more precisely, an iterable), and *a needs something to unpack. So we put it in a sequence:

*a, = [1]                            # a = [1]

Which is equivalent to

(*a,) = (1,)

Finally, this is a common point of confusion: (1) is the same as 1. You need a comma to distinguish a tuple from a parenthesized expression.

*a, = (1)                            # ERROR -- 'int' object is not iterable

Now for nesting. Actually this example wasn’t in your “NESTED” section; perhaps you didn’t realize it was nested?

(a,b), *c = 'XY', 2, 3               # a = 'X', b = 'Y', c = [2,3]

Becomes

((a, b), *c) = (('X', 'Y'), 2, 3)

The first value in the top-level tuple gets assigned, and the remaining values in the top-level tuple (2 and 3) are assigned to c — just as we should expect.

(a,b),c = 1,2,3                      # ERROR -- too many values to unpack
*(a,b), c = 1,2,3                    # a = 1, b = 2, c = 3

I’ve already explained above why the first line throws an error. The second line is silly but here’s why it works:

(*(a, b), c) = (1, 2, 3)

As previously explained, we work from the ends. 3 is assigned to c, and then the remaining values are assigned to the variable with the * preceding it, in this case, (a, b). So that’s equivalent to (a, b) = (1, 2), which happens to work because there are the right number of elements. I can’t think of any reason this would ever appear in working code. Similarly,

*(a, *b), c = 'this'                 # a = 't', b = ['h', 'i'], c = 's'

becomes

(*(a, *b), c) = ('t', 'h', 'i', 's')

Working from the ends, 's' is assigned to c, and ('t', 'h', 'i') is assigned to (a, *b). Working again from the ends, 't' is assigned to a, and ('h', 'i') is assigned to b as a list. This is another silly example that should never appear in working code.
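To check the two-level reasoning mechanically, the nested target can be split into two ordinary starred assignments, each analyzed from the ends. The names `rest`, `a2`, `b2` and `c2` are just illustrative.

```python
# The nested one-step form from the example above.
*(a, *b), c = 'this'

# The same result in two explicit steps, working from the ends each time.
*rest, c2 = 'this'   # rest = ['t', 'h', 'i'], c2 = 's'
a2, *b2 = rest       # a2 = 't', b2 = ['h', 'i']

print((a, b, c) == (a2, b2, c2))  # True
```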

Question 54

I find the Python 2 tuple unpacking pretty straightforward. Each name on the left corresponds with either an entire sequence or a single item in a sequence on the right. If names correspond to single items of any sequence, then there must be enough names to cover all of the items.

Extended unpacking, however, can certainly be confusing, because it is so powerful. The reality is you should never be doing the last 10 or more valid examples you gave — if the data is that structured, it should be in a dict or a class instance, not unstructured forms like lists.

Clearly, the new syntax can be abused. The answer to your question is that you shouldn’t have to read expressions like that — they’re bad practice and I doubt they’ll be used.

Just because you can write arbitrarily complex expressions doesn’t mean you should. You could write code like map(map, iterable_of_transformations, map(map, iterable_of_transformations, iterable_of_iterables_of_iterables)) but you don’t.

Question 55

If you think your code may be misleading, use another form to express it.

It's like using extra brackets in expressions to avoid questions about operator precedence. It's always a good investment to make your code readable.

I prefer to use unpacking only for simple tasks like swapping values.

Question 56

I am using Python 3.6.1, and I have come across something very strange. I had a simple dictionary assignment typo that took me a long time to find.

context = {}
context["a"]: 2
print(context)

Output

{}

What is the code context["a"]: 2 doing? It doesn’t raise a SyntaxError when it should IMO. At first I thought it was creating a slice. However, typing repr(context["a"]: 2) raises a SyntaxError. I also typed context["a"]: 2 in the console and the console didn’t print anything. I thought maybe it returned None, but I’m not so sure.

I’ve also thought it could be a single line if statement, but that shouldn’t be the right syntax either.

Additionally, context["a"] should raise a KeyError.

I am perplexed. What is going on?

Question 57

You have accidentally written a syntactically correct variable annotation. That feature was introduced in Python 3.6 (see PEP 526).

Although a variable annotation is parsed as part of an annotated assignment, the assignment statement is optional:

annotated_assignment_stmt ::=  augtarget ":" expression ["=" expression]

Thus, in the statement context["a"]: 2,

context["a"] is the annotation target
2 is the annotation itself
context["a"] is left uninitialised

The PEP states that “the target of the annotation can be any valid single assignment target, at least syntactically (it is up to the type checker what to do with this)”, which means that the key doesn’t need to exist to be annotated (hence no KeyError). Here’s an example from the original PEP:

d = {}
d['a']: int = 0  # Annotates d['a'] with int.
d['b']: int      # Annotates d['b'] with int.

Normally, the annotation expression should evaluate to a Python type (after all, the main use of annotations is type hinting), but this is not enforced. The annotation can be any valid Python expression, regardless of the type or value of the result.

As you can see, at this time type hints are very permissive and rarely useful, unless you have a static type checker such as mypy.
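Because the typo is legal syntax, one way to catch it is to scan the source for annotations that assign nothing and whose target is not a plain name. A minimal sketch using the standard-library ast module, where an `ast.AnnAssign` node has `value` set to None when there is no `=`; the function name `find_bare_annotations` is hypothetical.

```python
import ast


def find_bare_annotations(source):
    """Return line numbers of `target: annotation` statements that assign nothing
    and whose target is a subscript or attribute, e.g. context["a"]: 2."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.AnnAssign)
                and node.value is None
                and not isinstance(node.target, ast.Name)):
            hits.append(node.lineno)
    return sorted(hits)


code = '''
context = {}
context["a"]: 2
context["b"]: int = 0
x: int
'''
print(find_bare_annotations(code))  # [3]
```

Plain-name annotations such as `x: int` are left alone here, since declaring a variable's type without a value is a normal use of PEP 526.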

Question 58

While traversing a graph in Python, I'm receiving this error:

'dict' object has no attribute 'has_key'

Here is my code:

def find_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    if not graph.has_key(start):
        return None
    for node in graph[start]:
        if node not in path:
            newpath = find_path(graph, node, end, path)
            if newpath: return newpath
    return None

The code aims to find the paths from one node to others. Code source: http://cs.mwsu.edu/~terry/courses/4883/lectures/graphs.html

Why am I getting this error and how can I fix it?

Question 59

has_key was removed in Python 3. From the documentation:

Removed dict.has_key() – use the in operator instead.

Here’s an example:

if start not in graph:
    return None

Question 60

has_key was removed in Python 3.0. Use the in operator instead:

graph={'A':['B','C'],
   'B':['C','D']}

print('A' in graph)  # True

print('E' in graph)  # False

Question 61

In Python 3, has_key(key) is replaced by __contains__(key), the method that the in operator calls.

Tested in python3.7:

a = {'a':1, 'b':2, 'c':3}
print(a.__contains__('a'))
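The in operator is the public spelling of __contains__, which is why idiomatic code writes `key in d` rather than calling the dunder method directly. A short sketch with a hypothetical `Graph` wrapper class shows the dispatch: defining __contains__ is all it takes for `in` to work on your own type.

```python
class Graph:
    """Hypothetical adjacency-list wrapper; __contains__ is what makes `in` work."""

    def __init__(self, adjacency):
        self._adjacency = dict(adjacency)

    def __contains__(self, node):
        return node in self._adjacency


g = Graph({'A': ['B', 'C'], 'B': ['C', 'D']})
print('A' in g, 'E' in g)  # True False
```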

Question 62

I think it is considered “more pythonic” to just use in when determining if a key already exists, as in

if start not in graph:
    return None

Question 63

The complete code from the document is:

graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}

def find_path(graph, start, end, path=[]):
    # path + [start] builds a new list, so the shared default [] is never mutated
    path = path + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        if node not in path:
            newpath = find_path(graph, node, end, path)
            if newpath:
                return newpath
    return None

After writing it, save the file and press F5 (Run Module).

Then run this in the Python IDLE shell:

find_path(graph, 'A', 'D')

IDLE should print:

['A', 'B', 'C', 'D']

Question 64

Try:

if start not in graph:

For more info see ProgrammerSought
