Question: How can I get the element-wise logical NOT of a pandas Series?
I have a pandas `Series` object containing boolean values. How can I get a series containing the logical NOT of each value?
For example, consider a series containing:

```
True
True
True
False
```

The series I'd like to get would contain:

```
False
False
False
True
```
This seems like it should be reasonably simple, but apparently I’ve misplaced my mojo =(
Answer 0
To invert a boolean Series, use `~s`:

```
In [7]: s = pd.Series([True, True, False, True])
In [8]: ~s
Out[8]:
0 False
1 False
2 True
3 False
dtype: bool
```
Using Python 2.7, NumPy 1.8.0, Pandas 0.13.1:

```
In [119]: s = pd.Series([True, True, False, True]*10000)
In [10]: %timeit np.invert(s)
10000 loops, best of 3: 91.8 µs per loop
In [11]: %timeit ~s
10000 loops, best of 3: 73.5 µs per loop
In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 µs per loop
```
As of Pandas 0.13.0, Series are no longer subclasses of `numpy.ndarray`; they are now subclasses of `NDFrame` (`pandas.core.generic.NDFrame`). This might have something to do with why `np.invert(s)` is no longer as fast as `~s` or `-s`.
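To check the class-hierarchy change on your own installation, a quick sketch like the following should do. It assumes a reasonably recent pandas, where `NDFrame` lives in the internal module `pandas.core.generic` (an implementation detail that could move in future releases).

```python
import numpy as np
import pandas as pd
from pandas.core.generic import NDFrame  # internal module; location is an assumption

s = pd.Series([True, True, False, True])

# A Series wraps an ndarray rather than being one.
print(isinstance(s, np.ndarray))  # False
print(isinstance(s, NDFrame))  # True

# The underlying NumPy array is still reachable when you need it.
arr = s.to_numpy()
print(type(arr).__name__, arr.dtype)  # ndarray bool
```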
Caveat: `timeit` results may vary depending on many factors, including hardware, compiler, OS, and the Python, NumPy and Pandas versions.
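Because these numbers depend so heavily on the environment, here is a minimal sketch for re-running the comparison with the standard-library `timeit` module instead of IPython's `%timeit`. It times only `np.invert(s)` and `~s` and keeps the best of three runs; the loop count is an arbitrary choice, and other spellings can be added to the list.

```python
import timeit

import numpy as np
import pandas as pd

# Same input as the session above: 40,000 booleans.
s = pd.Series([True, True, False, True] * 10000)

# Make sure both spellings agree before comparing their speed.
assert np.invert(s).equals(~s)

loops = 200  # arbitrary; raise it for steadier numbers
for label, stmt in [("np.invert(s)", lambda: np.invert(s)), ("~s", lambda: ~s)]:
    # timeit.repeat returns the total seconds for `loops` calls, once per repeat;
    # taking the minimum mirrors IPython's "best of 3".
    best = min(timeit.repeat(stmt, number=loops, repeat=3))
    print(f"{label}: {best / loops * 1e6:.1f} µs per loop")
```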
Answer 1
@unutbu's answer is spot on; I just wanted to add a warning that your mask needs to be dtype `bool`, not `object`. In other words, the mask must never have contained any NaNs. See here: even if your mask is NaN-free now, it will remain `object` type.
Inverting an `object` Series won't raise an error; instead you'll get a garbage mask of ints that won't work as you expect.
```
In[1]: df = pd.DataFrame({'A':[True, False, np.nan], 'B':[True, False, True]})
In[2]: df.dropna(inplace=True)
In[3]: df['A']
Out[3]:
0 True
1 False
Name: A, dtype: object
In[4]: ~df['A']
Out[4]:
0 -2
1 -1
Name: A, dtype: object
```
After discussing this with colleagues, I have an explanation: pandas appears to fall back on Python's bitwise operator, which treats booleans as integers:
```
In [1]: ~True
Out[1]: -2
```
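The arithmetic behind those values can be shown on plain Python ints. This sketch deliberately avoids writing `~True` itself, because Python 3.12+ emits a DeprecationWarning for `~` on `bool`.

```python
# bool is a subclass of int: True behaves as 1 and False as 0.
assert isinstance(True, int)
print(int(True), int(False))  # 1 0

# ~ is *bitwise* NOT; for an int x it computes -(x + 1).
print(~1, ~0)  # -2 -1 (what ~True and ~False produce)

# `not` is *logical* NOT and always returns a bool.
print(not 1, not 0)  # False True
```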
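As @geher says, you can convert the column to bool with `astype` before inverting it with `~`:

```
~df['A'].astype(bool)
0 False
1 True
Name: A, dtype: bool
```

Converting after inverting does not help, because the integers -2 and -1 are both truthy and both become True:

```
(~df['A']).astype(bool)
0 True
1 True
Name: A, dtype: bool
```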
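The convert-then-invert advice can be wrapped in a small guard. `invert_mask` below is a hypothetical helper, not a pandas API; it also refuses masks that still contain missing values, because `astype(bool)` would silently turn NaN into True (NaN is truthy).

```python
import numpy as np
import pandas as pd


def invert_mask(mask: pd.Series) -> pd.Series:
    """Hypothetical helper: logical NOT of a mask that may have object dtype."""
    if mask.isna().any():
        raise ValueError("mask contains missing values; fill or drop them first")
    # Convert *before* inverting: ~ on an object Series applies Python's
    # bitwise ~ to each element and yields -2/-1 instead of False/True.
    return ~mask.astype(bool)


df = pd.DataFrame({"A": [True, False, np.nan], "B": [True, False, True]})
df = df.dropna()  # column A keeps its object dtype even without NaNs
print(df["A"].dtype)  # object
print(invert_mask(df["A"]).tolist())  # [False, True]
```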
Answer 2
I just gave it a shot:

```
In [9]: s = Series([True, True, True, False])
In [10]: s
Out[10]:
0 True
1 True
2 True
3 False
In [11]: -s
Out[11]:
0 False
1 False
2 False
3 True
```
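One reason to prefer `~` over `-` is portability once you drop down to the raw NumPy array: to my knowledge, NumPy 1.13 and later reject unary minus on boolean arrays with a `TypeError`. A sketch, assuming a recent NumPy:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, True, False])

# ~ is the elementwise NOT for boolean data in both pandas and NumPy.
print((~s).tolist())  # [False, False, False, True]

# On a plain NumPy boolean array, unary minus is refused (assumes NumPy >= 1.13).
arr = s.to_numpy()
try:
    -arr
except TypeError as exc:
    print("TypeError:", exc)
```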
Answer 3
You can also use `numpy.invert`:

```
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: s = pd.Series([True, True, False, True])
In [4]: np.invert(s)
Out[4]:
0 False
1 False
2 True
3 False
```
EDIT: The performance difference appears on Ubuntu 12.04 with Python 2.7 and NumPy 1.7.0; it doesn't seem to exist with NumPy 1.6.2, though:

```
In [5]: %timeit (-s)
10000 loops, best of 3: 26.8 us per loop
In [6]: %timeit np.invert(s)
100000 loops, best of 3: 7.85 us per loop
In [7]: %timeit ~s
10000 loops, best of 3: 27.3 us per loop
```
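Note that `np.invert` is a *bitwise* NOT, which coincides with logical NOT only for boolean data. A short sketch contrasting it with `np.logical_not` and `~`, assuming a recent pandas in which applying a NumPy ufunc to a Series returns a Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True])

# On bool data, bitwise and logical NOT coincide; each call returns a Series.
candidates = [("~s", ~s), ("np.invert(s)", np.invert(s)), ("np.logical_not(s)", np.logical_not(s))]
for name, res in candidates:
    print(f"{name:18} {res.tolist()}")  # [False, False, True, False] for all three

# On integers they differ: invert flips bits, logical_not tests for zero.
ints = np.array([0, 1, 2])
print(np.invert(ints))  # [-1 -2 -3]
print(np.logical_not(ints))  # [ True False False]
```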
Answer 4
NumPy is slower because it casts the input to boolean values (so `None` and `0` become False and everything else becomes True):

```
import pandas as pd
import numpy as np
s = pd.Series([True, None, False, True])
np.logical_not(s)
```

gives you:

```
0 False
1 True
2 True
3 False
dtype: object
```
whereas `~s` would raise an error. In most cases the tilde is the safer choice than NumPy: an error is easier to notice than a silently coerced mask.

Pandas 0.25, NumPy 1.17
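The two behaviours can be compared side by side on the same Series. This is a sketch assuming a recent pandas and Python; the `warnings` filter only hides the Python 3.12+ deprecation notice for `~` on `bool` (triggered by the first element), which would otherwise be printed before the `TypeError` caused by `None`.

```python
import warnings

import numpy as np
import pandas as pd

s = pd.Series([True, None, False, True])
print(s.dtype)  # object, because of the None

# logical_not applies Python truthiness to each element, so None counts as False.
print(np.logical_not(s).tolist())  # [False, True, True, False]

# ~ applies Python's bitwise ~ element by element, and ~None is undefined.
with warnings.catch_warnings():
    # Hide the Python 3.12+ DeprecationWarning for ~ on bool.
    warnings.simplefilter("ignore", DeprecationWarning)
    try:
        ~s
    except TypeError as exc:
        print("TypeError:", exc)
```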