问题:熊猫中布尔索引的逻辑运算符

我正在Pandas中使用布尔值索引。问题是为什么声明:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

工作正常而

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

退出错误?

例:

a=pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous.     Use a.any() or a.all()

I’m working with boolean index in Pandas. The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with error?

Example:

a=pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous.     Use a.any() or a.all()

回答 0

当你说

(a['x']==1) and (a['y']==10)

您暗中要求Python进行转换(a['x']==1)并转换(a['y']==10)为布尔值。

NumPy数组(长度大于1)和Pandas对象(例如Series)没有布尔值-换句话说,它们引发

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

当用作布尔值时。这是因为尚不清楚何时应为True或False。如果某些用户的长度非零,则可能会认为它们为True,例如Python列表。其他人可能只希望它的所有元素都为真,才希望它为真。如果其他任何元素为True,则其他人可能希望它为True。

由于期望值如此之多,因此NumPy和Pandas的设计师拒绝猜测,而是提出了ValueError。

相反,你必须是明确的,通过调用empty()all()any()方法来表示你的愿望是什么行为。

但是,在这种情况下,您似乎不希望布尔值求值,而是希望按元素进行逻辑与。这就是&二进制运算符执行的操作:

(a['x']==1) & (a['y']==10)

返回一个布尔数组。


顺便说一句,正如alexpmil所指出的,括号是强制性的,因为&运算符优先级高于==。如果没有括号,a['x']==1 & a['y']==10则将被评估为a['x'] == (1 & a['y']) == 10等效于链式比较(a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10)。那是形式的表达Series and Seriesand与两个Series一起使用将再次触发与ValueError上述相同的操作。这就是为什么括号是强制性的。

When you say

(a['x']==1) and (a['y']==10)

You are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value — in other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That’s because its unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might desire for it to be True only if all its elements are True. Others might want it to be True if any of its elements are True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit, by calling the empty(), all() or any() method to indicate which behavior you desire.

In this case, however, it looks like you do not want boolean evaluation, you want element-wise logical-and. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

returns a boolean array.


By the way, as alexpmil notes, the parentheses are mandatory since & has a higher operator precedence than ==. Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10 which would in turn be equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. The use of and with two Series would again trigger the same ValueError as above. That’s why the parentheses are mandatory.


回答 1

TLDR;在熊猫逻辑运算符&|~和括号(...)是很重要的!

Python的andornot逻辑运算符的设计与标量的工作。因此,Pandas必须做得更好,并覆盖按位运算符,以实现此功能的矢量化(逐元素)版本。

因此,以下是python中的(exp1以及exp2是计算结果为布尔结果的表达式)…

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

…将转换为…

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

大熊猫。

如果在执行逻辑运算的过程中得到ValueError,则需要使用括号进行分组:

(exp1) op (exp2)

例如,

(df['col1'] == x) & (df['col2'] == y) 

等等。


布尔索引:常见的操作是通过逻辑条件来计算布尔掩码,以过滤数据。熊猫提供了三种运算符:&用于逻辑与,|逻辑或,以及~逻辑非。

请考虑以下设置:

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

逻辑与

对于df上面的内容,假设您想返回A <5和B> 5的所有行。这是通过分别计算每个条件的掩码并将它们与与完成的。

重载的按位&运算符
在继续之前,请注意文档的此特定摘录,其中指出

另一个常见的操作是使用布尔向量来过滤数据。运算符是:|for or&for and~for not这些必须通过使用括号来分组,由于由默认的Python将评估的表达式如df.A > 2 & df.B < 3df.A > (2 & df.B) < 3,而所期望的评价顺序是(df.A > 2) & (df.B < 3)

因此,考虑到这一点,可以使用按位运算符实现按元素逻辑与&

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

接下来的过滤步骤很简单,

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

括号用于覆盖按位运算符的默认优先级顺序,后者比条件运算符<和具有更高的优先级>。请参阅python文档中的“ 运算符优先级 ”部分。

如果不使用括号,则表达式的计算不正确。例如,如果您不小心尝试了诸如

df['A'] < 5 & df['B'] > 5

它被解析为

df['A'] < (5 & df['B']) > 5

变成

df['A'] < something_you_dont_want > 5

变成了(请参阅有关链接运算符比较的python文档),

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

变成

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

哪个抛出

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

所以,不要犯那个错误!1个

避免括号分组
该修补程序实际上非常简单。大多数运算符都有对应的DataFrame绑定方法。如果使用函数而不是条件运算符来构建单个掩码,则不再需要按括号分组以指定评估顺序:

df['A'].lt(5)

0     True
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

请参阅“ 灵活比较 ”部分。总而言之,我们有

╒════╤════════════╤════════════╕
     Operator    Function   
╞════╪════════════╪════════════╡
  0  >           gt         
├────┼────────────┼────────────┤
  1  >=          ge         
├────┼────────────┼────────────┤
  2  <           lt         
├────┼────────────┼────────────┤
  3  <=          le         
├────┼────────────┼────────────┤
  4  ==          eq         
├────┼────────────┼────────────┤
  5  !=          ne         
╘════╧════════════╧════════════╛

避免括号的另一种方法是使用(或eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

我已经广泛地记录queryeval使用pd.eval动态表达评价大熊猫()


允许您以功能方式执行此操作。内部调用Series.__and__,它对应于按位运算符。

import operator 

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5) 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

您通常不需要此功能,但了解它很有用。


另一种替代方法是使用 np.logical_and,它也不需要括号分组:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

np.logical_and是ufunc (通用函数),大多数ufunc都有一个reduce方法。这意味着logical_and如果对AND有多个掩码,则更容易一概而论。例如,对于AND遮罩m1以及m2m3&,您必须要做

m1 & m2 & m3

但是,一个更简单的选择是

np.logical_and.reduce([m1, m2, m3])

这很强大,因为它使您可以使用更复杂的逻辑在此基础上构建(例如,在列表理解中动态生成掩码并添加所有掩码):

import operator

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m 
# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6

1-我知道我在这一点上很困难,但是请耐心等待。这是一个非常非常常见的初学者的错误,必须非常有详尽的解释。


逻辑或

对于df上述内容,假设您想返回A == 3或B == 7的所有行。

按位重载 |

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

如果您还没有阅读过,请同时阅读上面“ 逻辑与 ”部分,所有注意事项均在此处适用。

或者,可以使用

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6


在后台调用Series.__or__

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6


对于两个条件,请使用logical_or

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

对于多个口罩,请使用logical_or.reduce

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

逻辑非

给定口罩,例如

mask = pd.Series([True, True, False])

如果您需要反转每个布尔值(以使最终结果为[False, False, True]),则可以使用以下任何方法。

按位 ~

~mask

0    False
1    False
2     True
dtype: bool

同样,表达式需要加上括号。

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

这在内部调用

mask.__invert__()

0    False
1    False
2     True
dtype: bool

但是不要直接使用它。


内部调用__invert__该系列。

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not
这是numpy的变体。

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

注意,np.logical_and可以代替np.bitwise_andlogical_orbitwise_or,并logical_notinvert

TLDR; Logical Operators in Pandas are &, | and ~, and parentheses (...) is important!

Python’s and, or and not logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve vectorized (element-wise) version of this functionality.

So the following in python (exp1 and exp2 are expressions which evaluate to a boolean result)…

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

…will translate to…

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

for pandas.

If in the process of performing logical operation you get a ValueError, then you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y) 

And so on.


Boolean Indexing: A common operation is to compute boolean masks through logical conditions to filter the data. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup:

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

Logical AND

For df above, say you’d like to return all rows where A < 5 and B > 5. This is done by computing masks for each condition separately, and ANDing them.

Overloaded Bitwise & Operator
Before continuing, please take note of this particular excerpt of the docs, which state

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).

So, with this in mind, element wise logical AND can be implemented with the bitwise operator &:

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

And the subsequent filtering step is simply,

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

The parentheses are used to override the default precedence order of bitwise operators, which have higher precedence over the conditional operators < and >. See the section of Operator Precedence in the python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as

df['A'] < 5 & df['B'] > 5

It is parsed as

df['A'] < (5 & df['B']) > 5

Which becomes,

df['A'] < something_you_dont_want > 5

Which becomes (see the python docs on chained operator comparison),

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

Which becomes,

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

Which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, don’t make that mistake!1

Avoiding Parentheses Grouping
The fix is actually quite simple. Most operators have a corresponding bound method for DataFrames. If the individual masks are built up using functions instead of conditional operators, you will no longer need to group by parens to specify evaluation order:

df['A'].lt(5)

0     True
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on Flexible Comparisons.. To summarise, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛

Another option for avoiding parentheses is to use (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

I have extensively documented query and eval in Dynamic Expression Evaluation in pandas using pd.eval().


Allows you to perform this operation in a functional manner. Internally calls Series.__and__ which corresponds to the bitwise operator.

import operator 

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5) 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

You won’t usually need this, but it is useful to know.


Another alternative is using np.logical_and, which also does not need parentheses grouping:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

np.logical_and is a ufunc (Universal Functions), and most ufuncs have a reduce method. This means it is easier to generalise with logical_and if you have multiple masks to AND. For example, to AND masks m1 and m2 and m3 with &, you would have to do

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

This is powerful, because it lets you build on top of this with more complex logic (for example, dynamically generating masks in a list comprehension and adding all of them):

import operator

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m 
# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6

1 – I know I’m harping on this point, but please bear with me. This is a very, very common beginner’s mistake, and must be explained very thoroughly.


Logical OR

For the df above, say you’d like to return all rows where A == 3 or B == 7.

Overloaded Bitwise |

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

If you haven’t yet, please also read the section on Logical AND above, all caveats apply here.

Alternatively, this operation can be specified with

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6


Calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6


For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

Logical NOT

Given a mask, such as

mask = pd.Series([True, True, False])

If you need to invert every boolean value (so that the end result is [False, False, True]), then you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesised.

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

This internally calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

But don’t use it directly.


Internally calls __invert__ on the Series.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not
This is the numpy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note, np.logical_and can be substituted for np.bitwise_and, logical_or with bitwise_or, and logical_not with invert.


回答 2

熊猫中布尔索引的逻辑运算符

要认识到,你不能使用任何的Python是很重要的逻辑运算符andornot上)pandas.Seriespandas.DataFrameS(同样,你不能上使用它们numpy.array有一个以上的元素S)。之所以不能使用它们,是因为它们隐式地调用bool其操作数,从而引发异常,因为这些数据结构确定数组的布尔值是不明确的:

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

我确实在回答“系列的真值不明确。请使用a.empty,a.bool(),a.item(),a.any()或a.all()”时回答这个问题。 + A

NumPys逻辑功能

然而NumPy的提供逐元素的操作等同于这些运营商的功能,可以在被使用numpy.arraypandas.Seriespandas.DataFrame,或任何其他(符合)numpy.array亚类:

因此,从本质上讲,应该使用(假设df1并且df2是pandas DataFrames):

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)

布尔的按位函数和按位运算符

但是,如果您具有布尔NumPy数组,pandas系列或pandas DataFrame,则也可以使用按元素逐位的函数(对于布尔,它们与逻辑函数是(或至少应该是)不可区分的):

  • 按位与:&运算符
  • 按位或:|运算符
  • 按位不:(或别名np.bitwise_not)或~运算符
  • 按位异或:或^运算符

通常使用运算符。但是,当与比较运算符组合使用时,必须记住将比较括在括号中,因为按位运算符的优先级高于比较运算符

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

这可能很烦人,因为Python逻辑运算符的优先级比比较运算符的优先级低,因此您通常可以编写a < 10 and b > 10(其中a并且b是简单整数),并且不需要括号。

逻辑和按位运算之间的差异(非布尔值)

需要特别强调的是,位和逻辑运算仅对布尔NumPy数组(以及布尔Series和DataFrame)是等效的。如果这些不包含布尔值,则这些操作将给出不同的结果。我将包括使用NumPy数组的示例,但对于pandas数据结构,结果将相似:

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

而且由于NumPy(和类似的pandas)对boolean(布尔或“掩码”索引数组)和integer(Index数组)索引所做的操作不同,因此索引的结果也将不同:

>>> a3 = np.array([1, 2, 3, 4])

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])

汇总表

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

其中的逻辑运算符不适合与NumPy阵列工作,熊猫系列,和熊猫DataFrames。其他的则在这些数据结构(和普通的Python对象)上工作,并在元素方面工作。但是,请对纯Python上的按位取反要小心,bool因为在这种情况下,布尔将被解释为整数(例如~Falsereturn -1~Truereturn -2)。

Logical operators for boolean indexing in Pandas

It’s important to realize that you cannot use any of the Python logical operators (and, or or not) on pandas.Series or pandas.DataFrames (similarly you cannot use them on numpy.arrays with more than one element). The reason why you cannot use those is because they implicitly call bool on their operands which throws an Exception because these data structures decided that the boolean of an array is ambiguous:

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I did cover this more extensively in my answer to the “Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()” Q+A.

NumPys logical functions

However NumPy provides element-wise operating equivalents to these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)

Bitwise functions and bitwise operators for booleans

However in case you have boolean NumPy array, pandas Series, or pandas DataFrames you could also use the element-wise bitwise functions (for booleans they are – or at least should be – indistinguishable from the logical functions):

  • bitwise and: or the & operator
  • bitwise or: or the | operator
  • bitwise not: (or the alias np.bitwise_not) or the ~ operator
  • bitwise xor: or the ^ operator

Typically the operators are used. However when combined with comparison operators one has to remember to wrap the comparison in parenthesis because the bitwise operators have a higher precedence than the comparison operators:

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

This may be irritating because the Python logical operators have a lower precendence than the comparison operators so you normally write a < 10 and b > 10 (where a and b are for example simple integers) and don’t need the parenthesis.

Differences between logical and bitwise operations (on non-booleans)

It is really important to stress that bit and logical operations are only equivalent for boolean NumPy arrays (and boolean Series & DataFrames). If these don’t contain booleans then the operations will give different results. I’ll include examples using NumPy arrays but the results will be similar for the pandas data structures:

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

And since NumPy (and similarly pandas) does different things for boolean (Boolean or “mask” index arrays) and integer (Index arrays) indices the results of indexing will be also be different:

>>> a3 = np.array([1, 2, 3, 4])

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])

Summary table

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

Where the logical operator does not work for NumPy arrays, pandas Series, and pandas DataFrames. The others work on these data structures (and plain Python objects) and work element-wise. However be careful with the bitwise invert on plain Python bools because the bool will be interpreted as integers in this context (for example ~False returns -1 and ~True returns -2).


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。