问题:使用熊猫比较两列

以此为起点:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

我想if在熊猫中使用类似声明的内容。

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

基本上,通过if语句检查每一行,然后创建新列。

文档说要使用,.all但没有示例…

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement, create new column.

The docs say to use .all but there is no example…


回答 0

您可以使用np.where。如果cond是布尔数组,A并且B是数组,则

C = np.where(cond, A, B)

将C定义为等于A哪里cond为True,B哪里cond为False。

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

Yield

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

如果您有多个条件,则可以使用np.select代替。例如,如果你想df['que']等于df['two']df['one'] < df['two'],则

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

Yield

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

如果我们可以假设df['one'] >= df['two']whendf['one'] < df['two']为False,那么条件和选择可以简化为

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(如果包含df['one']df['two']包含NaN,则该假设可能不正确。)


注意

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

用字符串值定义一个DataFrame。由于它们看起来是数字,因此最好将这些字符串转换为浮点数:

df2 = df.astype(float)

但是,这会改变结果,因为字符串会逐个字符进行比较,而浮点数会进行数字比较。

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not be true if df['one'] or df['two'] contain NaNs.)


Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False

回答 1

您可以将其.equals用于列或整个数据框。

df['col1'].equals(df['col2'])

如果它们相等,则该语句将返回Trueelse False

You can use .equals for columns or entire dataframes.

df['col1'].equals(df['col2'])

If they’re equal, that statement will return True, else False.


回答 2

您可以使用apply()并执行类似的操作

df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)

或者如果您不想使用lambda

def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    return ''
df['que'] = df.apply(que, axis=1)

You could use apply() and do something like this

df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)

or if you prefer not to use a lambda

def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    return ''
df['que'] = df.apply(que, axis=1)

回答 3

一种方法是使用布尔序列对列进行索引df['one']。这将为您提供一个新列,其中的True条目与相同的行具有相同的值,df['one']并且这些False值为NaN

布尔级数仅由您的if语句给出(尽管必须使用&代替and):

>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
    one two three   que
0   10  1.2 4.2      10
1   15  70  0.03    NaN
2   8   5   0       NaN

如果您希望将这些NaN值替换为其他值,则可以使用fillna新列上的方法que。我用的0不是这里的空字符串:

>>> df['que'] = df['que'].fillna(0)
>>> df
    one two three   que
0   10  1.2   4.2    10
1   15   70  0.03     0
2    8    5     0     0

One way is to use a Boolean series to index the column df['one']. This gives you a new column where the True entries have the same value as the same row as df['one'] and the False values are NaN.

The Boolean series is just given by your if statement (although it is necessary to use & instead of and):

>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
    one two three   que
0   10  1.2 4.2      10
1   15  70  0.03    NaN
2   8   5   0       NaN

If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I’ve used 0 instead of the empty string here:

>>> df['que'] = df['que'].fillna(0)
>>> df
    one two three   que
0   10  1.2   4.2    10
1   15   70  0.03     0
2    8    5     0     0

回答 4

将每个条件括在括号中,然后使用&运算符组合条件:

df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']

您可以仅使用~(“ not”运算符)来反转匹配项来填充不匹配的行:

df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''

您需要使用&~而不是and和,not因为&~运算符是逐个元素地工作的。

最终结果:

df
Out[8]: 
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03    
2   8    5     0  

Wrap each individual condition in parentheses, and then use the & operator to combine the conditions:

df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']

You can fill the non-matching rows by just using ~ (the “not” operator) to invert the match:

df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''

You need to use & and ~ rather than and and not because the & and ~ operators work element-by-element.

The final result:

df
Out[8]: 
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03    
2   8    5     0  

回答 5

使用np.select,如果你必须从数据帧和输出特定的选择在不同的列中选中多个条件

conditions=[(condition1),(condition2)]
choices=["choice1","chocie2"]

df["new column"]=np.select=(condtion,choice,default=)

注意:没有条件,没有选择项应该匹配,如果对于两个不同的条件您有相同的选择,请重复选择文本

Use np.select if you have multiple conditions to be checked from the dataframe and output a specific choice in a different column

conditions=[(condition1),(condition2)]
choices=["choice1","chocie2"]

df["new column"]=np.select=(condtion,choice,default=)

Note: No of conditions and no of choices should match, repeat text in choice if for two different conditions you have same choices


回答 6

我认为最接近OP直觉的是内联if语句:

df['que'] = (df['one'] if ((df['one'] >= df['two']) and (df['one'] <= df['three'])) 

I think the closest to the OP’s intuition is an inline if statement:

df['que'] = (df['one'] if ((df['one'] >= df['two']) and (df['one'] <= df['three'])) 

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。