问题:应用具有多个参数的函数以创建新的pandas列

我想pandas通过将函数应用于两个现有列在数据框中创建一个新列。按照这个答案,当我只需要一个列作为参数时,我已经能够创建一个新列:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})

def fx(x):
    return x * x

print(df)
df['newcolumn'] = df.A.apply(fx)
print(df)

但是,当函数需要多个参数时,我无法弄清楚该怎么做。例如,如何通过将A列和B列传递给下面的函数来创建新列?

def fxy(x, y):
    return x * y

I want to create a new column in a pandas data frame by applying a function to two existing columns. Following this answer I’ve been able to create a new column when I only need one column as an argument:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})

def fx(x):
    return x * x

print(df)
df['newcolumn'] = df.A.apply(fx)
print(df)

However, I cannot figure out how to do the same thing when the function requires multiple arguments. For example, how do I create a new column by passing column A and column B to the function below?

def fxy(x, y):
    return x * y

回答 0

另外,您可以使用numpy基础函数:

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

或一般情况下向量化任意函数:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

Alternatively, you can use numpy underlying function:

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

or vectorize arbitrary function in general case:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

回答 1

如果可以重写函数,则可以使用@greenAfrican示例。但是,如果您不想重写函数,可以将其包装到apply内部的匿名函数中,如下所示:

>>> def fxy(x, y):
...     return x * y

>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300

You can go with @greenAfrican example, if it’s possible for you to rewrite your function. But if you don’t want to rewrite your function, you can wrap it into anonymous function inside apply, like this:

>>> def fxy(x, y):
...     return x * y

>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300

回答 2

这样可以解决问题:

df['newcolumn'] = df.A * df.B

您也可以这样做:

def fab(row):
  return row['A'] * row['B']

df['newcolumn'] = df.apply(fab, axis=1)

This solves the problem:

df['newcolumn'] = df.A * df.B

You could also do:

def fab(row):
  return row['A'] * row['B']

df['newcolumn'] = df.apply(fab, axis=1)

回答 3

如果您需要一次创建多个列

  1. 创建数据框:

    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
    
  2. 创建函数:

    def fab(row):                                                  
        return row['A'] * row['B'], row['A'] + row['B']
    
  3. 分配新列:

    df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))

If you need to create multiple columns at once:

  1. Create the dataframe:

    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
    
  2. Create the function:

    def fab(row):                                                  
        return row['A'] * row['B'], row['A'] + row['B']
    
  3. Assign the new columns:

    df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))
    

回答 4

另一种dict风格的干净语法:

df["new_column"] = df.apply(lambda x: x["A"] * x["B"], axis = 1)

要么,

df["new_column"] = df["A"] * df["B"]

One more dict style clean syntax:

df["new_column"] = df.apply(lambda x: x["A"] * x["B"], axis = 1)

or,

df["new_column"] = df["A"] * df["B"]

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。