python pandas: apply a function with arguments to a series

Question: python pandas: apply a function with arguments to a series

I want to apply a function with arguments to a series in python pandas:

x = my_series.apply(my_function, more_arguments_1)
y = my_series.apply(my_function, more_arguments_2)
...

The documentation describes support for an apply method, but it doesn’t accept any arguments. Is there a different method that accepts arguments? Alternatively, am I missing a simple workaround?

Update (October 2017): Since this question was originally asked, pandas apply() has been updated to handle positional and keyword arguments. The documentation link above now reflects that and shows how to include either type of argument.


Answer 0

Newer versions of pandas do allow you to pass extra arguments (see the new documentation). So now you can do:

my_series.apply(your_function, args=(2,3,4), extra_kw=1)

The positional arguments are added after the element of the series.
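
To make the argument order concrete, here is a minimal sketch. The function scale_and_shift and its parameter names are made up for illustration; only the args= and keyword pass-through of Series.apply come from the answer above.

```python
import pandas as pd

# Hypothetical function: the Series element always arrives first,
# followed by the values from `args`, then any keyword arguments.
def scale_and_shift(value, factor, offset, extra_kw=0):
    return value * factor + offset + extra_kw

my_series = pd.Series([1, 2, 3])

# Equivalent to calling scale_and_shift(element, 10, 5, extra_kw=1) per element
result = my_series.apply(scale_and_shift, args=(10, 5), extra_kw=1)
print(result.tolist())  # [16, 26, 36]
```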


For older versions of pandas:

The documentation explains this clearly. The apply method accepts a python function which should have a single parameter. If you want to pass more parameters you should use functools.partial as suggested by Joel Cornett in his comment.

An example:

>>> import functools
>>> import operator
>>> add_3 = functools.partial(operator.add,3)
>>> add_3(2)
5
>>> add_3(7)
10

You can also pass keyword arguments using partial.
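
As a rough illustration of the keyword form, the sketch below binds a keyword argument with partial and hands the result to apply. The function power and the sample values are assumptions chosen for the example.

```python
import functools
import pandas as pd

# Hypothetical function with a keyword parameter
def power(base, exponent=1):
    return base ** exponent

# Bind exponent=2 up front; the result takes a single argument,
# which is what apply needs on older pandas versions.
square = functools.partial(power, exponent=2)

my_series = pd.Series([1, 2, 3])
squared = my_series.apply(square)
print(squared.tolist())  # [1, 4, 9]
```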

Another way would be to create a lambda:

my_series.apply((lambda x: your_func(a,b,c,d,...,x)))

But I think using partial is better.
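
One practical difference between the two is worth a small sketch: partial binds the leading positional arguments, so the series element arrives after them, while a lambda lets you place the element anywhere. The example below uses operator.sub, where the order matters; the sample values are arbitrary.

```python
import functools
import operator
import pandas as pd

my_series = pd.Series([10, 20, 30])

# partial fills the first positional slot, so each call is operator.sub(100, x)
via_partial = my_series.apply(functools.partial(operator.sub, 100))

# the lambda puts the element first, so each call is operator.sub(x, 100)
via_lambda = my_series.apply(lambda x: operator.sub(x, 100))

print(via_partial.tolist())  # [90, 80, 70]
print(via_lambda.tolist())   # [-90, -80, -70]
```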


Answer 1

Steps:

  1. Create a dataframe
  2. Create a function
  3. Use the named arguments of the function in the apply statement.

Example

import pandas as pd

x = pd.DataFrame([1, 2, 3, 4])

def add(i1, i2):
    return i1 + i2

x.apply(add, i2=9)

The outcome of this example is that 9 is added to each number in the dataframe.

    0
0  10
1  11
2  12
3  13

Explanation:

The “add” function has two parameters: i1 and i2. The first parameter receives the data from the dataframe (DataFrame.apply passes each column as a Series, so the addition is applied to the whole column at once), and the second is whatever we pass to the “apply” function. In this case, we are passing “9” to the apply function using the keyword argument “i2”.
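
Since the question is about a Series while this answer uses a DataFrame, the sketch below compares the two. It assumes the default axis of DataFrame.apply, and the helper add_and_record is a hypothetical wrapper added only to record what the function receives.

```python
import pandas as pd

def add(i1, i2):
    return i1 + i2

seen_types = []  # records the type DataFrame.apply hands to the function

def add_and_record(i1, i2):
    seen_types.append(type(i1).__name__)
    return add(i1, i2)

x = pd.DataFrame([1, 2, 3, 4])

# DataFrame.apply calls the function once per column, passing a Series
frame_result = x.apply(add_and_record, i2=9)

# Series.apply calls the function once per element
series_result = x[0].apply(add, i2=9)

print(set(seen_types))           # {'Series'}
print(frame_result[0].tolist())  # [10, 11, 12, 13]
print(series_result.tolist())    # [10, 11, 12, 13]
```

Both calls give the same numbers here because adding 9 to a whole column is the same as adding it to each value; a function that only works on scalars would need the Series form.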


Answer 2

Series.apply(func, convert_dtype=True, args=(), **kwds)

args : tuple

x = my_series.apply(my_function, args = (arg1,))
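
The trailing comma in (arg1,) matters because args must be a tuple, and in Python (arg1) without the comma is just arg1. A minimal sketch, with a made-up function add_suffix and sample data:

```python
import pandas as pd

# Hypothetical function taking the element plus one extra argument
def add_suffix(value, suffix):
    return f"{value}{suffix}"

my_series = pd.Series(["a", "b"])

# ("_x",) is a one-element tuple; ("_x") would be a plain string
print(isinstance(("_x",), tuple))  # True
print(isinstance(("_x"), str))     # True

result = my_series.apply(add_suffix, args=("_x",))
print(result.tolist())  # ['a_x', 'b_x']
```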

Answer 3

You can pass any number of arguments to the function that apply is calling through either unnamed arguments, passed as a tuple to the args parameter, or through other keyword arguments internally captured as a dictionary by the kwds parameter.

For instance, let’s build a function that returns True for values between 3 and 6, and False otherwise.

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 10, 10))
s

0    5
1    3
2    1
3    1
4    6
5    0
6    3
7    4
8    9
9    6
dtype: int64

s.apply(lambda x: x >= 3 and x <= 6)

0     True
1     True
2    False
3    False
4     True
5    False
6     True
7     True
8    False
9     True
dtype: bool

This anonymous function isn’t very flexible. Let’s create a normal function with two arguments to control the min and max values we want in our Series.

def between(x, low, high):
    return x >= low and x <= high

We can replicate the output of the first function by passing unnamed arguments to args:

s.apply(between, args=(3,6))

Or we can use named arguments:

s.apply(between, low=3, high=6)

Or even a combination of both:

s.apply(between, args=(3,), high=6)
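
For a reproducible check, the sketch below hardcodes the same ten values shown in the printed Series above instead of drawing random ones, and confirms that the three call styles agree. The comparison with the built-in Series.between method is an aside that is not part of the original answer.

```python
import pandas as pd

def between(x, low, high):
    # chained comparison, equivalent to x >= low and x <= high
    return low <= x <= high

# Same values as the sample output above, fixed for reproducibility
s = pd.Series([5, 3, 1, 1, 6, 0, 3, 4, 9, 6])

by_position = s.apply(between, args=(3, 6))
by_keyword = s.apply(between, low=3, high=6)
mixed = s.apply(between, args=(3,), high=6)

# Vectorized built-in; inclusive on both ends by default
builtin = s.between(3, 6)

print(by_position.tolist())
# [True, True, False, False, True, False, True, True, False, True]
```

All four results hold the same values, so for a simple range check the built-in method may be the shorter option, while apply with args or keywords generalizes to arbitrary functions.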