熊猫:如何对单个列使用apply()函数?

问题:熊猫:如何对单个列使用apply()函数?

我有两列的熊猫数据框。我需要在不影响第二列的情况下更改第一列的值,并只更改第一列的值即可获取整个数据帧。我该如何使用大熊猫应用程序?

I have a pandas data frame with two columns. I need to change the values of the first column without affecting the second one and get back the whole data frame with just first column values changed. How can I do that using apply in pandas?


回答 0

给定一个示例数据框df为:

a,b
1,2
2,3
3,4
4,5

您想要的是:

df['a'] = df['a'].apply(lambda x: x + 1)

返回:

   a  b
0  2  2
1  3  3
2  4  4
3  5  5

Given a sample dataframe df as:

a,b
1,2
2,3
3,4
4,5

what you want is:

df['a'] = df['a'].apply(lambda x: x + 1)

that returns:

   a  b
0  2  2
1  3  3
2  4  4
3  5  5

回答 1

对于更好使用的单列map(),像这样:

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9



df['a'] = df['a'].map(lambda a: a / 2.)

      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9

For a single column better to use map(), like this:

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9



df['a'] = df['a'].map(lambda a: a / 2.)

      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9

回答 2

您根本不需要功能。您可以直接处理整个列。

示例数据:

>>> df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
>>> df

      a     b     c
0   100   200   300
1  1000  2000  3000

列中所有值的一半a

>>> df.a = df.a / 2
>>> df

     a     b     c
0   50   200   300
1  500  2000  3000

You don’t need a function at all. You can work on a whole column directly.

Example data:

>>> df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
>>> df

      a     b     c
0   100   200   300
1  1000  2000  3000

Half all the values in column a:

>>> df.a = df.a / 2
>>> df

     a     b     c
0   50   200   300
1  500  2000  3000

回答 3

尽管给定的响应是正确的,但是它们修改了初始数据帧,这并不总是令人满意的(并且,如果OP要求示例“使用apply”,那么他们可能想要一个返回新数据帧的版本,就像apply这样)。

可以使用assign:这可能assign对现有列有效,因为文档指出(重点是我的):

将新列分配给DataFrame。

返回一个新对象,该对象具有除新列之外的所有原始列。重新分配的现有列将被覆盖

简而言之:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

In [3]: df.assign(a=lambda df: df.a / 2)
Out[3]: 
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9

In [4]: df
Out[4]: 
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9

请注意,该函数将传递给整个数据框,而不仅是要修改的列,因此您需要确保在lambda中选择正确的列。

Although the given responses are correct, they modify the initial data frame, which is not always desirable (and, given the OP asked for examples “using apply“, it might be they wanted a version that returns a new data frame, as apply does).

This is possible using assign: it is valid to assign to existing columns, as the documentation states (emphasis is mine):

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

In short:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

In [3]: df.assign(a=lambda df: df.a / 2)
Out[3]: 
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9

In [4]: df
Out[4]: 
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9

Note that the function will be passed the whole dataframe, not only the column you want to modify, so you will need to make sure you select the right column in your lambda.


回答 4

如果您真的很关心apply函数的执行速度,并且有庞大的数据集需要处理,则可以使用swifter加快执行速度,以下是在swifter上实现pandas数据框的示例:

import pandas as pd
import swifter

def fnc(m):
    return m*3+4

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})

# apply a self created function to a single column in pandas
df["y"] = df.m.swifter.apply(fnc)

这将使您所有的CPU内核都能计算结果,因此比正常的应用功能要快得多。尝试让我知道它是否对您有用。

If you are really concerned about the execution speed of your apply function and you have a huge dataset to work on, you could use swifter to make faster execution, here is an example for swifter on pandas dataframe:

import pandas as pd
import swifter

def fnc(m):
    return m*3+4

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})

# apply a self created function to a single column in pandas
df["y"] = df.m.swifter.apply(fnc)

This will enable your all CPU cores to compute the result hence it will be much faster than normal apply functions. Try and let me know if it become useful for you.


回答 5

让我尝试使用日期时间并考虑空值或空白的复杂计算。我正在减少30年的datetime列,并使用apply方法以及lambda转换datetime格式。Line if x != '' else x将照顾所有空白或相应的空值。

df['Date'] = df['Date'].fillna('')
df['Date'] = df['Date'].apply(lambda x : ((datetime.datetime.strptime(str(x), '%m/%d/%Y') - datetime.timedelta(days=30*365)).strftime('%Y%m%d')) if x != '' else x)

Let me try a complex computation using datetime and considering nulls or empty spaces. I am reducing 30 years on a datetime column and using apply method as well as lambda and converting datetime format. Line if x != '' else x will take care of all empty spaces or nulls accordingly.

df['Date'] = df['Date'].fillna('')
df['Date'] = df['Date'].apply(lambda x : ((datetime.datetime.strptime(str(x), '%m/%d/%Y') - datetime.timedelta(days=30*365)).strftime('%Y%m%d')) if x != '' else x)