问题:Pandas中map,applymap和apply方法之间的区别

您能否通过基本示例告诉我何时使用这些矢量化方法?

我看到这map是一种Series方法,而其余都是DataFrame方法。我糊涂了约applyapplymap,虽然方法。为什么我们有两种将函数应用于DataFrame的方法?同样,简单的例子可以很好地说明用法!

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!


回答 0

直接来自Wes McKinney的《Python for Data Analysis》一书,第16页。132(我强烈推荐这本书):

另一种常见的操作是将一维数组上的函数应用于每一列或每一行。DataFrame的apply方法正是这样做的:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

许多最常见的数组统计信息(例如sum和mean)都是DataFrame方法,因此不必使用apply。

也可以使用基于元素的Python函数。假设您要根据帧中的每个浮点值来计算格式化的字符串。您可以使用applymap做到这一点:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

之所以使用applymap之所以命名,是因为Series具有用于应用逐元素函数的map方法:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

总结起来,apply在DataFrame的行/列基础上工作,在DataFrame applymap上按map元素工作,在Series上按元素工作。

Straight from Wes McKinney’s Python for Data Analysis book, pg. 132 (I highly recommended this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.


回答 1

比较mapapplymap和:上下文问题apply

第一个主要区别:定义

  • map 仅在系列上定义
  • applymap 仅在DataFrames上定义
  • apply 两者都定义

第二个主要区别:输入参数

  • map接受dictS, Series,或可调用
  • applymap并且apply只接受可调用

第三大区别:行为

  • map 对于系列是元素
  • applymap 对于DataFrames是元素
  • apply也可以逐元素工作,但适用于更复杂的操作和聚合。行为和返回值取决于函数。

四主要的区别(最重要的):用例

  • map是用于将值从一个域映射到另一个域,因此针对性能进行了优化(例如df['A'].map({1:'a', 2:'b', 3:'c'})
  • applymap适用于跨多个行/列的元素转换(例如df[['A', 'B', 'C']].applymap(str.strip)
  • apply用于应用无法向量化的任何功能(例如df['sentences'].apply(nltk.sent_tokenize)

总结

在此处输入图片说明

脚注

  1. map通过字典/系列时,将基于该字典/系列中的键映射元素。缺少的值将在输出中记录为NaN。
  2. applymap在最新版本中,已针对某些操作进行了优化。您会发现applymapapply某些情况下要快一些。我的建议是对它们都进行测试,并使用更好的方法。

  3. map针对元素映射和转换进行了优化。涉及字典或系列的操作将使熊猫能够使用更快的代码路径来获得更好的性能。

  4. Series.apply返回用于汇总操作的标量,否则返回Series。同样适用于DataFrame.apply。需要注意的是apply,当某些NumPy的功能,如所谓的也有FastPaths的meansum等等。

Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Second major difference: INPUT ARGUMENT

  • map accepts dicts, Series, or callable
  • applymap and apply accept callables only

Third major difference: BEHAVIOR

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
  • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))

Summarising

enter image description here

Footnotes

  1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
  2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.

  3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

  4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.

回答 2

这些答案中有很多有用的信息,但是我要添加自己的信息,以明确总结哪些方法在数组方式与元素方式下均有效。jeremiahbuddha主要这样做,但没有提及Series.apply。我没有代表对此发表评论。

  • DataFrame.apply 一次对整个行或列进行操作。

  • DataFrame.applymap,,Series.apply并同时Series.map对一个元素进行操作。

Series.apply和的功能之间有很多重叠之处Series.map,这意味着任何一种在大多数情况下都可以使用。但是,它们确实有一些细微的差异,其中一些已在osa的答案中进行了讨论。

There’s great information in these answers, but I’m adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don’t have the rep to comment.

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at time.

There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa’s answer.


回答 3

除了其他答案外,Series还有mapapply

Apply可以使DataFrame脱离系列;但是,map只会将一个系列放在另一个系列的每个单元格中,这可能不是您想要的。

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

另外,如果我有一个带有副作用的功能,例如“连接到Web服务器”,那么我可能apply只是为了清楚起见而使用。

series.apply(download_file_for_every_element) 

Map不仅可以使用功能,还可以使用字典或其他系列。假设您要操纵排列

采取

1 2 3 4 5
2 1 4 5 3

此排列的平方是

1 2 3 4 5
1 2 5 3 4

您可以使用进行计算map。不知道自助申请是否已记录在案,但可以在中使用0.15.1

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64

Adding to the other answers, in a Series there are also map and apply.

Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also if I had a function with side effects, such as “connect to a web server”, I’d probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

Map can use not only a function, but also a dictionary or another series. Let’s say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64

回答 4

@jeremiahbuddha提到了apply在行/列上的工作,而applymap在元素上工作。但似乎您仍可以使用apply进行元素计算。

    frame.apply(np.sqrt)
    Out[102]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

@jeremiahbuddha mentioned that apply works on row/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation….

    frame.apply(np.sqrt)
    Out[102]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

回答 5

只是想指出一点,因为我为此苦了一点

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()

这不会修改数据框本身,必须重新分配

df = df.applymap(f)
df.describe()

Just wanted to point out, as I struggled with this for a bit

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()

this does not modify the dataframe itself, has to be reassigned

df = df.applymap(f)
df.describe()

回答 6

可能最简单的解释apply和applymap之间的区别:

apply将整个列作为参数,然后将结果分配给该列

applymap将单独的单元格值作为参数,并将结果分配回该单元格。

注意:如果apply返回单个值,则分配后将具有该值而不是列,最终将仅具有一行而不是矩阵。

Probably simplest explanation the difference between apply and applymap:

apply takes the whole column as a parameter and then assign the result to this column

applymap takes the separate cell value as a parameter and assign the result back to this cell.

NB If apply returns the single value you will have this value instead of the column after assigning and eventually will have just a row instead of matrix.


回答 7

我的理解:

从功能上看:

如果函数具有需要在列/行中进行比较的变量,请使用 apply

例如:lambda x: x.max()-x.mean()

如果要将函数应用于每个元素:

1>如果找到列/行,请使用 apply

2>如果适用于整个数据框,请使用 applymap

majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
  if type(x) is int:
    x *= 10 
  return x
df2.applymap(times10)

My understanding:

From the function point of view:

If the function has variables that need to compare within a column/ row, use apply.

e.g.: lambda x: x.max()-x.mean().

If the function is to be applied to each element:

1> If a column/row is located, use apply

2> If apply to entire dataframe, use applymap

majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
  if type(x) is int:
    x *= 10 
  return x
df2.applymap(times10)

回答 8

基于cs95的答案

  • map 仅在系列上定义
  • applymap 仅在DataFrames上定义
  • apply 两者都定义

举一些例子

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

Based on the answer of cs95

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

give some examples

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

回答 9

FOMO:

以下示例显示应用于DataFrame

map函数仅适用于Series。您不能map 在DataFrame上申请。

需要记住的是, apply可以做任何事情 applymap都可以,但apply额外的选项。

X因子选项包括:axisresult_typeresult_type仅在axis=1(仅适用于)时才适用。

df = DataFrame(1, columns=list('abc'),
                  index=list('1234'))
print(df)

f = lambda x: np.log(x)
print(df.applymap(f)) # apply to the whole dataframe
print(np.log(df)) # applied to the whole dataframe
print(df.applymap(np.sum)) # reducing can be applied for rows only

# apply can take different options (vs. applymap cannot)
print(df.apply(f)) # same as applymap
print(df.apply(sum, axis=1))  # reducing example
print(df.apply(np.log, axis=1)) # cannot reduce
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')) # expand result

map附带说明一下,Series 函数不应与Python map函数混淆。

第一个应用于Series,以映射值,第二个应用于迭代对象的每个项目。


最后,不要将dataframe apply方法与groupby apply方法混淆。

FOMO:

The following example shows and applied to a DataFrame.

map function is something you do apply on Series only. You cannot apply map on DataFrame.

The thing to remember is that apply can do anything applymap can, but apply has eXtra options.

The X factor options are: axis and result_type where result_type only works when axis=1 (for columns).

df = DataFrame(1, columns=list('abc'),
                  index=list('1234'))
print(df)

f = lambda x: np.log(x)
print(df.applymap(f)) # apply to the whole dataframe
print(np.log(df)) # applied to the whole dataframe
print(df.applymap(np.sum)) # reducing can be applied for rows only

# apply can take different options (vs. applymap cannot)
print(df.apply(f)) # same as applymap
print(df.apply(sum, axis=1))  # reducing example
print(df.apply(np.log, axis=1)) # cannot reduce
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')) # expand result

As a sidenote, Series map function, should not be confused with the Python map function.

The first one is applied on Series, to map the values, and the second one to every item of an iterable.


Lastly don’t confuse the dataframe apply method with groupby apply method.


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。