Question: Difference between map, applymap and apply methods in Pandas
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Answer 0
Straight from Wes McKinney’s Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
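The summary above can be checked with a small, deterministic frame instead of random data. This is only a sketch: the frame values and the elementwise helper are made up for illustration. The helper exists because DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so it calls whichever of the two the installed version provides.

```python
import pandas as pd

frame = pd.DataFrame(
    [[1.0, 4.0], [2.5, -3.0], [-0.5, 2.0]],
    columns=["b", "d"],
    index=["Utah", "Ohio", "Texas"],
)


def elementwise(df, func):
    # Hypothetical helper: DataFrame.applymap was renamed to DataFrame.map
    # in pandas 2.1, so use whichever this pandas version provides.
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)


# apply: the function receives each column as a Series.
col_range = frame.apply(lambda col: col.max() - col.min())

# applymap: the function receives each cell of the DataFrame.
formatted = elementwise(frame, lambda x: "%.2f" % x)

# map: the function receives each element of a single Series.
formatted_b = frame["b"].map(lambda x: "%.2f" % x)

print(col_range)
print(formatted)
print(formatted_b)
```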
Answer 1
Comparing map, applymap and apply: Context Matters
First major difference: DEFINITION
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Second major difference: INPUT ARGUMENT
map accepts dicts, Series, or callables
applymap and apply accept callables only
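To make the three input types of map concrete, here is a minimal sketch with made-up values. It also shows what happens to a value that has no key in the dict or Series: it comes back as NaN.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# A dict: values are looked up by key; 4 has no key, so it becomes NaN.
by_dict = s.map({1: "a", 2: "b", 3: "c"})

# A Series: values are looked up in the mapper's index.
by_series = s.map(pd.Series(["a", "b", "c"], index=[1, 2, 3]))

# A callable: called once per element.
by_func = s.map(lambda x: x * 10)

print(by_dict)
print(by_series)
print(by_func)
```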
Third major difference: BEHAVIOR
map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Fourth major difference (the most important one): USE CASE
map is meant for mapping values from one domain to another, so it is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
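The three use cases can be sketched on one toy frame. Everything here is illustrative: the column contents are invented, split_sentences is a crude stand-in for nltk.sent_tokenize so the example has no extra dependency, and elementwise is a hypothetical helper that picks DataFrame.map (pandas 2.1+) or DataFrame.applymap (older versions).

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x ", "y  ", "  z"],
    "C": [" p", "q ", " r "],
    "sentences": ["One. Two.", "Three.", "Four. Five. Six."],
})


def elementwise(frame, func):
    # Hypothetical helper: applymap was renamed to DataFrame.map in pandas 2.1.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


def split_sentences(text):
    # Crude stand-in for nltk.sent_tokenize: split on full stops.
    return [part.strip() + "." for part in text.split(".") if part.strip()]


# map: translate values from one domain to another.
labels = df["A"].map({1: "a", 2: "b", 3: "c"})

# applymap: the same elementwise transformation across several columns.
stripped = elementwise(df[["B", "C"]], str.strip)

# apply: an arbitrary Python function that has no vectorised equivalent.
tokens = df["sentences"].apply(split_sentences)

print(labels.tolist())
print(stripped)
print(tokens.tolist())
```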
Summarising
Footnotes
map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
Answer 2
There’s great information in these answers, but I’m adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don’t have the rep to comment.
DataFrame.apply operates on entire rows or columns at a time.
DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.
There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa’s answer.
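One way to see the array-wise versus element-wise split is to record what each method actually passes to the function. The sketch below uses a hypothetical spy helper that notes whether the argument was a Series; the frame is made up.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
got_series = {}


def spy(label):
    # Hypothetical helper: build a function that records whether it was
    # handed a whole Series (True) or a single element (False).
    def func(value):
        got_series.setdefault(label, set()).add(isinstance(value, pd.Series))
        return value
    return func


df.apply(spy("DataFrame.apply"))     # called once per column
df["x"].apply(spy("Series.apply"))   # called once per element
df["x"].map(spy("Series.map"))       # called once per element

print(got_series)
```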
Answer 3
Adding to the other answers, in a Series there are also map and apply.
Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.
In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0 1
1 2
2 3
dtype: int64
In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
0 1
0 1 1
1 2 2
2 3 3
In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]:
0 0 1
1 1
dtype: int64
1 0 2
1 2
dtype: int64
2 0 3
1 3
dtype: int64
dtype: object
Also, if I had a function with side effects, such as “connect to a web server”, I’d probably use apply just for the sake of clarity.
series.apply(download_file_for_every_element)
Map can use not only a function, but also a dictionary or another series. Let’s say you want to manipulate permutations.
Take
1 2 3 4 5
2 1 4 5 3
The square of this permutation is
1 2 3 4 5
1 2 5 3 4
You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.
In [39]: p=pd.Series([1,0,3,4,2])
In [40]: p.map(p)
Out[40]:
0 0
1 1
2 4
3 2
4 3
dtype: int64
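The self-application above works because, when map is given a Series, each value is looked up in that Series' index. A short sketch that repeats the zero-based permutation and cross-checks it with plain NumPy indexing (the NumPy comparison is my addition, not part of the answer):

```python
import pandas as pd

# Zero-based version of the permutation 2 1 4 5 3.
p = pd.Series([1, 0, 3, 4, 2])

# Each value v of p is replaced by p[v], i.e. the permutation applied twice.
squared = p.map(p)

# The same composition with NumPy fancy indexing, as a cross-check.
values = p.to_numpy()
via_numpy = values[values]

print(squared.tolist())
print(via_numpy.tolist())
```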
Answer 4
@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation...
frame.apply(np.sqrt)
Out[102]:
b d e
Utah NaN 1.435159 NaN
Ohio 1.098164 0.510594 0.729748
Texas NaN 0.456436 0.697337
Oregon 0.359079 NaN NaN
frame.applymap(np.sqrt)
Out[103]:
b d e
Utah NaN 1.435159 NaN
Ohio 1.098164 0.510594 0.729748
Texas NaN 0.456436 0.697337
Oregon 0.359079 NaN NaN
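A likely explanation for the identical output is that np.sqrt is itself vectorised: it accepts a whole column and returns a column, so apply gives the same result as applymap. A scalar-only function such as math.sqrt shows the difference. The sketch below uses made-up positive values, and elementwise is a hypothetical helper that picks DataFrame.map (pandas 2.1+) or DataFrame.applymap.

```python
import math

import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0, 9.0], "d": [16.0, 25.0, 36.0]})


def elementwise(df, func):
    # Hypothetical helper: applymap was renamed to DataFrame.map in pandas 2.1.
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)


# np.sqrt is vectorised, so it can take whole columns at once.
by_column = frame.apply(np.sqrt)

# math.sqrt only accepts a single number, which is what applymap passes.
by_element = elementwise(frame, math.sqrt)

# Handing math.sqrt a whole column through apply fails.
try:
    frame.apply(math.sqrt)
    scalar_only_failed = False
except TypeError:
    scalar_only_failed = True

print(by_column)
print(by_element)
print(scalar_only_failed)
```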
Answer 5
Just wanted to point out, as I struggled with this for a bit
def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x
df.applymap(f)
df.describe()
This does not modify the dataframe itself; the result has to be reassigned:
df = df.applymap(f)
df.describe()
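A minimal sketch of the same point on a made-up frame: the original is untouched until the result is assigned. The clip call at the end is a suggestion of mine rather than part of the answer; DataFrame.clip performs the same capping without a Python-level function. The elementwise helper is hypothetical and picks DataFrame.map (pandas 2.1+) or DataFrame.applymap.

```python
import pandas as pd


def cap(x):
    # Same capping logic as f above.
    if x < 0:
        return 0
    if x > 100000:
        return 100000
    return x


def elementwise(df, func):
    # Hypothetical helper: applymap was renamed to DataFrame.map in pandas 2.1.
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)


df = pd.DataFrame({"a": [-5, 50, 200000], "b": [1, -1, 100001]})

result = elementwise(df, cap)  # returns a new DataFrame; df is unchanged

# Vectorised alternative for this particular function.
clipped = df.clip(lower=0, upper=100000)

print(df)
print(result)
print(clipped)
```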
Answer 6
Probably the simplest explanation of the difference between apply and applymap:
apply takes the whole column as a parameter and then assigns the result to this column.
applymap takes each separate cell value as a parameter and assigns the result back to this cell.
NB: If apply returns a single value, you will have this value instead of the column after assigning, and will eventually have just a row instead of a matrix.
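The NB can be illustrated with a small made-up frame: when the function returns a column, apply gives back a DataFrame of the same shape, and when it returns a single value per column, the result collapses to a Series with one entry per column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function returns a column, so the result keeps the frame's shape.
same_shape = df.apply(lambda col: col * 2)

# The function returns one value per column, so the result is a Series.
reduced = df.apply(lambda col: col.sum())

print(same_shape)
print(reduced)
```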
Answer 7
My understanding:
From the function point of view:
If the function needs to compare values within a column/row, use apply, e.g.: lambda x: x.max()-x.mean().
If the function is to be applied to each element:
1> If a single column/row is selected, use apply
2> If it applies to the entire dataframe, use applymap
majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)
def times10(x):
    if type(x) is int:
        x *= 10
    return x
df2.applymap(times10)
Answer 8
Based on the answer of cs95:
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Here are some examples:
In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [4]: frame
Out[4]:
b d e
Utah 0.129885 -0.475957 -0.207679
Ohio -2.978331 -1.015918 0.784675
Texas -0.256689 -0.226366 2.262588
Oregon 2.605526 1.139105 -0.927518
In [5]: myformat=lambda x: f'{x:.2f}'
In [6]: frame.d.map(myformat)
Out[6]:
Utah -0.48
Ohio -1.02
Texas -0.23
Oregon 1.14
Name: d, dtype: object
In [7]: frame.d.apply(myformat)
Out[7]:
Utah -0.48
Ohio -1.02
Texas -0.23
Oregon 1.14
Name: d, dtype: object
In [8]: frame.applymap(myformat)
Out[8]:
b d e
Utah 0.13 -0.48 -0.21
Ohio -2.98 -1.02 0.78
Texas -0.26 -0.23 2.26
Oregon 2.61 1.14 -0.93
In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
b d e
Utah 0.13 -0.48 -0.21
Ohio -2.98 -1.02 0.78
Texas -0.26 -0.23 2.26
Oregon 2.61 1.14 -0.93
In [10]: myfunc=lambda x: x**2
In [11]: frame.applymap(myfunc)
Out[11]:
b d e
Utah 0.016870 0.226535 0.043131
Ohio 8.870453 1.032089 0.615714
Texas 0.065889 0.051242 5.119305
Oregon 6.788766 1.297560 0.860289
In [12]: frame.apply(myfunc)
Out[12]:
b d e
Utah 0.016870 0.226535 0.043131
Ohio 8.870453 1.032089 0.615714
Texas 0.065889 0.051242 5.119305
Oregon 6.788766 1.297560 0.860289
Answer 9
FOMO:
The following example shows apply and applymap applied to a DataFrame.
The map function is something you apply to a Series only. You cannot apply map to a DataFrame.
The thing to remember is that apply can do anything applymap can, but apply has eXtra options.
The X factor options are axis and result_type, where result_type only works when axis=1 (for columns).
import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
               index=list('1234'))
print(df)
f = lambda x: np.log(x)
print(df.applymap(f))  # applied to every element of the dataframe
print(np.log(df))  # same result: the ufunc is applied to the whole dataframe
print(df.applymap(np.sum))  # no reduction: np.sum of a single element is that element
# apply takes extra options that applymap does not
print(df.apply(f))  # same as applymap
print(df.apply(sum, axis=1))  # reducing example: one value per row
print(df.apply(np.log, axis=1))  # element-wise function, so nothing is reduced
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand'))  # expand lists into columns
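To make the effect of axis and result_type easier to see, here is a small sketch on a frame with distinct values (invented for illustration). Without result_type, a list returned per row is kept as a single object in a Series. With result_type='expand', each list item becomes its own column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]})

# axis=1 passes each row to the function; sum reduces it to one value.
row_sums = df.apply(sum, axis=1)

# A list per row, default result_type: a Series whose values are lists.
as_lists = df.apply(lambda row: [row.min(), row.max()], axis=1)

# Same function with result_type='expand': list items become columns 0 and 1.
expanded = df.apply(
    lambda row: [row.min(), row.max()], axis=1, result_type="expand"
)

print(row_sums)
print(as_lists)
print(expanded)
```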
As a sidenote, the Series map function should not be confused with the Python map function.
The first one is applied on a Series, to map the values, and the second one to every item of an iterable.
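A short side-by-side of the two, on made-up data: the built-in map returns a lazy iterator over any iterable, while Series.map returns a new Series that keeps the original index.

```python
import pandas as pd

words = ["utah", "ohio"]

# Built-in map: lazy iterator over any iterable; list() materialises it.
builtin_result = list(map(str.upper, words))

# Series.map: returns a new Series aligned to the original index.
series_result = pd.Series(words, index=["a", "b"]).map(str.upper)

print(builtin_result)
print(series_result)
```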
Lastly, don’t confuse the dataframe apply method with the groupby apply method.