Conditional replace in pandas

Question: Conditional replace in pandas

I have a DataFrame, and I want to replace with zero the values in a particular column that exceed a threshold. I had thought this was a way of achieving this:

df[df.my_channel > 20000].my_channel = 0

If I copy the channel into a new data frame it’s simple:

df2 = df.my_channel 

df2[df2 > 20000] = 0

This does exactly what I want, but it does not seem to work when the channel is part of the original DataFrame.


Answer 0

The .ix indexer works for pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0 and was removed in pandas 1.0, so you should avoid it. Instead, use the .loc or .iloc indexers. You can solve this problem by:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

Or, in one line,

df.loc[df.my_channel > 20000, 'my_channel'] = 0

mask selects the rows in which df.my_channel > 20000 is True, and df.loc[mask, column_name] = 0 sets the value 0 in those rows of the column named column_name.

Update: In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
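
To make the loc approach concrete, here is a minimal sketch on made-up data (the values and the extra "other" column are assumptions for illustration only). It also shows one way iloc can be used after all: since iloc is purely positional, it should accept the mask as a plain boolean array together with the column's integer position, rather than a boolean Series.

```python
import pandas as pd

# Hypothetical sample data; only the column name my_channel comes from the question.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000], "other": [1, 2, 3, 4]})

mask = df["my_channel"] > 20000

# Label-based: boolean Series for the rows, column label for the column.
by_label = df.copy()
by_label.loc[mask, "my_channel"] = 0

# Position-based: plain boolean array for the rows, integer position for the column.
by_position = df.copy()
col_pos = by_position.columns.get_loc("my_channel")
by_position.iloc[mask.to_numpy(), col_pos] = 0

print(by_label)
print(by_position)
```

In both cases only the selected rows of my_channel change; the other column is left alone.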


Answer 1

Try

df.loc[df.my_channel > 20000, 'my_channel'] = 0

Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.


Answer 2

The np.where function works as follows:

df['X'] = np.where(df['Y']>=50, 'yes', 'no')

In your case you would want:

import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
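
As a rough end-to-end illustration of both snippets, the sketch below runs them on a small made-up DataFrame (the sample values and the over_limit column name are assumptions, not part of the question). np.where(condition, a, b) picks from a where the condition is True and from b otherwise, element by element.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})

# Label each row first, while the original values are still in place.
df["over_limit"] = np.where(df["my_channel"] > 20000, "yes", "no")

# Replace values above the threshold with 0 and keep the rest unchanged.
df["my_channel"] = np.where(df["my_channel"] > 20000, 0, df["my_channel"])

print(df)
```

Note that np.where returns a NumPy array, so the result is matched to the rows by position rather than by index label.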

Answer 3

Your original DataFrame is not updated because chained indexing may cause you to modify a copy rather than a view of your DataFrame. The docs give this advice:

When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
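
A small sketch of the difference, using made-up data: the chained form from the question assigns into the temporary object produced by df[...], so df itself is untouched, while a single loc call modifies df directly. The warnings filter is only there because some pandas versions emit a SettingWithCopyWarning for the chained line.

```python
import warnings
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence a possible SettingWithCopyWarning
    # Chained form: df[mask] builds a new object and the assignment lands on it.
    df[df.my_channel > 20000].my_channel = 0

chained_result = df["my_channel"].tolist()  # still the original values

# Single indexing call: rows and column are selected together on df itself.
df.loc[df.my_channel > 20000, "my_channel"] = 0
loc_result = df["my_channel"].tolist()

print(chained_result)
print(loc_result)
```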

You have a few alternatives:

loc + Boolean indexing

loc may be used for setting values and supports Boolean masks:

df.loc[df['my_channel'] > 20000, 'my_channel'] = 0

mask + Boolean indexing

You can assign to your series:

df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)

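Or you can update your series in place. Note that this relies on df['my_channel'] returning a view of df; with Copy-on-Write enabled (the default planned for pandas 3.0), the in-place call acts on a temporary object and df is left unchanged, so the assignment form above is the safer choice: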

df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)

np.where + Boolean indexing

You can also use NumPy, passing your original series as the value to use where the condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only the specified values.

df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])
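
As a quick sanity check, the following sketch applies the loc, mask, and np.where alternatives to separate copies of the same made-up DataFrame and collects the results so they can be compared. The sample values and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
original = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})
cond = original["my_channel"] > 20000

# loc + Boolean indexing
via_loc = original.copy()
via_loc.loc[cond, "my_channel"] = 0

# mask + Boolean indexing (assignment form)
via_mask = original.copy()
via_mask["my_channel"] = via_mask["my_channel"].mask(cond, 0)

# np.where + Boolean indexing
via_where = original.copy()
via_where["my_channel"] = np.where(cond, 0, via_where["my_channel"])

results = [d["my_channel"].tolist() for d in (via_loc, via_mask, via_where)]
print(results)
```

All three should give the same column here; the choice between them is mostly a matter of readability.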

Answer 4

I would use a lambda function on a Series of the DataFrame, like this:

f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)

I do not assert that this is an efficient way, but it works fine.
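
Note that the lambda above maps every value to either 0 or 1, so the values below the threshold are not preserved. A sketch adapted to the question, assuming the goal is to zero out values above 20000 and keep everything else, might look like this (the helper name clip_to_zero and its limit parameter are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})

def clip_to_zero(x, limit=20000):
    """Return 0 for values above the limit, otherwise the value itself."""
    return 0 if x > limit else x

# map calls the function once per element, which is simple but not vectorized.
df["my_channel"] = df["my_channel"].map(clip_to_zero)

print(df)
```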


Answer 5

Try this:

df.my_channel = df.my_channel.where(df.my_channel <= 20000, other=0)

or

df.my_channel = df.my_channel.mask(df.my_channel > 20000, other=0)
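
The two forms are complements of each other: where keeps values where the condition is True, and mask replaces values where the condition is True. One subtle difference shows up with missing values, because any comparison with NaN is False. The sketch below uses a made-up Series containing a NaN to illustrate this.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series([100.0, 25000.0, np.nan, 40000.0], name="my_channel")

# where: keep values where s <= 20000 holds; NaN fails the test, so it becomes 0.
kept_where = s.where(s <= 20000, other=0)

# mask: replace values where s > 20000 holds; NaN fails the test, so it stays NaN.
kept_mask = s.mask(s > 20000, other=0)

print(kept_where.tolist())
print(kept_mask.tolist())
```

If the column can contain NaN and those should be left untouched, the mask form is the one that preserves them.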