标签归档:resampling

熊猫每隔n行

问题:熊猫每隔n行

Dataframe.resample()仅适用于时间序列数据。我找不到从非时间序列数据中获取第n行的方法。最好的方法是什么?

Dataframe.resample() works only with timeseries data. I cannot find a way of getting every nth row from non-timeseries data. What is the best method?


回答 0

我会使用iloc,它根据整数位置并遵循常规python语法获取行/列切片。

df.iloc[::5, :]

I’d use iloc, which takes a row/column slice, both based on integer position and following normal python syntax.

df.iloc[::5, :]

回答 1

尽管@chrisb接受的答案确实回答了该问题,但我想在此添加以下内容。

我用来获取nth数据或删除nth行的一种简单方法如下:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd raw starting from 0

这种基于算术的采样具有实现甚至更复杂的行选择的能力。

当然,这假设您有一index列从0开始的有序,连续的整数

Though @chrisb’s accepted answer does answer the question, I would like to add to it the following.

A simple method I use to get the nth data or drop the nth row is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd raw starting from 0

This arithmetic based sampling has the ability to enable even more complex row-selections.

This assumes, of course, that you have an index column of ordered, consecutive, integers starting at 0.


回答 2

对于接受的答案,有一个甚至更简单的解决方案,涉及直接调用df.__getitem__

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

例如,要获取每2行,您可以执行

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x

还有GroupBy.first/ GroupBy.head,您对索引进行分组:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

索引被步幅(在本例中为2)划分为底数。如果索引是非数字的,请执行

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

There is an even simpler solution to the accepted answer that involves directly invoking df.__getitem__.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2 rows, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x

There’s also GroupBy.first/GroupBy.head, you group on the index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divved by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

回答 3

我也有类似的要求,但我希望特定组中的第n个物品。这就是我解决的方法。

groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]

I had a similar requirement, but I wanted the n’th item in a particular group. This is how I solved it.

groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]