问题:根据列表索引选择熊猫行

我有一个数据框df:

   20060930  10.103       NaN     10.103   7.981
   20061231  15.915       NaN     15.915  12.686
   20070331   3.196       NaN      3.196   2.710
   20070630   7.907       NaN      7.907   6.459

然后,我想选择列表中指示的具有某些序列号的行,假设这里是[1,3],然后向左移:

   20061231  15.915       NaN     15.915  12.686
   20070630   7.907       NaN      7.907   6.459

如何或什么功能可以做到这一点?

I have a dataframe df :

   20060930  10.103       NaN     10.103   7.981
   20061231  15.915       NaN     15.915  12.686
   20070331   3.196       NaN      3.196   2.710
   20070630   7.907       NaN      7.907   6.459

Then I want to select rows with certain sequence numbers which indicated in a list, suppose here is [1,3], then left:

   20061231  15.915       NaN     15.915  12.686
   20070630   7.907       NaN      7.907   6.459

How or what function can do that ?


回答 0

List = [1, 3]
df.ix[List]

应该做的把戏!当我用数据帧建立索引时,我总是使用.ix()方法。它是如此容易和灵活…

UPDATE 这不再是可接受的索引编制方法。该ix方法已弃用。使用.iloc基于整数索引和.loc基于标签索引。

ind_list = [1, 3]
df.ix[ind_list]

should do the trick! When I index with data frames I always use the .ix() method. Its so much easier and more flexible…

UPDATE This is no longer the accepted method for indexing. The ix method is deprecated. Use .iloc for integer based indexing and .loc for label based indexing.


回答 1

您还可以使用iloc:

df.iloc[[1,3],:]

如果由于先前的计算,如果数据框中的索引与行的顺序不对应,则此方法将无效。在这种情况下,请使用:

df.index.isin([1,3])

…如其他回应所建议。

you can also use iloc:

df.iloc[[1,3],:]

This will not work if the indexes in your dataframe do not correspond to the order of the rows due to prior computations. In that case use:

df.index.isin([1,3])

… as suggested in other responses.


回答 2

另一种方法(尽管它是更长的代码),但是比上面的代码要快。使用%timeit函数检查它:

df[df.index.isin([1,3])]

PS:您找出原因

在此处输入图片说明

Another way (although it is a longer code) but it is faster than the above codes. Check it using %timeit function:

df[df.index.isin([1,3])]

PS: You figure out the reason

enter image description here


回答 3

对于大型数据集,通过skiprows参数仅读取选定的行会节省内存。

pred = lambda x: x not in [1, 3]
pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)

现在,这将从文件中返回一个DataFrame,该文件将跳过除1和3之外的所有行。


细节

文档

skiprows :类似于列表或整数或可调用的列表,默认 None

如果可调用,则将针对行索引评估可调用函数,如果应跳过该行,则返回True,否则返回False。有效的可调用参数的一个示例是lambda x: x in [0, 2]

此功能适用于熊猫0.20.0+版本。另请参见相应的问题相关文章

For large datasets, it is memory efficient to read only selected rows via the skiprows parameter.

Example

pred = lambda x: x not in [1, 3]
pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)

This will now return a DataFrame from a file that skips all rows except 1 and 3.


Details

From the docs:

skiprows : list-like or integer or callable, default None

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2]

This feature works in version pandas 0.20.0+. See also the corresponding issue and a related post.


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。