我很好奇,为什么df[2]不支持,而df.ix[2]与df[2:3]这两个工作。 In [26]: df.ix[2] Out[26]: A 1.027680 B 1.514210 C -1.466963 D -0.162339 Name: 2000-01-03 00:00:00 In [27]: df[2:3] Out[27]: A B C D 2000-01-03 1.02768 1.51421 -1.466963 -0.162339 我希望df[2]以df[2:3]与Python索引约定一致的方式进行工作。是否有设计原因不支持按单个整数索引行?
问题:通过整数索引选择一行熊猫系列/数据框
我很好奇,为什么df[2]
不支持,而df.ix[2]
与df[2:3]
这两个工作。
In [26]: df.ix[2]
Out[26]:
A 1.027680
B 1.514210
C -1.466963
D -0.162339
Name: 2000-01-03 00:00:00
In [27]: df[2:3]
Out[27]:
A B C D
2000-01-03 1.02768 1.51421 -1.466963 -0.162339
我希望df[2]
以df[2:3]
与Python索引约定一致的方式进行工作。是否有设计原因不支持按单个整数索引行?
I am curious as to why df[2]
is not supported, while df.ix[2]
and df[2:3]
both work.
In [26]: df.ix[2]
Out[26]:
A 1.027680
B 1.514210
C -1.466963
D -0.162339
Name: 2000-01-03 00:00:00
In [27]: df[2:3]
Out[27]:
A B C D
2000-01-03 1.02768 1.51421 -1.466963 -0.162339
I would expect df[2]
to work the same way as df[2:3]
to be consistent with Python indexing convention. Is there a design reason for not supporting indexing row by single integer?
回答 0
回显@HYRY,请参阅0.11中的新文档
http://pandas.pydata.org/pandas-docs/stable/indexing.html
在这里,我们有了新的运算符,.iloc
以明确支持仅整数索引,并且.loc
明确支持仅标签索引
例如,想象这种情况
In [1]: df = pd.DataFrame(np.random.rand(5,2),index=range(0,10,2),columns=list('AB'))
In [2]: df
Out[2]:
A B
0 1.068932 -0.794307
2 -0.470056 1.192211
4 -0.284561 0.756029
6 1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]
Out[5]:
A B
4 -0.284561 0.756029
In [6]: df.loc[[2]]
Out[6]:
A B
2 -0.470056 1.192211
[]
仅对行进行切片(按标签位置)
echoing @HYRY, see the new docs in 0.11
http://pandas.pydata.org/pandas-docs/stable/indexing.html
Here we have new operators, .iloc
to explicity support only integer indexing, and .loc
to explicity support only label indexing
e.g. imagine this scenario
In [1]: df = pd.DataFrame(np.random.rand(5,2),index=range(0,10,2),columns=list('AB'))
In [2]: df
Out[2]:
A B
0 1.068932 -0.794307
2 -0.470056 1.192211
4 -0.284561 0.756029
6 1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]
Out[5]:
A B
4 -0.284561 0.756029
In [6]: df.loc[[2]]
Out[6]:
A B
2 -0.470056 1.192211
[]
slices the rows (by label location) only
回答 1
DataFrame索引运算符的主要目的[]
是选择列。
当索引运算符传递字符串或整数时,它将尝试查找具有该特定名称的列并将其作为Series返回。
因此,在上述问题中:df[2]
搜索与整数值匹配的列名2
。该列不存在,并且KeyError
引发a。
使用切片符号时,DataFrame索引运算符完全更改行为以选择行
奇怪的是,当给定切片时,DataFrame索引运算符选择行,并且可以按整数位置或按索引标签来选择行。
df[2:3]
这将从整数位置为2的行开始切为3,最后一个元素除外。因此,只需一行。下面的代码选择从整数位置6开始的行,直到每第三行从20开始但不包括20的行。
df[6:20:3]
如果DataFrame索引中包含字符串,则还可以使用由字符串标签组成的切片。有关更多详细信息,请参见.iloc与.loc上的此解决方案。
我几乎从未将这种切片符号与索引运算符一起使用,因为它不是显式的,而且几乎从未使用过。按行切片时,请坚持使用.loc/.iloc
。
The primary purpose of the DataFrame indexing operator, []
is to select columns.
When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.
So, in the question above: df[2]
searches for a column name matching the integer value 2
. This column does not exist and a KeyError
is raised.
The DataFrame indexing operator completely changes behavior to select rows when slice notation is used
Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.
df[2:3]
This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.
df[6:20:3]
You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.
I almost never use this slice notation with the indexing operator as its not explicit and hardly ever used. When slicing by rows, stick with .loc/.iloc
.
回答 2
回答 3
要基于索引访问熊猫表,还可以考虑使用numpy.as_array选项将表转换为Numpy数组,方法如下:
np_df = df.as_matrix()
然后
np_df[i]
会工作。
To index-based access to the pandas table, one can also consider numpy.as_array option to convert the table to Numpy array as
np_df = df.as_matrix()
and then
np_df[i]
would work.
回答 4
您可以看一下源代码。
DataFrame
具有对_slice()
进行切片的私有函数DataFrame
,并且它允许参数axis
确定要切片的轴。在__getitem__()
对DataFrame
不设置轴,同时调用_slice()
。因此_slice()
,默认情况下将其切片为轴0。
您可以进行一个简单的实验,这可能对您有所帮助:
print df._slice(slice(0, 2))
print df._slice(slice(0, 2), 0)
print df._slice(slice(0, 2), 1)
You can take a look at the source code .
DataFrame
has a private function _slice()
to slice the DataFrame
, and it allows the parameter axis
to determine which axis to slice. The __getitem__()
for DataFrame
doesn't set the axis while invoking _slice()
. So the _slice()
slice it by default axis 0.
You can take a simple experiment, that might help you:
print df._slice(slice(0, 2))
print df._slice(slice(0, 2), 0)
print df._slice(slice(0, 2), 1)
回答 5
您可以像这样遍历数据帧。
for ad in range(1,dataframe_c.size):
print(dataframe_c.values[ad])
you can loop through the data frame like this .
for ad in range(1,dataframe_c.size):
print(dataframe_c.values[ad])
本文由 Python 实用宝典 作者:Python实用宝典 发表,其版权均为 Python 实用宝典 所有,文章内容系作者个人观点,不代表 Python 实用宝典 对观点赞同或支持。如需转载,请注明文章来源。