Pandas - Get the first row value of a given column

Question 1

This seems like a ridiculously easy question… but I’m not seeing the easy answer I was expecting.

So, how do I get the value at the nth row of a given column in Pandas? (I am particularly interested in the first row, but would be interested in a more general practice as well.)

For example, let’s say I want to pull the 1.2 value in Btime as a variable.

What's the right way to do this?

df_test =

  ATime   X   Y   Z   Btime  C   D   E
0    1.2  2  15   2    1.2  12  25  12
1    1.4  3  12   1    1.3  13  22  11
2    1.5  1  10   6    1.4  11  20  16
3    1.6  2   9  10    1.7  12  29  12
4    1.9  1   1   9    1.9  11  21  19
5    2.0  0   0   0    2.0   8  10  11
6    2.4  0   0   0    2.4  10  12  15

Question 2

To select the ith row, use iloc:

In [31]: df_test.iloc[0]
Out[31]: 
ATime     1.2
X         2.0
Y        15.0
Z         2.0
Btime     1.2
C        12.0
D        25.0
E        12.0
Name: 0, dtype: float64

To select the ith value in the Btime column you could use:

In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2
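As a self-contained sketch of the same lookup, the snippet below rebuilds a small stand-in for df_test (only two of its columns and three of its rows, which is an assumption made for brevity) and reads the nth value of Btime in two ways. The variable names first_btime and same_value are illustrative only.

```python
import pandas as pd

# A small stand-in for df_test (two columns, first three rows only).
df_test = pd.DataFrame({
    "ATime": [1.2, 1.4, 1.5],
    "Btime": [1.2, 1.3, 1.4],
})

n = 0  # position of the row we want, counting from 0

# Select the column first, then the nth position within it.
first_btime = df_test["Btime"].iloc[n]

# .iat is the scalar-only positional accessor; it takes (row, column) positions.
same_value = df_test.iat[n, df_test.columns.get_loc("Btime")]

print(first_btime, same_value)
```

Changing n to any other valid position retrieves that row's value instead.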

There is a difference between `df_test['Btime'].iloc[0]` (recommended) and `df_test.iloc[0]['Btime']`:

DataFrames store data in column-based blocks (where each block has a single dtype). If you select by column first, a view can be returned (which is quicker than returning a copy) and the original dtype is preserved. In contrast, if you select by row first, and if the DataFrame has columns of different dtypes, then Pandas copies the data into a new Series of object dtype. So selecting columns is a bit faster than selecting rows. Thus, although df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit more efficient.
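The dtype effect described above can be seen with a small frame that mixes a float column and a string column. This is only a sketch: the frame mixed and its label column are made up for illustration and are not part of df_test.

```python
import pandas as pd

# Hypothetical frame with two different column dtypes.
mixed = pd.DataFrame({
    "Btime": [1.2, 1.3, 1.4],
    "label": ["a", "b", "c"],
})

# Column first: the Series keeps the column's own dtype.
col_first = mixed["Btime"]

# Row first: the row spans both columns, so pandas has to build
# a new Series with a dtype that can hold every value.
row_first = mixed.iloc[0]

print(col_first.dtype)
print(row_first.dtype)
```

Both routes still give the same scalar for Btime; only the intermediate Series differs.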

There is a big difference between the two when it comes to assignment. df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] may not. See below for an explanation of why. Because a subtle difference in the order of indexing makes a big difference in behavior, it is better to use single indexing assignment:

df.iloc[0, df.columns.get_loc('Btime')] = x

`df.iloc[0, df.columns.get_loc('Btime')] = x` (recommended):

The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,

df.loc[df.index[n], 'Btime'] = x

or

df.iloc[n, df.columns.get_loc('Btime')] = x

The latter method is a bit faster, because df.loc has to convert the row and column labels to positional indices, so there is a little less conversion necessary if you use df.iloc instead.
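A minimal runnable sketch of both single-indexing assignments, using the same foo/bar frame as the examples in this answer (the replacement values 99 and "Z" are arbitrary):

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": list("ABC"), "bar": [100, 100, 100]},
    index=[0, 2, 1],  # deliberately not in sorted order
)

n = 1  # position of the row to modify

# Label-based: translate position n into its row label first.
df.loc[df.index[n], "bar"] = 99

# Position-based: translate the column label into its position first.
df.iloc[n, df.columns.get_loc("foo")] = "Z"

print(df)
```

Each statement indexes df exactly once, so there is no intermediate object whose view-or-copy status could matter.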

`df['Btime'].iloc[0] = x` works, but is not recommended:

Although this works, it is taking advantage of the way DataFrames are currently implemented. There is no guarantee that Pandas will keep working this way in the future. In particular, it is taking advantage of the fact that (currently) df['Btime'] always returns a view (not a copy), so df['Btime'].iloc[n] = x can be used to assign a new value at the nth location of the Btime column of df.

Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally raise a SettingWithCopyWarning, even though in this case the assignment succeeds in modifying df:

In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

In [26]: df
Out[26]: 
  foo  bar
0   A   99  <-- assignment succeeded
2   B  100
1   C  100

`df.iloc[0]['Btime'] = x` does not work:

In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:

In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [67]: df
Out[67]: 
  foo  bar
0   A   99  <-- assignment failed
2   B  100
1   C  100

Warning: I had previously suggested df_test.ix[i, 'Btime']. But this is not guaranteed to give you the ith value since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. For example,

In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])

In [2]: df
Out[2]: 
  foo
0   A
2   B
1   C

In [4]: df.ix[1, 'foo']
Out[4]: 'C'
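Since .ix was deprecated in pandas 0.20 and removed in pandas 1.0, the same label-versus-position distinction is easier to see with the two explicit indexers. The sketch below reuses the frame from the example above; by_label and by_position are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"foo": list("ABC")}, index=[0, 2, 1])

# .loc looks up the row whose label is 1 (the third row here).
by_label = df.loc[1, "foo"]

# .iloc looks up the row at position 1 (the second row here).
by_position = df["foo"].iloc[1]

print(by_label, by_position)
```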

Question 3

Note that the answer from @unutbu is correct for reading a value. When you want to set the value to something new, however, it will not work if your dataframe is a view.

In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Another approach that will consistently work with both setting and getting is:

In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
  foo  bar
0   A   99
2   B  100
1   C  100

Question 4

Another way to do this:

first_value = df['Btime'].values[0]

This way seems to be faster than using .iloc:

In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
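To repeat the comparison outside IPython, a rough sketch with the standard timeit module might look like the following. The 100-row frame is an assumption (the original post does not show the frame being timed), and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical frame, large enough to have a row at position 20.
df = pd.DataFrame({"Btime": [x / 10 for x in range(100)]})

# Three ways to read the value at position 20.
via_values = df["Btime"].values[20]
via_numpy = df["Btime"].to_numpy()[20]  # to_numpy() is the newer spelling of .values
via_iloc = df["Btime"].iloc[20]

# Time 1000 repetitions of each access pattern.
t_values = timeit.timeit(lambda: df["Btime"].values[20], number=1000)
t_iloc = timeit.timeit(lambda: df["Btime"].iloc[20], number=1000)

print(f".values: {t_values:.6f} s, .iloc: {t_iloc:.6f} s for 1000 lookups")
```

All three reads return the same number; only the overhead differs.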

Question 5

df.iloc[0].head(1): only the first element of the first row.
df.iloc[0]: the entire first row.

Question 6

In general, to pick the first N rows of column J (where J is the column's integer position) from a pandas dataframe, one way is:

data = dataframe.iloc[0:N, J]
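A runnable sketch of selecting the first N rows of one column, assuming J is an integer column position. The frame and its column names a and b are invented for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
})

N = 2  # number of rows to keep
J = 1  # integer position of the column ("b" here)

# Purely positional selection: rows 0..N-1 of column J.
data = dataframe.iloc[:N, J]

# The same selection when the column is known by name.
same_by_name = dataframe["b"].head(N)

print(data)
```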

Question 7

To get, for example, the value in the first row (position 0) of the column 'test', you can use

df[['test']].values[0][0]

because df[['test']].values[0] on its own gives back an array.
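The need for the second [0] comes from the shape of the underlying array, which the following sketch makes visible. The three-row frame is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"test": [7, 8, 9]})

# Double brackets select a one-column DataFrame, so .values is 2-D.
two_d = df[["test"]].values

# Single brackets select a Series, so .values is 1-D.
one_d = df["test"].values

scalar_a = two_d[0][0]  # first row, then first (only) column
scalar_b = one_d[0]     # a single index is enough here

print(two_d.shape, one_d.shape, scalar_a, scalar_b)
```

Selecting with single brackets therefore avoids the extra index.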

Question 8

Another way of getting the first row and preserving the index, which applies only when df has a DatetimeIndex because df.first selects by date offset:

x = df.first('d') # Returns the first day. '3d' gives first three days.
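For a frame with any kind of index, a positional alternative that also keeps the index is to ask for a one-row DataFrame rather than a Series. This sketch swaps in iloc with a list and head(1) in place of df.first; the frame and its index labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Btime": [1.2, 1.3, 1.4]}, index=[10, 20, 30])

# Passing a list of positions returns a DataFrame, so the row keeps its label.
first_row_frame = df.iloc[[0]]

# head(1) gives the same one-row DataFrame.
same = df.head(1)

print(first_row_frame)
```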
