Question: Can pandas use a column as the index?
I have a spreadsheet like this:
Locality      2005    2006    2007    2008    2009
ABBOTSFORD    427000  448000  602500  600000  638500
ABERFELDIE    534000  600000  735000  710000  775000
AIREYS INLET  459000  440000  430000  517500  512500
I don't want to manually swap the columns with the rows. Is it possible to use pandas to read the data into lists like this:
data['ABBOTSFORD']=[427000,448000,602500,600000,638500]
data['ABERFELDIE']=[534000,600000,735000,710000,775000]
data['AIREYS INLET']=[459000,440000,430000,517500,512500]
Answer 0
Yes, with set_index you can make Locality your row index.
data.set_index('Locality', inplace=True)
If inplace=True is not provided, set_index leaves the original dataframe unchanged and returns the modified dataframe as a result.
Example:
> import pandas as pd
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])
> df
     Locality    2005    2006
0  ABBOTSFORD  427000  448000
1  ABERFELDIE  534000  600000
> df.set_index('Locality', inplace=True)
> df
              2005    2006
Locality
ABBOTSFORD  427000  448000
ABERFELDIE  534000  600000
> df.loc['ABBOTSFORD']
2005    427000
2006    448000
Name: ABBOTSFORD, dtype: int64
> df.loc['ABBOTSFORD'][2005]
427000
> df.loc['ABBOTSFORD'].values
array([427000, 448000])
> df.loc['ABBOTSFORD'].tolist()
[427000, 448000]
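As a small sketch of the non-inplace form: the variable name indexed below is only illustrative and not part of the answer above. It shows that set_index returns a new dataframe while df keeps its Locality column, and that .loc can take the row label and the column label in one call.

```python
import pandas as pd

df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                   ['ABERFELDIE', 534000, 600000]],
                  columns=['Locality', 2005, 2006])

# Without inplace=True, set_index returns a new DataFrame; df is unchanged.
indexed = df.set_index('Locality')

# Row label and column label passed together in a single .loc call.
price_2005 = indexed.loc['ABBOTSFORD', 2005]

# All values for one locality as a plain Python list.
abbotsford = indexed.loc['ABBOTSFORD'].tolist()
```

Keeping the result in a separate variable leaves the original dataframe available if the Locality column is still needed later.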
Answer 1
You can change the index using set_index, as already explained.
You don't need to manually swap rows with columns: pandas has a transpose method (data.T) that does it for you:
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])
> newdf = df.set_index('Locality').T
> newdf
Locality  ABBOTSFORD  ABERFELDIE
2005          427000      534000
2006          448000      600000
Then you can fetch a dataframe column's values and convert them to a list:
> newdf['ABBOTSFORD'].values.tolist()
[427000, 448000]
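If the goal is the whole mapping from the question (every locality to its list of values), one possible sketch is to call to_dict with orient='list' on the transposed dataframe. The variable name data is borrowed from the question; treat this as one way to do it rather than the only one.

```python
import pandas as pd

df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                   ['ABERFELDIE', 534000, 600000]],
                  columns=['Locality', 2005, 2006])

# Transpose so each locality becomes a column, then map
# each column label to the list of values in that column.
data = df.set_index('Locality').T.to_dict(orient='list')

abbotsford = data['ABBOTSFORD']
```

This produces an ordinary Python dict, so lookups such as data['ABERFELDIE'] no longer need pandas.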
Answer 2
You can set the index column with the index_col parameter when reading the spreadsheet in pandas.
Here is my solution:
First, import pandas as pd:
import pandas as pd
Read the file with pd.read_excel() (if your data is in a spreadsheet) and set the index to 'Locality' by specifying the index_col parameter:
df = pd.read_excel('testexcel.xlsx', index_col=0)
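At this stage, if you get a 'no module named xlrd' error, install it using pip install xlrd. (Recent pandas versions read .xlsx files with openpyxl instead, so there the missing module would be openpyxl.)
For visual inspection, display the first rows of the dataframe using df.head(); Locality should appear as the index.
Now you can fetch the values of the desired rows or columns of the dataframe and print them.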
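A minimal sketch of that last step, under one loud assumption: an in-memory CSV with the question's data stands in for testexcel.xlsx so the example runs without a file, and pd.read_csv is used because it accepts the same index_col parameter. The names csv_text, aireys and year_2005 are illustrative only.

```python
import io
import pandas as pd

# Assumption: this in-memory CSV stands in for the spreadsheet file.
csv_text = """Locality,2005,2006,2007,2008,2009
ABBOTSFORD,427000,448000,602500,600000,638500
ABERFELDIE,534000,600000,735000,710000,775000
AIREYS INLET,459000,440000,430000,517500,512500
"""

# index_col=0 makes the first column (Locality) the row index,
# in the same way as in the read_excel call.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Values for one locality across all years.
aireys = df.loc['AIREYS INLET'].tolist()

# Values for one year across all localities. read_csv keeps header
# labels as strings, so the column label here is '2005', not 2005.
year_2005 = df['2005'].tolist()

print(aireys)
print(year_2005)
```

With a real Excel file the column labels may come back as integers rather than strings, so it is worth checking df.columns before selecting a year.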