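Find the maximum value of a column and return the corresponding row values using Pandas

Question: Find the maximum value of a column and return the corresponding row values using Pandas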

Using Python Pandas I am trying to find the Country & Place with the maximum value.

This returns the maximum value:

data.groupby(['Country','Place'])['Value'].max()

But how do I get the corresponding Country and Place name?


Answer 0

Assuming df has a unique index, this gives the row with the maximum value:

In [34]: df.loc[df['Value'].idxmax()]
Out[34]: 
Country        US
Place      Kansas
Value         894
Name: 7

Note that idxmax returns index labels. So if the DataFrame has duplicates in the index, the label may not uniquely identify the row, so df.loc may return more than one row.

Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):

df = df.reset_index()
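
As a minimal sketch of the point about non-unique indexes, assuming a small made-up DataFrame (the rows and the duplicated index label are illustrative, not the asker's data), the following shows df.loc returning two rows for the label from idxmax, and a single row once the index is reset. It uses reset_index(drop=True), which discards the old index rather than keeping it as a column:

```python
import pandas as pd

# Hypothetical data; the index deliberately contains a duplicate label (0).
df = pd.DataFrame(
    {
        "Country": ["Spain", "UK", "US", "US"],
        "Place": ["Manchester", "London", "Mchigan", "NewYork"],
        "Value": [512, 778, 854, 562],
    },
    index=[0, 1, 0, 2],
)

# idxmax returns the label of the maximum, which is 0 here,
# and that label matches two rows.
label = df["Value"].idxmax()
ambiguous = df.loc[label]  # a DataFrame with two rows

# After resetting the index, every label is unique again.
unique_df = df.reset_index(drop=True)
best_row = unique_df.loc[unique_df["Value"].idxmax()]  # a single row (Series)
print(best_row["Country"], best_row["Place"], best_row["Value"])
```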

Answer 1

df[df['Value']==df['Value'].max()]

This returns every row whose Value equals the maximum, so ties produce more than one row.
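
A small sketch of how the boolean mask behaves when the maximum occurs more than once; the data is invented for illustration:

```python
import pandas as pd

# Hypothetical data with a tie for the maximum Value.
df = pd.DataFrame(
    {
        "Country": ["Spain", "UK", "US"],
        "Place": ["Manchester", "London", "Kansas"],
        "Value": [512, 894, 894],
    }
)

# The mask keeps every row whose Value equals the column maximum.
top_rows = df[df["Value"] == df["Value"].max()]
print(top_rows)
```

If only one row is wanted even when there are ties, an idxmax-based lookup returns just the first occurrence.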


Answer 2

Country and place form the index of the resulting Series. If you don't want them as the index, pass as_index=False:

df.groupby(['country','place'], as_index=False)['value'].max()

Edit:

It seems that you want the place with the maximum value for every country. The following code does that (irow has been removed from pandas, so iloc is used instead):

df.groupby("country").apply(lambda g: g.iloc[g["value"].argmax()])
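
An alternative way to get the top row per country, sketched here with made-up data, is to combine a grouped idxmax with loc. This is one possible approach rather than the answerer's own, and it assumes the DataFrame has a unique index:

```python
import pandas as pd

# Hypothetical data using the lowercase column names from this answer.
df = pd.DataFrame(
    {
        "country": ["UK", "UK", "US", "US"],
        "place": ["London", "Leeds", "Mchigan", "NewYork"],
        "value": [778, 300, 854, 562],
    }
)

# idxmax per group gives the index label of each country's largest value;
# loc then pulls the full rows for those labels.
best_per_country = df.loc[df.groupby("country")["value"].idxmax()]
print(best_per_country)
```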

Answer 3

I think the easiest way to return the row with the maximum value is to get its position. In current pandas versions, argmax() returns the integer position of the row with the largest value (idxmax() returns the index label instead).

index = df.Value.argmax()

That position can then be used with iloc to get the first two columns of that row:

df.iloc[df.Value.argmax(), 0:2]
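
To make the position/label distinction concrete, here is a short sketch assuming pandas 1.0 or later and an invented DataFrame with string index labels:

```python
import pandas as pd

# Hypothetical data with non-integer index labels.
df = pd.DataFrame(
    {
        "Country": ["Spain", "UK", "US"],
        "Place": ["Manchester", "London", "Kansas"],
        "Value": [512, 778, 894],
    },
    index=["a", "b", "c"],
)

pos = df["Value"].argmax()    # integer position of the maximum
label = df["Value"].idxmax()  # index label of the maximum

# iloc takes positions, loc takes labels; both select the same row here.
row_by_pos = df.iloc[pos, 0:2]
row_by_label = df.loc[label, ["Country", "Place"]]
print(row_by_pos)
```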

Answer 4

Use the index attribute of the grouped result. Note that I didn't type all the rows in the example.

In [14]: df = data.groupby(['Country','Place'])['Value'].max()

In [15]: df.index
Out[15]: 
MultiIndex
[Spain  Manchester, UK     London    , US     Mchigan   ,        NewYork   ]

In [16]: df.index[0]
Out[16]: ('Spain', 'Manchester')

In [17]: df.index[1]
Out[17]: ('UK', 'London')

You can also get the value by that index:

In [21]: for index in df.index:
   ....:     print(index, df[index])
   ....:
('Spain', 'Manchester') 512
('UK', 'London') 778
('US', 'Mchigan') 854
('US', 'NewYork') 562
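
Rather than looping, the index entry of the largest value can be read off directly. A minimal sketch, assuming data holds rows like the ones printed above (reconstructed here for illustration only):

```python
import pandas as pd

# Hypothetical reconstruction of the rows shown in the listing.
data = pd.DataFrame(
    {
        "Country": ["Spain", "UK", "US", "US"],
        "Place": ["Manchester", "London", "Mchigan", "NewYork"],
        "Value": [512, 778, 854, 562],
    }
)

grouped = data.groupby(["Country", "Place"])["Value"].max()

# On a Series with a MultiIndex, idxmax returns the index entry as a tuple.
country, place = grouped.idxmax()
print(country, place, grouped.max())
```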

Edit

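Sorry, I misunderstood what you wanted. Try the following:

In [52]: s=data.max()

In [53]: print('%s, %s, %s' % (s['Country'], s['Place'], s['Value']))
US, NewYork, 854

Note that data.max() takes the maximum of each column independently, so the Country and Place printed here are not necessarily from the row that holds the maximum Value (in the listing above, 854 belongs to Mchigan, not NewYork).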

Answer 5

In order to print the Country and Place with maximum value, use the following line of code.

print(df[['Country', 'Place']][df.Value == df.Value.max()])

Answer 6

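My solution for finding the rows that hold the maximum value of each column (.ix has been removed from pandas, so .loc is used here; this assumes all columns are numeric):

df.loc[df.idxmax()]

and likewise for the minimum:

df.loc[df.idxmin()]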
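
A brief sketch of what this per-column lookup returns, using an invented all-numeric DataFrame (the column names a and b are placeholders):

```python
import pandas as pd

# Hypothetical numeric-only data.
df = pd.DataFrame({"a": [1, 9, 3], "b": [7, 2, 8]})

# One index label per column: where that column reaches its maximum.
max_labels = df.idxmax()
rows_with_max = df.loc[max_labels]

# The same idea for the minimum.
min_labels = df.idxmin()
rows_with_min = df.loc[min_labels]
print(rows_with_max)
print(rows_with_min)
```

The result has one row per column of the original frame, so for the question's single Value column it is usually simpler to call idxmax on that column alone.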

Answer 7

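I'd recommend using nlargest for better performance and shorter code:

import pandas

df[col_name].value_counts().nlargest(n=1)

Note that this returns the most frequent value in col_name together with its count, not the row holding the largest value.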
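
For the original question of fetching the row with the largest Value, DataFrame.nlargest can be applied to the frame directly. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "Country": ["Spain", "UK", "US", "US"],
        "Place": ["Manchester", "London", "Mchigan", "NewYork"],
        "Value": [512, 778, 854, 562],
    }
)

# Keep the single row with the largest Value, then pick the name columns.
top = df.nlargest(1, "Value")
print(top[["Country", "Place"]])
```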

Answer 8

You can use:

print(df[df['Value']==df['Value'].max()])

Answer 9

import pandas
df is the data frame you create.

Use the command:

df1=df[['Country','Place']][df.Value == df['Value'].max()]

df1 then holds the Country and Place of the row (or rows) whose Value is the maximum.


Answer 10

I encountered a similar error while trying to import data using pandas. The first column of my dataset had spaces before the start of the words. I removed the spaces and it worked like a charm!
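
The answer does not say how the spaces were removed. One possible way, sketched here with an invented CSV string, is to strip whitespace from the column names and from the string values of the first column after loading:

```python
import io

import pandas as pd

# Hypothetical CSV text whose first column has leading spaces.
raw = "Country,Place,Value\n  Spain,Manchester,512\n  US,Kansas,894\n"
df = pd.read_csv(io.StringIO(raw))

# Strip stray whitespace from the headers and from the first column's values.
df.columns = df.columns.str.strip()
df["Country"] = df["Country"].str.strip()

print(df.loc[df["Value"].idxmax(), "Country"])
```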