Question: How to print a groupby object
I want to print the result of grouping with Pandas.
I have a dataframe:
import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
print(df)
       A  B
0    one  0
1    one  1
2    two  2
3  three  3
4  three  4
5    one  5
When I print after grouping by 'A', I get the following:
print(df.groupby('A'))
<pandas.core.groupby.DataFrameGroupBy object at 0x05416E90>
How can I print the grouped dataframe?
If I do:
print(df.groupby('A').head())
I obtain the dataframe as if it were not grouped:
             A  B
A
one   0    one  0
      1    one  1
two   2    two  2
three 3  three  3
      4  three  4
one   5    one  5
I was expecting something like:
             A  B
A
one   0    one  0
      1    one  1
      5    one  5
two   2    two  2
three 3  three  3
      4  three  4
Answer 0
Simply do:
grouped_df = df.groupby('A')
for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")
This also works:
grouped_df = df.groupby('A')
gb = grouped_df.groups  # mapping of group key -> row labels
for key, values in gb.items():
    print(df.loc[values], "\n\n")
To print only selected groups, list the keys you want in key_list_from_gb (gb.keys() shows the available keys). For example:
gb = grouped_df.groups
gb.keys()
key_list_from_gb = [key1, key2, key3]  # placeholders for the keys you want
for key, values in gb.items():
    if key in key_list_from_gb:
        print(df.loc[values], "\n")
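As a rough sketch of the selective case on the question's dataframe, the wanted groups can also be pulled out with get_group. The name wanted_keys and the two keys chosen here are only illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})
grouped_df = df.groupby('A')

# Hypothetical selection; any subset of grouped_df.groups.keys() works.
wanted_keys = ['one', 'two']

# Build a dict of key -> sub-DataFrame for the selected groups only.
selected = {key: grouped_df.get_group(key) for key in wanted_keys}

for key, frame in selected.items():
    print(key)
    print(frame, "\n")
```

Keeping the selected groups in a dict makes it possible to reuse them after printing.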
Answer 1
If you're simply looking for a way to display it, you could use describe():
grp = df.groupby('colName')  # 'colName' stands for your grouping column
grp.describe()
This gives you a neat summary table with one row per group.
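For illustration, here is what that might look like on the question's dataframe, assuming column B is the one to summarise. Note that describe() reports statistics per group rather than the rows themselves.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# One row per group, with count, mean, std, min, quartiles and max of B.
summary = df.groupby('A')['B'].describe()
print(summary)
```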
Answer 2
I confirmed that the behavior of head() changed between versions 0.12 and 0.13. That looks like a bug to me. I created an issue.
But a groupby operation doesn't actually return a DataFrame sorted by group. The .head() method is a little misleading here: it's just a convenience feature to let you re-examine the object (in this case, df) that you grouped. The result of groupby is a separate kind of object, a GroupBy object. You must apply, transform, or filter to get back to a DataFrame or Series.
If all you wanted to do was sort by the values in column A, you should use df.sort_values('A') (this was df.sort('A') in the pandas versions current when this answer was written).
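A small sketch of these points on the question's dataframe. The choice of transform('count') and of a stable mergesort are assumptions made for the example, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

gb = df.groupby('A')
print(type(gb).__name__)  # a GroupBy object, not a DataFrame

# transform returns a Series aligned with the original rows:
# here, the number of rows in the group each row belongs to.
group_sizes = gb['B'].transform('count')
print(group_sizes)

# A plain sort keeps every row and puts equal keys next to each other.
# mergesort is stable, so rows keep their original order within each key.
sorted_df = df.sort_values('A', kind='mergesort')
print(sorted_df)
```

The sorted frame lists the keys alphabetically (one, three, two), which differs from the order of first appearance shown in the question's expected output.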
Answer 3
Another simple alternative:
for name_of_the_group, group in grouped_dataframe:
    print(name_of_the_group)
    print(group)
Answer 4
Another simple alternative could be:
gb = df.groupby("A")
gb.count()  # number of non-null values per group, or
gb.get_group(your_key)  # your_key is a placeholder for one group key
Answer 5
In addition to the previous answers:
Taking your example,
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
a simple one-liner is:
df.groupby('A').apply(print)
Answer 6
Thanks to Surya for good insights. I’d clean up his solution and simply do:
for key, value in df.groupby('A'):
    print(key, value)
Answer 7
You cannot see the groupby data directly with a print statement, but you can see it by iterating over the groups with a for loop. Try this code to see the grouped data:
group = df.groupby('A')  # group holds the GroupBy object
for A, A_df in group:  # A is the group key (a value of column A); A_df is the sub-frame for that key
    print(A)
    print(A_df)
This prints each group key followed by the rows of that group.
I hope it helps.
Answer 8
Call list() on the GroupBy object:
print(list(df.groupby('A')))
gives you:
[('one',      A  B
0  one  0
1  one  1
5  one  5), ('three',        A  B
3  three  3
4  three  4), ('two',      A  B
2  two  2)]
Answer 9
In Jupyter Notebook, the following displays a nicely grouped version of the object. The apply method builds a MultiIndex dataframe.
by = 'A'  # the groupby 'by' argument
df.groupby(by).apply(lambda a: a[:])
Output:
             A  B
A
one   0    one  0
      1    one  1
      5    one  5
three 3  three  3
      4  three  4
two   2    two  2
If you don't want the by column(s) to appear in the output, drop them like so:
df.groupby(by).apply(lambda a: a.drop(by, axis=1)[:])
Output:
         B
A
one   0  0
      1  1
      5  5
three 3  3
      4  4
two   2  2
I am not sure why .iloc[:] does not work in place of [:] at the end. If [:] causes problems, now or after a future update, .iloc[:len(a)] also works.
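The same MultiIndex layout can also be assembled without apply. The following is only a sketch of one possible alternative: it uses sort=False so that groups appear in order of first occurrence (one, two, three), as in the question's expected output, and stacks them with pd.concat. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# sort=False keeps the groups in order of first appearance.
groups = df.groupby('A', sort=False)
keys = [key for key, _ in groups]
frames = [frame for _, frame in groups]

# Stack the per-group frames; each key becomes the outer index level.
stacked = pd.concat(frames, keys=keys)
stacked.index.names = ['A', None]
print(stacked)
```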
Answer 10
I found a tricky way, just as a brainstorm; see the code:
df['a'] = df['A']  # create a shadow column for MultiIndexing
df.sort_values('A', inplace=True)
df.set_index(["A","a"], inplace=True)
print(df)
The output:
             B
A     a
one   one    0
      one    1
      one    5
three three  3
      three  4
two   two    2
The pro is that it is easy to print, as it returns a dataframe instead of a GroupBy object, and the output looks nice.
The con is that it creates a column of redundant data.
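A possible variation on the same idea, sketched here as an assumption rather than as part of the original answer, avoids the shadow column by appending A to the existing row index and swapping the two levels. It also leaves df itself unmodified.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

result = (
    df.sort_values('A', kind='mergesort')   # stable sort keeps row order within each key
      .set_index('A', append=True)          # index becomes (original row label, A)
      .swaplevel()                          # reorder to (A, original row label)
)
print(result)
```

The original row labels take the place of the duplicated column, so no redundant data is created.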
Answer 11
In Python 3, in a more interactive way:
k = None
# df_group is assumed to be a GroupBy object, e.g. df.groupby('A')
for name_of_the_group, group in df_group:
    if k != name_of_the_group:
        print('\n', name_of_the_group)
        print('..........', '\n')
    print(group)
    k = name_of_the_group
Answer 12
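To print all (or arbitrarily many) lines of the grouped df, raise the display limit first:
import pandas as pd
pd.set_option('display.max_rows', 500)
grouped_df = df.groupby(['var1', 'var2'])  # 'var1', 'var2' stand for your grouping columns
print(grouped_df)
Note that the display option only affects DataFrame and Series output, so it matters once grouped_df has been turned back into one (for example by an aggregation).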
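As a hedged illustration on the question's dataframe, the row limit can also be lifted temporarily with pd.option_context, applied here to an aggregated result. The use of size() as the aggregation is an assumption for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# An aggregation turns the GroupBy object back into a Series.
counts = df.groupby('A').size()

# Lift the row limit only inside this block; None means "no limit".
with pd.option_context('display.max_rows', None):
    print(counts)

# to_string() renders every row regardless of the display options.
text = counts.to_string()
print(text)
```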