Python 实用宝典

Question 1

I want to print the result of grouping with Pandas.

I have a dataframe:

import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
print(df)

       A  B
0    one  0
1    one  1
2    two  2
3  three  3
4  three  4
5    one  5

When printing after grouping by ‘A’ I have the following:

print(df.groupby('A'))

<pandas.core.groupby.DataFrameGroupBy object at 0x05416E90>

How can I print the dataframe grouped?

If I do:

print(df.groupby('A').head())

I obtain the dataframe as if it was not grouped:

             A  B
A                
one   0    one  0
      1    one  1
two   2    two  2
three 3  three  3
      4  three  4
one   5    one  5

I was expecting something like:

             A  B
A                
one   0    one  0
      1    one  1
      5    one  5
two   2    two  2
three 3  three  3
      4  three  4

Question 2

Simply do:

grouped_df = df.groupby('A')

for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")

This also works:

grouped_df = df.groupby('A')
gb = grouped_df.groups  # dict mapping each group key to its row labels

for key, values in gb.items():
    print(df.loc[values], "\n\n")

To print only selected groups, list the keys you want in key_list_from_gb (gb.keys() shows the available keys). For example:

gb = grouped_df.groups
gb.keys()

key_list_from_gb = [key1, key2, key3]  # placeholders: replace with actual group keys

for key, values in gb.items():
    if key in key_list_from_gb:
        print(df.loc[values], "\n")

Question 3

If you’re simply looking for a way to display it, you could use describe():

grp = df.groupby('colName')  # groupby is a method, so call it with parentheses
grp.describe()

This gives you a neat table.
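
As a rough sketch with the question's dataframe: selecting column B before calling describe() is my own choice here, since A is the grouping key and B is the only numeric column. The result is an ordinary DataFrame with one row per group, so it prints directly.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# describe() on a grouped column returns a regular DataFrame:
# one row per group, one column per summary statistic.
summary = df.groupby('A')['B'].describe()
print(summary)
```

Note that this shows summary statistics per group rather than the rows themselves.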

Question 4

I confirmed that the behavior of head() changes between version 0.12 and 0.13. That looks like a bug to me. I created an issue.

But a groupby operation doesn’t actually return a DataFrame sorted by group. The .head() method is a little misleading here: it’s just a convenience feature to let you re-examine the object (in this case, df) that you grouped. The result of groupby is a separate kind of object, a GroupBy object. You must apply, transform, or filter to get back to a DataFrame or Series.

If all you want is to sort by the values in column A, use df.sort_values('A'). (In the old pandas versions discussed here this was df.sort('A'), which has since been removed.)
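
A minimal sketch of both points, using the question's dataframe. The choice of kind='mergesort' (a stable sort, so rows keep their original order within each value of A) and of sum() as the aggregation are my own assumptions for illustration. Note that sorting orders the groups alphabetically, not by first appearance as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# Sorting returns a plain DataFrame; the stable sort keeps the original
# row order inside each value of A.
sorted_df = df.sort_values('A', kind='mergesort')
print(sorted_df)

# Aggregating the GroupBy object also gets you back to a DataFrame.
totals = df.groupby('A').sum()
print(totals)
```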

Question 5

Another simple alternative:

for name_of_the_group, group in grouped_dataframe:
    print(name_of_the_group)
    print(group)

Question 6

Another simple alternative:

gb = df.groupby("A")
gb.count()  # number of non-null values per group, or:
gb.get_group(your_key)  # the rows of a single group; your_key is a placeholder
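
For instance, with the question's dataframe (using 'one' as the key is just my pick for illustration), both calls return regular DataFrames that print as expected:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})
gb = df.groupby('A')

# Number of non-null B values in each group, indexed by group key.
counts = gb.count()
print(counts)

# Only the rows whose value in A is 'one', with their original index labels.
one_rows = gb.get_group('one')
print(one_rows)
```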

Question 7

In addition to previous answers:

Taking your example,

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

Then a simple one-liner:

df.groupby('A').apply(print)

Question 8

Thanks to Surya for good insights. I’d clean up his solution and simply do:

for key, value in df.groupby('A'):
    print(key, value)

Question 9

You cannot see the grouped data by printing the GroupBy object directly, but you can see it by iterating over the groups with a for loop. Try this code:

group = df.groupby('A')  # group holds the GroupBy object
for A, A_df in group:  # A is the group key, A_df is the sub-dataframe for that key
    print(A)
    print(A_df)

This prints each group key followed by the rows that belong to it.

I hope it helps

Question 10

Call list() on the GroupBy object

print(list(df.groupby('A')))

gives you:

[('one',      A  B
0  one  0
1  one  1
5  one  5), ('three',        A  B
3  three  3
4  three  4), ('two',      A  B
2  two  2)]

Question 11

In Jupyter Notebook, if you do the following, it prints a nice grouped version of the object. The apply method helps in creation of a multiindex dataframe.

by = 'A'  # groupby 'by' argument
df.groupby(by).apply(lambda a: a[:])

Output:

             A  B
A                
one   0    one  0
      1    one  1
      5    one  5
three 3  three  3
      4  three  4
two   2    two  2

If you want the by column(s) to not appear in the output, just drop the column(s), like so.

df.groupby(by).apply(lambda a: a.drop(by, axis=1)[:])

Output:

         B
A         
one   0  0
      1  1
      5  5
three 3  3
      4  4
two   2  2

I am not sure why .iloc[:] does not work in place of [:] at the end. If [:] causes problems, now or after a future update, .iloc[:len(a)] also works.
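
If the apply trick misbehaves in your pandas version, one alternative sketch (my own suggestion, not part of the answer above) is to collect the groups into a dict and pass it to pd.concat, which uses the dict keys as the outer index level:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# Map each group key to its sub-dataframe.
pieces = {key: group for key, group in df.groupby('A')}

# concat stacks the pieces and puts the dict keys in the outer index level,
# giving a MultiIndex of (group key, original row label).
grouped_view = pd.concat(pieces)
print(grouped_view)
```

The result is a regular DataFrame, so it prints in a notebook or a script without relying on how apply combines its results.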

Question 12

I found a somewhat hacky way, offered just as an idea. See the code:

df['a'] = df['A']  # create a shadow column for MultiIndexing
df.sort_values('A', inplace=True)
df.set_index(["A","a"], inplace=True)
print(df)

The output:

             B
A     a
one   one    0
      one    1
      one    5
three three  3
      three  4
two   two    2

The advantage is that it is easy to print, since the result is a DataFrame rather than a GroupBy object, and the output looks nice. The drawback is that it creates a column of redundant data.

Question 13

In Python 3:

df_group = df.groupby('A')  # assumes df from the question

# Iterate over the GroupBy object directly; each item is a (name, group) pair.
for name_of_the_group, group in df_group:
    print('\n', name_of_the_group)
    print('..........', '\n')
    print(group)

In more interactive way

Question 14

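To print all (or arbitrarily many) lines of the grouped df:

import pandas as pd
pd.set_option('display.max_rows', 500)  # raise the row limit for printed frames

grouped_df = df.groupby(['var1', 'var2'])
# Printing the GroupBy object shows only its repr, so print each group instead.
for key, group in grouped_df:
    print(key)
    print(group)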
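
If you would rather not change the display setting globally, a possible variant (my own sketch, using the question's dataframe instead of var1/var2) is pd.option_context, which applies the larger row limit only inside the with-block:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

default_max_rows = pd.get_option('display.max_rows')

# The raised limit is active only inside this block.
with pd.option_context('display.max_rows', 500):
    inside = pd.get_option('display.max_rows')
    for key, group in df.groupby('A'):
        print(key)
        print(group)

# Outside the block the previous setting is back in effect.
restored = pd.get_option('display.max_rows')
```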
