How to loop over a grouped Pandas DataFrame?

Question 1

DataFrame:

  c_os_family_ss c_os_major_is l_customer_id_i
0      Windows 7                         90418
1      Windows 7                         90418
2      Windows 7                         90418

Code:

print df
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print name
    print group

I’m trying to just loop over the aggregated data, but I get the error:

ValueError: too many values to unpack

@EdChum, here’s the expected output:

                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

The output is not the problem; I want to loop over every group.

Question 2

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.

In general:

df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and you can iterate through its groups (as explained in the pandas groupby docs). You can do something like:
```
grouped = df.groupby('A')

for name, group in grouped:
    ...
```
When you apply a function to the groupby object, as in your df.groupby(...).agg(...) (the same holds for transform, apply, mean, …), the results from the individual groups are combined into a single DataFrame. These are the apply and combine steps of groupby's ‘split-apply-combine’ paradigm. The result is therefore always a DataFrame again (or a Series, depending on the applied function).
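
To make the difference concrete, here is a minimal sketch. The sample values are made up for illustration and only mirror the question's column names. Iterating the GroupBy object yields (name, group) pairs, while iterating the aggregated DataFrame yields column labels, which is why the two-name unpacking in the question fails.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'l_customer_id_i': [90418, 90418, 131572],
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Windows 8'],
})

grouped = df.groupby('l_customer_id_i')

# Iterating the GroupBy object gives (group key, sub-DataFrame) pairs.
group_sizes = {name: len(group) for name, group in grouped}

# After .agg(...), the groups are already combined into one DataFrame.
aggregated = grouped.agg(lambda x: ','.join(x))

# Iterating a DataFrame yields its column labels, not groups.
labels = list(aggregated)

# Unpacking a column label (a string) into two names raises ValueError.
try:
    for name, group in aggregated:
        pass
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(group_sizes)
print(labels)
print(unpack_failed)
```

So the loop has to run over `grouped` itself, before any aggregation is applied.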

Question 3

Here is an example of iterating over a pd.DataFrame grouped by the column atable. As a sample use case, the for loop generates SQL “CREATE TABLE” statements:

import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        # the data stores 'No'/'Yes', so compare against 'No' (not 'NO')
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")
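
The loop above prints a comma after every column, including the last one, which many SQL dialects reject. A possible variation collects each group's column definitions and joins them, returning the statements as strings. This is only a sketch: the helper name `create_statements` is made up for illustration, and the exact SQL layout is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    'atable':      ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':      ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type': ['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':     ['No', 'No', 'Yes', 'No', 'Yes'],
})


def create_statements(frame):
    """Hypothetical helper: build one CREATE TABLE string per group."""
    statements = {}
    # Each iteration yields the table name and the rows belonging to it.
    for table_name, df_group in frame.groupby('atable'):
        columns = []
        for col, col_type, is_null in zip(df_group['column'],
                                          df_group['column_type'],
                                          df_group['is_null']):
            null_clause = ' NOT NULL' if is_null == 'No' else ''
            columns.append('    {} {}{}'.format(col, col_type, null_clause))
        # Joining avoids a trailing comma after the last column.
        statements[table_name] = 'CREATE TABLE {} (\n{}\n);'.format(
            table_name, ',\n'.join(columns))
    return statements


statements = create_statements(df1)
for sql in statements.values():
    print(sql)
```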

Question 4

You can iterate over the index values if your dataframe has already been created.

df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
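
A roughly equivalent option is `DataFrame.iterrows()`, which yields each index value together with its row as a Series. The sketch below uses made-up sample values with the question's column names.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'l_customer_id_i': [90418, 90418, 131572],
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Windows 8'],
})

aggregated = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

rows = []
# iterrows() yields (index value, row as Series) for the aggregated frame.
for name, row in aggregated.iterrows():
    rows.append((name, row['c_os_family_ss']))

print(rows)
```

Either way, each iteration sees one aggregated row per customer, not the original rows of the group.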
