How to loop over a grouped pandas DataFrame?

Question: How to loop over a grouped pandas DataFrame?

DataFrame:

  c_os_family_ss c_os_major_is l_customer_id_i
0      Windows 7                         90418
1      Windows 7                         90418
2      Windows 7                         90418

Code:

print(df)
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print(name)
    print(group)

I’m trying to just loop over the aggregated data, but I get the error:

ValueError: too many values to unpack

@EdChum, here’s the expected output:

                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

The output is not the problem; I want to loop over every group.

Answer 0

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.

In general:

df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), which you can iterate over group by group (as explained in the pandas docs). You can do something like:
```
grouped = df.groupby('A')

for name, group in grouped:
    ...
```
When you apply a function to the groupby, as in your example df.groupby(...).agg(...) (this could equally be transform, apply, mean, …), the results of applying the function to the individual groups are combined into a single DataFrame (the apply and combine steps of groupby's 'split-apply-combine' paradigm). The result is therefore always a DataFrame again (or a Series, depending on the applied function).
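To make the distinction concrete, here is a minimal sketch on a small made-up frame (the column names `customer_id` and `os_family` are placeholders, not the asker's data). It shows that the GroupBy object yields (name, group) pairs, while the result of agg(...) is an ordinary DataFrame whose direct iteration yields only column labels, which is presumably why unpacking each item into two names fails in the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "os_family": ["Windows 7", "Windows 7", "Linux"],
})

grouped = df.groupby("customer_id")

# Iterating the GroupBy object yields (group key, sub-DataFrame) pairs.
joined = {}
for name, group in grouped:
    joined[name] = ",".join(group["os_family"])

# Applying agg() combines the per-group results back into one DataFrame.
aggregated = grouped.agg(lambda x: ",".join(x))

# Iterating a DataFrame directly yields its column labels, not groups.
column_labels = list(aggregated)

print(joined)
print(type(aggregated).__name__)
print(column_labels)
```

If the per-group work needs the group's rows, doing it inside the loop over `grouped` (as with `joined` here) avoids having to iterate the aggregated frame at all.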

Answer 1

Here is an example of iterating over a pd.DataFrame grouped by the column atable. As a sample use case, “create” statements for an SQL database are generated within the for loop:

import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        # is_null holds 'No' or 'Yes'; 'No' means the column must not be NULL
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")

Answer 2

You can iterate over the index values if your dataframe has already been created.

df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
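If you want the label and the row together, DataFrame.iterrows() is another way to walk the aggregated frame. The sketch below uses made-up values as a stand-in for the asker's data, so treat the numbers and strings as assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({
    "l_customer_id_i": [131572, 131572, 135467],
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows 7"],
    "c_os_major_is": ["", "", ""],
})

aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))

# iterrows() yields (index label, row as Series) pairs, so two-name
# unpacking works here even though the groups are already combined.
seen = []
for name, row in aggregated.iterrows():
    seen.append((name, row["c_os_family_ss"]))
    print(name)
    print(row)
```

Each `row` here is the already-joined result for one customer, not the original group of rows; for the original rows, iterate the GroupBy object before aggregating.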
