问题:熊猫加入问题:列重叠但未指定后缀

我有以下2个数据帧:

df_a =

     mukey  DI  PI
0   100000  35  14
1  1000005  44  14
2  1000006  44  14
3  1000007  43  13
4  1000008  43  13

df_b = 
    mukey  niccdcd
0  190236        4
1  190237        6
2  190238        7
3  190239        4
4  190240        7

当我尝试加入这两个数据框时:

join_df = df_a.join(df_b,on='mukey',how='left')

我得到错误:

*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')

为什么会这样呢?数据帧确实具有通用的“ mukey”值。

I have following 2 data frames:

df_a =

     mukey  DI  PI
0   100000  35  14
1  1000005  44  14
2  1000006  44  14
3  1000007  43  13
4  1000008  43  13

df_b = 
    mukey  niccdcd
0  190236        4
1  190237        6
2  190238        7
3  190239        4
4  190240        7

When I try to join these 2 dataframes:

join_df = df_a.join(df_b,on='mukey',how='left')

I get the error:

*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')

Why is this so? The dataframes do have common ‘mukey’ values.


回答 0

您发布的数据片段中的错误有点神秘,因为没有通用值,所以联接操作失败,因为这些值不重叠,这需要您在左侧和右侧提供后缀:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge 之所以有效,是因为它没有此限制:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN

Your error on the snippet of data you posted is a little cryptic, in that because there are no common values, the join operation fails because the values don’t overlap it requires you to supply a suffix for the left and right hand side:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn’t have this restriction:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN

回答 1

.join()函数正在使用index传递的参数数据集的,因此您应该改用set_index或使用.mergefunction。

请找到适合您的情况的两个示例:

join_df = LS_sgo.join(MSU_pi.set_index('mukey'), on='mukey', how='left')

要么

join_df = df_a.merge(df_b, on='mukey', how='left')

The .join() function is using the index of the passed as argument dataset, so you should use set_index or use .merge function instead.

Please find the two examples that should work in your case:

join_df = LS_sgo.join(MSU_pi.set_index('mukey'), on='mukey', how='left')

or

join_df = df_a.merge(df_b, on='mukey', how='left')

回答 2

此错误表明两个表具有1个或多个具有相同列名的列名。错误消息翻译为:“我可以在两个表中看到同一列,但是您没有告诉我在将其中一个引入之前重命名了任何一个”

您要么要删除一列,然后再使用del df [‘column name’]从另一列中引入,要么使用lsuffix来重写原始列,或者使用rsuffix重命名要引入的列。

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')

This error indicates that the two tables have the 1 or more column names that have the same column name. The error message translates to: “I can see the same column in both tables but you haven’t told me to rename either before bringing one of them in”

You either want to delete one of the columns before bringing it in from the other on using del df[‘column name’], or use lsuffix to re-write the original column, or rsuffix to rename the one that is being brought it.

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')

回答 3

主要是join是专门用于基于索引的联接,而不是基于属性名称的联接,因此在两个不同的数据框中更改属性名称,然后尝试联接,它们将被联接,否则会引发此错误

Mainly join is used exclusively to join based on the index,not on the attribute names,so change the attributes names in two different dataframes,then try to join,they will be joined,else this error is raised


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。