Question: Python Pandas merge only certain columns

Is it possible to only merge some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to bring in columns df2.a and df2.b, not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge and then delete the unwanted columns, but it seems like there should be a better method.


Answer 0

You could merge a sub-DataFrame containing just those columns (plus the key x):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])  # no on= given, so it joins on the shared column x
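A minimal, self-contained sketch of the sub-DataFrame approach. The small frames and their values are made-up assumptions shaped like the question, not data from it:

```python
import pandas as pd

# Hypothetical sample data: df1 has x, y, z; df2 has the key x plus extra columns.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": [0, 0, 0],
    "d": [0, 0, 0],
})

# Select only the key plus the wanted columns, then merge.
# With no on= argument, merge joins on the columns both frames share (here: x).
result = df1.merge(df2[["x", "a", "b"]])

print(result.columns.tolist())  # ['x', 'y', 'z', 'a', 'b']
```

Passing `on="x"` explicitly gives the same result here. It also guards against silently joining on extra columns if the two frames happen to share other column names.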

Answer 1

Use double brackets (that is, index df2 with a list of column names) to select just the columns you need. For a VLOOKUP-style lookup:

import pandas as pd
df = pd.merge(df, df2[['Key_Column', 'Target_Column']], on='Key_Column', how='left')

This keeps every row and column of the original df and adds the one corresponding column from df2 that you want to join.
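A minimal sketch of the VLOOKUP-style left join, using hypothetical data with the answer's column names. Unlike a spreadsheet VLOOKUP, which returns the first match, a merge duplicates rows when a key repeats in the lookup table. The `validate="many_to_one"` argument is an addition, not part of the answer. It makes pandas raise `pandas.errors.MergeError` in that case.

```python
import pandas as pd

# Hypothetical data: df is the main table, df2 is the lookup table.
df = pd.DataFrame({"Key_Column": ["k1", "k2", "k3"], "qty": [5, 7, 9]})
df2 = pd.DataFrame({
    "Key_Column": ["k1", "k2"],
    "Target_Column": ["apple", "banana"],
    "Unwanted": [True, False],
})

# Double brackets: index df2 with a *list* of names, which returns a DataFrame
# holding only the key and the column to look up.
lookup = df2[["Key_Column", "Target_Column"]]

# how="left" keeps every row of df; keys missing from df2 get NaN.
# validate="many_to_one" raises pandas.errors.MergeError if a key appears more
# than once in the lookup table (where a plain merge would duplicate rows).
df = pd.merge(df, lookup, on="Key_Column", how="left", validate="many_to_one")

print(df)
```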


Answer 2

If you want a column left out of the resulting data frame, but that column is required for the join, you can do the following:

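df1 = df1.merge(df2[['a', 'b', 'key1']], how='left',
                left_on='key2', right_on='key1').drop(columns='key1')

The .drop(columns='key1') part prevents 'key1' from being kept in the resulting data frame, even though it is required for the join in the first place. A bare .drop('key1') would not work: drop defaults to axis=0, so it looks for a row labeled 'key1' and raises a KeyError.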
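A runnable sketch of the join-then-drop pattern with differently named keys. The frames and values are hypothetical; only the column names `key1`, `key2`, `a`, `b` come from the answer.

```python
import pandas as pd

# Hypothetical data: the join key has a different name in each frame.
df1 = pd.DataFrame({"key2": [1, 2, 3], "val": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "key1": [1, 2, 4],
    "a": [10, 20, 40],
    "b": [0.1, 0.2, 0.4],
    "c": ["unused"] * 3,
})

# key1 must be selected because the join needs it...
merged = df1.merge(df2[["a", "b", "key1"]], how="left",
                   left_on="key2", right_on="key1")

# ...but it duplicates key2, so drop it afterwards. columns= targets column
# labels; a bare .drop("key1") would look for a *row* labeled "key1".
df1 = merged.drop(columns="key1")

print(df1.columns.tolist())  # ['key2', 'val', 'a', 'b']
```

Row `key2 == 3` has no match in df2, so its `a` and `b` are NaN. This also turns column `a` into a float column.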


Answer 3

You can use .iloc (or .loc) to select specific columns for all rows of dataframe2 and merge only that selection. An example is below:

import pandas
pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2 with a left outer join on 'key'. For dataframe2, .iloc lets you specify the rows and columns you want by integer position: the : selects all rows, and 0:5 selects the first 5 columns (positions 0 to 4). The 'key' column must be among the selected columns, or the merge will fail. You could use .loc to select columns by name instead, but if you're dealing with long column names, .iloc may be more convenient.
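A sketch comparing positional and label-based column selection before a merge. The frames and column names (`col_a` ... `col_e`, `left_val`) are hypothetical, and the key is deliberately placed at position 0 so the positional slice keeps it.

```python
import pandas as pd

# Hypothetical frames; in dataframe2 the key sits at position 0.
dataframe1 = pd.DataFrame({"key": [1, 2], "left_val": ["x", "y"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2],
    "col_a": [10, 20],
    "col_b": [30, 40],
    "col_c": [50, 60],
    "col_d": [70, 80],
    "col_e": [90, 100],  # position 5: excluded by the slice 0:5
})

# By position: all rows, columns 0 through 4 (the stop index 5 is exclusive).
by_position = dataframe2.iloc[:, 0:5]

# By name: the same idea with .loc; naming the key explicitly guarantees the
# join column is kept, whatever order the columns are in.
by_name = dataframe2.loc[:, ["key", "col_a", "col_b"]]

merged_pos = pd.merge(dataframe1, by_position, how="left", on="key")
merged_name = pd.merge(dataframe1, by_name, how="left", on="key")

print(merged_pos.columns.tolist())   # ['key', 'left_val', 'col_a', 'col_b', 'col_c', 'col_d']
print(merged_name.columns.tolist())  # ['key', 'left_val', 'col_a', 'col_b']
```

The positional slice is compact, but it silently depends on column order. The `.loc` list is longer, but it still works if columns are reordered.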


Answer 4

This merges selected columns from two tables.

If table_1 contains columns t1_a, t1_b, t1_c, ..., id, ..., t1_z, table_2 contains columns t2_a, t2_b, t2_c, ..., id, ..., t2_z, and only t1_a, id, and t2_a are required in the final table, then:

mergedCSV = table_1[['t1_a', 'id']].merge(table_2[['t2_a', 'id']], on='id', how='left')
# save the resulting output file
mergedCSV.to_csv('output.csv', index=False)
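A self-contained sketch of trimming both sides before merging, using hypothetical tables that follow the answer's naming pattern. Trimming also avoids pandas' `_x`/`_y` suffixes, which it adds when both inputs carry the same non-key column. To keep the example side-effect free, it calls `to_csv` without a path, which returns the CSV text instead of writing `output.csv`.

```python
import pandas as pd

# Hypothetical tables following the answer's naming pattern.
table_1 = pd.DataFrame({
    "t1_a": ["r1", "r2", "r3"],
    "t1_b": [1, 2, 3],
    "id": [101, 102, 103],
})
table_2 = pd.DataFrame({
    "t2_a": ["s1", "s3"],
    "t2_b": [9, 8],
    "id": [101, 103],
})

# Trim both sides to the key plus the wanted column before merging.
mergedCSV = table_1[["t1_a", "id"]].merge(table_2[["t2_a", "id"]], on="id", how="left")

# With no path, to_csv returns the CSV text instead of writing a file;
# pass a filename (e.g. "output.csv") to write to disk as in the answer.
csv_text = mergedCSV.to_csv(index=False)
print(csv_text)
```

Row `id == 102` has no match in table_2, so its `t2_a` cell is written as an empty field.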
