Question: Python Pandas merge only certain columns
Is it possible to only merge some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.
I want to merge the two DataFrames on x, but I only want to merge columns df2.a and df2.b, not the entire DataFrame.
The result would be a DataFrame with x, y, z, a, b.
I could merge then delete the unwanted columns, but it seems like there is a better method.
Answer 0
You could merge the sub-DataFrame (with just those columns):
df2[list('xab')] # df2 but only with columns x, a, and b
df1.merge(df2[list('xab')])
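A self-contained sketch of the same idea, using small made-up frames (the data below is hypothetical). Note that list('xab') splits the string into ['x', 'a', 'b'], which only works because every column name here is a single character; with longer names you would write the list out. Passing on='x' explicitly avoids relying on pandas picking up the shared column automatically.

```python
import pandas as pd

# Hypothetical example data matching the question's column layout
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1", "c2", "c3"],
    "d": ["d1", "d2", "d3"],
})

# Select only the join key and the wanted columns before merging,
# so c and d never enter the result
result = df1.merge(df2[["x", "a", "b"]], on="x")
print(result.columns.tolist())  # ['x', 'y', 'z', 'a', 'b']
```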
Answer 1
Use double brackets, i.e. pass a list of column names, so the selection stays a DataFrame. If you are doing a VLOOKUP-style lookup:
df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')
This keeps everything in the original df and adds the one corresponding column from df2 that you want to join.
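A minimal sketch of the VLOOKUP-style left join, using hypothetical data. It shows that double brackets give a DataFrame while single brackets give a Series, and that how='left' keeps unmatched rows from df, filling the looked-up column with NaN (much like VLOOKUP returning #N/A).

```python
import pandas as pd

# Hypothetical lookup scenario: rows in df, reference values in df2
df = pd.DataFrame({"Key_Column": ["A", "B", "C"], "qty": [1, 2, 3]})
df2 = pd.DataFrame({
    "Key_Column": ["A", "B"],
    "Target_Column": [9.5, 4.0],
    "Unused": ["x", "y"],
})

# Double brackets (a list of names) return a DataFrame; single brackets return a Series
print(type(df2[["Key_Column", "Target_Column"]]).__name__)  # DataFrame
print(type(df2["Target_Column"]).__name__)                  # Series

# Left join keeps every row of df; "C" has no match in df2, so it gets NaN
df = pd.merge(df, df2[["Key_Column", "Target_Column"]], on="Key_Column", how="left")
print(df)
```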
Answer 2
If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:
df1 = df1.merge(df2[['a', 'b', 'key1']], how='left',
                left_on='key2', right_on='key1').drop(columns='key1')
The .drop(columns='key1') part keeps 'key1' out of the resulting data frame, even though it is required for the join in the first place. (A bare .drop('key1') would look for a row labelled 'key1' and raise a KeyError, because drop defaults to axis=0.)
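A runnable sketch of the same pattern with hypothetical frames whose join keys have different names (key2 on the left, key1 on the right). Because left_on/right_on keep both key columns in the result, drop(columns=...) removes the redundant one after the join.

```python
import pandas as pd

# Hypothetical frames whose join keys have different names
df1 = pd.DataFrame({"key2": [1, 2, 3], "y": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "key1": [1, 2, 4],
    "a": [10, 20, 40],
    "b": [0.1, 0.2, 0.4],
    "c": ["unused", "unused", "unused"],
})

df1 = df1.merge(
    df2[["a", "b", "key1"]], how="left", left_on="key2", right_on="key1"
).drop(columns="key1")  # key1 only duplicates key2 after the join, so drop it

# key2 == 3 has no match in df2, so its a and b are NaN
print(df1.columns.tolist())  # ['key2', 'y', 'a', 'b']
```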
Answer 3
You can use .loc or .iloc to select specific columns (with all rows) and merge only those. An example is below:
pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')
In this example, you are merging dataframe1 and dataframe2 with a left outer join on 'key'. For dataframe2, you have used .iloc, which lets you specify the rows and columns you want by integer position. The : in the row position selects all rows, and 0:5 selects the first five columns (positions 0 to 4). That selection must include 'key', otherwise the merge has nothing to join on. You could use .loc to specify the columns by name instead, but if you are dealing with long column names, .iloc may be more convenient.
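A small sketch comparing the positional and label-based selections, with hypothetical data where 'key' is the first column so it falls inside the positional slice. The slice is written directly as 0:5 (a slice inside a list, such as [0:5], is a syntax error). Note that an .iloc slice excludes its end position, while a .loc label slice includes both endpoints.

```python
import pandas as pd

# Hypothetical data; "key" is placed first so it falls inside the positional slice
dataframe1 = pd.DataFrame({"key": [1, 2], "v": ["m", "n"]})
dataframe2 = pd.DataFrame(
    [[1, "a1", "b1", "c1", "d1", "e1", "f1"],
     [2, "a2", "b2", "c2", "d2", "e2", "f2"]],
    columns=["key", "a", "b", "c", "d", "e", "f"],
)

# Positional: columns 0..4 (end exclusive) -> key, a, b, c, d
by_position = pd.merge(dataframe1, dataframe2.iloc[:, 0:5], how="left", on="key")

# Label-based: a .loc label slice includes both endpoints -> key through d
by_label = pd.merge(dataframe1, dataframe2.loc[:, "key":"d"], how="left", on="key")

print(by_position.columns.tolist())  # ['key', 'v', 'a', 'b', 'c', 'd']
print(by_position.equals(by_label))  # True
```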
Answer 4
This is to merge selected columns from two tables.
If table_1 contains columns t1_a, t1_b, t1_c, ..., id, ..., t1_z, and table_2 contains columns t2_a, t2_b, t2_c, ..., id, ..., t2_z, and only t1_a, id, and t2_a are required in the final table, then:
mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file
mergedCSV.to_csv('output.csv',index = False)
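A self-contained sketch of the same approach with hypothetical stand-ins for table_1 and table_2 (in practice these would likely come from pd.read_csv). To keep the sketch free of side effects, it writes to an in-memory io.StringIO buffer instead of output.csv; to_csv accepts any file-like object. An id with no match in table_2 gets an empty t2_a field in the CSV.

```python
import io

import pandas as pd

# Hypothetical stand-ins for table_1 and table_2
table_1 = pd.DataFrame({"t1_a": [1, 2, 3], "t1_b": [4, 5, 6], "id": [101, 102, 103]})
table_2 = pd.DataFrame({"t2_a": ["x", "y"], "t2_b": ["u", "v"], "id": [101, 103]})

# Keep only the needed columns on each side before merging
mergedCSV = table_1[["t1_a", "id"]].merge(table_2[["t2_a", "id"]], on="id", how="left")

# Write to an in-memory buffer instead of output.csv so no file is created
buffer = io.StringIO()
mergedCSV.to_csv(buffer, index=False)
print(buffer.getvalue())
```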