
Question: How to add multiple columns to a pandas DataFrame in one assignment?

I’m new to pandas and trying to figure out how to add multiple columns to pandas simultaneously. Any help here is appreciated. Ideally I would like to do this in one step rather than multiple repeated steps…

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
      'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...

Answer 0

I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right-hand side be a DataFrame (note that it doesn’t actually matter whether the columns of that DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) DataFrame conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]], 
            index=df.index, 
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]], 
    index=df.index, 
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more “natural” way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use .assign() with multiple column arguments.

I like this variant on @zero’s answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don’t know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) In the end it’s hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame
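
As a minimal sketch of option 8 generalized to any number of columns, the separate assignments can be driven by a dict. The name new_values is only illustrative, and the loop assumes each value is a scalar that pandas should broadcast down the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

# Hypothetical mapping of new column name -> scalar default value.
new_values = {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3}

# One single-column assignment per entry; pandas broadcasts each scalar
# to every row. Columns are appended in the dict's insertion order.
for name, value in new_values.items():
    df[name] = value

print(df)
```

This modifies df in place, like the three separate assignments it replaces.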


Answer 1

You could use assign with a dict of column names and values.

In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
   col_1  col_2 col2_new_2  col3_new_3  col_new_1
0      0      4       dogs           3        NaN
1      1      5       dogs           3        NaN
2      2      6       dogs           3        NaN
3      3      7       dogs           3        NaN
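
A small sketch of the same idea, assuming Python 3.6+ and a pandas version that keeps keyword order in assign, in which case the alphabetical sorting shown above does not apply. It also illustrates that assign returns a new frame and leaves the original untouched. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

# Hypothetical mapping of new column name -> scalar value.
new_values = {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3}

# Unpack the dict into keyword arguments; each scalar is broadcast to all rows.
result = df.assign(**new_values)

print(result)
print(list(df.columns))  # the original frame still has only its two columns
```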

Answer 2

With the use of concat:

In [128]: df
Out[128]: 
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]: 
   col_1  col_2 column_new_1 column_new_2 column_new_3
0    0.0    4.0          NaN          NaN          NaN
1    1.0    5.0          NaN          NaN          NaN
2    2.0    6.0          NaN          NaN          NaN
3    3.0    7.0          NaN          NaN          NaN

I'm not sure what you wanted to do with [np.nan, 'dogs', 3]. Perhaps set them as default values for the new columns?

In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]

In [144]: df1
Out[144]: 
   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3
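
In the output above, col_1 and col_2 come back as floats after the row-wise concat. A hedged sketch of one way around that, assuming the goal is only to append empty columns: build the empty frame on the same index and concatenate along axis=1. The names new_cols and empty are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']

# An all-NaN frame that shares df's index, so the rows line up one to one.
empty = pd.DataFrame(index=df.index, columns=new_cols)

# Column-wise concat: the existing columns are left as they are.
df1 = pd.concat([df, empty], axis=1)

print(df1)
print(df1.dtypes)
```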

Answer 3

Use a list comprehension, pd.DataFrame and pd.concat:

pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2','column_new_3']
        )
    ], axis=1)



Answer 4

To add many missing columns (a, b, c, ...) with the same value, here 0, I did this:

    new_cols = ["a", "b", "c" ] 
    df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)

It’s based on the second variant of the accepted answer.
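
An alternative sketch for the same task that avoids building a temporary DataFrame, assuming every new column takes the same scalar: dict.fromkeys builds the name-to-value mapping and assign adds the columns. This is a variation for comparison, not the method used above.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

new_cols = ["a", "b", "c"]

# dict.fromkeys(new_cols, 0) -> {'a': 0, 'b': 0, 'c': 0};
# assign broadcasts each scalar to every row and returns a new frame.
df = df.assign(**dict.fromkeys(new_cols, 0))

print(df)
```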


Answer 5

Just want to point out that option 2 in @Matthias Fripp’s answer

(2) I wouldn’t necessarily expect DataFrame to work this way, but it does

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

is already documented in pandas’ own documentation http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.
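
To illustrate the last sentence of that quote, here is a minimal sketch that transforms a subset of existing columns in place. The column list and the scaling factor are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

cols = ['col_1', 'col_2']

# Select the subset with a list of column names, transform it,
# and write the result back to the same columns.
df[cols] = df[cols] * 10

print(df)
```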


Answer 6

If you just want to add empty new columns, reindex will do the job

df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
   col_1  col_2  column_new_1  column_new_2  column_new_3
0      0      4           NaN           NaN           NaN
1      1      5           NaN           NaN           NaN
2      2      6           NaN           NaN           NaN
3      3      7           NaN           NaN           NaN

full code example

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')

Otherwise, go for Zero's answer with assign.
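
If the new columns should start with a single default other than NaN, reindex also accepts a fill_value argument. A minimal sketch, assuming the same default (0 here) for every new column:

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']

# Labels that do not exist yet are created and filled with fill_value;
# the existing columns keep their data.
df = df.reindex(columns=list(df) + new_cols, fill_value=0)

print(df)
```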


Answer 7

I am not comfortable using “Index” and so on, so I came up with the following:

df.columns
Index(['A123', 'B123'], dtype='object')

df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])

df.rename(columns={
    'C':'C123',
    'D':'D123',
    'E':'E123'
},inplace=True)


df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')
