Question: How do I add an empty column to a DataFrame?

What’s the easiest way to add an empty column to a pandas DataFrame object? The best I’ve stumbled upon is something like

df['foo'] = df.apply(lambda _: '', axis=1)

Is there a less perverse method?


Answer 0

If I understand correctly, assignment should fill:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
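
As a rough sketch of how the two fills differ (same sample frame as above; the variable name missing_per_column is only illustrative): the empty string is an actual value, while np.nan is counted as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df["C"] = ""       # the scalar is broadcast to every row
df["D"] = np.nan   # float column of missing values

# isna() treats NaN as missing, but not the empty string
missing_per_column = df.isna().sum()
print(missing_per_column)
```

So pick "" if the column will hold text and np.nan if you want pandas to see the cells as missing.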

Answer 1

To add to DSM’s answer and building on this associated question, I’d split the approach into two cases:

  • Adding a single column: Just assign an empty value to the new column, e.g. df['C'] = np.nan

  • Adding multiple columns: I’d suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe’s column index. This also works for adding multiple new rows with .reindex(index=[...]). Note that newer versions of pandas (v>0.20) allow you to pass the labels together with an axis keyword rather than naming columns or index explicitly.

Here is an example adding multiple columns:

mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])

or

mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1)  # version > 0.20.0

You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn’t feel as pythonic to me :)
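
For completeness, the concatenation route could look roughly like this. It is only a sketch: the sample data and the names empty and combined are made up for illustration.

```python
import pandas as pd

mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# An all-NaN frame that shares mydf's index, so the rows line up on concat
empty = pd.DataFrame(index=mydf.index, columns=["newcol1", "newcol2"])
combined = pd.concat([mydf, empty], axis=1)
print(combined.columns.tolist())
```

It gives the same columns as the reindex versions, just with an extra intermediate frame.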


Answer 2

An even simpler solution is:

df = df.reindex(columns=header_list)

where header_list is a list of the headers you want to appear.

Any header in the list that is not already in the dataframe will be added as a new column filled with NaN. Note that any existing column missing from the list is dropped.

So if

header_list = ['a','b','c', 'd']

and the dataframe only has columns a and b, then c and d will be added as columns of NaN.
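
One thing to watch with this approach: reindex replaces the whole column set, so a column that is not in the list disappears. A small sketch with made-up data (the extra column x and the names replaced, missing and extended are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6]})
header_list = ["a", "b", "c", "d"]

# Replaces the column set: adds c and d (NaN) and drops x
replaced = df.reindex(columns=header_list)

# Keeps every existing column and appends only the missing headers
missing = [h for h in header_list if h not in df.columns]
extended = df.reindex(columns=df.columns.tolist() + missing)

print(replaced.columns.tolist())
print(extended.columns.tolist())
```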


Answer 3

Starting with v0.16.0, DF.assign() can be used to assign new columns (single/multiple) to a DF. In older versions these columns were inserted in alphabetical order at the end of the DF; with Python 3.6+ and pandas 0.23+, they keep the order of the keyword arguments.

This is an advantage over simple assignment when you want to perform a series of chained operations directly on the returned dataframe.

Consider the same DF sample demonstrated by @DSM:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
   A  B
0  1  2
1  2  3
2  3  4

df.assign(C="",D=np.nan)
Out[21]:
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN

Note that this returns a copy with all the previous columns along with the newly created ones. To modify the original DF, reassign the result: df = df.assign(...), since assign() does not support in-place operation.
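
To illustrate the chaining point, here is a minimal sketch. The rename step is an arbitrary follow-up operation chosen for the example, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

result = (
    df.assign(C="", D=np.nan)      # returns a new frame with the extra columns
      .rename(columns=str.lower)   # chained directly on that result
)

print(result.columns.tolist())
print(df.columns.tolist())  # the original frame is left untouched
```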


Answer 4

I like:

df['new'] = pd.Series(dtype='your_required_dtype')

If you have an empty dataframe, this solution makes sure that no new row containing only NaN is added.

Specifying dtype is not strictly necessary; however, newer pandas versions produce a DeprecationWarning if it is not specified.
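
A quick sketch of how this behaves on an empty frame and on a frame with rows. Here float64 stands in for 'your_required_dtype', and the frames are made up for the example.

```python
import pandas as pd

# A frame with a column but no rows stays at zero rows
empty_df = pd.DataFrame({"A": pd.Series(dtype="int64")})
empty_df["new"] = pd.Series(dtype="float64")

# On a frame with rows, the empty Series is aligned to the index,
# so every cell of the new column is NaN
df = pd.DataFrame({"A": [1, 2, 3]})
df["new"] = pd.Series(dtype="float64")

print(len(empty_df), df["new"].dtype)
```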


Answer 5

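If you want to add column names from a list:

import numpy as np
import pandas as pd

df = pd.DataFrame()
col_names = ['col1', 'col2', 'col3', 'col4']
for name in col_names:
    df[name] = np.nan  # each new column is filled with NaN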

Answer 6

@emunsing’s answer is really cool for adding multiple columns, but I couldn’t get it to work for me in Python 2.7. Instead, I found this works:

mydf = mydf.reindex(columns=np.append(mydf.columns.values, ['newcol1', 'newcol2']))

Answer 7

The code below addresses the question “How do I add n empty columns to my existing dataframe?”. In the interest of keeping solutions to similar problems in one place, I am adding it here.

Approach 1 (to create 64 additional columns with column names from 1-64)

m = list(range(1, 65))
dd = pd.DataFrame(columns=m)
df = df.join(dd).replace(np.nan, '')  # df is the dataframe that already exists

Approach 2 (to create 64 additional columns with column names from 1-64)

df = df.reindex(df.columns.tolist() + list(range(1, 65)), axis=1).replace(np.nan, '')
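
One caveat: .replace(np.nan, '') also blanks out any NaN that was already in the original columns. If that is not wanted, a possible variant is to let reindex fill only the new columns via its fill_value argument. The helper below is hypothetical (its name and signature are mine) and only a sketch:

```python
import numpy as np
import pandas as pd

def add_empty_columns(df, names, fill=""):
    """Hypothetical helper: return a copy of df with `names` appended, filled with `fill`."""
    # fill_value is applied only to labels that do not exist yet
    return df.reindex(columns=df.columns.tolist() + list(names), fill_value=fill)

df = pd.DataFrame({"A": [1.0, np.nan, 3.0]})
wide = add_empty_columns(df, range(1, 65))

print(wide.shape)
```

The existing NaN in column A is left alone; only columns 1 to 64 get the empty string.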

Answer 8

You can do

df['column'] = None  # Creates a new column filled with None
df.column = None     # Works only when the column already exists in the dataframe
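
A short sketch of the difference, using a made-up frame and a made-up attribute name (other):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

df["column"] = None   # item assignment creates the column
df.other = None       # only sets a plain attribute on the object, no column

print("column" in df.columns, "other" in df.columns)
```

So attribute-style assignment is fine for overwriting an existing column, but use the bracket form to create one.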

Answer 9

One can use df.insert(index_to_insert_at, column_header, init_value) to insert a new column at a specific index.

cost_tbl.insert(1, "col_name", "")

The above statement inserts an empty column after the first column.
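
A runnable sketch of the same call, with made-up contents for cost_tbl. Note that insert() changes the frame in place and returns None, so there is nothing to reassign.

```python
import pandas as pd

cost_tbl = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# insert() modifies cost_tbl in place and returns None
returned = cost_tbl.insert(1, "col_name", "")

print(cost_tbl.columns.tolist())
```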

