Question: How to add an empty column to a dataframe?
What's the easiest way to add an empty column to a pandas DataFrame object? The best I've stumbled upon is something like:
df['foo'] = df.apply(lambda _: '', axis=1)
Is there a less perverse method?
Answer 0
If I understand correctly, assignment should fill:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
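The two assignments above are not equivalent, which matters later when you test for missing data. A minimal sketch, reusing the same sample frame, of how the two kinds of "empty" column behave:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

df["C"] = ""       # empty strings: real values, not missing data
df["D"] = np.nan   # NaN: treated as missing, column becomes float64

# isna() only flags the NaN column; the empty strings count as present.
print(df.isna().sum())
print(df["D"].dtype)
```

If the column will later hold numbers, np.nan is usually the more convenient placeholder; the empty string mostly helps when the frame is only going to be displayed or exported.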
Answer 1
To add to DSM's answer and building on this associated question, I'd split the approach into two cases:
Adding a single column: Just assign empty values to the new column, e.g. df['C'] = np.nan
Adding multiple columns: I'd suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe's column index. This also works for adding multiple new rows with .reindex(index=[...]). Note that newer versions of Pandas (v>0.20) allow you to specify an axis keyword rather than explicitly passing columns or index.
Here is an example adding multiple columns:
mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])
or
mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1) # version > 0.20.0
You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn’t feel as pythonic to me :)
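For comparison, here is a small sketch of both routes side by side. The frame contents and the names new_cols, via_reindex, empty and via_concat are made up for illustration; they are not from the answer above.

```python
import pandas as pd

mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
new_cols = ["newcol1", "newcol2"]

# Route 1: reindex the column axis; labels that do not exist yet are
# created and filled with NaN.
via_reindex = mydf.reindex(columns=mydf.columns.tolist() + new_cols)

# Route 2: build an empty frame that shares the row index, then
# concatenate it column-wise.
empty = pd.DataFrame(index=mydf.index, columns=new_cols)
via_concat = pd.concat([mydf, empty], axis=1)

print(via_reindex)
print(via_concat)
```

Both return a new frame and leave mydf untouched, so the result has to be assigned back if you want to keep it.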
Answer 2
An even simpler solution is:
df = df.reindex(columns = header_list)
where "header_list" is a list of the headers you want to appear.
Any header in the list that is not already in the dataframe will be added as a column filled with NaN. Be aware that existing columns missing from the list are dropped.
So if
header_list = ['a','b','c', 'd']
and the dataframe only has columns a and b, then c and d will be added as columns filled with NaN.
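One thing worth checking before using this on real data: reindex keeps only the labels you pass. A minimal sketch, assuming a made-up frame with an extra column x that is not in header_list:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6]})
header_list = ["a", "b", "c", "d"]

# Only the listed headers survive: c and d are added as NaN, x is dropped.
out = df.reindex(columns=header_list)

# To keep every existing column, append only the headers that are missing.
keep_all = list(df.columns) + [h for h in header_list if h not in df.columns]
safe = df.reindex(columns=keep_all)

# fill_value replaces the default NaN, e.g. with an empty string.
blank = df.reindex(columns=header_list, fill_value="")

print(out.columns.tolist())
print(safe.columns.tolist())
```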
Answer 3
Starting with v0.16.0, DF.assign() can be used to assign new columns (single or multiple) to a DF. In older versions these columns are inserted in alphabetical order at the end of the DF; with pandas 0.23+ on Python 3.6+, the keyword argument order is preserved.
This is advantageous over simple assignment when you want to perform a series of chained operations directly on the returned dataframe.
Consider the same DF sample demonstrated by @DSM:
df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
   A  B
0  1  2
1  2  3
2  3  4
df.assign(C="",D=np.nan)
Out[21]:
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
Note that this returns a copy with all the previous columns along with the newly created ones. To modify the original DF, reassign the result, e.g. df = df.assign(...), as assign does not support an inplace operation.
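To illustrate the chaining point, here is a short sketch. The query and reset_index steps are arbitrary stand-ins for whatever follows in a real pipeline.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# assign returns a new frame, so further steps can be chained onto it.
result = (
    df.assign(C="", D=np.nan)
      .query("A > 1")
      .reset_index(drop=True)
)

print(result)
print(df.columns.tolist())  # the original frame still has only A and B
```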
Answer 4
I like:
df['new'] = pd.Series(dtype='your_required_dtype')
If you have an empty dataframe, this solution makes sure that no new row containing only NaN is added.
Specifying dtype is not strictly necessary; however, newer Pandas versions produce a DeprecationWarning if it is not specified.
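A small sketch of what this looks like in practice, using float64 and datetime64[ns] as example dtypes (the column names are made up):

```python
import pandas as pd

# On an empty frame: the column is created, but no rows are added.
empty_df = pd.DataFrame()
empty_df["new"] = pd.Series(dtype="float64")
print(empty_df.shape)

# On a populated frame: the empty Series is aligned on the index, so every
# row gets a missing value while the requested dtype is kept.
df = pd.DataFrame({"A": [1, 2, 3]})
df["score"] = pd.Series(dtype="float64")
df["when"] = pd.Series(dtype="datetime64[ns]")
print(df.dtypes)
```

The main benefit over assigning np.nan is that the column starts out with the dtype you intend to use, for example a datetime column filled with NaT.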
Answer 5
If you want to add column names from a list:
import numpy as np
import pandas as pd
df = pd.DataFrame()
a = ['col1', 'col2', 'col3', 'col4']
for i in a:
    df[i] = np.nan
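The same idea can be written without an explicit loop by unpacking a dict into assign. This is a sketch on a made-up, non-empty frame; it is an alternative, not what the answer itself uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [10, 20]})
a = ["col1", "col2", "col3", "col4"]

# Build {name: NaN} for every new column and pass it as keyword arguments.
df = df.assign(**{name: np.nan for name in a})

print(df)
```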
Answer 6
@emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work for me in Python 2.7. Instead, I found this works:
mydf = mydf.reindex(columns = np.append(mydf.columns.values, ['newcol1','newcol2']))
Answer 7
The code below addresses the question "How do I add n empty columns to my existing dataframe?". In the interest of keeping solutions to similar problems in one place, I am adding it here.
Approach 1 (to create 64 additional columns with column names from 1-64)
m = list(range(1,65,1))
dd=pd.DataFrame(columns=m)
df.join(dd).replace(np.nan,'') #df is the dataframe that already exists
Approach 2 (to create 64 additional columns with column names from 1-64)
df.reindex(df.columns.tolist() + list(range(1,65,1)), axis=1).replace(np.nan,'')
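Neither approach modifies df; both return a new frame that needs to be assigned. Below is a sketch of Approach 2 wrapped in a helper. The function name add_empty_columns and its parameters are hypothetical, and it assumes the integer labels 1..n are not already used as column names.

```python
import pandas as pd

def add_empty_columns(df, n, fill_value=""):
    """Return a copy of df with n extra columns labelled 1..n.

    Hypothetical helper: assumes the labels 1..n are not already columns.
    """
    new_labels = list(range(1, n + 1))
    return df.reindex(columns=df.columns.tolist() + new_labels,
                      fill_value=fill_value)

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
wide = add_empty_columns(df, 64)

print(wide.shape)
```

Passing fill_value to reindex avoids the separate replace(np.nan, '') step, which would also blank out any NaN that was already present in the original columns.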
Answer 8
You can do:
df['column'] = None  # Works: creates a new column filled with None
df.column = None  # Works only when the column already exists in the dataframe
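The difference between the two forms is easy to miss because the attribute form fails silently. A minimal sketch with made-up names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Bracket assignment creates the column.
df["column"] = None

# Attribute assignment to a name that is not a column just sets a plain
# Python attribute on the object; no column is created.
df.other = None

# Attribute assignment to an existing column does update that column.
df.column = 0

print(df.columns.tolist())
```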
Answer 9
One can use df.insert(index_to_insert_at, column_header, init_value) to insert a new column at a specific index.
cost_tbl.insert(1, "col_name", "")
The above statement inserts an empty column after the first column. Note that insert modifies the dataframe in place and returns None.
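A runnable sketch of the same call. The contents of cost_tbl are invented for illustration, since the answer does not show them.

```python
import pandas as pd

cost_tbl = pd.DataFrame({"item": ["a", "b"], "cost": [1.0, 2.0]})

# Insert an empty-string column at position 1, i.e. right after "item".
# insert works in place, so there is nothing to assign back.
cost_tbl.insert(1, "col_name", "")
print(cost_tbl.columns.tolist())

# Inserting a name that already exists raises ValueError by default.
try:
    cost_tbl.insert(0, "col_name", "")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```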