Question: Insert a row into a pandas DataFrame
I have a dataframe:
s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])
df = pd.DataFrame([list(s1), list(s2)], columns=["A", "B", "C"])
   A  B  C
0  5  6  7
1  7  8  9
[2 rows x 3 columns]
and I need to add a first row [2, 3, 4] to get:
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
I’ve tried the append() and concat() functions but can’t find the right way to do that.
How can I add/insert a series into a dataframe?
Answer 0
Just assign the row to a particular index using loc:
df.loc[-1] = [2, 3, 4]  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_index()  # sorting by index
And you get, as desired:
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
See "Indexing: Setting with enlargement" in the pandas documentation.
Answer 1
Not sure how you were calling concat(), but it should work as long as both objects are of the same type. Maybe the issue is that you need to cast your second vector to a dataframe? Using the df that you defined, the following works for me:
df2 = pd.DataFrame([[2, 3, 4]], columns=['A', 'B', 'C'])
pd.concat([df2, df])
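One detail worth checking with the concat() call above: by default it keeps each frame's original row labels, so the result is indexed 0, 0, 1 rather than 0, 1, 2. A minimal sketch, assuming the df from the question (the names new_row, stacked and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
new_row = pd.DataFrame([[2, 3, 4]], columns=df.columns)

# Default behaviour: the labels of both inputs are kept as they are.
stacked = pd.concat([new_row, df])

# ignore_index=True renumbers the rows from 0 after concatenation.
result = pd.concat([new_row, df], ignore_index=True)
print(result)
```

If the duplicated label does not matter for your use case, the plain call is enough; otherwise ignore_index=True (or a later reset_index(drop=True)) gives the index shown in the question.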
Answer 2
One way to achieve this is
>>> pd.DataFrame(np.array([[2, 3, 4]]), columns=['A', 'B', 'C']).append(df, ignore_index=True)
Out[330]:
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
Generally, it’s easiest to append dataframes, not series. In your case, since you want the new row to be “on top” (with the starting id), and there is no function pd.prepend(), I first create the new dataframe and then append your old one.
ignore_index discards the existing index of your old dataframe, so that its first row gets index 1 instead of restarting at index 0.
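Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above only runs on older versions. A rough equivalent can be sketched with pd.concat(); the helper name prepend_row is hypothetical and not part of pandas:

```python
import pandas as pd


def prepend_row(df, row):
    """Return a new DataFrame with `row` (a list of values) as the first row.

    Hypothetical helper: pandas itself has no prepend function.
    """
    top = pd.DataFrame([row], columns=df.columns)
    # ignore_index=True renumbers the combined rows from 0.
    return pd.concat([top, df], ignore_index=True)


df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
result = prepend_row(df, [2, 3, 4])
print(result)
```

Like append(), this builds a new DataFrame on every call, so the performance caveat about adding rows one at a time still applies.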
Typical disclaimer: Ceterum censeo ... appending rows is quite an inefficient operation. If you care about performance and can somehow first create a dataframe with the correct (longer) index and then just insert the additional row into it, you should definitely do that. See:
>>> index = np.array([0, 1, 2])
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[0:1] = [list(s1), list(s2)]
>>> df2
Out[336]:
     A    B    C
0    5    6    7
1    7    8    9
2  NaN  NaN  NaN
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[1:] = [list(s1), list(s2)]
So far, we have what you had as df:
>>> df2
Out[339]:
     A    B    C
0  NaN  NaN  NaN
1    5    6    7
2    7    8    9
But now you can easily insert the row as follows. Since the space was preallocated, this is more efficient.
>>> df2.loc[0] = np.array([2, 3, 4])
>>> df2
Out[341]:
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
Answer 3
I put together a short function that allows for a little more flexibility when inserting a row:
def insert_row(idx, df, df_insert):
    dfA = df.iloc[:idx, ]
    dfB = df.iloc[idx:, ]
    df = dfA.append(df_insert).append(dfB).reset_index(drop=True)
    return df
which could be further shortened to:
def insert_row(idx, df, df_insert):
    return df.iloc[:idx, ].append(df_insert).append(df.iloc[idx:, ]).reset_index(drop=True)
Then you could use something like:
df = insert_row(2, df, df_new)
where 2 is the index position in df where you want to insert df_new.
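Since DataFrame.append() no longer exists in pandas 2.0 and later, the same idea can be sketched with pd.concat(). This is an adaptation under that assumption, not the answer's original code, and df_new is assumed to be a one-row DataFrame with the same columns as df:

```python
import pandas as pd


def insert_row(idx, df, df_insert):
    # Split df at position idx, place df_insert in between,
    # and renumber the index of the combined result from 0.
    return pd.concat([df.iloc[:idx], df_insert, df.iloc[idx:]], ignore_index=True)


df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df_new = pd.DataFrame([[2, 3, 4]], columns=df.columns)

middle = insert_row(1, df, df_new)  # new row between the two existing rows
first = insert_row(0, df, df_new)   # new row on top, as in the question
print(first)
```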
Answer 4
We can use numpy.insert. This has the advantage of flexibility: you only need to specify the index you want to insert at.
s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])
df = pd.DataFrame([list(s1), list(s2)], columns=["A", "B", "C"])
pd.DataFrame(np.insert(df.values, 0, values=[2, 3, 4], axis=0))
   0  1  2
0  2  3  4
1  5  6  7
2  7  8  9
In np.insert(df.values, 0, values=[2, 3, 4], axis=0), the 0 tells the function the position/index at which to place the new values.
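As the output above shows, going through df.values drops the column names (the columns come back as 0, 1, 2). A small sketch of one way to keep them, passing the original columns back to the DataFrame constructor:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# np.insert returns a new array with the row placed at position 0.
values = np.insert(df.values, 0, values=[2, 3, 4], axis=0)

# Re-attach the original column labels when rebuilding the DataFrame.
result = pd.DataFrame(values, columns=df.columns)
print(result)
```

Keep in mind that df.values is a single NumPy array, so a frame with mixed column types would be converted to a common dtype (typically object) along the way.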
Answer 5
This might seem overly simple, but it's incredible that a simple insert-new-row function isn’t built in. I’ve read a lot about appending a new df to the original, but I’m wondering if this would be faster.
df.loc[0] = [row1data, blah...]
i = len(df) + 1
df.loc[i] = [row2data, blah...]
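One thing to watch with this pattern: assigning to a label that already exists overwrites that row, and only a label that is not yet in the index adds a new one. A minimal sketch, assuming the default 0-based integer index from the question (the row values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Label 0 already exists, so this replaces the row instead of inserting one.
df.loc[0] = [2, 3, 4]
rows_after_overwrite = len(df)

# With a default index, len(df) is the first unused label, so this adds a row.
df.loc[len(df)] = [1, 1, 1]
print(df)
```

Using len(df) + 1 as in the snippet above also adds a row, but it leaves a gap in the labels.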
Answer 6
Below would be the best way to insert a row into a pandas dataframe without sorting and resetting the index:
import pandas as pd
df = pd.DataFrame(columns=['a','b','c'])
def insert(df, row):
    insert_loc = df.index.max()
    if pd.isna(insert_loc):
        df.loc[0] = row
    else:
        df.loc[insert_loc + 1] = row
insert(df,[2,3,4])
insert(df,[8,9,0])
print(df)
Answer 7
concat() seems to be a bit faster than last-row insertion and reindexing.
In case someone wonders about the speed of the two top approaches:
In [x]: %%timeit
   ...: df = pd.DataFrame(columns=['a','b'])
   ...: for i in range(10000):
   ...:     df.loc[-1] = [1,2]
   ...:     df.index = df.index + 1
   ...:     df = df.sort_index()
17.1 s ± 705 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [y]: %%timeit
   ...: df = pd.DataFrame(columns=['a', 'b'])
   ...: for i in range(10000):
   ...:     df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df])
6.53 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Answer 8
It is pretty simple to add a row to a pandas DataFrame:
Create a regular Python dictionary with the same column names as your DataFrame;
Call the .append() method, which is a method on DataFrame instances, and pass in your dictionary;
Add ignore_index=True right after your dictionary name.
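The steps above can be sketched as follows. Because DataFrame.append() was removed in pandas 2.0, this sketch swaps in pd.concat() on a one-row DataFrame built from the dictionary; the older call is shown in a comment. The row values are taken from the question, and the new row ends up at the bottom:

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Step 1: a plain dictionary keyed by the DataFrame's column names.
new_row = {"A": 2, "B": 3, "C": 4}

# Steps 2 and 3 on pandas older than 2.0 would read:
#     df = df.append(new_row, ignore_index=True)
# Substitute for newer versions: wrap the dict in a one-row DataFrame.
result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(result)
```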
Answer 9
You can simply append the row to the end of the DataFrame, and then adjust the index.
For instance:
df = df.append(pd.DataFrame([[2,3,4]],columns=df.columns),ignore_index=True)
df.index = (df.index + 1) % len(df)
df = df.sort_index()
Or use concat as:
df = pd.concat([pd.DataFrame([[2, 3, 4]], columns=df.columns), df], ignore_index=True)
Answer 10
The simplest way to add a row to a pandas data frame is:
DataFrame.loc[location of insertion] = list()
Example:
DF.loc[9] = ['Pepe', 33, 'Japan']
NB: the length of your list should match the number of columns in the data frame.
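A short sketch of that note in practice. The column names and the first row here are made up for illustration; the point is that a list with the wrong number of values is rejected with a ValueError:

```python
import pandas as pd

# Hypothetical three-column frame, only for illustration.
df = pd.DataFrame([["Ana", 28, "Peru"]], columns=["name", "age", "country"])

# Three values for three columns: a new row with label 9 is added.
df.loc[9] = ["Pepe", 33, "Japan"]

# Two values for three columns: pandas refuses the assignment.
try:
    df.loc[10] = ["Luis", 40]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(df)
print(mismatch_rejected)
```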