向pandas DataFrame添加一行-Python 实用宝典

向pandas DataFrame添加一行

我知道pandas旨在加载完全填充的内容,DataFrame但是我需要创建一个空的DataFrame然后逐行添加行。做这个的最好方式是什么 ? 我成功创建了一个空的DataFrame: res = DataFrame(columns=('lib', 'qty1', 'qty2')) 然后,我可以添加新行,并使用以下字段填充字段: res = res.set_value(len(res), 'qty1', 10.0) 它有效,但看起来很奇怪:-/(添加字符串值失败) 如何将新行添加到DataFrame(具有不同的列类型)?

问题:向pandas DataFrame添加一行

我知道pandas旨在加载完全填充的内容,DataFrame但是我需要创建一个空的DataFrame然后逐行添加行。做这个的最好方式是什么 ?

我成功创建了一个空的DataFrame:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

然后,我可以添加新行,并使用以下字段填充字段:

res = res.set_value(len(res), 'qty1', 10.0)

它有效,但看起来很奇怪:-/(添加字符串值失败)

如何将新行添加到DataFrame(具有不同的列类型)?

I understand that pandas is designed to load fully populated DataFrame but I need to create an empty DataFrame then add rows, one by one. What is the best way to do this ?

I successfully created an empty DataFrame with :

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field with :

res = res.set_value(len(res), 'qty1', 10.0)

It works but seems very odd :-/ (it fails for adding string value)

How can I add a new row to my DataFrame (with different columns type) ?


回答 0

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
>>>     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
>>>     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6

回答 1

如果可以预先获取该数据帧的所有数据,则有一种比附加到数据帧快得多的方法:

  1. 创建一个词典列表,其中每个词典对应于一个输入数据行。
  2. 从此列表创建一个数据框。

我有一个类似的任务,需要花30分钟的时间逐行附加到数据框,然后根据在几秒钟内完成的词典列表创建数据框。

rows_list = []
for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..) 

        rows_list.append(dict1)

df = pd.DataFrame(rows_list)               

In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:

  1. Create a list of dictionaries in which each dictionary corresponds to an input data row.
  2. Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..) 

        rows_list.append(dict1)

df = pd.DataFrame(rows_list)               

回答 2

您可以使用pandas.concat()DataFrame.append()。有关详细信息和示例,请参见合并,联接和连接

You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.


回答 3

已经很长时间了,但是我也面临着同样的问题。并在这里找到了很多有趣的答案。所以我很困惑使用什么方法。

在向数据帧添加很多行的情况下,我对速度性能感兴趣。因此,我尝试了4种最流行的方法并检查了它们的速度。

使用新版本的软件包在2019年更新。在@FooBar评论也会更新

速度表现

  1. 使用.append(NPE的答案
  2. 使用.loc(弗雷德的答案
  3. 使用.loc进行预分配(FooBar的答案
  4. 最后使用dict并创建DataFrame(ShikharDua的答案

结果(以秒为单位):

|------------|-------------|-------------|-------------|
|  Approach  |  1000 rows  |  5000 rows  | 10 000 rows |
|------------|-------------|-------------|-------------|
| .append    |    0.69     |    3.39     |    6.78     |
|------------|-------------|-------------|-------------|
| .loc w/o   |    0.74     |    3.90     |    8.35     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
| .loc with  |    0.24     |    2.58     |    8.70     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
|  dict      |    0.012    |   0.046     |   0.084     |
|------------|-------------|-------------|-------------|

也感谢@krassowski的有用评论-我更新了代码。

所以我自己在字典中使用加法。


码:

import pandas as pd
import numpy as np
import time

del df1, df2, df3, df4
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

PS我相信,我的认识并不完美,也许还有一些优化。

It's been a long time, but I faced the same problem too. And found here a lot of interesting answers. So I was confused what method to use.

In the case of adding a lot of rows to dataframe I interested in speed performance. So I tried 4 most popular methods and checked their speed.

UPDATED IN 2019 using new versions of packages. Also updated after @FooBar comment

SPEED PERFORMANCE

  1. Using .append (NPE's answer)
  2. Using .loc (fred's answer)
  3. Using .loc with preallocating (FooBar's answer)
  4. Using dict and create DataFrame in the end (ShikharDua's answer)

Results (in secs):

|------------|-------------|-------------|-------------|
|  Approach  |  1000 rows  |  5000 rows  | 10 000 rows |
|------------|-------------|-------------|-------------|
| .append    |    0.69     |    3.39     |    6.78     |
|------------|-------------|-------------|-------------|
| .loc w/o   |    0.74     |    3.90     |    8.35     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
| .loc with  |    0.24     |    2.58     |    8.70     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
|  dict      |    0.012    |   0.046     |   0.084     |
|------------|-------------|-------------|-------------|

Also thanks to @krassowski for useful comment - I updated the code.

So I use addition through the dictionary for myself.


Code:

import pandas as pd
import numpy as np
import time

del df1, df2, df3, df4
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

P.S. I believe, my realization isn't perfect, and maybe there is some optimization.


回答 4

如果事先知道条目数,则应该通过提供索引来预分配空间(从另一个答案中获取数据示例):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]
In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

速度比较

In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop

而且-从注释中看-大小为6000,速度差变得更大:

增加数组(12)的大小和行数(500)会使速度差异更加明显:313ms vs 2.29s

If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]
In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop

And - as from the comments - with a size of 6000, the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s


回答 5

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    df.loc[len(df)] = row
mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    df.loc[len(df)] = row

回答 6

为了高效地附加,请参见如何向pandas数据框添加额外的行和“ 设置为放大”

通过添加行loc/ix不存在的关键指标数据。例如:

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

要么:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....: 

In [2]: dfi
Out[2]: 
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

For efficient appending see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc/ix on non existing key index data. e.g. :

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....: 

In [2]: dfi
Out[2]: 
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

回答 7

您可以使用ignore_index选项将单行附加为字典。

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

You can append a single row as a dictionary using the ignore_index option.

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

回答 8

为了Python的方式,在这里添加我的答案:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

For the sake of Pythonic way, here add my answer:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

回答 9

您还可以建立列表列表,并将其转换为数据框-

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

给予

    我加倍
0 0 0 0
1 1 2 1
2 2 4 4
3 3 6 9
4 4 8 16
5 5 10 25

You can also build up a list of lists and convert it to a dataframe -

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

giving

    i   double  square
0   0   0   0
1   1   2   1
2   2   4   4
3   3   6   9
4   4   8   16
5   5   10  25

回答 10

这不是对OP问题的答案,而是一个玩具示例,用于说明@ShikharDua的答案,在上面我发现它非常有用。

尽管这个片段是微不足道的,但在实际数据中,我有1000行和许多列,我希望能够按不同的列进行分组,然后对一个以上的taget列执行以下统计信息。因此,拥有一种可靠的方法来一次一次构建数据帧非常方便。谢谢@ShikharDua!

import pandas as pd 

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list=[]
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd={} #initialise an empty dict 
    RecordtoAdd.update({'Customer' : name}) #
    RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})      
    RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})                   

    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

This is not an answer to the OP question but a toy example to illustrate the answer of @ShikharDua above which I found very useful.

While this fragment is trivial, in the actual data I had 1,000s of rows, and many columns, and I wished to be able to group by different columns and then perform the stats below for more than one taget column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you @ShikharDua !

import pandas as pd 

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list=[]
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd={} #initialise an empty dict 
    RecordtoAdd.update({'Customer' : name}) #
    RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})      
    RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})                   

    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

回答 11

想出了一种简单而又不错的方法:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

回答 12

您可以使用生成器对象创建Dataframe,这将在列表上提高内存效率。

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

要将原始数据添加到现有DataFrame中,可以使用append方法。

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

You can use generator object to create Dataframe, which will be more memory efficient over the list.

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

To add raw to existing DataFrame you can use append method.

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

回答 13

创建一个新记录(数据框)并添加到old_data_frame
传递列表和相应的名以创建new_record(data_frame)

new_record = pd.DataFrame([[0,'abcd',0,1,123]],columns=['a','b','c','d','e'])

old_data_frame = pd.concat([old_data_frame,new_record])

Create a new record(data frame) and add to old_data_frame.
pass list of values and corresponding column names to create a new_record (data_frame)

new_record = pd.DataFrame([[0,'abcd',0,1,123]],columns=['a','b','c','d','e'])

old_data_frame = pd.concat([old_data_frame,new_record])

回答 14

这是在其中添加/添加行的方法 pandas DataFrame

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1  
    return df.sort_index()

add_row(df, [1,2,3]) 

它可以用于在空的或填充的熊猫DataFrame中插入/追加一行

Here is the way to add/append a row in pandas DataFrame

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1  
    return df.sort_index()

add_row(df, [1,2,3]) 

It can be used to insert/append a row in empty or populated pandas DataFrame


回答 15

除了ShikharDua的答案中的字典列表之外,我们还可以将表表示为list字典,假设我们事先知道各列,则每个列表按行顺序存储一列。最后,我们构造一次DataFrame。

对于c列和n行,这使用1个字典和c个列表,而使用1个列表和n个字典。字典列表方法使每个字典都存储所有键,并且需要为每行创建一个新字典。在这里,我们仅附加到列表,这是恒定时间并且理论上非常快。

# current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# at the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black

Instead of a list of dictionaries as in ShikharDua's answer, we can also represent our table as a dictionary of lists, where each list stores one column in row-order, given we know our columns beforehand. At the end we construct our DataFrame once.

For c columns and n rows, this uses 1 dictionary and c lists, versus 1 list and n dictionaries. The list of dictionaries method has each dictionary storing all keys and requires creating a new dictionary for every row. Here we only append to lists, which is constant time and theoretically very fast.

# current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# at the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black

回答 16

如果要在行末添加行,请将其添加为列表

valuestoappend = [va1,val2,val3]
res = res.append(pd.Series(valuestoappend,index = ['lib', 'qty1', 'qty2']),ignore_index = True)

if you want to add row at the end append it as a list

valuestoappend = [va1,val2,val3]
res = res.append(pd.Series(valuestoappend,index = ['lib', 'qty1', 'qty2']),ignore_index = True)

回答 17

另一种方法(可能不是很出色):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

您还可以像这样增强DataFrame类:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

回答 18

简单点。通过将列表作为输入,将其添加为数据帧中的行:

import pandas as pd  
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))  
for i in range(5):  
    res_list = list(map(int, input().split()))  
    res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)

Make it simple. By taking list as input which will be appended as row in data-frame:-

import pandas as pd  
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))  
for i in range(5):  
    res_list = list(map(int, input().split()))  
    res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)

回答 19

您需要的是loc[df.shape[0]]loc[len(df)]


# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False] 

要么

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False] 

All you need is loc[df.shape[0]] or loc[len(df)]


# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False] 

or

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False] 

回答 20

我们经常看到df.loc[subscript] = …分配给一个DataFrame行的结构。Mikhail_Sam发布了基准测试,其中包含此构造以及使用dict的方法,最后创建了DataFrame。他发现后者是迄今为止最快的。但是,如果我们用替换df3.loc[i] = …其代码中的(使用预分配的DataFrame)df3.values[i] = …,结果将发生显着变化,因为该方法的性能类似于使用dict的方法。因此,我们应该更多地考虑使用df.values[subscript] = …。但是请注意,.values它采用从零开始的下标,该下标可能与DataFrame.index不同。

We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and create DataFrame in the end. He found the latter to be the fastest by far. But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly, in that that method performs similar to the one using dict. So we should more often take the use of df.values[subscript] = … into consideration. However note that .values takes a zero-based subscript, which may be different from the DataFrame.index.


回答 21

pandas.DataFrame.append

DataFrame.append(自身,其他,ignore_index = False,verify_integrity = False,sort = False)→'DataFrame'

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

将ignore_index设置为True:

df.append(df2, ignore_index=True)

pandas.DataFrame.append

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) → 'DataFrame'

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

With ignore_index set to True:

df.append(df2, ignore_index=True)

回答 22

在添加行之前,我们必须将数据帧转换为字典,在那里您可以看到键在数据帧中为列,并且列的值再次存储在字典中,但是每一列的键都是数据帧中的索引号。这个想法让我写了下面的代码。

df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"] #this is total row that we are going to add
i=0
for x in df.columns:   #here df.columns gives us the main dictionary key
    df2[x][101]=values[i]   #here the 101 is our index number it is also key of sub dictionary
    i+=1

before going to add a row, we have to convert the dataframe to dictionary there you can see the keys as columns in dataframe and values of the columns are again stored in the dictionary but there key for every column is the index number in dataframe. That idea make me to write the below code.

df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"] #this is total row that we are going to add
i=0
for x in df.columns:   #here df.columns gives us the main dictionary key
    df2[x][101]=values[i]   #here the 101 is our index number it is also key of sub dictionary
    i+=1

回答 23

您可以为此连接两个DataFrame。我基本上遇到了这个问题,将新行添加到具有字符索引(非数字)的现有DataFrame中。因此,我在pipe()中输入新行的数据,并在列表中输入索引。

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])

You can concatenate two DataFrames for this. I basically came across this problem to add a new row to an existing DataFrame with a character index(not numeric). So, I input the data for a new row in a duct() and index in a list.

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])

回答 24

这将有助于将一个项目添加到一个空的DataFrame中。问题在于df.index.max() == nan第一个索引:

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])

df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

This will take care of adding an item to an empty DataFrame. The issue is that df.index.max() == nan for the first index:

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])

df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

本文由 Python 实用宝典 作者:Python实用宝典 发表,其版权均为 Python 实用宝典 所有,文章内容系作者个人观点,不代表 Python 实用宝典 对观点赞同或支持。如需转载,请注明文章来源。
2

抱歉,评论已关闭!