Question: Set value for particular cell in pandas DataFrame using index
I’ve created a Pandas DataFrame
df = DataFrame(index=['A','B','C'], columns=['x','y'])
and got this
x y
A NaN NaN
B NaN NaN
C NaN NaN
Then I want to assign a value to a particular cell, for example row ‘C’ and column ‘x’.
I expected to get this result:
x y
A NaN NaN
B NaN NaN
C 10 NaN
with this code:
df.xs('C')['x'] = 10
but the contents of df haven’t changed. There are still only NaNs in the DataFrame.
Any suggestions?
Answer 0
RukTech’s answer, df.set_value('C', 'x', 10), is far and away faster than the options I’ve suggested below. However, it has been slated for deprecation.
Going forward, the recommended method is .iat/.at.
Why df.xs('C')['x'] = 10 does not work:
df.xs('C') by default returns a new object with a copy of the data, so
df.xs('C')['x'] = 10
modifies only that copy.
df['x'] returns a view of the df dataframe, so
df['x']['C'] = 10
modifies df itself.
Warning: It is sometimes difficult to predict whether an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with “chained indexing”.
So the recommended alternative is
df.at['C', 'x'] = 10
which does modify df.
In [18]: %timeit df.set_value('C', 'x', 10)
100000 loops, best of 3: 2.9 µs per loop
In [20]: %timeit df['x']['C'] = 10
100000 loops, best of 3: 6.31 µs per loop
In [81]: %timeit df.at['C', 'x'] = 10
100000 loops, best of 3: 9.2 µs per loop
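As a minimal sketch of the recommended approach, the following rebuilds the frame from the question and sets single cells with .at and, equivalently, with .loc. The second assignment (row ‘B’, column ‘y’, value 20) is added here purely for illustration and is not part of the question.

```python
import pandas as pd

# Same empty frame as in the question: every cell starts as NaN.
df = pd.DataFrame(index=["A", "B", "C"], columns=["x", "y"])

# Label-based scalar access: one row label, one column label.
df.at["C", "x"] = 10

# .loc accepts the same pair of labels (illustrative extra assignment).
df.loc["B", "y"] = 20

print(df)
```

Both forms index the frame once, so they avoid the copy-versus-view ambiguity of chained indexing.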
Answer 1
Update: The .set_value method is going to be deprecated. .iat/.at are good replacements; unfortunately pandas provides little documentation for them.
The fastest way to do this is using set_value. This method is ~100 times faster than the .ix method. For example:
df.set_value('C', 'x', 10)
Answer 2
You can also use a conditional lookup using .loc, as seen here:
df.loc[df[<some_column_name>] == <condition>, [<another_column_name>]] = <value_to_add>
where <some_column_name> is the column you want to check the <condition> variable against and <another_column_name> is the column you want to add to (can be a new column or one that already exists). <value_to_add> is the value you want to add to that column/row.
This example doesn’t work precisely with the question at hand, but it might be useful for someone who wants to add a specific value based on a condition.
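To make the template concrete, here is a small sketch with made-up data. The column names "city", "temp" and "label" are hypothetical, and the target column is passed as a plain label rather than a one-element list.

```python
import pandas as pd

# Hypothetical data: the column names are invented for this sketch.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "temp": [3, 18, 5]})

# Rows where the condition holds receive the value.
# "label" does not exist yet, so it is created; other rows stay missing.
df.loc[df["city"] == "Oslo", "label"] = "cold"

print(df)
```

Rows that do not match the condition keep a missing value in the new column.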
Answer 3
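The recommended way (according to the maintainers) to set a value is:
df.ix['C', 'x'] = 10
Using ‘chained indexing’ (df['x']['C']) may lead to problems.
Note that .ix was later deprecated (pandas 0.20) and removed (pandas 1.0); df.loc['C', 'x'] = 10 is the label-based equivalent.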
Answer 4
Try using df.loc[row_index,col_indexer] = value
Answer 5
This is the only thing that worked for me!
df.loc['C', 'x'] = 10
Learn more about .loc here.
Answer 6
.iat/.at is a good solution.
Suppose you have this simple data_frame:
A B C
0 1 8 4
1 3 9 6
2 22 33 52
If we want to modify the value of the cell [0, "A"], we can use either of these solutions:
df.iat[0,0] = 2
df.at[0,'A'] = 2
And here is a complete example of how to use iat to get and set the value of a cell:
def preprocessing(df):
    for index in range(0, len(df)):
        df.iat[index, 0] = df.iat[index, 0] * 2
    return df
y_train before:
0
0 54
1 15
2 15
3 8
4 31
5 63
6 11
y_train after calling the preprocessing function, which uses iat to multiply the value of each cell by 2:
0
0 108
1 30
2 30
3 16
4 62
5 126
6 22
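The two accessors can be tried on the small A/B/C frame from this answer. This is only a sketch: the value 80 written to column "B" is an arbitrary illustration, and the loop mirrors the doubling function shown above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 22], "B": [8, 9, 33], "C": [4, 6, 52]})

df.iat[0, 0] = 2      # by integer position: first row, first column
df.at[0, "B"] = 80    # by label: row label 0, column "B" (arbitrary value)

# Double the first column cell by cell, reading and writing through iat.
for i in range(len(df)):
    df.iat[i, 0] = df.iat[i, 0] * 2

print(df)
```

.iat takes positions only and .at takes labels only; on a default integer index the row argument happens to look the same for both.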
Answer 7
To set values, use:
df.at[0, 'clm1'] = 0
- The fastest recommended method for setting variables.
- set_value and ix have been deprecated.
- No warning, unlike iloc and loc.
Answer 8
You can use .iloc.
df.iloc[[2], [0]] = 10
Answer 9
In my example I just change the value in the selected cells:
for index, row in result.iterrows():
    if np.isnan(row['weight']):
        result.at[index, 'weight'] = 0.0
‘result’ is a DataFrame with a ‘weight’ column.
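A runnable version of this loop might look like the sketch below. The sample frame is hypothetical, and the fillna call is a column-wise alternative added for comparison, not something the answer itself proposes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for `result`.
result = pd.DataFrame({"weight": [1.5, np.nan, 2.0, np.nan]})

# Cell-by-cell version: visit each row and overwrite NaN through .at.
looped = result.copy()
for index, row in looped.iterrows():
    if np.isnan(row["weight"]):
        looped.at[index, "weight"] = 0.0

# Column-wise alternative: fillna replaces every NaN in one call.
filled = result.copy()
filled["weight"] = filled["weight"].fillna(0.0)

print(looped.equals(filled))
```

For a single column the two give the same frame; the loop is mainly useful when the per-row logic is more involved than a simple replacement.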
Answer 10
set_value() is deprecated.
Starting from release 0.23.4, pandas “announces the future”…
>>> df
Cars Prices (U$)
0 Audi TT 120.0
1 Lamborghini Aventador 245.0
2 Chevrolet Malibu 190.0
>>> df.set_value(2, 'Prices (U$)', 240.0)
__main__:1: FutureWarning: set_value is deprecated and will be removed in a future release.
Please use .at[] or .iat[] accessors instead
Cars Prices (U$)
0 Audi TT 120.0
1 Lamborghini Aventador 245.0
2 Chevrolet Malibu 240.0
Considering this advice, here’s a demonstration of how to use them:
- by row/column integer positions
>>> df.iat[1, 1] = 260.0
>>> df
Cars Prices (U$)
0 Audi TT 120.0
1 Lamborghini Aventador 260.0
2 Chevrolet Malibu 240.0
- by row/column labels
>>> df.at[2, "Cars"] = "Chevrolet Corvette"
>>> df
Cars Prices (U$)
0 Audi TT 120.0
1 Lamborghini Aventador 260.0
2 Chevrolet Corvette 240.0
Answer 11
Here is a summary of the valid solutions provided by all users, for data frames indexed by integers and by strings.
df.iloc, df.loc and df.at work for both types of data frame. df.iloc only works with integer row/column positions, while df.loc and df.at support setting values using column names and/or integer index labels.
When the specified index does not exist, both df.loc and df.at append the newly inserted rows/columns to the existing data frame, but df.iloc raises “IndexError: positional indexers are out-of-bounds”. A working example tested in Python 2.7 and 3.7 is as follows:
import numpy as np, pandas as pd
df1 = pd.DataFrame(index=np.arange(3), columns=['x','y','z'])
df1['x'] = ['A','B','C']
df1.at[2,'y'] = 400
# rows/columns specified does not exist, appends new rows/columns to existing data frame
df1.at['D','w'] = 9000
df1.loc['E','q'] = 499
# using df[<some_column_name>] == <condition> to retrieve target rows
df1.at[df1['x']=='B', 'y'] = 10000
df1.loc[df1['x']=='B', ['z','w']] = 10000
# using a list of index to setup values
df1.iloc[[1,2,4], 2] = 9999
df1.loc[[0,'D','E'],'w'] = 7500
df1.at[[0,2,"D"],'x'] = 10
df1.at[:, ['y', 'w']] = 8000
df1
>>> df1
x y z w q
0 10 8000 NaN 8000 NaN
1 B 8000 9999 8000 NaN
2 10 8000 9999 8000 NaN
D 10 8000 NaN 8000 NaN
E NaN 8000 9999 8000 499.0
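The difference between enlarging with .loc and the bounds check in .iloc can be seen in isolation with the sketch below. The frame and values are made up, and the exact error message may differ between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["A", "B", "C"])

# .loc with an unknown row label enlarges the frame by one row.
df.loc["D", "x"] = 4

# .iloc refuses positions outside the current shape.
try:
    df.iloc[10, 0] = 5
    raised = False
except IndexError:
    raised = True

print(len(df), raised)
```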
Answer 12
I tested, and the result is that df.set_value is a little faster, but the official method df.at looks like the fastest non-deprecated way to do it.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 100))
%timeit df.iat[50,50]=50 # ✓
%timeit df.at[50,50]=50 # ✔
%timeit df.set_value(50,50,50) # will deprecate
%timeit df.iloc[50,50]=50
%timeit df.loc[50,50]=50
7.06 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
5.52 µs ± 64.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.68 µs ± 80.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
98.7 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that this sets the value of a single cell. For vectors, loc and iloc should be better options since they are vectorized.
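To illustrate the distinction between one cell and many, here is a small sketch. It uses a zero-filled frame rather than random data so the results are easy to check, and the particular rows and columns are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((100, 100)))

# One cell: the scalar accessor.
df.at[50, 50] = 50

# Many cells: a single .loc / .iloc call instead of a Python loop over .at.
df.loc[:, 0] = 1.0          # whole column with label 0
df.iloc[10:20, 1] = 2.0     # rows at positions 10..19 of the second column

print(df.at[50, 50], df[0].sum(), df[1].sum())
```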
Answer 13
One way to use an index with a condition is to first get the index of all the rows that satisfy your condition, and then use those row indexes in a variety of ways.
conditional_index = df.loc[ df['col name'] <condition> ].index
Example conditions look like
==5, >10 , =="Any string", >= DateTime
Then you can use these row indexes in a variety of ways, for example:
- Replace the value of one column for conditional_index
df.loc[conditional_index, ['col name']] = <new value>
- Replace the value of multiple columns for conditional_index
df.loc[conditional_index, [col1, col2]] = <new value>
- One benefit of saving conditional_index is that you can assign the value of one column to another column with the same row index
df.loc[conditional_index, [col1, col2]] = df.loc[conditional_index, 'col name']
This is all possible because .index returns an array of index labels that .loc can use for direct addressing, so it avoids repeated traversals.
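The pattern above can be sketched with concrete data as follows. The columns "a", "b" and "c" and the threshold are hypothetical, and the sketch assumes a unique index, since duplicate labels would make the saved index select more rows than the condition did.

```python
import pandas as pd

# Hypothetical data; the index is unique, which this pattern relies on.
df = pd.DataFrame({"a": [1, 12, 30], "b": [0, 0, 0], "c": [0, 0, 0]})

# Labels of the rows that satisfy the condition.
conditional_index = df.loc[df["a"] > 10].index

# Replace one column, then several columns, for those rows.
df.loc[conditional_index, "b"] = -1
df.loc[conditional_index, ["b", "c"]] = 99

# Copy one column into another for the same rows.
df.loc[conditional_index, "c"] = df.loc[conditional_index, "a"]

print(df)
```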
Answer 14
df.loc['C', 'x'] = 10
This changes the value at row ‘C’ and column ‘x’.
Answer 15
In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an already existing dataframe. It shows that using at or set_value is the most efficient way for large dataframes (at least for these test conditions).
- Create new dataframe for each row and…
  - … append it (13.0 s)
  - … concatenate it (13.1 s)
- Store all new rows in another container first, convert to new dataframe once and append…
  - container = list of lists (2.0 s)
  - container = dictionary of lists (1.9 s)
- Preallocate whole dataframe, iterate over new rows and all columns and fill using
  - … at (0.6 s)
  - … set_value (0.4 s)
For the test, an existing dataframe comprising 100,000 rows and 1,000 columns of random numpy values was used. To this dataframe, 100 new rows were added.
The code is below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 16:38:46 2018
@author: gebbissimo
"""
import pandas as pd
import numpy as np
import time

NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)
NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)
DATA_NEW = np.random.rand(1,NUM_COLS)

#%% FUNCTIONS
# Note: DataFrame.append was removed in pandas 2.0 and DataFrame.set_value
# in pandas 1.0, so the functions that use them need an older pandas.

# create a one-row frame per new row and append it
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df

# create a one-row frame per new row and concatenate it
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df

# store as list of lists, convert once and append
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot

# store as dict of lists, convert once and append
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot

# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
    return df

# preallocate and fill using set_value
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
    return df

#%% TESTS
t0 = time.time()
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
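On pandas versions where DataFrame.append is no longer available, the "store in a container, convert once" strategy can be sketched with pd.concat instead. This is not the benchmark's code: the sizes are deliberately tiny placeholders, and no timing claims are made for it.

```python
import numpy as np
import pandas as pd

# Placeholder sizes, much smaller than the benchmark's.
NUM_ROWS, NUM_COLS, NUM_ROWS_NEW = 1000, 10, 100
df = pd.DataFrame(np.random.rand(NUM_ROWS, NUM_COLS))

# Collect the new rows in a plain list first...
new_rows = [np.random.rand(NUM_COLS) for _ in range(NUM_ROWS_NEW)]

# ...then build one frame from them and concatenate a single time.
df_new = pd.DataFrame(new_rows)
df_tot = pd.concat([df, df_new], ignore_index=True)

print(df_tot.shape)
```

Concatenating once avoids copying the large frame on every iteration, which is what makes the per-row append and concat variants slow.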
Answer 16
If you want to change values not for the whole row, but only for some columns:
x = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x.iloc[1] = dict(A=10, B=-10)
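A related way to touch only some columns of a row is to name them explicitly with .loc. The sketch below is an alternative formulation rather than the answer's own code, and the extra column "C" is added only to show that it stays unchanged.

```python
import pandas as pd

x = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Update only columns A and B in the row labelled 1; column C is untouched.
x.loc[1, ["A", "B"]] = [10, -10]

print(x)
```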
Answer 17
From version 0.21.1 you can also use the .at accessor. There are some differences compared to .loc, as discussed in “pandas .at versus .loc”, but it’s faster for single-value replacement.
Answer 18
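So, your question is how to change the NaN at row ‘C’, column ‘x’ to the value 10.
The answer is:
df['x'].loc['C':] = 10
df
This sets column ‘x’ from row ‘C’ to the end, which here is only row ‘C’ because it is the last row. An alternative that addresses the single cell directly is
df.loc['C', 'x'] = 10
df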
Answer 19
I too was searching for this topic, and I put together a way to iterate through a DataFrame and update it with lookup values from a second DataFrame. Here is my code.
src_df = pd.read_sql_query(src_sql,src_connection)
for index1, row1 in src_df.iterrows():
    for index, row in vertical_df.iterrows():
        src_df.set_value(index=index1,col=u'etl_load_key',value=etl_load_key)
        if (row1[u'src_id'] == row['SRC_ID']) is True:
            src_df.set_value(index=index1,col=u'vertical',value=row['VERTICAL'])