根据熊猫中的另一个值更改一个值

Question 1

I’m trying to reprogram my Stata code into Python for speed improvements, and I was pointed in the direction of PANDAS. I am, however, having a hard time wrapping my head around how to process the data.

Let’s say I want to iterate over all values in the column head ‘ID.’ If that ID matches a specific number, then I want to change two corresponding values FirstName and LastName.

In Stata it looks like this:

replace FirstName = "Matt" if ID==103
replace LastName =  "Jones" if ID==103

So this replaces all values in FirstName that correspond with values of ID == 103 to Matt.

In PANDAS, I’m trying something like this

df = read_csv("test.csv")
for i in df['ID']:
    if i ==103:
          ...

Not sure where to go from here. Any ideas?

Question 2

One option is to use Python’s slicing and indexing features to logically evaluate the places where your condition holds and overwrite the data there.

Assuming you can load your data directly into pandas with pandas.read_csv then the following code might be helpful for you.

import pandas
df = pandas.read_csv("test.csv")
df.loc[df.ID == 103, 'FirstName'] = "Matt"
df.loc[df.ID == 103, 'LastName'] = "Jones"

As mentioned in the comments, you can also do the assignment to both columns in one shot:

df.loc[df.ID == 103, ['FirstName', 'LastName']] = 'Matt', 'Jones'

Note that you’ll need pandas version 0.11 or newer to make use of loc for overwrite assignment operations.

Another way to do it is to use what is called chained assignment. The behavior of this is less stable and so it is not considered the best solution (it is explicitly discouraged in the docs), but it is useful to know about:

import pandas
df = pandas.read_csv("test.csv")
df['FirstName'][df.ID == 103] = "Matt"
df['LastName'][df.ID == 103] = "Jones"

Question 3

You can use map, it can map vales from a dictonairy or even a custom function.

Suppose this is your df:

    ID First_Name Last_Name
0  103          a         b
1  104          c         d

Create the dicts:

fnames = {103: "Matt", 104: "Mr"}
lnames = {103: "Jones", 104: "X"}

And map:

df['First_Name'] = df['ID'].map(fnames)
df['Last_Name'] = df['ID'].map(lnames)

The result will be:

    ID First_Name Last_Name
0  103       Matt     Jones
1  104         Mr         X

Or use a custom function:

names = {103: ("Matt", "Jones"), 104: ("Mr", "X")}
df['First_Name'] = df['ID'].map(lambda x: names[x][0])

Question 4

The original question addresses a specific narrow use case. For those who need more generic answers here are some examples:

Creating a new column using data from other columns

Given the dataframe below:

import pandas as pd
import numpy as np

df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])

In[1]:
Out[1]:
  animal     type  age
----------------------
0    dog    hound    5
1    cat  ragdoll    1

Below we are adding a new description column as a concatenation of other columns by using the + operation which is overridden for series. Fancy string formatting, f-strings etc won’t work here since the + applies to scalars and not ‘primitive’ values:

df['description'] = 'A ' + df.age.astype(str) + ' years old ' \
                    + df.type + ' ' + df.animal

In [2]: df
Out[2]:
  animal     type  age                description
-------------------------------------------------
0    dog    hound    5    A 5 years old hound dog
1    cat  ragdoll    1  A 1 years old ragdoll cat

We get 1 years for the cat (instead of 1 year) which we will be fixing below using conditionals.

Modifying an existing column with conditionals

Here we are replacing the original animal column with values from other columns, and using np.where to set a conditional substring based on the value of age:

# append 's' to 'age' if it's greater than 1
df.animal = df.animal + ", " + df.type + ", " + \
    df.age.astype(str) + " year" + np.where(df.age > 1, 's', '')

In [3]: df
Out[3]:
                 animal     type  age
-------------------------------------
0   dog, hound, 5 years    hound    5
1  cat, ragdoll, 1 year  ragdoll    1

Modifying multiple columns with conditionals

A more flexible approach is to call .apply() on an entire dataframe rather than on a single column:

def transform_row(r):
    r.animal = 'wild ' + r.type
    r.type = r.animal + ' creature'
    r.age = "{} year{}".format(r.age, r.age > 1 and 's' or '')
    return r

df.apply(transform_row, axis=1)

In[4]:
Out[4]:
         animal            type      age
----------------------------------------
0    wild hound    dog creature  5 years
1  wild ragdoll    cat creature   1 year

In the code above the transform_row(r) function takes a Series object representing a given row (indicated by axis=1, the default value of axis=0 will provide a Series object for each column). This simplifies processing since we can access the actual ‘primitive’ values in the row using the column names and have visibility of other cells in the given row/column.

Question 5

This question might still be visited often enough that it’s worth offering an addendum to Mr Kassies’ answer. The dict built-in class can be sub-classed so that a default is returned for ‘missing’ keys. This mechanism works well for pandas. But see below.

In this way it’s possible to avoid key errors.

>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> class SurnameMap(dict):
...     def __missing__(self, key):
...         return ''
...     
>>> surnamemap = SurnameMap()
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap[x])
>>> df
    ID  Surname
0  101  Mohanty
1  201         
2  301    Drake
3  401

The same thing can be done more simply in the following way. The use of the ‘default’ argument for the get method of a dict object makes it unnecessary to subclass a dict.

>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> surnamemap = {}
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap.get(x, ''))
>>> df
    ID  Surname
0  101  Mohanty
1  201         
2  301    Drake
3  401

根据熊猫中的另一个值更改一个值

问题：根据熊猫中的另一个值更改一个值

回答 0

回答 1

回答 2

使用其他列中的数据创建新列

使用条件修改现有列

使用条件修改多列

Creating a new column using data from other columns

Modifying an existing column with conditionals

Modifying multiple columns with conditionals

回答 3

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

Bash等同于Python的pass语句

如何在Python中声明和添加项目到数组？

Python Black一键格式化美化代码(详细配置教程)

在Java中调用Python？

将HDF5用于大型阵列存储（而不是平面二进制文件）是否具有分析速度或内存使用优势？

将图像插入IPython Notebook Markdown

根据熊猫中的另一个值更改一个值

问题：根据熊猫中的另一个值更改一个值

回答 0

回答 1

回答 2

使用其他列中的数据创建新列

使用条件修改现有列

使用条件修改多列

Creating a new column using data from other columns

Modifying an existing column with conditionals

Modifying multiple columns with conditionals

回答 3

相关文章

排行榜展示

文章展示