How to replace text in a column of a Pandas dataframe?

Question: How to replace text in a column of a Pandas dataframe?

I have a column in my dataframe like this:

range
"(2,30)"
"(50,290)"
"(400,1000)"
... 

I want to replace each comma (,) with a dash (-). I’m currently using this method, but nothing changes:

org_info_exc['range'].replace(',', '-', inplace=True)

Can anybody help?


Answer 0

Use the vectorised str method replace:

In [30]:

df['range'] = df['range'].str.replace(',','-')
df
Out[30]:
      range
0    (2-30)
1  (50-290)

EDIT

Here is what you tried and why it didn’t work:

df['range'].replace(',','-',inplace=True)

The docs describe the to_replace argument as follows:

str or regex: str: string exactly matching to_replace will be replaced with value

Because none of the str values is exactly ',', no replacement occurs. Compare with the following:

In [43]:

df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
Out[43]:
0    (2,30)
1         -
Name: range, dtype: object

Here the second row is an exact match, so the replacement occurs.
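
To make the difference concrete, here is a small sketch that runs the three calls side by side. The toy frame and the variable names are made up for illustration, and regex=False is passed explicitly to str.replace because its default has differed between pandas versions.

```python
import pandas as pd

# Toy data mirroring the question's column (values are illustrative).
df = pd.DataFrame({'range': ['(2,30)', '(50,290)', ',']})

# Series.replace without regex: only cells that equal ',' as a whole change.
whole_cell = df['range'].replace(',', '-')

# Series.replace with regex=True: the pattern is matched inside each string.
via_regex = df['range'].replace(',', '-', regex=True)

# Series.str.replace: substring replacement; regex=False treats ',' literally.
via_str = df['range'].str.replace(',', '-', regex=False)

print(whole_cell.tolist())  # ['(2,30)', '(50,290)', '-']
print(via_regex.tolist())   # ['(2-30)', '(50-290)', '-']
print(via_str.tolist())     # ['(2-30)', '(50-290)', '-']
```

None of these calls modifies df in place; each returns a new Series that has to be assigned back to the column.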


Answer 1

For anyone else arriving here from a Google search on how to do a string replacement on all columns (for example, if the dataframe has several columns like the OP’s ‘range’ column): Pandas has a built-in replace method on the DataFrame object. Passing regex=True makes it match inside each string rather than requiring the whole cell to match.

df.replace(',', '-', regex=True)

Source: Docs
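
As a rough illustration of the call above, the sketch below applies it to a hypothetical frame with two string columns and one numeric column; the column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical frame: two string columns and one numeric column.
df = pd.DataFrame({
    'range': ['(2,30)', '(50,290)'],
    'other': ['a,b', 'c'],
    'count': [1, 2],
})

# regex=True matches ',' inside every string cell; a new frame is returned.
replaced = df.replace(',', '-', regex=True)

print(replaced['range'].tolist())  # ['(2-30)', '(50-290)']
print(replaced['other'].tolist())  # ['a-b', 'c']
print(replaced['count'].tolist())  # [1, 2]
```

The numeric column passes through unchanged, and df itself keeps its commas unless the result is assigned back.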


Answer 2

Replace all spaces with underscores in the column names (pass ',' as the first argument to replace commas instead):

data.columns = data.columns.str.replace(' ', '_', regex=True)
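
A minimal sketch of the same idea on a made-up set of column names, handling both spaces and commas. The names are hypothetical, and regex=False is used here only to keep the matches literal.

```python
import pandas as pd

# Hypothetical column names containing spaces and a comma.
data = pd.DataFrame(columns=['first name', 'last name', 'city,state'])

# Index.str.replace operates on the labels themselves.
data.columns = data.columns.str.replace(' ', '_', regex=False)
data.columns = data.columns.str.replace(',', '_', regex=False)

print(list(data.columns))  # ['first_name', 'last_name', 'city_state']
```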

Answer 3

In addition, for those looking to replace more than one character in a column, you can do it using regular expressions:

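import re

# Characters to strip from every value in the column.
chars_to_remove = ['.', '-', '(', ')']
# re.escape guards characters such as '-' that are special inside [...].
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'

# str.replace returns a new Series, so assign the result back.
df['string_col'] = df['string_col'].str.replace(regular_expression, '', regex=True)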
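
For a runnable picture of what the character class does, here is a small sketch on invented data; the frame, its values, and the names pattern and cleaned are assumptions made for the example.

```python
import re

import pandas as pd

# Hypothetical column; the values are made up for illustration.
df = pd.DataFrame({'string_col': ['(555) 123-4567', '3.14', 'plain']})

chars_to_remove = ['.', '-', '(', ')']
# Build a character class such as [\.\-\(\)] from the escaped characters.
pattern = '[' + re.escape(''.join(chars_to_remove)) + ']'

# Every character in the class is replaced by the empty string.
cleaned = df['string_col'].str.replace(pattern, '', regex=True)

print(cleaned.tolist())  # ['555 1234567', '314', 'plain']
```

Characters outside the class, such as the space in the first value, are left alone.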