Use the df.rename() function and specify the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
a b c d e
0 x x x x x
1 x x x x x
2 x x x x x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1) # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) # old method
df2
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
Remember to assign the result back, as the modification is not in-place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.
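A minimal sketch of the errors='raise' behaviour, assuming pandas 0.25 or later. The label 'zzz' is a made-up column name that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default behaviour (errors='ignore'): unknown labels are silently skipped.
ignored = df.rename(columns={'zzz': 'Q'})

# With errors='raise', an unknown label raises a KeyError instead.
try:
    df.rename(columns={'zzz': 'Q'}, errors='raise')
    raised = False
except KeyError:
    raised = True
```

This can help catch typos in the mapping that would otherwise go unnoticed.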
REASSIGN COLUMN HEADERS
Use df.set_axis() with axis=1 and inplace=False (to return a copy).
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
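This returns a copy, but in older versions you can modify the DataFrame in-place by setting inplace=True (this was the default behaviour for versions <=0.24). Note that pandas 2.0 removed the inplace argument from set_axis, so on current versions omit it; set_axis then always returns a new DataFrame.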
You can also assign headers directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
There have been some significant updates to column renaming in version 0.21.
The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
The rename function also accepts functions that will be applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
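As a small illustration of passing a function, here is a sketch on a made-up frame whose column names start with '$'. Any callable that takes a label and returns a label should work, including built-ins such as str.upper.

```python
import pandas as pd

# Hypothetical sample data with a leading '$' on every column name.
df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

# Drop the first character of every column name.
stripped = df.rename(lambda name: name[1:], axis='columns')

# Built-in string methods can be passed directly as the renaming function.
upper = stripped.rename(columns=str.upper)
```

Both calls return new DataFrames and leave df untouched.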
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method that is equal in length to the number of columns (or index labels). As of pandas 0.21, inplace defaults to True, but it will default to False in future releases.
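A short sketch of the length requirement, assuming pandas 1.0 or later, where set_axis returns a new DataFrame by default (so inplace is omitted here).

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# The list must contain exactly one label per column.
renamed = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)

# A list of the wrong length is rejected with a ValueError.
try:
    df.set_axis(['V', 'W'], axis=1)
    mismatch = False
except ValueError:
    mismatch = True
```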
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis()
   .some_method3())
# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
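The placeholder chain above can be made concrete. This is only a sketch: the sample data and the particular methods chained around set_axis (sort_values, reset_index, assign) are arbitrary choices for illustration, and it assumes pandas 1.0 or later, where set_axis returns a new DataFrame by default.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'$a': [3, 1, 2], '$b': [6, 4, 5]})

result = (
    df.sort_values('$a')
      .reset_index(drop=True)
      .set_axis(['a', 'b'], axis=1)                      # rename mid-chain
      .assign(total=lambda d: d['a'] + d['b'])           # later steps see the new names
)
```

No intermediate variable is needed, and df itself keeps its original column names.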
This way you can manually edit the new_names as you wish.
Works great when you need to rename only a few columns to correct misspellings, accents, remove special characters, etc.
I have the edited column names stored it in a list, but I don’t know how to replace the column names.
I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe’s columns attribute and it isn’t done inline. I’ll show a few ways to perform this via pipelining without editing the existing dataframe.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I’ll create a new sample dataframe df with initial column names and unrelated new column names.
rename expects a dictionary mapping old names to new names. However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.
# given just a list of new column names
df.rename(columns=dict(zip(df, new)))
x098 y765 z432
0 1 3 5
1 2 4 6
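For reference, a self-contained sketch of the zip approach. The original column names ('col_a', 'col_b', 'col_c') are placeholders chosen here, since only the new names and the values appear in the output above.

```python
import pandas as pd

# Hypothetical setup: the original column names are placeholders.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [3, 4], 'col_c': [5, 6]})
new = ['x098', 'y765', 'z432']

# Iterating over a DataFrame yields its column labels, so zip pairs old with new.
mapping = dict(zip(df, new))
renamed = df.rename(columns=mapping)
```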
This works great if your original column names are unique. But if they are not, then this breaks down.
We didn’t map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
pd.concat([c for _, c in df.items()], axis=1, keys=new)
x098 y765 z432
0 1 3 5
1 2 4 6
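To see why duplicate column names break the dictionary approach while pd.concat with keys does not, here is a sketch on a made-up frame in which two columns share the placeholder name 'dup'.

```python
import pandas as pd

# Hypothetical frame with a repeated column name.
df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['dup', 'dup', 'other'])
new = ['x098', 'y765', 'z432']

# The dict keeps only one entry for 'dup', so both columns get the same new name.
via_rename = df.rename(columns=dict(zip(df, new)))

# concat assigns the keys positionally, one per column.
via_concat = pd.concat([col for _, col in df.items()], axis=1, keys=new)
```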
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you’ll end up with dtype object for all columns and converting them back requires more dictionary work.
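The code for this solution is not shown here, so the following is one possible sketch of reconstructing a frame from its values, using placeholder column names. It also illustrates the dtype caveat on a small mixed frame.

```python
import pandas as pd

# Hypothetical single-dtype frame; the original column names are placeholders.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [3, 4], 'col_c': [5, 6]})
new = ['x098', 'y765', 'z432']

# Rebuild the frame from the raw values with the new column labels.
rebuilt = pd.DataFrame(df.values, index=df.index, columns=new)

# With mixed dtypes, .values is an object array, so the integer column
# no longer keeps its integer dtype after the round trip.
mixed = pd.DataFrame({'n': [1, 2], 's': ['u', 'v']})
rebuilt_mixed = pd.DataFrame(mixed.values, index=mixed.index, columns=['num', 'txt'])
```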
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
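A sketch of the transpose trick on a placeholder frame. The list is wrapped in np.asarray because set_index interprets a plain list of strings as existing column labels to look up, whereas an array is used as the index values themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical single-dtype frame; the original column names are placeholders.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [3, 4], 'col_c': [5, 6]})
new = ['x098', 'y765', 'z432']

# Transpose so columns become the index, replace the index, transpose back.
renamed = df.T.set_index(np.asarray(new)).T
```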
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don’t believe it needs protecting. It is still worth mentioning.
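The lambda described above might look like the following sketch (placeholder column names again). It relies on rename calling the function once per column, in order, which is an assumption about pandas internals rather than a documented guarantee.

```python
import pandas as pd

# Hypothetical frame; the original column names are placeholders.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [3, 4], 'col_c': [5, 6]})
new = ['x098', 'y765', 'z432']

# x (the old name) is ignored; the default argument y holds an iterator
# that hands out the next new name on every call.
renamed = df.rename(columns=lambda x, y=iter(new): next(y))

# Keyword-only variant: the bare * prevents y from being overridden positionally.
protected = df.rename(columns=lambda x, *, y=iter(new): next(y))
```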
I would like to explain a bit what happens behind the scenes.
A DataFrame is a collection of Series.
A Series in turn wraps a numpy.array and adds labels to it.
Unlike a plain numpy.array, a Series has a .name attribute.
This is the name of the Series. pandas seldom respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.
Naming the list of columns
A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index. Like a Series, an Index has a .name attribute.
This is what happens if you decide to fill in the name of the columns Index:
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']
name of the list of columns column_one column_two
name of the index
0 4 1
1 5 2
2 6 3
Note that the name of the index always comes one row lower.
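The same names can also be set without mutating the axes in place. This is a sketch using rename_axis, which is a different route from the attribute assignment shown above and returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]], columns=['column_one', 'column_two'])

# rename_axis names the axes themselves; the column labels are left alone.
named = df.rename_axis(index='name of the index',
                       columns='name of the list of columns')
```

Printing named should show the same two-line header as in the output above.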
Artifacts that linger
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.
If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'
BUT
pd.DataFrame(df.one) will return
three
0 1
1 2
2 3
Because pandas reuses the .name of the already defined Series.
Multi level column names
Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don’t see anyone picking up on this here.
If you’ve got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns…
old_columns = df.columns
new_columns = [name.replace("$", "") for name in old_columns]
df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)
df.head()  # to validate the output
Best way? IDK. A way – yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times, though these functions are so fast we’re comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn’t the ‘Best’ way.
The limitation of this method is that even if only one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels.
For example, if you passed this:
df.columns = ['a','b','c','d']
This will throw an error: Length mismatch: Expected axis has 5 elements, new values have 4 elements.
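A small sketch that reproduces the error on a five-column frame; the sample data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Assigning four names to five columns raises a ValueError.
try:
    df.columns = ['a', 'b', 'c', 'd']
    message = ''
except ValueError as exc:
    message = str(exc)
```

The failed assignment leaves the existing column names unchanged.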
Another method is the pandas rename() method, which is used to rename any index, column or row.
If your new list of columns is in the same order as the existing columns, the assignment is simple:
new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
a b c d e
0 1 1 1 1 1
If you had a dictionary keyed on old column names to new column names, you could do the following:
d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col]) # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
a b c d e
0 1 1 1 1 1
If you don’t have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:
df.columns = [col[1:] if col[0] == '$' else col for col in df]
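One caveat with the dictionary approach: d[col] raises a KeyError for any column missing from the dictionary, and d.get returns None for it. A possible variation, sketched here on a made-up frame, falls back to the existing name.

```python
import pandas as pd

# Hypothetical frame: column 'c' has no entry in the mapping.
df = pd.DataFrame({'$a': [1], '$b': [1], 'c': [1]})
d = {'$a': 'a', '$b': 'b'}

# d.get(col, col) keeps the original name when there is no mapping for it.
df.columns = df.columns.map(lambda col: d.get(col, col))
```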
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # creating a df with column names A and B
df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True)  # renaming column A to 'new_a' and B to 'new_b'
output:
new_a new_b
0 1 4
1 2 5
2 3 6
2. Renaming index/row names using a mapping:
df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True)  # row names are replaced by 'x', 'y', 'z'
output:
new_a new_b
x 1 4
y 2 5
z 3 6
I know this question and answer has been chewed to death, but I referred to it for inspiration for one of the problems I was having. I was able to solve it using bits and pieces from different answers, hence providing my response in case anyone needs it.
My method is generic: you can add additional delimiters by comma-separating them in the delimiters= variable, which future-proofs it.
Working Code:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})
delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]
Output:
>>> df
$a $b $c $d $e
0 1 3 5 7 9
1 2 4 6 8 10
>>> df
a b c d e
0 1 3 5 7 9
1 2 4 6 8 10
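To illustrate the multiple-delimiter case, here is a sketch with a second, made-up delimiter '#'. It swaps re.split(...)[1] for re.sub, which is a variation on the code above rather than the same method, so that column names containing no delimiter are left as they are.

```python
import re
import pandas as pd

# Hypothetical frame: '#' is an assumed second delimiter, 'c' has none.
df = pd.DataFrame({'$a': [1, 2], '#b': [3, 4], 'c': [5, 6]})

delimiters = ('$', '#')  # comma-separated delimiters
match_pattern = '|'.join(map(re.escape, delimiters))

# Remove every delimiter occurrence from each column name.
df.columns = [re.sub(match_pattern, '', name) for name in df.columns]
```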
Note that these approaches do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:
>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
$a $b e
$x $y f
0 1 3 5
1 2 4 6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
...     rename.get(item, item) for item in df.columns.tolist()])
>>> df
a b e
x y f
0 1 3 5
1 2 4 6
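As a hedged alternative, a recent pandas release also lets rename work on the individual labels of a MultiIndex, either on all levels or on one level via the level parameter. This sketch assumes such a version and is not part of the approach above.

```python
import pandas as pd

df = pd.DataFrame({('$a', '$x'): [1, 2], ('$b', '$y'): [3, 4], ('e', 'f'): [5, 6]})

# A callable is applied to every label on every level.
renamed = df.rename(columns=lambda label: label.lstrip('$'))

# With level=0, only the top-level labels are mapped.
top_only = df.rename(columns={'$a': 'a', '$b': 'b'}, level=0)
```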
Another option is to rename using a regular expression:
import pandas as pd
import re
df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6]})
df = df.rename(columns=lambda x: re.sub(r'\$', '', x))
>>> df
a b c
0 1 3 5
1 2 4 6
If you have to deal with loads of columns named by a source system that is out of your control, I came up with the following approach, which combines a general rule with specific replacements in one go.
First, create a dictionary from the DataFrame column names, using regular expressions to throw away certain suffixes of the column names,
and then add specific replacements to the dictionary to name core columns as expected later in the receiving database.
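The code for this answer is not included here, so the following is only a sketch of the two steps described. The column names, the suffix pattern, and the target names are all invented for illustration.

```python
import re
import pandas as pd

# Hypothetical column names as they might arrive from an external system.
df = pd.DataFrame(columns=['respid:L', 'country:C1', 'score:D', 'plain'])

# Step 1: general rule, strip an assumed set of type suffixes from every name.
suffix = re.compile(r'(:L|:C1|:D)$')
mapping = {col: suffix.sub('', col) for col in df.columns}

# Step 2: specific replacements override the general rule for core columns.
mapping['respid:L'] = 'RespID'
mapping['country:C1'] = 'CountryID'

df = df.rename(columns=mapping)
```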
In addition to the solution already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.
First, we create a list of the names that we would like to use as our column names:
import pandas as pd
ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
ufo = pd.read_csv('link to the file you are using', names=ufo_cols, header=0)
In this case, all the column names will be replaced with the names you have in your list.
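A runnable sketch of the same idea, using an in-memory CSV in place of a real file. The header row and the single data row are made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV content standing in for the real file.
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
)

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0 tells read_csv that the first row is a header to be discarded;
# names supplies the replacement column names.
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)
```

Without header=0, the original header row would be read as a data row.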
Here’s a nifty little function I like to use to cut down on typing:
def rename(data, oldnames, newname):
    if isinstance(oldnames, str):  # input can be a string or list of strings
        oldnames = [oldnames]  # when renaming multiple columns
        newname = [newname]  # make sure you pass the corresponding list of new names
    for i, name in enumerate(oldnames):
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1:  # doesn't have to be an exact match
            print("Found multiple columns that matched " + str(name) + " :")
            for c in oldvar:
                print(str(oldvar.index(c)) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        else:
            oldvar = oldvar[0]
        data = data.rename(columns={oldvar: newname[i]})
    return data
Here is an example of how it works:
In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy'])
Found multiple columns that matched col :
0: col1
1: col2
please enter the index of the column you would like to rename: 0
In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')