The data I have to work with is a bit messy: it has header names inside its data. How can I choose a row from an existing pandas DataFrame and make it (rename it to) a column header?
In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])
In [22]: df
Out[22]:
0 1 2
0 1 2 3
1 foo bar baz
2 4 5 6
Set the column labels to equal the values in the 2nd row (index location 1):
In [23]: df.columns = df.iloc[1]
If the index has unique labels, you can drop the 2nd row using:
In [24]: df.drop(df.index[1])
Out[24]:
1 foo bar baz
0 1 2 3
2 4 5 6
If the index is not unique, you could use:
In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]:
1 foo bar baz
0 1 2 3
2 4 5 6
Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it’s often better to take care that the index is unique (even though Pandas does not require it).
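The two steps above (set the labels from the row, then drop that row positionally) can be wrapped into one helper. This is a minimal sketch. The function name promote_row_to_header is hypothetical, and the sketch assumes you want a fresh 0..n-1 index afterwards.

```python
import pandas as pd


def promote_row_to_header(df: pd.DataFrame, pos: int) -> pd.DataFrame:
    """Return a copy of df whose column labels come from the row at position pos.

    The row is removed by position, so this also works when the index
    contains duplicate labels.
    """
    header = df.iloc[pos]
    # Keep every row except the one at position pos.
    keep = [i for i in range(len(df)) if i != pos]
    out = df.iloc[keep].copy()
    out.columns = list(header)
    # Renumber the rows; drop this call to keep the original index labels.
    return out.reset_index(drop=True)


df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)])
print(promote_row_to_header(df, 1))
```

The columns stay object dtype because they held mixed values. Call .infer_objects() on the result if you want numeric dtypes back.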
If you don’t mind using regular expressions, this function gives you a lot of power when renaming files:
import glob
import os
import re

def renamer(files, pattern, replacement):
    for pathname in glob.glob(files):
        basename = os.path.basename(pathname)
        new_filename = re.sub(pattern, replacement, basename)
        if new_filename != basename:
            os.rename(
                pathname,
                os.path.join(os.path.dirname(pathname), new_filename))
So in your example, you could do (assuming it’s the current directory where the files are):
renamer("*.doc", r"^(.*)\.doc$", r"new(\1).doc")
but you could also roll back to the initial filenames:
renamer("*.doc", r"^new\((.*)\)\.doc", r"\1.doc")
and more.
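renamer changes files on disk immediately, so it can help to preview what a pattern will do first. This is a minimal sketch that applies the same re.sub logic to a plain list of names without touching the filesystem. preview_renames is a hypothetical helper, not part of any library.

```python
import re


def preview_renames(names, pattern, replacement):
    """Return (old, new) pairs for every name that re.sub would change."""
    pairs = []
    for name in names:
        new_name = re.sub(pattern, replacement, name)
        if new_name != name:
            pairs.append((name, new_name))
    return pairs


# Forward rename, then the roll-back pattern from above.
print(preview_renames(["report.doc", "notes.txt"], r"^(.*)\.doc$", r"new(\1).doc"))
print(preview_renames(["new(report).doc"], r"^new\((.*)\)\.doc", r"\1.doc"))
```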
I use this to simply rename all files in the subfolders of a folder:
import os

def replace(fpath, old_str, new_str):
    for path, subdirs, files in os.walk(fpath):
        for name in files:
            if old_str.lower() in name.lower():
                # Note: the whole new file name ends up lowercased.
                os.rename(os.path.join(path, name),
                          os.path.join(path, name.lower().replace(old_str.lower(), new_str)))
I am replacing all occurrences of old_str, in any letter case, with new_str.
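The loop above lowercases the whole file name as a side effect. If you only want the matched part replaced, re.sub with re.IGNORECASE is one alternative. Note that this is a different technique from the answer above. re.escape stops characters such as . or ( in old_str from being read as regex syntax. new_name_for is a hypothetical helper name.

```python
import re


def new_name_for(name, old_str, new_str):
    """Replace every occurrence of old_str in name, ignoring case,
    and leave the rest of the name untouched."""
    # A function replacement keeps backslashes in new_str from being
    # interpreted as group references.
    return re.sub(re.escape(old_str), lambda _m: new_str, name, flags=re.IGNORECASE)


print(new_name_for("My.Holiday.PHOTO.jpg", "photo", "Picture"))
```

Inside the os.walk loop you would then call os.rename(os.path.join(path, name), os.path.join(path, new_name_for(name, old_str, new_str))).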
I like to have my music, movie, and picture files named a certain way. When I download files from the internet, they usually don’t follow my naming convention. I found myself manually renaming each file to fit my style. This got old really fast, so I decided to write a program to do it for me.
This program can convert the filename to all lowercase, replace strings in the filename with whatever you want, and trim any number of characters from the front or back of the filename.
I’ve written a Python script myself. It takes two arguments: the path of the directory containing the files and the naming pattern you want to use. It renames each file by attaching an incremental number (1, 2, 3 and so on) to the naming pattern you give, and moves the file into a pic_folder subdirectory.
import os
import sys

# Check that a path and a naming pattern are given.
if len(sys.argv) != 3:
    print("Usage: python rename.py <path> <new_name.extension>")
    sys.exit(1)

# Split the pattern into name and extension (the extension keeps its dot).
base_name, ext = os.path.splitext(sys.argv[2])

# Number the files starting from 1.
count = 1

# Create a new folder in which the renamed files will be stored.
dest = os.path.join(sys.argv[1], "pic_folder")
try:
    os.mkdir(dest)
except OSError:
    # If pic_folder is already present, use it.
    pass

try:
    for root, dirs, files in os.walk(sys.argv[1]):
        # Don't walk into pic_folder, or moved files would be renamed again.
        dirs[:] = [d for d in dirs if os.path.join(root, d) != dest]
        for filename in files:
            # Build the new path: <path>/pic_folder/<name><count><ext>.
            new_path = os.path.join(dest, "%s%d%s" % (base_name, count, ext))
            # Original path of the file to be renamed.
            old_path = os.path.join(root, filename)
            os.rename(old_path, new_path)
            count += 1
except OSError:
    pass
Hope this works for you.
Run this from the directory where you need to perform the renaming:
import os

# get the file name list to nameList
nameList = os.listdir()
# loop through the names and rename
for fileName in nameList:
    rename = fileName[15:28]
    os.rename(fileName, rename)
# example:
# input fileName bulk like: 20180707131932_IMG_4304.JPG
# output renamed bulk like: IMG_4304.JPG
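The slice fileName[15:28] assumes that every entry in the folder starts with a 15-character timestamp prefix. Anything else gets mangled, including the script itself if it lives in that folder. This is a minimal sketch that renames only the names matching an assumed pattern of 14 digits plus an underscore. The pattern is inferred from the example file name, and the helper names are hypothetical.

```python
import os
import re

# Assumed prefix format: 14 digits (YYYYMMDDhhmmss) followed by an underscore.
TIMESTAMP_PREFIX = re.compile(r"^\d{14}_")


def strip_timestamp(file_name):
    """Return file_name without the timestamp prefix, or None if it has none."""
    if TIMESTAMP_PREFIX.match(file_name):
        return TIMESTAMP_PREFIX.sub("", file_name, count=1)
    return None


def rename_in_current_dir():
    for file_name in os.listdir():
        new_name = strip_timestamp(file_name)
        # Skip names without the prefix and never overwrite an existing file.
        if new_name and not os.path.exists(new_name):
            os.rename(file_name, new_name)
```

Call rename_in_current_dir() from the folder that holds the photos.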
import os

directoryName = "Photographs"
filePath = os.path.abspath(directoryName)
filePathWithSlash = filePath + "\\"
for counter, filename in enumerate(os.listdir(directoryName)):
    filenameWithPath = os.path.join(filePathWithSlash, filename)
    os.rename(filenameWithPath, filenameWithPath.replace(filename, "DSC_" +
              str(counter).zfill(4) + ".jpg"))
# e.g. filename = "photo1.jpg", directory = "c:\users\Photographs"
# The string.replace call swaps in the new filename into
# the current filename within the filenameWithPath string, which
# is then used by os.rename to rename the file in place, using the
# current (unmodified) filenameWithPath.
# os.listdir delivers the filename(s) from the directory;
# however, to "rename" a file using os,
# the specific location of the file to be renamed is required.
# This code is from Windows (hence the "\\" separator).
# Caveat: str.replace replaces every occurrence, so this goes wrong if the
# old filename also appears elsewhere in the path.
I had a similar problem, but I wanted to add text to the beginning of the file name of every file in a directory, and used a similar method. See the example below:
import os

folder = r"R:\mystuff\GIS_Projects\Website\2017\PDF"
for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        filename_split = os.path.splitext(filename)  # filename will be filename_split[0] and extension will be filename_split[1]
        print(fullpath)
        print(filename_split[0])
        print(filename_split[1])
        os.rename(fullpath, os.path.join(root, "NewText_2017_" + filename_split[0] + filename_split[1]))
In my case, the directory has multiple subdirectories, each containing lots of images, and I want to rename the images in every subdirectory to 1.jpg to n.jpg:
import glob
import os

def batch_rename():
    base_dir = 'F:/ad_samples/test_samples/'
    sub_dir_list = glob.glob(base_dir + '*')
    # print(sub_dir_list)  # like that ['F:/dir1', 'F:/dir2']
    for dir_item in sub_dir_list:
        files = glob.glob(dir_item + '/*.jpg')
        # Start counting at 1 so the files become 1.jpg ... n.jpg.
        for i, f in enumerate(files, start=1):
            os.rename(f, os.path.join(dir_item, str(i) + '.jpg'))
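On POSIX systems, os.rename silently replaces an existing target. If a folder already contains, say, 2.jpg, the loop above can therefore overwrite a photo before that photo has been renamed. On Windows it raises FileExistsError instead. This is a minimal sketch of a two-pass variant: it first moves every file to a temporary name, then to its final number. The temporary prefix __tmp_rename_ is an assumption, so pick one that cannot clash with your files.

```python
import glob
import os


def batch_rename_safe(base_dir):
    """Rename the .jpg files in each subdirectory of base_dir to 1.jpg ... n.jpg."""
    for dir_item in glob.glob(os.path.join(base_dir, "*")):
        if not os.path.isdir(dir_item):
            continue
        # Sort so the numbering is deterministic.
        files = sorted(glob.glob(os.path.join(dir_item, "*.jpg")))
        # Pass 1: move every file out of the way to a temporary name.
        temp_paths = []
        for i, f in enumerate(files, start=1):
            tmp = os.path.join(dir_item, "__tmp_rename_%d.jpg" % i)
            os.rename(f, tmp)
            temp_paths.append(tmp)
        # Pass 2: give each file its final number.
        for i, tmp in enumerate(temp_paths, start=1):
            os.rename(tmp, os.path.join(dir_item, "%d.jpg" % i))
```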
If you would like to modify file names in an editor (such as vim), the click library provides the function click.edit(), which opens an editor and returns the edited text. Here is an example of how it can be used to rename the files in a directory:
import sys
from pathlib import Path

import click

# current directory
direc_to_refactor = Path(".")
# list of old file paths
old_paths = list(direc_to_refactor.iterdir())
# list of old file names
old_names = [p.name for p in old_paths]
# modify old file names in an editor,
# and store them in a list of new file names
edited = click.edit("\n".join(old_names))
if edited is None:  # the editor was closed without saving
    sys.exit("No changes made.")
new_names = edited.splitlines()
# refactor the old file names
for old_path, new_name in zip(old_paths, new_names):
    old_path.replace(direc_to_refactor / new_name)
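click.edit() returns None when the editor is closed without saving, and nothing stops you from deleting a line or typing the same name twice. This is a minimal sketch of a check to run before renaming anything. plan_renames is a hypothetical helper, and it assumes one file name per line, in the same order as the original list.

```python
def plan_renames(old_names, edited_text):
    """Pair each old name with its edited name, refusing unsafe edits."""
    if edited_text is None:
        raise ValueError("editor closed without saving; nothing to do")
    # Ignore blank lines, e.g. the trailing newline many editors add.
    new_names = [line for line in edited_text.splitlines() if line.strip()]
    if len(new_names) != len(old_names):
        raise ValueError("expected %d names, got %d" % (len(old_names), len(new_names)))
    if len(set(new_names)) != len(new_names):
        raise ValueError("edited names contain duplicates")
    # Only keep the entries that actually changed.
    return [(old, new) for old, new in zip(old_names, new_names) if old != new]
```

The pairs can then be applied with (direc_to_refactor / old).replace(direc_to_refactor / new). Swaps such as a to b and b to a would still need a temporary name.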
I wrote a command line application that uses the same technique but is more robust than this script, and comes with more options, such as recursive refactoring. Here is the link to the GitHub page. This is useful if you like command line applications and are interested in making some quick edits to file names. (My application is similar to the “bulkrename” command found in ranger.)
You can accomplish this more simply using the db_table Meta option in your model class. But every time you do that, you increase the legacy weight of your codebase — having class names differ from table names makes your code harder to understand and maintain. I fully support doing simple refactorings like this for the sake of clarity.
(update) I just tried this in production, and got a strange warning when I went to apply the migration. It said:
The following content types are stale and need to be deleted:
yourapp | foo
Any objects related to these content types by a foreign key will also
be deleted. Are you sure you want to delete these content types?
If you're unsure, answer 'no'.
I answered “no” and everything seemed to be fine.
Make your changes in models.py, and then run
./manage.py schemamigration --auto myapp
When you inspect the migration file, you’ll see that it deletes a table and creates a new one:
class Migration(SchemaMigration):
    def forwards(self, orm):
        # Deleting model 'Foo'
        db.delete_table('myapp_foo')
        # Adding model 'Bar'
        db.create_table('myapp_bar', (
            ...
        ))
        db.send_create_signal('myapp', ['Bar'])
    def backwards(self, orm):
        ...
This is not quite what you want. Instead, edit the migration so that it looks like:
from south.db import db
from south.v2 import SchemaMigration

class Migration(SchemaMigration):
    def forwards(self, orm):
        # Renaming model from 'Foo' to 'Bar'
        db.rename_table('myapp_foo', 'myapp_bar')
        if not db.dry_run:
            orm['contenttypes.contenttype'].objects.filter(
                app_label='myapp', model='foo').update(model='bar')
    def backwards(self, orm):
        # Renaming model from 'Bar' to 'Foo'
        db.rename_table('myapp_bar', 'myapp_foo')
        if not db.dry_run:
            orm['contenttypes.contenttype'].objects.filter(
                app_label='myapp', model='bar').update(model='foo')
In the absence of the update statement, the db.send_create_signal call will create a new ContentType with the new model name. But it’s better to just update the ContentType you already have in case there are database objects pointing to it (e.g., via a GenericForeignKey).
Also, if you’ve renamed some columns which are foreign keys to the renamed model, don’t forget to
South can’t do it itself – how does it know that Bar represents what Foo used to? This is the sort of thing I’d write a custom migration for. You can change your ForeignKey in code as you’ve done above, and then it’s just a case of renaming the appropriate fields and tables, which you can do any way you want.
Finally, do you really need to do this? I’ve yet to need to rename models – model names are just an implementation detail – particularly given the availability of the verbose_name Meta option.
I followed Leopd’s solution above, but that did not change the model names. I changed them manually in the code (also in related models where this model is referenced as an FK), and then ran another South migration with the --fake option. This makes the model names and table names the same.
Just realized, one could first start with changing model names, then edit the migrations file before applying them. Much cleaner.
If you only need to rename a single column, a list comprehension is considerably faster (see the timings below):
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
If the need arises to rename multiple columns, either use conditional expressions like:
df.columns = ['log(gdp)' if x=='gdp' else 'cap_mod' if x=='cap' else x for x in df.columns]
Or, construct a mapping using a dictionary and perform the list comprehension with its get method, passing the old name as the default value:
col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'} ## key→old name, value→new name
df.columns = [col_dict.get(x, x) for x in df.columns]
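Because col_dict.get(x, x) falls back to the old name, columns missing from the dictionary pass through unchanged. Here is a small check of that behaviour on a made-up three-column frame; the y column is just an example.

```python
import pandas as pd

df = pd.DataFrame({'y': [1, 2], 'gdp': [3, 4], 'cap': [5, 6]})

col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'}  # key -> old name, value -> new name
# Names not in col_dict (here 'y') are kept as they are.
df.columns = [col_dict.get(x, x) for x in df.columns]
print(list(df.columns))  # ['y', 'log(gdp)', 'cap_mod']
```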
Timings:
%%timeit
df.rename(columns={'gdp':'log(gdp)'}, inplace=True)
10000 loops, best of 3: 168 µs per loop
%%timeit
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
10000 loops, best of 3: 58.5 µs per loop
Use the DataFrame.set_axis() method with axis=1, passing a list-like sequence. Options are available for in-place modification as well.
rename with axis=1
df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
df
y gdp cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
With 0.21+, you can now specify an axis parameter with rename:
df.rename({'gdp':'log(gdp)'}, axis=1)
# df.rename({'gdp':'log(gdp)'}, axis='columns')
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
(Note that rename is not in-place by default, so you will need to assign the result back.)
This addition has been made to improve consistency with the rest of the API. The new axis argument is analogous to the columns parameter—they do the same thing.
df.rename(columns={'gdp': 'log(gdp)'})
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
rename also accepts a callback that is called once for each column.
df.rename(lambda x: x[0], axis=1)
# df.rename(lambda x: x[0], axis='columns')
y g c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
For this specific scenario, you would want to use
df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)
Similar to the replace method of Python strings, pandas Index and Series (object dtype only) define a (“vectorized”) str.replace method for string- and regex-based replacement.
df.columns = df.columns.str.replace('gdp', 'log(gdp)')
df
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
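The advantage of this over the other methods is that str.replace supports regular expressions. Pass regex=True explicitly: it was the default in older versions, but pandas 2.0 changed the default to regex=False. See the docs for more information.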
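One thing to watch: str.replace replaces substrings, so a column such as gdp_growth would also be changed. This is a minimal sketch, with made-up column names, that anchors the pattern so only an exact match is renamed. regex is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'gdp_growth'], index=range(2))

# Plain substring replacement also touches 'gdp_growth'.
print(list(df.columns.str.replace('gdp', 'log(gdp)', regex=False)))
# ['y', 'log(gdp)', 'log(gdp)_growth']

# Anchored regex: only a column named exactly 'gdp' is renamed.
df.columns = df.columns.str.replace(r'^gdp$', 'log(gdp)', regex=True)
print(list(df.columns))
# ['y', 'log(gdp)', 'gdp_growth']
```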
Passing a list to set_axis with axis=1
Call set_axis with a list of header(s). The list must be equal in length to the columns/index size. In pandas 0.21 to 0.24, set_axis mutates the original DataFrame by default, but you can specify inplace=False to return a modified copy.
df.set_axis(['cap', 'log(gdp)', 'y'], axis=1, inplace=False)
# df.set_axis(['cap', 'log(gdp)', 'y'], axis='columns', inplace=False)
# On pandas 2.0+, leave out inplace (the argument was removed); set_axis returns a copy.
cap log(gdp) y
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
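Note: Later releases changed this. inplace defaults to False from pandas 1.0, and pandas 2.0 removed the inplace argument from set_axis altogether, so it always returns a new object.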
Method Chaining
Why choose set_axis when we already have an efficient way of assigning columns with df.columns = ...? As shown by Ted Petrou in this answer (https://stackoverflow.com/a/46912050/4909087), set_axis is useful when trying to chain methods.
Compare
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())
Versus
# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
The former is a more natural, free-flowing syntax.
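Here is a runnable version of the chained style. The frame is made up, and dropna and reset_index are stand-ins for some_method1 and friends. The sketch omits inplace, which works on pandas 1.0 and later, where set_axis returns a new object by default.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1.0, None, 3.0], '$b': [4.0, 5.0, 6.0]})

result = (
    df.dropna()                      # drop the row with a missing value
      .set_axis(['a', 'b'], axis=1)  # relabel all columns in one step
      .reset_index(drop=True)        # renumber the remaining rows 0..n-1
)
print(result)
```

The original df is left untouched, so the chain needs no intermediate variable.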
There are at least five different ways to rename specific columns in pandas, and I have listed them below along with links to the original answers. I also timed these methods and found them to perform about the same (though YMMV depending on your data set and scenario). The test case below renames columns A, M, N and Z to A2, M2, N2 and Z2 in a DataFrame with columns A to Z containing a million rows.
# Import required modules
import numpy as np
import pandas as pd
import timeit
# Create sample data
df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
# Standard way - https://stackoverflow.com/a/19758398/452587
def method_1():
    df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})
# Lambda function - https://stackoverflow.com/a/16770353/452587
def method_2():
    df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)
# Mapping function - https://stackoverflow.com/a/19758398/452587
def rename_some(x):
    if x == 'A' or x == 'M' or x == 'N' or x == 'Z':
        return x + '2'
    return x
def method_3():
    df_renamed = df.rename(columns=rename_some)
# Dictionary comprehension - https://stackoverflow.com/a/58143182/452587
def method_4():
    df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
        np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
    ]})
# Dictionary comprehension - https://stackoverflow.com/a/38101084/452587
def method_5():
    df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))
print('Method 1:', timeit.timeit(method_1, number=10))
print('Method 2:', timeit.timeit(method_2, number=10))
print('Method 3:', timeit.timeit(method_3, number=10))
print('Method 4:', timeit.timeit(method_4, number=10))
print('Method 5:', timeit.timeit(method_5, number=10))
Use the df.rename() method and pass a dictionary of the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
a b c d e
0 x x x x x
1 x x x x x
2 x x x x x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1) # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) # old method
df2
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
Remember to assign the result back, as the modification is not in place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.
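With the default errors='ignore', a typo in the mapping is silently skipped. This short sketch shows the difference; the misspelled key 'z' is deliberate.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default: the unknown key 'z' is ignored and only 'a' is renamed.
print(list(df.rename(columns={'a': 'X', 'z': 'Y'}).columns))
# ['X', 'b', 'c', 'd', 'e']

# errors='raise' turns the typo into a KeyError instead.
try:
    df.rename(columns={'a': 'X', 'z': 'Y'}, errors='raise')
except KeyError as exc:
    print('KeyError:', exc)
```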
REASSIGN COLUMN HEADERS
Use df.set_axis() with axis=1 and inplace=False (to return a copy).
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
# On pandas 2.0+, omit inplace=False (the argument was removed); a copy is always returned.
df2
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
This returns a copy, but you can modify the DataFrame in-place by setting inplace=True. In-place was the default behaviour for versions <=0.24. The default became inplace=False in pandas 1.0, and pandas 2.0 removed the inplace argument from set_axis entirely.
You can also assign headers directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
There have been some significant updates to column renaming in version 0.21.
The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
The rename function also accepts functions that will be applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method that is equal in length to the number of columns (or index). At the time of writing (pandas 0.21), inplace defaults to True, but it will default to False in future releases (this changed in pandas 1.0).
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())
# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
This way you can manually edit the new_names as you wish.
Works great when you need to rename only a few columns to correct misspellings or accents, remove special characters, etc.
I have the edited column names stored in a list, but I don’t know how to replace the column names.
I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing DataFrame’s columns attribute and it isn’t done inline. I’ll show a few ways to perform this via pipelining without editing the existing DataFrame.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I’ll create a new sample DataFrame df with initial column names and unrelated new column names.
However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.
# given just a list of new column names
df.rename(columns=dict(zip(df, new)))
x098 y765 z432
0 1 3 5
1 2 4 6
This works great if your original column names are unique. But if they are not, then this breaks down.
We didn’t map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
pd.concat([c for _, c in df.items()], axis=1, keys=new)
x098 y765 z432
0 1 3 5
1 2 4 6
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you’ll end up with dtype object for all columns, and converting them back requires more dictionary work.
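The code for this solution isn't shown above. This is a minimal sketch of the reconstruct idea, assuming it means building a new DataFrame from the underlying values, the existing index and the new names. The sample frames and the new list are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a single dtype (int64) in every column.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
new = ['x098', 'y765', 'z432']

# Rebuild from the raw values, keeping the index and using the new names.
out = pd.DataFrame(df.to_numpy(), index=df.index, columns=new)
print(out)

# With mixed dtypes, to_numpy() yields an object array, so the numeric
# column comes back as object instead of int64.
mixed = pd.DataFrame({'a': [1, 2], 'b': ['u', 'v']})
mixed_out = pd.DataFrame(mixed.to_numpy(), index=mixed.index, columns=['n', 's'])
print(mixed_out.dtypes)
```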
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
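This is a minimal sketch of the transpose trick on hypothetical sample data. new is wrapped in np.asarray because set_index would treat a plain list as a list of column labels rather than as the new index values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
new = ['x098', 'y765', 'z432']

# Transpose so the column labels become the index, replace that index
# with set_index, then transpose back.
out = df.T.set_index(np.asarray(new)).T
print(out)
```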
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don’t believe it needs protecting. It is still worth mentioning.
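This is a minimal sketch of the lambda described above, on hypothetical sample data. It relies on rename calling the function exactly once per column, left to right. Current pandas behaves that way, but the documentation doesn't spell it out as a guarantee.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
new = ['x098', 'y765', 'z432']

# The default y=iter(new) is evaluated once, when the lambda is created,
# so each call to next(y) hands out the next new name; x is ignored.
out = df.rename(columns=lambda x, y=iter(new): next(y))

# Same idea with y made keyword-only via *, so it can't be passed positionally.
out2 = df.rename(columns=lambda x, *, y=iter(new): next(y))
print(out)
```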
df.columns =['column_one','column_two']
df.columns.names =['name of the list of columns']
df.index.names =['name of the index']
name of the list of columns column_one column_two
name of the index
041152263
I would like to explain a bit what happens behind the scenes.
Dataframes are a set of Series.
Series in turn are built on top of a numpy.array.
Series have a property .name (a plain numpy.array does not).
This is the name of the series. It is seldom that pandas respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.
Naming the list of columns
A lot of answers here talk about the df.columns attribute being a list when in fact it is a pandas Index. This means it has a .name attribute (and .names).
This is what happens if you decide to fill in the name of the columns Index:
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']
name of the list of columns column_one column_two
name of the index
0 4 1
1 5 2
2 6 3
Note that the name of the index is always displayed one line lower.
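If you'd rather not mutate df.columns.names and df.index.names in place, rename_axis sets both names and returns a new DataFrame, which also fits a method chain. The sample data is made up to match the output above.

```python
import pandas as pd

df = pd.DataFrame({'column_one': [4, 5, 6], 'column_two': [1, 2, 3]})

named = df.rename_axis(index='name of the index',
                       columns='name of the list of columns')

print(type(named.columns).__name__)  # Index
print(named.columns.name)            # name of the list of columns
print(named.index.name)              # name of the index
```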
Artifacts that linger
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.
If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'
BUT
pd.DataFrame(df.one) will return
three
0 1
1 2
2 3
Because pandas reuses the .name of the already defined Series.
Multi level column names
Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don’t see anyone picking up on this here.
If you’ve got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns…
columns = df.columns
columns = [row.replace("$", "") for row in columns]
df.rename(columns=dict(zip(df.columns, columns)), inplace=True)
df.head()  # to validate the output
Best way? IDK. A way – yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times, though these functions are so fast that we’re comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn’t the ‘Best’ way.
The limitation of this method is that if one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels.
For example, if you passed this:
df.columns = ['a','b','c','d']
This will throw an error: ValueError: Length mismatch: Expected axis has 5 elements, new values have 4 elements.
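Here is a quick way to see the limitation. Assigning a list of the wrong length raises a ValueError, whereas rename only needs the columns that change. The five-column frame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1, 1]], columns=['$a', '$b', '$c', '$d', '$e'])

try:
    df.columns = ['a', 'b', 'c', 'd']  # one name short
except ValueError as exc:
    print(exc)  # Length mismatch: Expected axis has 5 elements, new values have 4 elements

# rename only touches the labels you mention.
print(list(df.rename(columns={'$a': 'a'}).columns))
# ['a', '$b', '$c', '$d', '$e']
```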
Another method is the pandas rename() method, which can be used to rename any index, column or row.
If your new list of columns is in the same order as the existing columns, the assignment is simple:
new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
a b c d e
0 1 1 1 1 1
If you had a dictionary keyed on old column names to new column names, you could do the following:
d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col]) # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
a b c d e
0 1 1 1 1 1
If you don’t have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:
df.columns = [col[1:] if col[0] == '$' else col for col in df]
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # creating a df with column names A and B
df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True)  # renaming column A to 'new_a' and B to 'new_b'
output:
new_a new_b
0 1 4
1 2 5
2 3 6
2. Renaming index/row names using a mapping:
df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True)  # row names are replaced by 'x', 'y', 'z'
output:
new_a new_b
x 1 4
y 2 5
z 3 6
I know this question and answer have been chewed to death, but I referred to them for inspiration for one of the problems I was having. I was able to solve it using bits and pieces from different answers, so I’m providing my response in case anyone needs it.
My method is generic: you can add more delimiters to the delimiters variable (each character in the string is treated as a separate delimiter), which future-proofs it.
Working Code:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})
delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[-1] for i in df.columns]  # [-1] also keeps names that contain no delimiter instead of raising IndexError
Output, before and after the renaming:
>>> df
$a $b $c $d $e
0 1 3 5 7 9
1 2 4 6 8 10
>>> df
a b c d e
0 1 3 5 7 9
1 2 4 6 8 10
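Because `'|'.join(map(re.escape, delimiters))` simply iterates over `delimiters`, a list of strings works as well as a single string. A sketch assuming a hypothetical second prefix `#`, using `[-1]` so that names without any delimiter pass through unchanged:

```python
import re
import pandas as pd

df = pd.DataFrame({'$a': [1], '#b': [2], 'c': [3]})
delimiters = ['$', '#']  # hypothetical: add further delimiters to this list

# re.escape makes '$' (and other regex metacharacters) match literally
matchPattern = '|'.join(map(re.escape, delimiters))

# e.g. '$a' splits into ['', 'a'] while 'c' stays ['c']; [-1] handles both
df.columns = [re.split(matchPattern, col)[-1] for col in df.columns]
print(list(df.columns))
```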
Note that these approaches do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:
>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
$a $b e
$x $y f
0 1 3 5
1 2 4 6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
...     rename.get(item, item) for item in df.columns.tolist()])
>>> df
a b e
x y f
0 1 3 5
1 2 4 6
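As an alternative (a different technique from the tuple mapping above), `DataFrame.rename` also works on MultiIndex columns, but its mapping is applied to the individual labels at each level rather than to whole tuples; the `level=` parameter restricts it to a single level. A sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame({('$a', '$x'): [1, 2], ('$b', '$y'): [3, 4], ('e', 'f'): [5, 6]})

# Keys are single level labels, not (level0, level1) tuples
out = df.rename(columns={'$a': 'a', '$b': 'b', '$x': 'x', '$y': 'y'})
print(out.columns.tolist())
```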
Another option is to rename using a regular expression:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})
df = df.rename(columns=lambda x: re.sub(r'\$', '', x))
>>> df
a b c
0 1 3 5
1 2 4 6
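A vectorized sketch of the same idea uses the `.str` accessor on the column Index. Anchoring the pattern with `^` removes only a leading `$` rather than every `$` in the name; the column `'x$y'` is a hypothetical example of the difference:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], 'x$y': [5, 6]})

# regex=True treats '^\$' as a pattern: a literal '$' at the start of the name
df.columns = df.columns.str.replace(r'^\$', '', regex=True)
print(list(df.columns))
```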
If you have to deal with lots of columns named by a providing system outside your control, I came up with the following approach, which combines a general rule and specific replacements in one go.
First, create a dictionary from the DataFrame column names, using regular expressions to throw away certain appendixes of the column names.
Then add specific replacements to the dictionary so that the core columns get the names expected later in the receiving database.
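The code for this approach is not shown here, so the following is only a sketch of the described idea. The provider's column names, the trailing parenthesised type appendix stripped by the regular expression, and the `core_names` replacements are all hypothetical:

```python
import re
import pandas as pd

# Hypothetical provider columns carrying an appendix such as ' (text)'
df = pd.DataFrame(columns=['customer name (text)', 'order date (date)', 'amt_usd (number)'])

# General rule: drop a trailing parenthesised appendix from every name
mapping = {col: re.sub(r'\s*\([^)]*\)$', '', col) for col in df.columns}

# Specific replacements for core columns, keyed on the original names,
# override the general rule for those entries
core_names = {'amt_usd (number)': 'amount'}
mapping.update(core_names)

df = df.rename(columns=mapping)
print(list(df.columns))
```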
In addition to the solutions already provided, you can replace all the column names while you are reading the file, using the `names` and `header=0` parameters of `read_csv`.
First, we create a list of the names that we would like to use as our column names:
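import pandas as pd
ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
# If the DataFrame is already loaded, you can assign the list directly:
# ufo.columns = ufo_cols
# Or replace the header while reading the file (header=0 discards the file's own header row):
ufo = pd.read_csv('link to the file you are using', names=ufo_cols, header=0)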
In this case, all the column names will be replaced with the names you have in your list.
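A self-contained sketch of the `read_csv` variant, with a hypothetical two-line CSV held in memory via `io.StringIO` standing in for the real file. With `header=0`, the file's own header line is consumed and replaced by `names`, rather than ending up as an extra data row:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
)

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0: line 0 is the header row; names= supplies the labels used instead
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)
print(ufo)
```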
Here’s a nifty little function I like to use to cut down on typing:
def rename(data, oldnames, newname):
    # oldnames can be a single string or, when renaming multiple columns,
    # a list of strings; newname must then be the corresponding list of new names
    if isinstance(oldnames, str):
        oldnames = [oldnames]
        newname = [newname]
    for i, name in enumerate(oldnames):
        # substring match: 'col' matches both 'col1' and 'col2'
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1:  # doesn't have to be an exact match, so ask the user
            print("Found multiple columns that matched " + str(name) + " :")
            for idx, c in enumerate(oldvar):
                print(str(idx) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        else:
            oldvar = oldvar[0]
        data = data.rename(columns={oldvar: newname[i]})
    return data
Here is an example of how it works:
In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy'])
Found multiple columns that matched col :
0: col1
1: col2
please enter the index of the column you would like to rename: 0
In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')
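If prompting with `input()` is not wanted (for example in a script or a scheduled job), a non-interactive variant can simply raise when a substring is ambiguous. This is a sketch of a hypothetical helper, `rename_unique`, built on `DataFrame.filter(like=...)`, which keeps the columns whose names contain the given substring:

```python
import numpy as np
import pandas as pd

def rename_unique(data, oldname, newname):
    # filter(like=...) keeps columns whose label contains the substring
    matches = list(data.filter(like=oldname).columns)
    if len(matches) != 1:
        raise ValueError(f"{oldname!r} matched {len(matches)} columns: {matches}")
    return data.rename(columns={matches[0]: newname})

df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)),
                  columns=['col1', 'col2', 'omg', 'idk'])
df = rename_unique(df, 'omg', 'ohmy')
print(df.columns.tolist())

# 'col' matches both 'col1' and 'col2', so the helper refuses to guess
ambiguous_error = None
try:
    rename_unique(df, 'col', 'first')
except ValueError as exc:
    ambiguous_error = exc
print(ambiguous_error)
```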