Tag archive: rename

Convert a row of a pandas DataFrame into column headers

Question: Convert a row of a pandas DataFrame into column headers

The data I have to work with is a bit messy. It has header names inside its data. How can I choose a row from an existing pandas DataFrame and make it (rename it to) the column header?

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header

Answer 0

In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels to equal the values in the 2nd row (index location 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the 2nd row using:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6

If the index is not unique, you could use:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it’s often better to take care that the index is unique (even though Pandas does not require it).
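
Putting the two steps together, a minimal sketch might look like the following. The helper name promote_row_to_header is hypothetical (it is not part of pandas). It selects rows by position, so it should also behave when index labels repeat:

```python
import pandas as pd


def promote_row_to_header(frame, pos):
    """Return a copy whose column labels come from the row at position `pos`."""
    header = frame.iloc[pos]
    # Keep every row except `pos`, chosen by position rather than by label,
    # so duplicate index labels do not remove extra rows.
    keep = [i for i in range(len(frame)) if i != pos]
    out = frame.iloc[keep].copy()
    out.columns = list(header)
    return out.reset_index(drop=True)


df = pd.DataFrame([(1, 2, 3), ("foo", "bar", "baz"), (4, 5, 6)])
result = promote_row_to_header(df, 1)
print(result)
```

Whether to call reset_index at the end is a design choice; drop it if the original row labels matter.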


Answer 1

This works (pandas v0.19.2):

df.rename(columns=df.iloc[0])

Answer 2

It would be easier to recreate the data frame. This would also interpret the column types from scratch.

headers = df.iloc[0]
new_df  = pd.DataFrame(df.values[1:], columns=headers)
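
Depending on the pandas version, a frame rebuilt from df.values may still carry object dtype in every column, because the values come from a mixed-type array. If that happens, calling infer_objects() is one way to recover numeric dtypes. The sample data below is made up for illustration:

```python
import pandas as pd

# First row holds the real header; the rest is data.
df = pd.DataFrame([("a", "b"), (1, 2.5), (3, 4.5)])

headers = list(df.iloc[0])
# Rebuild from the remaining rows, then let pandas re-infer column dtypes.
new_df = pd.DataFrame(df.values[1:], columns=headers).infer_objects()

print(new_df.dtypes)
```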

Answer 3

You can specify the row index in read_csv or read_html via the header parameter, which the docs describe as "Row number(s) to use as the column names, and the start of the data". This has the advantage of automatically dropping all the preceding rows, which are presumably junk.

import pandas as pd
from io import StringIO

In[1]
    csv = '''junk1, junk2, junk3, junk4, junk5
    junk1, junk2, junk3, junk4, junk5
    pears, apples, lemons, plums, other
    40, 50, 61, 72, 85
    '''

    df = pd.read_csv(StringIO(csv), header=2)
    print(df)

Out[1]
       pears   apples   lemons   plums   other
    0     40       50       61      72      85
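
In the sample above the fields are separated by a comma followed by a space, so the parsed column names can end up with a leading blank (for example ' apples'). A small sketch, assuming you want clean names, is to also pass skipinitialspace=True, which is a documented read_csv option. The shortened CSV text here is illustrative:

```python
import pandas as pd
from io import StringIO

raw = (
    "junk1, junk2, junk3\n"
    "junk1, junk2, junk3\n"
    "pears, apples, lemons\n"
    "40, 50, 61\n"
)

# header=2 takes the third line as column names and discards the lines before it;
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(StringIO(raw), header=2, skipinitialspace=True)
print(df)
```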

Batch renaming of files in a directory

Question: Batch renaming of files in a directory

Is there an easy way to rename a group of files already contained in a directory, using Python?

Example: I have a directory full of *.doc files and I want to rename them in a consistent way.

X.doc -> “new(X).doc”

Y.doc -> “new(Y).doc”


Answer 0

Such renaming is quite easy, for example with os and glob modules:

import glob, os

def rename(dir, pattern, titlePattern):
    for pathAndFilename in glob.iglob(os.path.join(dir, pattern)):
        title, ext = os.path.splitext(os.path.basename(pathAndFilename))
        os.rename(pathAndFilename, 
                  os.path.join(dir, titlePattern % title + ext))

You could then use it in your example like this:

rename(r'c:\temp\xx', r'*.doc', r'new(%s)')

The above example will convert all *.doc files in c:\temp\xx dir to new(%s).doc, where %s is the previous base name of the file (without extension).
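
The same idea can be sketched with pathlib, adding a dry-run switch so the planned renames can be inspected first. The function name rename_with_template and the `{}` placeholder style are assumptions made for this illustration, not part of the answer above:

```python
from pathlib import Path


def rename_with_template(directory, pattern, template, dry_run=True):
    """Rename files matching `pattern`; `template` receives the old stem via str.format."""
    planned = []
    # sorted() materialises the listing first, so files that were just
    # renamed are not picked up again by the glob.
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name(template.format(path.stem) + path.suffix)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling rename_with_template(r'c:\temp\xx', '*.doc', 'new({})') only returns the list of (old, new) names; pass dry_run=False to actually rename.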


Answer 1

I prefer writing a small one-liner for each replacement I have to do, rather than more generic and complex code. E.g.:

This replaces all underscores with hyphens in the name of every non-hidden file in the current directory:

import os
[os.rename(f, f.replace('_', '-')) for f in os.listdir('.') if not f.startswith('.')]

Answer 2

If you don’t mind using regular expressions, then this function would give you much power in renaming files:

import re, glob, os

def renamer(files, pattern, replacement):
    for pathname in glob.glob(files):
        basename= os.path.basename(pathname)
        new_filename= re.sub(pattern, replacement, basename)
        if new_filename != basename:
            os.rename(
              pathname,
              os.path.join(os.path.dirname(pathname), new_filename))

So in your example, you could do (assuming it’s the current directory where the files are):

renamer("*.doc", r"^(.*)\.doc$", r"new(\1).doc")

but you could also roll back to the initial filenames:

renamer("*.doc", r"^new\((.*)\)\.doc", r"\1.doc")

and more.
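
Before touching the disk, it can help to preview what a pattern would do. The sketch below applies the same re.sub call to plain name strings and returns the planned mapping. The helper name preview_renames is hypothetical:

```python
import re


def preview_renames(names, pattern, replacement):
    """Map each old name to its new name, skipping names the pattern leaves unchanged."""
    plan = {}
    for name in names:
        new_name = re.sub(pattern, replacement, name)
        if new_name != name:
            plan[name] = new_name
    return plan


forward = preview_renames(["X.doc", "Y.doc", "notes.txt"], r"^(.*)\.doc$", r"new(\1).doc")
back = preview_renames(forward.values(), r"^new\((.*)\)\.doc$", r"\1.doc")
print(forward)
print(back)
```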


Answer 3

I use this to rename all files in a folder and its subfolders:

import os

def replace(fpath, old_str, new_str):
    for path, subdirs, files in os.walk(fpath):
        for name in files:
            if old_str.lower() in name.lower():
                # The name is lowercased before replacing, so old_str must be lowercased too.
                os.rename(os.path.join(path, name),
                          os.path.join(path, name.lower().replace(old_str.lower(), new_str)))

It replaces every occurrence of old_str, in any case, with new_str. Note that the rest of the file name is lowercased as well.
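
The function above lowercases the whole file name as a side effect. If the rest of the name should keep its case, one option is a case-insensitive regular expression. This is only a sketch of the string step, and the helper name replace_ignore_case is hypothetical:

```python
import re


def replace_ignore_case(name, old, new):
    """Replace every occurrence of `old` in `name`, ignoring case, leaving the rest untouched."""
    # re.escape treats `old` as literal text; the lambda keeps `new` literal too,
    # so backslashes in it are not read as group references.
    return re.sub(re.escape(old), lambda match: new, name, flags=re.IGNORECASE)


print(replace_ignore_case("My_REPORT_report.TXT", "report", "summary"))
```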


Answer 4

Try: http://www.mattweber.org/2007/03/04/python-script-renamepy/

I like to have my music, movie, and picture files named a certain way. When I download files from the internet, they usually don’t follow my naming convention. I found myself manually renaming each file to fit my style. This got old really fast, so I decided to write a program to do it for me.

This program can convert the filename to all lowercase, replace strings in the filename with whatever you want, and trim any number of characters from the front or back of the filename.

The program’s source code is also available.


Answer 5

I’ve written a python script on my own. It takes as arguments the path of the directory in which the files are present and the naming pattern that you want to use. However, it renames by attaching an incremental number (1, 2, 3 and so on) to the naming pattern you give.

import os
import sys

# checking whether path and filename are given.
if len(sys.argv) != 3:
    print("Usage : python rename.py <path> <new_name.extension>")
    sys.exit()

# splitting name and extension.
name = sys.argv[2].split('.')
if len(name) < 2:
    name.append('')
else:
    name[1] = ".%s" %name[1]

# to name starting from 1 to number_of_files.
count = 1

# creating a new folder in which the renamed files will be stored.
dest = os.path.join(sys.argv[1], "pic_folder")
try:
    os.mkdir(dest)
except OSError:
    # if pic_folder is already present, use it.
    pass

try:
    for root, dirs, files in os.walk(sys.argv[1]):
        # do not descend into the destination folder itself.
        if root == sys.argv[1] and "pic_folder" in dirs:
            dirs.remove("pic_folder")
        for y in files:
            # creating the rename pattern.
            s = os.path.join(dest, "%s%s%s" % (name[0], count, name[1]))
            # getting the original path of the file to be renamed.
            z = os.path.join(root, y)
            # renaming.
            os.rename(z, s)
            # incrementing the count.
            count = count + 1
except OSError:
    pass

Hope this works for you.


Answer 6

Be in the directory where you need to perform the renaming.

import os
# get the file name list to nameList
nameList = os.listdir() 
#loop through the name and rename
for fileName in nameList:
    rename=fileName[15:28]
    os.rename(fileName,rename)
#example:
#input fileName bulk like :20180707131932_IMG_4304.JPG
#output renamed bulk like :IMG_4304.JPG
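
The fixed slice assumes that every file in the directory carries the same 15-character prefix; any other file would have its name cut as well. A slightly more defensive sketch checks for a 14-digit timestamp before stripping it. The helper name strip_timestamp_prefix is hypothetical:

```python
def strip_timestamp_prefix(file_name):
    """Drop a leading 14-digit timestamp plus underscore; return other names unchanged."""
    prefix, sep, rest = file_name.partition("_")
    if sep and len(prefix) == 14 and prefix.isdigit():
        return rest
    return file_name


print(strip_timestamp_prefix("20180707131932_IMG_4304.JPG"))
```

The result can then be passed to os.rename in the loop, skipping files whose name did not change.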

Answer 7

import os

directoryName = "Photographs"
filePath = os.path.abspath(directoryName)
filePathWithSlash = filePath + "\\"

for counter, filename in enumerate(os.listdir(directoryName)):

    filenameWithPath = os.path.join(filePathWithSlash, filename)

    os.rename(filenameWithPath, filenameWithPath.replace(filename,"DSC_" + \
          str(counter).zfill(4) + ".jpg" ))

# e.g. filename = "photo1.jpg", directory = "c:\users\Photographs"
# The str.replace call swaps the new filename in for the current
# filename within the filenameWithPath string. The result is then
# passed to os.rename as the target, with the current (unmodified)
# filenameWithPath as the source.

# os.listdir delivers the filename(s) from the directory
# however in attempting to "rename" the file using os 
# a specific location of the file to be renamed is required.

# this code is from Windows 

Answer 8

I had a similar problem, but I wanted to prepend text to the file name of every file in a directory, and used a similar method. See the example below:

import os

folder = r"R:\mystuff\GIS_Projects\Website\2017\PDF"

for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        # stem is the name without extension; ext is the extension, including the dot
        stem, ext = os.path.splitext(filename)
        print(fullpath)
        print(stem)
        print(ext)
        os.rename(fullpath, os.path.join(root, "NewText_2017_" + stem + ext))

Answer 9

In my case, the directory has multiple subdirectories, each containing lots of images, and I want to rename the images in every subdirectory to 1.jpg ~ n.jpg:

import glob
import os

def batch_rename():
    base_dir = 'F:/ad_samples/test_samples/'
    sub_dir_list = glob.glob(base_dir + '*')
    # sub_dir_list looks like ['F:/dir1', 'F:/dir2']
    for dir_item in sub_dir_list:
        files = glob.glob(dir_item + '/*.jpg')
        # number the files from 1 to n
        for i, f in enumerate(files, start=1):
            os.rename(f, os.path.join(dir_item, str(i) + '.jpg'))

(my own answer) https://stackoverflow.com/a/45734381/6329006
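
One caveat with sequential numbering: if a folder already contains a file with one of the target names, os.rename can overwrite it on POSIX systems or raise an error on Windows. A two-phase rename avoids the clash. The sketch below is one possible way to do it; the function name renumber_images and the temporary-name scheme are assumptions for illustration:

```python
import uuid
from pathlib import Path


def renumber_images(directory, suffix=".jpg"):
    """Rename all `suffix` files in `directory` to 1..n without name collisions."""
    files = sorted(p for p in Path(directory).iterdir() if p.suffix.lower() == suffix)

    # Phase 1: move every file to a unique temporary name.
    temps = []
    for path in files:
        tmp = path.with_name(".tmp-" + uuid.uuid4().hex + suffix)
        path.rename(tmp)
        temps.append(tmp)

    # Phase 2: assign the final sequential names, starting at 1.
    for number, tmp in enumerate(temps, start=1):
        tmp.rename(tmp.with_name(str(number) + suffix))
    return len(temps)
```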


Answer 10

#  another regex version
#  usage example:
#  replacing an underscore in the filename with today's date
#  rename_files('..\\output', r'(.*)(_)(.*\.CSV)', r'\g<1>_20180402_\g<3>')
import os
import re

def rename_files(path, pattern, replacement):
    for filename in os.listdir(path):
        if re.search(pattern, filename):
            new_filename = re.sub(pattern, replacement, filename)
            new_fullname = os.path.join(path, new_filename)
            old_fullname = os.path.join(path, filename)
            os.rename(old_fullname, new_fullname)
            print('Renamed: ' + old_fullname + ' to ' + new_fullname)

Answer 11

If you would like to modify file names in an editor (such as vim), the click library comes with the command click.edit(), which can be used to receive user input from an editor. Here is an example of how it can be used to refactor files in a directory.

import click
from pathlib import Path

# current directory
direc_to_refactor = Path(".")

# list of old file paths
old_paths = list(direc_to_refactor.iterdir())

# list of old file names
old_names = [str(p.name) for p in old_paths]

# modify old file names in an editor,
# and store them in a list of new file names
new_names = click.edit("\n".join(old_names)).split("\n")

# refactor the old file names
for i in range(len(old_paths)):
    old_paths[i].replace(direc_to_refactor / new_names[i])

I wrote a command line application that uses the same technique, but that reduces the volatility of this script, and comes with more options, such as recursive refactoring. Here is the link to the github page. This is useful if you like command line applications, and are interested in making some quick edits to file names. (My application is similar to the “bulkrename” command found in ranger).


Answer 12

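This code will work.

The function takes exactly two arguments: f_path, the path of the directory containing the files to rename, and new_name, the new base name for the files. Because the code concatenates strings, f_path must end with a path separator.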

import glob2
import os


def rename(f_path, new_name):
    filelist = glob2.glob(f_path + "*.ma")
    count = 0
    for file in filelist:
        print("File Count : ", count)
        filename = os.path.split(file)
        print(filename)
        new_filename = f_path + new_name + str(count + 1) + ".ma"
        os.rename(f_path+filename[1], new_filename)
        print(new_filename)
        count = count + 1

Easiest way to rename a model using Django/South?

Question: Easiest way to rename a model using Django/South?

I’ve been hunting for an answer to this on South’s site, Google, and SO, but couldn’t find a simple way to do this.

I want to rename a Django model using South. Say you have the following:

class Foo(models.Model):
    name = models.CharField()

class FooTwo(models.Model):
    name = models.CharField()
    foo = models.ForeignKey(Foo)

and you want to convert Foo to Bar, namely

class Bar(models.Model):
    name = models.CharField()

class FooTwo(models.Model):
    name = models.CharField()
    foo = models.ForeignKey(Bar)

To keep it simple, I’m just trying to change the name from Foo to Bar, but ignore the foo member in FooTwo for now.

What’s the easiest way to do this using South?

  1. I could probably do a data migration, but that seems pretty involved.
  2. Write a custom migration, e.g. db.rename_table('city_citystate', 'geo_citystate'), but I’m not sure how to fix the foreign key in this case.
  3. An easier way that you know?

Answer 0

To answer your first question, the simple model/table rename is pretty straightforward. Run the command:

./manage.py schemamigration yourapp rename_foo_to_bar --empty

(Update 2: try --auto instead of --empty to avoid the warning below. Thanks to @KFB for the tip.)

If you’re using an older version of south, you’ll need startmigration instead of schemamigration.

Then manually edit the migration file to look like this:

class Migration(SchemaMigration):

    def forwards(self, orm):
        db.rename_table('yourapp_foo', 'yourapp_bar')


    def backwards(self, orm):
        db.rename_table('yourapp_bar','yourapp_foo')   

You can accomplish this more simply using the db_table Meta option in your model class. But every time you do that, you increase the legacy weight of your codebase — having class names differ from table names makes your code harder to understand and maintain. I fully support doing simple refactorings like this for the sake of clarity.

(update) I just tried this in production, and got a strange warning when I went to apply the migration. It said:

The following content types are stale and need to be deleted:

    yourapp | foo

Any objects related to these content types by a foreign key will also
be deleted. Are you sure you want to delete these content types?
If you're unsure, answer 'no'.

I answered “no” and everything seemed to be fine.


Answer 1

Make the changes in models.py and then run

./manage.py schemamigration --auto myapp

When you inspect the migration file, you’ll see that it deletes a table and creates a new one

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Deleting model 'Foo'                                                                                                                      
        db.delete_table('myapp_foo')

        # Adding model 'Bar'                                                                                                                        
        db.create_table('myapp_bar', (
        ...
        ))
        db.send_create_signal('myapp', ['Bar'])

    def backwards(self, orm):
        ...

This is not quite what you want. Instead, edit the migration so that it looks like:

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Renaming model from 'Foo' to 'Bar'                                                                                                                      
        db.rename_table('myapp_foo', 'myapp_bar')                                                                                                                        
        if not db.dry_run:
            orm['contenttypes.contenttype'].objects.filter(
                app_label='myapp', model='foo').update(model='bar')

    def backwards(self, orm):
        # Renaming model from 'Bar' to 'Foo'                                                                                                                      
        db.rename_table('myapp_bar', 'myapp_foo')                                                                                                                        
        if not db.dry_run:
            orm['contenttypes.contenttype'].objects.filter(app_label='myapp', model='bar').update(model='foo')

In the absence of the update statement, the db.send_create_signal call will create a new ContentType with the new model name. But it’s better to just update the ContentType you already have in case there are database objects pointing to it (e.g., via a GenericForeignKey).

Also, if you’ve renamed some columns which are foreign keys to the renamed model, don’t forget to

db.rename_column('myapp_model', 'foo_id', 'bar_id')

Answer 2

South can’t do it itself – how does it know that Bar represents what Foo used to? This is the sort of thing I’d write a custom migration for. You can change your ForeignKey in code as you’ve done above, and then it’s just a case of renaming the appropriate fields and tables, which you can do any way you want.

Finally, do you really need to do this? I’ve yet to need to rename models – model names are just an implementation detail – particularly given the availability of the verbose_name Meta option.


Answer 3

I followed Leopd’s solution above, but that did not change the model names. I changed them manually in the code (also in related models where the model is referenced as a FK), and then ran another South migration, this time with the --fake option. This makes the model names and table names match.

I just realized that one could start by changing the model names, then edit the migration file before applying it. Much cleaner.


Rename specific column(s) in pandas

Question: Rename specific column(s) in pandas

I’ve got a dataframe called data. How would I rename only one column header? For example, gdp to log(gdp)?

data =
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

Answer 0

data.rename(columns={'gdp':'log(gdp)'}, inplace=True)

The rename docs show that it accepts a dict as the columns parameter, so you just pass a dict with a single entry.

Also see related
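
As a small illustration with made-up data: without inplace=True, rename returns a new frame and leaves the original untouched, and a key that matches no column is ignored by default, so a typo in the old name passes silently.

```python
import pandas as pd

data = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# Without inplace=True, rename returns a new frame and leaves `data` as it was.
renamed = data.rename(columns={"gdp": "log(gdp)"})

# A key that matches no column is ignored by default, so this typo changes nothing.
unchanged = data.rename(columns={"gpd": "log(gdp)"})

print(list(renamed.columns))
print(list(unchanged.columns))
```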


Answer 1

A much faster implementation would be to use list-comprehension if you need to rename a single column.

df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]

If the need arises to rename multiple columns, either use conditional expressions like:

df.columns = ['log(gdp)' if x=='gdp' else 'cap_mod' if x=='cap' else x for x in df.columns]

Or, construct a mapping using a dictionary and run the list comprehension with its get method, setting the default value to the old name:

col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'}   ## key→old name, value→new name

df.columns = [col_dict.get(x, x) for x in df.columns]

Timings:

%%timeit
df.rename(columns={'gdp':'log(gdp)'}, inplace=True)
10000 loops, best of 3: 168 µs per loop

%%timeit
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
10000 loops, best of 3: 58.5 µs per loop

Answer 2

如何重命名熊猫中的特定列?

从v0.24 +起,要一次重命名一列(或多列),

如果您需要一次重命名所有列,

  • DataFrame.set_axis()的方法axis=1。传递类似列表的序列。选项也可用于就地修改。

renameaxis=1

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
df

   y gdp cap
0  x   x   x
1  x   x   x
2  x   x   x
3  x   x   x
4  x   x   x

使用0.21+,您现在可以使用来指定axis参数rename

df.rename({'gdp':'log(gdp)'}, axis=1)
# df.rename({'gdp':'log(gdp)'}, axis='columns')
    
   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

(请注意,rename默认情况下它不是就地的,因此您需要将结果分配回去。)

进行此添加是为了提高与其他API的一致性。新axis参数类似于该columns参数,它们执行相同的操作。

df.rename(columns={'gdp': 'log(gdp)'})

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

rename 还接受为每个列调用一次的回调。

df.rename(lambda x: x[0], axis=1)
# df.rename(lambda x: x[0], axis='columns')

   y  g  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

对于这种特定情况,您可能要使用

df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)

Index.str.replace

replacepython中的字符串方法类似,pandas Index和Series(仅对象dtype)定义了一种(“矢量化”)str.replace方法,用于基于字符串和正则表达式的替换。

df.columns = df.columns.str.replace('gdp', 'log(gdp)')
df
 
   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

与其他方法相比,此方法的优点是str.replace支持正则表达式(默认情况下启用)。有关更多信息,请参阅文档。


传递一个列表,set_axisaxis=1

set_axis用标题列表进行调用。该列表的长度必须等于列/索引的大小。set_axis默认情况下会更改原始DataFrame,但您可以指定inplace=False返回修改后的副本。

df.set_axis(['cap', 'log(gdp)', 'y'], axis=1, inplace=False)
# df.set_axis(['cap', 'log(gdp)', 'y'], axis='columns', inplace=False)

  cap log(gdp)  y
0   x        x  x
1   x        x  x
2   x        x  x
3   x        x  x
4   x        x  x

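Note: inplace=True was only the default in early releases; later releases default to inplace=False, and pandas 2.0 removed the inplace argument from set_axis entirely.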

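Method Chaining
Why choose set_axis when we already have an efficient way of assigning columns with df.columns = ...? As shown by Ted Petrou in this answer (https://stackoverflow.com/a/46912050/4909087), set_axis is useful when trying to chain methods.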

比较

# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()

前者是更自然和自由流动的语法。

How do I rename a specific column in pandas?

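From v0.24+, to rename one (or more) columns at a time, use DataFrame.rename() with axis=1 or Index.str.replace(); both are covered in the sections that follow.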

If you need to rename ALL columns at once,

  • DataFrame.set_axis() method with axis=1. Pass a list-like sequence. Options are available for in-place modification as well.

rename with axis=1

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
df

   y gdp cap
0  x   x   x
1  x   x   x
2  x   x   x
3  x   x   x
4  x   x   x

With 0.21+, you can now specify an axis parameter with rename:

df.rename({'gdp':'log(gdp)'}, axis=1)
# df.rename({'gdp':'log(gdp)'}, axis='columns')
    
   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

(Note that rename is not in-place by default, so you will need to assign the result back.)

This addition has been made to improve consistency with the rest of the API. The new axis argument is analogous to the columns parameter—they do the same thing.

df.rename(columns={'gdp': 'log(gdp)'})

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

rename also accepts a callback that is called once for each column.

df.rename(lambda x: x[0], axis=1)
# df.rename(lambda x: x[0], axis='columns')

   y  g  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For this specific scenario, you would want to use

df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)

Index.str.replace

Similar to replace method of strings in python, pandas Index and Series (object dtype only) define a (“vectorized”) str.replace method for string and regex-based replacement.

df.columns = df.columns.str.replace('gdp', 'log(gdp)')
df
 
   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

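The advantage of this over the other methods is that str.replace supports regex. Regex matching was enabled by default in older pandas versions; pandas 2.0 changed the default to regex=False, so pass regex=True explicitly when a pattern is intended. See the docs for more information.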
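
To illustrate the regex point, here is a small sketch that wraps two column names in log(...) with a single pattern. The pattern and the choice of columns are assumptions made for the example; regex=True is passed explicitly because the default has differed between pandas versions.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(2))

# ^(gdp|cap)$ matches whole names only; \1 re-inserts the matched name.
# regex=True is explicit, since the default depends on the pandas version.
renamed = df.columns.str.replace(r'^(gdp|cap)$', r'log(\1)', regex=True)

print(list(renamed))  # ['y', 'log(gdp)', 'log(cap)']
```

Assign the result back with df.columns = renamed if you want to keep it.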


Passing a list to set_axis with axis=1

Call set_axis with a list of header(s). The list must be equal in length to the columns/index size. set_axis mutates the original DataFrame by default, but you can specify inplace=False to return a modified copy.

df.set_axis(['cap', 'log(gdp)', 'y'], axis=1, inplace=False)
# df.set_axis(['cap', 'log(gdp)', 'y'], axis='columns', inplace=False)

  cap log(gdp)  y
0   x        x  x
1   x        x  x
2   x        x  x
3   x        x  x
4   x        x  x

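Note: inplace=True was only the default in early releases; later releases default to inplace=False, and pandas 2.0 removed the inplace argument from set_axis entirely.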

Method Chaining
Why choose set_axis when we already have an efficient way of assigning columns with df.columns = ...? As shown by Ted Petrou in this answer (https://stackoverflow.com/a/46912050/4909087), set_axis is useful when trying to chain methods.

Compare

# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

Versus

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()

The former is more natural and free flowing syntax.
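
For a concrete version of the chained form, here is a hedged sketch with real methods standing in for some_method1/2/3. The sample data and the particular chain steps are made up for illustration, and it assumes a pandas version in which set_axis returns a new DataFrame by default.

```python
import pandas as pd

# Hypothetical input with '$'-prefixed column names.
df = pd.DataFrame({'$a': [3, 1, 2], '$b': [30, 10, 20]})

# Assumes set_axis returns a new DataFrame (no in-place default).
result = (
    df.sort_values('$a')
      .reset_index(drop=True)
      .set_axis(['a', 'b'], axis=1)                # rename mid-chain
      .assign(total=lambda d: d['a'] + d['b'])     # later steps see the new names
)

print(result)
```

The original df keeps its old column names, so nothing has to be stored in an intermediate variable just to rename.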


回答 3

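There are at least five different ways to rename specific columns in pandas; they are listed below along with links to the original answers. I also timed these methods and found that they perform about the same (though YMMV depending on your data set and scenario). The test case below renames columns A, M, N, Z to A2, M2, N2, Z2 in a dataframe with columns A through Z containing a million rows.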

# Import required modules
import numpy as np
import pandas as pd
import timeit

# Create sample data
df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

# Standard way - https://stackoverflow.com/a/19758398/452587
def method_1():
    df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})

# Lambda function - https://stackoverflow.com/a/16770353/452587
def method_2():
    df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)

# Mapping function - https://stackoverflow.com/a/19758398/452587
def rename_some(x):
    if x=='A' or x=='M' or x=='N' or x=='Z':
        return x + '2'
    return x
def method_3():
    df_renamed = df.rename(columns=rename_some)

# Dictionary comprehension - https://stackoverflow.com/a/58143182/452587
def method_4():
    df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
        np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
    ]})

# Dictionary comprehension - https://stackoverflow.com/a/38101084/452587
def method_5():
    df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))

print('Method 1:', timeit.timeit(method_1, number=10))
print('Method 2:', timeit.timeit(method_2, number=10))
print('Method 3:', timeit.timeit(method_3, number=10))
print('Method 4:', timeit.timeit(method_4, number=10))
print('Method 5:', timeit.timeit(method_5, number=10))

输出:

Method 1: 3.650640267
Method 2: 3.163998427
Method 3: 2.998530871
Method 4: 2.9918436889999995
Method 5: 3.2436501520000007

使用对您来说最直观,最容易在应用程序中实现的方法。

There are at least five different ways to rename specific columns in pandas, and I have listed them below along with links to the original answers. I also timed these methods and found them to perform about the same (though YMMV depending on your data set and scenario). The test case below is to rename columns A M N Z to A2 M2 N2 Z2 in a dataframe with columns A to Z containing a million rows.

# Import required modules
import numpy as np
import pandas as pd
import timeit

# Create sample data
df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

# Standard way - https://stackoverflow.com/a/19758398/452587
def method_1():
    df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})

# Lambda function - https://stackoverflow.com/a/16770353/452587
def method_2():
    df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)

# Mapping function - https://stackoverflow.com/a/19758398/452587
def rename_some(x):
    if x=='A' or x=='M' or x=='N' or x=='Z':
        return x + '2'
    return x
def method_3():
    df_renamed = df.rename(columns=rename_some)

# Dictionary comprehension - https://stackoverflow.com/a/58143182/452587
def method_4():
    df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
        np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
    ]})

# Dictionary comprehension - https://stackoverflow.com/a/38101084/452587
def method_5():
    df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))

print('Method 1:', timeit.timeit(method_1, number=10))
print('Method 2:', timeit.timeit(method_2, number=10))
print('Method 3:', timeit.timeit(method_3, number=10))
print('Method 4:', timeit.timeit(method_4, number=10))
print('Method 5:', timeit.timeit(method_5, number=10))

Output:

Method 1: 3.650640267
Method 2: 3.163998427
Method 3: 2.998530871
Method 4: 2.9918436889999995
Method 5: 3.2436501520000007

Use the method that is most intuitive to you and easiest for you to implement in your application.


重命名熊猫列

问题:重命名熊猫列

我有一个使用熊猫和列标签的DataFrame,我需要对其进行编辑以替换原始列标签。

我想A在原始列名称为的DataFrame 中更改列名称:

['$a', '$b', '$c', '$d', '$e'] 

['a', 'b', 'c', 'd', 'e'].

我已经将编辑后的列名存储在列表中,但是我不知道如何替换列名。

I have a DataFrame using pandas and column labels that I need to edit to replace the original column labels.

I’d like to change the column names in a DataFrame A where the original column names are:

['$a', '$b', '$c', '$d', '$e'] 

to

['a', 'b', 'c', 'd', 'e'].

I have the edited column names stored in a list, but I don’t know how to replace the column names.


回答 0

只需将其分配给.columns属性:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

回答 1

重命名特定列

使用该df.rename()函数并引用要重命名的列。并非所有列都必须重命名:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

最小代码示例

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

下列方法均起作用并产生相同的输出:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

切记将结果分配回去,因为修改未就位。或者,指定inplace=True

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

从v0.25版开始,如果指定errors='raise'了无效的“要重命名的列” ,您还可以指定引发错误。参见v0.25 rename()文档


REASSIGN列标题

df.set_axis()axis=1inplace=False一起使用(返回副本)。

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

这将返回一个副本,但是您可以通过设置来就地修改DataFrame inplace=True(这是版本<= 0.24的默认行为,但将来可能会更改)。

您还可以直接分配标题:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

RENAME SPECIFIC COLUMNS

Use the df.rename() function and refer to the columns to be renamed. Not all the columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

The following methods all work and produce the same output:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Remember to assign the result back, as the modification is not in-place. Alternatively, specify inplace=True:

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.
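
A minimal sketch of the difference, assuming pandas 0.25 or newer. The key 'zzz' is a made-up name that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default behaviour (errors='ignore'): the unknown key 'zzz' is silently skipped.
ignored = df.rename(columns={'a': 'X', 'zzz': 'Y'})

# With errors='raise', the same mapping raises KeyError.
try:
    df.rename(columns={'a': 'X', 'zzz': 'Y'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(ignored.columns), raised)
```

This is handy for catching typos in the mapping that would otherwise pass unnoticed.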


REASSIGN COLUMN HEADERS

Use df.set_axis() with axis=1 and inplace=False (to return a copy).

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

This returns a copy, but you can modify the DataFrame in-place by setting inplace=True (this is the default behaviour for versions <=0.24 but is likely to change in the future).

You can also assign headers directly:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

回答 2

rename方法可以带有一个函数,例如:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

The rename method can take a function, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

回答 3

使用文本数据中所述

df.columns = df.columns.str.replace('$','')

As documented in Working with text data:

df.columns = df.columns.str.replace('$','')
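
If you are on a pandas version that has the regex keyword for str.replace, a variant like the following sketch makes the intent explicit: '$' is meant as a literal dollar sign, not the regex end-of-string anchor. The sample frame is an assumption for the example.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# regex=False treats '$' as a literal character.
df.columns = df.columns.str.replace('$', '', regex=False)

print(list(df.columns))  # ['a', 'b']
```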

回答 4

熊猫0.21+答案

0.21版中对列重命名进行了一些重大更新。

  • rename方法添加了axis可以设置为columns或的参数1。此更新使该方法与其他pandas API匹配。它仍然具有indexcolumns参数,但是您不再被迫使用它们。
  • set_axis方法inplace设置为False可以使所有的索引或列标签与命名列表。

熊猫的例子0.21+

构造样本DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

renameaxis='columns'或一起使用axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

要么

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

两者都导致以下结果:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

仍然可以使用旧的方法签名:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

rename函数还接受将应用于每个列名称的函数。

df.rename(lambda x: x[1:], axis='columns')

要么

df.rename(lambda x: x[1:], axis=1)

set_axis与列表一起使用inplace=False

您可以为该set_axis方法提供一个列表,该列表的长度等于列(或索引)的数量。当前,inplace默认值为True,但在将来的版本inplace中将默认为False

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

要么

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

为什么不使用df.columns = ['a', 'b', 'c', 'd', 'e']

像这样直接分配列没有错。这是一个完美的解决方案。

using的优点set_axis是它可以用作方法链的一部分,并返回DataFrame的新副本。没有它,您将不得不在重新分配列之前将链的中间步骤存储到另一个变量。

# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

  • The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
  • The set_axis method with the inplace set to False enables you to rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using rename with axis='columns' or axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts functions that will be applied to each column name.

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using set_axis with a list and inplace=False

You can supply a list to the set_axis method that is equal in length to the number of columns (or index). Currently, inplace defaults to True, but inplace will be defaulted to False in future releases.

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()

回答 5

由于只想删除所有列名中的$符号,因此可以执行以下操作:

df = df.rename(columns=lambda x: x.replace('$', ''))

要么

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

Since you only want to remove the $ sign in all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

OR

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

回答 6

df.columns = ['a', 'b', 'c', 'd', 'e']

它将按照您提供的顺序用您提供的名称替换现有名称。

df.columns = ['a', 'b', 'c', 'd', 'e']

It will replace the existing names with the names you provide, in the order you provide.


回答 7

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

这样,您可以根据需要手动编辑new_names。当您只需要重命名几列以纠正拼写错误,重音符号,删除特殊字符等时,效果很好。

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

This way you can manually edit the new_names as you wish. Works great when you need to rename only a few columns to correct misspellings, accents, remove special characters etc.


回答 8

一线或管道解决方案

我将专注于两件事:

  1. OP明确指出

    我已经将编辑后的列名存储在列表中,但是我不知道如何替换列名。

    我不想解决如何替换'$'或删除每个列标题中的第一个字符的问题。OP已完成此步骤。相反,我想集中精力用columns给定替换列名称列表的新对象替换现有对象。

  2. df.columns = newnew新列名称的列表在哪里就变得很简单。这种方法的缺点是,它需要编辑现有数据框的columns属性,并且无法内联完成。我将展示一些通过流水执行此操作而不编辑现有数据框的方法。


设置1
为了着重于需要使用现有列表重命名替换列名称,我将创建一个df具有初始列名称和不相关的新列名称的新示例数据框。

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

解决方案1
pd.DataFrame.rename

已经有人说过,如果您有一个字典将旧的列名映射到新的列名,则可以使用pd.DataFrame.rename

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

但是,您可以轻松创建该词典并将其包含在对的调用中rename。以下内容利用了以下事实:迭代时df,我们迭代每个列名。

# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

如果您原始的列名是唯一的,那么这很好。但是,如果不是这样,那么就会崩溃。


设置2个
非唯一列

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

解决方案2
pd.concat使用keys参数

首先,请注意当我们尝试使用解决方案1时会发生什么:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

我们没有将new列表映射为列名。我们最终重复了y765。相反,我们可以在遍历的列时使用函数的keys参数。pd.concatdf

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6

解决方案3
重建。仅当dtype所有列都有一个时,才应使用此选项。否则,您最终将dtype object获得所有列,并且将它们转换回需要更多的词典工作。

dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

混合的 dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

解决方案4
这是使用transpose和的花招set_indexpd.DataFrame.set_index允许我们设置内联索引,但没有对应的set_columns。这样我们就可以转置,然后再set_index转回。但是,此处适用解决方案3 的相同警告dtype与混合dtype警告。

dtype

df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

混合的 dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

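Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value, which is then used to step through new one element at a time, regardless of the value of x.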

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

正如人们在sopython聊天中向我指出的那样,如果*x和之间添加一个,则y可以保护我的y变量。不过,在这种情况下,我认为它不需要保护。仍然值得一提。

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

One line or Pipeline solutions

I’ll focus on two things:

  1. OP clearly states

    I have the edited column names stored in a list, but I don’t know how to replace the column names.

    I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.

  2. df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe’s columns attribute and it isn’t done inline. I’ll show a few ways to perform this via pipelining without editing the existing dataframe.


Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I’ll create a new sample dataframe df with initial column names and unrelated new column names.

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

Solution 1
pd.DataFrame.rename

It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.

# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

This works great if your original column names are unique. But if they are not, then this breaks down.


Setup 2
non-unique columns

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

Solution 2
pd.concat using the keys argument

First, notice what happens when we attempt to use solution 1:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

We didn’t map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6
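
Another inline option for the non-unique case, not part of the solutions listed here, is set_axis, which other answers on this page describe. The sketch below assumes a pandas version in which set_axis returns a new DataFrame by default; labels are assigned by position, so duplicate old names do not matter.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['Mahesh', 'Mahesh', 'Xin'])
new = ['x098', 'y765', 'z432']

# Labels are assigned positionally; the duplicated 'Mahesh' is not a problem.
renamed = df.set_axis(new, axis=1)

print(list(renamed.columns))  # ['x098', 'y765', 'z432']
```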

Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you’ll end up with dtype object for all columns and converting them back requires more dictionary work.

Single dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6
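
The sample frame in this answer only holds integers, so the dtype caveat is not visible in the output above. Here is a sketch with a made-up mixed-dtype frame (the data is an assumption) that shows the numeric columns coming back as object and astype restoring them.

```python
import pandas as pd

# Hypothetical frame with int, float and string columns.
df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [1.5, 2.5], 'Xin': ['u', 'v']})
new = ['x098', 'y765', 'z432']

# df.values is one object array here, so the numeric columns lose their dtype.
rebuilt = pd.DataFrame(df.values, df.index, new)
print(rebuilt.dtypes)

# Restore the original dtypes, matched to the new names by position.
restored = rebuilt.astype(dict(zip(new, df.dtypes)))
print(restored.dtypes)
```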

Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.

Single dtype

df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don’t believe it needs protecting. It is still worth mentioning.

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6
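
The trick relies on plain Python behaviour: a default argument is evaluated once, when the lambda is defined. The sketch below shows that without pandas; the name rename_next is made up for the example. It also shows the catch, which is that the iterator is used up after one pass.

```python
new = ['x098', 'y765', 'z432']

# iter(new) is created once, at definition time, and shared by every call.
rename_next = lambda x, y=iter(new): next(y)

# Each call ignores x and hands out the next new name.
first_pass = [rename_next(col) for col in ['Jack', 'Mahesh', 'Xin']]
print(first_pass)  # ['x098', 'y765', 'z432']

# The iterator is now exhausted, so one more call raises StopIteration.
try:
    rename_next('Jack')
    exhausted = False
except StopIteration:
    exhausted = True
```

So a lambda like this should be built fresh for each rename call rather than stored and reused.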

回答 9

列名称与系列名称

我想解释一下幕后发生的事情。

数据框是一组系列。

系列又是对 numpy.array

numpy.array有财产 .name

这是系列的名称。很少有人会尊重大熊猫的这一属性,但它会在某些地方徘徊,并可以用来破解某些大熊猫的行为。

命名列列表

这里有很多答案都谈到该df.columns属性list实际上是一个Series。这意味着它具有.name属性。

如果您决定填写各列的名称,则会发生这种情况Series

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

请注意,索引的名称总是低一列。

that绕的神器

.name属性有时会持续存在。如果设置df.columns = ['one', 'two']df.one.name则将为'one'

如果您设置,df.one.name = 'three'那么df.columns仍然会给您['one', 'two'],并df.one.name会给您'three'

pd.DataFrame(df.one) 将返回

    three
0       1
1       2
2       3

因为pandas重用.name了已经定义的Series

多级列名称

熊猫有做多层列名的方法。没有太多魔术,但是我也想在答案中涵盖这一点,因为我看不到有人在这里进行这项工作。

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

通过将列设置为列表很容易实现,如下所示:

df.columns = [['one', 'one'], ['one', 'two']]

Column names vs Names of Series

I would like to explain a bit what happens behind the scenes.

A DataFrame is a collection of Series.

Each Series in turn wraps a numpy.array and adds, among other things, a .name attribute.

This is the name of the series. It is seldom that pandas respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.

Naming the list of columns

A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index. This means it has a .name attribute.

This is what happens if you decide to fill in the name of the columns Index:

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index always comes one column lower.

Artifacts that linger

The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.

If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'

BUT

pd.DataFrame(df.one) will return

    three
0       1
1       2
2       3

Because pandas reuses the .name of the already defined Series.

Multi level column names

Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don’t see anyone picking up on this here.

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achievable by setting columns to lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]
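
A more explicit way to build the same two-level header, sketched here with the sample values from the table above, is pd.MultiIndex.from_arrays. Columns are then selected with a tuple of labels, one per level.

```python
import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]], columns=['one', 'two'])

# One array per level: the top level is 'one' for both columns.
df.columns = pd.MultiIndex.from_arrays([['one', 'one'], ['one', 'two']])

print(df.columns.nlevels)           # 2
print(df[('one', 'two')].tolist())  # [1, 2, 3]
```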

回答 10

如果您有数据框,则df.columns会将所有内容转储到您可以操作的列表中,然后将其重新分配给数据框作为列名…

old_columns = df.columns
new_columns = [col.replace("$", "") for col in old_columns]
df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)
df.head()  # to validate the output

最好的办法?IDK。一种方法-是的。

下面是使用cProfile衡量内存和执行时间的一种更好的评估问题答案中提出的所有主要技术的方法。@ kadee,@ kaitlyn和@eumiro具有执行时间最快的功能-尽管这些功能是如此之快,我们将比较所有答案的.000和.001秒舍入。道德:我上面的回答可能不是“最佳”方法。

import pandas as pd
import cProfile, pstats, re

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b','$c':'c','$d':'d','$e':'e'}

df = pd.DataFrame({'$a':[1,2], '$b': [10,20],'$c':['bleep','blorp'],'$d':[1,2],'$e':['texa$','']})

df.head()

def eumiro(df,nn):
    df.columns = nn
    #This direct renaming approach is duplicated in methodology in several other answers: 
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df,col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df,on,nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$','')

def awo(df):
    old_columns = df.columns
    new_columns = [col.replace("$", "") for col in old_columns]
    return df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print('eumiro')
cProfile.run('eumiro(df,new_names)')
print('lexual1')
cProfile.run('lexual1(df)')
print('lexual2')
cProfile.run('lexual2(df,col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df)')
print('paulo1')
cProfile.run('paulo1(df)')
print('paulo2')
cProfile.run('paulo2(df)')
print('migloo')
cProfile.run('migloo(df,old_names,new_names)')
print('kadee')
cProfile.run('kadee(df)')
print('awo')
cProfile.run('awo(df)')
print('kaitlyn')
cProfile.run('kaitlyn(df)')

If you’ve got the dataframe, df.columns gives you a list-like Index you can manipulate and then reassign into your dataframe as the names of columns…

old_columns = df.columns
new_columns = [col.replace("$", "") for col in old_columns]
df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)
df.head()  # to validate the output

Best way? IDK. A way – yes.

A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times – though these functions are so fast we’re comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn’t the ‘Best’ way.

import pandas as pd
import cProfile, pstats, re

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b','$c':'c','$d':'d','$e':'e'}

df = pd.DataFrame({'$a':[1,2], '$b': [10,20],'$c':['bleep','blorp'],'$d':[1,2],'$e':['texa$','']})

df.head()

def eumiro(df,nn):
    df.columns = nn
    #This direct renaming approach is duplicated in methodology in several other answers: 
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df,col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df,on,nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$','')

def awo(df):
    old_columns = df.columns
    new_columns = [col.replace("$", "") for col in old_columns]
    return df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print('eumiro')
cProfile.run('eumiro(df,new_names)')
print('lexual1')
cProfile.run('lexual1(df)')
print('lexual2')
cProfile.run('lexual2(df,col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df)')
print('paulo1')
cProfile.run('paulo1(df)')
print('paulo2')
cProfile.run('paulo2(df)')
print('migloo')
cProfile.run('migloo(df,old_names,new_names)')
print('kadee')
cProfile.run('kadee(df)')
print('awo')
cProfile.run('awo(df)')
print('kaitlyn')
cProfile.run('kaitlyn(df)')
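
Several of the functions above rename in place, so later profiling runs see a frame whose columns are already clean. Here is a minimal sketch of a fairer comparison, assuming the standard-library timeit module and a hypothetical make_df helper that builds a fresh frame for every call (neither is part of the answer above, and the measured time includes building the frame):

```python
import timeit

import pandas as pd


def make_df():
    # Hypothetical helper: a fresh frame per call, so in-place edits cannot leak between runs.
    return pd.DataFrame({'$a': [1, 2], '$b': [10, 20], '$c': ['bleep', 'blorp']})


def assign_list(df):
    df.columns = ['a', 'b', 'c']
    return df


def rename_lambda(df):
    return df.rename(columns=lambda name: name.replace('$', ''))


def strip_comprehension(df):
    df.columns = [name.strip('$') for name in df.columns]
    return df


results = {}
for func in (assign_list, rename_lambda, strip_comprehension):
    # Check the result first, then time 200 calls (frame construction is included in the total).
    assert list(func(make_df()).columns) == ['a', 'b', 'c']
    results[func.__name__] = timeit.timeit(lambda: func(make_df()), number=200)

for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s for 200 calls")
```

On frames this small the differences are likely to be noise, which matches the moral above.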

Answer 11

Let’s say this is your dataframe.

You can rename the columns using two methods.

  1. Using dataframe.columns=[#list]

    df.columns=['a','b','c','d','e']
    

    The limitation of this method is that even if only one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels. For example, if you passed this:

    df.columns = ['a','b','c','d']
    

    This will throw an error. Length mismatch: Expected axis has 5 elements, new values have 4 elements.

  2. Another method is the Pandas rename() method which is used to rename any index, column or row

    df = df.rename(columns={'$a':'a'})
    

Similarly, you can change any rows or columns.
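
The two methods above can be compared side by side. This is a minimal sketch, assuming a one-row frame with the five '$'-prefixed columns; the variable names error_message, renamed and relabelled are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [2], '$c': [3], '$d': [4], '$e': [5]})

# Method 1 needs exactly one name per column; a shorter list raises ValueError.
error_message = None
try:
    df.columns = ['a', 'b', 'c', 'd']
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Method 2 touches only the labels named in the mapping.
renamed = df.rename(columns={'$a': 'a'})
print(list(renamed.columns))

# rename() also accepts index= for row labels.
relabelled = renamed.rename(index={0: 'first'})
print(list(relabelled.index))
```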


Answer 12

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

If your new list of columns is in the same order as the existing columns, the assignment is simple:

new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you had a dictionary keyed on old column names to new column names, you could do the following:

d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you don’t have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:

df.columns = [col[1:] if col[0] == '$' else col for col in df]
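
One caveat with the dictionary approach: d[col] raises KeyError for a column that is missing from the mapping, and d.get returns None for it. A minimal sketch of a more forgiving variant, assuming that unmapped columns should keep their current name (the frame and the partial mapping here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], 'c': [1]})
d = {'$a': 'a', '$b': 'b'}  # deliberately has no entry for 'c'

# d.get(col, col) falls back to the existing label when there is no mapping.
df.columns = [d.get(col, col) for col in df.columns]
print(list(df.columns))
```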

Answer 13


Answer 14

Let’s understand renaming through a small example.

1. Renaming columns using a mapping:

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # create a df with columns A and B
df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True)  # rename column A to 'new_a' and B to 'new_b'

output:
   new_a  new_b
0      1      4
1      2      5
2      3      6

2. Renaming the index (row labels) using a mapping:

df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True)  # row labels are replaced by 'x', 'y', 'z'

output:
   new_a  new_b
x      1      4
y      2      5
z      3      6

Answer 15

Another way we could replace the original column labels is by stripping the unwanted characters (here ‘$’) from the original column labels.

This could have been done by running a for loop over df.columns, appending the stripped labels to a new list, and assigning that list back to df.columns.

Instead, we can do this neatly in a single statement with a list comprehension like the one below:

df.columns = [col.strip('$') for col in df.columns]

(The strip method in Python removes the given characters from both the beginning and the end of the string.)
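
Because strip works on both ends, it also removes a trailing '$' that may be part of the name you want to keep. A small sketch of the difference between strip and lstrip, using made-up labels:

```python
labels = ['$a', '$b$', '$$c', 'd']

# strip('$') removes every leading and trailing '$'.
both_ends = [label.strip('$') for label in labels]

# lstrip('$') removes only the leading ones.
leading_only = [label.lstrip('$') for label in labels]

print(both_ends)     # ['a', 'b', 'c', 'd']
print(leading_only)  # ['a', 'b$', 'c', 'd']
```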


Answer 16

Really simple: just use

df.columns = ['Name1', 'Name2', 'Name3'...]

and it will assign the column names in the order you list them. The list must contain exactly one name per column.


Answer 17

You could use str.slice for that:

df.columns = df.columns.str.slice(1)
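
Note that str.slice(1) drops the first character of every label, whether or not it is a '$'. A minimal sketch of that behaviour next to a guarded alternative, assuming a frame where one column has no prefix (the frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [2], 'c': [3]})

# Unconditional: 'c' loses its only character and becomes an empty string.
sliced = list(df.columns.str.slice(1))

# Guarded: only labels that actually start with '$' are shortened.
guarded = [col[1:] if col.startswith('$') else col for col in df.columns]

print(sliced)   # ['a', 'b', '']
print(guarded)  # ['a', 'b', 'c']
```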

Answer 18

I know this question and its answers have been chewed to death, but I referred to it for inspiration for one of the problems I was having. I was able to solve it using bits and pieces from different answers, so I am providing my response in case anyone needs it.

My method is generic: you can add further delimiters to the delimiters variable (each character of the string is escaped and joined into a single pattern), which helps future-proof it.

Working Code:

import pandas as pd
import re


df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})

delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]

Output:

>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10
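
To see how the pattern is built, here is a sketch in plain Python with a second delimiter added. The '#' delimiter, the sample names and the fallback for names without a delimiter are assumptions for illustration; the original code takes index [1] unconditionally, which raises IndexError when no delimiter is present:

```python
import re

delimiters = '$#'  # each character is treated as a separate delimiter
match_pattern = '|'.join(map(re.escape, delimiters))

names = ['$a', '#b', 'c']
cleaned = []
for name in names:
    parts = re.split(match_pattern, name)
    # parts[1] exists only when a delimiter was found; otherwise keep the name as-is.
    cleaned.append(parts[1] if len(parts) > 1 else name)

print(cleaned)  # ['a', 'b', 'c']
```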

Answer 19

Note that these approaches do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:

>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
   $a $b  e
   $x $y  f
0  1  3  5
1  2  4  6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
        rename.get(item, item) for item in df.columns.tolist()])
>>> df
   a  b  e
   x  y  f
0  1  3  5
1  2  4  6
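
When no explicit mapping is available, the same rebuild-from-tuples idea can strip the prefix from every level instead. This is a minimal sketch, assuming string labels on all levels; the use of lstrip here is my illustration and not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({('$a', '$x'): [1, 2], ('$b', '$y'): [3, 4], ('e', 'f'): [5, 6]})

# Each column label is a tuple with one entry per level; clean every entry.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(part.lstrip('$') for part in col) for col in df.columns.tolist()]
)
print(df.columns.tolist())
```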

Answer 20

Another option is to rename using a regular expression:

import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})

df = df.rename(columns=lambda x: re.sub(r'\$', '', x))  # raw string avoids an invalid-escape warning
>>> df
   a  b  c
0  1  3  5
1  2  4  6

Answer 21

If you have to deal with loads of columns named by a providing system that is out of your control, I came up with the following approach, which combines a general rule with specific replacements in one go.

First, create a dictionary from the dataframe column names, using a regular expression to throw away certain suffixes of the column names. Then add specific replacements to the dictionary so that the core columns are named as the receiving database expects.

This is then applied to the dataframe in one go.

# col_map (rather than dict) avoids shadowing the built-in dict type
# regex=True is needed on recent pandas, where str.replace treats the pattern literally by default
col_map = dict(zip(df.columns, df.columns.str.replace(r'(:S$|:C1$|:L$|:D$|\.Serial:L$)', '', regex=True)))
col_map['brand_timeseries:C1'] = 'BTS'
col_map['respid:L'] = 'RespID'
col_map['country:C1'] = 'CountryID'
col_map['pim1:D'] = 'pim_actual'
df.rename(columns=col_map, inplace=True)
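
To make the effect concrete, here is a self-contained sketch on an empty frame with made-up column names in the same style. The names age:S and device.Serial:L, and the variable names suffix_pattern and col_map, are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame(columns=['respid:L', 'age:S', 'country:C1', 'device.Serial:L'])

# General rule: drop the type suffix from every column name.
suffix_pattern = r'(:S$|:C1$|:L$|:D$|\.Serial:L$)'
col_map = dict(zip(df.columns, df.columns.str.replace(suffix_pattern, '', regex=True)))

# Specific overrides for the core columns.
col_map['respid:L'] = 'RespID'
col_map['country:C1'] = 'CountryID'

df = df.rename(columns=col_map)
print(list(df.columns))
```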

Answer 22

In addition to the solution already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.

First, we create a list of the names that we like to use as our column names:

import pandas as pd

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0 tells pandas the file has a header row, which names= then replaces
ufo = pd.read_csv('link to the file you are using', names = ufo_cols, header = 0)

In this case, all the column names will be replaced with the names you have in your list.
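
A runnable sketch of the same idea, assuming a small in-memory CSV in place of the real file (the sample rows and the shortened column list are made up for illustration):

```python
import io

import pandas as pd

csv_text = "City,Colors Reported,State\nIthaca,RED,NY\nAbilene,,KS\n"
ufo_cols = ['city', 'colors_reported', 'state']

# header=0 marks the first line as a header to discard; names= supplies the replacements.
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)
print(list(ufo.columns))
print(len(ufo))
```

Without header=0, the original header line would be read as the first data row.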


Answer 23

Here’s a nifty little function I like to use to cut down on typing:

def rename(data, oldnames, newname):
    if isinstance(oldnames, str):  # input can be a string or list of strings
        oldnames = [oldnames]  # when renaming multiple columns
        newname = [newname]  # make sure you pass the corresponding list of new names
    for name, new in zip(oldnames, newname):
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1:  # doesn't have to be an exact match
            print("Found multiple columns that matched " + str(name) + " :")
            for i, c in enumerate(oldvar):
                print(str(i) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        else:  # exactly one match
            oldvar = oldvar[0]
        data = data.rename(columns={oldvar: new})
    return data

Here is an example of how it works:

In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy']) 
Found multiple columns that matched col :
0: col1
1: col2

please enter the index of the column you would like to rename: 0

In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')

Answer 24

Renaming columns in pandas is an easy task.

df.rename(columns = {'$a':'a','$b':'b','$c':'c','$d':'d','$e':'e'},inplace = True)

Answer 25

Assuming you can use regular expressions, this solution uses a regex to avoid writing out the new names by hand:

import pandas as pd
import re

srch=re.compile(r"\w+")

data=pd.read_csv("CSV_FILE.csv")
cols=data.columns
new_cols=[srch.search(col).group() for col in cols]  # first run of word characters in each name
data.columns=new_cols
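
It helps to see what the \w+ pattern keeps. This is a sketch in plain Python with made-up column names; the fallback for names without any word characters is an assumption, since the original code would raise AttributeError when search returns None:

```python
import re

srch = re.compile(r"\w+")
cols = ['$a', ' b ', 'total cost', '$$']

new_cols = []
for col in cols:
    match = srch.search(col)
    # search returns the first run of word characters, or None when there is none.
    new_cols.append(match.group() if match else col)

print(new_cols)  # ['a', 'b', 'total', '$$']
```

Note that only the first run of word characters survives, so 'total cost' becomes 'total'.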