标签归档:replace

如何替换一个字符串的多个子字符串?

问题:如何替换一个字符串的多个子字符串?

我想使用.replace函数替换多个字符串。

我目前有

string.replace("condition1", "")

但想有类似的东西

string.replace("condition1", "").replace("condition2", "text")

尽管那听起来不像是好的语法

正确的方法是什么?有点像如何在grep / regex中进行操作\1以及\2如何将字段替换为某些搜索字符串

I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax

what is the proper way to do this? kind of like how in grep/regex you can do \1 and \2 to replace fields to certain search strings


回答 0

这是一个简短的示例,应该使用正则表达式来解决问题:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.iteritems()) 
#Python 3 renamed dict.iteritems to dict.items so use rep.items() for latest versions
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

例如:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.iteritems()) 
#Python 3 renamed dict.iteritems to dict.items so use rep.items() for latest versions
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

回答 1

您可以制作一个不错的小循环功能。

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

其中,text是完整的字符串,dic是一个字典—每个定义都是一个字符串,它将替换与该词匹配的字符串。

注意:在Python 3中,iteritems()已替换为items()


注意: Python字典没有可靠的迭代顺序。该解决方案仅在以下情况下解决您的问题:

  • 更换顺序无关紧要
  • 可以更改以前的替换结果

例如:

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, d)
print(my_sentence)

可能的输出#1:

“这是我的猪,这是我的猪。”

可能的输出#2

“这是我的狗,这是我的猪。”

一种可能的解决方法是使用OrderedDict。

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, od)
print(my_sentence)

输出:

"This is my pig and this is my pig."

小心#2:如果您的text字符串太大或字典中有很多对,效率会很低。

You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

where text is the complete string and dic is a dictionary — each definition is a string that will replace a match to the term.

Note: in Python 3, iteritems() has been replaced with items()


Careful: Python dictionaries don’t have a reliable order for iteration. This solution only solves your problem if:

  • order of replacements is irrelevant
  • it’s ok for a replacement to change the results of previous replacements

Update: The above statement related to ordering of insertion does not apply to Python versions greater than or equal to 3.6, as standard dicts were changed to use insertion ordering for iteration.

For instance:

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, d)
print(my_sentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2

"This is my dog and this is my pig."

One possible fix is to use an OrderedDict.

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, od)
print(my_sentence)

Output:

"This is my pig and this is my pig."

Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.


回答 2

为什么不提供这样的解决方案?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog

Why not one solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog

回答 3

这是第一种使用reduce的解决方案的变体,以防您喜欢功能。:)

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.iteritems(), s)

martineau的更好版本:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

Here is a variant of the first solution using reduce, in case you like being functional. :)

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.iteritems(), s)

martineau’s even better version:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

回答 4

这只是对FJ和MiniQuark最佳答案的更简明扼要的回顾。要实现多个同时的字符串替换,您所需要做的就是以下功能:

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

用法:

>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

如果您愿意,您可以从此简单的功能开始制作自己的专用替换功能。

This is just a more concise recap of F.J and MiniQuark great answers. All you need to achieve multiple simultaneous string replacements is the following function:

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.


回答 5

我基于FJ的出色答案:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

一杆用法:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.

请注意,由于更换仅需一遍,因此“café”更改为“ tea”,但不会更改为“café”。

如果您需要多次进行相同的替换,则可以轻松创建替换功能:

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print my_escaper(line)
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

改进之处:

  • 把代码变成函数
  • 增加了多行支持
  • 修复了转义中的错误
  • 易于为特定的多个替换创建函数

请享用!:-)

I built this upon F.J.s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.

Note that since replacement is done in just one pass, “café” changes to “tea”, but it does not change back to “café”.

If you need to do the same replacement many times, you can create a replacement function easily:

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print my_escaper(line)
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

Improvements:

  • turned code into a function
  • added multiline support
  • fixed a bug in escaping
  • easy to create a function for a specific multiple replacement

Enjoy! :-)


回答 6

我想提出字符串模板的用法。只需将要替换的字符串放在字典中,就可以完成所有操作!来自docs.python.org的示例

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

回答 7

就我而言,我需要用名称简单替换唯一键,所以我想到了:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

In my case, I needed a simple replacing of unique keys with names, so I thought this up:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

回答 8

从开始Python 3.8,并引入赋值表达式(PEP 572):=运算符),我们可以在列表推导中应用替换项:

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can apply the replacements within a list comprehension:

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

回答 9

这是我的$ 0.02。它基于安德鲁·克拉克(Andrew Clark)的回答,稍微清晰了一点,并且还涵盖了替换字符串是另一个替换字符串的子字符串(更长的字符串获胜)的情况。

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

在此主旨中,如有任何建议,请随时对其进行修改。

Here my $0.02. It is based on Andrew Clark’s answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (longer string wins)

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

It is in this this gist, feel free to modify it if you have any proposal.


回答 10

我需要一种解决方案,其中要替换的字符串可以是正则表达式,例如,通过用一个替换多个空格字符来帮助规范长文本。我基于其他人(包括MiniQuark和mmj)的答案链,得出了以下结论:

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from re_str_dict with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

它适用于其他答案中给出的示例,例如:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

对我来说,最主要的是您还可以使用正则表达式,例如仅替换整个单词,或规范化空白:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

如果您想将字典键用作普通字符串,则可以使用以下函数在调用multiple_replace之前对它们进行转义:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

以下函数可以帮助您在字典键中查找错误的正则表达式(因为来自multiple_replace的错误消息不是很清楚):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

请注意,它不会链接替换项,而是同时执行它们。这样可以提高效率,而不会限制它可以做什么。为了模拟链接的效果,您可能只需要添加更多的字符串替换对并确保对的预期顺序即可:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

I needed a solution where the strings to be replaced can be a regular expressions, for example to help in normalizing a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from re_str_dict with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings, you can escape those before calling multiple_replace using e.g. this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn’t very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

Note that it does not chain the replacements, instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

回答 11

这是一个示例,在长字符串上有很多小替换项时,效率更高。

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print replace(source, replacements)

关键是要避免长字符串的许多串联。我们将源字符串切成片段,在形成列表时替换一些片段,然后将整个内容重新组合成字符串。

Note: Test your case, see comments.

Here’s a sample which is more efficient on long strings with many small replacements.

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print replace(source, replacements)

The point is in avoiding many concatenations of long strings. We chop the source string to fragments, replacing some of the fragments as we form the list, and then join the whole thing back into a string.


回答 12

您真的不应该这样,但是我觉得它太酷了:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.iteritems():
>>>     cmd += ".replace(%s, %s)" %(k,v)
>>> exec(cmd)

现在,answer是所有替换的结果

再次,这 hacky,不是您应该定期使用的东西。但是,很高兴知道您可以根据需要执行以下操作。

You should really not do it this way, but I just find it way too cool:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.iteritems():
>>>     cmd += ".replace(%s, %s)" %(k,v)
>>> exec(cmd)

Now, answer is the result of all the replacements in turn

again, this is very hacky and is not something that you should be using regularly. But it’s just nice to know that you can do something like this if you ever need to.


回答 13

我也在这个问题上挣扎。string.replace在进行许多替换后,正则表达式很难解决,并且比循环慢大约四倍(在我的实验条件下)。

你绝对应该尝试使用Flashtext库(此处的博客文章Github上这里)。就我而言,每个文档的速度从1.8 s到0.015 s(正则表达式花费7.7 s)快两个数量级

在上面的链接中很容易找到使用示例,但这是一个有效的示例:

    from flashtext import KeywordProcessor
    self.processor = KeywordProcessor(case_sensitive=False)
    for k, v in self.my_dict.items():
        self.processor.add_keyword(k, v)
    new_string = self.processor.replace_keywords(string)

请注意,Flashtext会在一次通过中进行替换(以避免-> bb-> c将’a’转换为’c’)。Flashtext还会查找整个单词(因此,“ is”将与“ th is ” 不匹配)。如果您的目标是几个单词(用“ Hello”代替“ This is”),则效果很好。

I was struggling with this problem as well. With many substitutions regular expressions struggle, and are about four times slower than looping string.replace (in my experiment conditions).

You should absolutely try using the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.

It is easy to find use examples in the links above, but this is a working example:

    from flashtext import KeywordProcessor
    self.processor = KeywordProcessor(case_sensitive=False)
    for k, v in self.my_dict.items():
        self.processor.add_keyword(k, v)
    new_string = self.processor.replace_keywords(string)

Note that Flashtext makes substitutions in a single pass (to avoid a –> b and b –> c translating ‘a’ into ‘c’). Flashtext also looks for whole words (so ‘is’ will not match ‘this‘). It works fine if your target is several words (replacing ‘This is’ by ‘Hello’).


回答 14

我觉得这个问题需要单行递归lambda函数答案才能完整,仅因为如此。所以那里:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

用法:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

笔记:

  • 这消耗了输入字典。
  • Python字典从3.6开始保留键顺序;其他答案中的相应警告不再适用。为了向后兼容,可以采用基于元组的版本:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

注意:与python中的所有递归函数一样,太大的递归深度(即太大的替换字典)将导致错误。参见例如这里

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

Notes:

  • This consumes the input dictionary.
  • Python dicts preserve key order as of 3.6; corresponding caveats in other answers are not relevant anymore. For backward compatibility one could resort to a tuple-based version:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

Note: As with all recursive functions in python, too large recursion depth (i.e. too large replacement dictionaries) will result in an error. See e.g. here.


回答 15

我不知道速度,但这是我的工作日快速解决方案:

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

…但是我喜欢上面的#1正则表达式答案。注意-如果一个新值是另一个值的子字符串,则该操作不是可交换的。

I don’t know about speed but this is my workaday quick fix:

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

… but I like the #1 regex answer above. Note – if one new value is a substring of another one then the operation is not commutative.


回答 16

您可以使用支持完全匹配以及正则表达式替换的pandas库和replace函数。例如:

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', '\d{2}:\d{2}', '\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

修改后的文本是:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

您可以在此处找到示例。请注意,文本上的替换是按照它们在列表中出现的顺序进行的

You can use the pandas library and the replace function which supports both exact matches as well as regex replacements. For example:

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', '\d{2}:\d{2}', '\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

And the modified text is:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

You can find an example here. Notice that the replacements on the text are done with the order they appear in the lists


回答 17

要只替换一个字符,请使用translatestr.maketrans是我最喜欢的方法。

tl; dr> result_string = your_string.translate(str.maketrans(dict_mapping))


演示

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.

For replace only one character, use the translate and str.maketrans is my favorite method.

tl;dr > result_string = your_string.translate(str.maketrans(dict_mapping))


demo

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.

回答 18

从安德鲁的宝贵答案开始,我开发了一个脚本,该脚本从文件中加载字典并详细说明打开的文件夹中的所有文件以进行替换。该脚本从可在其中设置分隔符的外部文件中加载映射。我是一个初学者,但是在多个文件中进行多次替换时,我发现此脚本非常有用。它以秒为单位加载了包含1000多个条目的字典。这不是优雅,但对我有用

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # creation of empy dictionary

with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask): # recursion on all the files with the mask prompted

    with open (filename, "r") as textfile: # load each file in the variable text
        text = textfile.read()

        # start replacement
        #rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
        pattern = re.compile("|".join(rep.keys()))
        text = pattern.sub(lambda m: rep[m.group(0)], text)

        #write of te output files with the prompted suffice
        target = open(filename[:-4]+"_NEW.txt", "w")
        target.write(text)
        target.close()

Starting from the precious answer of Andrew i developed a script that loads the dictionary from a file and elaborates all the files on the opened folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I’m a beginner but i found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant but it worked for me

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # creation of empy dictionary

with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask): # recursion on all the files with the mask prompted

    with open (filename, "r") as textfile: # load each file in the variable text
        text = textfile.read()

        # start replacement
        #rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
        pattern = re.compile("|".join(rep.keys()))
        text = pattern.sub(lambda m: rep[m.group(0)], text)

        #write of te output files with the prompted suffice
        target = open(filename[:-4]+"_NEW.txt", "w")
        target.write(text)
        target.close()

回答 19

这是我解决问题的方法。我在聊天机器人中使用它立即替换了不同的单词。

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"})

这将成为 The cat hunts the dog

this is my solution to the problem. I used it in a chatbot to replace the different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"})

this will become The cat hunts the dog


回答 20

另一个例子:输入列表

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

所需的输出将是

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

代码:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

Another example : Input list

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code :

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

回答 21

或者只是为了快速破解:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Or just for a fast hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

回答 22

这是使用字典的另一种方法:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print " ".join(modify[x] for x in listA)

Here is another way of doing it with a dictionary:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print " ".join(modify[x] for x in listA)

查找和替换列表中的元素

问题:查找和替换列表中的元素

我必须搜索一个列表,然后用一个元素替换所有出现的元素。到目前为止,我在代码方面的尝试使我无处可寻,做到这一点的最佳方法是什么?

例如,假设我的列表具有以下整数

>>> a = [1,2,3,4,5,1,2,3,4,5,1]

我需要将所有出现的数字1替换为值10,所以我需要的输出是

>>> a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

因此,我的目标是将数字1的所有实例替换为数字10。

I have to search through a list and replace all occurrences of one element with another. So far my attempts in code are getting me nowhere, what is the best way to do this?

For example, suppose my list has the following integers

>>> a = [1,2,3,4,5,1,2,3,4,5,1]

and I need to replace all occurrences of the number 1 with the value 10 so the output I need is

>>> a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

Thus my goal is to replace all instances of the number 1 with the number 10.


回答 0

>>> a= [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for n, i in enumerate(a):
...   if i == 1:
...      a[n] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
>>> a= [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for n, i in enumerate(a):
...   if i == 1:
...      a[n] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

回答 1

尝试使用列表推导三元运算符

>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]

Try using a list comprehension and the ternary operator.

>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]

回答 2

列表理解效果很好,使用enumerate循环可以节省一些内存(b / c操作实际上已就位)。

还有功能编程。查看地图用法:

>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]

List comprehension works well, and looping through with enumerate can save you some memory (b/c the operation’s essentially being done in place).

There’s also functional programming. See usage of map:

>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]

回答 3

如果您要替换多个值,则还可以使用字典:

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
dic = {1:10, 2:20, 3:'foo'}

print([dic.get(n, n) for n in a])

> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]

If you have several values to replace, you can also use a dictionary:

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
dic = {1:10, 2:20, 3:'foo'}

print([dic.get(n, n) for n in a])

> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]

回答 4

>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
... 
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>> 
>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
... 
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>> 

回答 5

a = [1,2,3,4,5,1,2,3,4,5,1,12]
for i in range (len(a)):
    if a[i]==2:
        a[i]=123

您可以使用for和while循环;但是,如果您知道内置的枚举功能,则建议使用枚举。1个

a = [1,2,3,4,5,1,2,3,4,5,1,12]
for i in range (len(a)):
    if a[i]==2:
        a[i]=123

You can use a for and or while loop; however if u know the builtin Enumerate function, then it is recommended to use Enumerate.1


回答 6

所有轻易更换110a = [1,2,3,4,5,1,2,3,4,5,1]一个可以使用下面的一行拉姆达+地图相结合,与“看,妈妈,没有如果或者维权!” :

# This substitutes all '1' with '10' in list 'a' and places result in list 'c':

c = list(map(lambda b: b.replace("1","10"), a))

To replace easily all 1 with 10 in a = [1,2,3,4,5,1,2,3,4,5,1]one could use the following one-line lambda+map combination, and ‘Look, Ma, no IFs or FORs!’ :

# This substitutes all '1' with '10' in list 'a' and places result in list 'c':

c = list(map(lambda b: b.replace("1","10"), a))


回答 7

以下是Python 2.x中非常直接的方法

 a = [1,2,3,4,5,1,2,3,4,5,1]        #Replacing every 1 with 10
 for i in xrange(len(a)):
   if a[i] == 1:
     a[i] = 10  
 print a

此方法有效。欢迎发表评论。希望能帮助到你 :)

也请尝试了解outisdamzam的解决方案的工作方式。列表压缩和lambda函数是有用的工具。

The following is a very straightforward method in Python 3.x

 a = [1,2,3,4,5,1,2,3,4,5,1]        #Replacing every 1 with 10
 for i in range(len(a)):
   if a[i] == 1:
     a[i] = 10  
 print(a)

This method works. Comments are welcome. Hope it helps :)

Also try understanding how outis’s and damzam’s solutions work. List compressions and lambda function are useful tools.


回答 8

我知道这是一个非常老的问题,并且有很多方法可以解决。我发现较简单的一种是使用numpy包。

import numpy

arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)

I know this is a very old question and there’s a myriad of ways to do it. The simpler one I found is using numpy package.

import numpy

arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)

回答 9

我的用例已替换None为一些默认值。

我已经定时提出了解决此问题的方法,包括@kxr-using str.count

使用Python 3.8.1在ipython中测试代码:

def rep1(lst, replacer = 0):
    ''' List comprehension, new list '''

    return [item if item is not None else replacer for item in lst]


def rep2(lst, replacer = 0):
    ''' List comprehension, in-place '''    
    lst[:] =  [item if item is not None else replacer for item in lst]

    return lst


def rep3(lst, replacer = 0):
    ''' enumerate() with comparison - in-place '''
    for idx, item in enumerate(lst):
        if item is None:
            lst[idx] = replacer

    return lst


def rep4(lst, replacer = 0):
    ''' Using str.index + Exception, in-place '''

    idx = -1
    # none_amount = lst.count(None)
    while True:
        try:
            idx = lst.index(None, idx+1)
        except ValueError:
            break
        else:
            lst[idx] = replacer

    return lst


def rep5(lst, replacer = 0):
    ''' Using str.index + str.count, in-place '''

    idx = -1
    for _ in range(lst.count(None)):
        idx = lst.index(None, idx+1)
        lst[idx] = replacer

    return lst


def rep6(lst, replacer = 0):
    ''' Using map, return map iterator '''

    return map(lambda item: item if item is not None else replacer, lst)


def rep7(lst, replacer = 0):
    ''' Using map, return new list '''

    return list(map(lambda item: item if item is not None else replacer, lst))


lst = [5]*10**6
# lst = [None]*10**6

%timeit rep1(lst)    
%timeit rep2(lst)    
%timeit rep3(lst)    
%timeit rep4(lst)    
%timeit rep5(lst)    
%timeit rep6(lst)    
%timeit rep7(lst)    

我得到:

26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

str.index实际上,使用内部比任何手动比较都快。

我不知道测试4中的异常是否比使用更费力str.count,差异似乎可以忽略不计。

请注意,map()(测试6)返回迭代器而不是实际列表,因此测试7。

My usecase was replacing None with some default value.

I’ve timed approaches to this problem that were presented here, including the one by @kxr – using str.count.

Test code in ipython with Python 3.8.1:

def rep1(lst, replacer = 0):
    ''' List comprehension, new list '''

    return [item if item is not None else replacer for item in lst]


def rep2(lst, replacer = 0):
    ''' List comprehension, in-place '''    
    lst[:] =  [item if item is not None else replacer for item in lst]

    return lst


def rep3(lst, replacer = 0):
    ''' enumerate() with comparison - in-place '''
    for idx, item in enumerate(lst):
        if item is None:
            lst[idx] = replacer

    return lst


def rep4(lst, replacer = 0):
    ''' Using str.index + Exception, in-place '''

    idx = -1
    # none_amount = lst.count(None)
    while True:
        try:
            idx = lst.index(None, idx+1)
        except ValueError:
            break
        else:
            lst[idx] = replacer

    return lst


def rep5(lst, replacer = 0):
    ''' Using str.index + str.count, in-place '''

    idx = -1
    for _ in range(lst.count(None)):
        idx = lst.index(None, idx+1)
        lst[idx] = replacer

    return lst


def rep6(lst, replacer = 0):
    ''' Using map, return map iterator '''

    return map(lambda item: item if item is not None else replacer, lst)


def rep7(lst, replacer = 0):
    ''' Using map, return new list '''

    return list(map(lambda item: item if item is not None else replacer, lst))


lst = [5]*10**6
# lst = [None]*10**6

%timeit rep1(lst)    
%timeit rep2(lst)    
%timeit rep3(lst)    
%timeit rep4(lst)    
%timeit rep5(lst)    
%timeit rep6(lst)    
%timeit rep7(lst)    

I get:

26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using the internal str.index is in fact faster than any manual comparison.

I didn’t know if the exception in test 4 would be more laborious than using str.count, the difference seems negligible.

Note that map() (test 6) returns an iterator and not an actual list, thus test 7.


回答 10

list.index()与其他答案中提出的单步迭代方法相比,在长列表和罕见情况下,使用它的速度大约快3倍。

def list_replace(lst, old=1, new=10):
    """replace list elements (inplace)"""
    i = -1
    try:
        while 1:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass

On long lists and rare occurrences its about 3x faster using list.index() – compared to single step iteration methods presented in the other answers.

def list_replace(lst, old=1, new=10):
    """replace list elements (inplace)"""
    i = -1
    try:
        while 1:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass

回答 11

您可以在python中简单地使用列表理解:

def replace_element(YOUR_LIST, set_to=NEW_VALUE):
    return [i
            if SOME_CONDITION
            else NEW_VALUE
            for i in YOUR_LIST]

对于您的情况,要将所有出现的1替换为10,代码片段将如下所示:

def replace_element(YOUR_LIST, set_to=10):
    return [i
            if i != 1  # keeps all elements not equal to one
            else set_to  # replaces 1 with 10
            for i in YOUR_LIST]

You can simply use list comprehension in python:

def replace_element(YOUR_LIST, set_to=NEW_VALUE):
    return [i
            if SOME_CONDITION
            else NEW_VALUE
            for i in YOUR_LIST]

for your case, where you want to replace all occurrences of 1 with 10, the code snippet will be like this:

def replace_element(YOUR_LIST, set_to=10):
    return [i
            if i != 1  # keeps all elements not equal to one
            else set_to  # replaces 1 with 10
            for i in YOUR_LIST]

回答 12

仅查找和替换一项

ur_list = [1,2,1]     # replace the first 1 wiz 11

loc = ur_list.index(1)
ur_list.remove(1)
ur_list.insert(loc, 11)

----------
[11,2,1]

Find & replace just one item

your_list = [1,2,1]     # replace the first 1 with 11

loc = your_list.index(1)
your_list.remove(1)
your_list.insert(loc, 11)

如何在string.replace中输入正则表达式?

问题:如何在string.replace中输入正则表达式?

我需要一些帮助来声明正则表达式。我的输入如下:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

所需的输出是:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

我已经试过了:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader: 
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')

        print line

我也尝试过此方法(但似乎我使用了错误的regex语法):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

我不想replace从1到99 进行硬编码。。。

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I’ve tried this:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader: 
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')

        print line

I’ve also tried this (but it seems like I’m using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I dont want to hard-code the replace from 1 to 99 . . .


回答 0

这个经过测试的代码段应该做到这一点:

import re
line = re.sub(r"</?\[\d+>", "", line)

编辑:这是解释其工作方式的注释版本:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

正则表达式是 有趣!但我强烈建议您花一两个小时来学习基础知识。对于初学者,您需要学习哪些特殊字符:需要转义的“元字符”(即,前面加反斜杠-字符类的内外规则都不同。)在以下位置有一个出色的在线教程:www .regular-expressions.info。您在那里度过的时间将使自己获得很多回报。祝您满意!

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)

Edit: Here’s a commented version explaining how it works:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: “metacharacters” which need to be escaped (i.e. with a backslash placed in front – and the rules are different inside and outside character classes.) There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!


回答 1

str.replace()进行固定替换。使用re.sub()代替。

str.replace() does fixed replacements. Use re.sub() instead.


回答 2

我会这样(正则表达式在注释中说明):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

如果您想了解有关正则表达式的更多信息,建议阅读 Jan Goyvaerts和Steven Levithan撰写的《表达式食谱》

I would go like this (regex explained in comments):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

If you want to learn more about regex I recomend to read Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.


回答 3

最简单的方法

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

out = re.sub("(<[^>]+>)", '', txt)
print out

The easiest way

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

out = re.sub("(<[^>]+>)", '', txt)
print out

回答 4

字符串对象的replace方法不接受正则表达式,而仅接受固定的字符串(请参见文档:http : //docs.python.org/2/library/stdtypes.html#str.replace)。

您必须使用re模块:

import re
newline= re.sub("<\/?\[[0-9]+>", "", line)

replace method of string objects does not accept regular expressions but only fixed strings (see documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use re module:

import re
newline= re.sub("<\/?\[[0-9]+>", "", line)

回答 5

不必使用正则表达式(用于您的示例字符串)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

don’t have to use regular expression (for your sample string)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

回答 6

import os, sys, re, glob

pattern = re.compile(r"\<\[\d\>")
replacementStringMatchesPattern = "<[1>"

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
   for line in reader: 
      retline =  pattern.sub(replacementStringMatchesPattern, "", line)         
      sys.stdout.write(retline)
      print (retline)
import os, sys, re, glob

pattern = re.compile(r"\<\[\d\>")
replacementStringMatchesPattern = "<[1>"

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
   for line in reader: 
      retline =  pattern.sub(replacementStringMatchesPattern, "", line)         
      sys.stdout.write(retline)
      print (retline)

重命名熊猫列

问题:重命名熊猫列

我有一个使用熊猫和列标签的DataFrame,我需要对其进行编辑以替换原始列标签。

我想A在原始列名称为的DataFrame 中更改列名称:

['$a', '$b', '$c', '$d', '$e'] 

['a', 'b', 'c', 'd', 'e'].

我已经将编辑后的列名存储在列表中,但是我不知道如何替换列名。

I have a DataFrame using pandas and column labels that I need to edit to replace the original column labels.

I’d like to change the column names in a DataFrame A where the original column names are:

['$a', '$b', '$c', '$d', '$e'] 

to

['a', 'b', 'c', 'd', 'e'].

I have the edited column names stored it in a list, but I don’t know how to replace the column names.


回答 0

只需将其分配给.columns属性:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

回答 1

重命名特定列

使用该df.rename()函数并引用要重命名的列。并非所有列都必须重命名:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

最小代码示例

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

下列方法均起作用并产生相同的输出:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

切记将结果分配回去,因为修改未就位。或者,指定inplace=True

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

从v0.25版开始,如果指定errors='raise'了无效的“要重命名的列” ,您还可以指定引发错误。参见v0.25 rename()文档


REASSIGN列标题

df.set_axis()axis=1inplace=False一起使用(返回副本)。

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

这将返回一个副本,但是您可以通过设置来就地修改DataFrame inplace=True(这是版本<= 0.24的默认行为,但将来可能会更改)。

您还可以直接分配标题:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

RENAME SPECIFIC COLUMNS

Use the df.rename() function and refer the columns to be renamed. Not all the columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

The following methods all work and produce the same output:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Remember to assign the result back, as the modification is not-inplace. Alternatively, specify inplace=True:

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.


REASSIGN COLUMN HEADERS

Use df.set_axis() with axis=1 and inplace=False (to return a copy).

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

This returns a copy, but you can modify the DataFrame in-place by setting inplace=True (this is the default behaviour for versions <=0.24 but is likely to change in the future).

You can also assign headers directly:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

回答 2

rename方法可以带有一个函数,例如:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

The rename method can take a function, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

回答 3

使用文本数据中所述

df.columns = df.columns.str.replace('$','')

As documented in Working with text data:

df.columns = df.columns.str.replace('$','')

回答 4

熊猫0.21+答案

0.21版中对列重命名进行了一些重大更新。

  • rename方法添加了axis可以设置为columns或的参数1。此更新使该方法与其他pandas API匹配。它仍然具有indexcolumns参数,但是您不再被迫使用它们。
  • set_axis方法inplace设置为False可以使所有的索引或列标签与命名列表。

熊猫的例子0.21+

构造样本DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

renameaxis='columns'或一起使用axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

要么

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

两者都导致以下结果:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

仍然可以使用旧的方法签名:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

rename函数还接受将应用于每个列名称的函数。

df.rename(lambda x: x[1:], axis='columns')

要么

df.rename(lambda x: x[1:], axis=1)

set_axis与列表一起使用inplace=False

您可以为该set_axis方法提供一个列表,该列表的长度等于列(或索引)的数量。当前,inplace默认值为True,但在将来的版本inplace中将默认为False

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

要么

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

为什么不使用df.columns = ['a', 'b', 'c', 'd', 'e']

像这样直接分配列没有错。这是一个完美的解决方案。

using的优点set_axis是它可以用作方法链的一部分,并返回DataFrame的新副本。没有它,您将不得不在重新分配列之前将链的中间步骤存储到另一个变量。

# new for pandas 0.21+
df.some_method1()
  .some_method2()
  .set_axis()
  .some_method3()

# old way
df1 = df.some_method1()
        .some_method2()
df1.columns = columns
df1.some_method3()

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

  • The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
  • The set_axis method with the inplace set to False enables you to rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using rename with axis='columns' or axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts functions that will be applied to each column name.

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using set_axis with a list and inplace=False

You can supply a list to the set_axis method that is equal in length to the number of columns (or index). Currently, inplace defaults to True, but inplace will be defaulted to False in future releases.

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

# new for pandas 0.21+
df.some_method1()
  .some_method2()
  .set_axis()
  .some_method3()

# old way
df1 = df.some_method1()
        .some_method2()
df1.columns = columns
df1.some_method3()

回答 5

由于只想删除所有列名中的$符号,因此可以执行以下操作:

df = df.rename(columns=lambda x: x.replace('$', ''))

要么

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

Since you only want to remove the $ sign in all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

OR

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

回答 6

df.columns = ['a', 'b', 'c', 'd', 'e']

它将按照您提供的顺序用您提供的名称替换现有名称。

df.columns = ['a', 'b', 'c', 'd', 'e']

It will replace the existing names with the names you provide, in the order you provide.


回答 7

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

这样,您可以根据需要手动编辑new_names。当您只需要重命名几列以纠正拼写错误,重音符号,删除特殊字符等时,效果很好。

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

This way you can manually edit the new_names as you wish. Works great when you need to rename only a few columns to correct mispellings, accents, remove special characters etc.


回答 8

一线或管道解决方案

我将专注于两件事:

  1. OP明确指出

    我已经将编辑后的列名存储在列表中,但是我不知道如何替换列名。

    我不想解决如何替换'$'或删除每个列标题中的第一个字符的问题。OP已完成此步骤。相反,我想集中精力用columns给定替换列名称列表的新对象替换现有对象。

  2. df.columns = newnew新列名称的列表在哪里就变得很简单。这种方法的缺点是,它需要编辑现有数据框的columns属性,并且无法内联完成。我将展示一些通过流水执行此操作而不编辑现有数据框的方法。


设置1
为了着重于需要使用现有列表重命名替换列名称,我将创建一个df具有初始列名称和不相关的新列名称的新示例数据框。

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

解决方案1
pd.DataFrame.rename

已经有人说过,如果您有一个字典将旧的列名映射到新的列名,则可以使用pd.DataFrame.rename

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

但是,您可以轻松创建该词典并将其包含在对的调用中rename。以下内容利用了以下事实:迭代时df,我们迭代每个列名。

# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

如果您原始的列名是唯一的,那么这很好。但是,如果不是这样,那么就会崩溃。


设置2个
非唯一列

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

解决方案2
pd.concat使用keys参数

首先,请注意当我们尝试使用解决方案1时会发生什么:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

我们没有将new列表映射为列名。我们最终重复了y765。相反,我们可以在遍历的列时使用函数的keys参数。pd.concatdf

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6

解决方案3
重建。仅当dtype所有列都有一个时,才应使用此选项。否则,您最终将dtype object获得所有列,并且将它们转换回需要更多的词典工作。

dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

混合的 dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

解决方案4
这是使用transpose和的花招set_indexpd.DataFrame.set_index允许我们设置内联索引,但没有对应的set_columns。这样我们就可以转置,然后再set_index转回。但是,此处适用解决方案3 的相同警告dtype与混合dtype警告。

dtype

df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

混合的 dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

解决方案5在循环
使用的每个元素中使用a 在此解决方案中,我们传递一个lambda来接受但忽略它。它也需要一个但并不期望。取而代之的是,将迭代器指定为默认值,然后我可以使用该迭代器一次遍历一个迭代器,而无需考虑is 的值。lambdapd.DataFrame.renamenew
xyx

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

正如人们在sopython聊天中向我指出的那样,如果*x和之间添加一个,则y可以保护我的y变量。不过,在这种情况下,我认为它不需要保护。仍然值得一提。

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

One line or Pipeline solutions

I’ll focus on two things:

  1. OP clearly states

    I have the edited column names stored it in a list, but I don’t know how to replace the column names.

    I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.

  2. df.columns = new where new is the list of new columns names is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe’s columns attribute and it isn’t done inline. I’ll show a few ways to perform this via pipelining without editing the existing dataframe.


Setup 1
To focus on the need to rename of replace column names with a pre-existing list, I’ll create a new sample dataframe df with initial column names and unrelated new column names.

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

Solution 1
pd.DataFrame.rename

It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.

# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

This works great if your original column names are unique. But if they are not, then this breaks down.


Setup 2
non-unique columns

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

Solution 2
pd.concat using the keys argument

First, notice what happens when we attempt to use solution 1:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

We didn’t map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you’ll end up with dtype object for all columns and converting them back requires more dictionary work.

Single dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.

Single dtype

df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don’t believe it needs protecting. It is still worth mentioning.

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

回答 9

列名称与系列名称

我想解释一下幕后发生的事情。

数据框是一组系列。

系列又是对 numpy.array

numpy.array有财产 .name

这是系列的名称。很少有人会尊重大熊猫的这一属性,但它会在某些地方徘徊,并可以用来破解某些大熊猫的行为。

命名列列表

这里有很多答案都谈到该df.columns属性list实际上是一个Series。这意味着它具有.name属性。

如果您决定填写各列的名称,则会发生这种情况Series

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

请注意,索引的名称总是低一列。

that绕的神器

.name属性有时会持续存在。如果设置df.columns = ['one', 'two']df.one.name则将为'one'

如果您设置,df.one.name = 'three'那么df.columns仍然会给您['one', 'two'],并df.one.name会给您'three'

pd.DataFrame(df.one) 将返回

    three
0       1
1       2
2       3

因为pandas重用.name了已经定义的Series

多级列名称

熊猫有做多层列名的方法。没有太多魔术,但是我也想在答案中涵盖这一点,因为我看不到有人在这里进行这项工作。

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

通过将列设置为列表很容易实现,如下所示:

df.columns = [['one', 'one'], ['one', 'two']]

Column names vs Names of Series

I would like to explain a bit what happens behind the scenes.

Dataframes are a set of Series.

Series in turn are an extension of a numpy.array

numpy.arrays have a property .name

This is the name of the series. It is seldom that pandas respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.

Naming the list of columns

A lot of answers here talks about the df.columns attribute being a list when in fact it is a Series. This means it has a .name attribute.

This is what happens if you decide to fill in the name of the columns Series:

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index always comes one column lower.

Artifacts that linger

The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.

If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'

BUT

pd.DataFrame(df.one) will return

    three
0       1
1       2
2       3

Because pandas reuses the .name of the already defined Series.

Multi level column names

Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don’t see anyone picking up on this here.

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achievable by setting columns to lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]

回答 10

如果您有数据框,则df.columns会将所有内容转储到您可以操作的列表中,然后将其重新分配给数据框作为列名…

columns = df.columns
columns = [row.replace("$","") for row in columns]
df.rename(columns=dict(zip(columns, things)), inplace=True)
df.head() #to validate the output

最好的办法?IDK。一种方法-是的。

下面是使用cProfile衡量内存和执行时间的一种更好的评估问题答案中提出的所有主要技术的方法。@ kadee,@ kaitlyn和@eumiro具有执行时间最快的功能-尽管这些功能是如此之快,我们将比较所有答案的.000和.001秒舍入。道德:我上面的回答可能不是“最佳”方法。

import pandas as pd
import cProfile, pstats, re

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b','$c':'c','$d':'d','$e':'e'}

df = pd.DataFrame({'$a':[1,2], '$b': [10,20],'$c':['bleep','blorp'],'$d':[1,2],'$e':['texa$','']})

df.head()

def eumiro(df,nn):
    df.columns = nn
    #This direct renaming approach is duplicated in methodology in several other answers: 
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df,col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df,on,nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$','')

def awo(df):
    columns = df.columns
    columns = [row.replace("$","") for row in columns]
    return df.rename(columns=dict(zip(columns, '')), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print 'eumiro'
cProfile.run('eumiro(df,new_names)')
print 'lexual1'
cProfile.run('lexual1(df)')
print 'lexual2'
cProfile.run('lexual2(df,col_dict)')
print 'andy hayden'
cProfile.run('Panda_Master_Hayden(df)')
print 'paulo1'
cProfile.run('paulo1(df)')
print 'paulo2'
cProfile.run('paulo2(df)')
print 'migloo'
cProfile.run('migloo(df,old_names,new_names)')
print 'kadee'
cProfile.run('kadee(df)')
print 'awo'
cProfile.run('awo(df)')
print 'kaitlyn'
cProfile.run('kaitlyn(df)')

If you’ve got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns…

columns = df.columns
columns = [row.replace("$","") for row in columns]
df.rename(columns=dict(zip(columns, things)), inplace=True)
df.head() #to validate the output

Best way? IDK. A way – yes.

A better way of evaluating all the main techniques put forward in the answers to the question is below using cProfile to gage memory & execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times – though these functions are so fast we’re comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn’t the ‘Best’ way.

import pandas as pd
import cProfile, pstats, re

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b','$c':'c','$d':'d','$e':'e'}

df = pd.DataFrame({'$a':[1,2], '$b': [10,20],'$c':['bleep','blorp'],'$d':[1,2],'$e':['texa$','']})

df.head()

def eumiro(df,nn):
    df.columns = nn
    #This direct renaming approach is duplicated in methodology in several other answers: 
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df,col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df,on,nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$','')

def awo(df):
    columns = df.columns
    columns = [row.replace("$","") for row in columns]
    return df.rename(columns=dict(zip(columns, '')), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print 'eumiro'
cProfile.run('eumiro(df,new_names)')
print 'lexual1'
cProfile.run('lexual1(df)')
print 'lexual2'
cProfile.run('lexual2(df,col_dict)')
print 'andy hayden'
cProfile.run('Panda_Master_Hayden(df)')
print 'paulo1'
cProfile.run('paulo1(df)')
print 'paulo2'
cProfile.run('paulo2(df)')
print 'migloo'
cProfile.run('migloo(df,old_names,new_names)')
print 'kadee'
cProfile.run('kadee(df)')
print 'awo'
cProfile.run('awo(df)')
print 'kaitlyn'
cProfile.run('kaitlyn(df)')

回答 11

假设这是您的数据框。

您可以使用两种方法重命名列。

  1. 使用 dataframe.columns=[#list]

    df.columns=['a','b','c','d','e']

    此方法的局限性在于,如果必须更改一列,则必须传递完整的列列表。同样,此方法不适用于索引标签。例如,如果您通过以下操作:

    df.columns = ['a','b','c','d']

    这将引发错误。长度不匹配:预期轴有5个元素,新值有4个元素。

  2. 另一种方法是Pandas rename()方法,用于重命名任何索引,列或行

    df = df.rename(columns={'$a':'a'})

同样,您可以更改任何行或列。

Let’s say this is your dataframe.

You can rename the columns using two methods.

  1. Using dataframe.columns=[#list]

    df.columns=['a','b','c','d','e']
    

    The limitation of this method is that if one column has to be changed, full column list has to be passed. Also, this method is not applicable on index labels. For example, if you passed this:

    df.columns = ['a','b','c','d']
    

    This will throw an error. Length mismatch: Expected axis has 5 elements, new values have 4 elements.

  2. Another method is the Pandas rename() method which is used to rename any index, column or row

    df = df.rename(columns={'$a':'a'})
    

Similarly, you can change any rows or columns.


回答 12

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

如果新的列列表与现有列的顺序相同,则分配很简单:

new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1

如果您有一个将旧列名键入新列名的字典,则可以执行以下操作:

d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1

如果没有列表或字典映射,则可以$通过列表理解来去除前导符号:

df.columns = [col[1:] if col[0] == '$' else col for col in df]
df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

If your new list of columns is in the same order as the existing columns, the assignment is simple:

new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you had a dictionary keyed on old column names to new column names, you could do the following:

d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you don’t have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:

df.columns = [col[1:] if col[0] == '$' else col for col in df]

回答 13


回答 14

让我们通过一个小例子来了解重命名…

1.使用映射重命名列:

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) #creating a df with column name A and B
df.rename({"A": "new_a", "B": "new_b"},axis='columns',inplace =True) #renaming column A with 'new_a' and B with 'new_b'

output:
   new_a  new_b
0  1       4
1  2       5
2  3       6

2.使用映射重命名索引/行名:

df.rename({0: "x", 1: "y", 2: "z"},axis='index',inplace =True) #Row name are getting replaced by 'x','y','z'.

output:
       new_a  new_b
    x  1       4
    y  2       5
    z  3       6

Let’s Understand renaming by a small example…

1.Renaming columns using mapping:

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) #creating a df with column name A and B
df.rename({"A": "new_a", "B": "new_b"},axis='columns',inplace =True) #renaming column A with 'new_a' and B with 'new_b'

output:
   new_a  new_b
0  1       4
1  2       5
2  3       6

2.Renaming index/Row_Name using mapping:

df.rename({0: "x", 1: "y", 2: "z"},axis='index',inplace =True) #Row name are getting replaced by 'x','y','z'.

output:
       new_a  new_b
    x  1       4
    y  2       5
    z  3       6

回答 15

我们可以替换原始列标签的另一种方法是通过从原始列标签中删除不需要的字符(此处为“ $”)。

可以通过在df.columns上运行for循环,并将剥离后的列附加到df.columns来完成。

取而代之的是,我们可以通过使用如下列表理解来在一个语句中整齐地做到这一点:

df.columns = [col.strip('$') for col in df.columns]

stripPython中的方法从字符串的开头和结尾去除给定的字符。)

Another way we could replace the original column labels is by stripping the unwanted characters (here ‘$’) from the original column labels.

This could have been done by running a for loop over df.columns and appending the stripped columns to df.columns.

Instead , we can do this neatly in a single statement by using list comprehension like below:

df.columns = [col.strip('$') for col in df.columns]

(strip method in Python strips the given character from beginning and end of the string.)


回答 16

真正简单就用

df.columns = ['Name1', 'Name2', 'Name3'...]

它将按照您放置它们的顺序分配列名

Real simple just use

df.columns = ['Name1', 'Name2', 'Name3'...]

and it will assign the column names by the order you put them


回答 17

您可以使用str.slice

df.columns = df.columns.str.slice(1)

You could use str.slice for that:

df.columns = df.columns.str.slice(1)

回答 18

我知道这个问题和答案已经被to死了。但是我提到它是为了解决我遇到的一个问题。我能够使用来自不同答案的点点滴滴来解决它,从而在有人需要时提供我的回复。

我的方法很通用,您可以通过用逗号分隔delimiters=变量并将其过时的方式添加其他定界符。

工作代码:

import pandas as pd
import re


df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})

delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]

输出:

>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

I know this question and answer has been chewed to death. But I referred to it for inspiration for one of the problem I was having . I was able to solve it using bits and pieces from different answers hence providing my response in case anyone needs it.

My method is generic wherein you can add additional delimiters by comma separating delimiters= variable and future-proof it.

Working Code:

import pandas as pd
import re


df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})

delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]

Output:

>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

回答 19

请注意,这些方法不适用于MultiIndex。对于MultiIndex,您需要执行以下操作:

>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
   $a $b  e
   $x $y  f
0  1  3  5
1  2  4  6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pandas.MultiIndex.from_tuples([
        rename.get(item, item) for item in df.columns.tolist()])
>>> df
   a  b  e
   x  y  f
0  1  3  5
1  2  4  6

Note that these approach do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:

>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
   $a $b  e
   $x $y  f
0  1  3  5
1  2  4  6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pandas.MultiIndex.from_tuples([
        rename.get(item, item) for item in df.columns.tolist()])
>>> df
   a  b  e
   x  y  f
0  1  3  5
1  2  4  6

回答 20

另一种选择是使用正则表达式重命名:

import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})

df = df.rename(columns=lambda x: re.sub('\$','',x))
>>> df
   a  b  c
0  1  3  5
1  2  4  6

Another option is to rename using a regular expression:

import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})

df = df.rename(columns=lambda x: re.sub('\$','',x))
>>> df
   a  b  c
0  1  3  5
1  2  4  6

回答 21

如果您必须处理无法由提供系统命名的列负载,那么我想出了以下方法,该方法将一次通用方法与特定替换方法结合在一起。

首先,使用正则表达式从数据框的列名称中创建字典,以丢弃某些列名称的附录,然后向字典中添加特定的替换内容,以便稍后在接收数据库中按预期命名核心列。

然后将其一次性应用到数据帧。

dict=dict(zip(df.columns,df.columns.str.replace('(:S$|:C1$|:L$|:D$|\.Serial:L$)','')))
dict['brand_timeseries:C1']='BTS'
dict['respid:L']='RespID'
dict['country:C1']='CountryID'
dict['pim1:D']='pim_actual'
df.rename(columns=dict, inplace=True)

If you have to deal with loads of columns named by the providing system out of your control, I came up with the following approach that is a combination of a general approach and specific replacments in one go.

First create a dictionary from the dataframe column names using regex expressions in order to throw away certain appendixes of column names and then add specific replacements to the dictionary to name core columns as expected later in the receiving database.

This is then applied to the dataframe in one go.

dict=dict(zip(df.columns,df.columns.str.replace('(:S$|:C1$|:L$|:D$|\.Serial:L$)','')))
dict['brand_timeseries:C1']='BTS'
dict['respid:L']='RespID'
dict['country:C1']='CountryID'
dict['pim1:D']='pim_actual'
df.rename(columns=dict, inplace=True)

回答 22

除了已经提供的解决方案之外,您还可以在读取文件时替换所有列。我们可以使用namesheader=0做到这一点。

首先,我们创建一个名称列表,以用作列名:

import pandas as pd

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
ufo.columns = ufo_cols

ufo = pd.read_csv('link to the file you are using', names = ufo_cols, header = 0)

在这种情况下,所有列名称都将替换为列表中的名称。

In addition to the solution already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.

First, we create a list of the names that we like to use as our column names:

import pandas as pd

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
ufo.columns = ufo_cols

ufo = pd.read_csv('link to the file you are using', names = ufo_cols, header = 0)

In this case, all the column names will be replaced with the names you have in your list.


回答 23

这是一个我喜欢用来减少键入的漂亮小功能:

def rename(data, oldnames, newname): 
    if type(oldnames) == str: #input can be a string or list of strings 
        oldnames = [oldnames] #when renaming multiple columns 
        newname = [newname] #make sure you pass the corresponding list of new names
    i = 0 
    for name in oldnames:
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0: 
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1: #doesn't have to be an exact match 
            print("Found multiple columns that matched " + str(name) + " :")
            for c in oldvar:
                print(str(oldvar.index(c)) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        if len(oldvar) == 1:
            oldvar = oldvar[0]
        data = data.rename(columns = {oldvar : newname[i]})
        i += 1 
    return data   

这是它如何工作的示例:

In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy']) 
Found multiple columns that matched col :
0: col1
1: col2

please enter the index of the column you would like to rename: 0

In [4]: df.columns
Out[5]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')

Here’s a nifty little function I like to use to cut down on typing:

def rename(data, oldnames, newname): 
    if type(oldnames) == str: #input can be a string or list of strings 
        oldnames = [oldnames] #when renaming multiple columns 
        newname = [newname] #make sure you pass the corresponding list of new names
    i = 0 
    for name in oldnames:
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0: 
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1: #doesn't have to be an exact match 
            print("Found multiple columns that matched " + str(name) + " :")
            for c in oldvar:
                print(str(oldvar.index(c)) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        if len(oldvar) == 1:
            oldvar = oldvar[0]
        data = data.rename(columns = {oldvar : newname[i]})
        i += 1 
    return data   

Here is an example of how it works:

In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy']) 
Found multiple columns that matched col :
0: col1
1: col2

please enter the index of the column you would like to rename: 0

In [4]: df.columns
Out[5]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')

回答 24

重命名熊猫中的列很容易。

df.rename(columns = {'$a':'a','$b':'b','$c':'c','$d':'d','$e':'e'},inplace = True)

Renaming columns in pandas is an easy task.

df.rename(columns = {'$a':'a','$b':'b','$c':'c','$d':'d','$e':'e'},inplace = True)

回答 25

假设您可以使用正则表达式。该解决方案无需使用正则表达式进行手动编码

import pandas as pd
import re

srch=re.compile(r"\w+")

data=pd.read_csv("CSV_FILE.csv")
cols=data.columns
new_cols=list(map(lambda v:v.group(),(list(map(srch.search,cols)))))
data.columns=new_cols

Assuming you can use regular expression. This solution removes the need of manual encoding using regex

import pandas as pd
import re

srch=re.compile(r"\w+")

data=pd.read_csv("CSV_FILE.csv")
cols=data.columns
new_cols=list(map(lambda v:v.group(),(list(map(srch.search,cols)))))
data.columns=new_cols