What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.
Answer 0
Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# On Python 2, use rep.iteritems() instead of rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'
Answer 1
You could just make a nice little looping function.
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
where text is the complete string and dic is a dictionary in which each key is a term to search for and each value is the string that replaces it.
Note: on Python 2, use iteritems() instead of items().
Careful: Python dictionaries don’t have a reliable order for iteration. This solution only solves your problem if:
order of replacements is irrelevant
it’s ok for a replacement to change the results of previous replacements
Update: The above statement related to ordering of insertion does not apply to Python versions greater than or equal to 3.6, as standard dicts were changed to use insertion ordering for iteration.
For instance:
d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)  # replace_all returns a new string
print(my_sentence)
Possible output #1:
"This is my pig and this is my pig."
Possible output #2
"This is my dog and this is my pig."
One possible fix is to use an OrderedDict.
from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)  # keep the returned string
print(my_sentence)
Output:
"This is my pig and this is my pig."
Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.
Answer 2
Why not a solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)
# output will be: The quick red fox jumps over the quick dog
Answer 3
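Here is a variant of the first solution using reduce, in case you like being functional. :)
from functools import reduce  # reduce is a builtin on Python 2
repls = {'hello': 'goodbye', 'world': 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)
martineau's even better version:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)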
Answer 4
This is just a more concise recap of F.J's and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function:
import re
def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict, key=len, reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.
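As one illustration of a "dedicated" variant, here is a minimal sketch of a case-insensitive version. The name multiple_replace_ignore_case is made up for this example, and it assumes plain ASCII keys (for some Unicode characters, str.lower() and re.IGNORECASE do not agree, which would make the lookup fail).

```python
import re

def multiple_replace_ignore_case(string, rep_dict):
    """Single-pass replacement that matches the keys regardless of case."""
    # Store the keys lower-cased so a match can be looked up after lowering it.
    lowered = {k.lower(): v for k, v in rep_dict.items()}
    # Longest keys first, so a short key never pre-empts a longer one.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys), flags=re.IGNORECASE)
    return pattern.sub(lambda m: lowered[m.group(0).lower()], string)

print(multiple_replace_ignore_case("Tea or CAFE?", {"cafe": "tea", "tea": "cafe"}))
# cafe or tea?
```

The replacement text is inserted exactly as given in the dictionary; preserving the case of the matched text would need extra logic.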
Answer 5
I built upon F.J's excellent answer:
import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)
One shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.
Note that since replacement is done in just one pass, “café” changes to “tea”, but it does not change back to “café”.
If you need to do the same replacement many times, you can create a replacement function easily:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:
turned code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement
Answer 6
I would like to propose the usage of string templates. Just place the strings to be replaced in a dictionary and all is set! Example from docs.python.org:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
Answer 7
In my case, I needed a simple replacement of unique keys with names, so I came up with this:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x, y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:
# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'
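To make the behaviour concrete, here is a small runnable sketch of the same comprehension with the setup lines uncommented. The name steps is only introduced for illustration; it shows that the comprehension also collects every intermediate string, which the snippet above simply discards.

```python
text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]

# Each iteration rebinds `text` in the enclosing scope and also stores the
# intermediate result in the list being built.
steps = [text := text.replace(a, b) for a, b in replacements]

print(text)        # The quick red fox jumps over the quick dog
print(len(steps))  # 2 (one intermediate string per replacement pair)
```

If the intermediate strings are not needed, a plain for loop does the same work without building the throwaway list.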
Here’s my $0.02. It is based on Andrew Clark’s answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).
import re
def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.
    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this gist; feel free to modify it if you have any proposals.
I needed a solution where the strings to be replaced can be regular expressions, for example to help in normalizing a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
import re
def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # a list, so the pairs can be indexed on Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as normal strings,
you can escape those before calling multiple_replace using e.g. this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn’t very telling):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
Note that it does not chain the replacements, but instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs.
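A rough sketch of what mimicking chaining could look like. The block restates a compact version of multiple_replace so it runs on its own; the sample sentence and the pair lists are only illustrations, not taken from the original answer.

```python
import re

def multiple_replace(string, reps, re_flags=0):
    # Compact restatement of the function above; keys are regular expressions.
    reps = list(reps.items()) if isinstance(reps, dict) else list(reps)
    pattern = re.compile(
        "|".join("(?P<_%d>%s)" % (i, key) for i, (key, _) in enumerate(reps)),
        re_flags)
    return pattern.sub(lambda m: reps[int(m.lastgroup[1:])][1], string)

sentence = "This is my cat and this is my dog."

# Chained str.replace calls would turn cat -> dog -> pig. In a single pass,
# the same end result needs the final value spelled out for each key.
chained_equivalent = [("cat", "pig"), ("dog", "pig")]
print(multiple_replace(sentence, chained_equivalent))
# This is my pig and this is my pig.

# Ordering matters when patterns overlap: the earlier pair wins.
print(multiple_replace("abc", [("abc", "ABC"), ("ab", "AB")]))  # ABC
print(multiple_replace("abc", [("ab", "AB"), ("abc", "ABC")]))  # ABc
```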
Here’s a sample which is more efficient on long strings with many small replacements.
import re
source = "Here is foo, it does moo!"
replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}
def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))  # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)
print(replace(source, replacements))
The point is in avoiding many concatenations of long strings. We chop the source string to fragments, replacing some of the fragments as we form the list, and then join the whole thing back into a string.
You should really not do it this way, but I just find it way too cool:
>>> replacements = {'cond1': 'text1', 'cond2': 'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r keeps the quotes around each string
...
>>> exec(cmd)
Now, answer is the result of applying all the replacements in turn to s.
Again, this is very hacky and is not something that you should be using regularly. But it’s just nice to know that you can do something like this if you ever need to.
I was struggling with this problem as well. With many substitutions regular expressions struggle, and are about four times slower than looping string.replace (in my experiment conditions).
You should absolutely try using the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.
It is easy to find use examples in the links above, but this is a working example:
from flashtext import KeywordProcessor
self.processor = KeywordProcessor(case_sensitive=False)
for k, v in self.my_dict.items():
    self.processor.add_keyword(k, v)
new_string = self.processor.replace_keywords(string)
Note that Flashtext makes substitutions in a single pass (to avoid a –> b and b –> c translating ‘a’ into ‘c’). Flashtext also looks for whole words (so ‘is’ will not match ‘this‘). It works fine if your target is several words (replacing ‘This is’ by ‘Hello’).
Answer 14
I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)
Usage:
>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'
Notes:
This consumes the input dictionary.
Python dicts preserve key order as of 3.6; corresponding caveats in other answers are not relevant anymore. For backward compatibility one could resort to a tuple-based version:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])
Note: As with all recursive functions in python, too large recursion depth (i.e. too large replacement dictionaries) will result in an error. See e.g. here.
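For comparison, a minimal iterative sketch that avoids both the recursion limit and consuming the input. The name mrep_iter is made up here. It applies the pairs in insertion order, whereas the popitem() version above works from the last item backwards, so the results can differ when replacements interact.

```python
def mrep_iter(s, d):
    """Apply every (old, new) pair in turn; d is left untouched."""
    pairs = d.items() if isinstance(d, dict) else d
    for old, new in pairs:
        s = s.replace(old, new)
    return s

reps = {'a': '1', 'c': '2'}
result = mrep_iter('abcabc', reps)
print(result)  # 1b21b2
print(reps)    # {'a': '1', 'c': '2'} (the dictionary is not consumed)
```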
Answer 15
I don’t know about speed but this is my workaday quick fix:
from functools import reduce  # reduce is a builtin on Python 2
reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )
… but I like the #1 regex answer above. Note – if one new value is a substring of another one then the operation is not commutative.
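A small sketch of the ordering caveat, wrapping the reduce call in a helper (replace_pairs is a name made up for this example). The cat/dog pairs are borrowed from an earlier answer; they show one way the order can matter, namely when the output of one replacement is the search string of another.

```python
from functools import reduce

def replace_pairs(text, pairs):
    # Feed the text through str.replace once per (old, new) pair, in order.
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

print(replace_pairs('tomato', [('o', 'W'), ('t', 'X')]))  # XWmaXW

# Swapping the pairs changes the result when one output feeds another pair.
print(replace_pairs('cat', [('cat', 'dog'), ('dog', 'pig')]))  # pig
print(replace_pairs('cat', [('dog', 'pig'), ('cat', 'dog')]))  # dog
```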
Answer 16
You can use the pandas library and the replace function which supports both exact matches as well as regex replacements. For example:
import pandas as pd
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
# raw strings keep the regex backslashes intact
to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
And the modified text is:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
You can find an example here. Notice that the replacements on the text are done in the order they appear in the lists.
my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good) # ThsS sS a teSt Strsng.
print(result_bad) # ThSS SS a teSt StrSng.
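One limitation worth keeping in mind with this approach: str.maketrans only accepts single-character keys when given a dictionary, so it covers character-for-character mappings rather than whole substrings. A short sketch (the variable names table and rejected are just for illustration):

```python
table = str.maketrans({'i': 's', 's': 'S'})
print('This is a test string.'.translate(table))  # ThsS sS a teSt Strsng.

# A key longer than one character is rejected when the table is built.
rejected = False
try:
    str.maketrans({'is': 'was'})
except ValueError as exc:
    rejected = True
    print("multi-character key rejected:", exc)
```

For multi-character search strings, one of the regex-based single-pass answers above is the closer fit.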
Starting from Andrew’s valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the opened folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I’m a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.
import glob
import re
mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")
rep = {}  # creation of empty dictionary
with open(mapfile) as temprep:  # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val
for filename in glob.iglob(mask):  # loop over all the files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file in the variable text
        text = textfile.read()
    # start replacement
    #rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output file using the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)
Answer 19
This is my solution to the problem. I used it in a chatbot to replace the different words at once.
def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string
print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))
This will print: The cat hunts the dog
Answer 20
Another example: input lists
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output would be
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e, "") for e in error_list if e in w], w] for w in words]]
Answer 21
Or just for a quick hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)
Answer 22
Here is another way of doing it with a dictionary:
listA="The cat jumped over the house".split()
modify = {word: word for word in listA}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))
I have to search through a list and replace all occurrences of one element with another. So far my attempts in code are getting me nowhere, what is the best way to do this?
For example, suppose my list has the following integers
>>> a = [1,2,3,4,5,1,2,3,4,5,1]
and I need to replace all occurrences of the number 1 with the value 10 so the output I need is
>>> a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
Thus my goal is to replace all instances of the number 1 with the number 10.
Answer 0
>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for n, i in enumerate(a):
...     if i == 1:
...         a[n] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
If you have several values to replace, you can also use a dictionary:
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
dic = {1:10, 2:20, 3:'foo'}
print([dic.get(n, n) for n in a])
> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]
Answer 4
>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i, x in enumerate(a) if x == item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
...
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
To easily replace every 1 with 10 in
a = [1,2,3,4,5,1,2,3,4,5,1]
one could use the following one-line lambda+map combination, and ‘Look, Ma, no IFs or FORs!’:
# This substitutes every 1 with 10 in list 'a' and places the result in list 'c'.
# (str.replace only exists on strings, so for a list of integers a dictionary lookup is used instead.)
c = list(map(lambda b: {1: 10}.get(b, b), a))
Answer 7
Below is a very direct method in Python 2.x:
a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
# Replacing every 1 with 10
for i in xrange(len(a)):
    if a[i] == 1:
        a[i] = 10
print a
I know this is a very old question and there’s a myriad of ways to do it. The simplest one I found uses the numpy package.
import numpy
arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)
Answer 9
My use case was replacing None with some default value.
I’ve timed the approaches to this problem that were presented here, including the one by @kxr using list.index.
Test code in ipython with Python 3.8.1:
def rep1(lst, replacer=0):
    ''' List comprehension, new list '''
    return [item if item is not None else replacer for item in lst]
def rep2(lst, replacer=0):
    ''' List comprehension, in-place '''
    lst[:] = [item if item is not None else replacer for item in lst]
    return lst
def rep3(lst, replacer=0):
    ''' enumerate() with comparison - in-place '''
    for idx, item in enumerate(lst):
        if item is None:
            lst[idx] = replacer
    return lst
def rep4(lst, replacer=0):
    ''' Using list.index + Exception, in-place '''
    idx = -1
    # none_amount = lst.count(None)
    while True:
        try:
            idx = lst.index(None, idx+1)
        except ValueError:
            break
        else:
            lst[idx] = replacer
    return lst
def rep5(lst, replacer=0):
    ''' Using list.index + list.count, in-place '''
    idx = -1
    for _ in range(lst.count(None)):
        idx = lst.index(None, idx+1)
        lst[idx] = replacer
    return lst
def rep6(lst, replacer=0):
    ''' Using map, return map iterator '''
    return map(lambda item: item if item is not None else replacer, lst)
def rep7(lst, replacer=0):
    ''' Using map, return new list '''
    return list(map(lambda item: item if item is not None else replacer, lst))
lst = [5]*10**6
# lst = [None]*10**6
%timeit rep1(lst)
%timeit rep2(lst)
%timeit rep3(lst)
%timeit rep4(lst)
%timeit rep5(lst)
%timeit rep6(lst)
%timeit rep7(lst)
I get:
26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
On long lists with rare occurrences, it’s about 3x faster to use list.index() than the single-step iteration methods presented in the other answers.
def list_replace(lst, old=1, new=10):
    """replace list elements (inplace)"""
    i = -1
    try:
        while True:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass
Answer 11
You can simply use a list comprehension in Python:
def replace_element(YOUR_LIST, set_to=NEW_VALUE):
    return [i
            if SOME_CONDITION
            else NEW_VALUE
            for i in YOUR_LIST]
For your case, where you want to replace all occurrences of 1 with 10, the code snippet will be like this:
def replace_element(YOUR_LIST, set_to=10):
    return [i
            if i != 1  # keeps all elements not equal to one
            else set_to  # replaces 1 with 10
            for i in YOUR_LIST]
Answer 12
Find and replace just one item:
ur_list = [1, 2, 1]  # replace the first 1 with 11
loc = ur_list.index(1)
ur_list.remove(1)
ur_list.insert(loc, 11)
----------
[11, 2, 1]
I need some help on declaring a regex. My inputs are like the following:
this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>
The required output is:
this is a paragraph with in between and then there are cases ... where the number ranges from 1-100.
and there are many other lines in the txt files
with such tags
I’ve tried this:
#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader:
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')
        print line
I’ve also tried this (but it seems like I’m using the wrong regex syntax):
line2 = line.replace('<[*> ', '')
line = line2.replace('</[*> ', '')
line2 = line.replace('<[*>', '')
line = line2.replace('</[*>', '')
I don't want to hard-code the replacement for every number from 1 to 99.
Answer 0
This tested snippet should do it:
import re
line = re.sub(r"</?\[\d+>", "", line)
Edit: Here’s a commented version explaining how it works:
line = re.sub(r"""
(?x) # Use free-spacing mode.
< # Match a literal '<'
/? # Optionally match a '/'
\[ # Match a literal '['
\d+ # Match one or more digits
> # Match a literal '>'
""", "", line)
Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: the "metacharacters" that need to be escaped (i.e. with a backslash placed in front). Note that the rules are different inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
str.replace() does fixed replacements. Use re.sub() instead.
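To make the difference concrete, here is a short sketch on a made-up sample line: str.replace() searches for the literal text, while re.sub() interprets its first argument as a pattern.

```python
import re

line = "with<[1> in between</[1> and<[99> more</[99>."

# str.replace looks for the literal text "<[*>", which never occurs here
unchanged = line.replace("<[*>", "")

# re.sub treats the first argument as a regular expression
cleaned = re.sub(r"</?\[\d+>", "", line)

print(unchanged == line)  # True
print(cleaned)            # with in between and more.
```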
Answer 2
I would go like this (regex explained in comments):
import re
# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")
# <\/{0,}\[\d+>
#
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»
subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>"""
result = pattern.sub("", subject)
print(result)
If you want to learn more about regex, I recommend reading Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.
Answer 3
The simplest way:
import re
txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'
out = re.sub("(<[^>]+>)", '', txt)
print(out)
import re
newline = re.sub(r"</?\[[0-9]+>", "", line)
Answer 5
You don't have to use a regular expression (for your sample string):
>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
>>> for w in s.split(">"):
...     if "<" in w:
...         print(w.split("<")[0])
...
this is a paragraph with
in between
and then there are cases ... where the
number ranges from 1-100
.
and there are many other lines in the txt files
with
such tags
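The loop above prints each fragment on its own line. To get a single cleaned string back, the fragments can be joined instead. A sketch on a shortened version of the sample text; it assumes the text contains no stray '<' or '>' outside the tags.

```python
s = ("this is a paragraph with<[1> in between</[1> and then there are "
     "cases ... where the<[99> number ranges from 1-100</[99>.\n"
     "with<[3> such tags </[3>\n")

# Split on '>' so each piece ends with at most one unfinished "<..." tag,
# keep the text before that '<', and join the pieces back together.
result = "".join(piece.split("<")[0] for piece in s.split(">"))
print(result)
```

Unlike the loop, this keeps the final piece even when it contains no '<'.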
Answer 6
import os, sys, re, glob
# match opening and closing tags with any number of digits
pattern = re.compile(r"</?\[\d+>")
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    with open(infile) as reader:
        for line in reader:
            # pattern.sub takes (replacement, string)
            retline = pattern.sub("", line)
            sys.stdout.write(retline)
Use the df.rename() function and refer to the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
a b c d e
0 x x x x x
1 x x x x x
2 x x x x x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1) # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) # old method
df2
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
Remember to assign the result back, as the modification is not done in place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.
REASSIGN COLUMN HEADERS
Use df.set_axis() with axis=1 and inplace=False (to return a copy).
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
This returns a copy, but you can modify the DataFrame in-place by setting inplace=True (this is the default behaviour for versions <=0.24 but is likely to change in the future).
You can also assign headers directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
There have been some significant updates to column renaming in version 0.21.
The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
The rename function also accepts functions that will be applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method that is equal in length to the number of columns (or index). Currently, inplace defaults to True, but inplace will be defaulted to False in future releases.
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis()
   .some_method3())
# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
This way you can manually edit the new_names as you wish.
This works great when you need to rename only a few columns to correct misspellings or accents, remove special characters, etc.
I have the edited column names stored it in a list, but I don’t know how to replace the column names.
I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
df.columns = new where new is the list of new columns names is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe’s columns attribute and it isn’t done inline. I’ll show a few ways to perform this via pipelining without editing the existing dataframe.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I’ll create a new sample dataframe df with initial column names and unrelated new column names.
However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.
# given just a list of new column names
df.rename(columns=dict(zip(df, new)))
x098 y765 z432
0 1 3 5
1 2 4 6
This works great if your original column names are unique. But if they are not, then this breaks down.
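A pandas-free sketch of why duplicate names break the dictionary approach: a dict holds one value per key, so the later pair for a repeated label overwrites the earlier one. The labels in `old` are hypothetical; `new` reuses the replacement names shown above.

```python
old = ['a', 'a', 'b']            # hypothetical labels; 'a' appears twice
new = ['x098', 'y765', 'z432']

mapping = dict(zip(old, new))     # the later pair for 'a' overwrites the earlier
print(mapping)                    # {'a': 'y765', 'b': 'z432'}

renamed = [mapping[name] for name in old]
print(renamed)                    # ['y765', 'y765', 'z432']
```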
We didn’t map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
pd.concat([c for _, c in df.items()], axis=1, keys=new)
x098 y765 z432
0 1 3 5
1 2 4 6
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you’ll end up with dtype object for all columns, and converting them back requires more dictionary work.
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn’t expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don’t believe it needs protecting. It is still worth mentioning.
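The trick described above can be sketched in plain Python, without a DataFrame. The variable names are illustrative only; with pandas, the lambda would be the callable handed to rename, as the text describes.

```python
new = ['x098', 'y765', 'z432']
old = ['a', 'a', 'b']                  # hypothetical labels

# The default for `y` is evaluated once, when the lambda is created,
# so every call advances the same iterator and `x` is never used.
rename_next = lambda x, y=iter(new): next(y)
first = [rename_next(name) for name in old]
print(first)                           # ['x098', 'y765', 'z432']

# With a bare * the iterator becomes keyword-only, so a stray second
# positional argument raises TypeError instead of replacing it.
rename_safe = lambda x, *, y=iter(new): next(y)
second = [rename_safe(name) for name in old]
print(second)                          # ['x098', 'y765', 'z432']
```

Because the iterator is consumed as it goes, each lambda can only be used for one pass over the columns.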
I would like to explain a bit what happens behind the scenes.
DataFrames are a set of Series.
A Series in turn wraps a numpy.array and adds labels to it.
A Series has a .name property (a plain numpy.array does not).
This is the name of the series. pandas seldom respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.
Naming the list of columns
A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index. Like a Series, it has a .name attribute.
This is what happens if you decide to fill in the name of the columns Series:
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']
name of the list of columns column_one column_two
name of the index
0 4 1
1 5 2
2 6 3
Note that the name of the index always comes one column lower.
Artifacts that linger
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.
If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'
BUT
pd.DataFrame(df.one) will return
three
0 1
1 2
2 3
Because pandas reuses the .name of the already defined Series.
Multi level column names
Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don’t see anyone picking up on this here.
If you’ve got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns…
old_columns = df.columns
new_columns = [name.replace("$", "") for name in old_columns]
df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)
df.head() #to validate the output
Best way? IDK. A way – yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge memory and execution time. @kadee, @kaitlyn, and @eumiro had the functions with the fastest execution times, though these functions are so fast that we’re comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn’t the ‘Best’ way.
The limitation of this method is that even if only one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels.
For example, if you passed this:
df.columns = ['a','b','c','d']
This will throw an error. Length mismatch: Expected axis has 5 elements, new values have 4 elements.
Another method is the pandas rename() method, which can rename any index, column or row label.
If your new list of columns is in the same order as the existing columns, the assignment is simple:
new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
a b c d e
0 1 1 1 1 1
If you had a dictionary keyed on old column names to new column names, you could do the following:
d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col]) # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
a b c d e
0 1 1 1 1 1
If you don’t have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:
df.columns = [col[1:] if col[0] == '$' else col for col in df]
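A slightly more defensive variant of the same comprehension, shown on a plain list of made-up names rather than a DataFrame: str.startswith() also copes with an empty name, where col[0] would raise IndexError.

```python
cols = ['$a', '$b', 'c$', '', 'd']     # made-up names, including an empty one

# Drop one leading '$' if present; leave every other name untouched
stripped = [col[1:] if col.startswith('$') else col for col in cols]
print(stripped)                         # ['a', 'b', 'c$', '', 'd']
```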
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) #creating a df with column name A and B
df.rename({"A": "new_a", "B": "new_b"},axis='columns',inplace =True) #renaming column A with 'new_a' and B with 'new_b'
output:
new_a new_b
0 1 4
1 2 5
2 3 6
2. Renaming index/row names using a mapping:
df.rename({0: "x", 1: "y", 2: "z"},axis='index',inplace =True) #Row name are getting replaced by 'x','y','z'.
output:
new_a new_b
x 1 4
y 2 5
z 3 6
I know this question and its answers have been chewed to death, but I referred to it for inspiration for one of the problems I was having. I was able to solve it using bits and pieces from different answers, so I am providing my response in case anyone needs it.
My method is generic: you can add further delimiter characters to the delimiters string to future-proof it.
Working Code:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})
delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]
Output:
>>> df
$a $b $c $d $e
0 1 3 5 7 9
1 2 4 6 8 10
>>> df
a b c d e
0 1 3 5 7 9
1 2 4 6 8 10
Answer 19
Note that these approaches do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:
>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
$a $b e
$x $y f
0 1 3 5
1 2 4 6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
...     rename.get(item, item) for item in df.columns.tolist()])
>>> df
a b e
x y f
0 1 3 5
1 2 4 6
Answer 20
Another option is to rename using a regular expression:
import pandas as pd
import re
df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6]})
df = df.rename(columns=lambda x: re.sub(r'\$', '', x))
>>> df
a b c
0 1 3 5
1 2 4 6
If you have to deal with loads of columns named by a source system outside your control, I came up with the following approach, which combines a general rule and specific replacements in one go.
First, create a dictionary from the dataframe column names, using regular expressions to throw away certain suffixes of the column names,
and then add specific replacements to the dictionary to name core columns as expected later in the receiving database.
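The answer's own code is not shown here, so the following is only a sketch of the two steps it describes, under stated assumptions: the column names, the suffix pattern and the override names are all hypothetical.

```python
import re

# Hypothetical column names as delivered by the source system
columns = ['customer_id_v2', 'order_date_raw', 'amount_raw', 'status']

# General rule: drop a trailing "_raw" or "_v<digits>" suffix
suffix = re.compile(r'_(raw|v\d+)$')
renames = {col: suffix.sub('', col) for col in columns}

# Specific overrides for core columns take precedence over the general rule
renames.update({'customer_id_v2': 'customer', 'status': 'order_status'})

print(renames)
```

The resulting dictionary has the old-name to new-name shape that df.rename(columns=...) accepts.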
In addition to the solution already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.
First, we create a list of the names that we like to use as our column names:
import pandas as pd
ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
# names= supplies the new headers; header=0 tells pandas to discard the file's own header row
ufo = pd.read_csv('link to the file you are using', names=ufo_cols, header=0)
In this case, all the column names will be replaced with the names you have in your list.
Answer 23
Here’s a nifty little function I like to use to cut down on typing:
def rename(data, oldnames, newname):
    if type(oldnames) == str: #input can be a string or list of strings
        oldnames = [oldnames] #when renaming multiple columns
        newname = [newname] #make sure you pass the corresponding list of new names
    i = 0
    for name in oldnames:
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1: #doesn't have to be an exact match
            print("Found multiple columns that matched " + str(name) + " :")
            for c in oldvar:
                print(str(oldvar.index(c)) + ": " + str(c))
            ind = input('please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        if len(oldvar) == 1:
            oldvar = oldvar[0]
        data = data.rename(columns = {oldvar : newname[i]})
        i += 1
    return data
Here is an example of how it works:
In [2]: df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['col1','col2','omg','idk'])
#first list = existing variables
#second list = new names for those variables
In [3]: df = rename(df, ['col','omg'],['first','ohmy'])
Found multiple columns that matched col :
0: col1
1: col2
please enter the index of the column you would like to rename: 0
In [4]: df.columns
Out[5]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')