How do I un-escape a backslash-escaped string?

Question: How do I un-escape a backslash-escaped string?

Suppose I have a string which is a backslash-escaped version of another string. Is there an easy way, in Python, to unescape the string? I could, for example, do:

>>> escaped_str = '"Hello,\\nworld!"'
>>> raw_str = eval(escaped_str)
>>> print(raw_str)
Hello,
world!

However, that involves passing a (possibly untrusted) string to eval(), which is a security risk. Is there a function in the standard library that takes a string and produces the unescaped string with no security implications?


Answer 0

>>> # Python 2 only: str.decode and the 'string_escape' codec do not exist in Python 3
>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"

Answer 1

You can use ast.literal_eval, which is safe:

Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

Like this:

>>> import ast
>>> escaped_str = '"Hello,\\nworld!"'
>>> print(ast.literal_eval(escaped_str))
Hello,
world!
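
To illustrate why this is safer than eval(), here is a minimal sketch of a wrapper around ast.literal_eval. The function name unescape_literal is hypothetical, and returning None for rejected input is just one possible design choice. It assumes the input is expected to be a complete quoted string literal, as in the question.

```python
import ast

def unescape_literal(text):
    """Return the value of a quoted Python string literal, or None if
    `text` is not a valid literal or does not evaluate to a str.
    (Hypothetical helper, for illustration only.)"""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Raised for anything that is not a plain literal,
        # e.g. function calls or malformed quoting.
        return None
    return value if isinstance(value, str) else None

print(unescape_literal('"Hello,\\nworld!"'))
# A function call is not a literal, so it is rejected rather than executed.
print(unescape_literal('__import__("os").getcwd()'))
```

The second call prints None: ast.literal_eval refuses the expression instead of running it, which is the property eval() lacks.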

Answer 2

All given answers will break on general Unicode strings. The following works for Python 3 in all cases, as far as I can tell:

from codecs import encode, decode
sample = u'mon€y\\nröcks'
result = decode(encode(sample, 'latin-1', 'backslashreplace'), 'unicode-escape')
print(result)

As outlined in the comments, you can also use the literal_eval function from the ast module like so:

import ast
sample = u'mon€y\\nröcks'
print(ast.literal_eval(f'"{sample}"'))

Or like this when your string really contains a string literal (including the quotes):

import ast
sample = u'"mon€y\\nröcks"'
print(ast.literal_eval(sample))

However, if you are uncertain whether the input string uses double or single quotes as delimiters, or when you cannot assume it to be properly escaped at all, then literal_eval may raise a SyntaxError while the encode/decode method will still work.
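
As a sketch of how the encode/decode approach could be packaged, the hypothetical helper below (the name unescape is not from the answer) wraps the same two steps. It also shows one input the codec does reject: a string ending in a lone backslash raises UnicodeDecodeError, so callers handling untrusted input may still want a try/except.

```python
def unescape(text):
    """Un-escape backslash sequences in `text` (hypothetical helper).

    Characters outside Latin-1 are first turned into \\uXXXX or \\UXXXXXXXX
    escapes by 'backslashreplace', so the 'unicode-escape' decoder can
    restore them together with the escapes already present in the input.
    """
    raw = text.encode('latin-1', 'backslashreplace')
    return raw.decode('unicode-escape')

print(unescape('mon€y\\nröcks'))

# Quotes need no special handling, unlike with ast.literal_eval.
print(unescape('it\'s "quoted"\\t!'))

# A trailing lone backslash is not a valid escape sequence.
try:
    unescape('broken\\')
except UnicodeDecodeError as exc:
    print('rejected:', exc.reason)
```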


Answer 3

In Python 3, str objects don't have a decode method, so you have to use a bytes object. ChristopheD's answer covers Python 2.

# create a `bytes` object from a `str`
my_str = "Hello,\\nworld"
# (pick an encoding suitable for your str; 'utf-8' is fine for ASCII-only
# text, but non-ASCII characters need e.g. 'latin1', because unicode_escape
# decodes non-escape bytes as Latin-1)
my_bytes = my_str.encode("utf-8")

# or directly
my_bytes = b"Hello,\\nworld"

print(my_bytes.decode("unicode_escape"))
# Hello,
# world
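
The choice of encoding matters once the text contains non-ASCII characters. The following sketch (variable names are illustrative) compares a UTF-8 round trip with a Latin-1 round trip that uses the backslashreplace error handler. The unicode_escape decoder treats every non-escape byte as a Latin-1 character, so UTF-8 multi-byte sequences come out garbled.

```python
text = "r\u00f6cks\\n"  # 'röcks' followed by a literal backslash and 'n'

# UTF-8 encodes 'ö' as two bytes; unicode_escape decodes each byte
# separately as Latin-1, producing two wrong characters.
mangled = text.encode("utf-8").decode("unicode_escape")

# Latin-1 keeps 'ö' as a single byte, and backslashreplace turns anything
# outside Latin-1 into an escape that unicode_escape can decode again.
intact = text.encode("latin-1", "backslashreplace").decode("unicode_escape")

print(repr(mangled))
print(repr(intact))
```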