Question: Why can't Python's raw string literals end with a single backslash?
Technically, any odd number of backslashes, as described in the documentation.
>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal
It seems like the parser could just treat backslashes in raw strings as regular characters (isn’t that what raw strings are all about?), but I’m probably missing something obvious.
Answer 0
The reason is explained in the part of that section which I highlighted in bold:
String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
So raw strings are not 100% raw; there is still some rudimentary backslash processing.
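A quick way to check the odd/even rule from the quoted documentation is to build the literal's source text at run time and ask the compiler whether it is valid. This is only a sketch: the helper name is_valid_literal and the variable BS are made up for illustration, and chr(92) is used so the example itself never has to write a trailing backslash.

```python
def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


BS = chr(92)  # a single backslash, built without writing a literal

# Build the source text r'\', r'\\', r'\\\', r'\\\\' and test each one.
results = {n: is_valid_literal("r'" + BS * n + "'") for n in range(1, 5)}
print(results)  # {1: False, 2: True, 3: False, 4: True}

# The documentation's example: r"\"" is valid and holds two characters.
two_chars = eval('r"' + BS + '""')
print(len(two_chars), two_chars == BS + '"')  # 2 True
```

Only the literals with an even number of trailing backslashes compile, which matches the wording "cannot end in an odd number of backslashes".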
Answer 1
The whole misconception about Python's raw strings is that most people think a backslash (within a raw string) is just a regular character like all the others. It is NOT. The key to understanding this is the following passage from the Python tutorial:
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string
So any character following a backslash is part of the raw string. Once the parser enters a raw string (a non-Unicode one) and encounters a backslash, it knows there are 2 characters (the backslash and the character following it).
This way:
r'abc\d' comprises a, b, c, \, d
r'abc\'d' comprises a, b, c, \, ', d
r'abc\'' comprises a, b, c, \, '
and:
r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.
The last case shows that, according to the documentation, the parser cannot find a closing quote, because the last quote you see above is part of the string; i.e. a backslash cannot be last here, as it would 'devour' the string's closing character.
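The three valid cases listed above can be inspected directly by turning each raw string into a list of characters. This is a small sketch; the variable names are only for illustration. Note that the list output shows each backslash as '\\' because that is how repr() displays a single backslash.

```python
plain = list(r'abc\d')
quoted_middle = list(r'abc\'d')
quoted_end = list(r'abc\'')

print(plain)          # ['a', 'b', 'c', '\\', 'd']
print(quoted_middle)  # ['a', 'b', 'c', '\\', "'", 'd']
print(quoted_end)     # ['a', 'b', 'c', '\\', "'"]
```

In each case the backslash and the character after it both stay in the string, including the quote.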
Answer 2
That’s the way it is! I see it as one of those small defects in python!
I don't think there's a good reason for it, but it's definitely not a parsing limitation; it's really easy to parse raw strings with \ as the last character.
The catch is, if you allow \ to be the last character in a raw string, then you won't be able to put " inside a raw string. It seems Python went with allowing " instead of allowing \ as the last character.
However, this shouldn't cause any trouble.
If you're worried about not being able to easily write Windows folder paths such as c:\mypath\ then worry not, for you can represent them as r"C:\mypath", and if you need to append a subdirectory name, don't do it with string concatenation, for it's not the right way to do it anyway! Use os.path.join:
>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
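The output shown above is what os.path.join produces on Windows; on other systems os.path uses a forward slash as the separator. As a sketch that behaves the same everywhere, the standard ntpath module (the Windows flavour of os.path) can be called directly. Joining with an empty last component is one way to get a trailing separator without writing it in a raw string.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

base = r"C:\mypath"

with_sub = ntpath.join(base, "subfolder")
with_trailing = ntpath.join(base, "")  # empty last part adds a separator

print(with_sub)       # C:\mypath\subfolder
print(with_trailing)  # C:\mypath\
```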
Answer 3
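To end a raw string with a backslash, I suggest this trick: place an ordinary string literal containing the escaped backslash right after the raw one, and Python concatenates the two adjacent literals.
>>> print(r"c:\test" '\\')
c:\test\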
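The trick relies on Python joining adjacent string literals at compile time, so a raw literal and a regular one can sit side by side. A minimal sketch, with the variable name path chosen only for illustration:

```python
# A raw literal followed directly by a regular literal: the two are
# concatenated at compile time into a single string.
path = r"c:\test" "\\"

print(path)       # c:\test\
print(len(path))  # 8
```

The raw part keeps \t as a backslash followed by t, and the regular part contributes the single trailing backslash.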
Answer 4
Another trick is to use chr(92), which evaluates to "\".
I recently had to clean a string of backslashes and the following did the trick:
CleanString = DirtyString.replace(chr(92),'')
I realize that this does not take care of the “why” but the thread attracts many people looking for a solution to an immediate problem.
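As a small sketch of the chr(92) approach, the same constant can be used both to strip backslashes and to append one to a raw string. The names BACKSLASH, dirty, clean and trailing are illustrative and not from the answer.

```python
BACKSLASH = chr(92)  # the one-character string containing a backslash

# Removing every backslash, as in the answer's replace() call.
dirty = r"a\b\c"
clean = dirty.replace(BACKSLASH, "")
print(clean)  # abc

# Appending a trailing backslash to a raw string.
trailing = r"C:\mypath" + BACKSLASH
print(trailing)  # C:\mypath\
```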
Answer 5
Since \" is allowed inside a raw string, a quote preceded by a backslash can't be used to identify the end of the string literal.
Why not stop parsing the string literal when you encounter the first "?
If that were the case, then \" wouldn't be allowed inside the string literal. But it is.
Answer 6
The reason r'\' is syntactically incorrect is that, although the string expression is raw, the quote character used as the delimiter (single or double) always has to be escaped inside the string, since it would otherwise mark the end of the literal. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes.
But you could use:
'\\'
Answer 7
Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may have been able to simplify the parser design by using the same parsing rules for both kinds of literal and restoring escaped characters to their raw form as an afterthought (if the literal was marked as raw).
I thought it was an interesting idea and am including it as community wiki for posterity.
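To make the idea concrete, here is a toy scanner that looks for the closing quote using one rule for raw and regular literals alike: a backslash always protects the character after it. This is a hypothetical illustration, not CPython's tokenizer; the function name find_closing_quote is invented, and the sketch ignores prefixes, triple quotes and newlines.

```python
def find_closing_quote(source: str, open_index: int) -> int:
    """Return the index of the quote closing the literal that opens at
    open_index, or -1 if the literal is unterminated."""
    quote = source[open_index]
    i = open_index + 1
    while i < len(source):
        ch = source[i]
        if ch == chr(92):
            i += 2  # skip the backslash and whatever follows it
            continue
        if ch == quote:
            return i
        i += 1
    return -1


BS = chr(92)
closed = find_closing_quote("'abc" + BS + "'d'", 0)    # text: 'abc\'d'
unterminated = find_closing_quote("'abc" + BS + "'", 0)  # text: 'abc\'

print(closed)        # 7
print(unterminated)  # -1
```

With a single scanning rule like this, a trailing backslash swallows the would-be closing quote whether or not the literal is raw; the only difference for a raw literal would be that the backslash is kept in the resulting value.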
Answer 8
Despite its role, even a raw string cannot end in a single
backslash, because the backslash escapes the following quote
character—you still must escape the surrounding quote character to
embed it in the string. That is, r"…\" is not a valid string
literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use
two and slice off the second.
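The "use two and slice off the second" suggestion can be sketched like this; the path is just an example value.

```python
# Write two trailing backslashes in the raw literal, then drop the last one.
path = r"C:\mypath\\"[:-1]

print(path)       # C:\mypath\
print(len(path))  # 10
```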
Answer 9
Coming from C, it is pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.
That does indeed disallow \ as the last character, since it will escape the " and make the parser choke. But as pointed out earlier, \\ is legal.
Answer 10
some tips :
1) if you need to manipulate backslash for path then standard python module os.path is your friend. for example :
os.path.normpath('c:/folder1/')
2) if you want to build strings with backslash in it BUT without backslash at the END of your string then raw string is your friend (use ‘r’ prefix before your literal string). for example :
r'\one \two \three'
3) if you need to prefix a string in a variable X with a backslash then you can do this :
X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X # X2 now contains \dummy
4) if you need to create a string with a backslash at the end then combine tip 2 and 3 :
voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name
now lilypond_statement contains "\DisplayLilyMusic \upper"
long live python ! :)
n3on
Answer 11
I encountered this problem and found a partial solution that is good enough for some cases. Although a Python raw string literal cannot end with a single backslash, a string that ends with one can still be written to a text file, where it appears with a single trailing backslash. So if what you need is to save text ending in a single backslash on your computer, it is possible:
x = 'a string\\'
x
'a string\\'
# Now save it in a text file and it will appear with a single backslash:
with open("my_file.txt", 'w') as h:
    h.write(x)
BTW, this does not work with JSON if you dump the string using Python's json library, because JSON writes the backslash in escaped form (\\).
Finally, I work with Spyder, and I noticed that if I open the variable in Spyder's text editor by double-clicking its name in the variable explorer, it is shown with a single backslash and can be copied to the clipboard that way (not very helpful for most needs, but maybe for some).
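The file and JSON behaviour described in this answer can be checked with a short round trip. This is a sketch under a few assumptions: it writes to a temporary directory instead of the current one, and the names on_disk and encoded are illustrative.

```python
import json
import os
import tempfile

x = 'a string\\'  # nine characters; the last one is a single backslash

# Write the string to a file and read it back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.txt")
    with open(path, "w") as h:
        h.write(x)
    with open(path) as h:
        on_disk = h.read()

print(len(x), on_disk == x)  # 9 True

# JSON text escapes the backslash, but loading it restores the original.
encoded = json.dumps(x)
print(encoded)                   # "a string\\"
print(json.loads(encoded) == x)  # True
```

The doubled backslash in the JSON text is only the serialized form; the decoded value still ends in one backslash.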