The correct way to define Python source code encoding

Question: The correct way to define Python source code encoding

PEP 263 defines how to declare Python source code encoding.

Normally, the first 2 lines of a Python file should start with:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

But I have seen a lot of files starting with:

#!/usr/bin/python
# -*- encoding: <encoding name> -*-

=> encoding instead of coding.

So what is the correct way of declaring the file encoding?

Is encoding permitted because the regex used is lazy? Or is it just another form of declaring the file encoding?

I’m asking this question because the PEP does not talk about encoding, it just talks about coding.


Answer 0

Check the docs here:

“If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration”

“The recommended forms of this expression are

# -*- coding: <encoding-name> -*-

which is recognized also by GNU Emacs, and

# vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar’s VIM.”

So, you can put pretty much anything before the “coding” part, but stick to “coding” (with no prefix) if you want to be 100% python-docs-recommendation-compatible.

More specifically, you need to use whatever is recognized by Python and the specific editing software you use (if it needs/accepts anything at all). E.g. the coding form is recognized (out of the box) by GNU Emacs but not Vim (yes, without a universal agreement, it’s essentially a turf war).


Answer 1

PEP 263:

the first or second line must match the regular expression “coding[:=]\s*([-\w.]+)”

So, “encoding: UTF-8” matches.
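A quick way to check this is to run the quoted regex through Python's `re` module. This is only a sketch: it assumes the pattern is searched for anywhere in the line, and the sample lines and the helper name `declared_encoding` are made up for illustration.

```python
import re

# Pattern as quoted from PEP 263 above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name found in `line`, or None (hypothetical helper)."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


samples = [
    "# -*- coding: utf-8 -*-",
    "# -*- encoding: UTF-8 -*-",
    "# vim: set fileencoding=latin-1 :",
    "# This Python file uses the following encoding: utf-8",
    "# no declaration here",
]
for line in samples:
    # Any prefix before "coding" is accepted, so "encoding" and
    # "fileencoding" are picked up as well.
    print(f"{line!r} -> {declared_encoding(line)}")
```

Only the last sample yields `None`, because it contains no `coding:` or `coding=` at all.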

PEP provides some examples:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

 

# This Python file uses the following encoding: utf-8
import os, sys

Answer 2

Just copy and paste the lines below at the top of your program. They declare the source file as UTF-8, which avoids errors caused by non-ASCII characters in the source.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
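To see what the declaration actually does, here is a small sketch. It assumes that `compile()` applied to a `bytes` object honors the declaration the same way it is honored for a file on disk. In Python 3 the default source encoding is already UTF-8, so the sketch declares Latin-1 to make the effect visible. The variable names are illustrative only.

```python
# Source bytes as they would sit on disk: the byte 0xE9 is "é" in Latin-1.
source = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"

namespace = {}
# compile() reads the declaration on the first line and decodes the
# remaining bytes accordingly before parsing them.
exec(compile(source, "<demo>", "exec"), namespace)

# The single Latin-1 byte was decoded to the one-character string "é".
print(namespace["s"] == "\u00e9")
```

The declaration only tells Python how to decode the bytes of the source file. It does not change how the program reads or writes other data.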

Answer 3

As of today (June 2018):


PEP 263 itself mentions the regex it follows:

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> : 

More precisely, the first or second line must match the following regular expression:

^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)

So, as already summed up by other answers, it’ll match coding with any prefix. As far as I can tell, using encoding instead of coding does not violate PEP 263 in any way, but if you’d like to be as PEP-compliant as it gets, stick with ‘plain’ coding, with no prefixes.
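The stricter pattern can be tried out the same way. The sketch below assumes the regex is matched from the start of the line. The helper name `strict_encoding` and the sample lines are made up for illustration.

```python
import re

# Stricter pattern as quoted from PEP 263 above: the line must be a comment.
STRICT_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def strict_encoding(line):
    """Return the declared encoding name, or None (hypothetical helper)."""
    match = STRICT_RE.match(line)
    return match.group(1) if match else None


print(strict_encoding("# -*- coding: utf-8 -*-"))
print(strict_encoding("# -*- encoding: utf-8 -*-"))
print(strict_encoding("    # vim: set fileencoding=cp1252 :"))
# Not a comment line, so the anchored pattern rejects it.
print(strict_encoding('x = "coding: utf-8"'))
```

The lazy `.*?` after `#` is what lets any prefix, such as `en` or `fileen`, sit in front of `coding`.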


Answer 4

If I’m not mistaken, the original proposal for source file encodings was to use a regular expression for the first couple of lines, which would allow both.

I think the regex was something along the lines of coding: followed by something.

I found this: http://www.python.org/dev/peps/pep-0263/, which is the original proposal, but I can’t seem to find the final spec stating exactly what they did.

I’ve certainly used encoding: to great effect, so obviously that works.

Try changing to something completely different, like duhcoding: ... to see if that works just as well.
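That experiment can be run without creating any files, using `tokenize.detect_encoding` from the standard library, which is documented as detecting the encoding the same way the tokenizer does. This is a sketch rather than a definitive test of the interpreter itself, and the helper name `detect` is made up for illustration.

```python
import io
import tokenize


def detect(first_line):
    """Ask the standard library which encoding it sees in a one-line source."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(first_line).readline)
    return encoding


print(detect(b"# -*- coding: cp1252 -*-\n"))
print(detect(b"# -*- encoding: cp1252 -*-\n"))
print(detect(b"# duhcoding: cp1252\n"))
# With no declaration, the default (utf-8) is reported.
print(detect(b"# nothing declared\n"))
```

The first three calls all report `cp1252`, so even the `duhcoding:` spelling is picked up.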


Answer 5

I suspect it is similar to Ruby – either method is okay.

This is largely because different text editors use different methods (i.e., these two) of marking encoding.

With Ruby, it works as long as the first line (or the second, if there is a shebang line) contains a string that matches:

coding: encoding-name

Any whitespace and other fluff on those lines is ignored. (It can often be = instead of :, too.)