Changing the default encoding of Python?

Question: Changing the default encoding of Python?

I have many “can’t encode” and “can’t decode” problems with Python when I run my applications from the console. But in the Eclipse PyDev IDE, the default character encoding is set to UTF-8, and I’m fine.

I searched around for a way to set the default encoding, and people say that Python deletes the sys.setdefaultencoding function on startup, so we cannot use it.

So what’s the best solution for it?


Answer 0

Here is a simpler method (hack) that gives you back the setdefaultencoding() function that was deleted from sys:

import sys
# sys.setdefaultencoding() does not exist here: site.py deleted it at startup.
reload(sys)  # Python 2 only: reloading sys restores the deleted function.
sys.setdefaultencoding('UTF8')

(Note for Python 3.4+: reload() lives in the importlib module there, but sys.setdefaultencoding() does not exist in those versions even after a reload, so this hack only applies to Python 2.)

This is not a safe thing to do, though: it is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Re-enabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).


Answer 1

If you get this error when you try to pipe or redirect the output of your script:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

just export PYTHONIOENCODING in the console and then run your code:

export PYTHONIOENCODING=utf8
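
The effect of the variable can also be checked from Python itself. The following is only a minimal sketch, assuming Python 3 and a child interpreter started through sys.executable; the helper name child_stdout_bytes is made up for illustration. It runs a child whose stdout is a pipe (the situation in which the error above shows up) with PYTHONIOENCODING set, and returns the raw bytes the child wrote.

```python
import os
import subprocess
import sys


def child_stdout_bytes(code, io_encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set.

    The child's stdout is a pipe, not a terminal, which is the situation
    in which the UnicodeEncodeError above typically appears.
    (Hypothetical helper, not part of any library.)
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child prints three non-ASCII characters; we look at the raw bytes.
    raw = child_stdout_bytes(r"print(u'\u00e4\u00f6\u00fc')", "utf8")
    print(raw)
```

Passing the variable through env= keeps the change local to that one child process instead of altering the whole shell session.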


Answer 2

A) To control sys.getdefaultencoding() output:

python -c 'import sys; print(sys.getdefaultencoding())'

ascii

Then

echo "import sys; sys.setdefaultencoding('utf-16-be')" > sitecustomize.py

and

PYTHONPATH=".:$PYTHONPATH" python -c 'import sys; print(sys.getdefaultencoding())'

utf-16-be

You could put your sitecustomize.py higher in your PYTHONPATH.

You might also like to try the reload(sys).setdefaultencoding hack by @EOL.

B) To control stdin.encoding and stdout.encoding you want to set PYTHONIOENCODING:

python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'

ascii ascii

Then

PYTHONIOENCODING="utf-16-be" python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'

utf-16-be utf-16-be

Finally: you can use A) or B) or both!
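
To see which of the two settings is actually in play, it can help to print them side by side. This is a small sketch with a made-up helper name (encoding_report). Note that on Python 3 sys.getdefaultencoding() is always 'utf-8' and cannot be changed, so only B) is relevant there.

```python
import sys


def encoding_report():
    """Collect the encodings that A) and B) above refer to (hypothetical helper)."""
    return {
        # A) used for implicit str/unicode conversions on Python 2;
        #    fixed to 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # B) used when reading from / printing to the standard streams;
        #    None if a stream was replaced by an object without .encoding.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names, independent of A) and B).
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%-10s %s" % (name, value))
```

Running it once directly and once with its output piped through another command shows whether the stream encodings differ between the two cases.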


Answer 3

Starting with PyDev 3.4.1, the default encoding is not being changed anymore. See this ticket for details.

For earlier versions, a solution is to make sure PyDev does not run with UTF-8 as the default encoding. In Eclipse, open the run dialog settings (“run configurations”, if I remember correctly); there you can choose the default encoding on the Common tab. Change it to US-ASCII if you want to get these errors ‘early’ (in other words: already in your PyDev environment). Also see an original blog post for this workaround.


Answer 4

Regarding python2 (and python2 only), some of the former answers rely on using the following hack:

import sys
reload(sys)  # Reload is a hack
sys.setdefaultencoding('UTF8')

Using it is discouraged (check this or this).

In my case, it came with a side effect: I’m using IPython notebooks, and once I ran the code, the print function no longer worked. I guess there is a solution to that, but I still think using the hack is not the correct option.

After trying many options, the one that worked for me was putting the same code in sitecustomize.py, where that piece of code is meant to be. After that module has been evaluated, the setdefaultencoding function is removed from sys.

So the solution is to append the following code to the file /usr/lib/python2.7/sitecustomize.py:

import sys
sys.setdefaultencoding('UTF8')

When I use virtualenvwrapper, the file I edit is ~/.virtualenvs/venv-name/lib/python2.7/sitecustomize.py.

And when I use Python notebooks with conda, it is ~/anaconda2/lib/python2.7/sitecustomize.py.
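
Since the directory differs per installation, it may be easier to ask the interpreter than to hard-code a path. The sketch below is written for Python 3 (sysconfig.get_paths() also exists on Python 2.7, importlib.util.find_spec does not); the helper names are made up. It only locates the file; the sys.setdefaultencoding() call itself remains a Python 2 matter.

```python
import importlib.util
import os
import sysconfig


def sitecustomize_candidates():
    """Return paths where a sitecustomize.py could live for this interpreter."""
    paths = sysconfig.get_paths()
    # 'stdlib' corresponds to .../lib/pythonX.Y, 'purelib' to site-packages.
    return [os.path.join(paths[key], "sitecustomize.py")
            for key in ("stdlib", "purelib")]


def existing_sitecustomize():
    """Return the file of an already importable sitecustomize, or None."""
    try:
        spec = importlib.util.find_spec("sitecustomize")
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("already present:", existing_sitecustomize())
    for path in sitecustomize_candidates():
        print("candidate:", path)
```

Run it with the interpreter of the environment in question (the virtualenv or conda one), because each environment reports its own directories.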


Answer 5

There is an insightful blog post about it.

See https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/.

I paraphrase its content below.

In Python 2, which was not as strongly typed regarding the encoding of strings, you could perform operations on differently encoded strings and succeed. For example, the following would return True:

u'Toshio' == 'Toshio'

That would hold for every (normal, unprefixed) string that was encoded in sys.getdefaultencoding(), which defaulted to ascii, but not others.

The default encoding was meant to be changed system-wide in site.py, but not somewhere else. The hacks (also presented here) to set it in user modules were just that: hacks, not the solution.

Python 3 did change the system encoding to default to UTF-8 (when LC_CTYPE is Unicode-aware), but the fundamental problem was solved by requiring byte strings to be explicitly encoded or decoded whenever they are used together with Unicode strings.
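
The difference can be seen with the same 'Toshio' comparison. This is just an illustrative sketch (the function name compare_and_concat is made up): on Python 3 a str and a bytes object never compare equal and cannot be concatenated until one side is converted explicitly.

```python
def compare_and_concat(text, raw):
    """Show how a text string and a byte string interact (illustrative only)."""
    # On Python 3 this is always False: str and bytes are different types.
    equal_without_decoding = (text == raw)
    # After an explicit decode both sides are text and can be compared.
    equal_after_decoding = (text == raw.decode("ascii"))
    try:
        text + raw
        concat_error = None
    except TypeError as exc:
        concat_error = type(exc).__name__
    return equal_without_decoding, equal_after_decoding, concat_error


if __name__ == "__main__":
    # Python 3 prints: (False, True, 'TypeError')
    print(compare_and_concat(u"Toshio", b"Toshio"))
```

On Python 2 the same call gives (True, True, None), because the byte string is silently decoded with the default encoding, which is exactly the implicit behavior that changing sys.setdefaultencoding() alters.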


Answer 6

First: calling reload(sys) and setting some arbitrary default encoding just to satisfy an output terminal stream is bad practice. reload often changes things in sys that have been put in place depending on the environment, e.g. the sys.stdin/stdout streams, sys.excepthook, etc.

Solving the encode problem on stdout

The best solution I know for the encode problem of printing unicode strings and beyond-ASCII str objects (e.g. from literals) on sys.stdout is to provide a sys.stdout (file-like object) that is capable and, optionally, tolerant regarding these needs:

  • When sys.stdout.encoding is None for some reason, or non-existent, or wrong, or “less” than what the stdout terminal or stream is really capable of, try to provide a correct .encoding attribute, ultimately by replacing sys.stdout and sys.stderr with a translating file-like object.

  • When the terminal/stream still cannot encode all occurring unicode characters, and you don’t want print calls to break just because of that, you can introduce an encode-with-replace behavior in the translating file-like object.

Here is an example:

#!/usr/bin/env python
# encoding: utf-8
import sys

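class SmartStdout:
    """File-like wrapper that encodes with a known encoding and never fails."""
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            # Avoid wrapping an already wrapped stream twice.
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        # Fall back to UTF-8 when the stream reports no encoding.
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        # Unencodable characters become backslash escapes instead of errors.
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        # Delegate everything else (flush, isatty, ...) to the real stream.
        return getattr(self.org_stdout, name)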

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()

    us = u'aouäöüфżß²'
    print us
    sys.stdout.flush()

Using beyond-ASCII plain string literals in Python 2 / 2 + 3 code

I think the only good reason to change the global default encoding (and to UTF-8 only) is an application source code decision, not I/O stream encoding issues: it lets you write beyond-ASCII string literals in code without being forced to always use the u'string' style. This can be done rather consistently (despite what anonbadger’s article says) by maintaining a Python 2 or Python 2 + 3 code base that uses ASCII or UTF-8 plain string literals consistently, as far as those strings may undergo silent unicode conversion, move between modules, or go to stdout. For that, prefer “# encoding: utf-8” or ASCII (no declaration). Change or drop libraries that still rely, in a very dumb way, on ASCII default encoding errors for characters beyond chr #127 (which is rare today).

In addition to the SmartStdout scheme above, do the following at application start (and/or via sitecustomize.py), without using reload(sys):

...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8') 
    s = 'aouäöüфżß²'
    print s

This way, string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were only Python 3. File I/O of course always needs special care regarding encodings, as it does in Python 3.

Note: plain strings are then implicitly converted from UTF-8 to unicode in SmartStdout before being converted to the output stream encoding.
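
For comparison, on Python 3 the same encode-with-replace idea does not need a custom class, because io.TextIOWrapper accepts an errors argument. The following is a minimal sketch under that assumption, not a port of SmartStdout; tolerant_writer is a made-up name and the BytesIO object merely stands in for sys.stdout.buffer.

```python
import io


def tolerant_writer(binary_stream, encoding="ascii"):
    """Wrap a binary stream so unencodable characters become backslash escapes."""
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",  # same error handler SmartStdout uses
        newline="\n",               # keep newlines untranslated on every platform
        line_buffering=True,
    )


if __name__ == "__main__":
    buffer = io.BytesIO()           # stands in for sys.stdout.buffer
    out = tolerant_writer(buffer, encoding="ascii")
    out.write(u"a\u00e4\u0444\n")
    out.flush()
    # Prints b'a\\xe4\\u0444\n': the two non-ASCII characters were escaped.
    print(buffer.getvalue())
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace") changes the error handler of the existing stream in place, which may be enough when only the tolerant behavior is wanted.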


Answer 7

Here is the approach I used to produce code that was compatible with both Python 2 and Python 3 and always produced UTF-8 output. I found this answer elsewhere, but I can’t remember the source.

This approach works by replacing sys.stdout with something that isn’t quite file-like (but still only uses things in the standard library). This may well cause problems for your underlying libraries, but in the simple case where you have good control over how sys.stdout is used throughout your framework, this can be a reasonable approach.

import io, sys
sys.stdout = io.open(sys.stdout.fileno(), 'w', encoding='utf8')
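
To try the idea without touching the real stdout, the same io.open() call can be pointed at a pipe. This is only a sketch: utf8_writer is a made-up helper, and closefd=False is my addition (not part of the one-liner above) so that closing the wrapper does not close the underlying descriptor.

```python
import io
import os


def utf8_writer(fd):
    """Open a text writer on an existing file descriptor, always encoding as UTF-8."""
    # closefd=False leaves the descriptor open when the wrapper is closed,
    # which matters if fd is the process-wide stdout (fd 1).
    return io.open(fd, "w", encoding="utf8", closefd=False)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()   # the pipe stands in for sys.stdout.fileno()
    writer = utf8_writer(write_fd)
    writer.write(u"\u00e4")
    writer.close()                  # flushes; write_fd itself stays open
    print(os.read(read_fd, 16))     # the raw UTF-8 bytes that were written
    os.close(write_fd)
    os.close(read_fd)
```

The bytes read back from the pipe are UTF-8 regardless of the locale or of PYTHONIOENCODING, which is the point of this approach.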

Answer 8

This fixed the issue for me.

import os
os.environ["PYTHONIOENCODING"] = "utf-8"
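
One caveat worth checking in your own setup: as far as I know, the interpreter reads PYTHONIOENCODING when it starts, so assigning it in os.environ should only matter for processes launched afterwards, not for the sys.stdout that already exists. The sketch below (Python 3, made-up helper name) compares both cases.

```python
import codecs
import os
import subprocess
import sys


def stdout_encoding_of_child():
    """Ask a freshly started interpreter which encoding its stdout uses."""
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"]
    )
    # Normalise spellings such as 'utf8' / 'UTF-8' to the codec's canonical name.
    return codecs.lookup(output.decode("ascii").strip()).name


if __name__ == "__main__":
    before = getattr(sys.stdout, "encoding", None)
    os.environ["PYTHONIOENCODING"] = "utf-8"
    after = getattr(sys.stdout, "encoding", None)
    print(before == after)              # True: the running interpreter is unchanged
    print(stdout_encoding_of_child())   # the child inherits the new setting
```

So this approach fits scripts that spawn other Python processes; for the current process, the variable has to be set before Python is started.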

Answer 9

This is a quick hack for anyone who is (1) on a Windows platform, (2) running Python 2.7, and (3) annoyed because a nice piece of software (i.e., not written by you, so not immediately a candidate for encode/decode printing maneuvers) won’t display the “pretty unicode characters” in the IDLE environment (Pythonwin prints unicode fine). An example is the neat first-order logic symbols that Stephan Boyer uses in the output of his pedagogic prover at First Order Logic Prover.

I didn’t like the idea of forcing a sys reload, and I couldn’t get the system to cooperate with setting environment variables like PYTHONIOENCODING (I tried a direct Windows environment variable and also dropping it into a sitecustomize.py in site-packages as a one-liner setting it to 'utf-8').

So, if you are willing to hack your way to success, go to your IDLE directory, typically “C:\Python27\Lib\idlelib”, and locate the file IOBinding.py. Make a copy of that file and store it somewhere else so you can revert to the original behavior when you choose. Open the file in idlelib with an editor (e.g., IDLE) and go to this code area:

# Encoding for file names
filesystemencoding = sys.getfilesystemencoding()

encoding = "ascii"
if sys.platform == 'win32':
    # On Windows, we could use "mbcs". However, to give the user
    # a portable encoding name, we need to find the code page 
    try:
        # --> 6/5/17 hack to force IDLE to display utf-8 rather than cp1252
        # --> encoding = locale.getdefaultlocale()[1]
        encoding = 'utf-8'
        codecs.lookup(encoding)
    except LookupError:
        pass

In other words, comment out the original code line following the try that set the encoding variable to locale.getdefaultlocale()[1] (because that will give you cp1252, which you don’t want) and instead brute-force it to 'utf-8' (by adding the line encoding = 'utf-8' as shown).

I believe this only affects IDLE display to stdout and not the encoding used for file names etc. (that is obtained in the filesystemencoding prior). If you have a problem with any other code you run in IDLE later, just replace the IOBinding.py file with the original unmodified file.


Answer 10

You could change the encoding of your entire operating system. On Ubuntu you can do this with:

sudo apt install locales 
sudo locale-gen en_US en_US.UTF-8    
sudo dpkg-reconfigure locales
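
After reconfiguring the locales (and opening a new shell session), it may be worth checking what Python actually picks up. A small sketch with a made-up helper name; which of the three environment variables is set depends on the system.

```python
import locale
import os


def locale_encoding_report():
    """Return the locale-related settings that influence Python's I/O encoding."""
    return {
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
        # Encoding Python derives from the locale for text I/O.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(locale_encoding_report().items()):
        print("%-10s %s" % (name, value))
```

If "preferred" still reports an ASCII-based encoding, the session is probably still running under the old locale.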