问题:在Python中管道输出标准输出时设置正确的编码

当传递Python程序的输出的管道时,Python解释器会对编码感到困惑,并将其设置为None。这意味着这样的程序:

# -*- coding: utf-8 -*-
print u"åäö"

正常运行时可以正常工作,但失败:

UnicodeEncodeError:’ascii’编解码器无法在位置0编码字符u’\ xa0’:序数不在范围内(128)

以管道顺序使用时。

使管道工作的最佳方法是什么?我能告诉它使用外壳程序/文件系统/正在使用的任何编码吗?

到目前为止,我所看到的建议是直接修改site.py,或使用此hack硬编码defaultencoding:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

有没有更好的方法可以使管道工作?

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xa0′ in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far is to modify your site.py directly, or hardcoding the defaultencoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?


回答 0

您的代码在脚本中运行时有效,因为Python将输出编码为您的终端应用程序正在使用的任何编码。如果要进行管道传输,则必须自己对其进行编码。

经验法则是:始终在内部使用Unicode。解码收到的内容,并对发送的内容进行编码。

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

另一个教学示例是一个Python程序,用于在ISO-8859-1和UTF-8之间进行转换,从而使两者之间的所有内容均大写。

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)

设置系统默认编码不是一个好主意,因为您使用的某些模块和库可能依赖于它是ASCII的事实。不要这样

Your code works when run in an script because Python encodes the output to whatever encoding your terminal application is using. If you are piping you must encode it yourself.

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)

Setting the system default encoding is a bad idea, because some modules and libraries you use can rely on the fact it is ASCII. Don’t do it.


回答 1

首先,关于此解决方案:

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

每次都使用给定的编码显式打印是不实际的。那将是重复的并且容易出错。

更好的解决方案是sys.stdout在程序开始时进行更改,以使用选定的编码进行编码。这是我在Python上找到的一种解决方案:如何选择sys.stdout.encoding?,特别是“ toka”的评论:

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)

First, regarding this solution:

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

It’s not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.

A better solution is to change sys.stdout at the start of your program, to encode with a selected encoding. Here is one solution I found on Python: How is sys.stdout.encoding chosen?, in particular a comment by “toka”:

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)

回答 2

您可能需要尝试将环境变量“ PYTHONIOENCODING”更改为“ utf_8”。我写了一篇关于这个问题的磨难页面

博客文章的Tl; dr:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

给你

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ☺ ☻

You may want to try changing the environment variable “PYTHONIOENCODING” to “utf_8”. I have written a page on my ordeal with this problem.

Tl;dr of the blog post:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

gives you

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ☺ ☻

回答 3

export PYTHONIOENCODING=utf-8

做这项工作,但不能在python本身上设置它…

我们可以做的是验证是否未设置,并在调用脚本之前通过以下命令告诉用户进行设置:

if __name__ == '__main__':
    if (sys.stdout.encoding is None):
        print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when write to stdout."
        exit(1)

更新以回复评论:该问题仅在传递到stdout时存在。我在Fedora 25 Python 2.7.13中进行了测试

python --version
Python 2.7.13

猫b.py

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import sys

print sys.stdout.encoding

运行./b.py

UTF-8

运行./b.py | 减

None
export PYTHONIOENCODING=utf-8

do the job, but can’t set it on python itself …

what we can do is verify if isn’t setting and tell the user to set it before call script with :

if __name__ == '__main__':
    if (sys.stdout.encoding is None):
        print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when write to stdout."
        exit(1)

Update to reply to the comment: the problem just exist when piping to stdout . I tested in Fedora 25 Python 2.7.13

python --version
Python 2.7.13

cat b.py

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import sys

print sys.stdout.encoding

running ./b.py

UTF-8

running ./b.py | less

None

回答 4

上周有一个类似的问题。在我的IDE(PyCharm)中很容易修复。

这是我的解决方法:

从PyCharm菜单栏开始:文件->设置…->编辑器->文件编码,然后将:“ IDE编码”,“项目编码”和“属性文件的默认编码”全部设置为UTF-8,她现在可以工作了像个魅力。

希望这可以帮助!

I had a similar issue last week. It was easy to fix in my IDE (PyCharm).

Here was my fix:

Starting from PyCharm menu bar: File -> Settings… -> Editor -> File Encodings, then set: “IDE Encoding”, “Project Encoding” and “Default encoding for properties files” ALL to UTF-8 and she now works like a charm.

Hope this helps!


回答 5

克雷格·麦昆(Craig McQueen)的答案可能是经过消毒的版本。

import sys, codecs
class EncodedOut:
    def __init__(self, enc):
        self.enc = enc
        self.stdout = sys.stdout
    def __enter__(self):
        if sys.stdout.encoding is None:
            w = codecs.getwriter(self.enc)
            sys.stdout = w(sys.stdout)
    def __exit__(self, exc_ty, exc_val, tb):
        sys.stdout = self.stdout

用法:

with EncodedOut('utf-8'):
    print u'ÅÄÖåäö'

An arguable sanitized version of Craig McQueen’s answer.

import sys, codecs
class EncodedOut:
    def __init__(self, enc):
        self.enc = enc
        self.stdout = sys.stdout
    def __enter__(self):
        if sys.stdout.encoding is None:
            w = codecs.getwriter(self.enc)
            sys.stdout = w(sys.stdout)
    def __exit__(self, exc_ty, exc_val, tb):
        sys.stdout = self.stdout

Usage:

with EncodedOut('utf-8'):
    print u'ÅÄÖåäö'

回答 6

我可以通过以下方式“自动化”它:

def __fix_io_encoding(last_resort_default='UTF-8'):
  import sys
  if [x for x in (sys.stdin,sys.stdout,sys.stderr) if x.encoding is None] :
      import os
      defEnc = None
      if defEnc is None :
        try:
          import locale
          defEnc = locale.getpreferredencoding()
        except: pass
      if defEnc is None :
        try: defEnc = sys.getfilesystemencoding()
        except: pass
      if defEnc is None :
        try: defEnc = sys.stdin.encoding
        except: pass
      if defEnc is None :
        defEnc = last_resort_default
      os.environ['PYTHONIOENCODING'] = os.environ.get("PYTHONIOENCODING",defEnc)
      os.execvpe(sys.argv[0],sys.argv,os.environ)
__fix_io_encoding() ; del __fix_io_encoding

是的,如果此“ setenv”失败,则有可能在此处获得无限循环。

I could “automate” it with a call to:

def __fix_io_encoding(last_resort_default='UTF-8'):
  import sys
  if [x for x in (sys.stdin,sys.stdout,sys.stderr) if x.encoding is None] :
      import os
      defEnc = None
      if defEnc is None :
        try:
          import locale
          defEnc = locale.getpreferredencoding()
        except: pass
      if defEnc is None :
        try: defEnc = sys.getfilesystemencoding()
        except: pass
      if defEnc is None :
        try: defEnc = sys.stdin.encoding
        except: pass
      if defEnc is None :
        defEnc = last_resort_default
      os.environ['PYTHONIOENCODING'] = os.environ.get("PYTHONIOENCODING",defEnc)
      os.execvpe(sys.argv[0],sys.argv,os.environ)
__fix_io_encoding() ; del __fix_io_encoding

Yes, it’s possible to get an infinite loop here if this “setenv” fails.


回答 7

我只是以为我在这里提到了一些东西,在我最终意识到发生了什么之前,我不得不花很长时间进行试验。对于这里的每个人来说,这可能是如此明显,以至于他们都没有理会它。但是如果他们有的话,这对我会有所帮助,所以按照这个原则…!

注意:我专门使用的是Jython 2.7版,所以可能这可能不适用于CPython

NB2:我的.py文件的前两行是:

# -*- coding: utf-8 -*-
from __future__ import print_function

“%”(也称为“插值运算符”)字符串构造机制也会引起其他问题……如果“环境”的默认编码为ASCII,则尝试执行类似的操作

print( "bonjour, %s" % "fréd" )  # Call this "print A"

您将在Eclipse中运行没有困难…在Windows CLI(DOS窗口)中,您会发现编码是代码页850(我的Windows 7 OS)或类似的东西,至少可以处理欧洲带有重音符号的字符,因此它会工作的。

print( u"bonjour, %s" % "fréd" ) # Call this "print B"

也可以。

如果是OTOH,您从CLI定向到文件,则stdout编码将为None,它将默认设置为ASCII(无论如何在我的OS上),它将无法处理以上任何打印…(可怕的编码)错误)。

因此,您可能会考虑使用来重定向您的标准输出

sys.stdout = codecs.getwriter('utf8')(sys.stdout)

并尝试在CLI管道中运行到文件…很奇怪,上面的打印A可以工作…但是上面的打印B将抛出编码错误!但是,以下内容可以正常运行:

print( u"bonjour, " + "fréd" ) # Call this "print C"

我得出的结论(临时)是,如果将使用“ u”前缀指定为Unicode字符串的字符串提交给%-handling机制,则似乎涉及使用默认环境编码,无论是否已将stdout设置为重定向!

人们如何处理这是一个选择问题。我欢迎Unicode专家说出为什么会发生这种情况,我是否以某种方式出错了,对此的首选解决方案,是否也适用于CPython,它是否发生在Python 3中,等等。

I just thought I’d mention something here which I had to spent a long time experimenting with before I finally realised what was going on. This may be so obvious to everyone here that they haven’t bothered mentioning it. But it would’ve helped me if they had, so on that principle…!

NB: I am using Jython specifically, v 2.7, so just possibly this may not apply to CPython

NB2: the first two lines of my .py file here are:

# -*- coding: utf-8 -*-
from __future__ import print_function

The “%” (AKA “interpolation operator”) string construction mechanism causes ADDITIONAL problems too… If the default encoding of the “environment” is ASCII and you try to do something like

print( "bonjour, %s" % "fréd" )  # Call this "print A"

You will have no difficulty running in Eclipse… In a Windows CLI (DOS window) you will find that the encoding is code page 850 (my Windows 7 OS) or something similar, which can handle European accented characters at least, so it’ll work.

print( u"bonjour, %s" % "fréd" ) # Call this "print B"

will also work.

If, OTOH, you direct to a file from the CLI, the stdout encoding will be None, which will default to ASCII (on my OS anyway), which will not be able to handle either of the above prints… (dreaded encoding error).

So then you might think of redirecting your stdout by using

sys.stdout = codecs.getwriter('utf8')(sys.stdout)

and try running in the CLI piping to a file… Very oddly, print A above will work… But print B above will throw the encoding error! The following will however work OK:

print( u"bonjour, " + "fréd" ) # Call this "print C"

The conclusion I have come to (provisionally) is that if a string which is specified to be a Unicode string using the “u” prefix is submitted to the %-handling mechanism it appears to involve the use of the default environment encoding, regardless of whether you have set stdout to redirect!

How people deal with this is a matter of choice. I would welcome a Unicode expert to say why this happens, whether I’ve got it wrong in some way, what the preferred solution to this, whether it also applies to CPython, whether it happens in Python 3, etc., etc.


回答 8

我在旧版应用程序中遇到了这个问题,很难确定打印的内容。我帮助自己解决了这个问题:

# encoding_utf8.py
import codecs
import builtins


def print_utf8(text, **kwargs):
    print(str(text).encode('utf-8'), **kwargs)


def print_utf8(fn):
    def print_fn(*args, **kwargs):
        return fn(str(*args).encode('utf-8'), **kwargs)
    return print_fn


builtins.print = print_utf8(print)

在我的脚本之上,test.py:

import encoding_utf8
string = 'Axwell Λ Ingrosso'
print(string)

请注意,这会将所有调用更改为使用编码进行打印,因此您的控制台将打印以下内容:

$ python test.py
b'Axwell \xce\x9b Ingrosso'

I ran into this problem in a legacy application, and it was difficult to identify where what was printed. I helped myself with this hack:

# encoding_utf8.py
import codecs
import builtins


def print_utf8(text, **kwargs):
    print(str(text).encode('utf-8'), **kwargs)


def print_utf8(fn):
    def print_fn(*args, **kwargs):
        return fn(str(*args).encode('utf-8'), **kwargs)
    return print_fn


builtins.print = print_utf8(print)

On top of my script, test.py:

import encoding_utf8
string = 'Axwell Λ Ingrosso'
print(string)

Note that this changes ALL calls to print to use an encoding, so your console will print this:

$ python test.py
b'Axwell \xce\x9b Ingrosso'

回答 9

在Windows上,当从编辑器(例如Sublime Text)运行Python代码时,我经常遇到此问题,但没有从命令行运行它时。

在这种情况下,请检查编辑器的参数。对于SublimeText,这Python.sublime-build解决了它:

{
  "cmd": ["python", "-u", "$file"],
  "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
  "selector": "source.python",
  "encoding": "utf8",
  "env": {"PYTHONIOENCODING": "utf-8", "LANG": "en_US.UTF-8"}
}

On Windows, I had this problem very often when running a Python code from an editor (like Sublime Text), but not if running it from command-line.

In this case, check your editor’s parameters. In the case of SublimeText, this Python.sublime-build solved it:

{
  "cmd": ["python", "-u", "$file"],
  "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
  "selector": "source.python",
  "encoding": "utf8",
  "env": {"PYTHONIOENCODING": "utf-8", "LANG": "en_US.UTF-8"}
}

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。