Python, Unicode, and the Windows console

Question: Python, Unicode, and the Windows console

When I try to print a Unicode string in a Windows console, I get a UnicodeEncodeError: 'charmap' codec can't encode character .... error. I assume this is because the Windows console does not accept Unicode-only characters. What’s the best way around this? Is there any way I can make Python automatically print a ? instead of failing in this situation?

Edit: I’m using Python 2.5.


Note: @LasseV.Karlsen's answer (the one with the checkmark) is somewhat outdated (from 2008). Please use the solutions/answers/suggestions below with care!

@JFSebastian's answer is more relevant as of today (6 Jan 2016).


Answer 0

Note: This answer is somewhat outdated (from 2008). Please use the solution below with care!


Here is a page that details the problem and a solution (search the page for the text "Wrapping sys.stdout into an instance"):

PrintFails – Python Wiki

Here’s a code excerpt from that page:

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line'
  UTF-8
  <type 'unicode'> 2
  Б
  Б

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line' | cat
  None
  <type 'unicode'> 2
  Б
  Б

There’s some more information on that page, well worth a read.
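The excerpt above is Python 2. As a rough Python 3 sketch of the same wrapping idea, the snippet below wraps a binary stream with codecs.getwriter so that writing text encodes it on the way out. The helper name make_encoding_writer and the in-memory buffer (standing in for sys.stdout.buffer) are illustrative assumptions, not part of the wiki page.

```python
import codecs
import io


def make_encoding_writer(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream so that str written to it is encoded first.

    Hypothetical helper for illustration only.
    """
    writer_class = codecs.getwriter(encoding)
    return writer_class(binary_stream, errors=errors)


# An in-memory buffer stands in for sys.stdout.buffer so the result can be inspected.
buffer = io.BytesIO()
writer = make_encoding_writer(buffer, encoding="utf-8")
writer.write("\u0411\n")  # CYRILLIC CAPITAL LETTER BE, as in the excerpt

print(buffer.getvalue())  # the UTF-8 bytes that reached the underlying stream
```

In a real script the wrapped stream would be sys.stdout.buffer rather than a BytesIO object.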


Answer 1

Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.


I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.

The error means that the Unicode characters you are trying to print can’t be represented in the current (chcp) console character encoding. The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 of the ~1M Unicode characters:

>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to <undefined>
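To see which encodings can represent a given character, a small check like the following may help. It is only a sketch: can_encode is a hypothetical helper name, and the three encodings are arbitrary examples rather than anything the answer prescribes.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp437 has no euro sign, while cp1252 and UTF-8 do.
for name in ("cp437", "cp1252", "utf-8"):
    print(name, can_encode("\N{EURO SIGN}", name))
```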

I assume this is because the Windows console does not accept Unicode-only characters. What’s the best way around this?

The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood’s answer. It can be called transparently, i.e., you don’t need to, and should not, modify your scripts if you use the win-unicode-console package:

T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py

See What’s the deal with Python 3.4, Unicode, different languages and Windows?

Is there any way I can make Python automatically print a ? instead of failing in this situation?

If replacing all unencodable characters with ? is enough in your case, you could set the PYTHONIOENCODING environment variable:

T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]

In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
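The same "replace" error handler can also be chosen from inside a program. The sketch below is an illustration under stated assumptions: it wraps an in-memory binary buffer (standing in for a cp437 console) in io.TextIOWrapper with errors="replace", and make_replacing_stream is a hypothetical helper name.

```python
import io


def make_replacing_stream(binary_stream, encoding):
    """Text stream that writes '?' for characters the encoding cannot represent.

    Hypothetical helper; newline="" disables newline translation so the
    bytes are the same on every platform.
    """
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors="replace", newline=""
    )


raw = io.BytesIO()  # stands in for a console limited to cp437
stream = make_replacing_stream(raw, "cp437")
print("[\N{EURO SIGN}]", file=stream)
stream.flush()

print(raw.getvalue())

# On Python 3.7+, a real sys.stdout can be adjusted in place instead:
#     sys.stdout.reconfigure(errors="replace")
```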


Answer 2

Despite the other plausible-sounding answers that suggest changing the code page to 65001, that does not work. (Also, changing the default encoding using sys.setdefaultencoding is not a good idea.)

See this question for details and code that does work.


Answer 3

If you’re not interested in getting a reliable representation of the bad character(s) you might use something like this (working with python >= 2.6, including 3.x):

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")

The bad character(s) in the string will be converted into a representation that the Windows console can print.
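If a predictable representation is preferred, one alternative (a different technique from the answer's, shown here only as a sketch) is to encode with the "backslashreplace" error handler, which turns unencodable characters into escape sequences. The name to_printable is hypothetical, and cp437 is just an example target encoding.

```python
def to_printable(text, encoding):
    """Round-trip text through encoding, escaping what it cannot represent.

    Hypothetical helper: unencodable characters become backslash escapes
    such as \\u2014, everything else is kept as is.
    """
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


print(to_printable("a \N{EM DASH} b", "cp437"))
```

In a script, the encoding argument would typically be sys.stdout.encoding, with a fallback such as "ascii" when that attribute is None.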


Answer 4

The code below makes Python write its console output as UTF-8, even on Windows.

The console displays the characters well on Windows 7 but not on Windows XP. Still, it works, and most importantly you get consistent output from your script on all platforms. You’ll also be able to redirect the output to a file.

The code below was tested with Python 2.6 on Windows.


#!/usr/bin/python
# -*- coding: UTF-8 -*-

import codecs, sys

reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console
    except ImportError:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        sys.exit(-1)
    # The win32console implementation of SetConsoleCP does not return a value,
    # so read the code page back to verify that the change took effect.
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if win32console.GetConsoleCP() != 65001:
        raise Exception("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if win32console.GetConsoleOutputCP() != 65001:
        raise Exception("Cannot set console output codepage to 65001 (UTF-8)")

#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"

Answer 5

Just enter this on the command line before running the Python script:

chcp 65001 & set PYTHONIOENCODING=utf-8

Answer 6

Like Giampaolo Rodolà’s answer, but even dirtier: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles.

For the moment I just wanted something that would mean my program would NOT CRASH, and that I understood … and that also didn’t involve importing too many exotic modules (in particular, I’m using Jython, so half the time a Python module turns out not to be available).

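from __future__ import print_function  # needed for print(..., end='') on Python 2 / Jython

def pr(s):
    try:
        print(s)
    except UnicodeEncodeError:
        # Fall back to printing one character at a time,
        # substituting '?' for anything the console cannot encode.
        for c in s:
            try:
                print(c, end='')
            except UnicodeEncodeError:
                print('?', end='')
        print()  # finish the line, as print(s) would have done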

NB “pr” is shorter to type than “print” (and quite a bit shorter to type than “safeprint”)…!


Answer 7

For Python 2 try:

print unicode(string, 'unicode-escape')

For Python 3 try:

import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)

Or try win-unicode-console:

pip install win-unicode-console
py -mrun your_script.py

Answer 8

TL;DR:

print(yourstring.encode('ascii','replace'))

I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)

I wanted to parse chat messages in order to respond…

msg = s.recv(1024).decode("utf-8")

but also print them safely to the console in a human-readable format:

print(msg.encode('ascii','replace'))

This fixed the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the non-ASCII characters with ?.
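The answer targets Python 2.7. On Python 3, encode() returns a bytes object, so printing it directly shows a b'...' literal. A minimal Python 3 sketch of the same idea decodes the result back to text; ascii_safe is a hypothetical helper name and the sample string is made up.

```python
def ascii_safe(text):
    """Replace every non-ASCII character with '?' and return a str.

    Hypothetical helper for illustration only.
    """
    return text.encode("ascii", "replace").decode("ascii")


print(ascii_safe("caf\u00e9 costs 3\N{EURO SIGN}"))
```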


Answer 9

The cause of your problem is NOT that the Windows console is unwilling to accept Unicode (it has done so by default since, I guess, Win2k). It is the default system encoding. Try this code and see what it gives you:

import sys
sys.getdefaultencoding()

If it says ascii, there’s your cause ;-) You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something), with the following contents:

import sys
sys.setdefaultencoding('utf-8')

and perhaps you might want to specify the encoding in your files as well:

# -*- coding: UTF-8 -*-
import sys,time

Edit: more info can be found in the excellent Dive into Python book.


Answer 10

Related to the answer by J. F. Sebastian, but more direct.

If you are having this problem when printing to the console/terminal, then do this:

>set PYTHONIOENCODING=UTF-8
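To observe what the variable changes, the following sketch launches a child interpreter with PYTHONIOENCODING set and captures the raw bytes it writes. The helper name run_child_with_io_encoding is hypothetical, and the child's output goes to a pipe rather than a console, so this illustrates the variable's effect on redirected output only.

```python
import os
import subprocess
import sys


def run_child_with_io_encoding(source, io_encoding):
    """Run source in a child Python with PYTHONIOENCODING set; return its raw stdout.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", source],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child prints a single euro sign.
child_source = "print('\\N{EURO SIGN}')"
output = run_child_with_io_encoding(child_source, "utf-8")
print(output.strip())  # raw bytes emitted by the child, without the line ending
```

The variable also accepts an optional error handler after a colon, for example "ascii:replace".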

Answer 11

Python 3.6, Windows 7: There are several ways to launch Python. You can use the Python console (which has a Python logo on it) or the Windows console (which has cmd.exe written on it).

I could not print UTF-8 characters in the Windows console. Printing them gave me this error:

OSError: [WinError 87] The parameter is incorrect
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
OSError: [WinError 87] The parameter is incorrect

After trying and failing to understand the answer above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window and, on the Font tab, choose Lucida Console.


Answer 12

James Sulak asked,

Is there any way I can make Python automatically print a ? instead of failing in this situation?

Other solutions recommend we attempt to modify the Windows environment or replace Python’s print() function. The answer below comes closer to fulfilling Sulak’s request.

Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:

    In place of:    print(text)
    substitute:     print(str(text).encode('utf-8'))

Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:

  Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir

Instead of

  Halmalo n’était plus qu’un point noir

Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.

Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
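As a small sketch of what this substitution produces on Python 3: printing a bytes object shows its repr, so the real output is wrapped in b'...', which the sample line above leaves out. The built-in ascii() is shown alongside as a related option that keeps the result a str; both variable names are just for illustration.

```python
text = "Halmalo n\u2019\u00e9tait plus qu\u2019un point noir"

# What print(str(text).encode('utf-8')) displays: the repr of the UTF-8 bytes.
as_bytes_repr = repr(text.encode("utf-8"))

# ascii() escapes non-ASCII characters as \x, \u or \U sequences instead.
as_ascii_repr = ascii(text)

print(as_bytes_repr)
print(as_ascii_repr)
```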