Python:对Unicode转义的字符串使用.format()

问题:Python:对Unicode转义的字符串使用.format()

我正在使用Python 2.6.5。我的代码要求使用“大于或等于”符号。它去了:

>>> s = u'\u2265'
>>> print s
>>> 
>>> print "{0}".format(s)
Traceback (most recent call last):
     File "<input>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265'
  in position 0: ordinal not in range(128)`  

为什么会出现此错误?有正确的方法吗?我需要使用该.format()功能。

I am using Python 2.6.5. My code requires the use of the “more than or equal to” sign. Here it goes:

>>> s = u'\u2265'
>>> print s
>>> ≥
>>> print "{0}".format(s)
Traceback (most recent call last):
     File "<input>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265'
  in position 0: ordinal not in range(128)`  

Why do I get this error? Is there a right way to do this? I need to use the .format() function.


回答 0

只需将第二个字符串也设为unicode字符串

>>> s = u'\u2265'
>>> print s

>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>> 

Just make the second string also a unicode string

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>> 

回答 1

unicode需要unicode格式字符串。

>>> print u'{0}'.format(s)

unicodes need unicode format strings.

>>> print u'{0}'.format(s)
≥

回答 2

一点的更多信息,为什么出现这种情况。

>>> s = u'\u2265'
>>> print s

之所以起作用,是因为print自动为您的环境使用系统编码,该编码很可能已设置为UTF-8。(您可以通过做检查import sys; print sys.stdout.encoding

>>> print "{0}".format(s)

失败,因为format尝试匹配调用它的类型的编码(我找不到关于它的文档,但这是我注意到的行为)。由于字符串文字是python 2中编码为ASCII的字节字符串,因此format尝试将其编码s为ASCII,然后导致该异常。观察:

>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

因此,这基本上就是这些方法起作用的原因:

>>> s = u'\u2265'
>>> print u'{}'.format(s)

>>> print '{}'.format(s.encode('utf-8'))

源字符集由编码声明定义。如果源文件中没有给出编码声明,则为ASCII(https://docs.python.org/2/reference/lexical_analysis.html#string-literals

A bit more information on why that happens.

>>> s = u'\u2265'
>>> print s

works because print automatically uses the system encoding for your environment, which was likely set to UTF-8. (You can check by doing import sys; print sys.stdout.encoding)

>>> print "{0}".format(s)

fails because format tries to match the encoding of the type that it is called on (I couldn’t find documentation on this, but this is the behavior I’ve noticed). Since string literals are byte strings encoded as ASCII in python 2, format tries to encode s as ASCII, which then results in that exception. Observe:

>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

So that is basically why these approaches work:

>>> s = u'\u2265'
>>> print u'{}'.format(s)
≥
>>> print '{}'.format(s.encode('utf-8'))
≥

The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)