Python __str__ vs. __unicode__

Question: Python __str__ vs. __unicode__

Is there a Python convention for when you should implement __str__() versus __unicode__()? I’ve seen classes override __unicode__() more frequently than __str__(), but it doesn’t appear to be consistent. Are there specific rules for when it is better to implement one versus the other? Is it necessary or good practice to implement both?


Answer 0

__str__() is the old method: it returns bytes. __unicode__() is the new, preferred method: it returns characters. The names are a bit confusing, but in Python 2.x we’re stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__(), and create a stub __str__() method:

def __str__(self):
    return unicode(self).encode('utf-8')

In Python 3, str contains characters, so the equivalent methods are named __bytes__() and __str__(), and they behave as expected.
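
To make the mapping between the two naming schemes concrete, here is a minimal sketch of one class that follows the advice above on either interpreter. The Greeting class, its _text() helper, and the PY2 flag are hypothetical names chosen for illustration; they are not part of the answer.

```python
import sys

PY2 = sys.version_info[0] == 2  # True only on a Python 2 interpreter


class Greeting(object):
    """Hypothetical example class; the name is not from the answer."""

    def __init__(self, name):
        self.name = name

    def _text(self):
        # All formatting lives in one place and returns text (characters).
        return u"Hello, {0}".format(self.name)

    if PY2:
        # Python 2: __unicode__ returns text, __str__ is the byte-string stub.
        def __unicode__(self):
            return self._text()

        def __str__(self):
            return self._text().encode("utf-8")
    else:
        # Python 3: __str__ returns text, __bytes__ returns the encoded form.
        def __str__(self):
            return self._text()

        def __bytes__(self):
            return self._text().encode("utf-8")


greeting = Greeting(u"Zo\xeb")
print(repr(str(greeting)))
```

Keeping the formatting in a single helper means the Python 2 and Python 3 branches cannot drift apart; only the method names and the encoding step differ.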


Answer 1

If I didn’t especially care about micro-optimizing stringification for a given class I’d always implement __unicode__ only, as it’s more general. When I do care about such minute performance issues (which is the exception, not the rule), having __str__ only (when I can prove there never will be non-ASCII characters in the stringified output) or both (when both are possible), might help.

These are, I think, solid principles, but in practice it’s very common to KNOW there will be nothing but ASCII characters without making the effort to prove it (e.g. the stringified form only has digits, punctuation, and maybe a short ASCII name ;-). In that case it’s quite typical to move directly to the “just __str__” approach. That said, if a programming team I worked with proposed a local guideline to avoid that, I’d be +1 on the proposal, as it’s easy to err in these matters AND “premature optimization is the root of all evil in programming” ;-).


Answer 2

With the world getting smaller, chances are that any string you encounter will contain Unicode eventually. So for any new apps, you should at least provide __unicode__(). Whether you also override __str__() is then just a matter of taste.


Answer 3

If you are working in both Python 2 and Python 3 in Django, I recommend the python_2_unicode_compatible decorator:

Django provides a simple way to define __str__() and __unicode__() methods that work on Python 2 and 3: you must define a __str__() method returning text and apply the python_2_unicode_compatible() decorator.

As noted in earlier comments on another answer, some versions of future.utils also support this decorator. On my system, I needed to install a newer future module for Python 2 and install future for Python 3. After that, here is a functional example:

#! /usr/bin/env python

from future.utils import python_2_unicode_compatible
from sys import version_info

@python_2_unicode_compatible
class SomeClass():
    def __str__(self):
        return "Called __str__"


if __name__ == "__main__":
    some_inst = SomeClass()
    print(some_inst)
    if (version_info > (3,0)):
        print("Python 3 does not support unicode()")
    else:
        print(unicode(some_inst))

Here is example output (where venv2/venv3 are virtualenv instances):

~/tmp$ ./venv3/bin/python3 demo_python_2_unicode_compatible.py 
Called __str__
Python 3 does not support unicode()

~/tmp$ ./venv2/bin/python2 demo_python_2_unicode_compatible.py 
Called __str__
Called __str__
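
For readers curious about what such a decorator does, here is a rough sketch of the idea. The name text_str_compat is hypothetical, and this is an approximation written for illustration, not the actual Django or future source: on Python 2 it moves the text-returning __str__ to __unicode__ and installs a __str__ that encodes to UTF-8, while on Python 3 it leaves the class untouched.

```python
import sys

PY2 = sys.version_info[0] == 2  # True only on a Python 2 interpreter


def text_str_compat(cls):
    """Hypothetical stand-in for python_2_unicode_compatible (not library code)."""
    if PY2:
        if "__str__" not in cls.__dict__:
            raise ValueError("%s must define __str__()" % cls.__name__)
        # Move the text-returning method to __unicode__ ...
        cls.__unicode__ = cls.__str__
        # ... and make __str__ return UTF-8 encoded bytes.
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    return cls


@text_str_compat
class SomeClass(object):
    def __str__(self):
        return u"Called __str__"


some_inst = SomeClass()
print(str(some_inst))  # prints: Called __str__
```

In real projects the library-provided decorator is the better choice; the sketch is only meant to show why a single text-returning __str__() is enough once the decorator is applied.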

Answer 4

Python 2: Implement __str__() only, and return a unicode.

When __unicode__() is omitted and someone calls unicode(o) or u"%s"%o, Python calls o.__str__() and converts to unicode using the system encoding. (See documentation of __unicode__().)

The opposite is not true. If you implement __unicode__() but not __str__(), then when someone calls str(o) or "%s"%o, Python returns repr(o).


Rationale

Why would it work to return a unicode from __str__()?
If __str__() returns a unicode, Python automatically converts it to str using the system encoding.

What’s the benefit?
① It frees you from worrying about what the system encoding is (i.e., locale.getpreferredencoding(…)). Personally, I find that messy, and I think it’s something the system should take care of anyway. ② If you are careful, your code may come out cross-compatible with Python 3, in which __str__() returns unicode.

Isn’t it deceptive to return a unicode from a function called __str__()?
A little. However, you might be already doing it. If you have from __future__ import unicode_literals at the top of your file, there’s a good chance you’re returning a unicode without even knowing it.

What about Python 3?
Python 3 does not use __unicode__(). However, if you implement __str__() so that it returns unicode under either Python 2 or Python 3, then that part of your code will be cross-compatible.

What if I want unicode(o) to be substantively different from str()?
Implement both __str__() (possibly returning str) and __unicode__(). I imagine this would be rare, but you might want substantively different output (e.g., ASCII versions of special characters, like ":)" for u"☺").

I realize some may find this controversial.
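
As an illustration of the last case above, where unicode(o) and str(o) are meant to differ, here is a minimal sketch. The Smiley class is hypothetical. Note that only Python 2 calls __unicode__() implicitly (through unicode(o) or u"%s" % o); on Python 3 the method is ignored unless it is called by name.

```python
class Smiley(object):
    """Hypothetical class whose ASCII and text renderings differ on purpose."""

    def __str__(self):
        # ASCII-safe rendering for consumers that cannot handle non-ASCII.
        return ":)"

    def __unicode__(self):
        # Richer rendering; only Python 2 calls this implicitly via unicode(o).
        return u"\u263a"


face = Smiley()
print(str(face))  # prints: :)
```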


Answer 5

It’s worth pointing out to those unfamiliar with the __unicode__ function some of the default behaviors surrounding it back in Python 2.x, especially when defined side by side with __str__.

class A :
    def __init__(self) :
        self.x = 123
        self.y = 23.3

    #def __str__(self) :
    #    return "STR      {}      {}".format( self.x , self.y)
    def __unicode__(self) :
        return u"UNICODE  {}      {}".format( self.x , self.y)

a1 = A()
a2 = A()

print( "__repr__ checks")
print( a1 )
print( a2 )

print( "\n__str__ vs __unicode__ checks")
print( str( a1 ))
print( unicode(a1))
print( "{}".format( a1 ))
print( u"{}".format( a1 ))

yields the following console output…

__repr__ checks
<__main__.A instance at 0x103f063f8>
<__main__.A instance at 0x103f06440>

__str__ vs __unicode__ checks
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3

Now, when I uncomment the __str__ method, the output becomes:

__repr__ checks
STR      123      23.3
STR      123      23.3

__str__ vs __unicode__ checks
STR      123      23.3
UNICODE  123      23.3
STR      123      23.3
UNICODE  123      23.3