
What are data classes and how are they different from common classes?

With PEP 557, data classes were introduced into the Python standard library.

They make use of the @dataclass decorator and they are supposed to be “mutable namedtuples with default” but I’m not really sure I understand what this actually means and how they are different from common classes.

What exactly are Python data classes and when is it best to use them?


Answer 0

Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you have made a data class.

What the dataclasses module does is make it easier to create data classes. It takes care of a lot of boilerplate for you.

This is especially important when your data class must be hashable; this requires a __hash__ method as well as an __eq__ method. If you add a custom __repr__ method for ease of debugging, that can become quite verbose:

class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
            self, 
            name: str, 
            unit_price: float,
            quantity_on_hand: int = 0
        ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})')

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) == 
            (other.name, other.unit_price, other.quantity_on_hand))

With dataclasses you can reduce it to:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
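
To see what the decorated version generates, here is a minimal usage sketch (the item name and numbers are made up for illustration):

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

# The generated __init__ accepts fields positionally or by keyword.
widget = InventoryItem("widget", 2.5, quantity_on_hand=4)
print(widget)               # generated __repr__
print(widget.total_cost())  # ordinary method, left untouched by the decorator

# The generated __eq__ and __hash__ both work on the tuple of field values.
same = InventoryItem("widget", 2.5, 4)
print(widget == same, hash(widget) == hash(same))
```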

The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.
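
A short sketch of both options together, using a hypothetical Version class (not from the original answer):

```python
import dataclasses

@dataclasses.dataclass(order=True, frozen=True)
class Version:
    """Hypothetical example class: orderable and immutable."""
    major: int
    minor: int = 0

# order=True: instances sort like the tuples (major, minor).
releases = sorted([Version(1, 2), Version(0, 9), Version(1, 0)])
print(releases)

# frozen=True (with the default eq=True) also makes instances hashable.
latest = {Version(1, 2): "current"}

assignment_rejected = False
try:
    releases[0].major = 5  # frozen instances reject attribute assignment
except dataclasses.FrozenInstanceError:
    assignment_rejected = True
print(assignment_rejected)
```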

namedtuple classes are also data classes, but they are always immutable (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured such that they fill the same role as a namedtuple class.
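
As a rough illustration of that flexibility, here is a frozen dataclass that opts in to tuple-style unpacking next to an equivalent namedtuple (PointNT and PointDC are hypothetical names). The last two lines show where they still differ: only the namedtuple is an actual tuple.

```python
import dataclasses
from collections import namedtuple

# namedtuple's defaults= parameter needs Python 3.7+
PointNT = namedtuple("PointNT", ["x", "y"], defaults=[0])

@dataclasses.dataclass(frozen=True)
class PointDC:
    x: int
    y: int = 0

    def __iter__(self):
        # opt in to tuple-like unpacking, which namedtuple gets for free
        yield from dataclasses.astuple(self)

nt, dc = PointNT(1), PointDC(1)
print(nt, dc)
print(tuple(nt) == tuple(dc))                        # both unpack to (1, 0)
print(isinstance(nt, tuple), isinstance(dc, tuple))  # only namedtuple is a sequence
print(nt == (1, 0), dc == (1, 0))                    # namedtuple equals plain tuples
```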

The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).

If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions, see the solutions for day 7, day 8, day 11 and day 20.

If you want to use the dataclasses module in Python versions older than 3.7, you can install the backport from PyPI (which requires Python 3.6) or use the attrs project mentioned above.


Answer 1

Overview

The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.

What exactly are Python data classes and when is it best to use them?

  1. code generators: generate boilerplate code; you can choose to implement special methods in a regular class or have a dataclass implement them automatically.
  2. data containers: structures that hold data (e.g. tuples and dicts), often with dotted, attribute access such as classes, namedtuple and others.

“mutable namedtuples with default[s]”

Here is what the latter phrase means:

  • mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).
  • namedtuple: you have dotted, attribute access like a namedtuple or a regular class.
  • default: you can assign default values to attributes.
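
Putting the three words together, a small sketch (the NTColor namedtuple is added only for contrast):

```python
import dataclasses
from collections import namedtuple

@dataclasses.dataclass
class Color:
    r: int = 0  # default values
    g: int = 0
    b: int = 0

c = Color(g=128)  # defaults fill in r and b
c.r = 255         # mutable: attributes can be reassigned
print(c)          # Color(r=255, g=128, b=0)

NTColor = namedtuple("NTColor", "r g b", defaults=(0, 0, 0))
nt = NTColor(g=128)
nt_rejected = False
try:
    nt.r = 255    # namedtuple attributes are read-only
except AttributeError:
    nt_rejected = True
    print("namedtuple is immutable; use _replace:", nt._replace(r=255))
```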

Compared to common classes, you primarily save on typing boilerplate code.


Features

This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).

What you get

Here are features you get by default from dataclasses.

Attributes + Representation + Comparison

import dataclasses


@dataclasses.dataclass
#@dataclasses.dataclass()                                       # alternative
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

These defaults are provided by automatically setting the following keywords to True:

@dataclasses.dataclass(init=True, repr=True, eq=True)
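
A quick check of what those three defaults give you; the IdentityColor class with eq=False is a hypothetical addition for contrast, falling back to identity-based equality:

```python
import dataclasses

@dataclasses.dataclass  # init=True, repr=True, eq=True are the defaults
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

@dataclasses.dataclass(eq=False)  # opt out of the generated __eq__
class IdentityColor:
    r: int = 0
    g: int = 0
    b: int = 0

print(Color(), Color() == Color(0, 0, 0))  # field-by-field equality
print(IdentityColor() == IdentityColor())  # two distinct objects: not equal
```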

What you can turn on

Additional features are available if the appropriate keywords are set to True.

Order

@dataclasses.dataclass(order=True)
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

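The ordering methods are now implemented (overloading the operators < > <= >=). Instances compare as if they were tuples of their fields, in definition order, similarly to functools.total_ordering but with stricter checks: the generated methods only compare instances of the same class.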
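
A sketch of what the generated ordering does in practice, reusing the Color fields above:

```python
import dataclasses

@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

# order=True compares instances as tuples of their fields, in definition order,
# so Color(0, 0, 255) < Color(0, 50, 0) behaves like (0, 0, 255) < (0, 50, 0).
colors = sorted([Color(0, 50, 0), Color(), Color(0, 0, 255)])
print(colors)

# Comparing against another type is not supported and raises TypeError.
ordering_rejected = False
try:
    Color() < (0, 0, 0)
except TypeError:
    ordering_rejected = True
```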

Hashable, Mutable

@dataclasses.dataclass(unsafe_hash=True)                        # override base `__hash__`
class Color:
    ...

Although the object is potentially mutable (possibly undesired), a hash is implemented.
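
The "possibly undesired" part is easy to trip over: if a hashed object is mutated after being put in a set (or used as a dict key), lookups stop finding it. A minimal sketch:

```python
import dataclasses

@dataclasses.dataclass(unsafe_hash=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

c = Color()
palette = {c}        # stored under hash((0, 0, 0))
c.r = 255            # still allowed: the object is mutable
print(c in palette)  # lookup now uses hash((255, 0, 0)), so it misses
print(any(item is c for item in palette))  # yet the object is still in the set
```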

Hashable, Immutable

@dataclasses.dataclass(frozen=True)                             # with `eq=True` (the default), `__hash__` is generated too
class Color:
    ...

A hash is now implemented and changing the object or assigning to attributes is disallowed.

Overall, a field-based __hash__ is generated if either unsafe_hash=True, or frozen=True together with eq=True (the default).

See also the original hashing logic table with more details.
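
A small sketch of how eq and frozen interact with hashing (the Plain and Frozen class names are illustrative):

```python
import dataclasses

@dataclasses.dataclass  # eq=True, frozen=False (the defaults)
class Plain:
    x: int = 0

@dataclasses.dataclass(frozen=True)
class Frozen:
    x: int = 0

# With eq=True and neither frozen nor unsafe_hash, the decorator sets
# __hash__ to None, so instances are deliberately unhashable.
print(Plain.__hash__ is None)
plain_unhashable = False
try:
    hash(Plain())
except TypeError:
    plain_unhashable = True

# frozen=True with the default eq=True generates a field-based __hash__.
print(hash(Frozen(1)) == hash(Frozen(1)))
print({Frozen(1), Frozen(1)})  # duplicates collapse to one element
```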

What you don’t get

To get the following features, special methods must be manually implemented:

Unpacking

@dataclasses.dataclass
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
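
With that __iter__ in place, the instance unpacks like a tuple:

```python
import dataclasses

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        # astuple() returns the field values in definition order
        yield from dataclasses.astuple(self)

r, g, b = Color(128, 0, 255)
print(r, g, b)        # 128 0 255
print(list(Color()))  # [0, 0, 0]
```

One design note: astuple() recursively converts nested dataclasses and deep-copies other values, so for fields holding large or nested objects, yielding getattr(self, f.name) for each f in dataclasses.fields(self) may be a better fit.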

Optimization

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "b", "g"]
    r : int
    g : int
    b : int

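The object size is now reduced (note that the calls below measure the class objects themselves; the practical saving is per instance, since a slotted instance has no per-instance __dict__):

>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888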

In some circumstances, __slots__ also speeds up creating instances and accessing attributes. Also, slotted fields cannot have default values: a class-level default conflicts with the slot of the same name, and a ValueError is raised.

See more on slots in this blog post.
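
A sketch of both slot behaviours described above: a slotted instance has no __dict__ (so it cannot grow new attributes), and a default value next to a slot of the same name fails when the class is created. BadSlotted is a hypothetical name.

```python
import dataclasses

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "g", "b"]
    r: int
    g: int
    b: int

s = SlottedColor(1, 2, 3)
print(hasattr(s, "__dict__"))  # False: values live in fixed slots

extra_attr_rejected = False
try:
    s.alpha = 0.5  # no __dict__, so new attributes are rejected
except AttributeError:
    extra_attr_rejected = True

# A class-level default clashes with the slot of the same name.
default_rejected = False
try:
    @dataclasses.dataclass
    class BadSlotted:
        __slots__ = ["r"]
        r: int = 0
except ValueError as exc:
    default_rejected = True
    print("ValueError:", exc)
```

On Python 3.10 and later, @dataclasses.dataclass(slots=True) generates __slots__ for you and does accept default values.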


Summary Table

+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
|       Feature        |       Keyword        |                      Example                       |           Implement in a Class          |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes           |  init                |  Color().r -> 0                                    |  __init__                               |
| Representation       |  repr                |  Color() -> Color(r=0, g=0, b=0)                   |  __repr__                               |
| Comparison*          |  eq                  |  Color() == Color(0, 0, 0) -> True                 |  __eq__                                 |
|                      |                      |                                                    |                                         |
| Order                |  order               |  sorted([Color(0, 50, 0), Color()]) -> ...         |  __lt__, __le__, __gt__, __ge__         |
| Hashable             |  unsafe_hash/frozen  |  {Color(), Color()} -> {Color(r=0, g=0, b=0)}      |  __hash__                               |
| Immutable            |  frozen              |  Color().r = 10 -> FrozenInstanceError             |  __setattr__, __delattr__               |
|                      |                      |                                                    |                                         |
| Unpacking+           |  -                   |  r, g, b = Color()                                 |   __iter__                              |
| Optimization+        |  -                   |  sys.getsizeof(SlottedColor) -> 888                |  __slots__                              |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+

+These methods are not automatically generated and require manual implementation in a dataclass.

* __ne__ is not needed (Python derives != from __eq__) and thus not implemented.


Additional features

Post-initialization

@dataclasses.dataclass
class RGBA:
    r : int = 0
    g : int = 0
    b : int = 0
    a : float = 1.0

    def __post_init__(self):
        self.a = int(self.a * 255)  # scale alpha from a 0.0-1.0 float to a 0-255 int


RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
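
Because __post_init__ runs at the end of the generated __init__, it is also a convenient place for validation. A sketch extending the RGBA example; the 0..255 range check is an assumption added for illustration:

```python
import dataclasses

@dataclasses.dataclass
class RGBA:
    r: int = 0
    g: int = 0
    b: int = 0
    a: float = 1.0

    def __post_init__(self):
        # runs right after the generated __init__ has set the fields
        for name in ("r", "g", "b"):
            if not 0 <= getattr(self, name) <= 255:
                raise ValueError(f"{name} must be in 0..255")
        self.a = int(self.a * 255)  # scale alpha to 0..255

print(RGBA(127, 0, 255, 0.5))  # RGBA(r=127, g=0, b=255, a=127)

out_of_range_rejected = False
try:
    RGBA(300)
except ValueError:
    out_of_range_rejected = True
```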

Inheritance

@dataclasses.dataclass
class RGBA(Color):
    a : int = 0
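
In the subclass, inherited fields come first, so the generated __init__ takes (r, g, b, a). A sketch that checks this, plus the usual pitfall: since every field of Color has a default, a subclass field without a default is rejected (Broken is a hypothetical name):

```python
import dataclasses

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

@dataclasses.dataclass
class RGBA(Color):
    a: int = 0

print(RGBA(1, 2, 3, 4))                            # RGBA(r=1, g=2, b=3, a=4)
print([f.name for f in dataclasses.fields(RGBA)])  # ['r', 'g', 'b', 'a']
print(isinstance(RGBA(), Color))                   # True

# A non-default field after inherited defaulted fields fails at class creation.
non_default_rejected = False
try:
    @dataclasses.dataclass
    class Broken(Color):
        a: int
except TypeError as exc:
    non_default_rejected = True
    print("TypeError:", exc)
```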

Conversions

Convert a dataclass to a tuple or a dict, recursively:

>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
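
The "recursively" part matters for nested structures: dataclasses found inside lists, tuples or dicts are converted too. A sketch with a hypothetical Palette container:

```python
import dataclasses

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

@dataclasses.dataclass
class Palette:   # hypothetical container class
    name: str
    colors: list  # a list of Color instances

p = Palette("warm", [Color(255, 0, 0), Color(255, 128, 0)])
# nested Color instances inside the list are converted as well
print(dataclasses.asdict(p))
print(dataclasses.astuple(p))
```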

Limitations


References

  • R. Hettinger’s talk on Dataclasses: The code generator to end all code generators
  • T. Hunner’s talk on Easier Classes: Python Classes Without All the Cruft
  • Python’s documentation on hashing details
  • Real Python’s guide on The Ultimate Guide to Data Classes in Python 3.7
  • A. Shaw’s blog post on A brief tour of Python 3.7 data classes
  • E. Smith’s github repository on dataclasses

Answer 2

From the PEP specification:

A class decorator is provided which inspects a class definition for variables with type annotations as defined in PEP 526, “Syntax for Variable Annotations”. In this document, such variables are called fields. Using these fields, the decorator adds generated method definitions to the class to support instance initialization, a repr, comparison methods, and optionally other methods as described in the Specification section. Such a class is called a Data Class, but there’s really nothing special about the class: the decorator adds generated methods to the class and returns the same class it was given.

The @dataclass decorator adds methods to the class that you'd otherwise define yourself, such as __init__, __repr__ and __eq__ (plus __lt__, __gt__ and the other ordering methods when order=True).
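
The last point of the quote, that the decorator returns the same class it was given, and the way fields are picked from annotations, can be checked directly. A minimal sketch with a hypothetical Point class:

```python
import dataclasses

class Point:
    x: int        # annotated class variables become fields
    y: int = 0
    label = "p"   # no annotation: a plain class attribute, not a field

Decorated = dataclasses.dataclass(Point)
print(Decorated is Point)                           # True: same class object
print([f.name for f in dataclasses.fields(Point)])  # ['x', 'y']

# The generated methods were added directly to the class namespace;
# __lt__ is absent because order=False by default.
generated = sorted(m for m in ("__init__", "__repr__", "__eq__", "__lt__") if m in vars(Point))
print(generated)  # ['__eq__', '__init__', '__repr__']
```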


Answer 3

Consider this simple class Foo:

from dataclasses import dataclass
@dataclass
class Foo:
    def bar(self):
        pass  

Here is the dir() built-in comparison. On the left-hand side is the Foo without the @dataclass decorator, and on the right is with the @dataclass decorator.
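
The same comparison can be reproduced in code. A minimal sketch, with hypothetical Plain and Decorated classes standing in for Foo without and with the decorator. Note that the dir() difference only shows brand-new names, so methods such as __init__ that object already provides are found by inspecting the class namespace instead:

```python
import dataclasses

class Plain:
    def bar(self):
        pass

@dataclasses.dataclass
class Decorated:
    def bar(self):
        pass

added = sorted(set(dir(Decorated)) - set(dir(Plain)))
print(added)  # includes '__dataclass_fields__' and '__dataclass_params__'

# Overrides of names object already has are visible in the class namespace.
overridden = sorted(n for n in vars(Decorated) if n in ("__init__", "__repr__", "__eq__", "__hash__"))
print(overridden)  # ['__eq__', '__hash__', '__init__', '__repr__']
```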

Here is another diff, after using the inspect module for comparison.