Overview
The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.
What exactly are python data classes and when is it best to use them?
- code generators: generate boilerplate code; you can choose to implement special methods in a regular class or have a dataclass implement them automatically.
- data containers: structures that hold data (e.g. tuples and dicts), often with dotted attribute access, such as classes, namedtuple and others.
“mutable namedtuples with default[s]”
Here is what the latter phrase means:
- mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).
- namedtuple: you have dotted attribute access, like a namedtuple or a regular class.
- default: you can assign default values to attributes.
Compared to common classes, you primarily save on typing boilerplate code.
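The three points above can be sketched side by side with a namedtuple. This is a minimal illustration, and the names Point and PointNT are hypothetical, chosen only for this example:

```python
import collections
import dataclasses


@dataclasses.dataclass
class Point:
    # Hypothetical example class: two fields, each with a default value
    x: int = 0
    y: int = 0


# A namedtuple with the same fields and defaults, for comparison
PointNT = collections.namedtuple("PointNT", ["x", "y"], defaults=[0, 0])

p = Point()            # default: both fields fall back to 0
p.x = 5                # mutable: attributes can be reassigned
print(p.x, p.y)        # dotted attribute access -> 5 0

nt = PointNT()
try:
    nt.x = 5           # namedtuple fields are read-only
except AttributeError:
    print("namedtuple fields cannot be reassigned")
```

Both objects offer defaults and dotted access; only the dataclass instance accepts reassignment out of the box.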
Features
This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).
What you get
Here are features you get by default from dataclasses.
Attributes + Representation + Comparison
import dataclasses

@dataclasses.dataclass
#@dataclasses.dataclass()  # alternative
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
These defaults are provided by automatically setting the following keywords to True:
@dataclasses.dataclass(init=True, repr=True, eq=True)
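To make the "code generator" idea concrete, here is a rough hand-written counterpart of what those three keywords produce. ManualColor is a hypothetical name, and the generated methods in the standard library differ in detail, so treat this as an approximation rather than the exact generated code:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


class ManualColor:
    """Hand-written approximation of what the decorator generates."""

    def __init__(self, r=0, g=0, b=0):          # init=True
        self.r = r
        self.g = g
        self.b = b

    def __repr__(self):                         # repr=True
        return f"ManualColor(r={self.r!r}, g={self.g!r}, b={self.b!r})"

    def __eq__(self, other):                    # eq=True
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.r, self.g, self.b) == (other.r, other.g, other.b)


print(Color(1, 2, 3))                           # Color(r=1, g=2, b=3)
print(Color() == Color(0, 0, 0))                # True
print(ManualColor() == ManualColor(0, 0, 0))    # True
```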
What you can turn on
Additional features are available if the appropriate keywords are set to True.
Order
@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
The ordering methods are now implemented (overloading the operators <, >, <= and >=), similarly to functools.total_ordering, but with stronger equality tests.
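A short sketch of what ordering looks like in practice: instances compare as if they were tuples of their fields, in declaration order, and comparing against a different type is rejected.

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


colors = [Color(0, 50, 0), Color(), Color(0, 0, 255)]
ordered = sorted(colors)
print(ordered)
# [Color(r=0, g=0, b=0), Color(r=0, g=0, b=255), Color(r=0, g=50, b=0)]

# Same result as comparing the tuples (0, 0, 255) < (0, 50, 0)
print(Color(0, 0, 255) < Color(0, 50, 0))   # True

try:
    Color() < (0, 0, 0)                     # a plain tuple is not a Color
except TypeError:
    print("ordering is only defined between instances of the same class")
```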
Hashable, Mutable
@dataclasses.dataclass(unsafe_hash=True)  # override base `__hash__`
class Color:
    ...
Although the object is potentially mutable (possibly undesired), a hash is implemented.
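The following sketch shows why the keyword carries the word "unsafe": the hash is computed from the field values, so mutating an instance after it has been placed in a set or dict changes its hash. The variable names are illustrative only.

```python
import dataclasses


@dataclasses.dataclass(unsafe_hash=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


c = Color(1, 2, 3)
palette = {c}                  # works: a __hash__ is generated
hash_before = hash(c)
print(c in palette)            # True

c.r = 99                       # still allowed: the instance is not frozen
hash_after = hash(c)

# The hash now follows the new field values, so the set, which filed the
# object under the old hash, will most likely no longer find it.
print(hash_after == hash(Color(99, 2, 3)))   # True
print(c in palette)            # typically False after the mutation
```

If instances must live in sets or serve as dict keys, avoiding mutation (or using frozen=True) sidesteps this hazard.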
Hashable, Immutable
@dataclasses.dataclass(frozen=True)  # with `eq=True` (default), `__hash__` is generated from the fields
class Color:
    ...
A hash is now implemented and changing the object or assigning to attributes is disallowed.
Overall, the object is hashable if either unsafe_hash=True or frozen=True.
See also the original hashing logic table with more details.
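A brief sketch of a frozen dataclass in use. Assignment raises dataclasses.FrozenInstanceError (a subclass of AttributeError), instances work as dict keys, and dataclasses.replace builds a modified copy instead of mutating in place:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


c = Color(128, 0, 255)

try:
    c.r = 0                                  # assignment is blocked
except dataclasses.FrozenInstanceError as exc:
    print(type(exc).__name__)                # FrozenInstanceError

# Hashable by value, so equal instances map to the same dict entry
names = {Color(): "black", c: "violet"}
print(names[Color(0, 0, 0)])                 # black

# "Changing" a frozen instance means creating a new one
darker = dataclasses.replace(c, r=64)
print(darker)                                # Color(r=64, g=0, b=255)
```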
What you don’t get
To get the following features, special methods must be manually implemented:
Unpacking
@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
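With __iter__ defined as above, instances can be unpacked like tuples. The sketch below repeats the class so that it runs on its own, and also shows unpacking through dataclasses.astuple, which works without adding any method:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        # Delegate iteration to the tuple of field values
        yield from dataclasses.astuple(self)


r, g, b = Color(128, 0, 255)
print(r, g, b)                 # 128 0 255
print(list(Color()))           # [0, 0, 0]

# Alternative that needs no __iter__: convert explicitly, then unpack
r2, g2, b2 = dataclasses.astuple(Color(1, 2, 3))
```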
Optimization
@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "g", "b"]
    r: int
    g: int
    b: int
The reported size is now reduced (these calls measure the class objects themselves, and exact numbers vary by Python version):
>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888
In some circumstances, __slots__ also improves the speed of creating instances and accessing attributes. Also, slots do not allow default assignments; otherwise, a ValueError is raised.
See more on slots in this blog post.
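The per-instance effect of slots can be sketched as follows: a slotted instance carries no __dict__, rejects attributes that were not declared, and the class cannot be created if a slotted field is given a default. The names Bad and slots_default_rejected are hypothetical and exist only for this demonstration.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int
    g: int
    b: int


@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "g", "b"]
    r: int
    g: int
    b: int


plain = Color(1, 2, 3)
slotted = SlottedColor(1, 2, 3)

print(hasattr(plain, "__dict__"))      # True: attributes live in a dict
print(hasattr(slotted, "__dict__"))    # False: attributes live in slots

try:
    slotted.alpha = 255                # not declared in __slots__
except AttributeError:
    print("cannot add undeclared attributes")

slots_default_rejected = False
try:
    @dataclasses.dataclass
    class Bad:
        __slots__ = ["r"]
        r: int = 0                     # default clashes with the slot
except ValueError:
    slots_default_rejected = True
print(slots_default_rejected)          # True
```

On Python 3.10 and later, @dataclasses.dataclass(slots=True) generates __slots__ automatically and does permit field defaults, so the manual declaration is mainly needed on older versions.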
Summary Table
+----------------+--------------------+----------------------------------------------+--------------------------------+
| Feature        | Keyword            | Example                                      | Implement in a Class           |
+----------------+--------------------+----------------------------------------------+--------------------------------+
| Attributes     | init               | Color().r -> 0                               | __init__                       |
| Representation | repr               | Color() -> Color(r=0, g=0, b=0)              | __repr__                       |
| Comparison*    | eq                 | Color() == Color(0, 0, 0) -> True            | __eq__                         |
|                |                    |                                              |                                |
| Order          | order              | sorted([Color(0, 50, 0), Color()]) -> ...    | __lt__, __le__, __gt__, __ge__ |
| Hashable       | unsafe_hash/frozen | {Color(), Color()} -> {Color(r=0, g=0, b=0)} | __hash__                       |
| Immutable      | frozen + eq        | Color().r = 10 -> FrozenInstanceError        | __setattr__, __delattr__       |
|                |                    |                                              |                                |
| Unpacking+     | -                  | r, g, b = Color()                            | __iter__                       |
| Optimization+  | -                  | sys.getsizeof(SlottedColor) -> 888           | __slots__                      |
+----------------+--------------------+----------------------------------------------+--------------------------------+
+These methods are not automatically generated and require manual implementation in a dataclass.
* __ne__ is not needed and thus not implemented.
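The footnote on __ne__ can be checked directly. In this small sketch the decorator adds __eq__ to the class but no __ne__, and the != operator still works because Python's default __ne__ inverts the result of __eq__:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


print("__eq__" in Color.__dict__)      # True: generated by the decorator
print("__ne__" in Color.__dict__)      # False: inherited from object
print(Color(1, 2, 3) != Color())       # True
```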
Additional features
Post-initialization
@dataclasses.dataclass
class RGBA:
    r: int = 0
    g: int = 0
    b: int = 0
    a: float = 1.0

    def __post_init__(self):
        # Runs right after the generated __init__: rescale alpha from 0.0-1.0 to 0-255
        self.a: int = int(self.a * 255)

RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
Inheritance
@dataclasses.dataclass
class RGBA(Color):
    a: int = 0
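A self-contained sketch of how the inherited fields behave: the subclass's fields are appended after those of the base class, and the generated __init__ and __repr__ cover all of them.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class RGBA(Color):
    a: int = 0


pixel = RGBA(128, 0, 255, 127)
print(pixel)                                         # RGBA(r=128, g=0, b=255, a=127)
print([f.name for f in dataclasses.fields(RGBA)])    # ['r', 'g', 'b', 'a']
print(isinstance(pixel, Color))                      # True

# Generated __eq__ only matches instances of the exact same class
print(Color() == RGBA())                             # False
```

Because every base field here has a default, the subclass field needs one too; a field without a default cannot follow fields that have one.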
Conversions
Convert a dataclass to a tuple or a dict, recursively:
>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
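The word "recursively" matters once dataclasses are nested. In the sketch below, Pixel is a hypothetical class that holds a Color, and both conversions descend into the inner instance:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class Pixel:
    # Hypothetical container class used only to show nesting
    x: int
    y: int
    color: Color


px = Pixel(3, 4, Color(128, 0, 255))

print(dataclasses.asdict(px))
# {'x': 3, 'y': 4, 'color': {'r': 128, 'g': 0, 'b': 255}}

print(dataclasses.astuple(px))
# (3, 4, (128, 0, 255))
```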
Limitations
References
- R. Hettinger’s talk on Dataclasses: The code generator to end all code generators
- T. Hunner’s talk on Easier Classes: Python Classes Without All the Cruft
- Python’s documentation on hashing details
- Real Python’s guide on The Ultimate Guide to Data Classes in Python 3.7
- A. Shaw’s blog post on A brief tour of Python 3.7 data classes
- E. Smith’s GitHub repository on dataclasses