Python: how to type hint a dataclass?
Problem Description:
The code below works, but I’m getting the following warning from PyCharm:
Cannot find reference '__annotations__' in '(…) -> Any'
I guess it’s because I’m using Callable. I didn’t find anything like a Dataclass type. Which type should I use instead?
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fruit:
    color: str
    taste: str

def get_cls() -> Callable:
    return Fruit

attrs = get_cls().__annotations__  # <- IDE warning
print(attrs)
Solution – 1
In this particular example you can just hint it directly:
from dataclasses import dataclass

@dataclass
class Fruit:
    x: str

def get_cls() -> type[Fruit]:
    return Fruit

attrs = get_cls().__annotations__
print(attrs)
$ python d.py
{'x': <class 'str'>}
$ mypy d.py
Success: no issues found in 1 source file
However, I don’t know if this is what you’re asking. Are you after a generic type for any dataclass? (I would be tempted just to hint the union of all possible return types of get_cls(): the whole point of using a dataclass rather than, e.g., a dict is surely to distinguish between types of data. And you do want your type checker to warn you if you try to access attributes not defined on one of your dataclasses.)
References
See the docs on typing.Type, which since Python 3.9 is available as the built-in type (just like we can now use list and dict rather than typing.List and typing.Dict).
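As a sketch of what the reference above means in practice: both spellings below describe "the class Fruit (or a subclass)", and type[...] also combines with a TypeVar so that passing a class in tells the checker what instance comes out. The make helper is hypothetical, and the field defaults are added only so that cls() can be called without arguments.

```python
from dataclasses import dataclass
from typing import Type, TypeVar

T = TypeVar("T")


@dataclass
class Fruit:
    color: str = "green"  # defaults added for this sketch only
    taste: str = "sour"


def get_cls() -> type[Fruit]:  # built-in generic, Python 3.9+
    return Fruit


def get_cls_legacy() -> Type[Fruit]:  # older typing.Type spelling, same meaning
    return Fruit


def make(cls: type[T]) -> T:
    # type[T] ties the argument to the result: make(Fruit) is inferred as Fruit.
    return cls()


fruit = make(get_cls())
print(fruit)  # Fruit(color='green', taste='sour')
```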
Solution – 2
The simplest option is to remove the return type annotation in its entirety.
Note: PyCharm is usually smart enough to infer the return type automatically.
from __future__ import annotations
from dataclasses import dataclass
# remove this line
# from typing import Callable

@dataclass
class Fruit:
    color: str
    taste: str

# def get_cls() -> Callable:  <== No, the return annotation is wrong (Fruit is *more* than a callable)
def get_cls():
    return Fruit

attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
print(attrs)
In PyCharm, the return type is correctly inferred.
To type hint a dataclass generically: since dataclasses are essentially regular Python classes under the hood, with auto-generated methods and some "extra" class attributes added to the mix, you could hint them with a typing.Protocol, as shown below:
from __future__ import annotations
from dataclasses import dataclass, Field
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol

if TYPE_CHECKING:
    # this won't print
    print('Oh YEAH !!')

    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        # if using `@dataclass(slots=True)`
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        # the actual class definition is marked as private, and here I define
        # it as a forward reference, as I don't want to encourage
        # importing private or "unexported" members.
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None

@dataclass
class Fruit:
    color: str
    taste: str

# noinspection PyTypeChecker
def get_cls() -> type[DataClass]:
    return Fruit

attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
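If the goal is only to read the fields of whatever dataclass get_cls() returns, the public helpers dataclasses.is_dataclass() and dataclasses.fields() work on any dataclass at runtime, without a custom protocol. A minimal sketch follows; dataclass_field_names is a hypothetical helper name, and the code is only claimed to run correctly, not to be clean under every type checker.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Fruit:
    color: str
    taste: str


def dataclass_field_names(cls: type) -> list[str]:
    # Refuse ordinary classes so that fields() is only called on dataclasses.
    if not is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    return [f.name for f in fields(cls)]


print(dataclass_field_names(Fruit))  # ['color', 'taste']
```

Unlike a class's own __annotations__, fields() also includes fields inherited from dataclass base classes and skips ClassVar pseudo-fields.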
Cost of a class def
To address the comments: there does appear to be a non-negligible runtime cost associated with class definitions, which is why I wrap the class definition in an if TYPE_CHECKING: block above.
The following code compares the performance of both approaches to confirm this suspicion:
from __future__ import annotations
from dataclasses import dataclass, Field
from timeit import timeit
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol

n = 100_000

print('class def: ', timeit("""
class DataClass(Protocol):
    __dict__: dict[str, Any]
    __doc__: str | None
    __slots__: str | Iterable[str]
    __annotations__: dict[str, str | type]
    __dataclass_fields__: dict[str, Field]
    __dataclass_params__: '_DataclassParams'
    __post_init__: Callable | None
""", globals=globals(), number=n))

print('if <bool>: ', timeit("""
if TYPE_CHECKING:
    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None
""", globals=globals(), number=n))
Results, on Mac M1 running Python 3.10:
class def: 0.7453760829521343
if <bool>: 0.0009954579873010516
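Hence, when a class is used purely for type hinting purposes, it appears to be much faster to wrap its definition in an if TYPE_CHECKING: block as above (in this benchmark, roughly 7.5 µs per class definition versus about 10 ns for the skipped branch).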
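One consequence of the if TYPE_CHECKING: trick is worth spelling out: at runtime TYPE_CHECKING is False, so the class never exists, and any annotation that mentions it must stay unevaluated. The answer above relies on from __future__ import annotations for that; this sketch uses a quoted annotation instead and shows that tools which resolve hints at runtime, such as typing.get_type_hints, then fail with a NameError. The one-attribute protocol is a trimmed-down stand-in, not the full DataClass definition.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any, ClassVar, Protocol, get_type_hints

if TYPE_CHECKING:
    # Only type checkers look inside this block; it is skipped at runtime.
    class DataClass(Protocol):
        __dataclass_fields__: ClassVar[dict[str, Any]]


@dataclass
class Fruit:
    color: str
    taste: str


def get_cls() -> "type[DataClass]":
    # The quoted annotation is stored as a plain string and never evaluated,
    # so the missing runtime name does not raise here.
    return Fruit


print("DataClass" in globals())          # False: the class was never created
print(get_cls.__annotations__["return"])  # type[DataClass]

try:
    get_type_hints(get_cls)  # evaluating the string needs the real class
except NameError as exc:
    print("get_type_hints failed:", exc)
```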
Solution – 3
While the provided solutions do work, I just want to add a bit of context.
IMHO your annotation is not wrong. It is just not strict enough and not all that useful.
Fruit is a class. And technically speaking, a class is a callable, because type (the class of all classes, i.e. the default metaclass) implements the __call__ method. In fact, that method is executed every time you create an instance of a class, even before the class’s __init__ method. (For details, refer to the "Callable types" subsection of "The standard type hierarchy" in the data model docs.)
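The point about classes being callables can be checked directly. This small sketch shows that Fruit is callable, that its metaclass is type, and that calling type.__call__ explicitly does the same thing as Fruit(...).

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    color: str
    taste: str


print(callable(Fruit))      # True: classes are callable objects
print(type(Fruit) is type)  # True: type is Fruit's metaclass

# Fruit("red", "sweet") goes through type.__call__, which creates the instance
# (via __new__) and then runs the __init__ generated by @dataclass.
fruit = type.__call__(Fruit, "red", "sweet")
print(fruit)  # Fruit(color='red', taste='sweet')
```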
One problem with your annotation, however, is that Callable is a generic type, so you should specify its type arguments. In this case you have a few options, depending on how narrow you want your annotation to be. The simplest one that would still be correct here is the "catch-all" callable:
def get_cls() -> Callable[..., Any]:
    return Fruit
But since you know that calling the class Fruit returns an instance of that class, you might as well write this:
def get_cls() -> Callable[..., Fruit]:
    return Fruit
Finally, if you know which arguments will be allowed for instantiating a Fruit (namely the color and taste attributes you defined on the dataclass), you could narrow it down even further:
def get_cls() -> Callable[[str, str], Fruit]:
    return Fruit
Technically, all of those are correct. (Try it with mypy --strict.)
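To see why even Callable[[str, str], Fruit] says relatively little, this sketch puts three different objects into a list annotated with that type: the class itself, a plain function, and a lambda. All of them are "something you call with two strings to get a Fruit"; only one of them is the class. The make_lime function is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fruit:
    color: str
    taste: str


def make_lime(color: str, taste: str) -> Fruit:
    # Hypothetical factory function: it returns a Fruit but is not a class.
    return Fruit(color, "very " + taste)


# Each entry satisfies Callable[[str, str], Fruit].
factories: list[Callable[[str, str], Fruit]] = [
    Fruit,
    make_lime,
    lambda color, taste: Fruit(color.upper(), taste),
]

for factory in factories:
    print(factory("green", "sour"))
```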
However, even that last annotation is not particularly useful, since Fruit is not just any Callable returning a Fruit instance; it is the class Fruit itself. Therefore the most sensible annotation is (as @2e0byo pointed out) this one:
def get_cls() -> type[Fruit]:
    return Fruit
That is what I would do as well.
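A sketch of what type[Fruit] buys you over a Callable: the checker knows the result is the Fruit class or one of its subclasses, so class-level attributes and dataclass helpers are available. Citrus is a hypothetical subclass added only to show that type[Fruit] accepts subclasses too.

```python
from dataclasses import dataclass, fields


@dataclass
class Fruit:
    color: str
    taste: str


@dataclass
class Citrus(Fruit):
    # Hypothetical subclass: type[Fruit] also accepts subclasses of Fruit.
    acidity: float = 0.5


def get_cls() -> type[Fruit]:
    return Citrus


cls = get_cls()
print(cls.__name__)                   # Citrus
print([f.name for f in fields(cls)])  # ['color', 'taste', 'acidity']
print(issubclass(cls, Fruit))         # True
```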
I disagree with @rv.kvetch that removing the annotation is a solution (in any situation).
His DataClass protocol is an interesting proposal. However, I would advise against it in this case for a few reasons:
- It might give you all the magic attributes that make up any dataclass, but annotating with it makes you lose all information about the actual specific class you return from get_cls, namely Fruit. In practical terms, this means no auto-suggestions by the IDE of Fruit-specific attributes/methods.
- You still have to place a type checker exception/ignore in get_cls, because in the eyes of any static type checker type[Fruit] is not a subtype of type[DataClass]. The built-in dataclass protocol is a hack that is carried by specially tailored plugins for mypy, PyCharm, etc., and those do not cover this kind of structural subtyping.
- Even the forward reference to _DataclassParams is still a problem, because it will never be resolved unless you (surprise, surprise) import that protected member from the depths of the dataclasses module. Thus, this is not a stable annotation.
So from a type-safety standpoint, there are two big errors in that code (the subtyping and the unresolved reference) and two minor ones: the non-parameterized generic annotations for __dataclass_fields__ (Field is generic) and __post_init__ (Callable is generic).
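For comparison, here is a minimal sketch of a protocol that avoids the minor errors: a single ClassVar with a parameterized Field[Any] and no private names. It is modelled on the protocol typeshed keeps privately as _typeshed.DataclassInstance, but the name below is my own, and it targets dataclass instances rather than the classes themselves. The field_names helper is hypothetical.

```python
from dataclasses import Field, dataclass, fields
from typing import Any, ClassVar, Protocol


class DataclassInstance(Protocol):
    # Every dataclass gets __dataclass_fields__; Field is parameterized here.
    __dataclass_fields__: ClassVar[dict[str, Field[Any]]]


@dataclass
class Fruit:
    color: str
    taste: str


def field_names(obj: DataclassInstance) -> list[str]:
    # Accepts any dataclass instance; no private dataclasses names are needed.
    return [f.name for f in fields(obj)]


print(field_names(Fruit("red", "sweet")))  # ['color', 'taste']
```

It still carries the first caveat above: a function hinted with this protocol no longer knows it is dealing with a Fruit specifically.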
Still, I like protocols. Python is a protocol-oriented language. The approach is interesting.