Python: store extra data for objects in a WeakKeyDictionary

In several programs, I’ve wanted to solve the problem of associating extra data with an object. For example, in django-upgrade, the individual “fixer” functions often want to store extra data per visited ast.Module object.
A common pattern in Python is to store the data in an extra attribute directly on the object, like module._all_used_names = .... However, this approach has some downsides:
- The object may not allow arbitrary attributes, such as for built-in types like dict or slotted classes.
- Attribute names can collide across use cases. Defences against this include using a long, verbose attribute name and prefixing it with an underscore, but they don’t provide any guarantees.
- Attributes may confusingly appear in other code paths that expose all attributes on the object, such as where vars() is used.
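Here is a small sketch that makes the first and third downsides concrete. Point, Node, and try_set_extra are hypothetical names made up for illustration. The behaviour comes from standard Python: built-in dict instances and slotted instances raise AttributeError when given an unknown attribute, while a regular instance accepts it but then exposes it through vars().

```python
class Point:
    # Slotted: instances only have room for the declared attributes.
    __slots__ = ("x", "y")

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


class Node:
    # Regular class: instances store attributes in a per-instance __dict__.
    def __init__(self, name: str) -> None:
        self.name = name


def try_set_extra(obj: object) -> bool:
    """Return True if an extra attribute could be stored directly on obj."""
    try:
        setattr(obj, "_extra_data", "cached")
    except AttributeError:
        return False
    return True


# Built-in dict instances and slotted instances reject new attributes.
print(try_set_extra({}))  # False
print(try_set_extra(Point(1, 2)))  # False

# A regular instance accepts the attribute, but it then leaks into vars().
node = Node("root")
print(try_set_extra(node))  # True
print(vars(node))  # {'name': 'root', '_extra_data': 'cached'}
```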
Here’s a pattern that I’ve used to (mostly) avoid these issues:
```python
import ast
from weakref import WeakKeyDictionary

# Extra data per module, stored without keeping the modules alive.
used_names_cache: WeakKeyDictionary[ast.Module, set[str]] = WeakKeyDictionary()


def all_used_names(module: ast.Module) -> set[str]:
    # Return the stored result if this module was already processed.
    try:
        return used_names_cache[module]
    except KeyError:
        pass

    names: set[str] = set()
    ...  # populate set
    used_names_cache[module] = names
    return names
```
The idea is to use a WeakKeyDictionary to store the extra data, keyed by the object. Because the dictionary holds only a weak reference to each key, once the object is no longer (strongly) referenced elsewhere and is garbage collected, its entry is removed from the dictionary automatically. Lookups stay O(1), similar to the attribute approach, but now the data lives “over here” in the dictionary rather than “over there” on the object itself, avoiding the downsides of direct attribute storage.
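A short sketch of the automatic cleanup follows, using a hypothetical Document class in place of ast.Module. It assumes CPython, where reference counting frees an object as soon as its last strong reference disappears; the gc.collect() call is there to make the behaviour more predictable on other implementations, where collection timing can differ.

```python
import gc
from weakref import WeakKeyDictionary


class Document:
    """Hypothetical stand-in for an object like ast.Module."""


extra_data: WeakKeyDictionary[Document, str] = WeakKeyDictionary()

doc = Document()
extra_data[doc] = "some extra data"
print(len(extra_data))  # 1

# Drop the only strong reference. CPython frees the object immediately,
# and the weak reference's callback removes the dictionary entry.
del doc
gc.collect()
print(len(extra_data))  # 0
```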
The object must satisfy two conditions: it must be hashable, and it must be weak-referenceable.
Instances of classes are hashable by default in Python, unless the class defines a custom __eq__ method without a corresponding __hash__ method, so this requirement is usually met.

Most user-defined classes are weak-referenceable by default. Some built-in types, including int, str, list, and dict, cannot be weak-referenced directly. Slotted classes are not weak-referenceable by default, but can opt into it by adding __weakref__ to their __slots__ definition:

```python
class Train:
    __slots__ = ("__weakref__", "wheels", "engine")
```
The extra slot increases each instance’s memory footprint slightly, but not beyond that of a regular (non-slotted) class, whose instances carry a hidden __weakref__ slot anyway.
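A quick way to check both conditions for a given type is simply to try them. This sketch reuses Train from above; Carriage, Station, can_weakref, and can_hash are hypothetical names for illustration. weakref.ref() raises TypeError for objects that can’t be weakly referenced, and hash() raises TypeError for unhashable ones.

```python
import weakref


class Train:
    # Opting in: "__weakref__" in __slots__ makes instances weak-referenceable.
    __slots__ = ("__weakref__", "wheels", "engine")


class Carriage:
    # No "__weakref__" slot, so instances cannot be weakly referenced.
    __slots__ = ("seats",)


class Station:
    # Defining __eq__ without __hash__ makes Python set __hash__ to None.
    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Station) and other.name == self.name


def can_weakref(obj: object) -> bool:
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


def can_hash(obj: object) -> bool:
    """Return True if obj is hashable."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(can_weakref(Train()))  # True
print(can_weakref(Carriage()))  # False
print(can_weakref({}))  # False
print(can_hash(Station("York")))  # False
```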
Why not functools.cache?
You might wonder why not use functools.cache for the above pattern, like so:
```python
import ast
from functools import cache


@cache
def all_used_names(module: ast.Module) -> set[str]:
    names: set[str] = set()
    ...  # populate set
    return names
```
The main reason is that functools.cache uses strong references to its arguments. That means even if a given module object has no more “live” references in the rest of the code, the reference in the cache will keep it alive, preventing it from being garbage collected and its memory from being freed.
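The difference is easy to observe with a weakref.ref used purely as an observer. In this sketch, Module and cached_names are hypothetical stand-ins for ast.Module and all_used_names, and it again assumes CPython’s immediate reference-counting cleanup.

```python
import gc
import weakref
from functools import cache


class Module:
    """Hypothetical stand-in for ast.Module."""


@cache
def cached_names(module: Module) -> set[str]:
    return set()  # placeholder for the real work


module = Module()
cached_names(module)
observer = weakref.ref(module)  # only used to check whether the object is alive

del module
gc.collect()
# The cache still holds a strong reference, so the object survives.
print(observer() is not None)  # True

# Clearing the cache drops that reference, so the object can be freed.
cached_names.cache_clear()
gc.collect()
print(observer() is None)  # True
```

With functools.cache, which is unbounded, the only way to release such objects is cache_clear(), which empties the whole cache. A WeakKeyDictionary instead releases each entry independently as its key is collected.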