Python: store extra data for objects in a WeakKeyDictionary

A filing cabinet, strong and sturdy but with weak keys.

In several programs, I’ve wanted to solve the problem of associating extra data with an object. For example, in django-upgrade, the individual “fixer” functions often want to store extra data per visited ast.Module object.

A common pattern in Python is to store the data in an extra attribute directly on the object, like module._all_used_names = .... However, this approach has some downsides:

Here’s a pattern that I’ve used to (mostly) avoid these issues:

import ast
from weakref import WeakKeyDictionary

used_names_cache: WeakKeyDictionary[ast.Module, set[str]] = WeakKeyDictionary()


def all_used_names(module: ast.Module) -> set[str]:
    try:
        return used_names_cache[module]
    except KeyError:
        pass

    names = set()
    ...  # populate set
    used_names_cache[module] = names
    return names

The idea is to use a WeakKeyDictionary to store the extra data, keyed by the object. This special dictionary is keyed by the object, but because it uses a weak reference, if the object is no longer (strongly) referenced elsewhere, the dictionary entry will also be deleted. We get similar lookup performance (O(1)) to a typical attribute approach, but now the data lives “over here” rather than “over there” on the object itself, avoiding the downsides of direct attribute storage.

The object must satisfy two conditions: it must be hashable, and it must be weak-referenceable.

  1. Classes are hashable by default in Python, unless they define a custom __eq__ method without a corresponding __hash__ method, so this requirement is usually met.

  2. Most user-defined classes are weak-referenceable by default. Some built-in types, including int, str, list, and dict, cannot be weak-referenced directly. Slotted classes are not weak-referenceable by default, but can opt into it by adding __weakref__ in their __slots__ definition:

    class Train:
        __slots__ = ("__weakref__", "wheels", "engine")
    

    The extra slot expands the memory footprint slightly, but not above a vanilla class which has a hidden weakref “slot”.

Why not functools.cache?

You might wonder why not to use functools.cache for the above pattern, like:

from functools import cache


@cache
def all_used_names(module: ast.Module) -> set[str]:
    names = set()
    ...  # populate set
    return names

The main reason is that functools.cache uses strong references to its arguments. That means even if a given module object has no more “live” references in the rest of the code, the reference in the cache will keep it alive, preventing it from being garbage collected and its memory from being freed.

Fin

May your code be strong even when your references are weak,

—Adam


😸😸😸 Check out my new book on using GitHub effectively, Boost Your GitHub DX! 😸😸😸


Subscribe via RSS, Twitter, Mastodon, or email:

One summary email a week, no spam, I pinky promise.

Related posts:

Tags: