The Well-Maintained Test: 12 Questions for New Dependencies
Joel Spolsky’s famous Joel Test is a quick heuristic for checking a software engineering team’s technical chops. I’ve come up with a similar test that we can use to decide whether a package we’re considering adding as a dependency is well-maintained.
I do not have the hubris to name the test after myself, so I present: The Well-Maintained Test.
Answer “yes” or “no” to the questions below by checking the new dependency’s website (if any), project page (npm, PyPI, etc.), and source control hosting (GitHub, GitLab, etc.).
The package scores one point for each “yes”. You’ll have to determine how many points are required to pass, based on your risk tolerance.
Bear in mind, whenever you answer “no”, that is an opportunity to contribute! You may find some issues are easily remediated.
- Is it described as “production ready”?
- Is there sufficient documentation?
- Is there a changelog?
- Is someone responding to bug reports?
- Are there sufficient tests?
- Are the tests running with the latest <Language> version?
- Are the tests running with the latest <Integration> version?
- Is there a Continuous Integration (CI) configuration?
- Is the CI passing?
- Does it seem relatively well used?
- Has there been a commit in the last year?
- Has there been a release in the last year?
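To make the scoring concrete, here’s a minimal sketch in Python. The pass threshold of 9 is purely illustrative; as noted above, the right cutoff depends on your risk tolerance.

```python
# A sketch of scoring the checklist. The default threshold is an
# illustrative assumption, not part of the test itself.

QUESTIONS = [
    "Is it described as 'production ready'?",
    "Is there sufficient documentation?",
    "Is there a changelog?",
    "Is someone responding to bug reports?",
    "Are there sufficient tests?",
    "Are the tests running with the latest <Language> version?",
    "Are the tests running with the latest <Integration> version?",
    "Is there a Continuous Integration (CI) configuration?",
    "Is the CI passing?",
    "Does it seem relatively well used?",
    "Has there been a commit in the last year?",
    "Has there been a release in the last year?",
]


def score(answers: dict[str, bool]) -> int:
    """One point per 'yes'; unanswered questions count as 'no'."""
    return sum(1 for question in QUESTIONS if answers.get(question, False))


def passes(answers: dict[str, bool], threshold: int = 9) -> bool:
    """Pass/fail against a threshold chosen by your risk tolerance."""
    return score(answers) >= threshold
```

And remember: each “no” you record is a candidate contribution, not just a lost point.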
Let’s examine each question in a bit more depth.
We want to see evidence that the maintainers consider the software ready for use in production.
The documentation shouldn’t have any banners or wording implying a future stable release.
The version number should not be a pre-release, alpha, beta, release candidate, etc. Note that some maintainers stick with a “zero version number” like 0.4.0, even when they consider the package production ready.
If we can’t find clear information on what the package does and how to use it, it seems doubtful that working with it will be easy.
“Sufficient” varies based on the scope of the library, the ecosystem, and your preferences.
Documentation comes in many forms: a README file, a documentation site, a wiki, blog posts, etc. Hopefully the package doesn’t make you hunt for it.
A changelog, or a release notes page, is vital for our ability to update the package. The changelog is the main place for communication of breaking changes. (A case for changelogs is made at keepachangelog.com.)
Changelogs come in many forms: a single file, a documentation section, GitHub release descriptions, etc. Again, hopefully the package doesn’t make you hunt for it.
Note that some projects “have a changelog”, but it has stopped being maintained since the project’s inception. So check that the changelog covers recent releases.
If recent bug reports have gone unanswered, it may be a sign that the package is no longer maintained. It’s worth ignoring any “spammy” open issues, and checking for recently closed issues, since closing issues is also a sign of activity.
Check for issues like “is this still maintained?”… the answer is probably “no”, per Betteridge's law of headlines.
Tests give us confidence that future changes will not result in bugs.
Again, “sufficient” is context-dependent: testing norms in our language and ecosystem, ease of testing the functionality, and personal preferences.
Measurement of test coverage is normally a sign that the tests are higher quality. With coverage, maintainers can at least tell when changes affect untested code.
If there’s no proof of coverage, it’s worth opening a few test files, to check that they aren’t auto-created empty skeletons.
We can grant some leeway for very recent language versions. If Python 3.10 was released last Tuesday, we cannot expect every package to be up to date.
Testing against a new language version can be an easy way to contribute. Often the new version only needs adding to the test matrix, although that may reveal some bugs.
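For example, on GitHub Actions the new version is often a one-line addition to the workflow’s test matrix. This fragment is illustrative only; the version numbers and test command are assumptions, not a recommendation.

```yaml
# Illustrative GitHub Actions workflow fragment.
jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]  # new version appended here
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: python -m pip install -e . pytest
      - run: python -m pytest
```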
<Integration> here could mean a framework that the package is based on, like Django, or something the package interfaces with, like PostgreSQL. It could mean several things, in which case we can check them all.
The same conditions apply as for the latest <Language> version. And again, adding tests for a new version may be an easy way to contribute.
If there are tests, it’s likely there’s a CI system set up, such as GitHub Actions. We should check that this is in place, and running correctly for recent changes.
Some projects configure CI but then ignore it or leave it unmaintained. CI may be failing for one or more <Language> or <Integration> versions. If this has gone on for a while, it is a sign that maintenance is lagging.
Sometimes CI failure is caused by a single small bug, so fixing it may be a quick contribution. It can also be the case that old versions of <Language> or <Integration>s can simply be dropped.
We can guesstimate usage by checking recent download counts, and to a lesser extent, popularity metrics like GitHub’s “stars”. Many package indexes, like npm, show download counts on package pages. For PyPI, we can use pypistats.org.
We can only judge usage relative to similar packages, the popularity of any <Integration>s, and our <Language>. A particularly niche tool may see minimal usage, but it might still beat any “competitor” packages.
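For PyPI packages, pypistats.org exposes a JSON API (e.g. `/api/packages/<name>/recent`). The payload shape assumed below matches its recent-downloads endpoint as I understand it; verify against the live API before relying on it.

```python
# A sketch of reading download counts from a pypistats.org-style JSON
# payload. The payload structure is an assumption about that API.
import json


def monthly_downloads(payload: str) -> int:
    """Extract last month's download count from a 'recent' API response."""
    data = json.loads(payload)
    return data["data"]["last_month"]
```

In practice you would fetch the payload with an HTTP client, then eyeball the number against comparable packages rather than treating any absolute figure as meaningful.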
Maintainers tend to abandon packages rather than explicitly mark them as unmaintained. So the probability of future maintenance drops off the longer a project has not seen a commit.
We’d like to see at least one recent commit as a “sign of life”.
Any cutoff is arbitrary, but a year aligns with most programming languages’ annual release cadence.
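The last two questions reduce to the same date arithmetic. A minimal sketch, assuming you have already looked up the latest commit date from the repository and the latest release date from the package index:

```python
# A sketch of the "sign of life" checks for commits and releases.
# The 365-day cutoff mirrors the (arbitrary) one-year rule above.
from datetime import date, timedelta


def within_last_year(latest: date, today: date) -> bool:
    """True if the given commit or release date is at most a year old."""
    return (today - latest) <= timedelta(days=365)
```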
Here are some other things we should consider alongside the test.
Before considering a dependency, we should check its license is compatible with our needs. The main concern is the GPL family of licenses, which can be restrictive.
Carefully created packages tend to be carefully maintained. It can be hard to pin down what makes code and documentation “good”. But we can briefly inspect the package’s code and documentation to get a feel for it.
If the package’s maintainers are known for other high-quality packages, that counts positively. Reputation will probably correlate with the test.
If the maintainers use the package themselves, in a production application, we know they have a vested interest in its continued development. Such “skin in the game” is a valuable signal. But it can be hard to determine whether a package is actually in use (and how much), as most organizations do not openly discuss this.
Small packages are less of a risk than big ones. If you’re considering adopting a small package that you think you could later copy into your project, or rewrite in a few hours or days, you can lower the bar. But if you’re thinking of taking on a large “platform package”, such as a web framework, you’ll want to be stricter.
There are a few projects that attempt to quantify how well-maintained packages are. The resulting metrics exhibit the same problems as all metrics: Garbage In, Garbage Out (GIGO), not accounting for harder-to-acquire but important data, incentivizing superficial action, etc.
We should use such tools with caution. They are more useful as guides for our investigation, as opposed to absolute answers.
(I was originally going to cover a couple of these projects here, but they didn’t come out very favourably. I don’t want to just shit on others’ work.)
Thanks to the following people for their contributions to the original Twitter thread and discussion: @fabiocerqueira, @hugovk, @rmcomplexity, Dan Palmer, Daniel Hepper, Frank Wiles, Gordon Wrigley, Henry Schreiner III, Julius Šėporaitis, Jürgen Gmach, Tom Viner, Will McGugan, and Zellyn Hunter.
May you find well-maintained packages for your projects,