Git: find the largest commits

Recently, I was working in a new repository and found the git blame output often pointed back to a large repository-wide formatting commit (applying Black to all Python files). To ignore this commit in git blame, I added its SHA to a .git-blame-ignore-revs file, per this documentation:
# Format everything with Black
55c0bf219272801586b04c5be691e3aedcfc7254
While writing the .git-blame-ignore-revs file, I started wondering whether there were any other large commits worth blame-ignoring.
Finding such commits is not straightforward with Git, as there’s no command to list commits sorted by the number of lines they change. Hence, I wrote the script below, which parses the output of git log to count each commit’s changes and sorts by that count. Run it with Python (3.7+, I think) to list all non-merge commits reachable from the current commit, largest first. For example, in Django’s repository:
$ python git_largest_commits.py
Changes SHA Subject
285412 de8565e1c48f1c386a7b256e1ae585cbd8ff11b2 Removed app translation strings from core translation files.
257061 f27a4ee3270bd57299ce02d622978ac4d839137e Removed django.contrib.localflavor.
235861 9c19aff7c7561e3a82978a272ecdaad40dda5c00 Refs #33476 -- Reformatted code with Black.
232590 efa67b897b6ed5c6bbee1aa2646f4ba7ea6e2bc2 Fetched translations from Transifex
164493 7be43c910abbf538bba65cc8304896bdd1ba1d37 Added new translation files to localflavor contrib app.
...
Because the output is long, you will probably want to pipe it into less to avoid swamping your terminal:
$ python git_largest_commits.py | less
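Since the end goal is a .git-blame-ignore-revs file, the table can also be post-processed instead of only paged through. Here is a minimal sketch, assuming the tab-separated layout shown above (changes, SHA, subject). The function name ignore_revs_entries and the min_changes threshold are hypothetical and not part of the script:

```python
def ignore_revs_entries(table_text, min_changes):
    """Return .git-blame-ignore-revs entries for commits with at least min_changes."""
    entries = []
    for line in table_text.splitlines():
        fields = line.split("\t", 2)
        # Skip the header row and any line that is not "changes<TAB>sha<TAB>subject".
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue
        changes = int(fields[0])  # int() tolerates the padding spaces
        sha = fields[1].strip()
        subject = fields[2]
        if changes >= min_changes:
            # A comment line with the subject, then the full SHA on its own line.
            entries.append(f"# {subject}\n{sha}")
    return entries


# Sample rows in the script's output format (tab-separated columns).
SAMPLE_TABLE = (
    "Changes\tSHA                                     \tSubject\n"
    "235861\t9c19aff7c7561e3a82978a272ecdaad40dda5c00\tRefs #33476 -- Reformatted code with Black.\n"
    "   151\t3f59711581bd22ebd0f13fb040b15b69c0eee21f\tFixed #36366 -- Improved accessibility of pagination in the admin.\n"
)

print("\n".join(ignore_revs_entries(SAMPLE_TABLE, min_changes=100_000)))
```

The threshold is arbitrary, and size alone does not make a commit worth ignoring, so treat the result as a list of candidates to review by hand.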
Here’s the script:
"""
List Git commits reachable from the current commit, sorted by the number of
changes they made, largest first.

https://adamj.eu/tech/2025/07/20/git-find-largest-commits/
"""

import math
import os
import re
import subprocess
import sys


def main():
    # No check=True here: a failing git is reported by the returncode check
    # below, which would otherwise be unreachable.
    result = subprocess.run(
        ["git", "log", "--pretty=format:%H\t%s", "--shortstat", "--no-merges"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stdout)
        print(result.stderr, file=sys.stderr)
        return result.returncode

    commit_details = []
    lines = result.stdout.splitlines()
    i = 0
    while i < len(lines):
        commit_line = lines[i]
        if i + 1 < len(lines) and lines[i + 1].startswith(" "):
            stats_line = lines[i + 1]
            i += 3  # move past commit, stats, and blank lines
        else:
            # Empty commit
            stats_line = ""
            i += 1  # move past commit line only

        total_changes = 0
        if stats_line:
            matches = re.findall(r"(\d+) (?:insertion|deletion)", stats_line)
            total_changes = sum(int(match) for match in matches)

        commit_details.append((total_changes, commit_line))

    if not commit_details:
        print("No commits found.", file=sys.stderr)
        return 1

    commit_details.sort(key=lambda x: x[0], reverse=True)

    # Calculate width based on largest number of changes
    max_changes = commit_details[0][0]
    if max_changes == 0:
        width = 7  # "Changes"
    else:
        num_digits = len(str(max_changes))
        width = math.ceil(num_digits / 3) * 3
    sha_width = len(commit_details[0][1].split("\t")[0])

    # Format and output
    try:
        print(f"{'Changes':<{width}}\t{'SHA':<{sha_width}}\tSubject")
        for changes, commit in commit_details:
            print(f"{changes:{width}d}\t{commit}")
        sys.stdout.flush()
    except BrokenPipeError:
        # Python flushes standard streams on exit; redirect remaining output
        # to devnull to avoid another BrokenPipeError at shutdown
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
The script takes these steps:
1. Run git log with specific options to output commit hashes, subjects, and short statistics. The output looks like:

    d63241ebc7067fdebbaf704989b34fcd8f26bbe9 Fixed #15727 -- Added Content Security Policy (CSP) support.
     26 files changed, 1192 insertions(+), 1 deletion(-)

    3f59711581bd22ebd0f13fb040b15b69c0eee21f Fixed #36366 -- Improved accessibility of pagination in the admin.
     9 files changed, 118 insertions(+), 33 deletions(-)

   There’s one wrinkle: empty commits only display the commit hash and subject, without the statistics line or the blank line after it:

    0f94972033f4b27be6c902a6764c5d3d802ddea2 Example empty commit
    d63241ebc7067fdebbaf704989b34fcd8f26bbe9 Fixed #15727 -- Added Content Security Policy (CSP) support.
     26 files changed, 1192 insertions(+), 1 deletion(-)

2. Parse the output, summing the insertions and deletions to get the total number of changes for each commit.
3. Sort by the total number of changes, largest first.
4. Output the results in a tabular format, with some calculation to ensure columns are aligned.

   The BrokenPipeError handling here prevents errors when piping into less or similar commands, per my previous post.
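The parsing and sorting steps can be tried without a repository by feeding in sample text. The sketch below is a restructured variant, not the script’s own loop: it walks the lines one at a time instead of stepping an index, and the name parse_shortstat_log is made up for illustration. It assumes, as in the real output, that statistics lines start with a space and commit lines never do:

```python
import re

# Sample git log output in the format described above: one empty commit
# followed by two commits that each have a statistics line.
SAMPLE = (
    "0f94972033f4b27be6c902a6764c5d3d802ddea2\tExample empty commit\n"
    "d63241ebc7067fdebbaf704989b34fcd8f26bbe9\tFixed #15727 -- Added Content Security Policy (CSP) support.\n"
    " 26 files changed, 1192 insertions(+), 1 deletion(-)\n"
    "\n"
    "3f59711581bd22ebd0f13fb040b15b69c0eee21f\tFixed #36366 -- Improved accessibility of pagination in the admin.\n"
    " 9 files changed, 118 insertions(+), 33 deletions(-)\n"
)


def parse_shortstat_log(text):
    """Return (total_changes, commit_line) pairs, largest first."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator between commits
        if line.startswith(" "):
            # Statistics line: attach its total to the most recent commit.
            if commits:
                counts = re.findall(r"(\d+) (?:insertion|deletion)", line)
                commits[-1] = (sum(int(c) for c in counts), commits[-1][1])
        else:
            # Commit line: starts at zero, which is what empty commits keep.
            commits.append((0, line))
    return sorted(commits, key=lambda pair: pair[0], reverse=True)


for total, commit_line in parse_shortstat_log(SAMPLE):
    print(total, commit_line.split("\t")[0])
```

The regular expression matches both singular and plural forms ("1 deletion", "33 deletions") and skips the "files changed" count, so only inserted and deleted lines contribute to the total.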