Word Counting My Whole Site

2019-09-30

My site is static HTML, built with Jekyll (more details in my colophon). This means I have a folder that contains the whole site in HTML files.

I wanted to find the total word count. I found this combination of commands works great:

find . -iname "*.html" | parallel pandoc -t plain | wc -w

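It uses find to list every HTML file under the current directory, GNU parallel to run one pandoc process per file across the available CPU cores, pandoc with -t plain to convert each file to plain text, and wc -w to count the words in the combined output.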

It took about 2 seconds on my computer to tell me my site currently has about 75,000 words. That's more than I expected, though the total counts repeated elements such as footers once for every page.
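To see which pages contribute most to the total, a per-page breakdown can help. This is only a rough sketch: the function name words_per_page is made up for illustration, it assumes pandoc is on your PATH, and it runs serially, so it will be slower than the parallel one-liner.

```shell
# Print "<count> <path>" for every HTML file under a directory, largest first.
# words_per_page is a hypothetical helper name; pandoc must be installed.
words_per_page() {
    find "${1:-.}" -iname '*.html' -exec sh -c '
        for page in "$@"; do
            # Convert one page to plain text and count its words.
            count=$(pandoc -t plain "$page" | wc -w)
            # $((count)) strips any padding some wc implementations add.
            printf "%d %s\n" "$((count))" "$page"
        done
    ' sh {} + | sort -rn
}
```

Running words_per_page _site (or whatever your output folder is called) lists the wordiest pages first. The smallest counts give a rough idea of how many words the shared header and footer add to each page.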

Thanks to pandoc’s universality, you can also use this to count words in many file formats: Markdown, reStructuredText, MS Word, etc.
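As a small sketch of that idea, the extension can be made a parameter. The function name count_words_by_ext is hypothetical, and the sketch relies on pandoc guessing the input format from each file's extension, which it does for common ones such as .md, .rst, and .docx.

```shell
# Count the words in every file with a given extension under a directory.
# count_words_by_ext is a hypothetical helper name; pandoc must be installed.
count_words_by_ext() {
    ext=$1
    dir=${2:-.}
    # Run pandoc once per matching file and count words in the combined text.
    find "$dir" -iname "*.$ext" -exec pandoc -t plain {} \; | wc -w
}
```

For example, count_words_by_ext md _posts would count only the Markdown sources, assuming your posts live in a folder with that name.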

If your site is more dynamic, but still small enough to download, you might consider using GNU wget. Its --recursive flag downloads every page as HTML to a local folder, following links to find everything on the website. For an example, see this gist.
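Separately from that gist, here is a rough sketch of how the download-then-count approach might fit together. The function name mirror_and_count, the URL, and the mirror folder are all placeholders, and the extra wget flags are my assumptions about what is useful rather than anything the original command requires.

```shell
# Mirror a site into a local folder, then count the words in its HTML pages.
# mirror_and_count is a hypothetical helper; wget, GNU parallel, and pandoc
# must all be installed.
mirror_and_count() {
    url=$1
    dir=${2:-mirror}
    # --no-parent stops wget from climbing above the starting path;
    # --adjust-extension saves HTML pages with a .html suffix so find matches them.
    wget --recursive --no-parent --adjust-extension --quiet \
        --directory-prefix="$dir" "$url"
    # Same pipeline as before, pointed at the downloaded copy.
    find "$dir" -iname '*.html' | parallel pandoc -t plain | wc -w
}
```

You would call it as mirror_and_count https://example.com/ mirror, swapping in your own site's URL. The count runs even if wget reports an error for a broken link, so check wget's output if the number looks low.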


May you continue to increase your word count,



Tags: commandline, jekyll