Word Counting My Whole Site

Try doing it with this

My site is static HTML, built with Jekyll (more details in my colophon). This means I have a folder that contains the whole site in HTML files.

I wanted to find the total word count. I found this combination of commands works great:

find . -iname "*.html" | parallel pandoc -t plain | wc -w

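It uses find to list every HTML file, GNU parallel to run pandoc on each one, pandoc to convert the HTML to plain text, and wc to count the words.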

It took about 2 seconds on my computer to tell me my site currently has about 75,000 words. That's more than I expected, though the total counts shared elements such as footers once for every page.
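If you're curious where those words live, a per-file breakdown can help. Here's a minimal sketch, assuming pandoc is installed. The name words_per_file is a hypothetical helper of mine, not part of any tool, and this version skips parallel for simplicity, so it will likely run slower than the one-liner:

```shell
# words_per_file [DIR] (hypothetical helper name)
# Prints "<word count> <path>" for each HTML file under DIR
# (default: current directory), largest count first.
words_per_file() {
  find "${1:-.}" -iname '*.html' -exec sh -c '
    for file do
      # Convert one file to plain text and count its words.
      # tr strips the padding some wc implementations add.
      count=$(pandoc -t plain "$file" | wc -w | tr -d " ")
      printf "%s %s\n" "$count" "$file"
    done' sh {} + |
    sort -rn
}
```

Running words_per_file | head should then list the wordiest pages first.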

Thanks to pandoc’s universality, you can also use this to count words in many file formats: markdown, reStructuredText, MS Word, etc.
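As a rough sketch of that idea, the same pipeline can take the file extension as a parameter. This assumes pandoc can infer the input format from the extension (it does for common ones like .md, .rst, and .docx), and count_words_ext is a made-up name for illustration:

```shell
# count_words_ext EXT [DIR] (hypothetical helper name)
# Total word count across all files with extension EXT under DIR
# (default: current directory). Relies on pandoc guessing the
# input format from each file's extension.
count_words_ext() {
  find "${2:-.}" -iname "*.$1" | parallel pandoc -t plain | wc -w
}
```

For example, count_words_ext md would count the words in my Markdown sources, and count_words_ext docx ~/Documents would do the same for a folder of Word files.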

If your site is more dynamic, but still small enough to download, you might consider using GNU wget. Its --recursive flag will let you download every page as HTML locally, following links to find everything on the website.
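Something like the following might work as a starting point. It's a sketch, not a tested recipe for every site: mirror_site is a hypothetical helper name, https://example.com/ is a placeholder URL, and the extra flags beyond --recursive are my own assumptions about what's useful (--no-parent stays below the starting URL, --adjust-extension saves pages with an .html suffix so the find pattern matches them).

```shell
# mirror_site URL [DIR] (hypothetical helper name)
# Downloads URL and the pages it links to into DIR (default: ./mirror).
mirror_site() {
  wget --recursive --no-parent --adjust-extension \
       --directory-prefix="${2:-mirror}" "$1"
}
```

After mirror_site https://example.com/ finishes, cd mirror and run the find command from above to get the count.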

Fin

May you continue to increase your word count,

—Adam

