Word Counting My Whole Site
2019-09-30
My site is static HTML, built with Jekyll (more details in my colophon). This means I have a folder that contains the whole site in HTML files.
I wanted to find the total word count. I found this combination of commands works great:
find, to list the relative paths of all the HTML files locally.
parallel, to run a command on each file in parallel.
pandoc, the document converter, to convert the input HTML to plain text.
wc, to calculate the total word count.
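Piped together, the four steps look roughly like the sketch below. The wrapper function count_site_words and the _site default (Jekyll's usual output folder) are illustrative assumptions, and it needs GNU parallel and pandoc installed.

```shell
# Sketch: total word count for every HTML file under a directory.
# Assumes GNU parallel and pandoc are on the PATH.
# count_site_words is a hypothetical helper name; _site is an assumed default.
count_site_words() {
  site_dir="${1:-_site}"
  # find lists the HTML files, NUL-separated so odd filenames survive
  find "$site_dir" -name '*.html' -print0 |
    # parallel runs one pandoc per file, converting HTML to plain text
    parallel -0 pandoc --from html --to plain {} |
    # wc -w counts the words in the combined plain text
    wc -w
}
# Usage: count_site_words _site
```

Running count_site_words _site from the site's root prints a single number. parallel groups each job's output by default, so text from different files is not interleaved mid-word.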
It took about 2 seconds on my computer to tell me my site currently has about 75,000 words. That is more than I expected, though the total counts repeated elements such as footers once per page.
Thanks to pandoc’s universality, you can also use this to count words in many file formats: Markdown, reStructuredText, MS Word, etc.
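For a single document in another format, a minimal sketch might look like this. The function name count_doc_words and the file name in the usage comment are hypothetical. It relies on pandoc guessing the input format from the file extension.

```shell
# Sketch: word count for one document in any format pandoc can read.
# count_doc_words is a hypothetical helper name; assumes pandoc is installed.
count_doc_words() {
  # No --from flag: pandoc infers the input format from the extension
  pandoc --to plain "$1" | wc -w
}
# Usage: count_doc_words notes.md
```

If the extension is missing or misleading, passing --from explicitly (for example --from rst) should be more reliable than the guess.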
If your site is more dynamic, but still small enough to download, you might consider using GNU Wget. Its --recursive flag will let you download every page as HTML locally, following links to find everything on the website.
For an example, see this Gist.
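For the download step, a minimal sketch might look like this. The function name mirror_site and the example.com URL are placeholders, and the flags beyond --recursive are my assumptions about what is usually useful, not something required.

```shell
# Sketch: download a small site as local HTML files with GNU Wget.
# mirror_site is a hypothetical helper name; the URL below is a placeholder.
mirror_site() {
  # --recursive: follow links from the start page
  # --no-parent: never climb above the start URL's directory
  # --adjust-extension: save HTML pages with an .html suffix
  wget --recursive --no-parent --adjust-extension "$1"
}
# Usage: mirror_site https://example.com/
```

wget writes the pages into a folder named after the host (example.com here), which can then be fed to the same find pipeline. By default it recurses up to five levels deep; --level changes that.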
May you continue to increase your word count,
© 2019 All rights reserved.