While comparing various minification tools recently I soon discovered that there are plenty of options available.
Some minifiers focus on speed and only strip whitespace, remove comments (except for license notices) and maybe rename local variables to shorter names. That usually accounts for the biggest reduction in size, but much of the same effect is already accomplished by gzip compression.
Other minifiers are more comprehensive and some even apply dead code elimination, which usually requires evaluating the source internally (and therefore is a lot slower).
For websites that are pushed into production, minification performance is usually less important and achieving the highest reduction in file size (using only safe minifications) is what counts.
From my findings and related benchmarks, the best available minification tools right now for the usual web assets are:
- Terser for JS files. Terser is the successor of UglifyJS and is the default option in webpack.
- LightningCSS for CSS files.
- html-minifier-terser for HTML files. This is a fork of html-minifier and also maintained by the Terser people.
Just for fun, I decided to pull in the most popular websites (by Alexa rank) and run them through these tools to see what potential savings there could be.
The good news is that most websites are doing really well, as I was only able to shave off about 11 kilobytes on average.
Minifying the most popular websites on the internet
Using a list of the most popular websites out there, I fired up a Python script1 to download the HTML for each homepage2. It then parsed the HTML to look for any stylesheets and scripts and downloaded these too. After running these files through the respective minifiers, gzipped sizes were compared and written to a CSV for later analysis.
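The size comparison at the heart of that script can be sketched roughly like this (`gzipped_size` and `saving` are hypothetical helper names of my own, not the actual script's):

```python
import gzip

def gzipped_size(data: bytes) -> int:
    """Approximate on-the-wire size: length of the asset after gzip compression."""
    return len(gzip.compress(data, compresslevel=9))

def saving(original: bytes, minified: bytes) -> int:
    """Bytes saved per transfer when serving the minified asset instead of the original."""
    return gzipped_size(original) - gzipped_size(minified)
```

Comparing gzipped sizes rather than raw sizes matters here: whitespace compresses extremely well, so raw byte savings would overstate the real transfer savings.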
Only safe minification techniques were used, so more aggressive techniques that could affect functionality were omitted.
What follows is a summary of the results (in bytes):
On average, about 11 kB worth of data could be saved by using these minification tools instead of whatever these websites are using now.
Compared to what certain page builders are outputting nowadays, this is actually really good!
Still, anything multiplied by the gigantic traffic numbers these sites see will amount to a lot. And this is only using safe minification techniques, so it would normally be quite trivial to improve upon.
To better understand just how much data this might amount to in total, we have to look at cache lifetimes too.
While downloading the asset files, I inspected the HTTP headers for cache directives. The average time (in seconds) was taken across all of the assets that had a Cache-Control or an Expires header, or 0 if the response included no such header.
| expires (s) | expires (h) |
The median cache lifetime encountered was 1 month. 25% of websites asked the browser to cache their assets for 24 hours and 10% asked for just 5 minutes.
I think the above is quite good already, even taking into account that the results might be an underestimate, because only assets referenced in the static HTML were considered.
It shows that these popular websites are pretty much all applying best practices we've known for years:
- Use gzip compression. Only a handful of requests out of several thousand did not have gzip compression enabled for their responses3, and IIRC most of these were error responses.
- Minify your assets in production. Across the top 500 websites, I was only able to shave off an average of 11 kilobytes per website.
- Instruct the browser that your assets can be cached in between requests. Over 50% of these popular websites had an average cache directive of about 1 month.
Energy cost of data transmission
In 2021, data transmission accounted for about 1.4% of global electricity usage. Imagine what this number would be if we did not have gzip compression, browser caches and minification.
A few years ago I wrote about CO2 emissions on the web, where I went with an estimate of 0.5 kWh per GB of data transferred. Since then I've seen a lot of additional discussion about the energy cost of data transfer, with estimates still varying wildly.
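With that (admittedly rough) 0.5 kWh/GB figure, the back-of-the-envelope arithmetic goes like this; the function and the traffic numbers are purely illustrative:

```python
KWH_PER_GB = 0.5  # rough estimate from the earlier article; real figures vary wildly

def energy_saved_kwh(bytes_saved_per_visit: float, visits: float,
                     kwh_per_gb: float = KWH_PER_GB) -> float:
    """Energy saved by transferring fewer bytes, under a per-GB energy estimate."""
    gigabytes = bytes_saved_per_visit * visits / 1e9
    return gigabytes * kwh_per_gb

# e.g. 11 kB saved per visit, across a million visits:
# energy_saved_kwh(11_000, 1_000_000) -> 5.5 kWh
```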
Whatever the actual number is, the good news is that data transmission still seems to be getting more efficient. Let's make sure these efficiency gains aren't negated because of the Jevons paradox, shall we?
2 This approach ignores any dynamically inserted assets, because only assets linked from the static HTML are downloaded and evaluated.
3 gzip is probably the real hero of this story. It's mind boggling to think of how much data is saved because of this compression algorithm.