If you’re deploying a site, it’s important to keep the file sizes of your assets as low as possible in order to speed up page load time and thus improve UX. A common approach to help with this is to gzip your assets.
If you’re running, say, a Rails or Django site, there are a bunch of packages that will help automate this process for you and keep it fairly hands-off. However, this site is statically generated via Hugo, which sadly doesn’t support minification or compression out of the box.
The site’s being stored on S3 and uploaded via the wonderful s3cmd (which itself is being invoked by Fabric). I wanted a nice, Pythonic way to gzip all of these assets and load them into S3, while making sure S3 served them correctly.
Gzipping static files with Python
The actual process of gzipping files is fairly easy, with the builtin gzip module doing the real work. The trick is to use the os.walk function to generate a complete recursive iterator of all files in your target directory, and then gzip just the relevant ones:
    import gzip
    import os

    def gzip_directory(directory):
        for root, dirs, files in os.walk(directory):
            for f in files:
                # splitext returns a (root, ext) tuple; we only care about the extension
                if os.path.splitext(f)[1] in ['.html', '.css', '.js']:
                    current_path = os.path.join(root, f)
                    with open(current_path, 'rb') as f_in:
                        with gzip.open(current_path + '.gz', 'wb') as f_out:
                            f_out.writelines(f_in)
                    # Replace the original file with its gzipped version
                    os.rename(current_path + '.gz', current_path)
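As a quick sanity check, you can confirm the rewritten files really are gzip data by looking for the two-byte gzip magic number. Here's a sketch using a temporary scratch directory standing in for Hugo's real public/ output (the function is repeated so the snippet is self-contained):

```python
import gzip
import os
import tempfile

def gzip_directory(directory):
    # Same function as above: gzip HTML/CSS/JS in place, keeping filenames
    for root, dirs, files in os.walk(directory):
        for f in files:
            if os.path.splitext(f)[1] in ['.html', '.css', '.js']:
                current_path = os.path.join(root, f)
                with open(current_path, 'rb') as f_in:
                    with gzip.open(current_path + '.gz', 'wb') as f_out:
                        f_out.writelines(f_in)
                os.rename(current_path + '.gz', current_path)

# Hypothetical scratch directory, not the real site output
scratch = tempfile.mkdtemp()
index = os.path.join(scratch, 'index.html')
with open(index, 'w') as f:
    f.write('<html><body>Hello</body></html>')

gzip_directory(scratch)

with open(index, 'rb') as f:
    magic = f.read(2)
print(magic)  # gzip streams always start with b'\x1f\x8b'
```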
You can see it in my fabfile here.
Serving gzipped assets in S3 with s3cmd
Now, gzipping this stuff is only half the battle. You also have to let browsers know that it’s gzipped by setting the Content-Encoding header to gzip. Otherwise, the browser will try to render the raw compressed bytes and your page will show up as garbage.
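To see why the header matters, here's a small stdlib-only sketch (no S3 involved): the bytes gzip produces are opaque until something decompresses them, which is exactly what the browser does when it sees Content-Encoding: gzip.

```python
import gzip

html = b'<html><body>Hello, world</body></html>'
compressed = gzip.compress(html)

# Without a Content-Encoding header, a browser would try to render
# these raw bytes directly: unreadable binary, not your page.
print(compressed[:2])  # the gzip magic number, b'\x1f\x8b'

# With Content-Encoding: gzip, the browser decompresses first
# and gets the original markup back:
print(gzip.decompress(compressed) == html)
```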
Thankfully, s3cmd exposes two CLI flags that make this fairly easy:
--exclude (and its counterpart --include), which let you specify wildcard patterns to skip over (or pick up) when syncing an entire directory
--add-header, which lets you, well, add headers to the static content when it’s served.
Previously, my s3cmd looked something like this:
    s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive
Which looks pretty simple:
Take every file in this folder and throw it in that bucket, make everything public, and do it recursively.
Now, though, I want to modify it a little to add that Content-Encoding header, but only for the assets I’ve gzipped!
So I came up with something like this:
    s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive --exclude '*.*' --include '*.html' --include '*.js' --include '*.css' --add-header="Content-Encoding: gzip"
    s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive --exclude '*.html' --exclude '*.js' --exclude '*.css'
Which is definitely a bit more complex, but still pretty understandable:
Take all the HTML, CSS, and JS files in this folder and throw them in that bucket, make everything public, and mark them as gzipped. Then, take everything else and throw it in the bucket too.
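Since everything here is already driven from Python via Fabric, one way to keep those two invocations manageable is to build them programmatically. This is a hypothetical helper (the name s3cmd_sync_commands and its defaults are mine, not part of s3cmd):

```python
def s3cmd_sync_commands(local_dir, bucket, gzipped_exts=('.html', '.js', '.css')):
    """Build the two s3cmd command lines: gzipped assets first
    (tagged with Content-Encoding: gzip), then everything else."""
    base = ['s3cmd', 'sync', local_dir, bucket, '-P', '--recursive']

    # Pass 1: exclude everything, then re-include only the gzipped extensions
    gzipped = base + ['--exclude', '*.*']
    # Pass 2: sync the remaining files with no special header
    everything_else = list(base)

    for ext in gzipped_exts:
        gzipped += ['--include', '*' + ext]
        everything_else += ['--exclude', '*' + ext]

    gzipped.append('--add-header=Content-Encoding: gzip')
    return gzipped, everything_else

gz_cmd, rest_cmd = s3cmd_sync_commands('Angostura/public/', 's3://getbarback.com')
print(' '.join(gz_cmd))
print(' '.join(rest_cmd))
```

Each list can be handed straight to subprocess.run, or joined with spaces and passed to Fabric's local().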
I cleaned this up a little bit and it’s in my fabfile as well.
That’s all you need! Gzip away.