If you’re deploying a site, it’s important to keep the file sizes of your assets¹ as low as possible in order to speed up page load time and thus improve UX. A common approach to help with this is to gzip your assets.

If you’re running, say, a Rails or Django site, there are a bunch of packages that will help automate this process for you and keep it fairly hands-off. However, this site² is statically generated via Hugo, which sadly doesn’t support minification or compression out of the box.

The site’s being stored on S3 and uploaded via the wonderful s3cmd (which itself is being invoked by Fabric). I wanted a nice, Pythonic way to gzip all of these assets and load them into S3, while making sure S3 served them correctly.

Gzipping static files with Python

The actual process of gzipping files is fairly easy, with the built-in gzip module doing the real work. The trick is to use the os.walk function to generate a complete recursive iterator of all files in your target directory, and then just gzip all the relevant ones:

import gzip
import os
import shutil

def gzip_directory(directory):
    # Walk the tree and gzip every HTML, CSS, and JS file in place.
    for root, dirs, files in os.walk(directory):
        for f in files:
            if os.path.splitext(f)[1] in ['.html', '.css', '.js']:
                current_path = os.path.join(root, f)
                with open(current_path, 'rb') as f_in:
                    with gzip.open(current_path + '.gz', 'wb') as f_out:
                        shutil.copyfileobj(f_in, f_out)
                # Swap the gzipped version in for the original file.
                os.rename(current_path + '.gz', current_path)
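Calling it on the Hugo output directory (for this site, that’s the same Angostura/public folder that gets synced to S3) compresses everything in place:

    gzip_directory('Angostura/public')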

You can see it in my fabfile here.

Serving gzipped assets in S3 with s3cmd

Now, gzipping this stuff is only half the battle. You also have to let browsers know that it’s gzipped by setting the Content-Encoding header to gzip. Otherwise, it’ll end up looking something like this:

Not good.
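One quick way to check what’s actually being served is to inspect the response headers yourself, e.g. with curl (assuming the bucket is served as a website at getbarback.com):

    curl -I http://getbarback.com/index.html

If things are configured correctly, you’ll see a Content-Encoding: gzip line in the output.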

Thankfully, s3cmd exposes a few CLI flags that make this fairly easy:

  • --include and --exclude, which let you specify glob patterns for files to include or skip when syncing an entire directory
  • --add-header, which lets you, well, add headers to the static content when it’s served.

Previously, my s3cmd invocation looked something like this:

s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive

Which looks pretty simple:

Take every file in this folder and throw it in that bucket, make everything public, and do it recursively.

Now, though, I want to modify it a little to add that Content-Encoding flag — but only for the assets I’ve gzipped!

So I came up with something like this:

s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive --exclude '*.*' --include '*.html' --include '*.js' --include '*.css' --add-header="Content-Encoding: gzip"
s3cmd sync Angostura/public/ s3://getbarback.com -P --recursive --exclude '*.html' --exclude '*.js' --exclude '*.css'

Which is definitely a bit more complex, but still pretty understandable:

Take all the HTML, CSS, and JS files in this folder and throw them in that bucket, make everything public, and mark them as gzipped. Then take everything else and throw it in the bucket too.

I cleaned this up a little bit, and it’s in my fabfile as well.
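If you’re curious how the pieces fit together, here’s a simplified sketch of the deploy task (Fabric 1.x, shelling out to s3cmd via local(); names and paths here are illustrative rather than copied verbatim from the fabfile):

from fabric.api import local, task

PUBLIC_DIR = 'Angostura/public'
BUCKET = 's3://getbarback.com'

@task
def deploy():
    # Rebuild the site (assuming the Hugo project lives in Angostura/),
    # then gzip the text assets in place using gzip_directory from above.
    local('hugo -s Angostura')
    gzip_directory(PUBLIC_DIR)
    # Sync the gzipped HTML/CSS/JS with the Content-Encoding header set...
    local("s3cmd sync %s/ %s -P --recursive --exclude '*.*' "
          "--include '*.html' --include '*.js' --include '*.css' "
          '--add-header="Content-Encoding: gzip"' % (PUBLIC_DIR, BUCKET))
    # ...then sync everything else as-is.
    local("s3cmd sync %s/ %s -P --recursive "
          "--exclude '*.html' --exclude '*.js' --exclude '*.css'" % (PUBLIC_DIR, BUCKET))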

That’s all you need! Gzip away.

  1. I’m defining assets here as HTML, CSS, and JS, but obviously media should be resized and optimized as well. 

  2. The one you’re reading this post on, aka the one that also has dope recipes. 

If you enjoyed this article, share it on Facebook or save it on Pocket. Thanks!