Authenticated AWS S3 Downloads in V8

We’ve had quite a few customers express interest in downloading content from AWS S3 rather than persistently storing it on their RealityServer instances. While this introduces latency, in many use cases it can still make a lot of sense. Our recently released HTTP Request functionality for V8 makes it easy to download content from public URLs. What do you do if your S3 buckets require authentication though? Let’s dive in.

Introduction

This won’t be a comprehensive guide. S3 has a lot of different options and settings we won’t cover here but by far the most complex to get working is the authentication, so we are going to focus on this. Using this technique you’ll still download files to your RealityServer instance and then load them from disk but this can be done on-demand. Batch rendering is one example where this can make a lot of sense since it is not as latency sensitive. If you are also running your instances on AWS in EC2 then it is likely the download times will be quite reasonable as well.

For public S3 buckets and objects you don’t need to do anything special, just make a regular HTTP request. However, accessing buckets and objects which are not flagged for public access requires jumping through some hoops.

AWS S3 Authentication

There are two different authentication methods currently in use for AWS S3, the now deprecated Signature Version 2 method and the current Signature Version 4 method. We’re only covering the Signature Version 4 method here since it works in all regions and really should be what you are using today. Getting Signature Version 2 working is a little easier. If you want an example of that, contact us.

For the full details of the process please refer to the Authenticating Requests (AWS Signature Version 4) article published by Amazon. Basically the process involves computing HMAC and hash results from a very specific set of information and then setting the HTTP Authorization header (and a few others) based on the results. This process is of course only needed if your buckets are private.

This guide assumes you have a variable called s3_config which is an object containing a key, secret and region property. We like to put this into a JavaScript file in v8/include/s3_config.js that looks something like this.

const s3_config = {
    key: 'AKIAIOSFODNN7EXAMPLE',
    secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    region: 'us-east-1'
};

// Export the object so that require('s3_config') returns it
module.exports = s3_config;

Then include it like this.

const s3_config = require('s3_config');

Of course you can populate this any way you like. Whichever key and secret you provide should have IAM permissions to perform whatever operations you want to do on your bucket objects. Before we can actually start computing the values needed to make this work, we also need some cryptographic functionality.

Cryptographic Tools in JavaScript

AWS S3 authentication requires the use of SHA256 and HMAC functionality in order to calculate the final string for the authorization header. Fortunately there is already a great JavaScript library called Forge which provides both of these in an easy to use way. With a little cajoling it can be made to work in the RealityServer V8 environment.

Installing and Using Forge

Download the latest release of Forge. Open the archive and extract the contents of the forge-x.x.x/lib directory into v8/include/forge in your RealityServer directory. It can go in any directory you have configured for V8 includes but should be in its own folder. Now within your command you can include Forge like this.

window = {};

const forge = require('forge/index');

The window variable needs to be defined for the library to work correctly. The library normally detects whether it is running in Node.js or the browser but doesn’t know anything about RealityServer so we have to trick it. Other than that it works perfectly out of the box. Now since you will call the SHA256 and HMAC functionality many times, you’ll want to create a couple of convenience functions.

function hmac_sha256(key, data) {
    let hmac = forge.hmac.create();
    hmac.start('sha256', key);
    hmac.update(data);
    return hmac.digest().getBytes();
}

function hash_sha256(data) {
    let sha = forge.md.sha256.create();
    sha.update(data);
    return sha.digest().toHex();
}

You’ll notice the hmac_sha256 function returns the bytes from the result while hash_sha256 returns the hex string. This is because the HMAC calls we will use are chained into each other as specified by the AWS documentation. What the documentation does not mention (at least not in the same place) is that each link in the chain expects the bytes from the last, not the hex string. We’ll use the Forge util.bytesToHex helper later to convert the last HMAC into the hex string needed.
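
To make the bytes versus hex distinction concrete, here is a minimal sketch in plain JavaScript. It assumes, as Forge does, that raw bytes are carried in a binary string with one character per byte. The bytes_to_hex helper is our own illustration standing in for forge.util.bytesToHex; it is not part of Forge.

```javascript
// Hypothetical helper for illustration only; in a command you would call
// forge.util.bytesToHex instead.
function bytes_to_hex(bytes) {
    let hex = '';
    for (let i = 0; i < bytes.length; i++) {
        // Each character holds one byte (0-255); emit it as two hex digits.
        hex += bytes.charCodeAt(i).toString(16).padStart(2, '0');
    }
    return hex;
}

// Three raw bytes: 0x00, 0xff and 0x41 ('A').
const raw = '\x00\xffA';
const hex = bytes_to_hex(raw);
```

The hex form is twice as long as the raw form and contains different characters, so passing it as the key to the next HMAC in the chain would produce a different result from passing the raw bytes.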

Forge is now ready to use in your command. You might be surprised how many other off the shelf JavaScript libraries can be run in RealityServer V8 with a few tricks.

Computing the Authorization Header

Here is the code you’ll need to build the headers for the request. If you read the Amazon documentation you should recognise all of the elements here. This code assumes there are variables called bucket and key which define the AWS S3 bucket you want to access and the object key which uniquely identifies a given object within a bucket (not to be confused with the AWS access key).

// Required data
const host = `${bucket}.s3.amazonaws.com`;
const url = `https://${host}/${key}`;
const path = `/${bucket}/${key}`;
const algorithm = 'AWS4-HMAC-SHA256';
const time = new Date();
const date = time.toISOString().slice(0,10).replace(/-/g,"");
const time_stamp = time.toISOString().replace(/[-:]/g, '').split('.')[0] + 'Z';
const scope = `${date}/${s3_config.region}/s3/aws4_request`;

// The canonical request
const hashed_payload = hash_sha256('');
const http_method = 'GET';
const canonical_uri = `/${key}`;
const canonical_query_string = '';
const canonical_headers = `host:${host}\n`
    + `x-amz-content-sha256:${hashed_payload}\n`
    + `x-amz-date:${time_stamp}\n`;
const signed_headers = 'host;x-amz-content-sha256;x-amz-date';
const request = `${http_method}\n`
    + `${canonical_uri}\n`
    + `${canonical_query_string}\n`
    + `${canonical_headers}\n`
    + `${signed_headers}\n`
    + `${hashed_payload}`;
const canonical_request = hash_sha256(request);

// The signing key
const signing_key = hmac_sha256(hmac_sha256(hmac_sha256(hmac_sha256(
        `AWS4${s3_config.secret}`, date), s3_config.region), 's3'), 'aws4_request');

// Sign the string and create authorization header
const string_to_sign = `AWS4-HMAC-SHA256\n${time_stamp}\n${scope}\n${canonical_request}`;
const signature = forge.util.bytesToHex(hmac_sha256(signing_key, string_to_sign));

// Required headers
const authorization = `AWS4-HMAC-SHA256 `
    + `Credential=${s3_config.key}/${date}/${s3_config.region}/s3/aws4_request,`
    + `SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=${signature}`;
const x_amz_content_sha256 = hashed_payload;
const x_amz_date = time_stamp;

This is what is needed for a simple GET request. If you want to use other HTTP verbs and send body text or use query string parameters you’ll need to expand on this code. However the most troublesome elements are all covered in the above code. While there is a lot of good documentation from Amazon, the above still needed a fair amount of trial and error to get right (which we’ve done so you don’t have to).
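
If you sign requests in more than one place, the signing key chain can be pulled out into a small function. The following is a sketch rather than a definitive implementation. The name derive_signing_key and its parameters are our own choice, and the hmac argument is assumed to be any function that takes (key, data) and returns raw bytes, such as the hmac_sha256 helper defined earlier.

```javascript
// Sketch of the Signature Version 4 signing key chain as a reusable function.
// hmac is assumed to be a function (key, data) => raw bytes.
function derive_signing_key(hmac, secret, date, region, service) {
    const k_date = hmac(`AWS4${secret}`, date);   // date formatted as YYYYMMDD
    const k_region = hmac(k_date, region);        // must match the region in the scope
    const k_service = hmac(k_region, service);    // 's3' for S3 requests
    return hmac(k_service, 'aws4_request');       // fixed terminating string
}
```

In a command you would call it as derive_signing_key(hmac_sha256, s3_config.secret, date, s3_config.region, 's3'). Passing the region in as a parameter keeps it in step with the region used in the credential scope. If the two differ, the signature will not match.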

Making the Request

So we want to download a binary scene file from S3. We’ve computed the headers above, how do we make the request? With the new HTTP request functionality in V8 that is very straightforward. Here is the code.

const response = http.get({
    url: url,
    headers: {
        'Host': host,
        'x-amz-content-sha256': x_amz_content_sha256,
        'x-amz-date': x_amz_date,
        'Authorization': authorization
    },
    encoding: null
});

fs.writeFile('file.bin', response.body);

So it’s just a matter of setting some headers. Since this example is requesting a binary file, we set encoding to null. If you were downloading JSON data you could use the json property to have it automatically parsed. We can then write the file directly to disk. Obviously you’d want to use a specific name or derive it from the URL. Once the file is written you can of course access it from other RealityServer methods such as Scene.import or the filename property of the Image class.
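
As one way of deriving the file name, here is a minimal sketch that takes the last segment of the object key and replaces characters that could cause trouble on disk. The helper name local_name_for_key, the allowed character set and the download.bin fallback are all our own assumptions, not part of RealityServer.

```javascript
// Hypothetical helper: derive a local file name from an S3 object key.
function local_name_for_key(key) {
    // Keys often look like paths, so keep only the part after the last '/'.
    const base = key.split('/').pop();
    // Replace anything outside a conservative set of characters.
    const safe = base.replace(/[^A-Za-z0-9._-]/g, '_');
    // Empty names (keys ending in '/') and '.' or '..' are not usable.
    if (safe === '' || safe === '.' || safe === '..') {
        return 'download.bin';
    }
    return safe;
}

const name = local_name_for_key('scenes/kitchen/main scene.mi');
```

Note that two keys sharing the same last segment map to the same local name, so if that can happen in your bucket you would want to include more of the key in the name.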

Putting it Together

There are lots of ways to use the above. You could wrap it in a class and re-use it in many places or just put it straight into a command. You can take a look at a complete example command which shows a working version. You just need to provide the AWS credentials.

Going Further

This article only shows the basics of how to get the most common request type going (downloading a file with a GET request). You might need more functionality such as query string support, which can be added based on the code shown here. Also note that the Signature Version 4 authentication method is used by some other AWS services so you might be able to use this to access things other than S3.
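
As a starting point for query string support, here is a sketch of the encoding and ordering that Signature Version 4 expects. Everything except the unreserved characters A-Z, a-z, 0-9, '-', '_', '.' and '~' is percent-encoded, and parameters are sorted by encoded name. The function names aws_uri_encode and canonical_query are our own and the code has not been run inside RealityServer.

```javascript
// Percent-encode a string for Signature Version 4. encodeURIComponent leaves
// ! ' ( ) * unencoded, so those are encoded by hand.
function aws_uri_encode(text, encode_slash = true) {
    const encoded = encodeURIComponent(text).replace(/[!'()*]/g,
        c => '%' + c.charCodeAt(0).toString(16).toUpperCase());
    // Object keys keep their '/' separators in the canonical URI.
    return encode_slash ? encoded : encoded.replace(/%2F/g, '/');
}

// Build the canonical query string from a plain object of parameters:
// encode names and values, then sort by encoded name.
function canonical_query(params) {
    return Object.keys(params)
        .map(name => [aws_uri_encode(name), aws_uri_encode(String(params[name]))])
        .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
        .map(pair => `${pair[0]}=${pair[1]}`)
        .join('&');
}

const example_query = canonical_query({ prefix: 'scenes/a b', 'max-keys': 2 });
const example_uri = aws_uri_encode('/scenes/my file (1).mi', false);
```

The result of canonical_query would take the place of the empty canonical_query_string and also be appended to the request URL after a '?'. The same encoder with encode_slash set to false can be applied to the object key when building the canonical URI, which matters for keys containing spaces or other special characters.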

Building a caching system to avoid re-downloading content is also something you could do to enhance this functionality further. This way only cold starts of new instances would need to download content and subsequent requests would just re-use the already downloaded files. You could implement your own cache policies and store the information in the RealityServer database as Attribute_container, a trick we use in our compositing V8 scripts to persist data between calls.
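
The core of such a cache can be sketched in a few lines. This version is deliberately minimal and makes several assumptions. The cache is a plain Map, which only lives as long as the script context, so for persistence between calls you would swap it for storage backed by the RealityServer database. The download callback is hypothetical and stands in for the signed request and file write shown earlier.

```javascript
// Minimal cache sketch. cache maps "bucket/key" to a local file name and
// download is a hypothetical callback (bucket, key) => local file name.
function cached_download(cache, bucket, key, download) {
    const id = `${bucket}/${key}`;
    if (cache.has(id)) {
        return cache.get(id);            // cache hit, no request is made
    }
    const file = download(bucket, key);  // cache miss, fetch from S3
    cache.set(id, file);
    return file;
}
```

A real cache policy would also need to decide when an entry is stale, for example by comparing the ETag or Last-Modified header of the object against a stored value.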

Another cool aspect of what has been shown in this article is the use of an off the shelf JavaScript library to save a lot of time by avoiding implementing functionality that has already been created by others (in this case cryptographic functions). We’ve tried a few other libraries with some great results. Libraries with complex dependencies are trickier to get working but self-contained ones can usually be convinced to work.

Got another API or JavaScript library you’d like to see us try out? Get in touch and tell us more.
