Uploading files to Amazon S3 through API Gateway

Note: there is now an official AWS blog on this topic that I co-authored with my colleague Josh.

In this article I will walk through the different options for uploading files to Amazon S3 via API Gateway. This allows you to create a standard API endpoint for your end users, as well as implement your own authentication mechanisms.

The main issue to contend with is the 10 MB payload limit in API Gateway. I will cover a solution for uploading files larger than 10 MB using Amazon S3 pre-signed URLs generated by API Gateway and Lambda.

At the end I will introduce an alternative solution using CloudFront and Lambda@Edge instead of API Gateway.

Using API Gateway as a direct proxy

API Gateway Proxy Architecture

If you know the payload size will always be below the 10 MB limit, the simplest option is to proxy requests straight to the S3 PutObject API. Note that the API Gateway service quotas list the 10 MB payload size as a hard limit that cannot be increased.

  1. Create an S3 bucket to be used for the uploads e.g. my-apigw-uploads

  2. Create an IAM Role for API Gateway to access the bucket.

    1. For the Role use case select API Gateway.
    2. Add a permission policy to the Role granting s3:PutObject on the S3 bucket e.g.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "APIGWPutAllow",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::my-apigw-uploads/*"
            }
        ]
    }
    
    3. Copy the ARN of the Role to be used later.
  3. Create a new API Gateway REST API and give it a name e.g. s3-proxy-api.

    API Gateway new Regional API

  4. Create a new resource named upload (or choose your own) and then a child resource underneath named object with resource path /{object}. Make sure to include the curly braces, as this will be a path parameter used later.

    API Gateway new resource object

  5. Under /{object} create a new PUT method. For the integration type use AWS Service and enter the following:

    • AWS Region = the same region used for the S3 bucket
    • AWS Service = Simple Storage Service (S3)
    • AWS Subdomain = leave blank
    • HTTP method = PUT (this is the HTTP method used to call the S3 API)
    • Action Type = Use path override (as we are going to set the S3 bucket and path here)
    • Path override = my-apigw-uploads/{key} (remember to use your own S3 bucket name)
    • Execution role = the ARN from the IAM Role created earlier. This should have s3:PutObject permissions to the specified bucket.
    • Content Handling = Passthrough

    API Gateway new PUT proxy

  6. Click save.

  7. Under the PUT method select the Integration Request box. Under URL Path Parameters add the following:

    • Name = key
    • Mapped from = method.request.path.object

    API Gateway URL Path Parameter

    This will map what the user specifies in the URL path (i.e. /{object}) to the path override value {key} (i.e. what the S3 API expects).
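
    If you would rather script steps 5-7 than click through the console, here is a minimal boto3 sketch of the integration call (the API ID, resource ID, role ARN, region and bucket name are placeholders to substitute with your own values):

    import boto3

    apigw = boto3.client('apigateway')

    # Placeholders - substitute your own values
    rest_api_id = 'a1b2c3d4e5'
    resource_id = 'abc123'  # ID of the /{object} resource
    role_arn = 'arn:aws:iam::123456789012:role/apigw-s3-upload-role'

    apigw.put_integration(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod='PUT',
        type='AWS',                   # AWS Service integration
        integrationHttpMethod='PUT',  # HTTP method used to call the S3 API
        # Path override: the bucket name plus the {key} placeholder
        uri='arn:aws:apigateway:eu-west-1:s3:path/my-apigw-uploads/{key}',
        credentials=role_arn,
        # Map the {object} method path parameter onto {key}
        requestParameters={
            'integration.request.path.key': 'method.request.path.object'
        },
    )
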
  8. Next, in the navigation pane on the left, choose Settings. In the Binary Media Types section choose Add Binary Media Type. Add the string */* and click Save changes. You can also use a particular MIME type you want to treat as binary media instead of using a wildcard, e.g. image/jpeg. The wildcard will treat all media types as binary.

    API Gateway Binary Media Types

  9. All that’s left is to deploy the API to a new stage e.g. v1

  10. Once the API has deployed we can test the upload using Postman.

    • Set the HTTP method to PUT
    • Set the URL to the API Gateway URL and append /upload/<object-name>
    • Choose binary for the Body type and upload your desired file (remember - less than 10 MB!)
    • If successful you will see a 200 OK response

    API Gateway Postman Upload
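
    If you prefer to script the test, here is a minimal sketch using the Python requests library (the invoke URL is a placeholder for your own API's URL):

    import requests

    # Placeholder invoke URL - substitute your own API ID, region and stage
    url = 'https://a1b2c3d4.execute-api.eu-west-1.amazonaws.com/v1/upload/test.jpg'

    # Remember: the proxy integration is limited to payloads under 10 MB
    with open('test.jpg', 'rb') as f:
        response = requests.put(url, data=f.read())

    print(response.status_code)  # 200 on success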

  11. Checking in our S3 bucket we should now see the uploaded file.

    API Gateway Postman Upload S3

Congratulations! You have now successfully created an API Gateway proxy for S3 uploads. Remember to add some form of authentication to your API to prevent anonymous users from uploading to your S3 storage!

Generating a pre-signed URL for large file upload

API Gateway Pre-signed URL Architecture

If a file is larger than 10 MB we can still upload it to S3 via a presigned URL generated by API Gateway. This does require two calls from the client, but will allow files up to 5 GB in size (this is the maximum file size for a single PUT operation).

Note: objects larger than 5 GB must be uploaded as a multipart upload via the AWS SDK, AWS CLI or S3 REST API, which raises the maximum object size to 5 TB. The high-level boto3 transfer methods handle this for you; see the sketch below.
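
A minimal sketch (the file, bucket and key names are placeholders):

    import boto3

    s3 = boto3.client('s3')

    # upload_file performs a managed transfer: it switches to multipart
    # automatically above a configured size threshold, so it also works for
    # objects larger than 5 GB (up to the 5 TB object limit).
    s3.upload_file('large-backup.tar.gz', 'my-apigw-uploads', 'large-backup.tar.gz')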

  1. Create an S3 bucket to be used for the uploads e.g. my-apigw-uploads

  2. Create a new Lambda function in the same region as the S3 bucket using your language of choice (this article uses Python 3.9).

  3. Add permissions to the generated Lambda IAM Role to allow it to create pre-signed URLs. Use the policy below as an example. (Note: pre-signed URLs must be created by a principal that has permission to perform the operation, in our case s3:PutObject. There is no specific IAM action for generating pre-signed URLs.)

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "APIGWPutAllow",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::my-apigw-uploads/*"
            }
        ]
    }
    
  4. Copy and deploy the Lambda function code below. In this case we are using the boto3 SDK for Python and the generate_presigned_url() API call.

    import json
    import boto3

    s3_client = boto3.client('s3')
    bucket_name = 'my-apigw-uploads'
    content_type = 'text/plain'  # This could also come from the API Gateway HTTP headers.
    expiration = 3600  # Seconds the URL stays valid; ExpiresIn expects an integer.

    def lambda_handler(event, context):
        object_name = event['pathParameters']['object']  # Use the {object} path parameter
        response = s3_client.generate_presigned_url('put_object',
                                                    Params={'Bucket': bucket_name,
                                                            'Key': object_name,
                                                            'ContentType': content_type},
                                                    ExpiresIn=expiration)

        # ContentType is part of the signature, so the client must send a
        # matching Content-Type header when it PUTs to this URL.
        return {
            'statusCode': 200,
            'body': json.dumps(response)  # The response contains the presigned URL
        }
    
  5. Create a new API Gateway REST API and give it a name e.g. s3-presign-api

  6. Create a new resource named upload, with a child resource {object}. Create a GET method under the {object} resource. Set the integration type to Lambda Function and make sure to select Use Lambda Proxy integration.

    API Gateway Lambda Presign Integration

  7. Click Save and click OK when asked to give API Gateway permissions.

  8. All that’s left now is to deploy the API to a new stage e.g. v1 and test!

  9. To test, use Postman:

    • First make a GET call to the API Gateway generated URL e.g. https://a1b2c3d4.execute-api.eu-west-1.amazonaws.com/v1/upload/test.txt

      API Gateway Postman Presign

    • Use the returned pre-signed URL to make a PUT call with your selected binary as the body.

      API Gateway Postman Presign PUT

    If successful you will receive an HTTP 200 response. You should also find your uploaded file in the S3 bucket!
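
    The same two-step flow as a Python sketch (the invoke URL is a placeholder for your own API):

    import requests

    # Placeholder invoke URL - substitute your own API ID, region and stage
    api_url = 'https://a1b2c3d4.execute-api.eu-west-1.amazonaws.com/v1/upload/test.txt'

    # Step 1: ask the API for a pre-signed URL (the Lambda returns it JSON-encoded)
    presigned_url = requests.get(api_url).json()

    # Step 2: PUT the file straight to S3. The Content-Type header must match
    # the ContentType baked into the signature ('text/plain' in the Lambda above).
    with open('test.txt', 'rb') as f:
        response = requests.put(presigned_url, data=f.read(),
                                headers={'Content-Type': 'text/plain'})

    print(response.status_code)  # 200 on success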

Alternative option: Proxy using CloudFront and Lambda@Edge

API Gateway CloudFront Proxy Architecture

This is actually a fairly simple option. It uses CloudFront in front of an S3 bucket to proxy the PUT requests. You can then use Lambda@Edge to perform your own authentication. This method allows a maximum file size of 5 GB.

  1. Create an S3 bucket to be used for the uploads e.g. my-apigw-uploads

  2. In the CloudFront Console create a new distribution with the following settings (leave everything else default):

    • Origin domain = your S3 bucket
    • S3 bucket access = Yes, use OAI (create a new OAI and say yes to update the bucket policy)

      CloudFront Distribution Settings

    • Viewer protocol policy = HTTPS only
    • Allowed HTTP methods = GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE (we need PUT!)
    • Cache policy = CachingDisabled (we don’t need to cache anything)

      CloudFront Distribution Settings 2
  3. Leave everything else as default and click Create distribution

  4. Once the distribution has been created, go back to the S3 Console and select the bucket created earlier. Now edit the S3 bucket policy to change the default OAI permission from s3:GetObject to s3:PutObject. This will allow uploads and prevent any read operations.

    {
        "Version": "2008-10-17",
        "Id": "PolicyForCloudFrontPrivateContent",
        "Statement": [
            {
                "Sid": "1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2VEV1F34O3TP0"
                },
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": "arn:aws:s3:::my-apigw-uploads/*"
            }
        ]
    }
    
  5. At this point you should be able to test an upload as there is no authentication with Lambda@Edge yet. Use Postman to submit a PUT request to https://<your-cloudfront-distro>/<s3-object-name>. You should receive 200 OK if successful.

    CloudFront Postman Test

  6. As a final step let’s introduce some authentication with Lambda@Edge. Create a new Lambda function in us-east-1 (N. Virginia) (us-east-1 is a requirement for Lambda@Edge functions). For the execution role make sure to use the policy template Basic Lambda@Edge permissions (for CloudFront trigger). I am using Python 3.9 as the language.

    New Lambda@Edge Function

  7. Copy the code below into your Lambda function. The code is very basic: it checks whether my-secret-key is present in the Authorization header of the HTTP request made by the client. In production you may want something more secure, such as validating a JWT against an identity provider like Amazon Cognito, Auth0, Azure AD or Okta.

    def lambda_handler(event, context):

        print(event)

        # The viewer-request event contains the incoming request
        request = event["Records"][0]["cf"]["request"]
        headers = request["headers"]

        if 'authorization' not in headers:
            return unauthorized()

        # CloudFront header values are lists of {key, value} dicts
        if headers['authorization'][0]['value'] == 'my-secret-key':
            return request  # Allow the request through to the origin

        return unauthorized()

    def unauthorized():
        response = {
            'status': '401',
            'statusDescription': 'Unauthorized',
            'body': 'Unauthorized'
        }
        return response
    

    Below is an example of the event payload sent to the Lambda function

    {
        'Records': [{
            'cf': {
                'config': {
                    'distributionDomainName': 'a1b2c3d4e5.cloudfront.net',
                    'distributionId': 'ABCDEF12345',
                    'eventType': 'viewer-request',
                    'requestId': '0nPLqasQLuZYzFShnLdVMPTFP97BMufYc786E5J3yWZVdrUBDWoz0Q=='
                },
                'request': {
                    'clientIp': '185.104.136.29',
                    'headers': {
                        'host': [{
                            'key': 'Host',
                            'value': 'a1b2c3d4e5.cloudfront.net'
                        }],
                        'user-agent': [{
                            'key': 'User-Agent',
                            'value': 'PostmanRuntime/7.28.4'
                        }],
                        'content-length': [{
                            'key': 'content-length',
                            'value': '57'
                        }],
                        'accept': [{
                            'key': 'Accept',
                            'value': '*/*'
                        }],
                        'content-type': [{
                            'key': 'Content-Type',
                            'value': 'application/json'
                        }]
                    },
                    'method': 'PUT',
                    'querystring': '',
                    'uri': '/response.json'
                }
            }
        }]
    }
    
  8. Click Deploy to save the function code.

  9. To associate the function to the CloudFront distribution click Add trigger in the Lambda function Console.

    • Distribution = your CloudFront distribution created earlier
    • CloudFront event = Viewer request (we want to react to PUT events from the client)

    Lambda@Edge Function Trigger

  10. Click Add and wait for the function to be deployed to the CloudFront distribution (this can take a few minutes).

  11. All that’s left now is to test. Using Postman as before, issue a PUT request to https://<your-cloudfront-domain>/<object-name> with a binary payload as the body. You should receive a 401 Unauthorized response this time.

    CloudFront Postman 401

  12. Next add the Authorization header with value my-secret-key and submit the request again. This time the request will work and you should receive a 200 OK message.
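
    The same check as a Python sketch (the CloudFront domain is a placeholder):

    import requests

    # Placeholder - substitute your own CloudFront domain and object name
    url = 'https://a1b2c3d4e5.cloudfront.net/test.jpg'

    with open('test.jpg', 'rb') as f:
        data = f.read()

    # Without the header the Lambda@Edge function should return 401
    print(requests.put(url, data=data).status_code)

    # With the expected Authorization header the upload should succeed (200)
    print(requests.put(url, data=data,
                       headers={'Authorization': 'my-secret-key'}).status_code)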

Note: if you want to check the Lambda logs for Lambda@Edge functions you will need to look in the region closest to the request, and not the same region as the function. For me the logs appeared in eu-west-2 (London).