Read and write from S3 buckets directly

You may have your own S3 bucket from which you want to pull data into your deployment and to which you want to push results back afterwards. This is possible because deployments allow external internet connections. In this How-to we show how to interact with your S3 bucket from within your deployment.

Setting up the right context

You can interact with your S3 bucket using the boto3 package. To this end, we add boto3 to the requirements.txt file in our deployment_package. Then, to be able to access the content of our S3 bucket, we provide our deployment with the right parameters. We advise adding these parameters as environment variables, so that they can be changed without pushing a new deployment_package to our API. This also allows credentials to be encrypted.
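
As a minimal sketch of how the environment variables could be read and checked in one place, the snippet below collects them into a dictionary and fails early with a clear message when one is missing. The helper name load_s3_settings is hypothetical, and the variable names are assumed to match the ones used in the next section (AWS_SECRET_ACCESS_KEY is assumed by analogy with the others).

```python
import os

# Assumed variable names; adjust them to the names you configured for your deployment.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_REGION_NAME",
    "BUCKET_NAME",
)


def load_s3_settings(environ=os.environ):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

Calling load_s3_settings() once at start-up means a misconfigured deployment fails immediately rather than on the first request.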

Enabling interactions with your S3 bucket

After importing the os and boto3 packages, we read the environment variables in the __init__ function of our deployment class and then create the S3 client that lets us pull files from and push files to our S3 object storage:

import os
import boto3

def __init__(self):
    # Read the credentials and other variables required to interact with your S3 bucket.
    AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
    AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
    AWS_S3_REGION_NAME = os.environ['AWS_S3_REGION_NAME']
    self.BUCKET_NAME = os.environ['BUCKET_NAME']

    # Create the S3 client and keep it on self so that request() can use it.
    self.s3 = boto3.client(
        service_name='s3',
        region_name=AWS_S3_REGION_NAME,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

Reading from and writing to your S3 bucket

Say you have stored a 10 MB CSV file, 10mb_test_file.csv, in your bucket test_bucket. One way of fetching the content of this file with pandas is to first load the file as an object and then read the body of that object:

import pandas as pd

def request(self):
    # Fetch the object and parse its streaming body as CSV.
    csv_obj = self.s3.get_object(Bucket=self.BUCKET_NAME,
                                 Key='10mb_test_file.csv')
    content = pd.read_csv(csv_obj['Body'], index_col=0)
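
If you would rather not depend on pandas, the same read can be sketched with the standard library only. This is an illustration under assumptions, not the only way to do it: the helper name read_csv_rows is hypothetical, the file is assumed to be UTF-8 encoded with a header row, and s3_client is assumed to be a client created with boto3.client('s3', ...). Note that it reads the whole body into memory, which is fine for a file of about 10 MB.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key):
    """Fetch an object from S3 and parse it as CSV into a list of dicts."""
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # The 'Body' entry is a stream; read() returns the object's raw bytes.
    text = obj["Body"].read().decode("utf-8")
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

Each row is returned as a dictionary keyed by the column names in the header, with all values as strings.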

You may want to process the content of the test file and push the results back to your bucket as a CSV file. Say that you have stored your results in a file-like object called results, opened in binary mode, and want to upload it as a CSV file to a path of your choice:

def request(self):
    # results must be a file-like object opened in binary mode.
    self.s3.upload_fileobj(Fileobj=results,
                           Bucket=self.BUCKET_NAME,
                           Key='path/results.csv')
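
Because upload_fileobj expects a binary file-like object, results held in memory first have to be serialized. The sketch below shows one possible way to do that with the standard library. The helper name upload_rows_as_csv and its parameters are hypothetical, rows is assumed to be a list of dictionaries, and s3_client is assumed to be a client created with boto3.client('s3', ...).

```python
import csv
import io


def upload_rows_as_csv(s3_client, bucket, key, rows, fieldnames):
    """Serialize rows (a list of dicts) to CSV in memory and upload the result to S3."""
    text_buffer = io.StringIO()
    writer = csv.DictWriter(text_buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

    # upload_fileobj needs bytes, so encode the text and wrap it in a BytesIO.
    binary_buffer = io.BytesIO(text_buffer.getvalue().encode("utf-8"))
    s3_client.upload_fileobj(Fileobj=binary_buffer, Bucket=bucket, Key=key)
```

Inside the deployment this could be called as upload_rows_as_csv(self.s3, self.BUCKET_NAME, 'path/results.csv', rows, fieldnames).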

For more types of S3 interactions, see the Amazon S3 boto3 documentation.