Read and write from S3 buckets directly¶

You may want to read files from, or write files to, your S3 bucket from within a deployment. Deployments allow external internet connections, so these operations are possible. This how-to guide shows how it is done. Note that you can also connect a UbiOps bucket to your own S3 bucket, which lets you interact with your S3 bucket through our Client Libraries. See our documentation for how to set up this connection.

Setting up the right context¶

Each cloud provider has its own package for interacting with S3 buckets. For AWS, this package is called boto3, so we need to add boto3 to the requirements.txt file in our deployment_package. To access our S3 bucket, we also need to provide the deployment with the right parameters, such as the credentials, the region, and the bucket name. We advise adding these parameters as environment variables, so that they can be stored as secrets and updated more conveniently.
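The parameters described above can be read defensively at start-up, so that a missing variable produces a clear error message rather than a bare KeyError. The following is a minimal sketch only: the helper load_s3_settings is hypothetical (it is not part of boto3 or UbiOps), and the variable names are assumptions that you should adapt to the names configured for your deployment.

```python
import os

# Assumed variable names; adapt them to the names configured for your deployment.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_REGION_NAME",
    "BUCKET_NAME",
)


def load_s3_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    # Accepting a mapping makes the helper easy to try out without touching os.environ.
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

Calling load_s3_settings() once in the __init__ method of the deployment would surface configuration mistakes when the deployment starts instead of on the first request.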

Enabling interactions with your S3 bucket¶

After importing the os and boto3 packages using import os and import boto3, we read the environment variables in the __init__ method of our deployment class, and then create the s3 client that allows us to pull files from and push files to our S3 object storage:

def __init__(self, base_directory, context):
    # Read the credentials and other variables required to interact with your S3 bucket.
    AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
    AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
    AWS_REGION_NAME = os.environ['AWS_S3_REGION_NAME']
    self.BUCKET_NAME = os.environ['BUCKET_NAME']

    # Create the S3 client that allows interaction with the bucket.
    self.s3 = boto3.client(
        service_name='s3',
        region_name=AWS_REGION_NAME,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

Reading from and writing to your S3 bucket¶

Say you have stored a 10 MB CSV file, 10mb_test_file.csv, inside your bucket test_bucket. One way of fetching the content of this CSV is to first load the file as an object and then read the body of this object with pandas (imported as pd, and listed in requirements.txt alongside boto3):

def request(self, data):
    # get_object returns a dict; its 'Body' entry is a file-like stream of the object's content.
    csv_obj = self.s3.get_object(Bucket=self.BUCKET_NAME,
                                 Key='10mb_test_file.csv')
    content = pd.read_csv(csv_obj['Body'], index_col=0)
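The same fetch-then-read pattern can be sketched without pandas, using only the standard library. This is an illustration under stated assumptions rather than the recommended approach: the helper read_csv_rows is hypothetical, the file is assumed to be UTF-8 encoded with a header row, and the whole object is loaded into memory, which is reasonable for a file of about 10 MB.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key):
    """Download an object from S3 and parse it as CSV into a list of dicts."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # 'Body' is a streaming object; read() returns the full content as bytes.
    text = response["Body"].read().decode("utf-8")
    # DictReader uses the first row as column names.
    return list(csv.DictReader(io.StringIO(text)))
```

Inside the request method this could be called as read_csv_rows(self.s3, self.BUCKET_NAME, '10mb_test_file.csv'). All values are returned as strings, so numeric columns need to be converted explicitly.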

You may want to process the content of the test file and push the results back to your bucket as a CSV file. Say that you have stored your results in a binary file-like object called results (for example, an io.BytesIO buffer holding the CSV data) and want to upload it as a CSV to a path of your choice:

def request(self, data):
    # upload_fileobj expects a file-like object opened in binary mode.
    self.s3.upload_fileobj(Fileobj=results,
                           Bucket=self.BUCKET_NAME,
                           Key='path/results.csv')
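Because upload_fileobj needs a binary file-like object, results held as plain rows first have to be serialized to bytes. The sketch below shows one way this might be done with the standard library; the helper upload_rows_as_csv and its parameters are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import io


def upload_rows_as_csv(s3_client, bucket, key, header, rows):
    """Serialize rows to CSV in memory and upload the bytes to the bucket."""
    text_buffer = io.StringIO()
    writer = csv.writer(text_buffer)
    writer.writerow(header)
    writer.writerows(rows)
    # upload_fileobj expects a binary file-like object positioned at its start.
    byte_buffer = io.BytesIO(text_buffer.getvalue().encode("utf-8"))
    s3_client.upload_fileobj(Fileobj=byte_buffer, Bucket=bucket, Key=key)
```

With a pandas DataFrame, wrapping the output of to_csv(index=False).encode('utf-8') in io.BytesIO would play the same role as byte_buffer here.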

For more types of S3 interactions, see the Amazon S3 boto3 documentation.