Read and write from S3 buckets directly¶
You may have your own S3 bucket from which you want to pull data into your deployment, and to which you subsequently want to push results back. This is possible because deployments allow external internet connections. In this how-to we show how to interact with your S3 bucket from within your deployment.
Setting up the right context¶
You can interact with your S3 bucket using the boto3 package. To this end, we add boto3 to the requirements.txt file that we include in our deployment_package. Then, to be able to access the contents of our S3 bucket, we provide our deployment with the right connection parameters. We advise passing these parameters as environment variables, so that they can be changed without having to push a new deployment_package to our API. This also allows the credentials to be stored encrypted.
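For this how-to, the requirements.txt inside the deployment_package could be as small as the following (pandas is only needed because we read the csv into a DataFrame later on):

boto3
pandas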
Enabling interactions with your S3 bucket¶
After importing the os package using import os, we read the environment variables in the __init__ function of our deployment class, and subsequently create the s3 client object that allows us to pull files from, and push files to, our S3 object storage:
import os
import boto3


class Deployment:

    def __init__(self, base_directory, context):
        # Import the credentials and other variables that are required to interact with your S3 bucket.
        AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
        AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
        AWS_REGION_NAME = os.environ['AWS_S3_REGION_NAME']
        self.BUCKET_NAME = os.environ['BUCKET_NAME']

        # Create the S3 client object that allows interaction with the bucket.
        self.s3 = boto3.client(
            service_name = 's3',
            region_name = AWS_REGION_NAME,
            aws_access_key_id = AWS_ACCESS_KEY_ID,
            aws_secret_access_key = AWS_SECRET_ACCESS_KEY
        )
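If you want the deployment to fail fast when the credentials or bucket name are misconfigured, you could add a simple check at the end of __init__. A minimal sketch: head_bucket raises an error if the bucket cannot be reached.

        # Optional sanity check: raises an error if the bucket is not reachable with these credentials.
        self.s3.head_bucket(Bucket = self.BUCKET_NAME)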
Reading from and writing to your S3 bucket¶
Say you have stored a 10 MB csv file 10mb_test_file.csv inside your bucket test_bucket. One way of fetching the content of this csv is to first load the file as an object, and then read the body of that object:
def request(self, data):
    # Fetch the csv file as an object from the bucket and read its body into a pandas DataFrame
    # (this assumes pandas has been imported as pd at the top of the deployment file).
    csv_obj = self.s3.get_object(Bucket = self.BUCKET_NAME,
                                 Key = '10mb_test_file.csv')
    content = pd.read_csv(csv_obj['Body'], index_col=0)
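Alternatively, if you prefer to work with a local copy of the file, you could first download it to the deployment's file system and read it from there. A short sketch, using the same bucket and key as above:

def request(self, data):
    # Download the file from the bucket to local storage, then read it with pandas.
    self.s3.download_file(self.BUCKET_NAME, '10mb_test_file.csv', '10mb_test_file.csv')
    content = pd.read_csv('10mb_test_file.csv', index_col=0)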
You may want to process the content of your test file and push the results back to your bucket as a csv file. Say that you have stored your results in a file-like object called results, and want to send the results as a csv to a path of your choice:
def request(self, data):
    # Upload the file-like object results to the bucket under the chosen key.
    self.s3.upload_fileobj(Fileobj = results,
                           Bucket = self.BUCKET_NAME,
                           Key = 'path/results.csv')
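If your results are stored in a pandas DataFrame rather than a file-like object, a sketch of the same upload could instead use put_object, which accepts the csv content as bytes (the key path is just an example):

def request(self, data):
    # Serialise the DataFrame to csv in memory and upload it in a single call.
    self.s3.put_object(Bucket = self.BUCKET_NAME,
                       Key = 'path/results.csv',
                       Body = results.to_csv().encode('utf-8'))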
For more types of S3 interactions, see the Amazon S3 boto3 documentation.