Using blobs to (temporarily) store files in between pipeline steps
Is there any way of storing something between two deployments of a single pipeline, e.g. some kind of pipeline storage? I would like to write to a file in one deployment and then read from that file again in another. Are there examples somewhere on how to use it?
Yes, there are two ways to access (file) storage during a pipeline request, and both use Blobs. A Blob can be any unstructured data, which means files with any extension (.txt, .csv, .jpeg, .pickle, .wav, etc.). We have an extensive page in our docs on them as well.
Blobs are persistent for the duration of the time-to-live (TTL) you configure when creating a blob. The default value is 1 day, but you can set any TTL between 900 seconds (15 minutes) and 31536000 seconds (1 year). They are deleted after this time has passed, just like the requests.
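As a minimal sketch of the TTL limits above, the check below validates a TTL value before a blob is created. The helper `validate_blob_ttl` and the constant names are hypothetical and not part of the UbiOps client library; only the numbers (default of 1 day, minimum of 900 seconds, maximum of 31536000 seconds) come from the text.

```python
DEFAULT_TTL_SECONDS = 86_400      # 1 day, the default mentioned above
MIN_TTL_SECONDS = 900             # 15 minutes
MAX_TTL_SECONDS = 31_536_000      # 1 year (365 days)


def validate_blob_ttl(ttl_seconds=None):
    """Return a TTL in seconds that lies inside the allowed range.

    Hypothetical helper for illustration; not a UbiOps client function.
    """
    if ttl_seconds is None:
        # No TTL given: fall back to the default of 1 day.
        return DEFAULT_TTL_SECONDS
    if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"TTL must be between {MIN_TTL_SECONDS} and "
            f"{MAX_TTL_SECONDS} seconds, got {ttl_seconds}"
        )
    return ttl_seconds
```

Running a check like this in your own code before uploading gives a clear error message early, instead of relying on the platform to reject an out-of-range value.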
Option 1: Passing a blob directly
If the two deployments that need to share a file come straight after each other, the file can be passed between them via a structured Blob type field. Edit the fields of your existing deployments to add a Blob type field to the output of the first deployment and to the input of the second, then link the two fields in the pipeline. The Blobs documentation page explains how to prepare a Blob for passing and how to open a Blob inside your deployment package.
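The sketch below illustrates Option 1 with two deployment classes. It assumes that a Blob output is returned as a local file path and that a Blob input arrives as a local file path, as described on the Blobs documentation page; verify this for your version. The field names `text`, `intermediate_file` and `length` are hypothetical, and the two classes are given distinct names here only so they fit in one block (in a real deployment package each would be the class in its own `deployment.py`).

```python
import os
import tempfile


class ProducerDeployment:
    """First deployment: writes a file and hands it to a Blob output field."""

    def __init__(self, base_directory=None, context=None):
        # Scratch directory for the file this deployment produces.
        self.workdir = tempfile.mkdtemp()

    def request(self, data):
        path = os.path.join(self.workdir, "intermediate.txt")
        with open(path, "w") as f:
            f.write(data["text"])
        # "intermediate_file" is a hypothetical Blob type output field;
        # the assumption is that returning a file path is enough.
        return {"intermediate_file": path}


class ConsumerDeployment:
    """Second deployment: reads the file from its Blob input field."""

    def __init__(self, base_directory=None, context=None):
        pass

    def request(self, data):
        # Assumption: the Blob input arrives as a path to a local file.
        with open(data["intermediate_file"]) as f:
            content = f.read()
        return {"length": len(content)}
```

Locally, the hand-over can be simulated by feeding the output of the first `request` call into the second: `ConsumerDeployment().request(ProducerDeployment().request({"text": "hello"}))`. In a pipeline, the link between the two Blob fields takes care of this step.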
Option 2: Uploading and retrieving Blobs from Blob storage
If the two deployments do not come directly after each other, or you want the Blobs to be persistent (up until the TTL), there’s a second option involving the UbiOps client libraries. In the first deployment you upload a Blob to UbiOps in your code using the client libraries. In any deployment after that, you download it again from our storage using the id that was given when it was created. You do still need to pass the id of the blob to the second deployment to be able to retrieve it. This is also described on the Blobs docs page. Alternatively, if that is not an option, you could list and iterate over all the blobs in the second deployment to get the id of the designated blob.
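The fallback of listing all blobs to find the designated one can be sketched as a small lookup by filename. This assumes each blob record returned by the client library exposes `id` and `filename` attributes; the helper `find_blob_id` is hypothetical, and the records below are stand-ins created with `SimpleNamespace` so that the example runs without a UbiOps connection.

```python
from types import SimpleNamespace


def find_blob_id(blobs, filename):
    """Return the id of the first blob whose filename matches, or None."""
    for blob in blobs:
        if blob.filename == filename:
            return blob.id
    return None


# Stand-in records; in a deployment these would come from the client
# library's call that lists the blobs in your project.
blobs = [
    SimpleNamespace(id="blob-1", filename="scaler.pickle"),
    SimpleNamespace(id="blob-2", filename="model.pickle"),
]

print(find_blob_id(blobs, "model.pickle"))  # blob-2
```

Matching on filename only works reliably if the first deployment uploads the file under a name that is unique within the project; passing the id between deployments remains the more direct route.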
Note that if you want to overwrite a blob (handy when you always want to use the 'latest' version of a file), it is also possible to update an existing blob.
We have two tutorials that cover Blobs. Both can be found under the tutorials tab in the docs:
Using blobs as temporary storage: UbiOps Technical Documentation - Using blobs as temporary storage. This tutorial only shows the proper way to store a file; it does not retrieve the file later.
Using a blob for storage between deployments: UbiOps Technical Documentation - Creating a training and production pipeline. This tutorial covers storing the blob, retrieving it later (by iterating over the blobs), and using it in a deployment. The source notebook for the tutorial can be found on GitHub as well.