Data types

When creating a Deployment or Pipeline, you have to provide the type of its input and/or output, which can be plain or structured. This page gives an overview of all the data types that exist for plain and structured fields. It also explains how to convert some well-known data structures to those data types.

Data type for plain input/output

The data type for plain input or output is simple: it can be any unstructured string, as long as it's JSON serializable. The plain data type is ideal when you're not sure exactly what the data will look like, or when it has a varying number of variables.
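As a rough sketch of what this means in practice (the `Deployment` class and `request` method follow the usual deployment layout, and the payload keys are made up for illustration), a plain field can carry any structure as long as it is serialized to a JSON string:

```python
import json

# Sketch of a deployment with plain input and output (not an official template).
# `data` arrives as a single free-form string; here we assume it holds JSON text.
class Deployment:
    def request(self, data):
        parsed = json.loads(data)
        # Build an arbitrary result with a varying set of keys
        result = {"echo": parsed, "n_keys": len(parsed)}
        # A plain output must itself be a JSON-serializable string
        return json.dumps(result)
```

Because the field is unstructured, the shape of `result` can change from request to request without changing the deployment's definition.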

Data types for structured fields

The structured deployment fields and the corresponding data formats are shown in the table below. The data format matters when making a request programmatically via a script or HTTP request. When making a request through the WebApp, it's not necessary to include the quote characters (' or ").

| Data type | Data format |
| --- | --- |
| String | 'string' or "string" |
| Integer | 9999 |
| Double precision | 9999.99 |
| Boolean | True/False |
| Array of integers | [88, 99, ... ] |
| Array of doubles | [88.88, 99.99, ... ] |
| Array of strings | ['string', "string", ... ] |
| Blob (file) | '0186e35b-233e-45ce-ad01-3e1d204a0bc0' (a blob uuid) |

The allowed data types are the same for R and Python deployments. UbiOps will handle all data in Python data types, but converts the data to R data types for R deployments.
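As an illustration (the field names below are hypothetical), a structured request payload built in Python maps onto the table as follows:

```python
import json

# Hypothetical structured input: one field per data type from the table above.
request_data = {
    "name": "sample",               # String
    "count": 9999,                  # Integer
    "score": 9999.99,               # Double precision
    "valid": True,                  # Boolean
    "int_list": [88, 99],           # Array of integers
    "double_list": [88.88, 99.99],  # Array of doubles
    "labels": ["first", "second"],  # Array of strings
    "blob": "0186e35b-233e-45ce-ad01-3e1d204a0bc0",  # Blob (file): a blob uuid
}

# The whole payload must be JSON serializable to send it via a script or HTTP
payload = json.dumps(request_data)
```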

Converting common data structures

Pipelines allow you to connect the output fields of one deployment to the input fields of the next. Sometimes the data you want to pass on is in a specific data structure for which there is no data type. Below are examples of how to convert some common Python data structures to data types that can be passed between deployments.

Pandas DataFrame

When developing data science applications in Python, tabular data is often stored in a Pandas DataFrame¹. A DataFrame itself cannot be output by a deployment directly, but there are ways to work around this.

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])
  • Convert to JSON string: To output a DataFrame as a JSON string you can use the following snippet in the deployment code:

    output_df = df.to_json(orient='split')
    return {
        "df": output_df
    }

    To read the data back in as a DataFrame in the next deployment, use:

    df = pd.read_json(data['df'], orient='split')

  • Output as Blob (file): You can also write the DataFrame to a CSV, pickle, Excel, or any other file and pass it to the next deployment as a Blob (file). The example shows how to output a .csv file:

    df.to_csv('output.csv')
    return {
        "df": "output.csv"
    }

    To read the data back in as a DataFrame in the next deployment, use:

    df = pd.read_csv(data['df'])
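The JSON route above can be exercised end to end to confirm the round trip is lossless. A sketch (`StringIO` is used so the snippet also works on recent pandas versions, which deprecate passing a literal JSON string to `read_json`):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# What the first deployment would return in its "df" output field
payload = df.to_json(orient='split')

# What the next deployment would do with data['df']
restored = pd.read_json(StringIO(payload), orient='split')
```

The 'split' orient preserves the index, columns, and values separately, which is why the restored DataFrame matches the original.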

Numpy Array

Another common data structure in data science is the NumPy array². A 1-dimensional array can be passed on as a string, but a multi-dimensional array should be passed as a Blob (file).

  • Convert to string: The NumPy method tostring() is deprecated from NumPy 1.19.0 onwards. Instead, use the following snippet in the deployment code to output an array as a string:

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5, 6, 7])
    # First convert the array to bytes
    new_arr = arr.tobytes()
    # Then convert the bytes to a string, specifying the encoding
    decoded_array = new_arr.decode('utf-8')
    return {
        "decoded_array": decoded_array
    }

    To read the data back in as an Array in the next deployment, use:

    # First encode the string input to bytes, specifying the encoding
    bytes_array = data['decoded_array'].encode('utf-8')
    # Then load from buffer, specifying the data type of the original Array
    latest_arr = np.frombuffer(bytes_array, dtype=np.int64)

  • Output as Blob (file): The best way to pass a multi-dimensional array between deployments is to output it as either a NumPy binary file or a text file. The example shows how to output a NumPy binary file:

    np.save('data.npy', np.array([[1, 2, 3], [4, 5, 6]]))
    return {
        "np_file": "data.npy"
    }

    To read the data back in as an Array in the next deployment, use:

    # Load the array
    arr = np.load(data['np_file'])
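Both routes can be checked end to end. A sketch (a temp file stands in for the blob handling; note that the string route only works when the raw bytes happen to be valid UTF-8, and that `frombuffer` needs the original array's dtype to reconstruct it correctly):

```python
import os
import tempfile

import numpy as np

# String route: 1-D array -> bytes -> string, then string -> bytes -> array.
# dtype is fixed explicitly so both sides agree on it.
arr = np.array([1, 2, 3, 4, 5, 6, 7], dtype=np.int64)
decoded_array = arr.tobytes().decode('utf-8')
restored = np.frombuffer(decoded_array.encode('utf-8'), dtype=np.int64)

# Blob route: multi-dimensional array -> .npy file -> array.
# np.save/np.load preserve shape and dtype, so no extra bookkeeping is needed.
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
path = os.path.join(tempfile.mkdtemp(), 'data.npy')
np.save(path, arr2d)
restored2d = np.load(path)
```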

  1. Code examples taken from the Pandas Documentation 

  2. Code examples taken from the NumPy Documentation