Deployment Input & Output
When creating a Deployment or Pipeline, you have to define what data the deployment expects in a request and what data it returns. UbiOps supports two types of data input and output:
- Structured: Structured data consists of a dictionary with one or more key-value pairs, each with an associated data type: integer, string, double, boolean, array of integers, array of doubles, array of strings, or file (see working with files for more details).
- Plain: Any string without structure that is JSON serializable.
This page gives an overview of all the data types that exist for plain and structured fields and explains how to convert some well known data structures to those data types.
You can use different data types for input and output, for example a deployment with structured input and plain output, or vice-versa.
Input & output fields in your deployment code
The input and output field names provided in the WebApp when creating a deployment must be the same as the field names that are accessed in the deployment code.
Input
The request method has an input variable called data. This object contains the data that is used as input for the instance of the deployment.
- When using structured input, data will contain a Python dictionary.
Note that when sending a file (a structured field of type File) as an input variable, the value will contain the absolute path where the deployment can access the file. For example, if you create a request with the file 'input.csv' as one of the input fields, the value will be the path to that file inside the deployment. See working with files for more information.
# The data object is a dictionary
data = {
    "input_var_str_1": "Value 1",
    "input_var_num_2": 2,
    "input_var_file_3": "/home/deployment/files/{file-id}/input_file.csv"
}
- For plain input, data will contain the plain text string.
data = "input-string-as-plain-text"
Output
The request method is expected to return output in the format you defined when setting up the deployment. The output must be JSON serializable.
- When using structured output, the returned value should be a JSON serializable dictionary (single), or a list of such dictionaries (plural).
If a structured field is of type File, the value for this key should contain the absolute path where the output file is stored inside the deployment. UbiOps will then collect this file and process it as an output of your request. See file handling for more information.
return {
    "output_var_str_1": "Value 1",
    "output_var_num_2": 2,
    "output_var_file_3": os.path.join(base_directory, "your_output_file.txt")  # Or other path or extension
}
- For plain output, the returned value should be a string.
return "output-string-as-plain-text"
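Putting the structured input and output conventions above together, a deployment could look roughly like the sketch below. The `Deployment` class layout, the `context` argument and the assumption that `base_directory` is writable are illustrative assumptions, not something this page specifies; the field names are the hypothetical ones used in the snippets above.

```python
import os
import tempfile


class Deployment:
    def __init__(self, base_directory, context):
        # Assumption: base_directory is a directory the deployment may write to.
        self.base_directory = base_directory

    def request(self, data):
        # Structured input arrives as a dictionary keyed by input field name.
        text = data["input_var_str_1"]
        number = data["input_var_num_2"]

        # Write an output file and return its absolute path for the file field.
        output_path = os.path.join(self.base_directory, "your_output_file.txt")
        with open(output_path, "w") as handle:
            handle.write(f"{text}: {number}\n")

        # Structured output is a JSON serializable dictionary keyed by output field name.
        return {
            "output_var_str_1": text.upper(),
            "output_var_num_2": number * 2,
            "output_var_file_3": output_path,
        }


# Local try-out with a temporary directory standing in for the deployment directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    deployment = Deployment(base_directory=tmp_dir, context={})
    result = deployment.request({"input_var_str_1": "Value 1", "input_var_num_2": 2})
    file_was_written = os.path.isfile(result["output_var_file_3"])
```

Calling the method locally like this is a quick way to check that the keys match the field names defined for the deployment before uploading it.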
Data type for plain input/output
The data type for plain input or output is simple: it can be any type of unstructured string, as long as it's JSON serializable. The plain data type is perfect for when you're not exactly sure what the data will look like, or when it has a varying number of variables.
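As an illustration of the "varying number of variables" case, one option is to let the plain string carry JSON and parse it inside the deployment. This is only a sketch: the standalone `request` function and the choice to sum numeric values are hypothetical, and nothing here requires plain input to be JSON.

```python
import json


def request(data):
    # data is a plain string; try to interpret it as JSON, else keep it as raw text.
    try:
        payload = json.loads(data)
    except json.JSONDecodeError:
        payload = {"raw": data}
    if not isinstance(payload, dict):
        payload = {"raw": payload}

    # Handle however many variables were sent; bool is excluded because it subclasses int.
    numeric_total = sum(
        value for value in payload.values()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )

    # Plain output must be a string, so serialize the result again.
    return json.dumps({"n_fields": len(payload), "numeric_total": numeric_total})


summary = json.loads(request('{"a": 1, "b": 2.5, "c": "x"}'))
fallback = json.loads(request("input-string-as-plain-text"))
```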
Data types for structured fields
The structured deployment fields and the corresponding format of the data are listed in the table below. The data format is required when making a request programmatically via a script or HTTP request. When making a request through the WebApp, it's not necessary to include the quote characters (' or ") around strings.
Data type | Data format | Reference |
---|---|---|
String | 'string' or "string" | string |
Integer | 9999 | int |
Double precision | 9999.99 | double |
Boolean | True / False | bool |
File | 'ubiops-file://{bucket-name}/{filename}' (a file URI) | file |
Array of strings | ['string', "string", ... ] | array_string |
Array of integers | [88, 99, ... ] | array_int |
Array of doubles | [88.88, 99.99, ... ] | array_double |
Array of files | ['ubiops-file://{bucket-name}/{filename1}', 'ubiops-file://{bucket-name}/{filename2}', ...] | array_file |
Dictionary | dict | dict |
The reference name of a data type should be used when creating structured input or output fields using the UbiOps API, Client Libraries or CLI.
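To make the table concrete, the sketch below checks a request payload against a set of field definitions expressed with the reference names. The `validate` helper and the `CHECKS` mapping are hypothetical and not part of UbiOps; the sketch also assumes that an integer is acceptable where a double is expected, and that file values are given as file URIs, as they would be in a request.

```python
# Hypothetical per-type checks, keyed by the reference names from the table.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    # Assumption: integers are accepted where a double is expected.
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bool": lambda v: isinstance(v, bool),
    "file": lambda v: isinstance(v, str) and v.startswith("ubiops-file://"),
    "dict": lambda v: isinstance(v, dict),
}


def _list_of(check):
    # Build a check for "array of X" from the check for X.
    return lambda v: isinstance(v, list) and all(check(item) for item in v)


for base in ("string", "int", "double", "file"):
    CHECKS["array_" + base] = _list_of(CHECKS[base])


def validate(fields, data):
    """Return a list of problems; fields maps field name -> reference name."""
    errors = []
    for name, reference in fields.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not CHECKS[reference](data[name]):
            errors.append(f"{name}: expected {reference}")
    return errors


fields = {"name": "string", "count": "int", "scores": "array_double"}
problems = validate(fields, {"name": "a", "count": True, "scores": [1.0, 2]})
no_problems = validate(fields, {"name": "a", "count": 3, "scores": []})
```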
Converting common data structures
Pipelines allow you to connect the output fields of one deployment to the input fields of the next. Sometimes the data that you want to pass on is in a specific data structure for which there's no data type. Here is a list of examples of how to convert some common Python data structures to data types that can be passed on between deployments.
Pandas DataFrame
When developing Data Science applications in Python, data in tables is often stored in a Pandas DataFrame [1]. A DataFrame cannot be returned directly by a deployment, but there are ways to work around this.
import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])
- Convert to JSON string: To output a DataFrame as a JSON string you can use the following snippet in the deployment code:
output_df = df.to_json(orient='split')
return {
    "df": output_df
}
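To read the data back in as a DataFrame in the next deployment, wrap the string in io.StringIO (recent pandas versions deprecate passing a literal JSON string to read_json):
from io import StringIO
df = pd.read_json(StringIO(data['df']), orient='split')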
- Output as file: You can also write a DataFrame to a file (for example CSV, pickle or Excel) and pass it to the next deployment as a file. The example shows how to output as a .csv file:
df.to_csv('output.csv', index=True)
return {
    "df": "output.csv"
}
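To read the data back in as a DataFrame in the next deployment, pass index_col=0 so that the index written by to_csv is restored instead of being read as an extra column:
df = pd.read_csv(data['df'], index_col=0)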
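The JSON-string route described above can be tried end to end outside of UbiOps. In this sketch the `send` and `receive` functions are hypothetical stand-ins for the request methods of two consecutive deployments, and the field name "df" follows the snippets above.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])


def send(frame):
    # First deployment: serialize the DataFrame into a string output field.
    return {"df": frame.to_json(orient='split')}


def receive(data):
    # Next deployment: rebuild the DataFrame from the string input field.
    return pd.read_json(StringIO(data['df']), orient='split')


restored = receive(send(df))
```

The 'split' orientation stores columns, index and data separately, which is why both the row labels and the column labels survive the round trip.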
NumPy Array
Another common data structure in Data Science is the NumPy Array [2]. A 1-dimensional Array can be passed on as a string, but a multi-dimensional Array should be passed as a file.
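- Convert to string: The NumPy method tostring() is deprecated from NumPy 1.19.0 onwards. Instead, use the following snippet in the deployment code to output an Array as a string. Note that decoding the raw bytes as UTF-8 only succeeds when every byte sequence is valid UTF-8, as is the case for the small non-negative integers in this example: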
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
# First convert the array to bytes
new_arr = arr.tobytes()
# Then convert bytes to string, specifying the encoding
decoded_array = new_arr.decode('utf-8')
return {
    "decoded_array": decoded_array
}
To read the data back in as an Array in the next deployment, use:
# First encode the string input to bytes, specifying the encoding
bytes_array = data['decoded_array'].encode('utf-8')
# Then load from buffer, specifying the data type of the original Array
latest_arr = np.frombuffer(bytes_array, dtype=np.int64)
- Output as file: The best way to pass a multi-dimensional Array between deployments is to output the Array as either a NumPy binary file or a text file. The example shows how to output as a NumPy binary file:
np.save('data.npy', np.array([[1, 2, 3], [4, 5, 6]]))
return {
    "np_file": "data.npy"
}
To read the data back in as an Array in the next deployment, use:
# Load the array
arr = np.load(data['np_file'])
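As an alternative to decoding raw bytes as UTF-8, which fails for arrays containing bytes that are not valid UTF-8 (negative numbers, values of 128 and above, floats), the bytes can be Base64-encoded first. This is a different technique from the one shown above, sketched under the assumption that the array is passed in a single string field; the helper names `array_to_string` and `string_to_array` are hypothetical.

```python
import base64
import json

import numpy as np


def array_to_string(arr):
    # Store dtype and shape next to the Base64-encoded bytes so the array can be rebuilt.
    payload = {
        "dtype": str(arr.dtype),
        "shape": list(arr.shape),
        "data": base64.b64encode(np.ascontiguousarray(arr).tobytes()).decode("ascii"),
    }
    return json.dumps(payload)


def string_to_array(text):
    payload = json.loads(text)
    raw = base64.b64decode(payload["data"])
    # frombuffer returns a read-only view; call .copy() if the array must be modified.
    return np.frombuffer(raw, dtype=payload["dtype"]).reshape(payload["shape"])


original = np.array([[-1, 200], [3, 4]])
round_tripped = string_to_array(array_to_string(original))
```

Because the shape is stored alongside the data, this also works for multi-dimensional arrays, although passing a file remains the better choice for large arrays.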
[1] Code examples taken from the Pandas documentation.
[2] Code examples taken from the NumPy documentation.