
Deployment Input & Output

When creating a Deployment or Pipeline, you have to define which data it expects in a request and which data it returns. UbiOps supports two types of data input and output:

  • Structured: Structured data consists of a dictionary (a named list in R) with one or more key-value pairs, each with an associated data type (integer, string, double, boolean, array of integers, array of doubles, array of strings or file; see Blob (file) handling for more details).

  • Plain: Any string without structure that is JSON serializable.

This page gives an overview of all the data types that exist for plain and structured fields. It also explains how to convert some well-known data structures to those data types.

You can use different data types for input and output, for example a deployment with structured input and plain output, or vice versa.

Input

The request method has an input parameter called data. This object contains the data that is passed as input to the deployment instance.

  • When using a structured input, the data parameter contains a Python dictionary or an R named list.
    Note that when sending a file (a structured field of type 'blob') as an input field, its value is the absolute path where the deployment can access the file. For example, if you create a request with the file 'input.csv' as one of the input fields, the value of that field will be a path to the file. See Blob (file) handling for more information.
# Python: the data object is a dictionary
data = {
    "input_var_str_1": "Value 1",
    "input_var_num_2": 2,
    "input_var_file_3": "/home/deployment/blobs/{blob-id}/input_file.csv"  # Or other extension
}

# R: the input_data object is a named list
input_data <- list(input_var_str_1 = "Value 1", input_var_num_2 = 2, input_var_file_3 = "/home/deployment/blobs/{blob-id}/input_file.csv")
  • For a plain input, the data parameter contains the input as a plain-text string.
# Python
data = "input-string-as-plain-text"
# R
input_data <- "input-string-as-plain-text"
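To show where the data object is used, here is a minimal sketch of a Python deployment that reads the structured input fields from the example above. It assumes the usual deployment.py layout: a Deployment class whose __init__ receives base_directory and context, and whose request method receives data. The output field names and the processing (doubling the number, counting the lines of the input file) are hypothetical and only serve as an illustration.

```python
class Deployment:
    def __init__(self, base_directory, context):
        # Keep base_directory so request() can build absolute output paths later
        self.base_directory = base_directory

    def request(self, data):
        # Structured input: data is a dict keyed by the input field names
        text = data["input_var_str_1"]
        number = data["input_var_num_2"]
        # A Blob (file) field arrives as the absolute path to the file
        file_path = data["input_var_file_3"]

        # Hypothetical processing: count the lines in the input file
        with open(file_path) as f:
            line_count = sum(1 for _ in f)

        # Hypothetical structured output fields
        return {
            "output_var_str_1": text,
            "output_var_num_2": number * 2,
            "output_var_lines": line_count,
        }
```

Locally, a temporary file can stand in for an uploaded blob, so the request method can be called directly with a plain dictionary.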

Output

The request method is expected to return output in the format you defined when setting up the deployment. The output must be JSON serializable.

  • When using a structured output, the returned value should be a JSON serializable dictionary (single output), or a list of such dictionaries (plural output).
    If a structured field is of type file (blob), the value for this key should contain the absolute path where the output file is stored inside the deployment. UbiOps will then collect this file and process it as an output of your request. You can build the absolute path using the 'base_directory' passed to the __init__ method of the deployment file. See Blob (file) handling for more information.
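# Python, inside the request method. Assumes __init__ stored base_directory as
# self.base_directory and that the deployment file imports os.
return {
    "output_var_str_1": "Value 1",
    "output_var_num_2": 2,
    "output_var_file_3": os.path.join(self.base_directory, "your_output_file.txt")  # Or other path or extension
}
# R
return(list(output_var_str_1 = "Value 1", output_var_num_2 = 2, output_var_file_3 = file.path(base_directory, "your_output_file.txt")))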
  • For a plain output, the returned value should be a string.
# Python
return "output-string-as-plain-text"
# R (the value of the last expression is returned)
"output-string-as-plain-text"
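Because the returned output must be JSON serializable, it can help to check a return value locally before deploying. The sketch below uses only Python's json module. The helper names is_json_serializable and check_structured_output are hypothetical and not part of UbiOps; the second one accepts either a single dictionary or a list of dictionaries, matching the single and plural cases described above.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps accepts value without a custom encoder."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


def check_structured_output(output):
    """Raise TypeError unless output is a dict, or a list of dicts, of serializable values."""
    items = output if isinstance(output, list) else [output]
    for item in items:
        if not isinstance(item, dict):
            raise TypeError(f"expected a dict, got {type(item).__name__}")
        for key, value in item.items():
            if not is_json_serializable(value):
                raise TypeError(
                    f"field {key!r} is not JSON serializable: {type(value).__name__}"
                )
```

For example, a dictionary of strings and numbers passes, while a value such as a Python set raises a TypeError, because json.dumps cannot encode sets.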

Data type for plain input/output

The data type for plain input or output is simple: it can be any unstructured string, as long as it is JSON serializable. The plain data type is useful when you are not sure in advance what the data will look like, or when the number of variables varies between requests.
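One way to use a plain field for data with a varying number of variables is to encode it as a JSON document with json.dumps and decode it with json.loads in the receiving deployment. This is an illustrative pattern, not a UbiOps requirement; the function names to_plain and from_plain are hypothetical.

```python
import json


def to_plain(record):
    """Encode a dict with any number of keys as a plain (string) output."""
    return json.dumps(record)


def from_plain(text):
    """Decode a plain input string produced by to_plain back into a dict."""
    return json.loads(text)


# The set of keys can differ from request to request
first = to_plain({"temperature": 21.5})
second = to_plain({"temperature": 19.0, "humidity": 0.6, "station": "A"})
```

The receiving deployment then works with an ordinary dictionary, and no fields need to be declared in advance.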

Data types for structured fields

The structured deployment fields and the corresponding data formats are listed in the table below. The data format applies when making a request programmatically, via a script or an HTTP request. When making a request through the WebApp, you do not need to enclose strings in quote characters (' or ").

Data type          | Data format
------------------ | ----------------------------------------------------
String             | 'string' or "string"
Integer            | 9999
Double precision   | 9999.99
Boolean            | True/False
Array of integers  | [88, 99, ... ]
Array of doubles   | [88.88, 99.99, ... ]
Array of strings   | ['string', "string", ... ]
Blob (file)        | '0186e35b-233e-45ce-ad01-3e1d204a0bc0' (a blob uuid)

The allowed data types are the same for R and Python deployments. UbiOps handles all data as Python data types and converts it to the corresponding R data types for R deployments.
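When building requests programmatically, the table maps onto ordinary Python values. The sketch below checks a request's values against their declared data types before sending it. The type labels used as dictionary keys (such as "array_integer") are hypothetical names chosen for this example, not the identifiers UbiOps uses in its API, and accepting integers for double fields is an assumption. Note that bool is a subclass of int in Python, so the integer check excludes it explicitly. If you write the request body as raw JSON for an HTTP request, booleans are spelled true and false (json.dumps(True) produces 'true').

```python
import uuid


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def _is_double(value):
    # Assumption: an integer is also acceptable where a double is expected
    return isinstance(value, float) or _is_int(value)


def _is_uuid(value):
    # A blob field is sent as the blob uuid string
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# Hypothetical type labels, each mapped to a check that mirrors the table
VALIDATORS = {
    "string": lambda v: isinstance(v, str),
    "integer": _is_int,
    "double": _is_double,
    "boolean": lambda v: isinstance(v, bool),
    "array_integer": lambda v: isinstance(v, list) and all(_is_int(x) for x in v),
    "array_double": lambda v: isinstance(v, list) and all(_is_double(x) for x in v),
    "array_string": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
    "blob": _is_uuid,
}


def invalid_fields(fields, values):
    """Return names of fields that are missing or whose value does not match the declared type."""
    return [name for name, data_type in fields.items()
            if name not in values or not VALIDATORS[data_type](values[name])]
```

An empty list means every declared field is present and has a value of the expected Python type.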

Converting common data structures

Pipelines allow you to connect the output fields of one deployment to the input fields of the next. Sometimes the data you want to pass on is in a data structure for which there is no corresponding data type. Here are some examples of how to convert common Python data structures to data types that can be passed between deployments.

Pandas DataFrame

When developing Data Science applications in Python, tabular data is often stored in a Pandas DataFrame [1]. A deployment cannot output a DataFrame directly, but there are ways to work around this.

The examples below use the following DataFrame:

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])
  • Convert to JSON string: To output a DataFrame as a JSON string, use the following snippet in the deployment code:

    output_df = df.to_json(orient='split')
    return {
        "df": output_df
    }
    

    To read the data back in as a DataFrame in the next deployment, use:

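    from io import StringIO

    # Wrapping the string in StringIO avoids a FutureWarning in pandas 2.1 and later
    df = pd.read_json(StringIO(data['df']), orient='split')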
    

  • Output as Blob (file): You can also write a DataFrame to a CSV, pickle, Excel or any other file and pass it to the next deployment as a Blob (file). The example shows how to output it as a .csv file:

    df.to_csv('output.csv', index=True)
    return {
        "df": "output.csv"
    }
    

    To read the data back in as a DataFrame in the next deployment, use:

    # index_col=0 restores the index that was written with index=True
    df = pd.read_csv(data['df'], index_col=0)
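Both approaches can be checked end to end outside UbiOps. The sketch below plays both the sending and the receiving deployment in one script: it serializes the example DataFrame with to_json(orient='split') and reads it back, then does the same through a CSV file in a temporary directory, where index_col=0 restores the row labels written by index=True. The function names are illustrative only and are not part of UbiOps or pandas.

```python
import os
import tempfile
from io import StringIO

import pandas as pd


def send_as_json(df):
    # Sending deployment: 'split' stores index, columns and data separately
    return {"df": df.to_json(orient="split")}


def receive_from_json(data):
    # Receiving deployment: StringIO makes read_json treat the string as content
    return pd.read_json(StringIO(data["df"]), orient="split")


def send_as_csv(df, directory):
    # Sending deployment: write the file and return its absolute path
    path = os.path.join(directory, "output.csv")
    df.to_csv(path, index=True)
    return {"df": path}


def receive_from_csv(data):
    # index_col=0 turns the first column (the written index) back into the index
    return pd.read_csv(data["df"], index_col=0)


df = pd.DataFrame([["a", "b"], ["c", "d"]],
                  index=["row 1", "row 2"],
                  columns=["col 1", "col 2"])

from_json = receive_from_json(send_as_json(df))
with tempfile.TemporaryDirectory() as tmp_dir:
    from_csv = receive_from_csv(send_as_csv(df, tmp_dir))
```

Without index_col=0, read_csv would return the written index as an extra "Unnamed: 0" column instead of as the row labels.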
    

NumPy Array

Another common data structure in Data Science is the NumPy Array [2]. A 1-dimensional Array can be passed on as a string, but a multi-dimensional Array should be passed as a Blob (file), because converting an Array to bytes does not preserve its shape.

  • Convert to string: The NumPy method tostring() is deprecated as of NumPy 1.19.0; use tobytes() instead. The following snippet in the deployment code outputs an Array as a string:

    import numpy as np

    # Fix the dtype so the receiving deployment knows how to interpret the bytes
    arr = np.array([1, 2, 3, 4, 5, 6, 7], dtype=np.int64)

    # First convert the array to bytes
    new_arr = arr.tobytes()

    # Then convert the bytes to a string. latin-1 maps every byte value to one
    # character, so this is lossless; utf-8 can fail, e.g. for negative values
    decoded_array = new_arr.decode('latin-1')
    return {
        "decoded_array": decoded_array
    }


    To read the data back in as an Array in the next deployment, use:

    # First encode the string input back to bytes, using the same encoding
    bytes_array = data['decoded_array'].encode('latin-1')

    # Then load from the buffer, specifying the data type of the original Array
    latest_arr = np.frombuffer(bytes_array, dtype=np.int64)
    

  • Output as Blob (file): The best way to pass a multi-dimensional Array between deployments is to output it as either a NumPy binary (.npy) file or a text file. The example shows how to output it as a NumPy binary file:

    np.save('data.npy', np.array([[1, 2, 3], [4, 5, 6]]))
    return {
        "np_file": "data.npy"
    }
    

    To read the data back in as an Array in the next deployment, use:

    # Load the array
    arr = np.load(data['np_file'])
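For a 1-dimensional Array, the data type table suggests a further option: sending it as an Array of integers or Array of doubles field. This is an alternative to the byte-string approach, not something the examples above use. arr.tolist() converts NumPy scalars to built-in Python ints or floats, which json.dumps accepts (np.int64 values themselves are not JSON serializable), and np.asarray restores an Array on the receiving side. The sketch also round-trips a 2-dimensional Array through a .npy file in a temporary directory as a local check of the Blob (file) approach. The function names send_1d and receive_1d are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def send_1d(arr):
    # tolist() yields built-in Python numbers, which are JSON serializable
    return {"arr": arr.tolist()}


def receive_1d(data, dtype=np.int64):
    # The receiver must know the dtype; here it is passed explicitly
    return np.asarray(data["arr"], dtype=dtype)


arr = np.array([1, 2, 3, 4, 5, 6, 7], dtype=np.int64)
payload = json.dumps(send_1d(arr))  # works: the list holds built-in ints only
restored = receive_1d(json.loads(payload))

# Multi-dimensional Arrays: the .npy format stores shape and dtype with the data
matrix = np.array([[1, 2, 3], [4, 5, 6]])
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.npy")
    np.save(path, matrix)
    loaded = np.load(path)
```

Unlike the byte string, the list field is readable in the WebApp and does not depend on an agreed text encoding.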
    


  1. Code examples taken from the Pandas documentation.

  2. Code examples taken from the NumPy documentation.