Data types

When creating a Deployment or Pipeline, you have to specify the type of its input and/or output, which can be plain or structured. This page gives an overview of all the data types available for plain and structured fields. It also explains how to convert some well-known data structures to those data types.

Data type for plain input/output

The data type for plain input or output is simple: It can be any type of unstructured string, as long as it's JSON serializable. The plain data type is perfect for when you're not exactly sure what the data will look like, or when it has a varying number of variables.
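
As a minimal sketch of what "JSON serializable" means in practice, the snippet below checks a value with Python's standard json module before it is sent as plain data. The helper name is_json_serializable and the example payload are hypothetical and only serve as illustration.

```python
import json


def is_json_serializable(value):
    """Return True if `value` can be encoded as JSON (hypothetical helper)."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# A payload with a varying number of variables, encoded as one JSON string
measurements = {"sensor": "a1", "values": [1.5, 2.0, 3.25]}
plain_payload = json.dumps(measurements)

print(is_json_serializable(measurements))  # dictionaries of basic types are fine
print(is_json_serializable({1, 2, 3}))     # a Python set cannot be encoded as JSON
```

Values that fail this check, such as sets or custom objects, need to be converted to basic types (strings, numbers, lists, dictionaries) first.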

Data types for structured fields

The structured deployment fields and the corresponding data formats are listed in the table below. The data format is required when making a request programmatically via a script or HTTP request. When making a request through the WebApp, it's not necessary to include the quote characters (' or ") around strings.

Data type            Data format
String               'string' or "string"
Integer              9999
Double precision     9999.99
Boolean              True/False
Array of integers    [88, 99, ... ]
Array of doubles     [88.88, 99.99, ... ]
Array of strings     ['string', "string", ... ]
Blob (file)          '0186e35b-233e-45ce-ad01-3e1d204a0bc0' (a blob uuid)

The allowed data types are the same for R and Python deployments. UbiOps will handle all data in Python data types, but converts the data to R data types for R deployments.
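
To illustrate the formats in the table above, here is a sketch of structured request data written as a Python dictionary. All field names (name, count, price and so on) are hypothetical; a real deployment defines its own. The sketch also assumes the request body is sent as JSON, in which case Python's True is written as true.

```python
import json
import uuid

# Hypothetical input fields, one for each data type in the table
request_data = {
    "name": "string",                                      # String
    "count": 9999,                                         # Integer
    "price": 9999.99,                                      # Double precision
    "enabled": True,                                       # Boolean
    "int_list": [88, 99],                                  # Array of integers
    "double_list": [88.88, 99.99],                         # Array of doubles
    "string_list": ["string", "string"],                   # Array of strings
    "input_file": "0186e35b-233e-45ce-ad01-3e1d204a0bc0",  # Blob (file) uuid
}

# A blob uuid is a regular UUID string, so the standard library can parse it
blob_id = uuid.UUID(request_data["input_file"])

# Serialize the dictionary for an HTTP request body (assumed to be JSON)
body = json.dumps(request_data)
print(body)
```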

Converting common data structures

Pipelines allow you to connect the output fields of one deployment to the input fields of the next. Sometimes the data that you want to pass on is in a specific data structure for which there is no data type. Below are examples of how to convert some common Python data structures to data types that can be passed between deployments.

Pandas DataFrame

When developing Data Science applications in Python, tabular data is often stored in a Pandas DataFrame [1]. A deployment cannot output a DataFrame directly, but there are ways to work around this.

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

  • Convert to JSON string: To output a DataFrame as a JSON string you can use the following snippet in the deployment code:

    output_df = df.to_json(orient='split')
    return {
        "df": output_df
    }
    

    To read the data back in as a DataFrame in the next deployment, use:

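    # Newer pandas versions expect a file-like object here (requires: from io import StringIO)
    df = pd.read_json(StringIO(data['df']), orient='split')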
    

  • Output as Blob (file): You can also write a DataFrame to a file (CSV, pickle, Excel or any other format) and pass it to the next deployment as a Blob (file). The example shows how to output a .csv file:

    df.to_csv('output.csv', index=True)
    return {
        "df": "output.csv"
    }
    

    To read the data back in as a DataFrame in the next deployment, use:

    # index_col=0 restores the index that to_csv(index=True) wrote as the first column
    df = pd.read_csv(data['df'], index_col=0)
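
The JSON string route can be sketched end to end as follows. The two functions stand in for the request methods of two consecutive deployments; their names and the way they are called here are hypothetical, and the dictionary passed between them only mimics how a pipeline hands over the "df" field.

```python
from io import StringIO

import pandas as pd


def first_request(data):
    """Hypothetical first deployment: outputs a DataFrame as a JSON string."""
    df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                      index=['row 1', 'row 2'],
                      columns=['col 1', 'col 2'])
    return {"df": df.to_json(orient='split')}


def second_request(data):
    """Hypothetical second deployment: rebuilds the DataFrame from the string."""
    # StringIO wraps the JSON string in a file-like object for read_json
    return pd.read_json(StringIO(data['df']), orient='split')


# Simulate the pipeline passing the output of one deployment to the next
output = first_request({})
restored = second_request(output)
print(restored)
```

With orient='split', the index, the column names and the values are stored separately in the JSON string, so the row and column labels are available again in the second deployment.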
    

NumPy Array

Another common data structure in Data Science is the NumPy Array [2]. A 1-dimensional Array can be passed on as a string, but a multi-dimensional Array should be passed as a Blob (file), because converting an Array to bytes does not preserve its shape.

  • Convert to string: The NumPy method tostring() is deprecated from NumPy 1.19.0 onwards; use tobytes() instead. The raw bytes of an Array are not always valid UTF-8 (for example, those of negative integers), so the snippet below encodes them with Base64 to output the Array as a string:

    import base64
    import numpy as np

    arr = np.array([1, 2, 3, 4, 5, 6, 7], dtype=np.int64)

    # First convert the array to bytes
    new_arr = arr.tobytes()

    # Then convert the bytes to an ASCII string using Base64
    decoded_array = base64.b64encode(new_arr).decode('ascii')
    return {
        "decoded_array": decoded_array
    }


    To read the data back in as an Array in the next deployment, use:

    # First decode the Base64 string input back to bytes
    bytes_array = base64.b64decode(data['decoded_array'])

    # Then load from buffer, specifying the data type of the original Array
    latest_arr = np.frombuffer(bytes_array, dtype=np.int64)
    

  • Output as Blob (file): The best way to pass a multi-dimensional Array between deployments is to output the Array as either a NumPy binary file (.npy) or a text file. The example shows how to output a NumPy binary file:

    np.save('data.npy', np.array([[1, 2, 3], [4, 5, 6]]))
    return {
        "np_file": "data.npy"
    }
    

    To read the data back in as an Array in the next deployment, use:

    # Load the array
    arr = np.load(data['np_file'])
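
The text file alternative mentioned above can be sketched with np.savetxt and np.loadtxt. The file name data.txt and the temporary directory are assumptions for illustration; a deployment would write to its own path and return it as the Blob (file) output. Note that np.savetxt only handles 1-dimensional and 2-dimensional Arrays.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Write to a temporary directory here; the path is only for illustration
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# fmt='%d' writes the integers without a decimal part
np.savetxt(path, arr, fmt='%d')

# In the next deployment: pass the dtype explicitly, as loadtxt defaults to float
loaded = np.loadtxt(path, dtype=int)
print(loaded)
```

A text file is human-readable, while the binary .npy format also stores the data type and shape, so it needs no extra arguments when loading.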
    


  1. Code examples taken from the Pandas Documentation 

  2. Code examples taken from the NumPy Documentation