
Introduction to requests

Making requests

Deployments and pipelines in UbiOps receive data through their REST API endpoints. One request triggers a single run of the code inside the deployment.

Before making a request, a Deployment or Pipeline should already have been created in the platform.

Request types

There are two types of requests: direct and batch requests.

  • Direct requests are synchronous: they take a single data payload and keep a connection open during processing, after which the result is returned immediately.

  • Batch requests are asynchronous: they require two API calls, one to send the data and one to retrieve the results. A batch request can take up to 250 data payloads at once; a separate request_id is returned for each payload, with which the result can be retrieved at a later time.

You can send both types of requests to both deployments and pipelines.
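As a sketch of the difference, using the third-party requests library and the deployment request endpoints described further down this page (project, deployment and token values are placeholders): a direct request is a single round trip, while a batch request submits payloads and returns request ids to collect later.

```python
import requests

API_BASE = "https://api.ubiops.com/v2.1"

def direct_request(token, project, deployment, payload):
    """Synchronous: the connection stays open and the completed result is returned."""
    resp = requests.post(
        f"{API_BASE}/projects/{project}/deployments/{deployment}/requests",
        headers={"Authorization": f"Token {token}"},
        json=payload,
    )
    return resp.json()  # contains id, status, result and error_message

def batch_request(token, project, deployment, payloads):
    """Asynchronous: returns one request id per payload, to be collected later."""
    resp = requests.post(
        f"{API_BASE}/projects/{project}/deployments/{deployment}/requests/batch",
        headers={"Authorization": f"Token {token}"},
        json=list(payloads),
    )
    return [item["id"] for item in resp.json()]

def chunk_payloads(payloads, max_batch=250):
    """Split a payload list into chunks of at most 250, the batch request limit."""
    return [payloads[i:i + max_batch] for i in range(0, len(payloads), max_batch)]
```

If you have more than 250 payloads, `chunk_payloads` splits them into valid batch sizes before submitting.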

Request timeouts

Every request has a configurable timeout parameter, which indicates the time after which the request is automatically aborted. This is a useful way to terminate code or connections that may hang unexpectedly. The valid range and default value of the timeout depend on the type of request and on the type of object the request is made for. The table below lists the timeout ranges and their default values.

Please take into account that the timeout includes the full request duration as well as the time it takes for an instance to initialize and start up (a cold start).

| Request type | Object type | Default timeout | Minimum timeout | Maximum timeout |
|--------------|-------------|-----------------|-----------------|-----------------|
| Direct | Deployment | 300 seconds | 10 seconds | 3600 seconds |
| Direct | Pipeline | 3600 seconds | 10 seconds | 7200 seconds |
| Batch | Deployment | 14400 seconds | 10 seconds | 259200 seconds (72 hours) |
| Batch | Pipeline | 14400 seconds | 10 seconds | 259200 seconds (72 hours) |
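The bounds in the table can be checked client-side before a call. A sketch, assuming the timeout is passed as a `timeout` query parameter in seconds on the request endpoint (verify the exact parameter on the Swagger page):

```python
import requests

# Timeout bounds per (request type, object type), in seconds, from the table above.
TIMEOUT_BOUNDS = {
    ("direct", "deployment"): {"default": 300, "min": 10, "max": 3600},
    ("direct", "pipeline"): {"default": 3600, "min": 10, "max": 7200},
    ("batch", "deployment"): {"default": 14400, "min": 10, "max": 259200},
    ("batch", "pipeline"): {"default": 14400, "min": 10, "max": 259200},
}

def validate_timeout(seconds, request_type="direct", object_type="deployment"):
    """Reject timeout values outside the allowed range before calling the API."""
    bounds = TIMEOUT_BOUNDS[(request_type, object_type)]
    if not bounds["min"] <= seconds <= bounds["max"]:
        raise ValueError(
            f"timeout must be between {bounds['min']} and {bounds['max']} seconds"
        )
    return seconds

def direct_request_with_timeout(token, project, deployment, payload, timeout=300):
    # The `timeout` query parameter name is an assumption; see the Swagger page.
    resp = requests.post(
        f"https://api.ubiops.com/v2.1/projects/{project}"
        f"/deployments/{deployment}/requests",
        headers={"Authorization": f"Token {token}"},
        params={"timeout": validate_timeout(timeout)},
        json=payload,
    )
    return resp.json()
```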

Timeout for batch requests

The timeout for batch requests starts when the request status changes to processing; the timeout period does not run while the request status is pending. However, if a request does not start processing within the maximum request duration of 72 hours (259200 seconds), the request status will be set to failed and the request will not start.

Request statuses

Every request in UbiOps has a status indicating what's happening with the request. The different statuses are:

  • Pending: The request is currently pending. UbiOps is trying to assign the request to an active deployment version instance. If no instances are available, UbiOps will first spin up a new instance.
  • Processing: The request is picked up by an instance for processing.
  • Completed: The request has finished successfully and the results are available.
  • Failed: The request has failed with an error message indicating why.
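These statuses can be handled with a couple of small helpers. A sketch, assuming response dictionaries shaped like the request response examples later on this page:

```python
# Terminal statuses: a request in one of these states will not change again.
TERMINAL_STATUSES = {"completed", "failed"}

def is_finished(request):
    """True once a request has reached the completed or failed status."""
    return request["status"] in TERMINAL_STATUSES

def result_or_raise(request):
    """Return the result of a finished request, raising on failure."""
    if request["status"] == "failed":
        raise RuntimeError(request["error_message"])
    if request["status"] != "completed":
        raise RuntimeError(f"request is still {request['status']}")
    return request["result"]
```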

Info about active instances

Is your request staying in a pending status longer than you would expect? Have a look at the active instances info to see what's happening in the background.

Deployment Requests using the WebApp

You can easily make deployment requests using the UbiOps WebApp. This comes in handy when testing if your deployment behaves as it should. The WebApp will create the API request for you.


You can test your deployment by creating a request for it.

  1. Click on Deployments in the sidebar on the left and then click on your deployment name.

  2. Now click CREATE REQUEST and insert or upload the data you want to include in the test request. Click Create request to create the request.

Default version

This will make a request to the default version of your deployment. If you want to create a request for a specific deployment version, first navigate to the specific deployment version by clicking on its name in the versions table. Then, you can find the same button CREATE REQUEST to make a request to that specific version.

  1. When the deployment has finished processing a direct request, you can see the results of your request immediately.
  2. You can also create batch requests for your deployment. A batch request is visible on the version's requests overview page while it is processing. If the request retention mode of the version is set to metadata or full, completed requests remain visible there for the remaining request retention time. See Request retention for more information on request storage.

Deployment Requests using the deployment/pipeline interface

An interface is automatically generated for each deployment and pipeline. The only purpose of this interface is to make requests to your deployment or pipeline. You can share this interface with people who don't have access to the UbiOps WebApp but just want to use the deployment or pipeline. You can find this deployment and pipeline interface on https://requests.ubiops.com, or by opening your deployment details page, clicking the Use deployment tab and clicking Go to deployment interface.

Currently, only direct requests are supported.

API Token

For sending data to a deployment or pipeline through the UbiOps deployment and pipeline interface you will need an API Token with the correct permissions.

The deployment and pipeline interface can be customized using widgets. The deployment interface can be configured by opening your deployment details page, clicking the Use deployment tab and clicking Configure deployment interface. The widgets that can be chosen depend on the data types of the deployment's input and output fields. For instance, there are specific widgets for numbers and files. A preview of the deployment interface is shown on the right side of the screen. You can interact with the input widgets, but this does not trigger an actual request because it is just a preview. The output widgets are displayed with dummy data.

Deployment Requests using the API

Each deployment, pipeline and their versions get unique API endpoints when created. You can use these endpoints to make requests with data to process.

The following text describes how to send data to a deployment or pipeline using the UbiOps API and applies to both deployment and pipeline requests. For examples, see Request examples.

API Token

For sending data to a deployment through the UbiOps API you will need an API Token with the correct permissions.

Request Schedules

It is also possible to schedule requests instead of triggering them with an API call. To make periodic requests to deployments and pipelines, you can use the Request Schedule functionality. See Request Schedules for more information.

API endpoints for making requests

Deployment API endpoints:

  • The direct request endpoint for deployments is used to send one data payload to the deployment. Available for deployments with both structured and plain input.
    URL format: https://api.ubiops.com/v2.1/projects/{project_name}/deployments/{deployment_name}/requests
  • The batch request endpoint for deployments allows sending multiple data payloads to the deployment. Available for deployments with both structured and plain input.
    URL format: https://api.ubiops.com/v2.1/projects/{project_name}/deployments/{deployment_name}/requests/batch
  • The direct request endpoint for deployments with streaming is used to send one data payload to the deployment and to stream the response. Available for deployments with both structured and plain input. The deployment code needs to have streaming logic implemented.
    URL format: https://api.ubiops.com/v2.1/projects/{project_name}/deployments/{deployment_name}/requests/stream

Pipeline API endpoints:

  • The direct request endpoint for pipelines is used to send one data payload to the pipeline. Available for pipelines with both structured and plain input.
    URL format: https://api.ubiops.com/v2.1/projects/{project_name}/pipelines/{pipeline_name}/requests
  • The batch request endpoint for pipelines allows sending multiple data payloads to the pipeline. Available for pipelines with both structured and plain input.
    URL format: https://api.ubiops.com/v2.1/projects/{project_name}/pipelines/{pipeline_name}/requests/batch
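All of these endpoints follow the same pattern, so a small helper can build any of them. A sketch, with project and object names as placeholders:

```python
API_BASE = "https://api.ubiops.com/v2.1"

def request_endpoint(project, name, object_type="deployments",
                     batch=False, stream=False):
    """Build a request endpoint for a deployment or pipeline.

    object_type is "deployments" or "pipelines". Batch and stream are mutually
    exclusive, and streaming endpoints only exist for deployments.
    """
    if object_type not in ("deployments", "pipelines"):
        raise ValueError("object_type must be 'deployments' or 'pipelines'")
    if batch and stream:
        raise ValueError("a request is either batch or streaming, not both")
    if stream and object_type != "deployments":
        raise ValueError("streaming requests are only available for deployments")
    url = f"{API_BASE}/projects/{project}/{object_type}/{name}/requests"
    if batch:
        url += "/batch"
    if stream:
        url += "/stream"
    return url
```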

Making requests to specific deployment or pipeline versions

The endpoints listed above make a request to the default version of your deployment or pipeline. If you want to make a request to a specific version that is not the default version, you can use the version-specific request endpoints instead.

For a full list of available API endpoints related to requests in UbiOps, please see our Swagger page.

Data payload for the API request

To make a request you need to use the correct endpoint specifying the deployment or pipeline and request type (see above) and include the request data payload in the body.

For deployments/pipelines with structured input, you need to provide a JSON dictionary with the input fields of the deployment/pipeline as keys and the data as values. For a batch request, you need to provide a list of such dictionaries.

Below you can see an example of how to format the request data payload for a direct request with a structured input format.

{
  "input-field-1": 5.0,
  "input-field-2": "N",
  "input-field-3": [0.25, 0.25, 2.1, 16.3]
}

An example of a successful response to a direct request looks like:

{
  "id": "ffce45da-1562-419a-89a0-0a0837e55392",
  "deployment": "deployment-1",
  "version": "v2",
  "status": "completed",
  "result": {
    "output-field-1": "2.1369",
    "output-field-2": "5.5832",
  },
  "error_message": null
}

A failed deployment request looks like:

{
  "id": "85ae32a7-fe3a-4a55-be27-9db88ae68501",
  "deployment": "deployment-1",
  "version": "v1",
  "status": "failed",
  "result": null,
  "error_message": "Asset ID not supported"
}

A pipeline request returns a result that includes the results and error messages of all intermediate steps, as in:

{
  "id": "286f771b-6617-4985-ab49-12ed720e62b1",
  "pipeline": "pipeline-1",
  "version": "v1",
  "status": "failed",
  "error_message": "Error while processing a deployment request",
  "deployment_requests": [
    {
      "id": "a7524614-bdb7-41e1-b4c1-653bb72c30b4",
      "pipeline_object": "deployment-object-1",
      "status": "completed",
      "error_message": null
    },
    {
      "id": "fe322c50-58f8-4e67-b7d6-cba14273874e",
      "pipeline_object": "deployment-object-2",
      "status": "failed",
      "error_message": "Invalid message format" 
    }
  ],
  "result": {
    "output_field": 23.5
  }
}

If a deployment expects plain input data, a single string is required, which is sent as-is to the deployment. For a batch request, a list of strings is required.
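
For example, a direct plain request body is a single string such as "some input text", while a batch plain request body is a list of strings, one entry per request:

```json
[
  "plain input for request 1",
  "plain input for request 2"
]
```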

Sending files as part of the payload

If your deployment expects a file field as input, you have to provide the file URI in the input. The file first needs to be uploaded through the file API endpoint before making the request. Read the page on file handling for more information.

{
  "input-field-1": 5.0,
  "file-input-field": "ubiops-file://example-bucket/example_file.png"
}

For more information and to try things out, you can visit our Swagger page.

Streaming requests

The request() method of deployments also supports streaming, which can be helpful when you are working with large language models. To make your deployment compatible with request streaming, you simply need to use streaming_update to stream your response chunks. Below is an example deployment template for deploying Gemma to UbiOps with streaming support.

import os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextIteratorStreamer
from huggingface_hub import login
from threading import Thread


class Deployment:

    def __init__(self, base_directory, context):

        # Log in to huggingface
        token = os.environ["HF_TOKEN"]
        login(token=token)

        # Download Gemma from Huggingface
        model_id = os.environ.get("MODEL_ID", "google/gemma-1.1-2b-it")
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForCausalLM.from_pretrained(model_id)


    def request(self, data, context):
        user_prompt = data["prompt"]
        streaming_update = context["streaming_update"]

        inputs = self.tokenizer(user_prompt, return_tensors="pt")

        # Here we initiate a TextIteratorStreamer object from the transformers library to stream Gemma's response
        streamer = TextIteratorStreamer(self.tokenizer, skip_prompt=True)

        generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=256)

        # The TextIteratorStreamer requires a thread which we start here
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()

        generated_text = ""
        for new_text in streamer:
            # We use the streaming_update callback from UbiOps to send partial updates
            streaming_update(new_text)
            generated_text += new_text

        return generated_text

In this example we use the built-in TextIteratorStreamer from Hugging Face and the streaming_update callback from UbiOps to pass our stream to the UbiOps API. Once we have the full response from the LLM, we return it in our return statement. For all the details on this example, please see our Gemma with streaming support tutorial.

To make a streaming request to a deployment that has streaming logic, you can use the /stream endpoint:

https://api.ubiops.com/v2.1/projects/{project_name}/deployments/{deployment_name}/requests/stream

The streamed responses will be returned in the following format:

  b"event:update",
  b"data:<your data>"

Once the request is completed, the full response will be returned as usual.
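A minimal sketch of consuming such a stream with the requests library, parsing the `event:`/`data:` line format shown above (token, project and deployment values are placeholders):

```python
import requests

def parse_stream_line(line: bytes):
    """Return the payload of a `data:` line, or None for non-data lines."""
    if line.startswith(b"data:"):
        return line[len(b"data:"):].decode()
    return None

def stream_request(token, project, deployment, payload):
    """Yield partial updates from the /stream endpoint as they arrive."""
    url = (f"https://api.ubiops.com/v2.1/projects/{project}"
           f"/deployments/{deployment}/requests/stream")
    with requests.post(
        url,
        headers={"Authorization": f"Token {token}"},
        json=payload,
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            update = parse_stream_line(line)
            if update is not None:
                yield update
```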

See our request examples page for an example with the client library.

Retrieving batch request results

The result of a direct request is always immediate, meaning that the result is returned in the API call response as soon as it is computed. If your deployment was idle before the request, this can take longer than when the deployment was already active, because of the instance cold start.

When you make a batch request, the following response is returned in the body:

[
  {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "status": "pending",
    "time_created": "2020-07-24T09:51:15.360Z"
  }
]

The status of the request is always pending upon initialization and later changes to processing, completed or failed. You therefore need to poll the API endpoint regularly to check whether the request has finished processing.

After making the request, you can collect the result of the request with the request id returned upon making the request. You can either collect the result of a single request or the results of multiple requests.

  • For the results of one deployment batch request, you can use the GET deployment requests method. URL format: https://api.ubiops.com/v2.1/projects/{project_name}/deployments/{deployment_name}/requests/collect

  • For the results of one pipeline batch request, the GET pipeline requests method may be used. URL format: https://api.ubiops.com/v2.1/projects/{project_name}/pipelines/{pipeline_name}/requests/collect
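A hedged sketch of polling the deployment collect endpoint until the request reaches a terminal status. The `request_id` query parameter name is an assumption; consult the Swagger page for the exact signature of the collect call.

```python
import time
import requests

def collect_batch_result(token, project, deployment, request_id,
                         poll_interval=5.0, max_wait=3600.0):
    """Poll a batch deployment request until it is completed or failed."""
    url = (f"https://api.ubiops.com/v2.1/projects/{project}"
           f"/deployments/{deployment}/requests/collect")
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        resp = requests.get(
            url,
            headers={"Authorization": f"Token {token}"},
            params={"request_id": request_id},  # assumed parameter name
        )
        body = resp.json()
        if body["status"] in ("completed", "failed"):
            return body
        time.sleep(poll_interval)  # still pending/processing: poll again later
    raise TimeoutError(f"request {request_id} did not finish within {max_wait}s")
```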

The following actions are also permitted for batch requests:

  • Retrieve multiple batch deployment and pipeline requests: retrieve the results of multiple batch requests in one call. This method takes a list of request IDs.
  • Delete multiple batch deployment and pipeline requests: delete multiple batch requests in one call. This method takes a list of request IDs. All batch requests that are not yet completed or failed are terminated.
  • List all batch deployment and pipeline requests: give an overview of all batch requests for a deployment or pipeline. Selection and sorting options are available.
  • Delete a single batch deployment and pipeline request: delete a single batch request and terminate it.

API status codes

Whenever you make a request to the UbiOps API it will return a status code. In the table below you can see an overview of the most common status codes and in what scenarios they are returned.

| Status code | Scenario |
|-------------|----------|
| 400 | Bad request or incorrect input data |
| 401 | Unauthorized |
| 403 | Permission denied |
| 404 | Endpoint, deployment or other object not found |
| 429 | Rate limit reached for requests |
| 500 | Problem in UbiOps, or, in case of a request, an exception in the user code |
| 502/503 | Problem in UbiOps, try again later |
| 504 | Deployment or pipeline request timed out |
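
These codes can be translated into quick client-side diagnostics. A sketch based on the table above; the retry heuristic (retry only rate limits and transient server errors) is a suggestion, not UbiOps guidance:

```python
# Map the most common UbiOps status codes to a short diagnosis.
STATUS_HINTS = {
    400: "Bad request or incorrect input data",
    401: "Unauthorized: check the API token",
    403: "Permission denied: token lacks the required role",
    404: "Endpoint, deployment or other object not found",
    429: "Rate limit reached: back off and retry",
    500: "Problem in UbiOps, or an exception raised in the user code",
    502: "Problem in UbiOps: try again later",
    503: "Problem in UbiOps: try again later",
    504: "Deployment or pipeline request timed out",
}

def diagnose(status_code):
    """Return a human-readable hint for a response status code."""
    return STATUS_HINTS.get(status_code, f"Unexpected status code {status_code}")

def should_retry(status_code):
    """Retrying only makes sense for rate limits and transient server errors."""
    return status_code in (429, 502, 503)
```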