
Compute & scaling settings

UbiOps offers fine-grained settings for configuring a deployment to match your scaling and availability needs. You can control how many instances of your deployment run at any time and which resources each instance is allocated. Training experiments use preset scaling settings, described under Default scaling settings for training experiments.


Deployment versions

Note that the settings described on this page must be configured separately for every deployment version. The settings for a training experiment are preset.

Maximum and minimum number of instances

UbiOps lets you specify the minimum and maximum number of deployment instances that run concurrently. You can set these values when creating or updating a deployment version, in the corresponding fields under Advanced parameters.

These settings let you bound the resources used by the UbiOps auto-scaler. UbiOps analyzes the deployment workload and automatically increases or decreases the number of available instances for the deployment.

If you set the Minimum number of instances to 0, your deployment is scaled down to 0 instances once no requests have been sent to it and the maximum idle time is reached (see the Maximum idle time section for more information). After the deployment has scaled to 0, the first request incurs a cold start: the time needed for a deployment instance to start up and become ready to handle requests.

If the Minimum number of instances is greater than 0, UbiOps always keeps at least that many instances of your deployment active, so there is no cold start for your first request. However, scaling up beyond the minimum number of instances can still result in a cold start.

If you want to disable the auto-scaling feature, you can set the same value for the maximum and minimum number of instances. For example, if you set both the Minimum number of instances and the Maximum number of instances to 3, you will always have 3 active deployment instances and auto-scaling will not be applied. This will also overrule the Maximum idle time, i.e., after the maximum idle time has elapsed with no deployment requests you will still have 3 active instances.

For batch requests, UbiOps will perform concurrent deployment requests based on the maximum number of instances.
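The rules above can be summarized in a few lines of Python. This is only an illustrative sketch: the `ScalingSettings` class and its property names are hypothetical helpers made up for this example, not part of any UbiOps library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSettings:
    """Hypothetical container for the scaling settings described above."""

    minimum_instances: int
    maximum_instances: int
    maximum_idle_time: int  # seconds

    @property
    def autoscaling_enabled(self) -> bool:
        # Equal minimum and maximum pins the instance count, so the
        # auto-scaler has nothing left to decide.
        return self.minimum_instances != self.maximum_instances

    @property
    def first_request_may_cold_start(self) -> bool:
        # Only a minimum of 0 allows the deployment to scale down completely.
        return self.minimum_instances == 0


pinned = ScalingSettings(minimum_instances=3, maximum_instances=3, maximum_idle_time=300)
print(pinned.autoscaling_enabled)           # False: always 3 active instances
print(pinned.first_request_may_cold_start)  # False
```

With `minimum_instances=0` and `maximum_instances=5`, both properties would be `True`: the auto-scaler is active and the first request after an idle period may hit a cold start.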

Minimum number of instances

Setting the Minimum number of instances to a value greater than 0 forces UbiOps to always keep at least that many instances of your deployment active. This might result in unexpectedly high compute credit usage.

Maximum idle time

The Maximum idle time indicates the time (in seconds) after which the number of active deployment instances is scaled down to the value specified in Minimum number of instances, if no requests are performed. If the Minimum number of instances is set to 0, the deployment instance will shut down after the Maximum idle time has expired, resulting in a cold start for the next request. The Maximum idle time can be overruled by setting the Minimum number of instances to a value higher than 0. For example, if you set the Minimum number of instances to 1, UbiOps will keep 1 active instance of your deployment, even after the maximum idle time is reached.

Scaling down time

After the maximum idle time is reached, it can take a further 0 to 60 seconds before the instance is actually scaled down.
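As a rough illustration of the timing, the sketch below computes the window in which an idle instance can be expected to scale down. The function name and the constant are hypothetical; only the 60-second upper bound comes from this page.

```python
from typing import Tuple

# Upper bound on the extra delay after the idle time expires (from this page).
SCALE_DOWN_DELAY_MAX = 60  # seconds


def scale_down_window(last_request_at: float, maximum_idle_time: int) -> Tuple[float, float]:
    """Return the (earliest, latest) moment, in seconds, at which idle
    instances above the minimum are expected to be scaled down."""
    earliest = last_request_at + maximum_idle_time
    latest = earliest + SCALE_DOWN_DELAY_MAX
    return earliest, latest


# Last request at t=0 with a maximum idle time of 300 seconds.
print(scale_down_window(0, 300))  # (300, 360)
```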

Default scaling settings for training experiments

The default scaling settings for training experiments are set as follows:

| Min. instances | Max. instances | Max. idle time (seconds) |
|---|---|---|
| 0 | 5 | 300 |

These can, however, be updated with the deployment_version_update endpoint by setting the parameter deployment_name to training-base-deployment and version to the name of your experiment.
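A minimal sketch of the arguments such an update might take is shown below. It only builds a plain dictionary and does not call UbiOps. The field names `minimum_instances`, `maximum_instances` and `maximum_idle_time`, as well as the helper function itself, are assumptions made for illustration; check the API reference for the exact request format.

```python
# Deployment name for training experiments, as stated on this page.
TRAINING_DEPLOYMENT_NAME = "training-base-deployment"


def training_scaling_update_args(
    experiment_name: str,
    minimum_instances: int = 0,
    maximum_instances: int = 5,
    maximum_idle_time: int = 300,
) -> dict:
    """Hypothetical helper: collect the arguments for updating the scaling
    settings of a training experiment. Defaults mirror the table above."""
    return {
        "deployment_name": TRAINING_DEPLOYMENT_NAME,
        "version": experiment_name,  # the experiment name acts as the version
        "data": {
            # Assumed field names, not confirmed by this page.
            "minimum_instances": minimum_instances,
            "maximum_instances": maximum_instances,
            "maximum_idle_time": maximum_idle_time,
        },
    }


args = training_scaling_update_args("my-experiment", maximum_instances=2)
print(args["deployment_name"], args["version"], args["data"]["maximum_instances"])
```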

Instance Type

Instance types determine the memory, vCPU, and storage allocation for your deployment. The memory allocation is a hard limit, so the memory usage of the deployment cannot exceed the amount defined by the instance type. CPU allocation scales automatically with memory, at 1 vCPU core per 4 GiB of memory. Local storage scales with memory as well, at 4 GiB of storage per 1 GiB of memory. Deployments can therefore write to a local directory, but only up to that limit, and the local directory is not preserved across deployment restarts. UbiOps also supports instance types with GPUs; see GPU Deployments.

UbiOps uses a hybrid cloud set-up, which allows us to source a wide variety of instance types from different environments to accommodate your needs and requirements in terms of compute and data locality. Please contact us if you have specific requirements for your compute.
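The two ratios can be worked out with a short calculation. The helper below is a hypothetical sketch that applies the 1 vCPU per 4 GiB and 4 GiB of storage per 1 GiB of memory rules; some GPU instance types do not follow these ratios exactly, so treat it as a guide for CPU-only instance types.

```python
MIB_PER_GIB = 1024


def derived_resources(memory_mib: int) -> dict:
    """Derive vCPU and local storage from memory using the ratios above."""
    memory_gib = memory_mib / MIB_PER_GIB
    return {
        "vcpu": memory_gib / 4,         # 1 vCPU core per 4 GiB of memory
        "storage_mib": memory_mib * 4,  # 4 GiB of storage per 1 GiB of memory
    }


print(derived_resources(4096))  # {'vcpu': 1.0, 'storage_mib': 16384}
print(derived_resources(256))   # {'vcpu': 0.0625, 'storage_mib': 1024}
```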

The following instance types are available on UbiOps SaaS:

| Instance Type | Memory (MiB) | vCPU | GPU | Storage (MiB) | Credit Rate (credit/hr) |
|---|---|---|---|---|---|
| 256mb | 256 | 0.062 | n/a | 1024 | 0.25 |
| 512mb | 512 | 0.125 | n/a | 2048 | 0.5 |
| 1024mb | 1024 | 0.25 | n/a | 4096 | 1 |
| 2048mb | 2048 | 0.5 | n/a | 8192 | 2 |
| 4096mb | 4096 | 1 | n/a | 16384 | 4 |
| 8192mb | 8192 | 2 | n/a | 32768 | 8 |
| 12288mb | 12288 | 3 | n/a | 49152 | 12 |
| 16384mb | 16384 | 4 | n/a | 65536 | 16 |
| 16384mb_t4 ¹ | 16384 | 4 | 1 x NVIDIA Tesla T4 | 65536 | 48 |
| 16384mb_l4 ¹ | 16384 | 4 | 1 x NVIDIA Ada Lovelace L4 | 65536 | 48 |
| 96000mb_l4_2x ¹ | 16384 | 24 | 2 x NVIDIA Ada Lovelace L4 | 150000 | 130 |
| 76000mb_a100 ¹ | 76000 | 11 | 1 x NVIDIA Ampere A100 (40GB) | 304000 | 140 |
| 180000mb_a100 ¹ | 180000 | 22 | 1 x NVIDIA Ampere A100 (80GB) | 250000 | 280 |
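To get a feel for what the credit rates mean in practice, the sketch below estimates credit usage for instances that stay active for a given number of hours. It assumes credits accrue linearly per instance-hour at the listed rate, which is a simplification; actual billing granularity may differ. The dictionary and function names are hypothetical.

```python
# Credit rates (credit/hr) for the CPU-only instance types listed above.
CREDIT_RATE_PER_HOUR = {
    "256mb": 0.25,
    "512mb": 0.5,
    "1024mb": 1,
    "2048mb": 2,
    "4096mb": 4,
    "8192mb": 8,
    "12288mb": 12,
    "16384mb": 16,
}


def estimated_credits(instance_type: str, instances: int, hours: float) -> float:
    """Estimate credits, assuming linear accrual per instance-hour."""
    return CREDIT_RATE_PER_HOUR[instance_type] * instances * hours


# One always-on 4096mb instance (minimum instances = 1) for 30 days.
print(estimated_credits("4096mb", instances=1, hours=24 * 30))  # 2880
```

This is the kind of cost that a Minimum number of instances greater than 0 incurs even when no requests are made.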

You can choose an instance type when creating or updating a deployment version by picking it from the drop-down list under Scaling and resource settings in the UbiOps WebApp.

Allocate a large enough instance type

Keep in mind that the instance type should be large enough to run both your deployment package and any packages it installs.

Subscriptions with GPUs available

GPU instance types may not be available for your subscription by default. Please contact sales to have them enabled.

Deploy UbiOps inside your own cloud environment

It is possible to deploy workloads from UbiOps in your own cloud environment. In that case, you can make use of compute resources from your cloud environment. Please contact sales for more information.

Limits

The limits on the settings described above are as follows:

| Parameter | Lower Limit | Upper Limit | Description |
|---|---|---|---|
| Minimum number of instances | 0 | 20 | Minimum number of allowed active instances |
| Maximum number of instances | 1 | 20 | Maximum number of allowed active instances |
| Maximum idle time | 10 | 3600 | Time (in seconds) after which the number of active deployment instances will scale down to the value specified in the Minimum number of instances |
| Instance Type | 256 MiB | 16384 MiB | The instance type defines which compute resources are allocated for your deployment |

  1. UbiOps has support for GPU instance types, but this feature is not enabled for customers by default. Please contact sales for more information.
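The numeric limits above can be checked before submitting settings, for example with a small validator like the sketch below. The function and dictionary are hypothetical helpers, and the check that the minimum does not exceed the maximum is an assumption that this page does not state explicitly.

```python
from typing import List

# (lower, upper) limits taken from the table above.
LIMITS = {
    "minimum_instances": (0, 20),
    "maximum_instances": (1, 20),
    "maximum_idle_time": (10, 3600),  # seconds
}


def validate_scaling(
    minimum_instances: int, maximum_instances: int, maximum_idle_time: int
) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    values = {
        "minimum_instances": minimum_instances,
        "maximum_instances": maximum_instances,
        "maximum_idle_time": maximum_idle_time,
    }
    errors = []
    for name, (low, high) in LIMITS.items():
        if not low <= values[name] <= high:
            errors.append(f"{name}={values[name]} is outside [{low}, {high}]")
    # Assumption: the minimum should not exceed the maximum.
    if minimum_instances > maximum_instances:
        errors.append("minimum_instances exceeds maximum_instances")
    return errors


print(validate_scaling(0, 5, 300))  # []
print(validate_scaling(0, 0, 5))    # two problems: maximum too low, idle time too short
```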