
Scaling and resource allocation

UbiOps automatically scales the number of deployed instances of your model based on the model workload. When creating or updating a model version, you can specify the minimum and maximum number of active model instances that UbiOps auto-scaling may use (see the Maximum and minimum number of instances section). An active model instance does not need a cold start, so the first request sent to it is served without that start-up delay.

You can manage the scaling and resource allocation by setting the following parameters:

Parameter | Description
Minimum number of instances | Minimum number of active instances allowed
Maximum number of instances | Maximum number of active instances allowed
Maximum idle time | Time (in seconds) without requests after which the number of active model instances scales down to the value of the Minimum number of instances
Memory allocation | Hard limit on the memory the model can use
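To sanity-check a combination of these parameters before entering them, they can be modelled as a small Python dataclass. This is a minimal sketch under stated assumptions: the class name ScalingSettings, its field names, and the validation rules (non-negative minimum, maximum not below minimum, non-negative idle time, positive memory) are hypothetical and are not the UbiOps client library or its actual validation. No memory unit is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSettings:
    """Hypothetical container for the scaling parameters listed above.

    Field names and validation rules are illustrative assumptions,
    not the UbiOps API.
    """

    minimum_instances: int
    maximum_instances: int
    maximum_idle_time: int  # seconds
    memory_allocation: int  # hard memory limit; no unit is assumed here

    def __post_init__(self) -> None:
        if self.minimum_instances < 0:
            raise ValueError("minimum_instances cannot be negative")
        if self.maximum_instances < self.minimum_instances:
            raise ValueError("maximum_instances must be >= minimum_instances")
        if self.maximum_idle_time < 0:
            raise ValueError("maximum_idle_time cannot be negative")
        if self.memory_allocation <= 0:
            raise ValueError("memory_allocation must be positive")

    @property
    def autoscaling_enabled(self) -> bool:
        # Equal minimum and maximum pin the instance count, which disables auto-scaling.
        return self.minimum_instances != self.maximum_instances

    @property
    def scales_to_zero(self) -> bool:
        # With a minimum of 0, idle instances shut down and the next request needs a cold start.
        return self.minimum_instances == 0


settings = ScalingSettings(minimum_instances=0, maximum_instances=5,
                           maximum_idle_time=300, memory_allocation=2048)
print(settings.autoscaling_enabled, settings.scales_to_zero)  # True True
```

The values 300 and 2048 are placeholders chosen for illustration, not recommended defaults.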

Maximum and minimum number of instances

UbiOps allows you to specify the minimum and maximum number of model instances that can be active concurrently. You can set these values when creating or updating a model version by entering them in the corresponding fields under the Advanced parameters dropdown.

These settings bound the resources the UbiOps auto-scaler can use. Within these bounds, UbiOps analyzes the model workload and automatically increases or decreases the number of available instances for the served model.
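The exact UbiOps scaling algorithm is not described on this page, so the following is only a simplified model of how a minimum and maximum bound an auto-scaler's decision. It assumes, hypothetically, that each instance serves a fixed number of in-flight requests (requests_per_instance), and it ignores the Maximum idle time delay; the real auto-scaler may use different signals.

```python
import math


def desired_instances(
    in_flight_requests: int,
    requests_per_instance: int,
    minimum_instances: int,
    maximum_instances: int,
) -> int:
    """Hypothetical scaling rule: enough instances for the current load,
    clamped to the configured [minimum, maximum] range."""
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    # Instances needed if each one serves `requests_per_instance` requests at a time.
    needed = math.ceil(in_flight_requests / requests_per_instance)
    # The configured bounds always take precedence over the workload estimate.
    return max(minimum_instances, min(maximum_instances, needed))


print(desired_instances(0, 1, 0, 5))   # 0: no load, the minimum applies
print(desired_instances(12, 1, 0, 5))  # 5: capped at the maximum
print(desired_instances(12, 1, 3, 3))  # 3: minimum == maximum pins the count
```

The last call shows why equal bounds turn auto-scaling off: whatever the load, the clamp can only return one value.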

If you set the Minimum number of instances to 0, your model is scaled down to 0 instances when no requests are sent to it and the Maximum idle time is reached (see the Maximum idle time section for more info). When the model is scaled to 0, the next request requires a cold start. If the Minimum number of instances is greater than 0, UbiOps always keeps that number of instances of your model active.

If you want to disable auto-scaling, set the minimum and maximum number of instances to the same value. For example, if you set both the Minimum number of instances and the Maximum number of instances to 3, you will always have 3 active model instances and no auto-scaling is applied. This also overrides the Maximum idle time: even after the maximum idle time has elapsed without any model requests, 3 instances remain active.

For batch requests, UbiOps will perform concurrent model requests based on the number of active instances.
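As a rough local analogy for this behaviour, the sketch below processes a batch with a thread pool whose size equals the number of active instances, so at most that many items are in flight at once. It is not how UbiOps is implemented; process_batch and fake_model_request are hypothetical helpers, and the sketch assumes at least one active instance.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def process_batch(items, active_instances, handle_request):
    """Run batch items concurrently, with at most `active_instances` in flight.

    `handle_request` is a hypothetical stand-in for one model request.
    """
    if active_instances < 1:
        raise ValueError("this sketch assumes at least one active instance")
    with ThreadPoolExecutor(max_workers=active_instances) as pool:
        # map() returns results in the same order as the input items.
        return list(pool.map(handle_request, items))


class ConcurrencyTracker:
    """Records how many simulated requests run at the same time (demo only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def fake_model_request(self, x):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        time.sleep(0.01)  # simulate model work
        with self._lock:
            self.running -= 1
        return x * 2


tracker = ConcurrencyTracker()
results = process_batch(range(10), active_instances=3,
                        handle_request=tracker.fake_model_request)
print(results)       # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(tracker.peak)  # never more than 3
```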

Minimum number of instances

Setting the Minimum number of instances to a value greater than 0 forces UbiOps to always keep at least one instance of your model active, even when no requests are made. This can result in unexpectedly high compute credit usage.
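To get a feel for what an always-on minimum means, the sketch below counts the instance-hours consumed only by the instances the minimum keeps alive over a period. It deliberately stops at instance-hours: how these translate into compute credits depends on details this page does not specify, so no credit rate is assumed. The function name is hypothetical.

```python
HOURS_PER_DAY = 24


def baseline_instance_hours(minimum_instances: int, days: float) -> float:
    """Instance-hours used by instances kept alive only because of the minimum,
    ignoring any extra instances started to handle actual requests."""
    return minimum_instances * HOURS_PER_DAY * days


print(baseline_instance_hours(0, 30))  # 0: nothing is kept alive by the minimum
print(baseline_instance_hours(1, 30))  # 720: one instance running all month
print(baseline_instance_hours(3, 30))  # 2160
```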

Maximum idle time

The Maximum idle time is the time (in seconds) without any requests after which the number of active model instances is scaled down to the value of the Minimum number of instances. If the Minimum number of instances is set to 0, the model instances shut down once the Maximum idle time has expired, so the next request needs a cold start. Setting the Minimum number of instances to a value higher than 0 overrides the Maximum idle time. For example, if you set the Minimum number of instances to 1, UbiOps keeps 1 instance of your model active, even after the Maximum idle time is reached.
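The idle rule can be expressed as a small function. This is a simplified sketch, not UbiOps code: the function names are hypothetical, and treating "idle time equal to the Maximum idle time" as already expired is an assumption about boundary handling.

```python
def instances_after_idle(
    active_instances: int,
    minimum_instances: int,
    idle_seconds: float,
    maximum_idle_time: float,
) -> int:
    """Simplified idle rule: once no request has arrived for
    `maximum_idle_time` seconds, scale down to the configured minimum."""
    if idle_seconds >= maximum_idle_time:
        return minimum_instances
    return active_instances


def needs_cold_start(active_instances: int) -> bool:
    # A request that finds no active instance has to wait for one to start.
    return active_instances == 0


# Minimum 0: after 300 s without requests, everything shuts down.
n = instances_after_idle(2, 0, idle_seconds=300, maximum_idle_time=300)
print(n, needs_cold_start(n))  # 0 True

# Minimum 1: one instance survives the idle timeout.
n = instances_after_idle(2, 1, idle_seconds=300, maximum_idle_time=300)
print(n, needs_cold_start(n))  # 1 False

# Minimum == maximum == 3: the idle timeout changes nothing.
n = instances_after_idle(3, 3, idle_seconds=300, maximum_idle_time=300)
print(n, needs_cold_start(n))  # 3 False
```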

Memory allocation

This setting defines the memory limit for your model. It is a hard limit: the model's memory usage cannot exceed it. Make sure the value is large enough to accommodate both your model package and any installed packages. You can set it when creating or updating a model version by entering the desired memory value under the Advanced parameters dropdown in the UbiOps WebApp.
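One way to pick a starting value is to load your model locally and measure the peak memory of the process. The sketch below uses Python's standard resource module (Unix only); load_model is a hypothetical placeholder for your own loading code. The measurement reflects only your local environment, so treat it as a lower bound and leave some headroom rather than as the exact value to enter.

```python
import resource
import sys


def peak_memory_mib() -> float:
    """Peak resident memory of the current process so far, in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def load_model():
    # Hypothetical placeholder: replace with your own imports and model loading.
    return [0.0] * 10_000_000  # roughly 80 MB of list slots, to make the effect visible


model = load_model()
print(f"Peak memory so far: {peak_memory_mib():.0f} MiB")
```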