
Scaling and resource allocation

UbiOps automatically scales the number of instances of your deployment based on its workload. When creating or updating a deployment version, you can specify the minimum and maximum number of active instances that the UbiOps auto-scaler may use (see the Maximum and minimum number of instances section). An active deployment instance does not need a cold start when you make a request.

You can manage the scaling and resource allocation by setting the following parameters:

Parameter | Description
Minimum number of instances | Minimum number of active instances allowed
Maximum number of instances | Maximum number of active instances allowed
Maximum idle time | Time (in seconds) without requests after which the number of active instances scales down to the Minimum number of instances
Memory allocation | Hard limit on the memory (RAM) the deployment can use
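If you manage deployment settings from your own scripts, it can help to sanity-check these four parameters locally before submitting them. The following is a minimal sketch: the class name `ScalingSettings`, its field names, and the memory unit (MiB) are hypothetical and are not the UbiOps client API. The checks themselves are assumptions about sensible values, not documented UbiOps constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSettings:
    """Hypothetical local mirror of the scaling parameters (not the UbiOps client API)."""

    minimum_instances: int
    maximum_instances: int
    maximum_idle_time: int  # seconds
    memory_allocation: int  # MiB (unit assumed for this sketch)

    def __post_init__(self) -> None:
        # Assumed sanity rules, not documented UbiOps constraints.
        if self.minimum_instances < 0:
            raise ValueError("minimum_instances must be >= 0")
        if self.maximum_instances < 1:
            raise ValueError("maximum_instances must be >= 1")
        if self.minimum_instances > self.maximum_instances:
            raise ValueError("minimum_instances cannot exceed maximum_instances")
        if self.maximum_idle_time < 0:
            raise ValueError("maximum_idle_time must be >= 0")
        if self.memory_allocation <= 0:
            raise ValueError("memory_allocation must be positive")


# Scale to zero when idle for 5 minutes, up to 5 instances, 2 GiB each.
settings = ScalingSettings(minimum_instances=0, maximum_instances=5,
                           maximum_idle_time=300, memory_allocation=2048)
```

Constructing `ScalingSettings(4, 2, 300, 2048)` raises a `ValueError`, because the minimum exceeds the maximum.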

Maximum and minimum number of instances

UbiOps allows you to specify the maximum and minimum number of deployment instances that can be active concurrently. You can set these values when creating or updating a deployment version, in the corresponding fields under the Advanced parameters dropdown.

These settings let you bound the resources used by the UbiOps auto-scaler. UbiOps analyzes the deployment workload and automatically increases or decreases the number of active instances within these bounds.

If you set the Minimum number of instances to 0, your deployment is scaled down to 0 instances once no requests have been sent for the Maximum idle time (see the Maximum idle time section for more info). When the deployment is scaled to 0, the next request requires a cold start. If the Minimum number of instances is greater than 0, UbiOps always keeps at least that many instances active.

To disable auto-scaling, set the Minimum number of instances and the Maximum number of instances to the same value. For example, if you set both to 3, you will always have 3 active deployment instances and no auto-scaling is applied. This also overrides the Maximum idle time: even after the Maximum idle time has elapsed without any requests, 3 instances remain active.
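To see why equal bounds pin the instance count, here is a toy scaling rule that applies the behavior described above. It is an illustration only, not the actual UbiOps auto-scaler, whose internal logic this page does not describe. The function `target_instances` is hypothetical, and it assumes each instance handles one request at a time.

```python
def target_instances(
    in_flight: int,
    current: int,
    idle_seconds: float,
    minimum_instances: int,
    maximum_instances: int,
    maximum_idle_time: float,
) -> int:
    """Toy scaling rule: NOT the actual UbiOps auto-scaler."""
    if in_flight > 0:
        # Assumption: one request per instance, so scale up toward demand.
        wanted = max(current, in_flight)
    elif idle_seconds >= maximum_idle_time:
        # No requests for the Maximum idle time: scale down to the minimum.
        wanted = minimum_instances
    else:
        # Idle, but not for long enough: keep the current instances warm.
        wanted = current
    # Never go outside the configured bounds.
    return max(minimum_instances, min(wanted, maximum_instances))


# min == max: the count is pinned regardless of load or idle time.
print(target_instances(10, 3, 0, 3, 3, 300))    # 3
print(target_instances(0, 3, 3600, 3, 3, 300))  # 3
# min < max: auto-scaling moves between the bounds.
print(target_instances(10, 1, 0, 0, 5, 300))    # 5
print(target_instances(0, 2, 3600, 0, 5, 300))  # 0
```

Because the final clamp uses the same value for both bounds when the minimum equals the maximum, no input can move the result, which matches the "auto-scaling disabled" behavior.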

For batch requests, UbiOps processes requests concurrently; how many are processed in parallel depends on the number of active instances.
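As a local analogy for how the number of active instances limits batch concurrency, the sketch below uses a thread pool in which each worker stands in for one active instance. It assumes each instance handles one request at a time and that all requests take equally long; neither assumption comes from UbiOps, and this is not how UbiOps dispatches requests internally. The names `process_batch` and `waves_needed` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_batch(items: Iterable[T], active_instances: int,
                  handle: Callable[[T], R]) -> List[R]:
    """Run `handle` on each item with at most `active_instances` in parallel."""
    # Each worker thread stands in for one active deployment instance.
    with ThreadPoolExecutor(max_workers=active_instances) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle, items))


def waves_needed(batch_size: int, active_instances: int) -> int:
    """Sequential rounds needed if every instance handles one request at a time."""
    return math.ceil(batch_size / active_instances)


print(process_batch([1, 2, 3, 4], 2, lambda x: x * 10))  # [10, 20, 30, 40]
print(waves_needed(10, 3))  # 4
```

Under these assumptions, a batch of 10 requests on 3 active instances needs 4 rounds, so raising the Maximum number of instances can shorten batch processing time.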

Minimum number of instances

Setting the Minimum number of instances to a value greater than 0 forces UbiOps to always keep at least that many instances of your deployment active, even when no requests are made. This can result in unexpectedly high compute credit usage.
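A quick back-of-the-envelope calculation shows how always-on instances add up. This sketch assumes credits accrue per active instance per hour; `credits_per_instance_hour` is a hypothetical placeholder, not a UbiOps price, so substitute the rate that applies to your account.

```python
HOURS_PER_MONTH = 24 * 365 / 12  # average month, 730 hours


def baseline_credits(minimum_instances: int, credits_per_instance_hour: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Credits used by the always-on instances alone, before any extra scale-up.

    `credits_per_instance_hour` is a hypothetical placeholder rate,
    not a UbiOps price.
    """
    return minimum_instances * credits_per_instance_hour * hours


# Keeping 2 instances warm at a hypothetical 0.5 credits/hour for 30 days:
print(baseline_credits(2, 0.5, hours=30 * 24))  # 720.0
# A minimum of 0 has no idle baseline:
print(baseline_credits(0, 0.5, hours=30 * 24))  # 0.0
```

This baseline is a floor: any instances started above the minimum during busy periods come on top of it.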

Maximum idle time

The Maximum idle time is the time (in seconds) without any requests after which the number of active deployment instances is scaled down to the Minimum number of instances. If the Minimum number of instances is set to 0, all instances shut down once the Maximum idle time has expired, so the next request requires a cold start. Setting the Minimum number of instances to a value higher than 0 overrides this behavior. For example, if you set the Minimum number of instances to 1, UbiOps keeps 1 instance of your deployment active, even after the Maximum idle time is reached.
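The rules in this section can be condensed into a small predicate that answers "will my next request hit a cold start?". This is a simplified sketch: the function name is hypothetical, it only considers whether any instance is still active, and it ignores extra instances started during traffic spikes. Treating "no request yet" as a cold start is an assumption.

```python
from typing import Optional


def next_request_needs_cold_start(
    minimum_instances: int,
    seconds_since_last_request: Optional[float],
    maximum_idle_time: float,
) -> bool:
    """Apply the idle-time rules from this section (simplified sketch)."""
    if minimum_instances > 0:
        # At least one instance is always kept active, so no cold start.
        return False
    if seconds_since_last_request is None:
        # Assumption: no request yet, so no instance has been started.
        return True
    # With a minimum of 0, instances shut down once the idle time expires.
    return seconds_since_last_request >= maximum_idle_time


print(next_request_needs_cold_start(0, 600, 300))     # True
print(next_request_needs_cold_start(0, 60, 300))      # False
print(next_request_needs_cold_start(1, 86_400, 300))  # False
```

The last call mirrors the example above: with a Minimum number of instances of 1, even a full day of inactivity does not cause a cold start.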

Memory allocation

This setting defines the memory (RAM) limit that your deployment can use. This is a hard limit, so the memory usage of the deployment cannot exceed this limit.

Allocate sufficient memory

Keep in mind that this value should be large enough to encompass both your deployment package and any installed packages.
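One way to pick a starting value is to measure the peak memory of your deployment's loading code on your own machine and add headroom. The sketch below uses Python's standard `resource` module, which is available on Unix-like systems only. The function names and the 1.5x headroom factor are hypothetical choices, and local measurements only approximate usage inside the deployment environment.

```python
import resource
import sys
from typing import Callable


def peak_rss_mib() -> float:
    """Peak resident memory of this whole process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def suggest_memory_mib(load: Callable[[], object], headroom: float = 1.5) -> float:
    """Run `load` (e.g. your model loading code) and pad the observed peak."""
    load()
    # ru_maxrss is a high-water mark, so it still reflects the load's peak.
    return peak_rss_mib() * headroom


# Stand-in for loading a model: allocate and fill ~100 MiB.
print(f"{suggest_memory_mib(lambda: b'x' * (100 * 1024 * 1024)):.0f} MiB")
```

Because the measurement covers the entire process, including the Python interpreter and all imported packages, it accounts for both your deployment package and its installed dependencies.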

You can set this value when creating or updating a deployment version by entering the desired amount of memory under the Advanced parameters dropdown in the UbiOps WebApp.