Scaling and resource allocation¶
UbiOps automatically scales the number of active instances of your deployment based on its workload. When creating or updating a deployment version, you can specify the minimum and maximum number of active deployment instances that UbiOps auto-scaling may use (see the Maximum and minimum number of instances section). An active deployment instance handles requests without any cold start time.
You can manage the scaling and resource allocation by setting the following parameters:
Parameter | Description |
---|---|
Minimum number of instances | Minimum number of allowed active instances |
Maximum number of instances | Maximum number of allowed active instances |
Maximum idle time | Time (in seconds) without requests after which the number of active deployment instances is scaled down to the Minimum number of instances |
Memory allocation | Hard limit on the memory (RAM) the deployment can use |
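To make the relationship between these four parameters concrete, here is a minimal sketch of a settings container with basic consistency checks. The class name, field names, the MiB unit, and the validation rules are assumptions made for illustration; they are not the UbiOps API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSettings:
    """Illustrative container only; names and units are hypothetical."""

    minimum_instances: int
    maximum_instances: int
    maximum_idle_time: int      # seconds without requests before scaling down
    memory_allocation_mib: int  # hard memory limit (unit assumed for this sketch)

    def __post_init__(self) -> None:
        # Assumed sanity checks: the bounds must describe a valid range.
        if self.minimum_instances < 0:
            raise ValueError("minimum_instances must be >= 0")
        if self.maximum_instances < self.minimum_instances:
            raise ValueError("maximum_instances must be >= minimum_instances")
        if self.maximum_idle_time < 0:
            raise ValueError("maximum_idle_time must be >= 0")
        if self.memory_allocation_mib <= 0:
            raise ValueError("memory_allocation_mib must be > 0")


# Example: scale between 0 and 5 instances, scale down after 300 idle seconds.
settings = ScalingSettings(
    minimum_instances=0,
    maximum_instances=5,
    maximum_idle_time=300,
    memory_allocation_mib=2048,
)
```

Validating the values in one place catches contradictory settings, such as a minimum above the maximum, before they are submitted anywhere.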
Maximum and minimum number of instances¶
UbiOps allows you to specify the maximum and the minimum number of deployment instances that can be active concurrently. You can set these values when creating or updating a deployment version by entering them in the corresponding fields in the Advanced parameters dropdown.
This setting lets you control the resources used by the UbiOps auto-scaler. UbiOps analyzes the deployment workload and automatically increases or decreases the number of available instances for the served deployment.
If you set the Minimum number of instances to 0, your deployment instances will be scaled down to 0 when no requests are sent to the deployment and the maximum idle time is reached (see the Maximum idle time section for more info). Once the deployment is scaled down to 0, the first subsequent request will incur a cold start. If the Minimum number of instances is greater than 0, UbiOps will always keep at least that many instances of your deployment active.
If you want to disable the auto-scaling feature, set the maximum and the minimum number of instances to the same value. For example, if you set both the Minimum number of instances and the Maximum number of instances to 3, you will always have 3 active deployment instances and auto-scaling will not be applied. This also overrules the Maximum idle time, i.e., after the maximum idle time has elapsed with no deployment requests you will still have 3 active instances.
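The effect of the two bounds can be sketched as a simple clamp. This is a simplified model written for illustration, not the actual UbiOps auto-scaler: the function names are hypothetical, and the "desired" count stands in for whatever the auto-scaler derives from the workload.

```python
def clamp_instances(desired: int, minimum: int, maximum: int) -> int:
    """Keep a desired instance count within the configured bounds."""
    if minimum < 0 or maximum < minimum:
        raise ValueError("expected 0 <= minimum <= maximum")
    return max(minimum, min(desired, maximum))


def autoscaling_disabled(minimum: int, maximum: int) -> bool:
    """Equal bounds leave the auto-scaler no room to change the count."""
    return minimum == maximum


# With both bounds set to 3, any desired count maps to exactly 3 instances.
fixed = [clamp_instances(wanted, minimum=3, maximum=3) for wanted in (0, 3, 10)]
# fixed == [3, 3, 3]
```

With different bounds, for example a minimum of 0 and a maximum of 5, the same clamp yields anything from 0 to 5 depending on the desired count.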
For batch requests, the number of deployment requests that UbiOps processes concurrently depends on the number of active instances.
Minimum number of instances
Setting the Minimum number of instances to a value greater than 0 forces UbiOps to always keep at least that many instances of your deployment active. This might result in unexpectedly high compute credit usage.
Maximum idle time¶
The Maximum idle time indicates the time (in seconds) after which the number of active deployment instances is scaled down to the value specified in the Minimum number of instances, if no requests are performed. If the Minimum number of instances is set to 0, the deployment instances will shut down after the Maximum idle time has expired, resulting in a cold start for the next request. Setting the Minimum number of instances to a value higher than 0 overrules this shutdown. For example, if you set the Minimum number of instances to 1, UbiOps will keep 1 active instance of your deployment, even after the maximum idle time is reached.
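The interaction between the idle time and the minimum number of instances can be modelled in a few lines. This is a rough sketch under stated assumptions, not UbiOps code: the function names are hypothetical, and the idle time is treated as "reached" once the idle period is equal to or longer than the configured value.

```python
def instances_after_idle(
    active: int,
    minimum_instances: int,
    maximum_idle_time: int,
    idle_seconds: float,
) -> int:
    """Number of active instances after a period without requests."""
    if active < minimum_instances:
        raise ValueError("active cannot be below minimum_instances")
    if idle_seconds >= maximum_idle_time:
        # Idle time reached: scale down to the configured minimum.
        return minimum_instances
    return active


def next_request_is_cold_start(
    active: int,
    minimum_instances: int,
    maximum_idle_time: int,
    idle_seconds: float,
) -> bool:
    """A cold start happens only when no instance is left running."""
    remaining = instances_after_idle(
        active, minimum_instances, maximum_idle_time, idle_seconds
    )
    return remaining == 0


# Minimum of 0: after 600 idle seconds (limit 300) the next request is cold.
cold = next_request_is_cold_start(2, 0, 300, 600)   # True
# Minimum of 1: one instance stays up, so the next request is served warm.
warm = next_request_is_cold_start(2, 1, 300, 600)   # False
```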
Memory allocation¶
This setting defines the maximum amount of memory (RAM) that your deployment can use. This is a hard limit: the memory usage of the deployment cannot exceed it.
Allocate sufficient memory
Keep in mind that this value should be large enough to encompass both your deployment package and any installed packages.
You can choose a value when creating or updating a deployment version by selecting the desired memory value in the Advanced parameters dropdown of the UbiOps WebApp.
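As a rough aid for picking a memory value, the sizing advice above can be turned into a small check. This is a sketch only: the function name, the MiB unit, and the 20% headroom are assumptions chosen for illustration, and the sizes have to come from your own measurements.

```python
def fits_memory_limit(
    package_mib: float,
    dependencies_mib: float,
    limit_mib: float,
    headroom: float = 0.2,
) -> bool:
    """Check that the package plus installed packages fit under the limit.

    `headroom` is an assumed safety margin (a fraction of the estimated
    footprint) to leave room for data handled while serving requests.
    """
    if min(package_mib, dependencies_mib, headroom) < 0 or limit_mib <= 0:
        raise ValueError("sizes and headroom must be non-negative, limit > 0")
    required_mib = (package_mib + dependencies_mib) * (1 + headroom)
    return required_mib <= limit_mib


# A 500 MiB package with 1200 MiB of dependencies fits a 2048 MiB limit.
ok = fits_memory_limit(500, 1200, 2048)
# With 1400 MiB of dependencies the same limit is too small.
too_small = not fits_memory_limit(500, 1400, 2048)
```

Because the limit is hard, erring on the side of a larger value is the safer choice when the estimate is close to the limit.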