Compute & scaling settings¶
UbiOps offers fine-grained settings to configure your deployment according to your scaling and availability needs, including in hybrid cloud setups. Several scaling mechanisms are available to manage which and how many instances of your deployment or training run are active at any time, based on the amount of resources you configure.
Deployment versions
Note that the settings described on this page must be configured for every deployment version; the settings for a training experiment are fixed.
Maximum and minimum number of instances¶
UbiOps allows you to specify the maximum and minimum number of deployment instances running concurrently. You can choose a value when creating or updating a deployment version by entering the desired values under Advanced parameters.
This setting enables you to manage the resources used by the UbiOps auto-scaler. UbiOps will analyze the deployment workload and automatically decide to increase/decrease the number of available instances for the served deployment.
If you set the Minimum number of instances to 0, your deployment will be scaled down to 0 instances when no requests are sent to it and the maximum idle time has elapsed (see the Maximum idle time section for more information). When the deployment is scaled to 0, the first request will incur a cold start: the time it takes for a deployment instance to start up and become ready to handle requests.
If the Minimum number of instances is greater than 0, UbiOps will always keep an active instance of your deployment, meaning that there won't be a cold start for your first request. However, scaling up beyond the minimum number of instances that you specified might still result in a cold start.
If you want to disable the auto-scaling feature, you can set the same value for the maximum and minimum number of instances. For example, if you set both the Minimum number of instances and the Maximum number of instances to 3, you will always have 3 active deployment instances and auto-scaling will not be applied. This also overrules the Maximum idle time, i.e., after the maximum idle time has elapsed with no deployment requests you will still have 3 active instances.
For batch requests, UbiOps will perform concurrent deployment requests based on the maximum number of instances.
Minimum number of instances
Setting the Minimum number of instances to a value greater than 0 forces UbiOps to always keep an active instance of your deployment. This might result in unexpectedly high compute credit usage.
Scaling algorithms¶
For each deployment version you can select which scaling algorithm is used. UbiOps scales instances of your deployments up and down based on incoming request traffic. There are two scaling algorithms available for you to choose from: default or moderate.
The default scaling algorithm is the best pick for most scenarios. It scales up very "aggressively". When new requests enter the queue and the maximum number of instances limit is not reached yet, it will scale up a new instance immediately. This works well for the following cases:
- The cold start time is relatively short
- You expect a lot of sudden bursts in traffic where you want your deployment to scale out quickly to work through the sudden queue as quickly as possible
The moderate algorithm is slightly more moderate in scaling up, as the name suggests. It looks at the historic cold start time of your deployment and at the current queue size to determine whether it's worthwhile to scale up a new instance. When you assign this algorithm to a newly created deployment, there won't be any metrics yet on the deployment's average cold start time. Therefore the algorithm will scale similarly to the default algorithm in the beginning. Over time, when there is more data on the average cold start time of your deployment, the moderate scaling algorithm will adapt its strategy to the usage of your model. This algorithm works well in the following cases:
- The cold start time is a lot longer than the actual time it takes to handle a request
- The traffic to your model is very stable and you do not need quick burst scaling.
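The contrast between the two policies can be illustrated with a toy model. This is not the actual UbiOps scaling implementation, just a sketch of the decision logic described above: the default policy scales up whenever work is queued, while the moderate policy weighs the expected queue wait against the observed cold start time (and falls back to default behavior while no cold start metrics exist yet).

```python
# Toy model (not the actual UbiOps implementation) of the two scale-up policies.

def default_scale_up(queue_size, instances, max_instances):
    # Default: scale up immediately whenever requests are queued and the
    # instance limit has not been reached.
    return queue_size > 0 and instances < max_instances

def moderate_scale_up(queue_size, instances, max_instances,
                      avg_cold_start_s, avg_request_s):
    # Moderate: only scale up when the queued work would take longer to
    # drain than the cold start of a new instance.
    if not (queue_size > 0 and instances < max_instances):
        return False
    if avg_cold_start_s is None:  # no cold start metrics yet: act like default
        return True
    expected_wait = queue_size * avg_request_s / max(instances, 1)
    return expected_wait > avg_cold_start_s

# A cold start much longer than the request time discourages a new instance:
print(default_scale_up(2, 1, 5))                                       # True
print(moderate_scale_up(2, 1, 5, avg_cold_start_s=120, avg_request_s=1))  # False
```

Under this model, short requests behind a slow-starting deployment are served by the existing instance, which matches the cases the moderate algorithm is recommended for.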
Maximum idle time¶
The Maximum idle time indicates the time (in seconds) after which the number of active deployment instances is scaled down to the value specified in Minimum number of instances, if no requests are performed. If the Minimum number of instances is set to 0, the deployment instance will shut down after the Maximum idle time has expired, resulting in a cold start for the next request. The Maximum idle time can be overruled by setting the Minimum number of instances to a value higher than 0. For example, if you set the Minimum number of instances to 1, UbiOps will keep 1 active instance of your deployment, even after the maximum idle time is reached.
Scaling down time
After the maximum idle time is reached, the instance will scale down within 0 to 60 seconds.
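The idle-time behavior can be summarized as a single rule: once a version has been idle for longer than its maximum idle time, the instance count drops to the configured minimum. The helper below is an illustrative sketch of that rule, not UbiOps code.

```python
# Illustrative sketch of the idle scale-down rule described above.

def target_instances(idle_seconds, max_idle_time, current, minimum):
    """After the maximum idle time elapses with no requests, scale down to
    the configured minimum; otherwise keep the current instance count."""
    return minimum if idle_seconds >= max_idle_time else current

print(target_instances(400, 300, 3, 0))  # 0: idle past the limit, min is 0
print(target_instances(100, 300, 3, 0))  # 3: still within the idle window
print(target_instances(400, 300, 3, 1))  # 1: minimum > 0 keeps one instance warm
```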
Default scaling settings for training experiments¶
The default scaling settings for training experiments are set as follows:
Min. instances | Max. instances | Max. idle time (seconds) |
---|---|---|
0 | 5 | 300 |
These can however be updated via the WebApp or with the deployment_version_update endpoint, setting the parameter deployment_name to training-base-deployment and version to the name of your experiment.
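For example, overriding the defaults for an experiment through the deployment_version_update endpoint might look like the sketch below. Only the payload is runnable here; the project and experiment names are placeholders, and the commented client call assumes the `ubiops` Python client, so check its documentation for the exact signature.

```python
# Sketch: update payload for the scaling settings of a training experiment.
# Experiments are versions of the special training-base-deployment deployment.

payload = {
    "maximum_instances": 3,    # lower the default of 5
    "maximum_idle_time": 600,  # raise the default of 300 seconds
}

# With the ubiops Python client this would look roughly like:
#   api.deployment_version_update(
#       project_name="my-project",                  # placeholder
#       deployment_name="training-base-deployment",
#       version="my-experiment",                    # the experiment name
#       data=payload,
#   )
print(payload)
```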
Instance Type¶
Instance types determine the memory, vCPU, and storage allocation for your deployment. CPU allocation scales automatically with memory, at 1 vCPU core per 4 GiB of memory. This is a hard limit, so the memory usage of the deployment cannot exceed the limit defined by the instance type. Local storage scales with memory as well, at 4 GiB of storage per 1 GiB of memory. This means deployments can write to a local directory, but storage is not unlimited. The local directory is not preserved across deployment restarts. UbiOps also supports instance types with GPUs, see GPU Deployments.
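These fixed ratios (1 vCPU per 4 GiB of memory, 4 GiB of local storage per 1 GiB of memory) mean both values can be derived from an instance type's memory allocation, as this small sketch shows:

```python
# Derive vCPU and local storage from an instance type's memory (in MiB),
# using the ratios described above.

def resources_for(memory_mib):
    vcpu = memory_mib / 4096      # 1 vCPU core per 4 GiB of memory
    storage_mib = memory_mib * 4  # 4 GiB of storage per 1 GiB of memory
    return vcpu, storage_mib

print(resources_for(2048))   # (0.5, 8192)  -- matches the 2048mb instance type
print(resources_for(16384))  # (4.0, 65536) -- matches the 16384mb instance type
```

The results line up with the CPU instance types in the table below; the larger GPU instance types follow their own allocations.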
UbiOps uses a hybrid cloud set-up, allowing us to source a wide variety of instance types from different environments that can accommodate your needs and requirements in terms of compute and data locality. Please contact us if you have specific requirements for your compute.
The following instance types are available on UbiOps SaaS:
Instance Type | Memory (MiB) | vCPU | GPU | Storage (MiB) | Credit Rate (credit/hr) |
---|---|---|---|---|---|
256mb | 256 | 0.062 | n/a | 1024 | 0.25 |
512mb | 512 | 0.125 | n/a | 2048 | 0.5 |
1024mb | 1024 | 0.25 | n/a | 4096 | 1 |
2048mb | 2048 | 0.5 | n/a | 8192 | 2 |
4096mb | 4096 | 1 | n/a | 16384 | 4 |
8192mb | 8192 | 2 | n/a | 32768 | 8 |
12288mb | 12288 | 3 | n/a | 49152 | 12 |
16384mb | 16384 | 4 | n/a | 65536 | 16 |
16384mb_t4 1 | 16384 | 4 | 1 x NVIDIA Tesla T4 | 65536 | 48 |
16384mb_l4 1 | 16384 | 4 | 1 x NVIDIA Ada Lovelace L4 | 65536 | 48 |
96000mb_l4_2x 1 | 96000 | 24 | 2 x NVIDIA Ada Lovelace L4 | 150000 | 130 |
76000mb_a100 1 | 76000 | 11 | 1 x NVIDIA Ampere A100 (40GB) | 304000 | 140 |
180000mb_a100 1 | 180000 | 22 | 1 x NVIDIA Ampere A100 (80GB) | 250000 | 280 |
You can choose a value when creating and updating a deployment version by picking the desired instance type from the drop-down list in the Scaling and resource settings in the UbiOps WebApp.
Allocate a large enough instance type
Keep in mind that this value should be large enough to run both your deployment package and any installed packages.
Subscriptions with GPUs available
The GPU instance types may not be available for your subscription by default. Please contact our sales in order to make them available for you.
Deploy UbiOps inside your own cloud environment
It is possible to deploy workloads from UbiOps in your own cloud environment. In that case, you can make use of compute resources from your cloud environment. Please contact sales for more information.
Instance type groups¶
In UbiOps it is possible to define instance type groups. These groups contain one or more instance types, each with an associated prioritization. This is particularly helpful in hybrid and multi cloud set-ups with UbiOps. For example, let's say you have some on-premise GPUs that you prefer to use, but whenever they're occupied, you want to be able to scale out to cloud based GPUs. This behavior can be defined in an instance type group of two instance types: one on-premise instance type and one cloud instance type. Within this group you can then set the on-premise instance type as the prioritized instance type, and the cloud one with a lower priority level.
Instance type groups have a name, and one or more instance types, each associated with a priority score. A priority score of "0" indicates highest priority, higher numbers indicate lower priority. In the WebApp you can simply order the list based on priority (highest priority on top).
It's also possible to assign the same priority score to multiple instance types in a group. This indicates to UbiOps that your deployment/experiment can run on any of these instance types, and you don't have any preference. In that case UbiOps will pick one instance type at random and if it's not available, it will automatically try a different one from the same priority level. This can be useful in a multi cloud set-up where you're sourcing GPUs from multiple providers, and you simply want to deploy to whichever one is the quickest to spin up.
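The ordering behavior described above can be sketched as follows. This is an illustrative model, not UbiOps code, and the instance type names are hypothetical: types are tried in ascending priority score, with random tie-breaking among types that share a score.

```python
import random

# Illustrative pick order for an instance type group: lowest priority score
# first (0 = highest priority), random tie-breaking among equal scores.
# Instance type names below are hypothetical.

group = [
    ("onprem-l4", 0),   # preferred: on-premise GPU
    ("cloud-a-l4", 1),  # fallbacks: cloud GPUs, no preference between them
    ("cloud-b-l4", 1),
]

def pick_order(group, rng=random):
    # Shuffle first so types sharing a priority score end up in a random
    # order, then sort by score (Python's sort is stable, preserving the
    # shuffled order within each score).
    shuffled = list(group)
    rng.shuffle(shuffled)
    return [name for name, score in sorted(shuffled, key=lambda t: t[1])]

order = pick_order(group)
print(order[0])  # always the on-premise type: onprem-l4
```

The two cloud types appear after the on-premise one in either order, mirroring the "pick one at random, fall back to the other" behavior within a priority level.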
Limits¶
The limits on the previously described settings are as follows:
Parameter | Lower Limit | Upper Limit | Description |
---|---|---|---|
Minimum number of instances | 0 | 20 | Minimum number of allowed active instances |
Maximum number of instances | 1 | 20 | Maximum number of allowed active instances |
Maximum idle time | 10 | 3600 | Time (in seconds) after which the number of active deployment instances will scale down to the value specified in the Minimum number of instances |
Instance Type | 256 MiB | 16384 MiB | The instance type defines which compute resources are allocated for your deployment |
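A quick way to check a scaling configuration against these limits is a validation helper like the sketch below. The limit values are taken from the table above; the function itself is illustrative, not part of the UbiOps API.

```python
# Validate scaling settings against the limits table above.

LIMITS = {
    "minimum_instances": (0, 20),
    "maximum_instances": (1, 20),
    "maximum_idle_time": (10, 3600),  # seconds
}

def validate(settings):
    """Return the names of parameters that fall outside their allowed range."""
    errors = []
    for name, value in settings.items():
        lo, hi = LIMITS[name]
        if not lo <= value <= hi:
            errors.append(name)
    # The minimum may never exceed the maximum number of instances.
    if settings.get("minimum_instances", 0) > settings.get("maximum_instances", 20):
        errors.append("minimum_instances > maximum_instances")
    return errors

print(validate({"minimum_instances": 0, "maximum_instances": 5,
                "maximum_idle_time": 300}))   # [] -- the training defaults are valid
print(validate({"maximum_idle_time": 7200}))  # ['maximum_idle_time']
```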