
GPU acceleration

UbiOps has support for GPU deployments, but this feature is not enabled for customers by default. Please contact sales for more information.

In order to use GPUs, the following is needed:

You can also find some examples of deployments that use GPUs on UbiOps:

Cold start time

GPU deployments are usually quite large. It therefore takes more time than usual (on the order of 30 seconds or more) to scale from zero to one instance. After the first request, subsequent requests are much faster, until the instance shuts down again when the maximum idle time of the deployment version is reached. You may want to set the minimum number of instances to a value higher than zero to prevent scaling down to zero instances. Of course, this has an impact on costs, as instances will then always be running.
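If you want to configure this programmatically, the sketch below raises the minimum number of instances of an existing deployment version with the UbiOps Python client. The project, deployment, and version names are placeholders; consult the client reference for the exact fields your UbiOps version supports.

import ubiops

# Connect to the UbiOps API (the token is a placeholder).
configuration = ubiops.Configuration(api_key={"Authorization": "Token <YOUR_API_TOKEN>"})
api = ubiops.CoreApi(ubiops.ApiClient(configuration))

# Keep at least one instance warm to avoid cold starts.
# Note: a non-zero minimum means an instance is always running (and billed).
api.deployment_versions_update(
    project_name="my-project",            # placeholder
    deployment_name="my-gpu-deployment",  # placeholder
    version="v1",                         # placeholder
    data=ubiops.DeploymentVersionUpdate(minimum_instances=1),
)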

CUDA version

UbiOps provides base environments with CUDA already installed. They can be selected when creating a deployment version. For example, it is possible to select the base environment ubuntu22-04-python3-10-cuda11-7-1, which contains Python 3.10 and CUDA 11.7.1 and is based on an Ubuntu 22.04 base image. Using the available environments with CUDA drivers is encouraged, but not required. Alternatively, a ubiops.yaml file can be used to install a custom version of CUDA on an environment without CUDA installed, such as python3-10. Using a base environment with CUDA pre-installed typically reduces build time and improves cold start time compared to using ubiops.yaml.
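As a sketch of how such an environment can be selected programmatically, the snippet below creates a deployment version with this CUDA base environment through the UbiOps Python client. It assumes the DeploymentVersionCreate object accepts an environment field, which may differ per client version; the project and deployment names are placeholders.

import ubiops

configuration = ubiops.Configuration(api_key={"Authorization": "Token <YOUR_API_TOKEN>"})
api = ubiops.CoreApi(ubiops.ApiClient(configuration))

# Create a version that runs on the Ubuntu 22.04 + Python 3.10 + CUDA 11.7.1
# base environment.
api.deployment_versions_create(
    project_name="my-project",            # placeholder
    deployment_name="my-gpu-deployment",  # placeholder
    data=ubiops.DeploymentVersionCreate(
        version="v1",                     # placeholder
        environment="ubuntu22-04-python3-10-cuda11-7-1",
    ),
)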

GPU utilization

You can check whether your GPU is being utilized by inspecting the logs. The peak and average GPU utilization of a run are logged after each run. In case of a multi-GPU set-up, the scores are printed for all GPUs.
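If you want to inspect utilization yourself from inside your deployment code, one option is to query nvidia-smi, assuming it is available on the instance (as is typical when a GPU is attached). A minimal sketch:

import subprocess

def log_gpu_utilization():
    # Query utilization and memory usage for all visible GPUs.
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu,memory.used,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per GPU, e.g. "0, 87 %, 10240 MiB, 16384 MiB".
    for line in result.stdout.strip().splitlines():
        print(f"GPU status: {line}")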

Here we will highlight how to run two of the most popular machine learning libraries on GPUs.

TensorFlow and CUDA

TensorFlow has been compiled against CUDA by default from tensorflow==2.0.0 onwards. Therefore, choosing an environment with CUDA installed and adding tensorflow to your requirements.txt is sufficient to allow your models and tensors to be loaded onto the GPU. However, not all TensorFlow versions are compatible with all CUDA versions. See the TensorFlow CUDA compatibility matrix to find out which TensorFlow version is compatible with which CUDA versions.

For example, TensorFlow version 2.11.0 requires CUDA 11.2 and Python 3.7-3.11. To get this to work on UbiOps, you can select the environment 'Python 3.9 + CUDA 11.2.2', which has the tag python3-9-cuda11-2-2.
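To confirm at runtime that TensorFlow actually sees the GPU, you can list the physical GPU devices when your deployment initializes. A minimal sketch of what this could look like in a deployment.py:

import tensorflow as tf

class Deployment:
    def __init__(self, base_directory, context):
        # List the GPUs TensorFlow can see; an empty list means
        # TensorFlow will silently fall back to the CPU.
        gpus = tf.config.list_physical_devices("GPU")
        print(f"TensorFlow detected {len(gpus)} GPU(s): {gpus}")

    def request(self, data):
        # Operations executed here are placed on the GPU automatically
        # when one is available.
        return data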

PyTorch and CUDA

PyTorch has not been compiled against CUDA by default. You are required to explicitly install a PyTorch version that has been compiled against CUDA. The list of available CUDA-compiled PyTorch versions can be found here. Following the example code from the PyTorch documentation, we can install a CUDA-compiled version of PyTorch by updating your deployment package in two steps:

Add the pip repository with CUDA-compiled versions of PyTorch to your pip index by adding the following line to your ubiops.yaml:

environment_variables:
- PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu117

Then, instruct your deployment to install a package from this pip repository by adding a specific CUDA-compiled version of PyTorch to your requirements.txt. For example:

torch==1.13.0+cu117

This torch version is compatible with the environment 'Ubuntu 22.04 + Python 3.10 + CUDA 11.7.1', which has the tag ubuntu22-04-python3-10-cuda11-7-1.
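Once the build completes, you can verify inside your deployment that the CUDA-compiled build was installed and move your model to the GPU. A minimal sketch using the standard torch API (the linear layer is just a placeholder model):

import torch

# Verify that the CUDA-compiled build was installed and a GPU is visible.
print(f"torch version: {torch.__version__}")   # e.g. 1.13.0+cu117
print(f"CUDA available: {torch.cuda.is_available()}")

# Select the GPU when available, falling back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move a placeholder model and its inputs to the selected device.
model = torch.nn.Linear(16, 4).to(device)
inputs = torch.randn(1, 16, device=device)
outputs = model(inputs)
print(f"Output computed on: {outputs.device}")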

Tips and Tricks

Most frameworks offer possibilities to check whether the framework has access to your GPU. For PyTorch, you can run torch.cuda.is_available(); for TensorFlow, you can run tf.config.list_physical_devices('GPU') to see if the framework detects a GPU (see the examples above).

We have a variety of combinations of Python and CUDA versions available. In case you require a CUDA version that we do not offer, you can use the ubiops.yaml to instruct the installation of the CUDA version of your choice. See this how-to on how to install your custom CUDA version.
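As an illustration of the mechanism only (the exact repositories and package names depend on the CUDA version you need, so treat this as a hypothetical sketch rather than the how-to's exact recipe), a ubiops.yaml on a non-CUDA base environment could look like:

# Hypothetical sketch: install a CUDA toolkit via apt on a non-CUDA environment.
# The package name below is illustrative; see the how-to for the exact
# repositories and packages for your CUDA version.
apt:
  packages:
    - nvidia-cuda-toolkit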