GPU Deployments¶
UbiOps has support for GPU deployments, but this feature is not enabled for customers by default. Please contact sales for more information.
To use GPUs, the following is needed:
- An instance type with GPUs, see Scaling & Resource settings
- CUDA installed, see CUDA version
- (Machine learning) libraries with GPU support, such as TensorFlow, see TensorFlow GPU support; a quick sanity check is sketched below this list
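For instance, the following minimal sketch (an illustration, assuming a TensorFlow 2.x installation with GPU support) verifies from inside a deployment that the library actually sees a GPU:

```python
# Sketch: verify that TensorFlow can see the GPU from inside a deployment.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {gpus}")
assert gpus, "No GPU found - check the instance type and CUDA installation"
```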
You can also find examples of deployments that use GPUs on UbiOps.
Cold start time¶
GPU deployments are usually quite large, so scaling from zero to one instance takes longer than usual (around 30 seconds). After the first request, subsequent requests are much faster, until the instance shuts down again once the maximum idle time of the deployment version is reached. To prevent scaling down to zero instances, you can set the minimum number of instances to a value higher than zero. Naturally, this has an impact on costs, since at least that many instances will always be running.
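For example, keeping one instance warm could look as follows with the `ubiops` Python client (a sketch; the project, deployment, and version names are placeholders, and the exact model fields may differ between client versions):

```python
# Sketch: keep one instance warm to avoid cold starts, at the cost of
# continuous billing for that instance.
import ubiops

configuration = ubiops.Configuration()
configuration.api_key["Authorization"] = "Token <YOUR_API_TOKEN>"  # placeholder
core_api = ubiops.CoreApi(ubiops.ApiClient(configuration))

version_update = ubiops.DeploymentVersionUpdate(
    minimum_instances=1,     # never scale below one running instance
    maximum_idle_time=300,   # seconds before idle instances above the minimum scale down
)
core_api.deployment_versions_update(
    project_name="my-project",         # hypothetical names
    deployment_name="my-deployment",
    version="v1",
    data=version_update,
)
```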
CUDA version¶
UbiOps provides languages with CUDA pre-installed. They can be selected when creating a deployment version. For example, it is possible to select `python3.8_cuda`, which contains Python 3.8 and CUDA 11.0.3 and is based on an Ubuntu 20.04 base image. Using the available languages with CUDA drivers is encouraged, but not required. Alternatively, a `ubiops.yaml` file can be used to install a custom version of CUDA on a language without CUDA installed, such as `python3.7`. Using a language like `python3.7_cuda` typically reduces build time and improves cold start time compared to using `ubiops.yaml`.
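To confirm which CUDA toolkit ended up in the deployment environment, a check like the following can be run, for example during initialization (a sketch; the `nvcc` binary is only present when a CUDA toolkit is installed):

```python
# Sketch: report the CUDA toolkit version available inside the deployment.
import subprocess

def cuda_toolkit_version():
    """Return nvcc's version output, or None when no CUDA toolkit is installed."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
        # Output contains a line like: "Cuda compilation tools, release 10.0, V10.0.130"
        return result.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None

print(cuda_toolkit_version())
```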
A large variety of combinations of CUDA, NVIDIA drivers, and machine learning libraries is possible. TensorFlow is compiled against a specific CUDA version, so the TensorFlow version in use determines which CUDA version is required; see the TensorFlow CUDA compatibility matrix. For example, TensorFlow 1.15 requires CUDA 10.0 and `python3.7`. The following `ubiops.yaml` installs CUDA 10.0:
```yaml
environment_variables:
  - CUDA_VERSION=10.0
  - PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
  - LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
apt:
  keys:
    urls:
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
  sources:
    items:
      - deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
      - deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
  packages:
    - cuda-cudart-10-0=10.0.130-1
    - cuda-compat-10-0
    - cuda-libraries-10-0=10.0.130-1
    - libnpp-10-0=10.0.130-1
    - cuda-nvtx-10-0=10.0.130-1
    - libcusparse-10-0=10.0.130-1
    - libcublas-10-0=10.0.130-1
    - libnccl2=2.6.4-1+cuda10.0
    - libcudnn7=7.6.5.32-1+cuda10.0
```
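A matching framework version then goes in `requirements.txt`. For TensorFlow 1.15 that could look as follows (note that, as of release 1.15, the `tensorflow` pip package includes GPU support, so a separate `tensorflow-gpu` pin is not required):

```
tensorflow==1.15
```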
Tips and tricks¶
PyTorch and CUDA 11¶
PyTorch does not require a UbiOps language with CUDA installed (such as `python3.8_cuda`), because PyTorch ships with CUDA included and will not use previously installed versions of CUDA. Instead, `python3.8` can be used with a CUDA-enabled PyTorch build (see https://pytorch.org/get-started/locally/), which installs CUDA 10.2 by default.
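Whether the bundled runtime actually finds a GPU can be checked with a few lines of PyTorch (a sketch; the device and tensor are illustrative):

```python
# Sketch: confirm that PyTorch's bundled CUDA runtime sees a GPU.
import torch

print(torch.cuda.is_available())          # True when a GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the GPU model name

# Move work to the GPU when present, fall back to CPU otherwise
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(8, 3, device=device)      # hypothetical tensor
```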
Some GPU types (such as the A100) require at least CUDA 11. This can be installed on UbiOps by adding a `ubiops.yaml` to the deployment zip file with the following content:
```yaml
environment_variables:
  - PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu113
```
and by adding CUDA-specific versions of PyTorch to `requirements.txt`:

```
torch==1.11.0+cu113
torchvision==0.12.0+cu113
```
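Once deployed, it can be verified that the cu113 wheels were actually installed, since `torch.version.cuda` reports the CUDA version the build ships with (a minimal check):

```python
import torch

# The +cu113 wheels bundle CUDA 11.3, which satisfies the A100 requirement
assert torch.version.cuda == "11.3", f"unexpected CUDA build: {torch.version.cuda}"
```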