Install Custom CUDA Version

This how-to explains how to install a custom CUDA version inside a UbiOps deployment using the ubiops.yaml file. This is useful if you need a CUDA version other than the ones readily available on UbiOps.

Ubiops.yaml

Everything is done in the ubiops.yaml file, which is used to install additional (OS-level) software packages and to set system environment variables. More information about the ubiops.yaml file can be found in the documentation.
We start from the following blank structure:

environment_variables:
apt:
  keys:
    urls:
  sources:
    items:
  packages:

First, we need to specify where the desired CUDA version can be downloaded from. Next, we need to add the repository's signing keys to the apt keys. apt needs these keys to verify and install the CUDA packages from that source.
Since the UbiOps infrastructure uses an x86_64 architecture with Ubuntu 20.04 (a Debian-based distribution), the following links can be used:

environment_variables:
apt:
  keys:
    urls:
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
  sources:
    items:
      - deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /

Trailing slash

Note the space followed by a trailing slash at the end of the sources item. Both are necessary.
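
The reason both are needed: in apt's source-line format, the field after the URL is the suite, and a suite ending in a slash is read as an exact path with no components after it. The sketch below checks a source line against that shape. It is only an illustration: the function name `is_flat_repo_entry` is hypothetical (not part of apt or UbiOps), and bracketed options such as `[arch=amd64]` are deliberately not handled.

```python
def is_flat_repo_entry(line: str) -> bool:
    """Return True if an apt source line has the form 'deb <uri> <path>/'.

    Hypothetical helper for illustration only. Bracketed options
    (for example [arch=amd64]) are not supported in this sketch.
    """
    parts = line.split()
    # Exactly three whitespace-separated fields: type, URI, suite path.
    if len(parts) != 3 or parts[0] not in ("deb", "deb-src"):
        return False
    _, uri, suite = parts
    # The suite must be a separate field (hence the space) and end with "/".
    return uri.startswith(("http://", "https://")) and suite.endswith("/")


if __name__ == "__main__":
    entry = "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /"
    print(is_flat_repo_entry(entry))
```

Dropping the space (so the slash is glued to the URL) or dropping the slash leaves only two fields, and the check fails in both cases.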

With the package locations specified, we can install the CUDA packages that we want to use. Individual packages can be added, or the complete toolkit can be installed. Installing the complete CUDA toolkit is the easiest method, as it installs all the necessary packages in one go. For this tutorial the CUDA 11.0 toolkit will be installed, to align with the CUDA versions that are available on UbiOps. This makes it possible to compare the pros and cons of selecting a readily available CUDA version against installing a custom one.
The following configuration installs the CUDA toolkit and adds the correct paths to the corresponding environment variables:

environment_variables:
  - PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
  - LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
apt:
  keys:
    urls:
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
  sources:
    items:
      - deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
  packages:
    - cuda-toolkit-11-0
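
To confirm from inside the deployment that the environment variables took effect, the directories listed above can be compared against the live values of PATH and LD_LIBRARY_PATH. The following is a minimal sketch; the helper name `missing_entries` and the two constants are hypothetical and only mirror the values in the ubiops.yaml above.

```python
import os
from typing import List

# Directories copied from the environment_variables section of ubiops.yaml.
CUDA_BIN_DIRS = ["/usr/local/nvidia/bin", "/usr/local/cuda/bin"]
CUDA_LIB_DIRS = [
    "/usr/local/nvidia/lib",
    "/usr/local/nvidia/lib64",
    "/usr/local/cuda/lib64",
]


def missing_entries(value: str, required: List[str], sep: str = ":") -> List[str]:
    """Return the required directories that are absent from a PATH-like string."""
    entries = [entry for entry in value.split(sep) if entry]
    return [directory for directory in required if directory not in entries]


if __name__ == "__main__":
    print("Missing from PATH:",
          missing_entries(os.environ.get("PATH", ""), CUDA_BIN_DIRS))
    print("Missing from LD_LIBRARY_PATH:",
          missing_entries(os.environ.get("LD_LIBRARY_PATH", ""), CUDA_LIB_DIRS))
```

An empty list for both variables suggests the paths were added as intended; it does not by itself prove that the toolkit files exist in those directories.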

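With this ubiops.yaml, the CUDA 11.0 toolkit is installed and can be used in the deployment. As a quick check, run the nvidia-smi command and look at the CUDA version in its header. Keep in mind that nvidia-smi reports the CUDA version supported by the installed driver, while nvcc --version reports the version of the installed toolkit itself.
Output similar to the following should appear: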

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.203.03   Driver Version: 450.203.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+ 
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Running Commands

See the how-to on running terminal commands for more information on how to run commands in a deployment.
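
As one possible way to automate the check, the sketch below runs nvidia-smi from Python and reads the "CUDA Version" field from a header like the one shown above. The function names `parse_cuda_version` and `driver_cuda_version` are hypothetical, and the regular expression assumes the header layout from the sample output.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version' value from nvidia-smi output, if present."""
    match = re.search(r"CUDA Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


def driver_cuda_version() -> Optional[str]:
    """Run nvidia-smi and return the CUDA version it reports, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not on PATH (for example, a CPU-only instance)
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print("CUDA version reported by nvidia-smi:", driver_cuda_version())
```

In a deployment, a call like this could be placed where the deployment code initializes, so the value shows up in the logs once at startup; where exactly to call it is a design choice rather than a requirement.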