Accelerate workflows with NVIDIA RAPIDS on UbiOps!

NVIDIA RAPIDS is a suite of open-source software libraries and APIs developed by NVIDIA that lets scientists and data analysts execute end-to-end data science and analytics pipelines entirely on GPUs. This can make many data analytics and machine learning workflows a lot faster. This tutorial showcases how you can create a Linear Regression model on a synthetic dataset with different NVIDIA RAPIDS libraries, implemented on UbiOps!

The following steps are performed in this tutorial:

  1. Connect to the UbiOps API
  2. Create baseline model
  3. Accelerate model with NVIDIA RAPIDS
  4. Implement models into deployment
  5. Create UbiOps environment
  6. Create and upload deployment to UbiOps
  7. Run deployment
  8. Compare results

Note that GPU access is needed in UbiOps to run this tutorial. UbiOps has support for GPU deployments, but this feature is not enabled for customers by default. Please contact us for more information and to enable GPU access! It is recommended to connect to a GPU runtime (if available) for local testing purposes. If local testing is unwanted, a simple runtime will suffice as well for following this tutorial.

The following results were achieved by using NVIDIA RAPIDS:

Connect to the UbiOps API

Let's set up our workspace!
First things first, we are going to initialize our UbiOps Python client.

!pip install "ubiops >= 3.15, <4"
import ubiops

API_TOKEN = "Token ..." # TODO: Add your UbiOps token here
PROJECT_NAME = "" # TODO: Add your project name here

ENVIRONMENT_NAME = "nvidia-rapids-env"

DEPLOYMENT_NAME = "nvidia-rapids-benchmark"

DEPLOYMENT_DIR = "deployment_package"
ENVIRONMENT_DIRECTORY_NAME = "environment_package"
configuration = ubiops.Configuration(host="")
configuration.api_key['Authorization'] = API_TOKEN

api_client = ubiops.ApiClient(configuration)
core_instance = ubiops.CoreApi(api_client=api_client)
training_instance = ubiops.Training(api_client=api_client)

Now it's time to create directories to store our deployment/environment code!
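
The cell that creates these directories is not shown here. A minimal sketch, assuming the directory names defined earlier and that they should live in the current working directory, could look like this:

```python
import os

# Directory names as defined earlier in this tutorial
DEPLOYMENT_DIR = "deployment_package"
ENVIRONMENT_DIRECTORY_NAME = "environment_package"

# exist_ok=True makes the cell safe to re-run
for directory in (DEPLOYMENT_DIR, ENVIRONMENT_DIRECTORY_NAME):
    os.makedirs(directory, exist_ok=True)
```

Using `exist_ok=True` avoids an error when the notebook is executed more than once.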


Now that our workspace is all set up, let's start creating our baseline model.

Create baseline model

In order to showcase the performance improvements gained by utilizing NVIDIA RAPIDS, we want to have a baseline model to test against first. For this, we will create a simple Linear Regression model. We are going to use Scikit-Learn and Pandas for this.
We are creating the following functions for the baseline model:
  - generate_dataset: Generate a random dataset with a given number of samples and features
  - convert_to_pandas: Convert our dataset to a Pandas DataFrame (useful for when we start creating an NVIDIA RAPIDS accelerated model)
  - train_lr: Train a Linear Regression model (with Scikit-Learn)
  - make_predictions: Make model predictions
  - calculate_mse: Calculate the Mean Squared Error (MSE)

%%writefile {DEPLOYMENT_DIR}/

import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

class BaselineModel:
    def __init__(self):
        self.sklearn_lr = LinearRegression()

    @staticmethod
    def generate_dataset(n_samples, n_features=20):
        x, y = make_classification(n_samples=n_samples, n_features=n_features)
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
        return x_train, x_test, y_train, y_test

    @staticmethod
    def convert_to_pandas(x_train, y_train, x_test):
        pandas_x_train = pd.DataFrame(x_train)
        pandas_y_train = pd.Series(y_train)
        pandas_x_test = pd.DataFrame(x_test)
        return pandas_x_train, pandas_y_train, pandas_x_test

    def train_lr(self, pandas_x_train, pandas_y_train):
        # Time only the fit call
        start_time = time.time()
        self.sklearn_lr.fit(pandas_x_train, pandas_y_train)
        return time.time() - start_time

    def make_predictions(self, pandas_x_test):
        start_time = time.time()
        sklearn_predictions = self.sklearn_lr.predict(pandas_x_test)
        return sklearn_predictions, time.time() - start_time

    @staticmethod
    def calculate_mse(y_test, sklearn_predictions):
        return mean_squared_error(y_test, sklearn_predictions)

Accelerate model with NVIDIA RAPIDS

Now that we have our baseline model, we can accelerate this model by using the corresponding NVIDIA RAPIDS equivalent libraries/functions. The table below showcases the NVIDIA RAPIDS library equivalent to the "standard" library.

| Standard library | NVIDIA RAPIDS equivalent |
|---|---|
| Pandas | cuDF |
| Scikit-learn | cuML |

%%writefile {DEPLOYMENT_DIR}/

import time
import cudf
from cuml.linear_model import LinearRegression
from cuml.metrics import mean_squared_error

class RapidsModel:
    def __init__(self):
        self.cu_lr = LinearRegression()

    @staticmethod
    def convert_to_cudf(pandas_x_train, pandas_y_train, pandas_x_test):
        cudf_x_train = cudf.DataFrame.from_pandas(pandas_x_train)
        cudf_y_train = cudf.Series(pandas_y_train)
        cudf_x_test = cudf.DataFrame.from_pandas(pandas_x_test)
        return cudf_x_train, cudf_y_train, cudf_x_test

    def make_predictions(self, cudf_x_test):
        start_time = time.time()
        cu_predictions = self.cu_lr.predict(cudf_x_test)
        return cu_predictions, time.time() - start_time

    def train_lr(self, cudf_x_train, cudf_y_train):
        # Time only the fit call
        start_time = time.time()
        self.cu_lr.fit(cudf_x_train, cudf_y_train)
        return time.time() - start_time

    @staticmethod
    def calculate_mse(y_test, cu_predictions):
        return mean_squared_error(y_test, cu_predictions)

As you can see in the code block above, the core is exactly the same as in the baseline model! Some variable names are changed to be more descriptive, but the fit, predict and mean_squared_error calls are the same. The main difference is the library they are imported from: sklearn (with Pandas data) in the baseline model, and cuml (with cuDF data) in the accelerated model.
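
Because the two libraries expose matching interfaces here, one could decide at runtime which one to import from. The sketch below is an illustration only: `pick_backend` is a hypothetical helper that is not part of the tutorial's deployment, and it merely checks whether the `cuml` package can be located.

```python
from importlib.util import find_spec


def pick_backend():
    """Return "cuml" when the RAPIDS package can be located, else "sklearn"."""
    # find_spec only looks the package up; it does not import it
    return "cuml" if find_spec("cuml") is not None else "sklearn"


backend = pick_backend()
print(f"LinearRegression would be imported from: {backend}.linear_model")
```

In this tutorial both implementations are kept side by side instead, since the goal is to benchmark them against each other.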

Implement models into deployment

Now that we've written our code for a baseline model and an NVIDIA RAPIDS accelerated model, we can integrate both into a UbiOps deployment. UbiOps deployments require fixed inputs and outputs, as is outlined in the documentation.
We will use the following input/output structure:

| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | n_samples | Integer | Number of samples in the dataset |
| Input | n_features | Integer | Number of features per sample |
| Output | scikit-mse | Double Precision | Mean Squared Error using scikit-learn |
| Output | cuml-mse | Double Precision | Mean Squared Error using cuML |
| Output | scikit-train-time | Double Precision | Training time using scikit-learn |
| Output | cuml-train-time | Double Precision | Training time using cuML |
| Output | scikit-pred-time | Double Precision | Prediction time using scikit-learn |
| Output | cuml-pred-time | Double Precision | Prediction time using cuML |
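
Since the output fields are fixed, it can help to check a result dictionary against this table before uploading anything. The following is a minimal sketch under that assumption; `check_output` and `OUTPUT_FIELDS` are hypothetical names used for illustration, not part of the UbiOps client.

```python
# Output field names copied from the input/output table above
OUTPUT_FIELDS = [
    "scikit-mse", "cuml-mse",
    "scikit-train-time", "cuml-train-time",
    "scikit-pred-time", "cuml-pred-time",
]


def check_output(result):
    """Raise ValueError if `result` does not match the declared output fields."""
    missing = [name for name in OUTPUT_FIELDS if name not in result]
    extra = [name for name in result if name not in OUTPUT_FIELDS]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    for name in OUTPUT_FIELDS:
        value = result[name]
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {type(value).__name__}")
    return True


# Placeholder values for illustration only, not measured results
example = {name: 0.0 for name in OUTPUT_FIELDS}
print(check_output(example))
```

A check like this could be run on the dictionary returned by the deployment's request method during local testing.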

Let's integrate the models into the UbiOps deployment structure, with the inputs/outputs as specified in the table above!

%%writefile {DEPLOYMENT_DIR}/

import time

from baseline_model import BaselineModel
from rapids_model import RapidsModel

class Deployment:
    def __init__(self):
        self.baseline_model = None
        self.rapids_model = None

    def request(self, data):
        n_samples = data.get("n_samples", 1000000)
        n_features = data.get("n_features", 20)

        self.baseline_model = BaselineModel()
        self.rapids_model = RapidsModel()

        start_time = time.time()
        x_train, x_test, y_train, y_test = self.baseline_model.generate_dataset(n_samples, n_features)
        print("Dataset generation time: ", time.time() - start_time)

        start_time = time.time()
        pandas_x_train, pandas_y_train, pandas_x_test = self.baseline_model.convert_to_pandas(x_train, y_train, x_test)
        print("Pandas conversion time: ", time.time() - start_time)

        # Delete the NumPy arrays that are no longer needed to free up memory
        del x_train, x_test, y_train

        start_time = time.time()
        cudf_x_train, cudf_y_train, cudf_x_test = self.rapids_model.convert_to_cudf(
            pandas_x_train, pandas_y_train, pandas_x_test
        )
        print("CuDF conversion time: ", time.time() - start_time)

        sklearn_train_time = self.baseline_model.train_lr(pandas_x_train, pandas_y_train)
        cu_train_time = self.rapids_model.train_lr(cudf_x_train, cudf_y_train)

        sklearn_predictions, sklearn_prediction_time = self.baseline_model.make_predictions(pandas_x_test)
        cu_predictions, cu_prediction_time = self.rapids_model.make_predictions(cudf_x_test)

        sklearn_mse = self.baseline_model.calculate_mse(y_test, sklearn_predictions)
        cu_mse = self.rapids_model.calculate_mse(y_test, cu_predictions)

        return {
            "scikit-mse": sklearn_mse,
            "cuml-mse": cu_mse.tolist(),
            "scikit-train-time": sklearn_train_time,
            "cuml-train-time": cu_train_time,
            "scikit-pred-time": sklearn_prediction_time,
            "cuml-pred-time": cu_prediction_time
        }

Now our code is all set up! We can continue to create our UbiOps environment and upload our model.
We can also test our code locally. We will do that in the next (sub)section, but it is not necessary.
Do note that an NVIDIA GPU is needed and CUDA needs to be installed on the machine to test this deployment locally!

Test deployment locally

Before deploying our model, we can test its functionality locally as well. This will be done by running the deployment in the current Python environment. For this to succeed, the proper hardware and software are needed. To run the deployment locally, the following is needed:
  - An NVIDIA GPU
  - CUDA installed

We can test both by running the following commands:

!nvcc --version
!nvidia-smi

If we have the proper pre-requisites, the installed CUDA version and GPU information will be shown.
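
The same check can be done from Python without raising an error when a tool is missing. This is a small sketch that assumes `nvcc` (CUDA toolkit) and `nvidia-smi` (GPU driver utility) are the two command-line tools of interest; `cuda_tools_available` is a hypothetical helper name.

```python
import shutil


def cuda_tools_available():
    """Report which of the two command-line tools are found on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in ("nvcc", "nvidia-smi")}


print(cuda_tools_available())
```

Finding both tools on the PATH does not guarantee a working setup, but a missing tool is a quick sign that local testing will not work on this machine.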

We furthermore need to install the proper pip packages by running the following command:

!pip install --extra-index-url \
  wheel \
  setuptools \
  cudf-cu11 \
  cuml-cu11 \
  scikit-learn \
  pandas -q

Now that we have installed the proper packages, we can test the deployment locally!

data_input = {
    "n_samples": 10 ** 6,
    "n_features": 50
}

ubiops.utils.run_local(DEPLOYMENT_DIR, data_input)

As we can see, our deployment works as expected. We can now upload our deployment to UbiOps!

Create UbiOps environment

Before uploading our deployment to UbiOps, we need to create an environment for the deployment to run in. This environment contains additional OS-level dependencies and pip packages. To specify the additional contents of an environment, the following two files need to be defined:
  - requirements.txt: This file specifies which pip packages need to be installed
  - ubiops.yaml: This file specifies the additional OS-level dependencies

More information on UbiOps environments can be found in the documentation.

Let's define our environment now!

We first start off by creating the requirements.txt file with the pip packages we need.

%%writefile {ENVIRONMENT_DIRECTORY_NAME}/requirements.txt


Now that we've specified the requirements.txt file, it's time to move on to the ubiops.yaml file.
In the environment, we need to have CUDA with some additional CUDA packages. UbiOps doesn't provide a base environment with the proper additional CUDA packages installed for this implementation. Therefore, we will install all the CUDA packages manually with the ubiops.yaml file!

This can be achieved with the following ubiops.yaml file:

%%writefile {ENVIRONMENT_DIRECTORY_NAME}/ubiops.yaml

  - PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
  - LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
      - deb /
    - cuda-toolkit-11-7

We've just created all the files we need to make our own UbiOps environment.
We can now create an environment and then upload our files to it. Do note that the environment might take a while to build.

# Create environment in UbiOps
            description="CUDA Toolkit 11.7 environment",
except ubiops.exceptions.ApiException as e:
import shutil

# Upload files to environment
    # Zip the directory with the training environment dependencies
    environment_archive = shutil.make_archive(ENVIRONMENT_DIRECTORY_NAME, 'zip', ENVIRONMENT_DIRECTORY_NAME)

except ubiops.exceptions.ApiException as e:

# Wait for environment to be ready
ubiops.utils.wait_for_environment(core_instance.api_client, PROJECT_NAME, ENVIRONMENT_NAME, 1800)
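
The zipping step can be tried in isolation before anything is uploaded. The sketch below builds a throwaway directory with placeholder files (the file contents are stand-ins, not the real environment definition) and lists what ends up in the archive.

```python
import os
import shutil
import tempfile
import zipfile

# Build a throwaway directory that mimics the environment package layout
workdir = tempfile.mkdtemp()
package_dir = os.path.join(workdir, "environment_package")
os.makedirs(package_dir)
for filename in ("requirements.txt", "ubiops.yaml"):
    with open(os.path.join(package_dir, filename), "w") as handle:
        handle.write("# placeholder\n")

# make_archive(base_name, format, root_dir) returns the path of the created archive
archive_path = shutil.make_archive(package_dir, "zip", package_dir)

# List the archive contents to confirm both files were packed
with zipfile.ZipFile(archive_path) as archive:
    names = sorted(archive.namelist())
print(names)
```

Passing the directory as `root_dir` stores the files relative to that directory, and the archive itself is written next to the directory rather than inside it.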

We have now created our environment on the UbiOps infrastructure.
Let's proceed to creating a deployment and uploading our deployment code.

Create and upload deployment to UbiOps

Finally, we've reached the last step of the setup process: creating a deployment on UbiOps and uploading our deployment code to it.

Let's begin by creating a new deployment!

input_fields = [
    {'name': 'n_samples', 'data_type': 'int'},
    {'name': 'n_features', 'data_type': 'int'}
]

output_fields = [
    {'name': 'scikit-mse', 'data_type': 'double'},
    {'name': 'cuml-mse', 'data_type': 'double'},
    {'name': 'scikit-train-time', 'data_type': 'double'},
    {'name': 'cuml-train-time', 'data_type': 'double'},
    {'name': 'scikit-pred-time', 'data_type': 'double'},
    {'name': 'cuml-pred-time', 'data_type': 'double'}
]

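deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Deployment to demonstrate NVIDIA RAPIDS model acceleration',
    input_type='structured',
    output_type='structured',
    input_fields=input_fields,
    output_fields=output_fields
)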

deployment = core_instance.deployments_create(project_name=PROJECT_NAME, data=deployment_template)

Now we add a deployment version to the newly created deployment:

version_template = ubiops.DeploymentVersionCreate(
    maximum_idle_time=600, # = 10 minutes


At last, we upload our deployment code to the newly created deployment version:

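deployment_archive = shutil.make_archive(DEPLOYMENT_DIR, 'zip', DEPLOYMENT_DIR)

core_instance.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=VERSION_NAME,
    file=deployment_archive
)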


Let's wait for our deployment to be done!

ubiops.utils.wait_for_deployment_version(core_instance.api_client, PROJECT_NAME, DEPLOYMENT_NAME, VERSION_NAME)

Run deployment

Now it's time to use our deployment.
Let's define a function to create a request and a function to plot results:

!pip install matplotlib
import matplotlib.pyplot as plt

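# Function to create a deployment request and collect its result
def create_request(core_instance, features, samples):
    data = {
        "n_features": features,
        "n_samples": 10**samples
    }
    request = core_instance.deployment_version_requests_create(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        version=VERSION_NAME,
        data=data
    )
    # Store the inputs next to the output fields returned by the deployment
    result_save = {
        "n_samples": data["n_samples"],
        "n_features": data["n_features"],
        **request.result
    }
    return result_save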

def plot_graph(results, time_key, title, feature_list):
    plt.figure(figsize=(10, 10))
    plt.title(title)
    plt.xlabel("Number of samples")
    plt.ylabel("Time (s)")

    for i, features in enumerate(feature_list):
        filtered_results = [result for result in results if result["n_features"] == features]
        n_samples = [result["n_samples"] for result in filtered_results]
        scikit_times = [result[f'scikit-{time_key}'] for result in filtered_results]
        cuml_times = [result[f'cuml-{time_key}'] for result in filtered_results]
        color = 'blue' if features == 5 else 'red'

        plt.plot(n_samples, scikit_times, label=f"Scikit-learn {features} features", linestyle="dashed", color=color)
        plt.plot(n_samples, cuml_times, label=f"CuML {features} features", linestyle="solid", color=color)

    # Show which line belongs to which library and feature count
    plt.legend()
    plt.show()

Let's call our function now and save the results:

features = [5, 50]
range_samples = range(4,8)

results = [create_request(core_instance, feature, n_samples) for n_samples in range_samples for feature in features]
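
To see which requests this list comprehension produces without calling the deployment, the same iteration order can be reproduced locally. This is a small illustrative sketch; `grid` is a name introduced here only for the example.

```python
from itertools import product

features = [5, 50]
range_samples = range(4, 8)

# Same iteration order as the nested list comprehension: samples outer, features inner
grid = [(10 ** exponent, feature) for exponent, feature in product(range_samples, features)]

for n_samples, n_features in grid:
    print(f"n_samples={n_samples:>10,}  n_features={n_features}")
```

This gives eight requests in total, ranging from 10,000 to 10,000,000 samples, each with 5 and with 50 features.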

We can proceed to plot the results now:

plot_graph(results, "train-time", "Training time", features)
plot_graph(results, "pred-time", "Prediction time", features)

As we can see in our newly made plots, using NVIDIA RAPIDS libraries greatly speeds up our training time on bigger datasets. The prediction time doesn't benefit greatly from GPU parallelization in this use case (as parallelization potential doesn't outweigh the extra GPU overhead), but this could very well be much different for other applications.
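
Besides plotting, the comparison can be summarized as a speedup factor per request. The sketch below assumes result dictionaries shaped like the ones saved above; `speedup` is a hypothetical helper, and the numbers are placeholders for illustration, not measurements.

```python
def speedup(result, time_key):
    """Return scikit-learn time divided by cuML time for one result dict."""
    cuml_time = result[f"cuml-{time_key}"]
    if cuml_time <= 0:
        raise ValueError("cuML time must be positive")
    return result[f"scikit-{time_key}"] / cuml_time


# Placeholder numbers for illustration only; these are not measurements
example_result = {
    "n_samples": 10 ** 6,
    "n_features": 50,
    "scikit-train-time": 4.0,
    "cuml-train-time": 0.5,
    "scikit-pred-time": 0.02,
    "cuml-pred-time": 0.04,
}

print(speedup(example_result, "train-time"))  # a value above 1 means cuML was faster
print(speedup(example_result, "pred-time"))   # a value below 1 means scikit-learn was faster
```

Applied to the real results list, this would give one factor per combination of sample count and feature count.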


In this tutorial, we've made a Linear Regression model, improved the training time of this model greatly with NVIDIA RAPIDS and deployed a benchmark on UbiOps!
Don't hesitate to contact us for any further information or to see what we can do for you!

# Close the UbiOps Python client
api_client.close()