Accelerate workflows with NVIDIA RAPIDS on UbiOps!
NVIDIA RAPIDS is a suite of open-source software libraries and APIs developed by NVIDIA that lets scientists and data analysts execute end-to-end data science and analytics pipelines entirely on GPUs. This can make many data analytics and machine learning workflows considerably faster. This tutorial shows how to train a Linear Regression model on a synthetic dataset with different NVIDIA RAPIDS libraries, running on UbiOps!
The following steps are performed in this tutorial:
- Connect to the UbiOps API
- Create baseline model
- Accelerate model with NVIDIA RAPIDS
- Implement models into deployment
- Create UbiOps environment
- Create and upload deployment to UbiOps
- Run deployment
- Compare results
Note that GPU access is needed in UbiOps to run this tutorial. UbiOps has support for GPU deployments, but this feature is not enabled for customers by default. Please contact us for more information and to enable GPU access! It is recommended to connect to a GPU runtime (if available) for local testing purposes. If local testing is unwanted, a simple runtime will suffice as well for following this tutorial.
The following results were achieved by using NVIDIA RAPIDS:


Connect to the UbiOps API
Let's set up our workspace!
First things first, we are going to initialize our UbiOps Python Client.
!pip install "ubiops >= 3.15, <4"
import ubiops
API_TOKEN = "Token ..." # TODO: Add your UbiOps token here
PROJECT_NAME = "" # TODO: Add your project name here
ENVIRONMENT_NAME = "nvidia-rapids-env"
DEPLOYMENT_NAME = "nvidia-rapids-benchmark"
VERSION_NAME = "v1"
DEPLOYMENT_DIR = "deployment_package"
ENVIRONMENT_DIRECTORY_NAME = "environment_package"
configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
configuration.api_key['Authorization'] = API_TOKEN
api_client = ubiops.ApiClient(configuration)
core_instance = ubiops.CoreApi(api_client=api_client)
training_instance = ubiops.Training(api_client=api_client)
print(core_instance.service_status())
Now it's time to create directories to store our deployment/environment code!
!mkdir {DEPLOYMENT_DIR}
!mkdir {ENVIRONMENT_DIRECTORY_NAME}
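Note that mkdir reports an error if a directory already exists, for example when the notebook is re-run. As a small sketch of an alternative (not part of the original notebook), the standard library's os.makedirs with exist_ok=True can be called repeatedly without failing. The helper name ensure_dirs is our own.

```python
import os

def ensure_dirs(*paths):
    """Create each directory if it is missing; do nothing if it exists."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
    # Report, per path, whether the directory is now present
    return [os.path.isdir(path) for path in paths]

if __name__ == "__main__":
    # Same directory names as used in this tutorial
    print(ensure_dirs("deployment_package", "environment_package"))
```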
Now that our workspace is all set up, let's start creating our baseline model.
Create baseline model
In order to showcase the performance improvements gained by using NVIDIA RAPIDS, we first want a baseline model to test against. For this, we will create a simple Linear Regression model. We are going to use Scikit-Learn and Pandas for this.
We are creating the following functions for the baseline model:
- generate_dataset: Generate a random dataset with a given number of samples and features
- convert_to_pandas: Convert our dataset to a Pandas DataFrame (useful for when we start creating an NVIDIA RAPIDS accelerated model)
- train_lr: Train a Linear Regression model (with Scikit-Learn)
- make_predictions: Make model predictions
- calculate_mse: Calculate the Mean Squared Error (MSE)
%%writefile {DEPLOYMENT_DIR}/baseline_model.py
import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


class BaselineModel:
    def __init__(self):
        self.sklearn_lr = LinearRegression()

    @staticmethod
    def generate_dataset(n_samples, n_features=20):
        x, y = make_classification(n_samples=n_samples, n_features=n_features)
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
        return x_train, x_test, y_train, y_test

    @staticmethod
    def convert_to_pandas(x_train, y_train, x_test):
        pandas_x_train = pd.DataFrame(x_train)
        pandas_y_train = pd.Series(y_train)
        pandas_x_test = pd.DataFrame(x_test)
        return pandas_x_train, pandas_y_train, pandas_x_test

    def train_lr(self, pandas_x_train, pandas_y_train):
        start_time = time.time()
        self.sklearn_lr.fit(pandas_x_train, pandas_y_train)
        return time.time() - start_time

    def make_predictions(self, pandas_x_test):
        start_time = time.time()
        sklearn_predictions = self.sklearn_lr.predict(pandas_x_test)
        return sklearn_predictions, time.time() - start_time

    @staticmethod
    def calculate_mse(y_test, sklearn_predictions):
        return mean_squared_error(y_test, sklearn_predictions)
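train_lr and make_predictions both wrap a call between two time.time() readings. As a sketch of how that pattern could be factored out (the helper name timed is hypothetical and not part of the tutorial code), the function below returns the result together with the elapsed seconds. It uses time.perf_counter, the standard library clock intended for measuring short durations.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the tutorial this would be a fit or predict call
total, elapsed = timed(sum, range(1_000_000))
print(f"sum={total}, took {elapsed:.4f} s")
```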
Accelerate model with NVIDIA RAPIDS
Now that we have our baseline model, we can accelerate it by using the equivalent NVIDIA RAPIDS libraries and functions. The table below lists the NVIDIA RAPIDS equivalent of each "standard" library.
Standard Libraries | NVIDIA RAPIDS Equivalent |
---|---|
Pandas | cuDF |
Scikit-learn | cuML |
%%writefile {DEPLOYMENT_DIR}/rapids_model.py
import time

import cudf
from cuml.linear_model import LinearRegression
from cuml.metrics import mean_squared_error


class RapidsModel:
    def __init__(self):
        self.cu_lr = LinearRegression()

    @staticmethod
    def convert_to_cudf(pandas_x_train, pandas_y_train, pandas_x_test):
        cudf_x_train = cudf.DataFrame.from_pandas(pandas_x_train)
        cudf_y_train = cudf.Series(pandas_y_train)
        cudf_x_test = cudf.DataFrame.from_pandas(pandas_x_test)
        return cudf_x_train, cudf_y_train, cudf_x_test

    def make_predictions(self, cudf_x_test):
        start_time = time.time()
        cu_predictions = self.cu_lr.predict(cudf_x_test)
        return cu_predictions, time.time() - start_time

    def train_lr(self, cudf_x_train, cudf_y_train):
        start_time = time.time()
        self.cu_lr.fit(cudf_x_train, cudf_y_train)
        return time.time() - start_time

    @staticmethod
    def calculate_mse(y_test, cu_predictions):
        return mean_squared_error(y_test, cu_predictions)
As you can see in the code block above, the core is the same as in the baseline model! Some parameter names are changed to be more descriptive, but the training, prediction and MSE calls are the same. The only difference is the library they are imported from: sklearn in the baseline model, cudf and cuml in the accelerated model.
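Because the two libraries expose the same estimator interface here, a script could decide at runtime which one to import. The sketch below is an illustration on our part, not something the tutorial's deployment does: it only checks which packages are installed, using importlib.util.find_spec, and prints the matching import line. The function name pick_backend is hypothetical.

```python
import importlib.util

def pick_backend(candidates=("cuml", "sklearn")):
    """Return the first installed package from candidates, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None

backend = pick_backend()
if backend is None:
    print("Neither cuML nor scikit-learn is installed")
else:
    # Both packages provide linear_model.LinearRegression
    print(f"from {backend}.linear_model import LinearRegression")
```

Keep in mind that an installed cuml package does not by itself guarantee that a usable GPU is present.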
Implement models into deployment
Now that we've written our code for a baseline model and an NVIDIA RAPIDS accelerated model, we can integrate both into a UbiOps deployment. UbiOps deployments require fixed inputs and outputs, as is outlined in the documentation.
We will use the following input/output structure:
Input/Output | Name | Type | Description |
---|---|---|---|
Input | n_samples | Integer | Number of samples in the dataset |
Input | n_features | Integer | Number of features per sample |
Output | scikit-mse | Double Precision | Mean Squared Error using scikit-learn |
Output | cuml-mse | Double Precision | Mean Squared Error using cuML |
Output | scikit-train-time | Double Precision | Training time using scikit-learn |
Output | cuml-train-time | Double Precision | Training time using cuML |
Output | scikit-pred-time | Double Precision | Prediction time using scikit-learn |
Output | cuml-pred-time | Double Precision | Prediction time using cuML |
Let's integrate the models into the UbiOps deployment structure, with the inputs/outputs as specified in the table above!
%%writefile {DEPLOYMENT_DIR}/deployment.py
import time

from baseline_model import BaselineModel
from rapids_model import RapidsModel


class Deployment:
    def __init__(self):
        self.baseline_model = None
        self.rapids_model = None

    def request(self, data):
        n_samples = data.get("n_samples", 1000000)
        n_features = data.get("n_features", 20)

        self.baseline_model = BaselineModel()
        self.rapids_model = RapidsModel()

        start_time = time.time()
        x_train, x_test, y_train, y_test = self.baseline_model.generate_dataset(n_samples, n_features)
        print("Dataset generation time: ", time.time() - start_time)

        start_time = time.time()
        pandas_x_train, pandas_y_train, pandas_x_test = self.baseline_model.convert_to_pandas(x_train, y_train, x_test)
        print("Pandas conversion time: ", time.time() - start_time)

        # Delete the original arrays to free up memory (y_test is still needed for the MSE)
        del x_train, x_test, y_train

        start_time = time.time()
        cudf_x_train, cudf_y_train, cudf_x_test = self.rapids_model.convert_to_cudf(
            pandas_x_train,
            pandas_y_train,
            pandas_x_test
        )
        print("CuDF conversion time: ", time.time() - start_time)

        sklearn_train_time = self.baseline_model.train_lr(
            pandas_x_train,
            pandas_y_train,
        )
        cu_train_time = self.rapids_model.train_lr(cudf_x_train, cudf_y_train)

        sklearn_predictions, sklearn_prediction_time = self.baseline_model.make_predictions(pandas_x_test)
        cu_predictions, cu_prediction_time = self.rapids_model.make_predictions(cudf_x_test)

        sklearn_mse = self.baseline_model.calculate_mse(y_test, sklearn_predictions)
        cu_mse = self.rapids_model.calculate_mse(y_test, cu_predictions)

        return {
            "scikit-mse": sklearn_mse,
            "cuml-mse": cu_mse.tolist(),
            "scikit-train-time": sklearn_train_time,
            "cuml-train-time": cu_train_time,
            "scikit-pred-time": sklearn_prediction_time,
            "cuml-pred-time": cu_prediction_time
        }
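Since the deployment has fixed outputs, it can help to check the returned dictionary before uploading. The sketch below is our own addition: EXPECTED_OUTPUTS repeats the output names from the table above, and check_outputs (a hypothetical helper) reports missing keys and values that are not Python floats. The example values are made up for illustration.

```python
EXPECTED_OUTPUTS = (
    "scikit-mse", "cuml-mse",
    "scikit-train-time", "cuml-train-time",
    "scikit-pred-time", "cuml-pred-time",
)

def check_outputs(result):
    """Return a list of problems found in a deployment result dict."""
    problems = []
    for key in EXPECTED_OUTPUTS:
        if key not in result:
            problems.append(f"missing output: {key}")
        elif not isinstance(result[key], float):
            problems.append(f"{key} is {type(result[key]).__name__}, expected float")
    return problems

# Made-up values, only to show the shape of a valid result
example = {key: 0.0 for key in EXPECTED_OUTPUTS}
print(check_outputs(example))
```

An empty list means every declared output is present and is a float.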
Now, our code is all set up! We can now continue to create our UbiOps environment and upload our model.
We can also test our code locally. We will do that in the next (sub)section, but this step is optional.
Do note that an NVIDIA GPU is needed and CUDA needs to be installed on the machine to test this deployment locally!
Test deployment locally
Before deploying our model, we can test its functionality locally as well. This is done by running the deployment in the current Python environment. For this to succeed, the proper hardware and software are needed. To run the deployment locally, the following is needed:
- NVIDIA GPU
- CUDA installed
We can test both by running the following commands:
!nvcc --version
!nvidia-smi
If we have the proper pre-requisites, the installed CUDA version and GPU information will be shown.
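The same check can be scripted. As a minimal sketch (our own helper, not part of the UbiOps client), shutil.which reports whether the nvcc and nvidia-smi executables are on the PATH. Finding them does not guarantee that a working GPU and driver are present, so treat it as a first sanity check only.

```python
import shutil

def find_cuda_tools(tools=("nvcc", "nvidia-smi")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

for tool, path in find_cuda_tools().items():
    print(f"{tool}: {path or 'not found'}")
```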
We furthermore need to install the proper pip packages by running the following command:
!pip install --extra-index-url https://pypi.nvidia.com \
wheel \
setuptools \
cudf-cu11 \
cuml-cu11 \
scikit-learn \
pandas -q
Now that we have installed the proper packages, we can test the deployment locally!
data_input = {
"n_samples": 10 ** 6,
"n_features": 50
}
ubiops.utils.run_local(DEPLOYMENT_DIR, data_input)
As we can see, our deployment works as expected. We can now upload our deployment to UbiOps!
Create UbiOps environment
Before uploading our deployment to UbiOps, we need to create an environment for the deployment to run in. This environment contains additional OS-level dependencies and pip packages. To specify the additional contents of an environment, the following 2 files need to be defined:
- requirements.txt: This file specifies which pip packages need to be installed
- ubiops.yaml: This file specifies the additional OS-level dependencies
More information on UbiOps environments can be found in the documentation.
Let's define our environment now!
We start off by creating the requirements.txt file with the pip packages we need.
%%writefile {ENVIRONMENT_DIRECTORY_NAME}/requirements.txt
--extra-index-url https://pypi.nvidia.com
wheel
setuptools
cudf-cu11
cuml-cu11
scikit-learn
pandas
Now that we've specified the requirements.txt file, it's time to move on to the ubiops.yaml file.
In the environment, we need CUDA with some additional CUDA packages. UbiOps doesn't provide a base environment with the proper additional CUDA packages installed for this implementation. Therefore, we will install all the CUDA packages manually with the ubiops.yaml file!
This can be achieved with the following ubiops.yaml file:
%%writefile {ENVIRONMENT_DIRECTORY_NAME}/ubiops.yaml
environment_variables:
  - PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
  - LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
apt:
  keys:
    urls:
      - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
  sources:
    items:
      - deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 /
  packages:
    - cuda-toolkit-11-7
We've just created all the files we need to make our own UbiOps environment.
We can now create an environment and then upload our files to it. Do note that the environment might take a while to build.
# Create environment in UbiOps
try:
    core_instance.environments_create(
        project_name=PROJECT_NAME,
        data=ubiops.EnvironmentCreate(
            name=ENVIRONMENT_NAME,
            display_name=ENVIRONMENT_NAME,
            base_environment="ubuntu22-04-python3-11",
            description="CUDA Toolkit 11.7 environment",
        )
    )
except ubiops.exceptions.ApiException as e:
    print(e)
import shutil
# Upload files to environment
try:
    # Zip the directory with the training environment dependencies
    environment_archive = shutil.make_archive(ENVIRONMENT_DIRECTORY_NAME, 'zip', ENVIRONMENT_DIRECTORY_NAME)
    core_instance.environment_revisions_file_upload(
        project_name=PROJECT_NAME,
        environment_name=ENVIRONMENT_NAME,
        file=environment_archive
    )
except ubiops.exceptions.ApiException as e:
    print(e)
# Wait for environment to be ready
ubiops.utils.wait_for_environment(core_instance.api_client, PROJECT_NAME, ENVIRONMENT_NAME, 1800)
We have now created our environment on the UbiOps infrastructure.
Let's proceed to creating a deployment and uploading our deployment code.
Create and upload deployment to UbiOps
Finally, we've reached the last step of the setup process: creating a deployment on UbiOps and uploading our deployment code to it.
Let's begin by creating a new deployment!
input_fields = [
    {'name': 'n_samples', 'data_type': 'int'},
    {'name': 'n_features', 'data_type': 'int'}
]
output_fields = [
    {'name': 'scikit-mse', 'data_type': 'double'},
    {'name': 'cuml-mse', 'data_type': 'double'},
    {'name': 'scikit-train-time', 'data_type': 'double'},
    {'name': 'cuml-train-time', 'data_type': 'double'},
    {'name': 'scikit-pred-time', 'data_type': 'double'},
    {'name': 'cuml-pred-time', 'data_type': 'double'}
]
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Deployment to demonstrate NVIDIA RAPIDS model acceleration',
    input_type='structured',
    output_type='structured',
    input_fields=input_fields,
    output_fields=output_fields
)
deployment = core_instance.deployments_create(project_name=PROJECT_NAME, data=deployment_template)
Now we add a deployment version to the newly created deployment:
version_template = ubiops.DeploymentVersionCreate(
    version=VERSION_NAME,
    environment=ENVIRONMENT_NAME,
    instance_type_group_name='16384 MB + 4 vCPU + NVIDIA Tesla T4',
    maximum_instances=1,
    minimum_instances=0,
    maximum_idle_time=600,  # = 10 minutes
    request_retention_mode='full'
)
core_instance.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)
At last, we upload our deployment code to the newly created deployment version:
deployment_archive = shutil.make_archive(DEPLOYMENT_DIR, 'zip', DEPLOYMENT_DIR)
core_instance.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=VERSION_NAME,
    file=deployment_archive
)
Let's wait for our deployment to be done!
ubiops.utils.wait_for_deployment_version(core_instance.api_client, PROJECT_NAME, DEPLOYMENT_NAME, VERSION_NAME)
Run deployment
Now it's time to use our deployment.
Let's define a function to create a request and a function to plot results:
!pip install matplotlib
import matplotlib.pyplot as plt
# function to create deployment requests
def create_request(core_instance, features, samples):
    data = {
        "n_features": features,
        "n_samples": 10**samples
    }
    request = core_instance.deployment_version_requests_create(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        version=VERSION_NAME,
        data=data
    )
    result_save = {
        "n_samples": data["n_samples"],
        "n_features": data["n_features"],
        **request.result
    }
    print(request.result)
    return result_save


def plot_graph(results, time_key, title, feature_list):
    plt.figure(figsize=(10, 10))
    plt.title(title)
    plt.xlabel("Number of samples")
    plt.ylabel("Time (s)")
    plt.xscale("log")
    for i, features in enumerate(feature_list):
        filtered_results = [result for result in results if result["n_features"] == features]
        n_samples = [result["n_samples"] for result in filtered_results]
        scikit_times = [result[f'scikit-{time_key}'] for result in filtered_results]
        cuml_times = [result[f'cuml-{time_key}'] for result in filtered_results]
        color = 'blue' if features == 5 else 'red'
        plt.plot(n_samples, scikit_times, label=f"Scikit-learn {features} features", linestyle="dashed", color=color)
        plt.plot(n_samples, cuml_times, label=f"CuML {features} features", linestyle="solid", color=color)
    plt.legend()
Let's call our function now and save the results:
features = [5, 50]
range_samples = range(4,8)
results = [create_request(core_instance, feature, n_samples) for n_samples in range_samples for feature in features]
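The list comprehension above iterates over the sample exponents in the outer loop and the feature counts in the inner loop, so it sends eight requests in total. The sketch below builds the same payloads locally without calling UbiOps, using itertools.product; build_payloads is a hypothetical helper added for illustration.

```python
from itertools import product

def build_payloads(features, sample_exponents):
    """Return request payloads in the order the tutorial sends them."""
    return [
        {"n_features": n_features, "n_samples": 10 ** exponent}
        for exponent, n_features in product(sample_exponents, features)
    ]

payloads = build_payloads([5, 50], range(4, 8))
print(len(payloads))   # 8
print(payloads[0])     # {'n_features': 5, 'n_samples': 10000}
print(payloads[-1])    # {'n_features': 50, 'n_samples': 10000000}
```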
We can proceed to plot the results now:
plot_graph(results, "train-time", "Training time", features)
plot_graph(results, "pred-time", "Prediction time", features)
plt.show()
As we can see in our newly made plots, using NVIDIA RAPIDS libraries greatly speeds up our training time on bigger datasets. The prediction time doesn't benefit greatly from GPU parallelization in this use case (as parallelization potential doesn't outweigh the extra GPU overhead), but this could very well be much different for other applications.
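To put a number on the difference, the speedup can be computed per request as the scikit-learn time divided by the cuML time. The sketch below assumes result dictionaries shaped like those returned by create_request; the helper name speedup and the sample values are made up for illustration and are not measurements.

```python
def speedup(result, time_key):
    """Return scikit-learn time divided by cuML time for one result."""
    cuml_time = result[f"cuml-{time_key}"]
    if cuml_time <= 0:
        raise ValueError("cuML time must be positive")
    return result[f"scikit-{time_key}"] / cuml_time

# Illustrative numbers only, not measured results
fake_result = {
    "n_samples": 10 ** 6,
    "n_features": 50,
    "scikit-train-time": 2.0,
    "cuml-train-time": 0.5,
}
print(speedup(fake_result, "train-time"))  # 4.0
```

A value above 1 means cuML was faster for that request; a value below 1 means the GPU overhead outweighed the gain.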
Conclusion
In this tutorial, we've made a Linear Regression model, improved the training time of this model greatly with NVIDIA RAPIDS and deployed a benchmark on UbiOps!
Don't hesitate to contact us for any further information or to see what we can do for you!
# Close the UbiOps Python client
api_client.close()