Training on UbiOps

In version 2.23.0, support was added for training on UbiOps. Training can be enabled in your project by navigating to the training page and clicking enable training. Once enabled, UbiOps prepares a training setup in your project that you can use to create training experiments and runs.

enable training in the UI

Experiments define the training setup that you will use, such as the environment it should run in, what it should be called, and what instance type you would like to use. Runs belong to experiments and are the actual training jobs. Runs thus specify what code is run, on what data, and with what parameters. Since all runs share the environment specified by the experiment, runs can be used for things like hyper-parameter tuning. If you want to compare training implementations across different frameworks (e.g. scikit-learn versus TensorFlow), you will need a separate experiment for each.

Creating experiments

You can create new experiments through the WebApp and the Python Client Library; the workflow is the same in both. When creating a new experiment you can configure the following settings:

  • Name: the name of your experiment.
  • Description: a description for your experiment.
  • Instance type: the instance type to use for all your runs. This is also where you can specify which GPU to use.
  • Environment: the code environment to run in, including the Python or R version and any dependencies you need for your runs.
  • Default bucket: the default bucket to use for any output files created by your training runs.
  • Environment variables: environment variables that you can access in your training code via os.environ.
  • Labels: key:value pair labels that help you categorize your experiments.

These settings work the same as they do for Deployment versions.

If you want to create a new environment to use in your experiment, see environments.

an example experiment

Below you can see a code example of creating an experiment with the client library.

import ubiops

configuration = ubiops.Configuration()
# Configure API token authorization
configuration.api_key['Authorization'] = "Token <YOUR_API_TOKEN>"

api_client = ubiops.ApiClient(configuration)
training_instance = ubiops.Training(api_client)

project_name = 'project_name_example' # str
experiment_name = 'experiment_name_example' # str

# Create experiment
api_response = training_instance.experiments_create(
    project_name=project_name,
    data={
        'name': experiment_name,
        'instance_type': '4096mb',
        'description': 'Training dummy experiment',
        'environment_name': 'python3-8',
        'default_bucket': 'default',
        'labels': {"type": "dummy"}
    }
)
print(api_response)

# Close the connection
api_client.close()

If you want to use a custom environment in your training experiment you can use the code snippet below to create it.

import ubiops

configuration = ubiops.Configuration()
# Configure API token authorization
configuration.api_key['Authorization'] = "Token <YOUR_API_TOKEN>"

api_client = ubiops.ApiClient(configuration)
core_api = ubiops.CoreApi(api_client)

project_name = 'project_name_example' # str
environment_name = 'environment_name_example' # str

data = ubiops.EnvironmentCreate(
    name=environment_name,
    base_environment='python3-8'
)

# Create environment
api_response = core_api.environments_create(project_name=project_name, data=data)
print(api_response)

# Close the connection
api_client.close()

Creating training runs

For every experiment, you can create multiple training runs. These runs are the actual code executions on a specific dataset. When creating a new training run you can configure the following settings:

  • Name: the name of your run.
  • Description: a description for your run.
  • Max timeout: a timeout after which the run is automatically stopped. This helps avoid unexpectedly long runs on expensive hardware.
  • Training code: the training code to run. This can be a single Python file or a zip containing at least a train.py and any other required files.
  • Training data: a file containing the training data for your run.
  • Parameters: any parameters you want to use in your training code. This can be any dictionary and can also be left empty.

An example training run

Below you can see a code example of creating a training run with the client library.

import ubiops

configuration = ubiops.Configuration()
# Configure API token authorization
configuration.api_key['Authorization'] = "Token <YOUR_API_TOKEN>"

api_client = ubiops.ApiClient(configuration)
training_instance = ubiops.Training(api_client)

project_name = 'project_name_example' # str
experiment_name = 'experiment_name_example' # str
run_name = 'run_name_example' # str

new_run = training_instance.experiment_runs_create(
    project_name=project_name,
    experiment_name=experiment_name,
    data=ubiops.ExperimentRunCreate(
        name=run_name,
        description='Trying out a run',
        training_code='./path/to/train.py',
        training_data='ubiops-file://default/training_data.zip',
        parameters={
            'nr_epochs': 15, # example parameters
            'batch_size': 32
        },
        timeout=14400
    )
)
print(new_run)

# Close the connection
api_client.close()

Training code format

The training code that you can use in your runs needs to fulfill the following criteria:

  • The training file needs to contain a function called train(training_data, parameters, context) that returns a dictionary containing the following fields: artifact, metadata, metrics, additional_output_files. All input and output parameters are optional.
  • The artifact entry of the returned dictionary cannot be null; the other fields can be left empty if you don't need them.

The training code can be passed as either a zip file containing a train.py and any other files you might want to use in your code, or as a single Python source file that can have any name.
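If your code spans multiple files, the zip can be assembled with Python's standard zipfile module. The sketch below creates the two files first so it is self-contained; in practice, train.py and the hypothetical helpers.py would already exist in your working directory:

```python
import zipfile
from pathlib import Path

# Hypothetical training code files; in practice these already exist
Path("train.py").write_text(
    "def train(training_data, parameters, context):\n"
    "    return {'artifact': 'model.bin'}\n"
)
Path("helpers.py").write_text("# helper utilities used by train.py\n")

# Package train.py plus any extra modules into a zip to pass as training_code
with zipfile.ZipFile("training_code.zip", "w") as zf:
    zf.write("train.py")
    zf.write("helpers.py")
```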

The training_data argument passed to your train function is a file path to the training data file. The parameters argument is a dictionary containing any additional parameters passed to the training code. If, for example, you want to be able to vary the batch size used in your code with every training run, it would make sense to pass that in the parameter dictionary and read the value from there, rather than hard-coding it. The context parameter contains additional metadata, identical to deployment versions.
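To make these arguments concrete, here is a minimal sketch of a train function that reads a hypothetical batch_size parameter from the parameters dictionary instead of hard-coding it:

```python
import json


def train(training_data, parameters, context):
    # training_data is a local file path to the uploaded training data
    print(f"Training data located at: {training_data}")

    # Read run-specific settings from the parameters dictionary
    # instead of hard-coding them ("batch_size" is a hypothetical example)
    batch_size = parameters.get("batch_size", 32)

    # ... train your model here and persist it to disk ...
    with open("model.json", "w") as f:
        json.dump({"batch_size": batch_size}, f)

    # Return the artifact plus any metadata and metrics you want to track
    return {
        "artifact": "model.json",
        "metadata": {"batch_size": batch_size},
        "metrics": {},
        "additional_output_files": []
    }
```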

The fields in the returned dictionary should have the following data types:

  • artifact: file
  • metadata: dictionary
  • metrics: dictionary
  • additional_output_files: array of files

This means that there are no predefined metrics for you to use. You can define your own metrics in your training run and return them in the metrics dictionary with a name and a value.

Below you can find example training code that trains a scikit-learn KNN classifier. It expects a parameter called test_size that controls the train/test split.

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from joblib import dump
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report


def train(training_data, parameters, context):

    print("Loading data")
    sample_data = pd.read_csv(training_data)

    X = sample_data.drop(["Outcome"], axis=1)
    y = sample_data.Outcome

    # Split data into train and test set
    test_size = parameters["test_size"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42, stratify=y)

    # Set up a KNN classifier with k neighbors
    knn = KNeighborsClassifier(n_neighbors=7)

    # Fit the model on training data
    knn.fit(X_train, y_train)

    # Get accuracy on test set. Note: In case of classification algorithms score method represents accuracy.
    accuracy = knn.score(X_test, y_test)
    print('KNN accuracy: ' + str(accuracy))

    # Let us get the predictions using the classifier we trained above
    y_pred = knn.predict(X_test)

    # Output classification report
    print('Classification report:')
    print(classification_report(y_test, y_pred))

    # Persisting the model artifact
    dump(knn, 'knn.joblib')

    return {
        "artifact": "knn.joblib",
        "metadata": {},
        "metrics": {"accuracy": accuracy},
        "additional_output_files": []
    }

Passing training data to your run

Training data needs to be passed to your training run in the form of a UbiOps file. This means that currently you are limited to using single files, for instance zipped datasets, as training data. If you would rather read directly from your own S3 bucket, or if your data is stored in a relational database rather than a file, you can leave the training_data field empty and read the data directly in your code. In that scenario, it's important to pass any required credentials to your experiment as secret environment variables, to avoid hard-coding them.
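As a sketch of that pattern, the train function below reads a credential from an environment variable instead of using the training_data argument. DB_PASSWORD is a hypothetical secret environment variable configured on the experiment, and the database logic is left as a placeholder:

```python
import os


def train(training_data, parameters, context):
    # training_data will be None when no file is passed to the run.
    # DB_PASSWORD is a hypothetical secret environment variable set on the
    # experiment; replace it with whatever credential your data source needs.
    db_password = os.environ["DB_PASSWORD"]

    # ... use db_password to connect to your database and load the data ...
    print("Credentials loaded, connecting to database")

    return {
        "artifact": "model.joblib",  # the artifact still cannot be null
        "metadata": {},
        "metrics": {},
        "additional_output_files": []
    }
```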