Tracking training metrics on UbiOps

When you're training a new model on UbiOps, you might want to track some metrics as you go. This how-to covers how to track accuracy when running TensorFlow or PyTorch training code on UbiOps. We will cover the two necessary steps:

  1. Define a new metric in UbiOps
  2. Adjust your training code to log datapoints for this metric.

Defining a new metric in UbiOps

Before you can track any custom metric in UbiOps, the metric needs to be defined. You can either do this in the WebApp by navigating to Monitoring > Metrics > Create new metric, or via code. Let's walk through both.

When you click Create new custom metric in the WebApp, you will be redirected to a form where you need to provide the following information:

  • Name: the name of your custom metric. It must always begin with the prefix custom. (including the dot). This name is used as the title of the associated graphs later. In our example we can call our metric custom.accuracy.
  • Description (optional): the description of your custom metric.
  • Unit (optional): the unit of measurement for your custom metric. In our case we can use %.
  • Metric type: how this metric should be processed. We support Gauge and Delta metrics (see metrics for more details). For our example we can use Gauge.
  • Metric level (referred to as labels in the API): on what level do you plan to store the metric? In our case we need the metric on training run level.
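
The naming rule from the list above can be checked locally before you submit the form or call the API. The snippet below is a minimal sketch: `is_valid_custom_metric_name` is a hypothetical helper, not part of the UbiOps client library, and it only checks the `custom.` prefix described above. UbiOps may enforce additional rules that are not covered here.

```python
CUSTOM_PREFIX = "custom."  # prefix required for custom metric names


def is_valid_custom_metric_name(name: str) -> bool:
    """Hypothetical helper: True if the name starts with 'custom.' and has text after it."""
    return name.startswith(CUSTOM_PREFIX) and len(name) > len(CUSTOM_PREFIX)


print(is_valid_custom_metric_name("custom.accuracy"))  # True
print(is_valid_custom_metric_name("accuracy"))         # False
```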

Creating a custom metric

When you click Create, the metric is created and you can start logging data to it. Alternatively, you can do the same with the Python client library:

import ubiops

configuration = ubiops.Configuration()
# Configure API token authorization
configuration.api_key['Authorization'] = "Token <YOUR_API_TOKEN>"
api_client = ubiops.ApiClient(configuration)
core_api = ubiops.CoreApi(api_client)

project_name = 'project_name_example' # str
data = ubiops.MetricCreate(
    name='custom.accuracy',
    description='Accuracy of my model',
    metric_type='gauge',
    unit='%',
    labels=['request_id']  # store the metric per training run (request)
)

# Create the metric
api_response = core_api.metrics_create(project_name, data)

Custom metric labels

If you do not intend to tie your custom metric to deployments, pipelines or training runs, it is also possible to pass different labels. This can only be done when working with the client library or API directly.

Logging datapoints from your training code

Now that the metric has been created in UbiOps, we can start logging data to it from our training code. In essence, you just need to start a UbiOps metric client and use its log_metric function.

from ubiops.utils.metrics import MetricClient

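metric_client = MetricClient(project_name="your_project_name")
metric_client.start()  # start the client before logging any datapoints

# Log a single datapoint for the metric defined earlier
metric_client.log_metric(
    metric_name="custom.accuracy",
    labels={"request_id": "your_run_id"},
    value=your_metric_value
)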

Let's have a look at how this would work with TensorFlow and PyTorch code specifically.

TensorFlow example

To log the accuracy after each epoch, you can pass a custom callback through the callbacks argument of the Keras model.fit function. If you want to track the accuracy at a different interval, e.g. after each batch, you can override one of the other predefined callback methods. Have a look at the TensorFlow callbacks documentation for more details about working with TensorFlow callbacks.

import tensorflow as tf
from ubiops.utils.metrics import MetricClient

class UbiOpsCallback(tf.keras.callbacks.Callback):
    def __init__(self, context):
        super().__init__()
        self.context = context
        project_name = context["project"]

        self.metric_client = MetricClient(project_name=project_name)
        self.metric_client.start()  # start the client before logging

    def on_epoch_end(self, epoch, logs=None):
        """
        This function is called at the end of each epoch.

        :param epoch: the epoch number
        :param logs: the logs of the epoch
        """
        # "accuracy" is only present if the model was compiled with metrics=["accuracy"]
        accuracy = logs["accuracy"] * 100  # convert to percentage
        self.metric_client.log_metric(
            metric_name="custom.accuracy",
            labels={"request_id": self.context["id"]},
            value=accuracy
        )


def train(training_data, parameters, context):
    """
    This function is called by UbiOps.

    :param training_data: the training data
    :param parameters: the parameters
    :param context: the context
    """
    # TODO: Add TensorFlow training code that builds and compiles `model`,
    # then pass the callback to model.fit, for example:
    #     model.fit(..., callbacks=[UbiOpsCallback(context)])

PyTorch example

When we want to log our accuracy at the end of every epoch with PyTorch, we can simply call the log_metric function within the training loop.

import torch
from ubiops.utils.metrics import MetricClient

def train(training_data, parameters, context):
    """
    This function is called by UbiOps.

    :param training_data: the training data
    :param parameters: the parameters
    :param context: the context
    """
    project_name = context["project"]

    metric_client = MetricClient(project_name=project_name)
    metric_client.start()  # start the client before logging

    # TODO: Add PyTorch training code (model, data loading, number of epochs)

    for epoch in range(epochs):
        # TODO: Add training code
        # TODO: calculate your accuracy

        # metric_client is a local variable here, so there is no `self`
        metric_client.log_metric(
            metric_name="custom.accuracy",
            labels={"request_id": context["id"]},
            value=accuracy
        )
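
The "calculate your accuracy" step is left open in the example above. As a minimal sketch of one way to fill it in, the hypothetical helper `accuracy_percent` below computes the percentage of correct predictions from two plain Python sequences of class labels. It is an illustration only and not part of UbiOps or PyTorch; with PyTorch tensors you could pass `predictions.tolist()` and `targets.tolist()`.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Hypothetical helper: percentage of predictions that match their targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the predicted label equals the true label
    correct = sum(p == t for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


print(accuracy_percent([0, 1, 1, 2], [0, 1, 2, 2]))  # 75.0
```

The result is already a percentage, which matches the % unit chosen for custom.accuracy, so it can be passed directly as the value argument of log_metric.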

Viewing your data

Once you have defined your metric and adjusted your code to log it as described in the previous steps, you can view your custom metric data in the UbiOps WebApp. Simply navigate to the details page of your training run and click on the Metrics tab. The graph has a maximum resolution of one value per minute and is updated in real time while your training run is processing.
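
Because the graph shows at most one value per minute, logging much more often than that (for example after every batch) adds little visible detail. One possible way to limit the number of calls is a small throttle around the logging code. The sketch below is pure Python; `MinuteThrottle` is a hypothetical helper, and the 60-second default is an assumption derived from the graph resolution, not a UbiOps requirement.

```python
import time


class MinuteThrottle:
    """Hypothetical helper: allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float = 60.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock   # injectable clock, which makes the class easy to test
        self._last = None    # time of the last allowed action, None if never allowed

    def ready(self) -> bool:
        """Return True (and record the time) if enough time has passed since the last True."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


throttle = MinuteThrottle()
print(throttle.ready())  # True: the first call is always allowed
print(throttle.ready())  # False: less than 60 seconds have passed
```

Inside a training loop you could then guard the logging call with `if throttle.ready():` so that at most one datapoint per minute is sent.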

Custom training metrics