MLFlow example¶
On this page we will show you how to perform hyperparameter tuning and experiment tracking using MLFlow, and how to deploy the best-performing model as a UbiOps deployment. The model used in this tutorial looks at features of wine and tries to predict its quality. It is based on the example from the MLFlow documentation.
If you download and run this entire notebook after filling in your access token, the mlflow deployment will be deployed to your UbiOps environment. You can thus check your environment after running to explore. You can also check the individual steps in this notebook to see what we did exactly and how you can adapt it to your own use case.
We recommend running the cells step by step, as some cells can take a few minutes to finish. You can also run everything in one go and it will work, as long as you allow a few minutes for the individual deployments to build.
Installing the required packages¶
We will use several packages to create our model and deploy it.
!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install mlflow
!pip install ubiops
import os
import requests

# Download the MLflow project files (project definition, training script and data set)
os.mkdir('wine-model')

r = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20other%20tools/mlflow-example/wine-model/MLproject')
with open('wine-model/MLproject', 'wb') as f:
    f.write(r.content)

r = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20other%20tools/mlflow-example/wine-model/train.py')
with open('wine-model/train.py', 'wb') as f:
    f.write(r.content)

r = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20other%20tools/mlflow-example/wine-model/wine-quality.csv')
with open('wine-model/wine-quality.csv', 'wb') as f:
    f.write(r.content)
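The three downloads above follow the same pattern, so they can also be written as a loop. The sketch below is one possible way to do that using only the standard library. The helper names `fetch_bytes` and `download_files` are hypothetical and not part of the original notebook; the base URL and file names are the ones used above.

```python
import os
import urllib.request

# Base URL and file names are taken from the download cell above.
BASE_URL = ('https://storage.googleapis.com/ubiops/data/'
            'Integration%20with%20other%20tools/mlflow-example/wine-model')
FILES = ['MLproject', 'train.py', 'wine-quality.csv']


def fetch_bytes(url):
    """Download a URL and return its body; urlopen raises on HTTP errors."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_files(dest_dir, names=FILES, base_url=BASE_URL, fetch=fetch_bytes):
    """Hypothetical helper: save each named file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # unlike os.mkdir, safe to re-run
    saved = []
    for name in names:
        path = os.path.join(dest_dir, name)
        with open(path, 'wb') as f:
            f.write(fetch(f'{base_url}/{name}'))
        saved.append(path)
    return saved
```

Calling `download_files('wine-model')` would fetch the same three files. The `fetch` argument is only there so the download step can be swapped out, for example when testing without network access.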
Testing for the optimal parameters¶
We can do this in one of two ways:
- Manually via the command line
- Programmatically using Python
We will use the latter in this example because it can be automated and takes less time.
Testing best parameters¶
We use the mlflow package to run the training project once for each entry in a list of candidate settings, so that we can see which one performs best.
import mlflow

parameters = [
    {'alpha': 0.3, 'l1_ratio': 0.1},
    {'alpha': 0.2, 'l1_ratio': 0.7},
    {'alpha': 0.4, 'l1_ratio': 0.2},
    {'alpha': 0.5, 'l1_ratio': 0.7},
    {'alpha': 0.1, 'l1_ratio': 0.9},
    {'alpha': 0.2, 'l1_ratio': 0.2},
    {'alpha': 0.7},  # l1_ratio omitted: the project's default value is used
]
model_location = 'wine-model'

for param in parameters:
    print(f'Running with param = {param}')
    # Note: newer MLflow versions replace use_conda=False with env_manager='local'
    res = mlflow.run(model_location, parameters=param, use_conda=False)
    print(f'status={res.get_status()}')
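Instead of writing every combination by hand, the list of settings can also be generated. The following is a minimal sketch using `itertools.product`; the candidate values for `alpha` and `l1_ratio` are illustrative assumptions, not tuned recommendations.

```python
from itertools import product

# Candidate values are illustrative assumptions, not tuned recommendations.
alphas = [0.1, 0.3, 0.5]
l1_ratios = [0.2, 0.7]

# One dict per combination, in the same shape as the hand-written list above.
parameter_grid = [
    {'alpha': alpha, 'l1_ratio': l1_ratio}
    for alpha, l1_ratio in product(alphas, l1_ratios)
]

for param in parameter_grid:
    print(param)
```

The resulting `parameter_grid` can be passed through the same `mlflow.run` loop as `parameters`. Keep in mind that the number of runs grows with the product of the list lengths.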
Comparing the results¶
Start a terminal session in the mlflow-example folder and run the command below. Then open the MLFlow UI in your browser (by default it is served at http://127.0.0.1:5000).
mlflow ui
Selecting the optimal run¶
After the runs have finished, you can view each run of your model together with its metrics, and compare them to find the best configuration for your use case.
For this example we want to use the model with the lowest root mean square error (RMSE). Running the code in the cell below will find the ID of that run and copy the built model into our deployment folder.
import os
from shutil import copyfile

import mlflow
import pandas as pd

# Read the runs from mlflow as a pandas DataFrame, keeping only runs with an RMSE below 1
df = mlflow.search_runs(filter_string="metrics.rmse < 1")
# Fetch the run with the lowest RMSE
run = df.loc[df['metrics.rmse'].idxmin()]
run_id = run['run_id']
print(f'The optimal run id is {run_id}')
print(f'It had the parameters: alpha={run["params.alpha"]}, l1_ratio={run["params.l1_ratio"]}')
print(f'And RMSE: {run["metrics.rmse"]}')
# Make sure the deployment folder exists before copying the model into it
os.makedirs('mlflow_deployment_package', exist_ok=True)
copyfile(f'mlruns/0/{run_id}/artifacts/model/model.pkl', 'mlflow_deployment_package/model.pkl')
print('Model copied to the deployment!')
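The copy step above relies on the local `mlruns/0/<run_id>/artifacts/model/model.pkl` layout. A slightly more defensive version could check that the model file exists before copying it. The sketch below is one way to do that; `copy_best_model` is a hypothetical helper name, and the default experiment ID `'0'` is taken from the path used above.

```python
import shutil
from pathlib import Path


def copy_best_model(run_id, dest_dir, mlruns_root='mlruns', experiment_id='0'):
    """Hypothetical helper: copy a run's model.pkl into dest_dir.

    Assumes the local layout used above:
    <mlruns_root>/<experiment_id>/<run_id>/artifacts/model/model.pkl
    """
    source = (Path(mlruns_root) / experiment_id / run_id
              / 'artifacts' / 'model' / 'model.pkl')
    if not source.is_file():
        raise FileNotFoundError(f'No model found at {source}')
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the package folder if needed
    return Path(shutil.copyfile(source, dest / 'model.pkl'))
```

With the `run_id` found above, `copy_best_model(run_id, 'mlflow_deployment_package')` would perform the same copy, but fail with a clear message if the run did not log a pickled model at that location.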
Deployment steps¶
Now that we have the optimal model copied into our deployment folder, we will deploy it to our UbiOps environment.
API_TOKEN = "<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>" # Make sure this is in the format "Token token-code"
PROJECT_NAME = "<INSERT PROJECT NAME IN YOUR ACCOUNT>"
DEPLOYMENT_NAME = 'mlflow-deployment'
DEPLOYMENT_VERSION = 'v1'
# Import all necessary libraries
import shutil
import ubiops
client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN},
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)
# Create the deployment package folder if it does not exist yet
os.makedirs("mlflow_deployment_package", exist_ok=True)
%%writefile mlflow_deployment_package/deployment.py
"""
The file containing the deployment code is required to be called 'deployment.py' and should contain the 'Deployment'
class and 'request' method.
"""
import os
import pickle

import pandas as pd


class Deployment:

    def __init__(self, base_directory, context):
        """
        Initialisation method for the deployment. It can for example be used for loading modules that have to be kept in
        memory or setting up connections. Load your external model files (such as pickles or .h5 files) here.
        """
        print("Initialising the model")
        # Load the model from the deployment package directory
        model_file = os.path.join(base_directory, "model.pkl")
        with open(model_file, 'rb') as f:
            self.model = pickle.load(f)

    def request(self, data):
        """
        Method for deployment requests, called separately for each individual request.
        """
        print('Loading data')
        input_data = pd.read_csv(data['data'])

        print("Prediction being made")
        prediction = self.model.predict(input_data)

        # Writing the prediction to a csv for further use
        print('Writing prediction to csv')
        pd.DataFrame(prediction).to_csv('prediction.csv', header=['quality'], index_label='index')

        return {
            "prediction": 'prediction.csv',
        }
%%writefile mlflow_deployment_package/requirements.txt
pandas==1.4.2
scikit-learn==1.0.2
Create Deployment¶
# Create the deployment
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='MLFlow deployment',
    input_type='structured',
    output_type='structured',
    input_fields=[
        {'name': 'data', 'data_type': 'file'},
    ],
    output_fields=[
        {'name': 'prediction', 'data_type': 'file'},
    ],
    labels={'demo': 'mlflow-tutorial'}
)
api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)
# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    environment='python3.9',
    instance_type='512mb',
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800,  # = 30 minutes
    request_retention_mode='none'  # We don't need to store the requests for this deployment
)
api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)
# Zip the deployment package
shutil.make_archive('mlflow_deployment_package', 'zip', '.', 'mlflow_deployment_package')
# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='mlflow_deployment_package.zip'
)
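After the upload, building the deployment version takes a few minutes, so it can be useful to wait for the build to finish before making a request. The sketch below shows a generic polling helper; `wait_until` is a hypothetical name and not part of the UbiOps client. The commented usage assumes that the object returned by `api.deployment_versions_get` has a `status` attribute that becomes `'available'` once the build succeeds.

```python
import time


def wait_until(check, timeout=600, interval=10, sleep=time.sleep):
    """Call check() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Condition not met within {timeout} seconds')
        sleep(interval)


# Assumed usage with the objects defined earlier in this notebook:
# wait_until(lambda: api.deployment_versions_get(
#     project_name=PROJECT_NAME,
#     deployment_name=DEPLOYMENT_NAME,
#     version=DEPLOYMENT_VERSION,
# ).status == 'available')
```

A build can also end in a failed state, in which case this helper would simply run until the timeout. Checking for that explicitly is left out to keep the sketch short.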
Making a request and exploring further¶
You can now go to the Web App and take a look in the user interface at what you have just built. If you want, you can create a request to the mlflow deployment using the "dummy_data_to_predict.csv" file. The dummy data should contain the same wine feature columns that the model was trained on.
So there we have it! We have used MLFlow to train a machine learning model with several hyperparameter settings. Then we selected the best model and deployed it to UbiOps. You can use this notebook as a basis for your own deployments. Just adapt the code in the deployment package and alter the input and output fields as you wish, and you should be good to go.
For any questions, feel free to reach out to us via the customer service portal: https://ubiops.atlassian.net/servicedesk/customer/portals