In September 2023 MistralAI released Mistral 7B, which it claims is the most powerful Large Language Model (LLM) of its size to date. As the name suggests, the model has 7.3 billion parameters, and it outperforms LLaMa 2 13B and LLaMa 1 34B on several benchmarks:
Mistral 7B benchmarked against LLaMa 1 & LLaMa 2 (MistralAI, 2023)
According to MistralAI, the model can be easily fine-tuned for any task. When fine-tuned on one of the free, publicly available instruction datasets on Huggingface, the resulting model (Mistral 7B Instruct) was able to outperform models with more parameters, like WizardLM-13B-v1.1:
In most cases, the more parameters, i.e., the larger the model, the better the performance. But the increase in size also comes with an increase in costs: a bigger model is more expensive to host and to run inference on, and since fine-tuning means that you need to (partially) re-train the model, the cost and resources required to adapt the pre-trained model to a downstream task increase too. That is why making use of a smaller model that gives the same performance can be a smart thing to do.
In this guide we’ll deploy the Mistral 7B model from Huggingface on UbiOps. UbiOps is a platform where you can deploy AI models, like LLaMa 2 or Stable Diffusion, to a production environment. In the background UbiOps ensures that your model is always available (with an uptime of 99.99%) and takes care of auto-scaling. Fast inference times are ensured by giving you the option to run your model on state-of-the-art hardware. Working with private data? Not a problem: UbiOps offers on-premise, hybrid, and cloud solutions for you to run your models on.
UbiOps makes it possible to deploy the Mistral 7B model in four steps. All we need to do is:
- Create a UbiOps account
- Create an environment for our deployment (the model) to run in
- Create a deployment for the Mistral 7B
- Make an API call to run the model
Create a project and deployment
In UbiOps you work in an organization, which contains (multiple) projects. In these projects you can then create multiple deployments, which are basically your containerized code. Deployments can also be chained together to create a pipeline.
If you haven’t already, go ahead and create an account. Then head over to the UbiOps WebApp and click on “Create new project”. You can give your project a unique name, or let UbiOps generate one for you.
Now that we have our project up and running, we can start building the environment our code will run in.
The code and the environment it runs in are handled separately on UbiOps. An environment consists of a base environment (OS, Python version, and a CUDA version) to which we can add additional dependencies, creating a custom environment. Environments on UbiOps are Docker containers; the deployment code we’ll upload later is loaded on top of the environment when an instance is started up. You can also reuse environments between different deployment versions or training experiments, which helps reduce build time.
Environments can be built in two ways:
- Explicitly: by going to the “Environments” tab and creating an environment there
- Implicitly: by adding environment files (requirements.txt and/or ubiops.yaml) to your deployment package. When these files are detected, UbiOps will build an environment for you.
An example of building an environment implicitly can be found in the guide for deploying a Stable Diffusion model. For this guide we’ll be building our environment explicitly. You can do this by going to the “Environments” tab on the left-hand side and clicking on the “+ Custom environment” button. Now we need to fill in some parameters:
| Field | Value |
|---|---|
| Name | mistral-environment (you can pick any name you like here) |
| Base environment | Ubuntu 22.04 + Python 3.10 + CUDA 11.7.1 |
| Custom dependencies | Download & upload this package |
After filling everything in we can click on the “Create” button below and UbiOps will start building our environment, which will take about ten minutes.
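The exact dependencies are defined by the package linked above, but to give an idea of what such a dependency package contains, a requirements.txt for running Mistral 7B with Huggingface’s transformers library typically looks something like this (the package selection here is an illustrative assumption, not the contents of the actual package):

```
# requirements.txt -- illustrative; use the versions from the provided package
torch
transformers
accelerate
sentencepiece
```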
A deployment is an object within UbiOps that serves your code to process data. Each deployment gets a unique API endpoint, which you can use to send requests to your deployment from another application or a front-end of some sort. You can, for example, create a front-end using Streamlit and connect it to your deployment to serve requests; an example of that can be found in the deploy LLaMa guide. For each deployment you need to define an input & output, so your deployment knows what kind of data it can expect. Each deployment consists of one or more versions. Each version has the same input & output as defined on deployment level, but the deployed code, environment, and other settings can differ.
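Concretely, the deployment we are about to create can be described by a template like the one below (shown as a plain Python dict; the field names follow UbiOps’s deployment definition, and the deployment name is just an example):

```python
# Illustrative template describing the Mistral 7B deployment in this guide.
# The deployment name is an example; input/output fields match the table below.
deployment_template = {
    "name": "mistral-7b",
    "input_type": "structured",
    "output_type": "structured",
    "input_fields": [{"name": "prompt", "data_type": "string"}],
    "output_fields": [{"name": "response", "data_type": "string"}],
}
```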
You can create a deployment by going to the “Deployment” tab on the left hand side of the WebApp, then click on the “Create” button. Now we’re prompted to fill in some fields again, such as the name and the input & output. For this guide you can use the following:
| Field | Value |
|---|---|
| Input | Name: prompt, data_type: string |
| Output | Name: response, data_type: string |
Now click on “Next: Create a version”. Here we will upload the deployment file that contains the code that downloads the model from Huggingface, and instructs UbiOps how to handle requests. We also need to select the hardware the model will run on here, and select the environment we previously created for the model to run in.
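The deployment file follows UbiOps’s deployment package format: a deployment.py containing a Deployment class, whose __init__ runs once when an instance starts up and whose request method runs for every incoming request. A minimal sketch for Mistral 7B might look like the following; the model id and generation parameters are illustrative assumptions, not the contents of the actual deployment file:

```python
# deployment.py -- minimal sketch of a UbiOps deployment for Mistral 7B.
# Heavy imports happen inside __init__ so they only run on instance start-up.

class Deployment:
    def __init__(self, base_directory, context):
        # Download the model from Huggingface once, at instance start-up.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # example model id
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.float16, device_map="auto"
        )

    def request(self, data):
        # "prompt" and "response" match the input & output fields defined above.
        inputs = self.tokenizer(
            data["prompt"], return_tensors="pt"
        ).to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)  # illustrative limit
        response = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return {"response": response}
```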
If you have GPU access, you can select any instance type with at least a 40 GB A100. For the code environment, select the mistral-environment we created earlier. Then scroll all the way down and click on the “Create” button, after which UbiOps will start deploying your code.
You can follow the progress of the building process by clicking on the “Logging” button, on top of the deployment or in the menu on the left hand side.
When the deployment version has finished building, it can handle requests. You can click on the “Create Request” button to create your first request. The prompt should be formatted in a specific way to leverage the instruction fine-tuning. You can use the example prompt in the code block below for your first request:
```
<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>
[INST] Do you have mayonnaise recipes? [/INST]
```
This prompt will generate the following response:
And there you have it!
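Besides the “Create Request” button in the WebApp, you can also send requests to the deployment’s API endpoint programmatically with the ubiops Python client (pip install ubiops). The sketch below builds a prompt in Mistral’s [INST] format and sends it; the project name, deployment name, and API token are placeholders you need to replace with your own:

```python
def build_prompt(history):
    # Wrap a list of (instruction, answer) turns in Mistral's [INST] format.
    # Pass None as the answer for the turn the model should complete.
    prompt = "<s>"
    for instruction, answer in history:
        prompt += f"[INST] {instruction} [/INST]"
        if answer is not None:
            prompt += f"{answer}</s> "
    return prompt


def send_request(prompt, project_name, deployment_name, api_token):
    # project_name, deployment_name, and api_token are placeholders.
    import ubiops  # pip install ubiops

    configuration = ubiops.Configuration()
    configuration.api_key["Authorization"] = api_token  # e.g. "Token abc123..."
    api = ubiops.CoreApi(ubiops.ApiClient(configuration))
    result = api.deployment_requests_create(
        project_name=project_name,
        deployment_name=deployment_name,
        data={"prompt": prompt},
    )
    return result.result["response"]
```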
Deploying an LLM is only one step in the life cycle of LLM-powered applications, also referred to as Large Language Model Operations (LLMOps). If you’re not sure what else you need to do, or need more information about LLMOps in general, have a look at our LLMOps guide. It covers everything there is to know about LLMOps and what you need to do to fully harness the power of an LLM in your day-to-day operations. We also released guides on how you can easily deploy other foundation models from Huggingface:
- LLaMa 2 with a customizable front-end
- Bert transformer model
- Deploy Stable Diffusion (a text to image model) to UbiOps
If you’re curious about how UbiOps can help you with deploying & training your AI models, please don’t hesitate to get in touch so we can have a chat about what we can do for you!