Deploy Mixtral in under 15 minutes using UbiOps

What can you get out of this guide?

In this guide, we explain how to:

  • Create a UbiOps account
  • Create a code environment
  • Create your Mixtral deployment
  • Create a deployment version
  • Make an API call to the model!

To successfully complete this guide, make sure you have:

  • UbiOps account (see below)
  • Python 3.10 or higher installed
  • UbiOps Client Library installed
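
If you still need the client library, it installs with pip. Below is a minimal connection check as a sketch; the API token is a placeholder for one you create yourself in the UbiOps WebApp:

```python
# Install first with: pip install ubiops
import ubiops

# Paste a token created in the UbiOps WebApp (keep the 'Token ' prefix)
configuration = ubiops.Configuration()
configuration.api_key['Authorization'] = 'Token <YOUR_API_TOKEN>'

api_client = ubiops.ApiClient(configuration)
core_api = ubiops.CoreApi(api_client)
print(core_api.service_status())  # quick check that the API is reachable
```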

You’ll also need the following files:

  • The environment package (uploaded when creating the code environment)
  • The Mixtral deployment file (uploaded when creating the deployment version)

What is Mixtral?

Mixtral is a state-of-the-art model developed by MistralAI. It’s open source and highly performant. Its architecture is based on a technique called Mixture of Experts (MoE), which MistralAI adapted for Mixtral of Experts. This means that the model’s output is a combination of several “experts,” which are in fact distinct neural networks; in this case, smaller Mistral-7B models. Each “expert” is specialized to deal with certain kinds of prompts and is best suited to different cases. Their outputs are combined into a more refined result based on how each expert was specialized. This technique is purported to “achieve the same quality as its dense counterpart much faster during pretraining,” as well as pulling off “much faster inference compared to a dense model with the same number of parameters.”
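
To make the routing idea concrete, here is a toy sketch of top-2 expert routing in plain Python/NumPy. It is illustrative only: the shapes, the gating network, and the “experts” are made up, and real Mixtral layers apply this routing per token inside each transformer block rather than to a whole prompt:

```python
# Toy sketch of Mixture of Experts routing (illustrative, not Mixtral's code)
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token, experts, gate_weights, k=2):
    """token: input vector; experts: list of callables; gate_weights: (n_experts, dim)."""
    scores = softmax(gate_weights @ token)         # one score per expert
    top_k = np.argsort(scores)[-k:]                # route only to the k best experts
    weights = scores[top_k] / scores[top_k].sum()  # renormalize over selected experts
    # The layer output is the gate-weighted combination of the chosen experts
    return sum(w * experts[i](token) for w, i in zip(weights, top_k))

# Toy usage: 8 "experts" with top-2 routing, mirroring Mixtral-8x7B's setup
rng = np.random.default_rng(0)
dim, n_experts = 16, 8
experts = [lambda x, W=rng.normal(size=(dim, dim)): W @ x for _ in range(n_experts)]
gate = rng.normal(size=(n_experts, dim))
out = moe_layer(rng.normal(size=dim), experts, gate, k=2)
print(out.shape)  # (16,)
```

Because only k experts run per token, inference cost scales with k rather than with the total number of experts, which is where the “much faster inference” claim comes from.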

This model, which uses the architecture defined in the Mixtral of Experts research paper, is called Mixtral-8x7B (or simply Mixtral) and is one of the most performant large language models (LLMs) to date. It performs exceptionally well on several benchmarks and surpasses larger models, i.e., models with a higher number of parameters. For instance, as shown in the figures below, Mixtral is superior or equivalent to LLaMa 2 70B in terms of performance, while being much cheaper to train and run.

Source: Figure 2 in Mixtral of Experts

Source: Table 2 in Mixtral of Experts

In terms of the ratio between cost and quality, Mixtral is superior to LLaMa 2 models across most benchmarks. If you want to learn more about how to interpret these results, read Which LLM to choose for your use case.

The Mixtral model we will be deploying in this guide, mixtral-8x7b-instruct, can be found on Hugging Face.

What is UbiOps?

UbiOps is a powerful AI model serving and orchestration service with unmatched simplicity, speed and scale. UbiOps minimizes the DevOps time and cost of running, training, and managing AI models, and distributes them on any compute infrastructure at scale. It’s built for training, deploying, running and managing production-grade AI in an agile way. It features unique functionality for workflow orchestration (Pipelines), automatic adaptive scaling in hybrid or multi-cloud environments, as well as key MLOps features. You can learn more about UbiOps features on our Product page. We also have dedicated guides on how to deploy Mistral-7B, BERT, LLaMa 2, and Falcon using UbiOps.

How to deploy Mixtral on UbiOps

The first step is to create a UbiOps account.

In UbiOps you work within an organization, which can contain one or more projects. Within these projects you can create multiple deployments, which are essentially containerized versions of your code. You can also chain deployments together to create a pipeline.

Create a project

Head over to the UbiOps WebApp and click on “Create new project.” You can give your project a unique name, or let UbiOps generate one for you.
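
If you prefer to script this step, the project can also be created through the client library. A hedged sketch, reusing the token setup from earlier; the ProjectCreate fields shown (name, organization_name) are assumptions based on the client’s general pattern, so check the client reference before relying on them:

```python
import ubiops

configuration = ubiops.Configuration()
configuration.api_key['Authorization'] = 'Token <YOUR_API_TOKEN>'
core_api = ubiops.CoreApi(ubiops.ApiClient(configuration))

# Create the project under your organization (field names are assumptions;
# consult the UbiOps client reference for the exact schema)
core_api.projects_create(
    project=ubiops.ProjectCreate(
        name='mixtral-guide',
        organization_name='<your-organization>',
    )
)
```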

Now that we have our project up and running we can start building the environment that our code will run in. 

Create a code environment

There are two ways to create an environment in UbiOps: 

  • Explicitly: by going to the “Environments” tab and creating an environment there
  • Implicitly: by adding environment files (requirements.txt and/or ubiops.yaml) to your deployment package. When these files are detected, UbiOps will build an environment for you (see the example requirements.txt after this list).

See our documentation page for more information on environments.
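
To illustrate the implicit route, a requirements.txt for a transformers-based Mixtral deployment might look something like the sketch below. The package list is an assumption for illustration; the environment package provided with this guide is the authoritative dependency list.

```
# Illustrative requirements.txt (assumed packages; the guide's environment
# package is the real dependency list)
torch
transformers
accelerate      # needed for device_map='auto' placement on the GPU
bitsandbytes    # enables 4-bit quantized loading
```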

For this guide we’ll be building our environment explicitly. You can do this by going to the “Environments” tab on the left hand side, and clicking on “+ Custom environment.” Then, fill in the following parameters:

  • Name: mixtral-environment
  • Base environment: Ubuntu 22.04 + Python 3.10 + CUDA 11.7.1
  • Custom dependencies: Upload the environment package

Version labels are optional. 

Then click on the “Create” button below and UbiOps will start building your environment (this should only take a few minutes).

Create your Mixtral deployment

Now you can navigate to the “Deployments” tab on the left and click on “Create.” In the following menu, you can define the name of the deployment as well as its input(s) and output(s). The input and output fields of the deployment define what data the deployment expects when making a request (i.e. when running the model). For this guide you can use the following:

  • Name: mixtral-8x7b-instruct
  • Input: Name: prompt, Data Type: string
  • Output: Name: response, Data Type: string

After providing that information, UbiOps will generate Python and R deployment code snippets that can be downloaded or copied. These are used to create the deployment package. For this guide, we will be using Python.
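
For a sense of what the deployment package contains, here is a minimal sketch of a deployment.py. The Deployment class with an __init__ (runs once at startup) and a request method (runs per request) is UbiOps’ standard structure; the transformers calls, model id, quantized loading, and generation settings below are illustrative assumptions rather than the exact contents of this guide’s deployment file:

```python
# Minimal sketch of a UbiOps deployment.py for Mixtral (illustrative)
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = 'mistralai/Mixtral-8x7B-Instruct-v0.1'  # model id on Hugging Face


class Deployment:

    def __init__(self, base_directory, context):
        # Runs once when the deployment instance starts: download and load
        # the model. load_in_4bit quantizes the weights so the model fits on
        # the A100 40GB instance (requires the bitsandbytes package).
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        self.model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            device_map='auto',   # place layers on the available GPU(s)
            load_in_4bit=True,
        )

    def request(self, data):
        # Runs for every request: reads the 'prompt' input field and returns
        # the 'response' output field, matching the deployment definition above
        inputs = self.tokenizer(data['prompt'], return_tensors='pt')
        inputs = inputs.to(self.model.device)
        output = self.model.generate(**inputs, max_new_tokens=256)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return {'response': text}
```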

To finish creating your deployment, click “Next: Create a version.”

Create a deployment version

Upload the Mixtral deployment file, which contains the code that retrieves the Mixtral model from Hugging Face. In the environment settings, select mixtral-environment from the “Select code environment” dropdown menu.

Then, select the hardware the model will run on. For this deployment, you’ll need at least the ‘76000MB + 11 vCPU + NVIDIA A100 40GB’ instance type. Due to Mixtral’s size, the model is quantized in this deployment so that it fits within the instance’s GPU memory.

When you are happy with your settings, click on “Create” and UbiOps will get straight to work building your deployment version. 

Once that’s done, your model will be ready for action!

How to run a Mixtral model on UbiOps

Navigate to your Mixtral deployment version and click on the “Create Request” button to create your first request. For this model, your input could be “tell me a fun fact about bears.”
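
You can also make the same request programmatically, which is the API call promised at the start of this guide. Below is a minimal sketch using the client library and the same token setup as before; that result.result holds the output dict is an assumption worth double-checking against the client reference:

```python
import ubiops

configuration = ubiops.Configuration()
configuration.api_key['Authorization'] = 'Token <YOUR_API_TOKEN>'
core_api = ubiops.CoreApi(ubiops.ApiClient(configuration))

# Create a direct request to the deployment and wait for the result
result = core_api.deployment_requests_create(
    project_name='<your-project-name>',
    deployment_name='mixtral-8x7b-instruct',
    data={'prompt': 'tell me a fun fact about bears'},
)
print(result.result['response'])  # the deployment's 'response' output field
```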

How easy was that?

Conclusion

And there we have it!

Your very own Mixtral deployment, hosted and served on UbiOps. All in under 15 minutes, without needing an enterprise-grade data center.

Naturally, there are further optimizations that can be made to the code to get your deployment running as fast as possible every time. We left these out of scope for this guide – but we invite you to iterate and improve your own deployment!

Having completed this guide, you may now be wondering how to fine-tune your LLM, implement RAG, or build a chatbot front-end. For more guides and tutorials, head over to the UbiOps blog. Or, for guidance on UbiOps features, check out our documentation.

If you’d like us to write about something specific, just shoot us a message or start a conversation in our Slack community. The UbiOps team would love to help you bring your project to life!

Thanks for reading! 
