Deploy Falcon-7b-instruct in under 15 minutes using UbiOps

What can you get out of this guide?

In this guide, we explain how to deploy any version of Falcon that’s available on Hugging Face within 15 minutes.

For this guide we’ll need to do the following:

  • Create a UbiOps trial account
  • Create a coding environment for Falcon to run in
  • Create a deployment for the Falcon model

To successfully complete this guide, make sure you have:

  • Python 3.10 or higher installed
  • UbiOps Client Library installed
  • UbiOps account (see below)

You’ll also need the following files:

  • The environment package (used in the “Create a code environment” step)
  • The deployment package (used in the “Create your Falcon-7b-instruct deployment version” step)

What is Falcon-7b-instruct?

Falcon is a causal decoder-only model developed by TII. Falcon was built from scratch, using a custom data pipeline and a distributed training library. At the time of writing, TII offers several versions of Falcon on Hugging Face, which differ in model size, training data, and type (pre-trained or instruction/chat-tuned).

Screenshot of Falcon versions on the Hugging Face website

In this guide we’ll be deploying the Falcon-7B-instruct model, a fine-tuned version of the Falcon-7B model, which is claimed to outperform other open-source models of a similar size. The original 7B model was trained on 1.5T tokens from RefinedWeb and then further fine-tuned on a mixture of chat and instruct datasets to produce Falcon-7B-instruct.



What is UbiOps?

UbiOps is a powerful AI model serving and orchestration service with unmatched simplicity, speed and scale. UbiOps minimizes DevOps time and costs to run, train and manage AI models, and distributes them on any compute infrastructure at scale. It is built for training, deploying, running and managing production-grade AI in an agile way. It features unique functionality for workflow orchestration (Pipelines), automatic adaptive scaling in hybrid or multi-cloud environments as well as key MLOps features. You can learn more about UbiOps features on our Product page.

If you’ve made it this far into this guide, you probably get the point that Falcon-7b-instruct and UbiOps are a great match for building valuable Generative AI applications quickly, no matter what industry you’re in.

With that said, let’s get started!


How to deploy Falcon-7b-instruct on UbiOps

The first step is to create a free UbiOps account. Simply sign up with an email address and within a few clicks you will be good to go.

In UbiOps you work within an organization, which can contain one or more projects. Within these projects you can create multiple deployments, which are essentially your containerized code. You can also chain deployments together to create a pipeline.

Create a project

Head over to the UbiOps WebApp and click on “Create new project”. You can give your project a unique name, or let UbiOps generate one for you.

Now that we have our project up and running we can start building the environment that our code will run in. 


Create a code environment

See our documentation page for more information on environments.

For this guide we’ll be building our environment explicitly. You can do this by going to the “Environments” tab on the left hand side, and clicking on “+ Custom environment”. Then, fill in the following parameters:

  • Name: falcon-7b-instruct-env
  • Base environment: Ubuntu 22.04 + Python 3.10 + CUDA 11.7.1
  • Custom dependencies: Upload the environment package (also linked above)

Then click on the “Create” button below and UbiOps will start building your environment (this should only take a few minutes).
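For reference, the environment package is simply a zip archive containing a `requirements.txt` with the Python dependencies your code needs. The exact contents ship with the package linked above; the sketch below is only an illustration of the kind of dependencies Falcon-7B-instruct typically requires (unpinned versions, our assumption):

```text
# requirements.txt -- illustrative only; use the linked environment package
transformers
torch
accelerate
einops
```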



Create your Falcon-7b-instruct deployment

Now you can navigate to the “Deployments” tab on the left and click on “Create”. In the following menu, you can define the name of the deployment as well as its input(s) and output(s). The input and output fields of the deployment define what data the deployment expects when making a request (i.e. when running the model). For this guide you can use the following:

  • Name: falcon-7b-instruct
  • Input: Type: structured, Name: prompt, Data type: string
  • Output: Type: structured, Name: response, Data type: array of strings

After providing that information, UbiOps will generate Python and R deployment code snippets that can be downloaded or copied. These are used to create the deployment package. For this guide, we will be using Python.
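The generated Python snippet follows the UbiOps deployment package structure: a `deployment.py` file defining a `Deployment` class, whose `__init__` runs once at instance start-up (where you load the model) and whose `request` method runs for every incoming request. The sketch below is simplified so it runs anywhere; the real package linked above loads Falcon with the `transformers` library instead of the stub used here:

```python
class Deployment:
    def __init__(self, base_directory, context):
        # Runs once when the deployment instance starts: load the model here.
        # In a real Falcon deployment this would be roughly:
        #   from transformers import pipeline
        #   self.generator = pipeline("text-generation",
        #                             model="tiiuae/falcon-7b-instruct",
        #                             device_map="auto")
        # Stubbed out here so the sketch runs without downloading the model.
        self.generator = lambda prompt, **kwargs: [{"generated_text": prompt}]

    def request(self, data):
        # "prompt" matches the input field defined on the deployment.
        outputs = self.generator(data["prompt"], max_new_tokens=200)
        # "response" matches the output field: an array of strings.
        return {"response": [o["generated_text"] for o in outputs]}
```

Note how the keys in `data` and in the returned dict mirror the input and output fields defined in the deployment form above.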


To finish creating your deployment, scroll down and click “Next: Create a version”.



Create your Falcon-7b-instruct deployment version

Each deployment on UbiOps consists of one or more versions. All versions of a deployment share the same input and output fields, but other settings, like the base environment, memory allocation, and even the deployed code, can differ between versions.

For the version you can use the following parameters:

  • Version name: v1
  • Deployment package: Upload this file (also linked above)
  • Instance type: If your project has access to GPUs, select `16384 MB + 4 vCPU + NVIDIA Tesla T4`; otherwise choose the largest instance type shown.
  • Select code environment: falcon-7b-instruct-env (the one we created earlier)


How to run a Falcon-7b-instruct model on UbiOps

Navigate to your Falcon-7b-instruct deployment version and click on the “Create Request” button to create your first request. For this model, your input should be formatted as a string. An example of a request could be: 

“Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:”

Which generates the following response:

“Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron: Hello, Daniel. How are you?\nDaniel: Well… I’m not very good.\nGirafatron: What?\nDaniel: Well, there are so many problems at school! I’m really worried about it. I’m a good kid. I never get in trouble. I’m really good!”
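Besides the WebApp, you can also send the same request programmatically with the UbiOps Python client library (one of the prerequisites above). A minimal sketch, where the project name, API token, and deployment name are placeholders you substitute with your own values:

```python
def run_falcon_request(prompt, project_name, deployment_name, api_token):
    """Send a request to a Falcon deployment via the UbiOps client library.

    Requires `pip install ubiops` and a valid API token
    (passed in the form "Token <your-api-key>").
    """
    import ubiops  # imported lazily so the sketch stays self-contained

    configuration = ubiops.Configuration()
    configuration.api_key["Authorization"] = api_token
    client = ubiops.ApiClient(configuration)
    api = ubiops.CoreApi(client)

    # The data dict matches the deployment's input field: a string "prompt".
    result = api.deployment_requests_create(
        project_name=project_name,
        deployment_name=deployment_name,
        data={"prompt": prompt},
    )
    # The output field "response" holds an array of generated strings.
    return result.result["response"]
```

Exact method names may vary between client-library versions, so check the UbiOps client documentation for the release you installed.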

How easy was that?


Conclusion

And there we have it!

Our very own Falcon-7b-instruct chatbot, hosted and served on UbiOps. All in under 15 minutes, without needing a software engineer.

Naturally, there are further optimizations that can be made to the code to get your deployment running as fast as possible every time. We left these out of scope for this guide – but we invite you to iterate and improve your own deployment!

Deploying your model is only one of several steps necessary to successfully incorporate an LLM in your day-to-day business. Having completed this guide, you may now be wondering what other steps you need to take. If so, you can have a look at an article that we released recently that explains everything you need to know about Large Language Models Operations (LLMOps). 

LLMOps vs MLOps

LLMOps is similar to MLOps, but differs in some key areas which we clarified in a previous article. One of the key differences is that with LLMOps you’ll probably be making use of pre-trained models which are trained on vast amounts of (general) data, rather than training the model from scratch. For some use cases you’ll then need to adapt the pre-trained model to the specific task that you want to use it for, which can be done in multiple ways. We recently released an article about how you can fine-tune Falcon, which should help you get started.

Contact us! 

If you’d like us to create an example tutorial on how to deploy a specific AI model, just shoot us a message!  The UbiOps team would love to help you bring your project to life!

Thanks for reading!
