Multi-model routing is the process of linking multiple AI models together. Models can be connected in series, where the output of one model becomes the input of the next, or in parallel, where a router sends each prompt to the specific model best suited to handle it.
Example of a simple multi-model route
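To make the difference between series and parallel routing concrete, here is a minimal sketch in plain Python. The two "models" are hypothetical stand-in functions invented for illustration, not real GenAI models, and the helper names `run_in_series` and `run_in_parallel` are our own assumptions rather than part of any library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A "model" here is just any function that maps a prompt to a response.
Model = Callable[[str], str]


def uppercase_model(prompt: str) -> str:
    """Hypothetical stand-in for a first model."""
    return prompt.upper()


def exclaim_model(prompt: str) -> str:
    """Hypothetical stand-in for a second model."""
    return prompt + "!"


def run_in_series(prompt: str, models: List[Model]) -> str:
    """Feed the output of each model into the next one."""
    result = prompt
    for model in models:
        result = model(result)
    return result


def run_in_parallel(prompt: str, models: List[Model]) -> List[str]:
    """Send the same prompt to every model and collect the results in order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(model, prompt) for model in models]
        return [future.result() for future in futures]


series_output = run_in_series("hello", [uppercase_model, exclaim_model])
parallel_output = run_in_parallel("hello", [uppercase_model, exclaim_model])
print(series_output)
print(parallel_output)
```

In the series case the second model only ever sees what the first model produced, while in the parallel case both models see the original prompt.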
Multi-model routing has several benefits. It lets you use smaller, more specialized models, which can be more efficient than a single large, general-purpose model. It also lets you process a wider variety of data, including images, sound, video, and text.
Multi-model routing is easy to achieve with UbiOps using our Pipelines feature. Pipelines let you create modular workflows by connecting deployments and operators to each other. Each pipeline has its own REST endpoint, so it behaves like a deployment in its own right. This means that you can create a containerized application which contains several GenAI models that are coupled and routed using logical operations.
There are several uses for multi-model routing. In this article, we will discuss three:
- Multimodality
- Prompt enhancement
- Expert routing
Multimodal LLMs
A major use of multi-model routing is to make your model multi-modal. When using OpenAI’s GPT-4, you are not actually using just one model; you are in fact using several different ones, each specialized in processing a different data type.
What is a multi-modal LLM? A multi-modal LLM is a group of models which are connected and work together to respond to a variety of different prompts. For instance, GPT-4 uses a dedicated model to respond to image prompts. Each model is specialized in processing a different data type: sound, video, text, or images.
With routing, you can respond to different requests differently and process a variety of data types, increasing the scope of GenAI use. Using UbiOps’ Pipelines feature, it’s easy to implement a variety of multi-modal architectures.
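As an illustration of routing by data type, the sketch below dispatches each request to a handler based on its modality. Everything in it is an assumption made for the example: the `Request` class, the `MODALITY_ROUTES` table, and the handler functions are hypothetical stand-ins, not UbiOps or OpenAI APIs. In a real setup, each handler would call a separate deployed model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    modality: str   # e.g. "text", "image", "audio"
    payload: bytes  # raw input data


# Hypothetical stand-ins for specialized models.
def text_model(payload: bytes) -> str:
    return f"text model handled {len(payload)} bytes"


def image_model(payload: bytes) -> str:
    return f"image model handled {len(payload)} bytes"


def audio_model(payload: bytes) -> str:
    return f"audio model handled {len(payload)} bytes"


# The routing table: one specialized model per data type.
MODALITY_ROUTES: Dict[str, Callable[[bytes], str]] = {
    "text": text_model,
    "image": image_model,
    "audio": audio_model,
}


def route_by_modality(request: Request) -> str:
    """Pick the model that matches the request's data type."""
    handler = MODALITY_ROUTES.get(request.modality)
    if handler is None:
        raise ValueError(f"unsupported modality: {request.modality!r}")
    return handler(request.payload)


print(route_by_modality(Request(modality="image", payload=b"abc")))
```

Adding support for a new data type then only requires a new handler and one extra entry in the routing table.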
Prompt enhancement
Another way to use multi-model routing is to place a dedicated prompt enhancement model in front of a content generation model to improve its output. Text-to-image models, for example, are sometimes coupled with a language model designed to enhance prompts.
What is a text-to-image model? A text-to-image model is a model which generates an image based on a prompt. A prominent example of a text-to-image model is Stable Diffusion. We have a guide on how to deploy Stable Diffusion on UbiOps.
What is a prompt enhancement model? A prompt enhancement model is a model specialized in re-arranging, re-wording, adding to, or removing from a prompt in order to maximize the quality of the output when the prompt is passed to a GenAI model. Taking Stable Diffusion as an example, there are models such as MagicPrompt-Stable-Diffusion which will enhance your prompt to make it more interpretable to Stable Diffusion.
The routing might look something like this:
Prompt enhancement multi-model routing
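In code, this kind of series routing could be sketched as follows. Both functions are hypothetical placeholders: `enhance_prompt` stands in for a prompt enhancement model such as MagicPrompt-Stable-Diffusion, and `generate_image` stands in for a text-to-image model such as Stable Diffusion. The hard-coded style hints are an assumption used only to keep the example self-contained.

```python
from typing import Dict


def enhance_prompt(prompt: str) -> str:
    """Stand-in for a prompt enhancement model.

    A real model would rewrite the prompt; here we just append fixed hints.
    """
    style_hints = ["highly detailed", "sharp focus"]
    return ", ".join([prompt.strip()] + style_hints)


def generate_image(prompt: str) -> Dict[str, object]:
    """Stand-in for a text-to-image model.

    Returns the prompt it received and an empty placeholder for image bytes.
    """
    return {"prompt_used": prompt, "image": b""}


def enhanced_text_to_image(prompt: str) -> Dict[str, object]:
    """Route the prompt through the enhancer first, then the image model."""
    return generate_image(enhance_prompt(prompt))


result = enhanced_text_to_image("a cat on a sofa")
print(result["prompt_used"])
```

The user only ever calls `enhanced_text_to_image`, so the enhancement step stays invisible to them, much like a pipeline that exposes a single endpoint.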
Routing between different “experts”
A third potential use of multi-model routing is to route a prompt between several expert models, each specialized in a certain domain. It is similar to the mixture-of-experts technique used by models like Mixtral, but is applied to more specific use cases.
What is an “expert” in GenAI? An expert in GenAI is a model which is specialized in a certain domain. In our case, this would be a specific topic. This use case is especially useful for broad but important fields such as medicine. It is often not cost-effective to fine-tune a large model; it is sometimes better to fine-tune several smaller models and route each prompt to the one covering the field needed to answer the question.
An expert routing system would look something like this:
Expert multi-model routing example
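A minimal sketch of such a router is shown below. It assumes a very simple keyword-matching rule for choosing the expert; in practice the router could itself be a classifier or a small language model. The domains, keywords, and expert functions are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical keyword sets used to decide which expert fits a prompt.
EXPERT_KEYWORDS: Dict[str, Set[str]] = {
    "cardiology": {"heart", "artery", "cardiac"},
    "dermatology": {"skin", "rash", "eczema"},
}


# Hypothetical stand-ins for fine-tuned expert models.
def cardiology_expert(prompt: str) -> str:
    return f"[cardiology expert] {prompt}"


def dermatology_expert(prompt: str) -> str:
    return f"[dermatology expert] {prompt}"


def general_model(prompt: str) -> str:
    return f"[general model] {prompt}"


EXPERTS: Dict[str, Callable[[str], str]] = {
    "cardiology": cardiology_expert,
    "dermatology": dermatology_expert,
    "general": general_model,
}


def pick_expert(prompt: str) -> str:
    """Return the domain whose keywords overlap most with the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {name: len(words & kws) for name, kws in EXPERT_KEYWORDS.items()}
    best = max(scores, key=lambda name: scores[name])
    # Fall back to a general model when no domain keyword matches.
    return best if scores[best] > 0 else "general"


def route_to_expert(prompt: str) -> str:
    """Send the prompt to the expert chosen by the router."""
    return EXPERTS[pick_expert(prompt)](prompt)


print(route_to_expert("What causes a skin rash?"))
```

The fallback to a general model matters: a router that always forces a prompt onto some expert would send off-topic questions to a model that was never fine-tuned for them.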
Conclusion
In this article, we discussed three ways to use multi-model routing: to create a multimodal LLM, to enhance prompts, and to route between different experts. We will be creating a tutorial on how to implement these in practice on the UbiOps platform!
If you are interested in LLMs, read our article about which LLM to use for your use case. If you are interested in fine-tuning, read our tutorial covering fine-tuning Mistral on your own documentation.