Instantly scale AI and machine learning workloads on GPUs, on demand
Tagged: Collaborations
June 20, 2024
How UbiOps and Arize help you stay in control
LLMs are all the rage at the moment, and the APIs of closed-source models like GPT-4 have made it easier than ever to leverage the power of AI. However, for a lot of regulated industries these closed-source models are not an option. Luckily there […]
Tagged: Deploy your model, UbiOps
June 7, 2024
When Mistral released their Mistral 7B v0.2 model, it was claimed to be the most powerful 7B Large Language Model (LLM) of its time. Now Mistral has released a new version, Mistral 7B v0.3, which builds on the success of its predecessor. The model has an increased […]
Tagged: Functionality, LLM
May 28, 2024
Reducing inference costs for GenAI
Tagged: Functionality, Technology
May 22, 2024
In a previous article we showed how you can set up a Retrieval-Augmented Generation (RAG) framework for the Mistral-7B-v0.2 Instruct LLM using the UbiOps WebApp. In this article we'll go a step further and create a front-end for that setup using Streamlit, and we'll be using the UbiOps Python Client Library to set up the […]
May 15, 2024
In this guide, we will show you how to increase data throughput for LLMs using batching, specifically by utilizing the vLLM library. We will explain some of the techniques it leverages and show why they are useful. We will be looking at the PagedAttention algorithm in particular. Our setup will achieve impressive performance results and […]
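The teaser above previews throughput gains from batching. As a rough, hypothetical illustration of why batching helps (a toy stand-in, not vLLM's actual API or its PagedAttention implementation), the sketch below processes a request queue in fixed-size batches so that several prompts share each "forward pass":

```python
from typing import List, Tuple

def fake_generate(prompts: List[str]) -> List[str]:
    # Stand-in for a model call: one "forward pass" per batch,
    # regardless of how many prompts the batch contains.
    return [p.upper() for p in prompts]

def run_batched(queue: List[str], batch_size: int) -> Tuple[List[str], int]:
    """Process a request queue in batches, counting model calls."""
    results: List[str] = []
    calls = 0
    for i in range(0, len(queue), batch_size):
        batch = queue[i:i + batch_size]
        results.extend(fake_generate(batch))
        calls += 1
    return results, calls

requests = [f"prompt {i}" for i in range(10)]
out, calls = run_batched(requests, batch_size=4)
print(calls)  # 3 forward passes instead of 10 sequential ones
```

In a real server the expensive resource per call is GPU time and KV-cache memory; vLLM goes further with continuous batching and paged KV-cache blocks, but the accounting idea is the same.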
Tagged: Functionality
May 3, 2024
Introduction Optimizing inference is a machine learning (ML) engineer’s task. In a lot of cases, though, it tends to fall into the hands of data scientists. Whether you’re a data scientist deploying models as a hobby or whether you work in a team that lacks engineers, at some point you will probably have to start […]
Tagged: Deploy your model, LLM
April 25, 2024
What can you get out of this guide? In this guide, we explain how to: To successfully complete this guide, make sure you have: You’ll also need the following files: What is Llama 3 8B? Llama 3 is the most recent model of the Llama series developed by Meta. It comes in two sizes, the […]
Tagged: Technology, UbiOps
April 18, 2024
Large Language Models (LLMs) are trained on vast datasets sourced from the public internet. But these datasets of course do not include specific data points about your business or use case. Retrieval-Augmented Generation (RAG) addresses this by dynamically incorporating your data as context in a prompt to your LLM. This way there is no […]
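The core RAG move described above, retrieving your own documents and injecting them into the prompt, can be sketched in a few lines. This is a toy keyword-overlap retriever for illustration only, not the embedding-based vector search a production setup (or UbiOps' tutorials) would use:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Rank documents by naive word overlap with the query."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list) -> str:
    """Inject the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Our refund policy allows returns within 30 days.",
    "Office hours are 9 to 5 on weekdays.",
]
prompt = build_prompt("What is the refund policy?", docs)
```

Swapping the overlap score for embedding similarity against a vector store changes only `retrieve`; the prompt-assembly step, which is what makes it "retrieval-augmented", stays the same.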
What can you get out of this guide? In this guide, we explain how to: To successfully complete this guide, make sure you have: You’ll also need the following files which are available in the appendix: What is Gemma 7B? Gemma is the latest model series released by Google in February 2024. It comes in […]
Tagged: Product update, Uncategorized
April 9, 2024
On the 9th of April 2024 we released new functionality and made improvements to our UbiOps SaaS product. An overview of the changes is given below. Python client library version for this release: 4.4.0. CLI version for this release: 3.4.0. https://youtu.be/3sZdCpmX030 Port forwarding (beta) If you want to run processes in a deployment that […]
March 28, 2024
In this article, we will be creating a chatbot which is fine-tuned on custom documentation. We’ll use UbiOps—which is an AI deployment, serving and management platform—to fine-tune and deploy the instruction-tuned Mistral-7B model taken from Hugging Face. We’ll explain some of the methods used to fine-tune models, such as instruction tuning and domain adaptation, but […]
Tagged: Deploy your model, Functionality, Technology, UbiOps
March 19, 2024 (updated March 21, 2024)
Model deployment, or model serving, designates the stage in which a trained model is brought to production and made readily usable. A model-serving platform allows you to deploy and monitor your models hassle-free. Below is the MLOps dev cycle and how UbiOps can be used within that cycle. How UbiOps fits into the MLOps dev […]