May 15, 2024 by [email protected]
In this guide, we will show you how to increase throughput for LLMs using batching, specifically with the vLLM library. We will explain some of the techniques it leverages and why they are useful, looking at the PagedAttention algorithm in particular. Our setup will achieve impressive performance results and […]
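As a taste of what the full guide covers, here is a minimal sketch of batched offline inference with vLLM, which applies continuous batching and PagedAttention under the hood; the model name and prompts are placeholder assumptions, not values from the guide.

```python
from vllm import LLM, SamplingParams

# Example prompts; vLLM schedules and batches them together internally.
prompts = [
    "Explain what PagedAttention does in one sentence.",
    "List two benefits of batching LLM requests.",
]

# Placeholder model name -- substitute the model you actually serve.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() accepts a list of prompts and returns one output per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```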