May 15, 2024 by [email protected]
In this guide, we will show you how to increase throughput for LLMs using batching, specifically with the vLLM library. We will explain some of the techniques it leverages and why they are useful, looking at the PagedAttention algorithm in particular. Our setup will achieve impressive performance results and […]
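As a taste of what the full guide covers, here is a minimal sketch of batched offline inference with vLLM, which applies continuous batching and PagedAttention under the hood; the model name and prompts are placeholder assumptions, not values from the guide.

```python
from vllm import LLM, SamplingParams

# Example prompts; vLLM schedules and batches them together internally.
prompts = [
    "Explain what PagedAttention does in one sentence.",
    "List two benefits of batching LLM requests.",
]

# Placeholder model name -- substitute the model you actually serve.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() accepts a list of prompts and returns one output per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```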