Continuous Scheduler

Feature Introduction

xLLM implements a scheduling strategy that supports continuous batching. Continuous batching is a dynamic batching strategy that does not wait for a batch to fill. Instead, it starts processing as soon as requests are available and keeps accepting new requests, adding them to the batch that is already executing. Because an incoming request does not have to wait for the current batch to finish, queueing latency drops while throughput stays high.
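The idea can be illustrated with a small simulation. This is a minimal sketch for intuition only, not xLLM's scheduler: the `Request` class, the `run_continuous_batching` function, and the `max_batch_size` parameter are hypothetical names, and each model step is assumed to produce exactly one token per running request.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request that needs `tokens_left` more model steps."""
    name: str
    arrival_step: int
    tokens_left: int


def run_continuous_batching(requests, max_batch_size):
    """Simulate step-level scheduling and return {name: finish_step}."""
    pending = deque(sorted(requests, key=lambda r: r.arrival_step))
    running = []
    finished = {}
    step = 0
    while pending or running:
        # Admit requests that have arrived, as long as the batch has room.
        # The batch does not need to be full, or empty, for this to happen.
        while (pending and pending[0].arrival_step <= step
               and len(running) < max_batch_size):
            running.append(pending.popleft())
        # One model step: every running request produces one token.
        for req in running:
            req.tokens_left -= 1
        step += 1
        # Retire completed requests right away so their slots are free
        # for the next step.
        for req in running:
            if req.tokens_left == 0:
                finished[req.name] = step
        running = [r for r in running if r.tokens_left > 0]
    return finished


if __name__ == "__main__":
    demo = [
        Request("A", arrival_step=0, tokens_left=4),
        Request("B", arrival_step=1, tokens_left=2),
        Request("C", arrival_step=2, tokens_left=3),
    ]
    print(run_continuous_batching(demo, max_batch_size=2))
    # {'B': 3, 'A': 4, 'C': 6}
```

In this run, B joins while A is still executing, and C takes B's slot as soon as B finishes rather than waiting for A to complete as well.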

Usage

xLLM uses the continuous batching scheduling strategy. The default scheduler is chunked prefill: enable_chunked_prefill is set to true by default.