Skip to content

Continuous Scheduler

Feature Introduction

xLLM implements a scheduling strategy that supports continuous batching. Continuous batching is a dynamic batching strategy that does not wait for a batch to be filled. Instead, it starts processing as soon as requests are available, while continuously accepting new requests and adding them to the currently executing batch. This approach significantly reduces latency while maintaining high throughput.

Usage

The continuous batching scheduling strategy is implemented in xLLM. If no other scheduling strategies are enabled, continuous batching is used by default.