PD disaggregation¶
xllm supports PD disaggregation deployment, which requires integration with our other open-source library xllm service.
xLLM Service Dependencies¶
First, download and install xllm service, similar to installing and compiling xllm:
git clone https://github.com/jd-opensource/xllm-service
cd xllm_service
git submodule init
git submodule update
etcd Installation¶
xllm_service compilation and operation depend on etcd.Use the installation script provided by etcd for installation. The default installation path provided by the script is /tmp/etcd-download-test/etcd. You can either manually modify the installation path in the script or manually migrate after running the script:
xLLM Service Compilation¶
Apply patch:
Then execute the compilation:Potential Errors
You may encounter installation errors related to boost-locale and boost-interprocess: vcpkg-src/packages/boost-locale_x64-linux/include: No such file or directory, /vcpkg-src/packages/boost-interprocess_x64-linux/include: No such file or directory Reinstall these packages using vcpkg:
PD Disaggregation Execution¶
Start etcd:
./etcd-download-test/etcd --listen-peer-urls 'http://localhost:2390' --listen-client-urls 'http://localhost:2389' --advertise-client-urls 'http://localhost:2391'
Start xllm service:
ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.0.1:12389" --http_server_port 28888 --rpc_server_port 28889 --tokenizer_path=/path/to/tokenizer_config_dir/
Taking Qwen2-7B as an example:
- Start Prefill Instance
- Start Decode Instance
Important notes:
-
For PD disaggregation when specifying NPU Device, the corresponding
device_ipis required. This is different for each device. You can see this by executing the following command on the physical machine outside the container environment. The value afteraddress_{i}=displayed is thedevice_ipcorresponding toNPU {i}. -
etcd_addrmust match theetcd_addrofxllm_service
The test command is similar to above. Note that the PORT in curl http://localhost:{PORT}/v1/chat/completions ... should be the port of the http_server_port of xLLM service.