PD Disaggregation
xLLM supports PD (prefill/decode) disaggregated deployment, which requires integration with xLLM Service, another of our open-source projects.
xLLM Service Dependencies
First, download xLLM Service and its submodules, in the same way as when installing and compiling xllm:
git clone https://github.com/jd-opensource/xllm-service
cd xllm-service
git submodule init
git submodule update
etcd Installation
Compiling and running xllm_service both depend on etcd. Install etcd with the installation script provided by etcd. By default, the script installs to /tmp/etcd-download-test/etcd; you can either change the installation path in the script or move the files manually after running it:
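As a rough alternative to the official script, the bash sketch below downloads an etcd release tarball into the same default directory and can then copy the binaries elsewhere. It assumes a linux-amd64 host and the GitHub release URL layout used by etcd; the version passed in is only an example, since the version xllm_service expects is not stated here. etcd_release_url and install_etcd are hypothetical helper names.

```shell
# Hypothetical helpers, not part of etcd or xllm_service.
# ASSUMPTION: GitHub release asset layout etcd-<ver>-<platform>.tar.gz.

# Print the release URL for an etcd version and platform (default linux-amd64).
etcd_release_url() {
  local ver="$1" platform="${2:-linux-amd64}"
  echo "https://github.com/etcd-io/etcd/releases/download/${ver}/etcd-${ver}-${platform}.tar.gz"
}

# Download and unpack etcd into /tmp/etcd-download-test (the script's default),
# then optionally copy etcd and etcdctl into a destination directory.
install_etcd() {
  local ver="$1" dest="${2:-}" platform="${3:-linux-amd64}"
  local tmp_dir=/tmp/etcd-download-test
  local tarball="/tmp/etcd-${ver}-${platform}.tar.gz"

  rm -rf "$tmp_dir" && mkdir -p "$tmp_dir" || return 1
  curl -fL "$(etcd_release_url "$ver" "$platform")" -o "$tarball" || return 1
  # --strip-components=1 drops the top-level etcd-<ver>-<platform>/ folder.
  tar xzf "$tarball" -C "$tmp_dir" --strip-components=1 || return 1

  # "Migrate after running the script": copy the binaries to the chosen place.
  if [ -n "$dest" ]; then
    mkdir -p "$dest" && cp "$tmp_dir/etcd" "$tmp_dir/etcdctl" "$dest/" || return 1
  fi
  "$tmp_dir/etcd" --version
}
```

For example, `install_etcd v3.5.0 ./etcd-download-test` (v3.5.0 is an illustrative version) places the binaries where the start command in the execution section expects them.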
xLLM Service Compilation
Apply the patch:
Then run the compilation:
Potential Errors
You may encounter installation errors related to boost-locale and boost-interprocess, such as:
vcpkg-src/packages/boost-locale_x64-linux/include: No such file or directory
/vcpkg-src/packages/boost-interprocess_x64-linux/include: No such file or directory
If so, reinstall these packages with vcpkg:
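A minimal reinstall sketch, assuming vcpkg lives in ./vcpkg-src (inferred from the paths in the error messages) and uses the x64-linux triplet shown there. VCPKG, TRIPLET and reinstall_boost_ports are hypothetical names; point them at your own vcpkg checkout.

```shell
# Hypothetical helper. ASSUMPTION: vcpkg binary at ./vcpkg-src/vcpkg.
VCPKG="${VCPKG:-./vcpkg-src/vcpkg}"
TRIPLET="${TRIPLET:-x64-linux}"

reinstall_boost_ports() {
  local port
  for port in boost-locale boost-interprocess; do
    # Remove the broken copy first; a failure here (for example, the port is
    # not installed) is ignored. If other installed ports depend on it,
    # vcpkg refuses to remove it unless --recurse is given.
    "$VCPKG" remove "${port}:${TRIPLET}"
    # Reinstall; stop at the first port that fails to build.
    "$VCPKG" install "${port}:${TRIPLET}" || return 1
  done
}
```

Run `reinstall_boost_ports`, then repeat the compilation step.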
PD Disaggregation Execution
Start etcd:
./etcd-download-test/etcd --listen-peer-urls 'http://localhost:2390' --listen-client-urls 'http://localhost:12389' --advertise-client-urls 'http://localhost:12389'
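Since xllm_master_serving and every xLLM instance connect to etcd through --etcd_addr, it can help to wait until etcd answers before starting them. The sketch below polls `etcdctl endpoint health`; it assumes etcdctl sits next to the etcd binary in ./etcd-download-test (the etcd release tarball ships both), and ETCDCTL and wait_for_etcd are hypothetical names.

```shell
# Hypothetical helper. ASSUMPTION: etcdctl lives next to the etcd binary.
ETCDCTL="${ETCDCTL:-./etcd-download-test/etcdctl}"

# Poll once per second, up to RETRIES times (default 30), until etcd is healthy.
wait_for_etcd() {
  local endpoint="$1" retries="${2:-30}" attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    # "endpoint health" exits 0 once etcd is serving client requests.
    if "$ETCDCTL" --endpoints="$endpoint" endpoint health >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "etcd at $endpoint is not healthy after $retries attempts" >&2
  return 1
}
```

For example, run `wait_for_etcd http://127.0.0.1:12389` before starting xllm_master_serving; the port must be the etcd client port that every --etcd_addr points to.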
Start xllm service:
ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.0.1:12389" --http_server_port 28888 --rpc_server_port 28889 --tokenizer_path=/path/to/tokenizer_config_dir/
Then start the prefill and decode instances, taking Qwen2-7B as an example:
- Start Prefill Instance
/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
    --port=8010 \
    --devices="npu:0" \
    --master_node_addr="127.0.0.1:18888" \
    --enable_prefix_cache=false \
    --enable_chunked_prefill=false \
    --enable_disagg_pd=true \
    --instance_role=PREFILL \
    --etcd_addr=127.0.0.1:12389 \
    --transfer_listen_port=26000 \
    --disagg_pd_port=7777 \
    --node_rank=0 \
    --nnodes=1
- Start Decode Instance
/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
    --port=8020 \
    --devices="npu:1" \
    --master_node_addr="127.0.0.1:18898" \
    --enable_prefix_cache=false \
    --enable_chunked_prefill=false \
    --enable_disagg_pd=true \
    --instance_role=DECODE \
    --etcd_addr=127.0.0.1:12389 \
    --transfer_listen_port=26100 \
    --disagg_pd_port=7787 \
    --node_rank=0 \
    --nnodes=1
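The prefill and decode commands differ only in role, ports and device. The hypothetical helper below builds both from one function, so they always share the model path and --etcd_addr. It uses only the flags shown above; XLLM_BIN, MODEL_PATH, ETCD_ADDR and start_pd_instance are placeholder names for this sketch, not xLLM settings.

```shell
# Hypothetical helper; these variables are not read by xLLM itself.
XLLM_BIN="${XLLM_BIN:-/path/to/xllm}"
MODEL_PATH="${MODEL_PATH:-/path/to/Qwen2-7B-Instruct}"
ETCD_ADDR="${ETCD_ADDR:-127.0.0.1:12389}"   # must equal xllm_service's --etcd_addr

# Usage: start_pd_instance ROLE PORT DEVICE MASTER_PORT TRANSFER_PORT DISAGG_PORT
start_pd_instance() {
  if [ "$#" -ne 6 ]; then
    echo "usage: start_pd_instance ROLE PORT DEVICE MASTER_PORT TRANSFER_PORT DISAGG_PORT" >&2
    return 2
  fi
  "$XLLM_BIN" \
    --model="$MODEL_PATH" \
    --port="$2" \
    --devices="$3" \
    --master_node_addr="127.0.0.1:$4" \
    --enable_prefix_cache=false \
    --enable_chunked_prefill=false \
    --enable_disagg_pd=true \
    --instance_role="$1" \
    --etcd_addr="$ETCD_ADDR" \
    --transfer_listen_port="$5" \
    --disagg_pd_port="$6" \
    --node_rank=0 \
    --nnodes=1
}

# Same values as the two commands above, each started in the background:
# start_pd_instance PREFILL 8010 npu:0 18888 26000 7777 &
# start_pd_instance DECODE  8020 npu:1 18898 26100 7787 &
```

Keeping the shared values in variables makes it harder for the two instances to drift apart; only the per-instance ports and device change between calls.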
Important notes:
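- PD disaggregation requires reading the /etc/hccn.conf file. Make sure this file on the physical machine is mapped into the container.
- The etcd_addr of every xLLM instance must match the etcd_addr of xllm_service, and that address must be the host and port etcd listens on for clients (--listen-client-urls).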
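The first note can be checked before launching. The sketch below is a hypothetical pre-flight check, check_hccn_conf, that fails if the file is missing or empty inside the container. The docker run mapping in the comment is one common way to expose the host file and assumes a Docker-based setup; the image name is a placeholder.

```shell
# Hypothetical pre-flight check. One way to map the host file (Docker assumed):
#   docker run -v /etc/hccn.conf:/etc/hccn.conf:ro ... <your-xllm-image>

check_hccn_conf() {
  local conf="${1:-/etc/hccn.conf}"
  # -r: readable by this user; -s: exists and is non-empty.
  if [ -r "$conf" ] && [ -s "$conf" ]; then
    return 0
  fi
  echo "missing or empty $conf: map the host file into the container" >&2
  return 1
}
```

Calling `check_hccn_conf || exit 1` at the top of a launch script stops before any instance starts without the file.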
The test command is similar to the one above, except that PORT in curl http://localhost:{PORT}/v1/chat/completions ... must be the http_server_port of xLLM Service (28888 in the example above), not the port of an individual instance.
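A minimal request sketch, assuming the endpoint accepts an OpenAI-style chat body. The port defaults to 28888, the http_server_port from the xllm_master_serving command; MODEL_NAME (default Qwen2-7B-Instruct) is an assumption, so use the model name your deployment expects. SERVICE_PORT, MODEL_NAME, CURL and chat_via_service are hypothetical names.

```shell
# Hypothetical helper. ASSUMPTION: OpenAI-style /v1/chat/completions body.
SERVICE_PORT="${SERVICE_PORT:-28888}"    # xLLM Service http_server_port
MODEL_NAME="${MODEL_NAME:-Qwen2-7B-Instruct}"
CURL="${CURL:-curl}"

chat_via_service() {
  # The prompt is pasted into the JSON verbatim, so keep it free of
  # double quotes and backslashes.
  local prompt="$1"
  "$CURL" -s "http://localhost:${SERVICE_PORT}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}]}"
}
```

For example, `chat_via_service "Hello, who are you?"` sends one request through xLLM Service, which forwards it to the prefill and decode instances.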