flashinfer.comm.vllm_init_custom_ar

flashinfer.comm.vllm_init_custom_ar(ipc_tensors: List[int], rank_data: Tensor, rank: int, full_nvlink: bool) → int

Initialize the vLLM custom all-reduce backend.

Parameters:
  • ipc_tensors (list[int]) – IPC pointers to the per-rank communication buffers.

  • rank_data (torch.Tensor) – Scratch tensor (one per rank) used for metadata exchange.

  • rank (int) – Current rank within the all-reduce world.

  • full_nvlink (bool) – True when every pair of ranks is connected via NVLink (enables the fully-NVLink-optimized kernel path).

Returns:

Opaque handle (fa) to be passed to subsequent vllm_ar calls.

Return type:

int
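
Example:

The following is a minimal sketch of how the call might be wrapped, not a definitive usage pattern. The wrapper name init_custom_ar_for_rank and its init_fn parameter are hypothetical and exist only for illustration. The range check assumes that ipc_tensors holds exactly one pointer per rank, which the parameter description suggests but does not state outright. Arguments are passed positionally in the order given by the signature above.

```python
from typing import Any, Callable, List, Optional


def init_custom_ar_for_rank(
    ipc_tensors: List[int],
    rank_data: Any,
    rank: int,
    full_nvlink: bool,
    init_fn: Optional[Callable[..., int]] = None,
) -> int:
    """Hypothetical wrapper: check the rank, then call vllm_init_custom_ar."""
    # Assumption: one IPC pointer per rank, so the list length is the world size.
    world_size = len(ipc_tensors)
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world size {world_size}")

    if init_fn is None:
        # Imported lazily so the check above can run on a machine without
        # FlashInfer or a GPU (for example in unit tests).
        from flashinfer.comm import vllm_init_custom_ar as init_fn

    # Returns the opaque integer handle described under "Returns".
    return init_fn(ipc_tensors, rank_data, rank, full_nvlink)
```

In a real multi-GPU process, ipc_tensors and rank_data would come from whatever buffer-setup and IPC-handle exchange the application already performs; that step is outside the scope of this sketch. Passing a stub as init_fn lets the argument check be tested without hardware.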