flashinfer.comm.create_shared_buffer
- flashinfer.comm.create_shared_buffer(size_in_bytes: int, group: ProcessGroup | None = None) → List[int]
Allocate a buffer and share it across the process group via CUDA IPC.
The local rank performs cudaMalloc followed by cudaIpcGetMemHandle; other ranks open the resulting handle to obtain a pointer mapped into their address space.
- Parameters:
size_in_bytes (int) – Size, in bytes, of the buffer to allocate on each rank.
group (torch.distributed.ProcessGroup, optional) – Process group to exchange IPC handles across. Defaults to dist.group.WORLD.
- Returns:
One device pointer per rank: the entry at position rank is the local allocation, and the remaining entries are IPC-mapped pointers to the other ranks' buffers.
- Return type:
list[int]
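
Example

A minimal usage sketch of the return layout described above, not a definitive recipe. The helpers split_pointers and run_example are hypothetical names introduced here for illustration. The sketch assumes one process per GPU launched with torchrun (which sets LOCAL_RANK) and assumes the NCCL backend is suitable for the handle exchange; adjust both for your setup.

```python
import os


def split_pointers(pointers, rank):
    """Split the returned list into (local_ptr, peer_ptrs).

    Hypothetical helper: `pointers` is the list returned by
    create_shared_buffer, `rank` is this process's rank in the group.
    """
    local_ptr = pointers[rank]  # this rank's own allocation
    # Every other entry is a pointer mapped from a peer's IPC handle.
    peer_ptrs = {r: p for r, p in enumerate(pointers) if r != rank}
    return local_ptr, peer_ptrs


def run_example(size_in_bytes=1 << 20):
    """Hypothetical driver; call it from a script started with torchrun."""
    # Imported lazily so split_pointers stays usable without a GPU.
    import torch
    import torch.distributed as dist
    from flashinfer.comm import create_shared_buffer

    # Assumption: torchrun provides RANK, WORLD_SIZE and LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    rank = dist.get_rank()

    # group is omitted, so handles are exchanged across dist.group.WORLD.
    pointers = create_shared_buffer(size_in_bytes)
    local_ptr, peer_ptrs = split_pointers(pointers, rank)
    print(f"rank {rank}: local=0x{local_ptr:x}, peers={sorted(peer_ptrs)}")

    dist.destroy_process_group()
```

The returned values are raw device addresses as Python integers, so they are typically handed to a kernel or wrapper that expects pointers rather than used as tensors directly. Releasing the buffer is not shown here.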