flashinfer.comm.create_shared_buffer

flashinfer.comm.create_shared_buffer(size_in_bytes: int, group: ProcessGroup | None = None) → List[int]

Allocate a buffer and share it across the process group via CUDA IPC.

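Each rank allocates its own buffer with cudaMalloc and exports it with cudaIpcGetMemHandle. The handles are then exchanged across the process group, and each rank opens its peers' handles with cudaIpcOpenMemHandle to obtain pointers to their buffers mapped into its own address space.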
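The exchange described above can be modeled in plain Python. The sketch below is a single-process simulation, not CUDA code: SimProcess, simulate_create_shared_buffer, and the tuple-valued handles are hypothetical stand-ins for cudaMalloc, cudaIpcGetMemHandle, cudaIpcOpenMemHandle, and the cross-rank exchange, assuming every rank allocates one buffer of size_in_bytes. It only illustrates how each rank ends up with a list whose own slot holds its local allocation and whose other slots hold locally mapped aliases of the peers' buffers.

```python
from typing import Dict, List, Tuple

# Hypothetical IPC handle: (owner_rank, owner's local pointer). Not a real CUDA type.
Handle = Tuple[int, int]


class SimProcess:
    """One simulated rank with a private address space (a model, not CUDA)."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self._cursor = (rank + 1) << 32  # disjoint fake address range per process
        self.mappings: Dict[int, Handle] = {}  # local address -> backing buffer

    def _reserve(self, size: int) -> int:
        ptr = self._cursor
        self._cursor += size
        return ptr

    def malloc(self, size: int) -> int:
        # Stand-in for cudaMalloc: a new buffer owned by this rank.
        ptr = self._reserve(size)
        self.mappings[ptr] = (self.rank, ptr)
        return ptr

    def get_handle(self, ptr: int) -> Handle:
        # Stand-in for cudaIpcGetMemHandle: a token that peers can open.
        return self.mappings[ptr]

    def open_handle(self, handle: Handle, size: int) -> int:
        # Stand-in for cudaIpcOpenMemHandle: map a peer's buffer at a new local
        # address. (The size argument exists only to keep fake ranges disjoint.)
        ptr = self._reserve(size)
        self.mappings[ptr] = handle
        return ptr


def simulate_create_shared_buffer(
    world_size: int, size_in_bytes: int
) -> Tuple[List[SimProcess], List[List[int]]]:
    procs = [SimProcess(r) for r in range(world_size)]
    # 1. Every rank allocates and exports its own buffer.
    local_ptrs = [p.malloc(size_in_bytes) for p in procs]
    handles = [p.get_handle(ptr) for p, ptr in zip(procs, local_ptrs)]
    # 2. The handles are exchanged, so every rank sees all of them.
    # 3. Each rank keeps its own pointer and opens every peer's handle.
    results: List[List[int]] = []
    for p in procs:
        ptrs = [
            local_ptrs[p.rank] if owner == p.rank else p.open_handle(h, size_in_bytes)
            for owner, h in enumerate(handles)
        ]
        results.append(ptrs)
    return procs, results
```

In the simulation, entry j of rank r's list is a different numeric address from rank j's own pointer, yet both resolve to the same backing buffer, which mirrors why the returned pointers are only meaningful inside the process that received them.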

Parameters:
  • size_in_bytes (int) – Size of the buffer to allocate per rank.

  • group (torch.distributed.ProcessGroup, optional) – Process group across which the IPC handles are exchanged. If None (the default), dist.group.WORLD is used.

Returns:

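A list of device pointers, one per rank, indexed by rank: the entry at the caller's own rank is its locally allocated buffer, and every other entry is the corresponding peer's buffer mapped into the caller's address space via CUDA IPC.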

Return type:

list[int]
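Because the returned values are plain integer addresses, callers index the list by rank and add byte offsets themselves. The helper below is hypothetical (it is not part of flashinfer.comm) and assumes every rank's buffer is exactly size_in_bytes long, as passed to create_shared_buffer; it computes the address of a byte range inside a given rank's buffer and rejects ranges that would run past the end.

```python
from typing import List


def buffer_address(ptrs: List[int], peer: int, offset: int, nbytes: int,
                   size_in_bytes: int) -> int:
    """Hypothetical helper (not part of flashinfer.comm).

    Returns the address, valid only in the calling process, of the byte range
    [offset, offset + nbytes) inside rank `peer`'s buffer. `ptrs` is the list
    returned by create_shared_buffer; `size_in_bytes` is the per-rank size
    that was passed to it.
    """
    if not 0 <= peer < len(ptrs):
        raise IndexError(f"rank {peer} out of range for group of size {len(ptrs)}")
    if offset < 0 or nbytes < 0 or offset + nbytes > size_in_bytes:
        raise ValueError(
            f"range [{offset}, {offset + nbytes}) exceeds buffer of {size_in_bytes} bytes"
        )
    # Peer buffers are contiguous allocations, so an offset is a plain addition.
    return ptrs[peer] + offset
```

For example, a rank exchanging data in ring order could address its next neighbor's buffer with buffer_address(ptrs, (rank + 1) % len(ptrs), 0, nbytes, size_in_bytes).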