flashinfer.comm.mixed_comm.run_mixed_comm
- flashinfer.comm.mixed_comm.run_mixed_comm(op: MixedCommOp, handler: MixedCommHandler, x_in: Tensor, x_out: Tensor | None = None, mode: MixedCommMode | None = None) → Tensor
Execute a mixed communication operation.
This is the main entry point for running communication collectives through the mixed communication handler. It supports fused GPU kernels (using virtual memory within a node and NVSHMEM across nodes), NCCL-based fallbacks, and autotuned mode selection.
- Parameters:
op (MixedCommOp) – The communication operation to perform.
handler (MixedCommHandler) – An initialized MixedCommHandler.
x_in (torch.Tensor) – Input tensor. Must be at least 2-D and match the handler’s dtype / device.
x_out (torch.Tensor, optional) – Pre-allocated output tensor. Allocated automatically when None.
mode (MixedCommMode, optional) – Execution mode. When None, uses autotune (if enabled) or falls back to an NCCL mode.
- Returns:
Output tensor containing the result of the collective operation.
- Return type:
torch.Tensor
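The argument rules described above (the 2-D and dtype / device requirement on x_in, and how a mode of None is resolved) can be sketched in plain Python. This is only an illustration of the documented contract, not FlashInfer's implementation: Mode, FakeTensor, FakeHandler, check_input, and resolve_mode are hypothetical stand-ins, and the enum member names and handler fields (autotune_enabled, tuned_mode) are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Mode(Enum):
    """Hypothetical stand-in for MixedCommMode; member names are invented."""
    FUSED = auto()
    NCCL = auto()


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor that carries only the metadata being checked."""
    shape: Tuple[int, ...]
    dtype: str
    device: str


@dataclass
class FakeHandler:
    """Hypothetical stand-in for MixedCommHandler; all fields are assumptions."""
    dtype: str
    device: str
    autotune_enabled: bool = False
    tuned_mode: Optional[Mode] = None


def check_input(handler: FakeHandler, x_in: FakeTensor) -> None:
    """Enforce the documented x_in rules: at least 2-D, same dtype and device."""
    if len(x_in.shape) < 2:
        raise ValueError("x_in must be at least 2-D")
    if x_in.dtype != handler.dtype or x_in.device != handler.device:
        raise ValueError("x_in must match the handler's dtype and device")


def resolve_mode(handler: FakeHandler, mode: Optional[Mode]) -> Mode:
    """Pick the execution mode in the order the documentation describes."""
    if mode is not None:
        # An explicit mode always wins.
        return mode
    if handler.autotune_enabled and handler.tuned_mode is not None:
        # Assumption: autotuning has already recorded a preferred mode.
        return handler.tuned_mode
    # Otherwise fall back to an NCCL mode.
    return Mode.NCCL
```

In this sketch an explicit mode is returned unchanged, an autotuned handler supplies its recorded choice, and every other case lands on the NCCL fallback. How the real handler stores or looks up autotune results is not stated in this reference.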