flashinfer.comm.mixed_comm.run_mixed_comm

flashinfer.comm.mixed_comm.run_mixed_comm(op: MixedCommOp, handler: MixedCommHandler, x_in: Tensor, x_out: Tensor | None = None, mode: MixedCommMode | None = None) → Tensor

Execute a mixed communication operation.

This is the main entry point for running communication collectives through the mixed communication handler. It supports fused GPU kernels (using virtual memory within a node and NVSHMEM across nodes), NCCL-based fallbacks, and autotuned mode selection.
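The interplay between an explicit mode, autotuning, and the NCCL fallback can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: `SketchMode`, `resolve_mode`, and all of its parameters are hypothetical names, and the case where autotuning is enabled but has produced no choice is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class SketchMode(Enum):
    """Hypothetical stand-in for MixedCommMode; the real members may differ."""
    FUSED = auto()
    NCCL = auto()


def resolve_mode(
    mode: Optional[SketchMode],
    autotune_enabled: bool,
    autotuned_mode: Optional[SketchMode] = None,
    nccl_fallback: SketchMode = SketchMode.NCCL,
) -> SketchMode:
    """Pick an execution mode following the documented precedence."""
    # An explicitly requested mode is used as given.
    if mode is not None:
        return mode
    # With mode=None, use the autotuner's choice when autotune is enabled.
    # Assumption: if no tuned choice exists yet, fall through to NCCL.
    if autotune_enabled and autotuned_mode is not None:
        return autotuned_mode
    # Otherwise fall back to an NCCL-based mode.
    return nccl_fallback
```

Passing `mode=None` with autotuning disabled therefore resolves to the NCCL stand-in, while an explicit mode bypasses both autotuning and the fallback.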

Parameters:

op (MixedCommOp) – The communication operation to perform.
handler (MixedCommHandler) – An initialized MixedCommHandler.
x_in (torch.Tensor) – Input tensor. Must be at least 2-D and match the handler’s dtype and device.
x_out (torch.Tensor, optional) – Pre-allocated output tensor. Allocated automatically when None.
mode (MixedCommMode, optional) – Execution mode. When None, the mode is chosen by autotuning (if enabled); otherwise an NCCL mode is used.
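The requirements on x_in can be checked before the call with a small pre-flight helper. The sketch below is hypothetical and not part of flashinfer: `check_input` is an invented name, and because this page does not say how the handler exposes its dtype and device, the expected values are passed in explicitly. It relies only on the `ndim`, `dtype`, and `device` attributes that `torch.Tensor` provides, so a lightweight stand-in object is used for the demonstration.

```python
from types import SimpleNamespace


def check_input(x_in, expected_dtype, expected_device) -> None:
    """Raise ValueError if x_in violates the documented input requirements.

    expected_dtype / expected_device are whatever the caller configured the
    handler with (assumption: the caller tracks these values itself).
    """
    # The input must have at least two dimensions.
    if x_in.ndim < 2:
        raise ValueError(f"x_in must be at least 2-D, got {x_in.ndim}-D")
    # dtype and device must match the handler's configuration.
    # Compare like with like (e.g. torch.device to torch.device).
    if x_in.dtype != expected_dtype:
        raise ValueError(f"dtype mismatch: {x_in.dtype} != {expected_dtype}")
    if x_in.device != expected_device:
        raise ValueError(f"device mismatch: {x_in.device} != {expected_device}")


# Stand-in for a tensor, used here only so the sketch runs without a GPU.
demo = SimpleNamespace(ndim=2, dtype="bfloat16", device="cuda:0")
check_input(demo, "bfloat16", "cuda:0")  # passes silently
```

Failing early in the caller gives a clearer error message than a mismatch surfacing inside a collective that every rank is waiting on.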

Returns:

Output tensor containing the result of the collective operation.

Return type:

torch.Tensor