flashinfer.comm.mixed_comm.run_mixed_comm

flashinfer.comm.mixed_comm.run_mixed_comm(op: MixedCommOp, handler: MixedCommHandler, x_in: Tensor, x_out: Tensor | None = None, mode: MixedCommMode | None = None) → Tensor

Execute a mixed communication operation.

This is the main entry point for running communication collectives through a MixedCommHandler. It supports fused GPU kernels (virtual memory within a node, NVSHMEM across nodes), NCCL-based fallbacks, and autotuned mode selection.
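How an explicit mode, autotuning, and the NCCL fallback relate can be pictured with a small stand-alone sketch. It only mirrors the behavior documented for the mode parameter and is not FlashInfer's implementation. SketchMode, its members, and resolve_mode are hypothetical names invented for this illustration; the real MixedCommMode members are not listed on this page.

```python
from enum import Enum, auto
from typing import Callable, Optional


class SketchMode(Enum):
    # Hypothetical stand-ins for MixedCommMode members (assumption).
    FUSED = auto()
    NCCL = auto()


def resolve_mode(
    mode: Optional[SketchMode],
    autotune: Optional[Callable[[], SketchMode]],
    nccl_fallback: SketchMode = SketchMode.NCCL,
) -> SketchMode:
    """Pick an execution mode in the documented order of precedence.

    `autotune` is None when autotuning is disabled; otherwise it is a
    callable returning the tuned choice (an assumption of this sketch).
    """
    if mode is not None:
        # An explicitly requested mode always wins.
        return mode
    if autotune is not None:
        # No explicit mode: defer to the autotuner when it is enabled.
        return autotune()
    # Neither an explicit mode nor autotuning: use an NCCL mode.
    return nccl_fallback
```

With this sketch, resolve_mode(None, None) yields the NCCL stand-in, while passing any explicit mode bypasses the autotuner entirely.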

Parameters:
  • op (MixedCommOp) – The communication operation to perform.

  • handler (MixedCommHandler) – An initialized MixedCommHandler.

  • x_in (torch.Tensor) – Input tensor. Must be at least 2-D and match the handler’s dtype and device.

  • x_out (torch.Tensor, optional) – Pre-allocated output tensor. Allocated automatically when None.

  • mode (MixedCommMode, optional) – Execution mode. When None, the mode is chosen by autotuning (if enabled); otherwise an NCCL mode is used.
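The constraints on x_in can be checked before the call with a small pre-flight helper. The following is a minimal sketch, not part of FlashInfer: check_input is a hypothetical name, and because this page does not say how the handler exposes its dtype and device, the expected values are passed in explicitly. The helper relies only on the ndim, dtype, and device attributes, which torch.Tensor provides.

```python
from types import SimpleNamespace


def check_input(x_in, expected_dtype, expected_device) -> None:
    """Raise ValueError if x_in violates the documented input constraints.

    Hypothetical helper: expected_dtype / expected_device stand for the
    handler's dtype and device, however the caller obtains them.
    """
    if x_in.ndim < 2:
        raise ValueError(f"x_in must be at least 2-D, got {x_in.ndim}-D")
    if x_in.dtype != expected_dtype:
        raise ValueError(
            f"x_in dtype {x_in.dtype} does not match {expected_dtype}"
        )
    if x_in.device != expected_device:
        raise ValueError(
            f"x_in device {x_in.device} does not match {expected_device}"
        )


# Stand-in objects so the sketch runs without torch or a GPU.
good = SimpleNamespace(ndim=2, dtype="bfloat16", device="cuda:0")
flat = SimpleNamespace(ndim=1, dtype="bfloat16", device="cuda:0")

check_input(good, "bfloat16", "cuda:0")  # passes silently
```

In real use the same helper would receive a torch.Tensor together with a torch.dtype and a torch.device; running it before run_mixed_comm turns a shape or placement mistake into an early, readable error.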

Returns:

Output tensor containing the result of the collective operation.

Return type:

torch.Tensor