flashinfer.comm.create_allreduce_fusion_workspace

flashinfer.comm.create_allreduce_fusion_workspace(backend: Literal['trtllm', 'mnnvl', 'auto'] = 'auto', world_size: int = None, rank: int = None, max_token_num: int = None, hidden_dim: int = None, dtype: dtype = None, gpus_per_node: int = None, comm_backend: CommBackend | None = None, force_oneshot_support: bool = False) → AllReduceFusionWorkspace

Create workspace for AllReduce fusion operations.

When backend="auto", the backend is selected using topology-based checks and heuristics.

Important: Workspace Reusability

The workspace is allocated based on the total size (max_token_num * hidden_dim * dtype_size). You can reuse the same workspace with different shapes as long as the total size fits:

  • Workspace(max_token_num=2048, hidden_dim=4096) can handle:
      - (token_num=2048, hidden_dim=4096) ✓
      - (token_num=1024, hidden_dim=4096) ✓
      - (token_num=4096, hidden_dim=2048) ✓ (same total size)
      - (token_num=1024, hidden_dim=8192) ✓ (same total size)
      - (token_num=4096, hidden_dim=4096) ✗ (too large)

Use workspace.is_buffer_size_sufficient(token_num, hidden_dim, world_size, dtype) to check before use.
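The reuse rule above is plain arithmetic on element counts. A minimal illustration (standalone Python, not a FlashInfer call; the helper name `fits` is hypothetical):

```python
def fits(capacity_tokens, capacity_hidden, token_num, hidden_dim):
    """Return True if a (token_num, hidden_dim) problem fits in a workspace
    allocated for capacity_tokens * capacity_hidden elements."""
    # The workspace is sized by the product, so only the total matters,
    # not the individual dimensions.
    return token_num * hidden_dim <= capacity_tokens * capacity_hidden

capacity = (2048, 4096)  # workspace allocated for 2048 x 4096 elements
print(fits(*capacity, 2048, 4096))  # True  (exact fit)
print(fits(*capacity, 4096, 2048))  # True  (same total size)
print(fits(*capacity, 4096, 4096))  # False (too large)
```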

Parameters:
  • backend – Backend to use (“trtllm”, “mnnvl”, or “auto”). “auto” selects the best backend via a heuristic.

  • world_size – Number of ranks in the process group

  • rank – Current rank ID

  • max_token_num – Maximum number of tokens to support

  • hidden_dim – Hidden dimension size

  • dtype – Data type for communication tensors

  • gpus_per_node – Number of GPUs per node (for multi-node topology).

  • comm_backend – Communication backend to use.

  • force_oneshot_support – Whether to allocate the workspace for the oneshot strategy rather than the twoshot strategy.
    True: allocate workspace for the oneshot strategy up to the largest problem size requested.
    False: allocate workspace for the twoshot strategy for all problem sizes, and for the oneshot strategy up to the heuristic threshold.
    Note that only the MNNVL backend's workspace needs to be initialized with the correct strategy; the trtllm backend's workspace is sufficient for both strategies.
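For intuition on why the two strategies need differently sized workspaces, here is a rough sketch of their typical staging costs. This reflects a general assumption about how oneshot and twoshot (reduce-scatter + all-gather) allreduce usually stage data, not FlashInfer's actual allocation code; the helper names are hypothetical:

```python
def oneshot_elems(token_num, hidden_dim, world_size):
    # Oneshot (illustrative assumption): each rank stages the full input
    # from every rank locally before reducing, so the staging buffer
    # grows linearly with world_size.
    return world_size * token_num * hidden_dim

def twoshot_elems(token_num, hidden_dim, world_size):
    # Twoshot (illustrative assumption): reduce-scatter then all-gather;
    # each phase stages at most one full tensor's worth of elements,
    # independent of world_size.
    return token_num * hidden_dim

# For the running example (2048 tokens, hidden_dim=4096, 8 ranks):
print(oneshot_elems(2048, 4096, 8))  # 67108864 elements
print(twoshot_elems(2048, 4096, 8))  # 8388608 elements
```

This is why a workspace sized only for twoshot may not be able to serve oneshot at large problem sizes, which is what force_oneshot_support=True guards against.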

Returns:

Workspace object (TRTLLMAllReduceFusionWorkspace or MNNVLAllReduceFusionWorkspace). The workspace type determines which backend allreduce_fusion() will use.

Raises:
  • BackendSupportedError – If no suitable backend is available for the configuration

  • ValueError – If the problem size is not supported by the specified backend

Examples

>>> import torch
>>> from flashinfer.comm import create_allreduce_fusion_workspace
>>> # Auto-select best backend
>>> workspace = create_allreduce_fusion_workspace(
...     backend="auto",
...     world_size=8,
...     rank=0,
...     max_token_num=2048,
...     hidden_dim=4096,
...     dtype=torch.bfloat16,
... )
>>> print(workspace.backend)  # "trtllm"
>>> print(workspace.get_workspace_capacity())  # 8388608 elements
>>> # Check if workspace can handle different problem sizes
>>> workspace.is_buffer_size_sufficient(1024, 4096, 8, torch.bfloat16)  # True
>>> workspace.is_buffer_size_sufficient(4096, 2048, 8, torch.bfloat16)  # True (same total)
>>> # Explicit backend selection
>>> workspace = create_allreduce_fusion_workspace(
...     backend="mnnvl",
...     world_size=16,
...     rank=0,
...     max_token_num=2048,
...     hidden_dim=4096,
...     dtype=torch.bfloat16,
... )
>>> print(workspace.backend)  # "mnnvl"