flashinfer.comm.moe_a2a_dispatch

flashinfer.comm.moe_a2a_dispatch(token_selected_experts: Tensor, input_payloads: list[Tensor], workspace: Tensor, metainfo: Tensor, runtime_max_tokens_per_rank: int, ep_rank: int, ep_size: int, top_k: int, num_experts: int)

Dispatch tokens and payloads to expert ranks.

Parameters:
  • token_selected_experts – [local_num_tokens, top_k] int32 tensor

  • input_payloads – List of [local_num_tokens, *] tensors to dispatch

  • workspace – [ep_size, size_per_rank] workspace tensor

  • metainfo – Metadata tensor from initialize

  • runtime_max_tokens_per_rank – Max tokens per rank in this batch

  • ep_rank – Current expert parallel rank

  • ep_size – Total expert parallel size

  • top_k – Number of experts per token

  • num_experts – Total number of experts

Returns:

  • output_payloads – List of payloads for this rank, backed by data in the workspace

  • combine_payload_offset – The offset in the workspace at which to place the combine payload

Return type:

  (output_payloads, combine_payload_offset)
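To illustrate the routing implied by token_selected_experts, the sketch below shows one common convention: experts are partitioned contiguously across EP ranks, so expert e lives on rank e // (num_experts // ep_size), and a token is dispatched to each destination rank at most once even if it selects several experts there. This is an illustrative sketch of the dispatch accounting, not the FlashInfer implementation; the helper names (experts_to_ranks, dispatch_counts) and the contiguous-partitioning and deduplication assumptions are hypothetical.

```python
def experts_to_ranks(token_selected_experts, num_experts, ep_size):
    """Map each token's selected expert IDs to destination EP ranks,
    assuming experts are partitioned contiguously across ranks."""
    experts_per_rank = num_experts // ep_size
    return [[e // experts_per_rank for e in row] for row in token_selected_experts]

def dispatch_counts(token_selected_experts, num_experts, ep_size):
    """Count how many tokens each rank receives, assuming a token is
    sent to a given rank at most once (duplicates deduplicated)."""
    counts = [0] * ep_size
    for row in experts_to_ranks(token_selected_experts, num_experts, ep_size):
        for rank in set(row):  # dedupe: one copy per destination rank
            counts[rank] += 1
    return counts

# Two tokens, top_k=2, num_experts=4, ep_size=2 (experts 0-1 on rank 0,
# experts 2-3 on rank 1): each token lands on both ranks once.
print(dispatch_counts([[0, 3], [1, 2]], num_experts=4, ep_size=2))
```

Per-rank receive counts like these, bounded by runtime_max_tokens_per_rank, determine how much of the [ep_size, size_per_rank] workspace each destination rank consumes.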