flashinfer.comm.moe_a2a_dispatch

flashinfer.comm.moe_a2a_dispatch(token_selected_experts: Tensor, input_payloads: list[Tensor], workspace: Tensor, metainfo: Tensor, runtime_max_tokens_per_rank: int, ep_rank: int, ep_size: int, top_k: int, num_experts: int)

Dispatch tokens and payloads to expert ranks.

Parameters:

token_selected_experts – [local_num_tokens, top_k] int32 tensor
input_payloads – List of [local_num_tokens, *] tensors to dispatch
workspace – [ep_size, size_per_rank] workspace tensor
metainfo – Metadata tensor from initialize
runtime_max_tokens_per_rank – Max tokens per rank in this batch
ep_rank – Current expert parallel rank
ep_size – Total expert parallel size
top_k – Number of experts per token
num_experts – Total number of experts
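
The shape and range constraints in the parameter list can be checked on the host before dispatching. The following is a minimal sketch in plain Python that uses nested lists as stand-ins for tensors. `check_dispatch_args` and `payload_row_counts` are hypothetical names, not part of flashinfer, and the range checks on expert ids and `ep_rank` are assumptions inferred from the parameter descriptions rather than documented behavior.

```python
def check_dispatch_args(token_selected_experts, payload_row_counts,
                        ep_rank, ep_size, top_k, num_experts):
    """Hypothetical pre-flight check; not part of flashinfer.

    token_selected_experts: nested list standing in for the
        [local_num_tokens, top_k] int32 tensor.
    payload_row_counts: first-dimension size of each input payload.
    Returns local_num_tokens when every check passes.
    """
    local_num_tokens = len(token_selected_experts)
    for row in token_selected_experts:
        if len(row) != top_k:
            raise ValueError("each token needs exactly top_k expert ids")
        for expert_id in row:
            # Assumption: valid expert ids lie in [0, num_experts).
            if not 0 <= expert_id < num_experts:
                raise ValueError(f"expert id {expert_id} out of range")
    for rows in payload_row_counts:
        # Every payload is described as [local_num_tokens, *].
        if rows != local_num_tokens:
            raise ValueError("payload first dimension must equal local_num_tokens")
    # Assumption: ranks are numbered 0 .. ep_size - 1.
    if not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must lie in [0, ep_size)")
    return local_num_tokens
```

For example, `check_dispatch_args([[0, 3], [1, 2]], [2, 2], ep_rank=0, ep_size=2, top_k=2, num_experts=4)` passes and returns the local token count, while a row with the wrong number of expert ids raises `ValueError`.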

Returns:

output_payloads – List of payloads for this rank, backed by data in the workspace
combine_payload_offset – The offset at which to place the combine payload in the workspace
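
To make "dispatch to expert ranks" concrete, the sketch below computes which ranks each token would be sent to. It assumes experts are split into contiguous, equal-sized blocks across ranks (`num_experts // ep_size` experts per rank) and that a token is sent at most once to any given rank. This page does not state either rule, so both are assumptions, and `expert_to_rank` and `target_ranks` are hypothetical helpers, not flashinfer functions.

```python
def expert_to_rank(expert_id, num_experts, ep_size):
    """Map an expert id to the rank assumed to host it.

    Assumption: experts are partitioned contiguously and evenly,
    so rank r hosts experts [r * per_rank, (r + 1) * per_rank).
    """
    per_rank = num_experts // ep_size
    return expert_id // per_rank


def target_ranks(token_selected_experts, num_experts, ep_size):
    """Return, for each token, the sorted distinct ranks it routes to."""
    result = []
    for row in token_selected_experts:
        ranks = {expert_to_rank(e, num_experts, ep_size) for e in row}
        result.append(sorted(ranks))
    return result


# Three tokens, top_k = 2, 8 experts spread over 4 ranks (2 experts per rank).
routes = target_ranks([[0, 1], [2, 7], [5, 4]], num_experts=8, ep_size=4)
print(routes)  # [[0], [1, 3], [2]]
```

Under these assumptions, the first and third tokens each select two experts on the same rank, so each maps to a single destination, while the second token is routed to ranks 1 and 3.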