flashinfer.comm.moe_a2a_get_workspace_size_per_rank¶
- flashinfer.comm.moe_a2a_get_workspace_size_per_rank(ep_size: int, max_num_tokens: int, total_dispatch_payload_size_per_token: int, combine_payload_size_per_token: int)¶
Get the workspace size per rank for the MoeAlltoAll operation.
- Parameters:
ep_size – Total expert parallel size
max_num_tokens – Maximum number of tokens across all ranks
total_dispatch_payload_size_per_token – The size of the payload per token in the dispatch phase. This should be the sum of all payloads.
combine_payload_size_per_token – The size of the payload per token in the combine phase.
- Returns:
Size of the workspace per rank in bytes
- Return type:
workspace_size_per_rank