flashinfer.comm.trtllm_create_ipc_workspace_for_all_reduce
flashinfer.comm.trtllm_create_ipc_workspace_for_all_reduce(rank: int, tp_size: int, max_token_num: int, hidden_dim, group: ProcessGroup | None = None) → List[List[int]]
Parameters:
- rank: the rank of the current process.
- tp_size: the size of the process group.
- max_token_num: the maximum number of tokens in a sequence.
- hidden_dim: the dimension of the hidden states.
- group: the process group to use.
Note: This function creates the workspace for all-reduce. The workspace is a list of IPC handles. Initialize it before calling trtllm_custom_all_reduce, and destroy it once no further trtllm_custom_all_reduce calls need it. The same workspace can be reused for multiple all-reduce calls under the same configuration.
This function initializes 7 IPC buffers for trtllm_custom_all_reduce. They are sized as follows:
[buffer_size, buffer_size, flag_size, flag_size, lamport_buffer_size, lamport_buffer_size, lamport_buffer_size]
where:
- buffer_size: tp_size * max_token_num * hidden_dim * sizeof(float) * maxBeamWidth
- flag_size: (MAX_ALL_REDUCE_BLOCKS + 1) * sizeof(uint32_t) * tp_size * 2
- lamport_buffer_size: tp_size * LamportTokenNumThreshold * tp_size * hidden_dim * sizeof(half)
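The size formulas above can be checked with a small standalone sketch. It is not part of flashinfer: the helper name ipc_buffer_sizes is hypothetical, and maxBeamWidth, MAX_ALL_REDUCE_BLOCKS, and LamportTokenNumThreshold are passed in as arguments because their values are not stated here. The numbers in the usage call are illustrative placeholders only.

```python
from typing import List

# Element sizes in bytes for the C types named in the formulas.
SIZEOF_FLOAT = 4
SIZEOF_UINT32 = 4
SIZEOF_HALF = 2


def ipc_buffer_sizes(
    tp_size: int,
    max_token_num: int,
    hidden_dim: int,
    max_beam_width: int,               # stands in for maxBeamWidth (assumed input)
    max_all_reduce_blocks: int,        # stands in for MAX_ALL_REDUCE_BLOCKS (assumed input)
    lamport_token_num_threshold: int,  # stands in for LamportTokenNumThreshold (assumed input)
) -> List[int]:
    """Return the byte sizes of the 7 IPC buffers, in ipcHandles order."""
    buffer_size = (
        tp_size * max_token_num * hidden_dim * SIZEOF_FLOAT * max_beam_width
    )
    flag_size = (max_all_reduce_blocks + 1) * SIZEOF_UINT32 * tp_size * 2
    lamport_buffer_size = (
        tp_size * lamport_token_num_threshold * tp_size * hidden_dim * SIZEOF_HALF
    )
    return [
        buffer_size,
        buffer_size,
        flag_size,
        flag_size,
        lamport_buffer_size,
        lamport_buffer_size,
        lamport_buffer_size,
    ]


# Placeholder values chosen only to exercise the arithmetic.
sizes = ipc_buffer_sizes(
    tp_size=2,
    max_token_num=8,
    hidden_dim=16,
    max_beam_width=1,
    max_all_reduce_blocks=24,
    lamport_token_num_threshold=16,
)
print(sizes)
```

Note that lamport_buffer_size scales with tp_size squared, so it grows faster than the other buffers as the group gets larger.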
They are for:
- ipcHandles[0] - peer_comm_buffer_ptrs
- ipcHandles[2] - peer_barrier_ptrs_in
- ipcHandles[3] - peer_barrier_ptrs_out
- ipcHandles[4] - lamport_peer_comm_buffer_ptrs[0:tp_size]
- ipcHandles[5] - lamport_peer_comm_buffer_ptrs[tp_size:tp_size * 2]
- ipcHandles[6] - lamport_peer_comm_buffer_ptrs[tp_size * 2:tp_size * 3]
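The slicing of lamport_peer_comm_buffer_ptrs across ipcHandles[4] to ipcHandles[6] can be illustrated with a short sketch. The helper lamport_slice_for_handle is hypothetical and only reproduces the index arithmetic from the mapping above; it does not touch any real IPC handles.

```python
from typing import Tuple


def lamport_slice_for_handle(handle_index: int, tp_size: int) -> Tuple[int, int]:
    """Return the (start, stop) slice of lamport_peer_comm_buffer_ptrs
    that the given ipcHandles index maps to, per the list above."""
    if handle_index not in (4, 5, 6):
        raise ValueError("only ipcHandles[4..6] hold Lamport buffers")
    k = handle_index - 4  # 0, 1, or 2: which of the three Lamport buffers
    return (k * tp_size, (k + 1) * tp_size)


tp_size = 4
# A stand-in pointer list of length 3 * tp_size (integers, not real pointers).
lamport_peer_comm_buffer_ptrs = list(range(3 * tp_size))

for idx in (4, 5, 6):
    start, stop = lamport_slice_for_handle(idx, tp_size)
    print(idx, lamport_peer_comm_buffer_ptrs[start:stop])
```

Each of the three handles contributes tp_size entries, so the flattened pointer list has 3 * tp_size entries in total.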
We use tp_size and world_size here interchangeably (customAllReduce).
Reference: trtllm, cpp/tests/unit_tests/kernels/allReduce/allReduceKernelTest.cu, Workspace init