flashinfer.fused_moe.reorder_rows_for_gated_act_gemm
- flashinfer.fused_moe.reorder_rows_for_gated_act_gemm(x: Tensor) → Tensor
Reorder rows of a weight tensor for the TensorRT-LLM gated-activation GEMM layout.
Pure-PyTorch reimplementation of the TensorRT-LLM reorderRowsForGatedActGemm helper. Used to pre-permute the up/gate weight matrix so that the fused gated-activation kernels can access the two halves with a single contiguous load.
- Parameters:
x (torch.Tensor) – Weight tensor whose rows will be permuted. Any dtype is accepted; only the row dimension is reordered.
- Returns:
Row-permuted copy of x (materialized as a new contiguous tensor; PyTorch advanced indexing always copies, never aliases).
- Return type:
torch.Tensor
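The following is a minimal sketch of the kind of row permutation described above, not the library's implementation. This page does not spell out the exact row order, so the sketch assumes the first and second halves of the rows are interleaved (first-half rows in even slots, second-half rows in odd slots). The helper name `interleave_halves_sketch` is hypothetical. It also illustrates the copy semantics noted under Returns: indexing with an index tensor produces a new tensor rather than a view.

```python
import torch


def interleave_halves_sketch(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: interleave the first and second halves of the rows.

    ASSUMPTION: the interleaved order is illustrative only; this page does not
    state the exact permutation applied by the real function.
    """
    num_rows = x.shape[0]
    assert num_rows % 2 == 0, "expected an even number of rows (two equal halves)"
    half = num_rows // 2

    idx = torch.empty(num_rows, dtype=torch.long)
    idx[0::2] = torch.arange(0, half)         # first-half rows land in even slots
    idx[1::2] = torch.arange(half, num_rows)  # second-half rows land in odd slots

    # Indexing with an index tensor (advanced indexing) returns a new tensor.
    return x[idx]


# Toy weight with 4 rows; row i holds the values [2*i, 2*i + 1].
w = torch.arange(8).reshape(4, 2)
out = interleave_halves_sketch(w)

# Recover the source row of each output row from its first element.
row_order = (out[:, 0] // 2).tolist()
print(row_order)
print(out.data_ptr() != w.data_ptr())  # True when the result does not alias w
```

Only the row dimension is touched, so the dtype and the column contents of each row are unchanged. The input `w` is left unmodified.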