flashinfer.fused_moe.convert_to_block_layout
- flashinfer.fused_moe.convert_to_block_layout(input_tensor: Tensor, blockK: int) → Tensor
Reshape a 2-D tensor into a 3-D block layout.
Splits the inner K dimension into K // blockK blocks of size blockK and transposes so the block dimension is outermost. This is the canonical layout consumed by TensorRT-LLM block-scaled MoE kernels.
- Parameters:
input_tensor (torch.Tensor) – Input tensor of shape (M, K).
blockK (int) – Block size along the K dimension. K must be divisible by blockK.
- Returns:
Reshaped contiguous tensor of shape (K // blockK, M, blockK).
- Return type:
torch.Tensor
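
The transformation described above can be sketched in plain PyTorch. This is a minimal reference, assuming the behavior is exactly what the description states (split K into blocks, move the block dimension to the front, make the result contiguous); it is not the FlashInfer implementation, and the helper name convert_to_block_layout_reference is hypothetical.

```python
import torch


def convert_to_block_layout_reference(input_tensor: torch.Tensor, blockK: int) -> torch.Tensor:
    """Hypothetical pure-PyTorch sketch of the documented layout change."""
    if input_tensor.dim() != 2:
        raise ValueError("input_tensor must be 2-D with shape (M, K)")
    M, K = input_tensor.shape
    if K % blockK != 0:
        raise ValueError(f"K={K} must be divisible by blockK={blockK}")
    # (M, K) -> (M, K // blockK, blockK): split the inner dimension into blocks.
    blocked = input_tensor.reshape(M, K // blockK, blockK)
    # (M, K // blockK, blockK) -> (K // blockK, M, blockK): block index outermost.
    return blocked.permute(1, 0, 2).contiguous()


# Small example: M = 2 rows, K = 8 columns, blocks of 4 columns each.
x = torch.arange(2 * 8).reshape(2, 8)
blocked = convert_to_block_layout_reference(x, blockK=4)
print(blocked.shape)  # torch.Size([2, 2, 4])
```

Under this sketch, element blocked[b, m, j] equals x[m, b * blockK + j], so each slice blocked[b] holds the same blockK-wide column strip for every row.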