flashinfer.fused_moe.convert_to_block_layout

flashinfer.fused_moe.convert_to_block_layout(input_tensor: Tensor, blockK: int) → Tensor

Reshape a 2-D tensor into a 3-D block layout.

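Splits the inner K dimension into K // blockK blocks of size blockK and transposes the result so that the block dimension is outermost: element (m, b * blockK + j) of the input lands at position (b, m, j) of the output. This is the canonical layout consumed by TensorRT-LLM block-scaled MoE kernels.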

Parameters:

input_tensor (torch.Tensor) – Input tensor of shape (M, K).
blockK (int) – Block size along the K dimension. K must be divisible by blockK.

Returns:

Reshaped contiguous tensor of shape (K // blockK, M, blockK).

Return type:

torch.Tensor
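Example:

The index mapping described above can be sketched in plain Python on nested lists. This is only an illustration of the layout, not the library's implementation: the helper name block_layout_reference is hypothetical, and it works on lists rather than tensors so that it runs without a GPU or FlashInfer installed. On a torch.Tensor, the same mapping would presumably correspond to a reshape to (M, K // blockK, blockK), followed by a permute that moves the block axis first and a contiguous copy.

```python
def block_layout_reference(rows, block_k):
    """Hypothetical pure-Python reference for the (M, K) -> (K // block_k, M, block_k) mapping."""
    if not rows:
        return []
    k = len(rows[0])
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same length K")
    if block_k <= 0 or k % block_k != 0:
        raise ValueError("K must be divisible by block_k")
    # Block b holds columns [b * block_k, (b + 1) * block_k) of every row,
    # so out[b][m][j] == rows[m][b * block_k + j].
    return [
        [row[b * block_k:(b + 1) * block_k] for row in rows]
        for b in range(k // block_k)
    ]


x = [[0, 1, 2, 3],
     [4, 5, 6, 7]]                     # shape (M=2, K=4)
blocks = block_layout_reference(x, 2)  # shape (K // blockK=2, M=2, blockK=2)
print(blocks)  # [[[0, 1], [4, 5]], [[2, 3], [6, 7]]]
```

Each outer entry of the result is one K-block containing the matching column slice of every row, which is why the first output dimension has size K // blockK.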