flashinfer.fused_moe.convert_to_block_layout

flashinfer.fused_moe.convert_to_block_layout(input_tensor: Tensor, blockK: int) → Tensor

Reshape a 2-D tensor into a 3-D block layout.

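Splits the inner K dimension into K // blockK blocks of size blockK and transposes so that the block dimension is outermost. Element (m, k) of the input therefore lands at index (k // blockK, m, k % blockK) of the output. This is the canonical layout consumed by TensorRT-LLM block-scaled MoE kernels.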

Parameters:
  • input_tensor (torch.Tensor) – Input tensor of shape (M, K).

  • blockK (int) – Block size along the K dimension. K must be divisible by blockK.

Returns:

Reshaped contiguous tensor of shape (K // blockK, M, blockK).

Return type:

torch.Tensor
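
Example:

The layout described above can be sketched in plain PyTorch. This is a reference illustration only, not the library's implementation: the helper name block_layout_reference is hypothetical, and it is assumed that a reshape followed by a permute reproduces the documented output shape and element order.

```python
import torch


def block_layout_reference(x: torch.Tensor, block_k: int) -> torch.Tensor:
    """Hypothetical pure-PyTorch sketch of the (M, K) -> (K // blockK, M, blockK) layout."""
    if x.dim() != 2:
        raise ValueError("expected a 2-D tensor of shape (M, K)")
    m, k = x.shape
    if k % block_k != 0:
        raise ValueError("K must be divisible by blockK")
    # Split K into (K // blockK, blockK), move the block index to the front,
    # then materialize the result as a contiguous tensor.
    return x.reshape(m, k // block_k, block_k).permute(1, 0, 2).contiguous()


x = torch.arange(16).reshape(2, 8)      # M = 2, K = 8
out = block_layout_reference(x, 4)      # blockK = 4
print(out.shape)                        # torch.Size([2, 2, 4])
print(out[1, 0].tolist())               # [4, 5, 6, 7]: row 0, second block of K
```

Here out[b, m, j] equals x[m, b * 4 + j], so each slice out[b] holds one K-block for all M rows.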