flashinfer.fused_moe.interleave_moe_scales_for_sm90_mixed_gemm
- flashinfer.fused_moe.interleave_moe_scales_for_sm90_mixed_gemm(scales: Tensor, group_size: int = 32) → Tensor
Interleave MXFP4 block scales for the SM90 mixed-input MoE GEMM.
The kernel expects scales in the layout `(num_experts, K // (group_size * 4), rows * 4)` rather than the natural `(num_experts, rows, K // group_size)` produced by the MXFP4 quantizer. This helper performs the reshape + permute equivalent to TensorRT-LLM's `WFP4A16FusedMoEMethod.load_quant_scales` (PR #12451), with the fixed interleave factor of `128 // group_size` used for MXFP4.
- Parameters:
scales – `[num_experts, rows, K // group_size]` uint8 tensor of E8M0 block scales.
group_size – MXFP4 quantization group size (default 32).
- Returns:
Contiguous uint8 tensor with shape `[num_experts, K // (group_size * factor), rows * factor]` where `factor = 128 // group_size`.
- Return type:
torch.Tensor
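The reshape + permute described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the interleave packs `factor` consecutive scale columns of each row together, which matches the documented output shape but the exact within-chunk ordering expected by the SM90 kernel is an assumption. NumPy is used here for portability; the real helper operates on `torch.Tensor`.

```python
import numpy as np

def interleave_scales_sketch(scales: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Hypothetical re-implementation of the scale interleave.

    Takes the natural [num_experts, rows, K // group_size] layout and
    returns [num_experts, K // (group_size * factor), rows * factor].
    """
    num_experts, rows, k_groups = scales.shape
    factor = 128 // group_size  # fixed MXFP4 interleave factor (4 for group_size=32)
    assert k_groups % factor == 0, "K // group_size must be divisible by the factor"
    return np.ascontiguousarray(
        scales.reshape(num_experts, rows, k_groups // factor, factor)  # split K-groups into chunks
              .transpose(0, 2, 1, 3)                                  # bring the chunk axis forward
              .reshape(num_experts, k_groups // factor, rows * factor)  # fuse rows with chunk entries
    )

# Example: 2 experts, 8 rows, K // group_size = 16 scale columns.
scales = np.arange(2 * 8 * 16, dtype=np.uint8).reshape(2, 8, 16)
out = interleave_scales_sketch(scales)
print(out.shape)  # (2, 4, 32)
```

With the default `group_size=32` the factor is 4, so 16 scale columns collapse into 4 interleaved chunks of `rows * 4` entries each, matching the shape stated under Returns.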