flashinfer.quantization.block_scale_interleave

flashinfer.quantization.block_scale_interleave(unswizzled_sf: Tensor) → Tensor

Swizzle a block-scale tensor for FP4 layouts.

Reorders an unswizzled FP4 block-scale tensor to optimize memory access patterns for FP4 GEMM/MoE kernels. The output is padded in the m dimension to a multiple of 128.
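The padding rule can be illustrated with a small pure-Python sketch. The helper name padded_m is hypothetical and is not part of flashinfer; it only reproduces the "round m up to a multiple of 128" arithmetic stated above and says nothing about how the swizzle itself arranges the data.

```python
def padded_m(m: int, multiple: int = 128) -> int:
    """Round m up to the next multiple (128, per the description above).

    Hypothetical helper for illustration only; not a flashinfer function.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    # Integer ceiling division, then scale back up to the multiple.
    return (m + multiple - 1) // multiple * multiple


# A few sample sizes and the padded size each would occupy.
for m in (1, 128, 129, 300):
    print(m, "->", padded_m(m))
```

An input whose m is already a multiple of 128 is left at the same size, while any other m grows to the next multiple, so the output buffer can be larger than the input.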

Parameters: unswizzled_sf (torch.Tensor) – Input scale-factor tensor with dtype uint8 or bfloat16.
Returns: 1D flattened swizzled scale-factor buffer of shape (num_experts * expert_out_size,) where num_experts is unswizzled_sf.shape[0] for 3D inputs (and 1 otherwise) and expert_out_size is the padded swizzled size returned by _compute_swizzled_layout_sf_size. Note that this is not the same logical shape as unswizzled_sf; downstream FP4 GEMM/MoE kernels consume the flat buffer directly.
Return type: torch.Tensor
Raises: AssertionError – If the input dtype is not uint8 or bfloat16.
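The return-shape and dtype rules above can be sketched as follows. The helpers expected_flat_length, check_sf_dtype, and swizzle_checked are hypothetical names introduced for illustration. The per-expert size is taken as an argument because it comes from the private helper _compute_swizzled_layout_sf_size, whose formula is not given here. The dtype check compares string names so that the helpers can be exercised without torch installed.

```python
# Names as produced by str(torch.uint8) and str(torch.bfloat16).
ALLOWED_DTYPE_NAMES = ("torch.uint8", "torch.bfloat16")


def expected_flat_length(shape, expert_out_size):
    """Length of the flat output buffer described under Returns.

    shape: shape of unswizzled_sf, as a tuple.
    expert_out_size: padded swizzled size for one expert (assumed to be
        supplied by the caller; this sketch does not compute it).
    """
    # num_experts is shape[0] for 3D inputs and 1 otherwise.
    num_experts = shape[0] if len(shape) == 3 else 1
    return num_experts * expert_out_size


def check_sf_dtype(dtype):
    """Raise TypeError early for dtypes the function does not accept."""
    if str(dtype) not in ALLOWED_DTYPE_NAMES:
        raise TypeError(
            f"unswizzled_sf must be uint8 or bfloat16, got {dtype}"
        )


def swizzle_checked(unswizzled_sf):
    """Hypothetical wrapper: validate the dtype, then swizzle.

    flashinfer is imported lazily so the helpers above stay usable on
    machines without it.
    """
    from flashinfer.quantization import block_scale_interleave

    check_sf_dtype(unswizzled_sf.dtype)
    return block_scale_interleave(unswizzled_sf)
```

The wrapper raises TypeError before the call rather than relying on the documented AssertionError. That is a design choice of this sketch, not library behaviour; it keeps the check active even when Python runs with assertions disabled (-O).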