flashinfer.fp4_quantization.nvfp4_block_scale_interleave

flashinfer.fp4_quantization.nvfp4_block_scale_interleave(unswizzled_sf: torch.Tensor) → torch.Tensor

Swizzle block scale tensor for FP4 format.

This function swizzles the block scale tensor to optimize memory access patterns for FP4 operations. The output must be padded along the m dimension to a multiple of 128.

Parameters:

unswizzled_sf (torch.Tensor) – Input tensor with dtype uint8.

Returns:

Swizzled tensor with the same shape as the input.

Return type:

torch.Tensor

Raises:

AssertionError – If input dtype is not uint8.
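
Example:

The following is a minimal sketch of one way to meet the padding requirement, not the library's own recipe. It assumes that the scale tensor is 2-D with m as its first dimension, that the caller zero-pads the rows before the call, and that zero is an acceptable filler value; none of this is stated above. The helpers round_up and swizzle_block_scales are hypothetical names introduced only for illustration.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (hypothetical helper)."""
    return (value + multiple - 1) // multiple * multiple


def swizzle_block_scales(unswizzled_sf):
    """Zero-pad the rows of a 2-D uint8 scale tensor to a multiple of 128, then swizzle.

    Hypothetical wrapper. Assumes dimension 0 is the m dimension.
    Imports are deferred so round_up can be used without torch or flashinfer.
    """
    import torch
    from flashinfer.fp4_quantization import nvfp4_block_scale_interleave

    # Mirror the documented dtype check so the error surfaces before padding.
    assert unswizzled_sf.dtype == torch.uint8, "scale factors must be uint8"

    m = unswizzled_sf.shape[0]
    pad_rows = round_up(m, 128) - m
    if pad_rows:
        # F.pad takes pairs starting from the last dimension:
        # (last_left, last_right, first_left, first_right) for a 2-D tensor.
        unswizzled_sf = torch.nn.functional.pad(unswizzled_sf, (0, 0, 0, pad_rows))

    return nvfp4_block_scale_interleave(unswizzled_sf)
```

With these assumptions, a scale tensor with 200 rows would be padded to 256 rows before swizzling, and a tensor whose row count is already a multiple of 128 is passed through unchanged. Whether the tensor must live on a CUDA device is not stated here; check this against your installed version.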