flashinfer.fp4_quantization.nvfp4_block_scale_interleave
- flashinfer.fp4_quantization.nvfp4_block_scale_interleave(unswizzled_sf: torch.Tensor) → torch.Tensor
Swizzle block scale tensor for FP4 format.
This function swizzles the block scale tensor to optimize memory access patterns for FP4 operations. The m dimension of the output must be padded to a multiple of 128.
- Parameters:
unswizzled_sf (torch.Tensor) – Input tensor with dtype uint8.
- Returns:
Swizzled tensor with the same shape as input.
- Return type:
torch.Tensor
- Raises:
AssertionError – If input dtype is not uint8.
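- Example:
A minimal usage sketch, not taken from the FlashInfer sources. It assumes a 2-D scale tensor of shape (m, k_scales), a CUDA device, and that allocating the input with m already rounded up to a multiple of 128 is an acceptable way to meet the padding note above. The helper names `round_up` and `swizzle_scales` are hypothetical, and the zero-filled tensor is a placeholder for real block scales.

```python
def round_up(value: int, multiple: int = 128) -> int:
    """Round value up to the nearest multiple (128 is the m multiple named above)."""
    return (value + multiple - 1) // multiple * multiple


def swizzle_scales(m: int, k_scales: int):
    """Hypothetical helper: allocate a padded uint8 scale tensor and swizzle it.

    Needs torch, flashinfer, and a CUDA device. The imports are local so that
    round_up stays usable without them.
    """
    import torch
    from flashinfer.fp4_quantization import nvfp4_block_scale_interleave

    # Assumption: pad the m dimension up front so it is a multiple of 128.
    padded_m = round_up(m)

    # Placeholder scales; real values would come from an FP4 quantization step.
    # The dtype must be uint8, otherwise the function raises AssertionError.
    unswizzled_sf = torch.zeros((padded_m, k_scales), dtype=torch.uint8, device="cuda")

    return nvfp4_block_scale_interleave(unswizzled_sf)
```

For instance, `round_up(200)` gives 256, so under these assumptions a call such as `swizzle_scales(200, 16)` would pass a (256, 16) uint8 tensor to the function.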