flashinfer.quantization.e2m1_and_ufp8sf_scale_to_float
- flashinfer.quantization.e2m1_and_ufp8sf_scale_to_float(e2m1_tensor: Tensor, ufp8_scale_tensor: Tensor, global_scale_tensor: Tensor | None = None, sf_vec_size: int = 16, ufp8_type: int = 1, is_sf_swizzled_layout: bool = True) → Tensor
Dequantize an E2M1 tensor with UFP8 scales back to float32.
Unpacks a packed FP4 tensor in E2M1 format (two 4-bit values per uint8 byte) and converts it back to float values using the associated UFP8 scale factors and, if provided, the global scale.
- Parameters:
e2m1_tensor (torch.Tensor) – Packed FP4 tensor in E2M1 format of shape [M, K/2] with dtype uint8.
ufp8_scale_tensor (torch.Tensor) – Scale-factor tensor in UFP8 format with dtype uint8.
global_scale_tensor (torch.Tensor, optional) – Global scale factor of shape [1] and dtype float32.
sf_vec_size (int) – Scale-factor vector size. Defaults to 16.
ufp8_type (int) – UFP8 scale-factor type (0 for UE8M0, 1 for E4M3). Defaults to 1.
is_sf_swizzled_layout (bool) – Whether the scale factors are stored in the swizzled layout. Defaults to True.
- Returns:
Dequantized float tensor of shape [M, K] with dtype float32.
- Return type:
torch.Tensor
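To make the arithmetic concrete, the following is a minimal pure-Python sketch of the dequantization described above. It is not the library's implementation. The helper names (decode_e2m1, decode_scale, dequantize_reference) are hypothetical, and the sketch rests on several assumptions that this page does not confirm: the low nibble of each byte holds the even-indexed element, the scale factors are stored in a plain row-major (non-swizzled) layout with one scale per sf_vec_size consecutive elements, and the global scale is applied as a multiplier. NaN encodings of the scale formats are not handled.

```python
# Illustrative reference only; helper names are hypothetical, not part of flashinfer.

# The eight magnitudes representable in E2M1 (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(nibble):
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_MAGNITUDES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def decode_scale(byte, ufp8_type=1):
    """Decode one scale byte as UE8M0 (ufp8_type=0) or E4M3 (ufp8_type=1)."""
    if ufp8_type == 0:
        # UE8M0: an unsigned power of two with exponent bias 127.
        return 2.0 ** (byte - 127)
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:
        # Subnormal E4M3 value: no implicit leading one.
        return (mantissa / 8.0) * 2.0 ** -6
    # Normal E4M3 value with exponent bias 7 (sign bit ignored for scales).
    return (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize_reference(packed_rows, scale_rows, global_scale=1.0,
                         sf_vec_size=16, ufp8_type=1):
    """Dequantize rows of packed bytes ([M, K/2]) into rows of floats ([M, K])."""
    out = []
    for packed, scales in zip(packed_rows, scale_rows):
        row = []
        for i, byte in enumerate(packed):
            # Assumption: low nibble is the even-indexed element, high nibble the odd one.
            for k, nibble in enumerate((byte & 0xF, byte >> 4)):
                col = 2 * i + k
                # Assumption: linear layout, one scale per sf_vec_size elements in a row.
                sf = decode_scale(scales[col // sf_vec_size], ufp8_type)
                # Assumption: the global scale multiplies the block-scaled value.
                row.append(decode_e2m1(nibble) * sf * global_scale)
        out.append(row)
    return out


# One row with K = 16 (8 packed bytes) and a single E4M3 scale of 1.0 (byte 0x38).
packed = [[0x21, 0xF7] + [0x00] * 6]
scales = [[0x38]]
print(dequantize_reference(packed, scales)[0][:4])  # [0.5, 1.0, 6.0, -6.0]
```

A sketch like this can serve as a slow cross-check on small inputs, provided the scale tensor is produced with is_sf_swizzled_layout=False and the nibble-order and global-scale assumptions are first verified against the actual kernel.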