flashinfer.quantization.mxfp8_dequantize_host
- flashinfer.quantization.mxfp8_dequantize_host(input: Tensor, scale_tensor: Tensor, is_sf_swizzled_layout: bool = True, sf_swizzle_layout: SfLayout | None = None) → Tensor
Host-side dequantization of an MxFP8 tensor back to float32.
Converts a packed FP8 tensor in MxFP8 format back to float values by applying the associated scale factors.
- Parameters:
input (torch.Tensor) – Packed FP8 tensor in MxFP8 format of shape [M, K] with dtype FLOAT8_E4M3.
scale_tensor (torch.Tensor) – Scale-factor tensor (shape depends on layout and sf_vec_size).
is_sf_swizzled_layout (bool) – Whether the scale factors are stored in the swizzled layout. Defaults to True.
sf_swizzle_layout (SfLayout, optional) – Explicit swizzle layout for scale factors; when supplied, this overrides is_sf_swizzled_layout. Options are SfLayout.layout_128x4 and SfLayout.layout_linear.
- Returns:
Dequantized float tensor of shape [M, K] with dtype float32.
- Return type:
torch.Tensor
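
To make the arithmetic behind dequantization concrete, the following is a minimal pure-Python sketch of what "applying the scale factors" can mean for MxFP8 data. It is not FlashInfer's implementation and does not call the library. It assumes the conventions of the OCP Microscaling (MX) format: each block of 32 consecutive elements along K shares one scale, scales are stored as E8M0 bytes (a power of two, 2**(byte - 127)), and elements are E4M3 bytes without infinities. It also assumes the linear (non-swizzled) scale layout. The helper names `decode_e4m3`, `decode_e8m0`, and `dequantize_rows` are hypothetical and exist only for this illustration.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan  # assumed E4M3 variant: single NaN pattern, no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal values
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def decode_e8m0(byte: int) -> float:
    """Decode one E8M0 scale byte (assumed scale format): 2**(byte - 127)."""
    if byte == 0xFF:
        return math.nan  # reserved NaN encoding
    return 2.0 ** (byte - 127)


def dequantize_rows(values, scales, sf_vec_size=32):
    """Dequantize an [M, K] grid of E4M3 bytes.

    values: M rows of K E4M3 bytes.
    scales: M rows of K // sf_vec_size E8M0 bytes, in linear (row-major) order.
    sf_vec_size: elements per shared scale; 32 is an assumption taken from MX.
    """
    out = []
    for row, row_scales in zip(values, scales):
        out.append([
            decode_e4m3(b) * decode_e8m0(row_scales[k // sf_vec_size])
            for k, b in enumerate(row)
        ])
    return out


# One row of K = 64 elements: 32 copies of 1.0 (0x38), then 32 copies of -2.0 (0xC0).
# The two block scales are 2.0 (byte 128) and 0.5 (byte 126).
row = [0x38] * 32 + [0xC0] * 32
result = dequantize_rows([row], [[128, 126]])
print(result[0][0], result[0][63])
```

With the swizzled layout (the default here, `is_sf_swizzled_layout=True`), the scale bytes are stored in a different order, so the simple `k // sf_vec_size` lookup in this sketch would not apply without first undoing the swizzle.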