flashinfer.quantization.mxfp8_dequantize_host

flashinfer.quantization.mxfp8_dequantize_host(input: Tensor, scale_tensor: Tensor, is_sf_swizzled_layout: bool = True, sf_swizzle_layout: SfLayout | None = None) → Tensor

Host-side dequantization of an MxFP8 tensor back to float32.

Converts a packed FP8 tensor in MxFP8 format back to float values by applying the associated scale factors.

Parameters:
  • input (torch.Tensor) – Packed FP8 tensor in MxFP8 format of shape [M, K] with dtype FLOAT8_E4M3.

  • scale_tensor (torch.Tensor) – Scale-factor tensor (shape depends on layout and sf_vec_size).

  • is_sf_swizzled_layout (bool) – Whether the scale factors are stored in the swizzled layout. Defaults to True.

  • sf_swizzle_layout (SfLayout, optional) – Explicit swizzle layout for scale factors; when supplied this overrides is_sf_swizzled_layout. Options are SfLayout.layout_128x4 and SfLayout.layout_linear.

Returns:

Dequantized float tensor of shape [M, K] with dtype float32.

Return type:

torch.Tensor
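
Example:

The following is a minimal pure-Python sketch of the arithmetic behind MxFP8 dequantization. It is not FlashInfer's implementation and does not call the library. It assumes the conventions of the OCP Microscaling (MX) format: each element is an FP8 E4M3 code, and every block of 32 consecutive elements along K shares one E8M0 (power-of-two) scale. It also assumes the linear, unswizzled scale layout with one row of scales per row of input. The helper names `e4m3_to_float`, `e8m0_to_float`, and `dequantize_rows` are hypothetical and exist only for this illustration.

```python
import math

BLOCK = 32  # assumed number of elements sharing one scale (OCP MX block size)


def e4m3_to_float(byte):
    """Decode one FP8 E4M3 code (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan  # E4M3 has no infinities; this bit pattern is NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def e8m0_to_float(byte):
    """Decode one E8M0 scale: a pure power of two with bias 127."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - 127)


def dequantize_rows(codes, scales):
    """Dequantize rows of E4M3 codes using linear-layout E8M0 scales.

    codes:  list of M rows, each a list of K integer byte values.
    scales: list of M rows, each holding one byte per BLOCK elements.
    """
    out = []
    for row, row_scales in zip(codes, scales):
        out.append([
            e4m3_to_float(code) * e8m0_to_float(row_scales[k // BLOCK])
            for k, code in enumerate(row)
        ])
    return out


# One row of 64 elements: two blocks, each with its own scale.
codes = [[0x38] * BLOCK + [0x40] * BLOCK]  # 0x38 encodes 1.0, 0x40 encodes 2.0
scales = [[127, 128]]                      # 2**0 and 2**1
values = dequantize_rows(codes, scales)
```

In this sketch the first block dequantizes to 1.0 × 2⁰ and the second to 2.0 × 2¹. A swizzled scale layout such as SfLayout.layout_128x4 changes only where each block's scale byte is stored, not this per-element arithmetic.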