flashinfer.quantization.mxfp4_dequantize_host

flashinfer.quantization.mxfp4_dequantize_host(weight: Tensor, scale: Tensor, group_size: int = 32) → Tensor

Host-side (CPU) MXFP4 dequantization: expands packed FP4 (E2M1) values and their UE8M0 group scales into a float32 tensor.
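MXFP4 follows the OCP Microscaling (MX) formats, where each output element is its 4-bit E2M1 value multiplied by the UE8M0 scale shared by its group of group_size consecutive elements along K. Writing q for the FP4 codes, s for the scale bytes, and g for group_size (notation introduced here, not part of the API), the standard MX definition for row i in [0, M) and column j in [0, K) is:

$$
x_{i,j} = 2^{\,s_{i,\lfloor j/g \rfloor} - 127} \cdot \mathrm{E2M1}(q_{i,j}),
\qquad \mathrm{E2M1}(q) \in \{\pm 0,\ \pm 0.5,\ \pm 1,\ \pm 1.5,\ \pm 2,\ \pm 3,\ \pm 4,\ \pm 6\}
$$

Here s_{i,⌊j/g⌋} is the logical scale of the group containing element j; how a swizzled scale buffer maps onto that logical index is an implementation detail not described on this page.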

Parameters:
  • weight (torch.Tensor) – Quantized tensor of shape [M, K/2] with dtype uint8 (FLOAT4_E2M1X2); each byte packs two FP4 (E2M1) values, so the logical row length K is twice the last dimension.

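  • scale (torch.Tensor) – UE8M0 scale-factor tensor (uint8) holding one scale per group of group_size elements along K. Its shape depends on the scale-factor layout and on group_size (also called sf_vec_size); typically it is the swizzled scale buffer produced by mxfp4_quantize().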

  • group_size (int) – Number of consecutive elements along K that share one scale factor. Defaults to 32, the MX block size.

Returns:

Dequantized tensor of shape [M, K] with dtype float32.

Return type:

torch.Tensor
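To make the parameter descriptions concrete, the following pure-Python sketch spells out the dequantization arithmetic for small inputs. It is a hypothetical reference, not flashinfer's implementation: mxfp4_dequantize_reference, decode_e2m1 and decode_ue8m0 are illustrative names, and the sketch assumes that the low nibble of each byte holds the even-indexed element and that scales use a plain row-major [M, K / group_size] layout. Because the buffer from mxfp4_quantize() is typically swizzled, its scales would not map onto this layout directly.

```python
# Hypothetical reference decoder illustrating MXFP4 semantics; NOT flashinfer's code.
# Assumptions (not stated in the API docs):
#   * low nibble of each byte = even-indexed element, high nibble = odd-indexed element;
#   * scales are stored row-major as [M][K // group_size] (no swizzling).

# Magnitudes of the 8 non-negative E2M1 codes; bit 3 of a code is the sign bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a Python float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_ue8m0(scale_byte: int) -> float:
    """Decode a UE8M0 scale byte as 2 ** (e - 127); 0xFF encodes NaN in the MX spec."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def mxfp4_dequantize_reference(weight, scale, group_size=32):
    """Dequantize an [M][K // 2] list of packed bytes into an [M][K] list of floats.

    `scale` is an [M][K // group_size] list of UE8M0 bytes (linear layout assumed).
    """
    out = []
    for row_bytes, row_scales in zip(weight, scale):
        # Unpack two FP4 codes per byte (assumed order: low nibble, then high nibble).
        row = []
        for byte in row_bytes:
            row.append(decode_e2m1(byte & 0x0F))
            row.append(decode_e2m1(byte >> 4))
        # Element j belongs to group j // group_size and uses that group's scale.
        out.append(
            [v * decode_ue8m0(row_scales[j // group_size]) for j, v in enumerate(row)]
        )
    return out
```

For example, mxfp4_dequantize_reference([[0x21, 0xF7]], [[128, 126]], group_size=2) returns [[1.0, 2.0, 3.0, -3.0]]: the four codes decode to 0.5, 1.0, 6.0 and -6.0, the first group is scaled by 2^1 and the second by 2^-1.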