flashinfer.quantization.mxfp4_dequantize_host

flashinfer.quantization.mxfp4_dequantize_host(weight: Tensor, scale: Tensor, group_size: int = 32) → Tensor

Dequantizes an MXFP4-quantized tensor on the host and returns the result as float32.

Parameters:

weight (torch.Tensor) – Quantized tensor of shape [M, K/2] with dtype uint8 (FLOAT4_E2M1X2); each byte packs two 4-bit E2M1 values.
scale (torch.Tensor) – UE8M0 scale-factor tensor (uint8); shape depends on the layout and group_size / sf_vec_size (typically the swizzled buffer produced by mxfp4_quantize()).
group_size (int) – Number of consecutive elements along K that share one scale factor. Defaults to 32.

Returns:

Dequantized tensor of shape [M, K] with dtype float32.

Return type:

torch.Tensor
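
The arithmetic behind MXFP4 dequantization can be sketched in plain Python. This is an illustrative reference, not FlashInfer's implementation: the helper names decode_ue8m0 and dequantize_row are hypothetical, the low-nibble-first packing order is an assumption, and the scales are assumed to be in plain linear order rather than the swizzled layout that mxfp4_quantize() typically produces.

```python
# Illustrative reference only; not FlashInfer's implementation.
# The 16 codes of E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
               -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]


def decode_ue8m0(byte):
    """Decode a UE8M0 scale: an unsigned 8-bit exponent with bias 127."""
    if byte == 0xFF:
        return float("nan")  # 0xFF is reserved for NaN in the OCP MX format
    return 2.0 ** (byte - 127)


def dequantize_row(packed, scales, group_size=32):
    """Dequantize one row of K elements (hypothetical helper).

    packed: K/2 bytes, each holding two FP4 codes
            (ASSUMPTION: low nibble is the earlier element).
    scales: K/group_size UE8M0 bytes in linear order
            (ASSUMPTION: not swizzled).
    """
    out = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            group = len(out) // group_size  # scale group of this element
            out.append(E2M1_VALUES[code] * decode_ue8m0(scales[group]))
    return out


# Two bytes -> four elements, with two elements per scale group.
print(dequantize_row([0x21, 0xF7], [128, 127], group_size=2))
# → [1.0, 2.0, 6.0, -6.0]
```

Each output element is the decoded E2M1 value multiplied by the power-of-two scale of its group, so a row of K/2 packed bytes expands to K float values, matching the [M, K/2] to [M, K] shape change described above.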