flashinfer.quantization.mxfp4_dequantize_host
- flashinfer.quantization.mxfp4_dequantize_host(weight: Tensor, scale: Tensor, group_size: int = 32) → Tensor
Host-side MXFP4 dequantization.
- Parameters:
weight (torch.Tensor) – Quantized tensor of shape [M, K/2] with dtype uint8 (FLOAT4_E2M1X2).
scale (torch.Tensor) – UE8M0 scale-factor tensor (uint8); shape depends on the layout and group_size/sf_vec_size (typically the swizzled buffer produced by mxfp4_quantize()).
group_size (int) – Group size for dequantization. Defaults to 32.
- Returns:
Dequantized tensor of shape [M, K] with dtype float32.
- Return type:
torch.Tensor
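
The arithmetic behind this dequantization can be illustrated with a small pure-Python sketch. It is not FlashInfer's implementation. The helper names (`decode_e2m1`, `decode_ue8m0`, `mxfp4_dequantize_reference`) are hypothetical, and the sketch makes two assumptions that may not match the library: each byte stores the even-indexed element in its low nibble, and the scales are laid out in plain row-major order as [M, K/group_size] rather than in the swizzled layout that mxfp4_quantize() typically produces.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code (OCP MX format).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 the magnitude."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_ue8m0(byte):
    """Decode a UE8M0 scale: an unsigned power of two with bias 127."""
    if byte == 0xFF:  # reserved encoding for NaN
        return float("nan")
    return 2.0 ** (byte - 127)


def mxfp4_dequantize_reference(weight, scale, group_size=32):
    """Hypothetical reference, not the library routine.

    weight: M rows of K/2 packed bytes.
    scale:  M rows of K/group_size bytes, row-major (an assumption).
    Returns M rows of K Python floats.
    """
    out = []
    for w_row, s_row in zip(weight, scale):
        row = []
        for j, byte in enumerate(w_row):
            # Assumed packing: low nibble is element 2j, high nibble is 2j+1.
            for k, code in ((2 * j, byte & 0xF), (2 * j + 1, byte >> 4)):
                group_scale = decode_ue8m0(s_row[k // group_size])
                row.append(decode_e2m1(code) * group_scale)
        out.append(row)
    return out


# Tiny example: one row, K=4, group_size=4, scale byte 128 means 2**1.
packed = [[0x21, 0xF4]]  # codes 1, 2, 4, 15 decode to 0.5, 1.0, 2.0, -6.0
scales = [[128]]
result = mxfp4_dequantize_reference(packed, scales, group_size=4)
print(result)  # [[1.0, 2.0, 4.0, -12.0]]
```

Each output element is the decoded 4-bit value multiplied by the power-of-two scale shared by its group, which is why the scale tensor holds one byte per group_size elements along K.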