flashinfer.quantization.fp4_quantize
- flashinfer.quantization.fp4_quantize(input: Tensor, global_scale: Tensor | None = None, sf_vec_size: int = 16, sf_use_ue8m0: bool = False, is_sf_swizzled_layout: bool = True, is_sf_8x4_layout: bool = False, is_global_scale_inversed: bool = False, enable_pdl: bool | None = None, backend: str = 'cuda') → Tuple[Tensor, Tensor]
Quantize input tensor to FP4 format.
Implements FP4 quantization that converts input tensors to a compressed FP4 format with associated scale factors. Supports various input data types and scale-factor layouts (covering both NVFP4 and MXFP4 quantization recipes).
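To make "compressed FP4 format with associated scale factors" concrete, here is a minimal pure-Python sketch of block-scaled FP4 (E2M1) quantization. It is not the FlashInfer kernel: it ignores `global_scale`, keeps the per-block scale as a plain Python float instead of rounding it to FP8, and assumes a low-nibble-first packing order and round-up for the power-of-two (UE8M0-style) scale. The helper names are hypothetical.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block, use_ue8m0=False):
    """Quantize one scale-factor block and return (codes, scale).

    Each code is a 4-bit integer: bit 3 is the sign, bits 0-2 index E2M1_VALUES.
    """
    amax = max(abs(v) for v in block)
    # Map the largest magnitude in the block onto 6.0, the largest E2M1 value.
    scale = amax / 6.0 if amax > 0 else 1.0
    if use_ue8m0:
        # An exponent-only scale can only be a power of two.
        # Rounding up is an assumption made for this sketch.
        scale = 2.0 ** math.ceil(math.log2(scale))
    codes = []
    for v in block:
        target = abs(v) / scale
        # Nearest representable magnitude; ties resolve to the smaller one.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - target))
        codes.append((8 if v < 0 else 0) | idx)
    return codes, scale


def pack_codes(codes):
    """Pack two 4-bit codes per byte (low nibble first, an assumed order)."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def dequantize_block(codes, scale):
    """Recover approximate values from 4-bit codes and the block scale."""
    return [(-1.0 if c & 8 else 1.0) * E2M1_VALUES[c & 7] * scale for c in codes]
```

Packing two codes per byte is why the quantized output has `K/2` columns, and one scale per `sf_vec_size` elements is why a separate scale-factor tensor is returned alongside it.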
- Parameters:
  - input (torch.Tensor) – Input tensor of shape [M, K] with dtype fp16/bf16/fp8_quantized.
  - global_scale (torch.Tensor, optional) – Global scale factor of shape [1] and dtype float32.
  - sf_vec_size (int) – Scale factor vector size. Defaults to 16.
  - sf_use_ue8m0 (bool) – Whether to use UE8M0 format for scale factors. Defaults to False.
  - is_sf_swizzled_layout (bool) – Whether to use the swizzled layout for scale factors. Defaults to True.
  - is_sf_8x4_layout (bool) – Use the 8x4 swizzled layout instead of 128x4. Defaults to False.
  - is_global_scale_inversed (bool) – When True, global_scale is interpreted as the inverse scale. Defaults to False.
  - enable_pdl (bool, optional) – Whether to enable Programmatic Dependent Launch. Auto-detected from device capability when None.
  - backend (str) – Backend to use for quantization:
    - "cuda": stable CUDA kernel (default).
    - "cute-dsl": CuTe-DSL kernel (SM100+, experimental). Supported combinations:
      - sf_vec_size=16, sf_use_ue8m0=False: all layouts, fp16/bf16/fp8 (NVFP4).
      - sf_vec_size=32, sf_use_ue8m0=True: all layouts, fp16/bf16 (MXFP4).
- Returns:
  (x_q, sf) where x_q has shape [M, K/2] with dtype FLOAT4_E2M1X2 and sf is the scale-factor tensor whose shape depends on the layout and sf_vec_size.
- Return type:
  Tuple[torch.Tensor, torch.Tensor]
- Raises:
  - NotImplementedError – If the requested feature is not implemented (e.g. BFloat16 input when BFloat16 is not enabled, FP8 input when FP8 is not enabled, or sf_vec_size other than 16 or 32).
  - ValueError – If the "cute-dsl" backend is requested for an unsupported parameter combination.
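The documented error conditions can be mirrored in a small pre-flight check, which is sometimes convenient for failing early before any tensor reaches the GPU. The sketch below is hypothetical: `check_fp4_args` is not part of FlashInfer, dtypes are passed as plain strings for illustration, and the real function may validate more (or differently) than what is listed here.

```python
def check_fp4_args(dtype, sf_vec_size=16, sf_use_ue8m0=False, backend="cuda"):
    """Mirror the documented checks; return the recipe name for "cute-dsl".

    `dtype` is one of the strings "fp16", "bf16" or "fp8" (illustrative only).
    Returns "NVFP4" or "MXFP4" for a supported "cute-dsl" combination,
    and None for any other backend.
    """
    if sf_vec_size not in (16, 32):
        raise NotImplementedError(
            f"sf_vec_size must be 16 or 32, got {sf_vec_size}"
        )
    if backend != "cute-dsl":
        return None
    # The two combinations the documentation lists for the "cute-dsl" backend.
    nvfp4 = (
        sf_vec_size == 16
        and not sf_use_ue8m0
        and dtype in ("fp16", "bf16", "fp8")
    )
    mxfp4 = sf_vec_size == 32 and sf_use_ue8m0 and dtype in ("fp16", "bf16")
    if not (nvfp4 or mxfp4):
        raise ValueError(
            "cute-dsl does not support "
            f"dtype={dtype}, sf_vec_size={sf_vec_size}, sf_use_ue8m0={sf_use_ue8m0}"
        )
    return "NVFP4" if nvfp4 else "MXFP4"
```

The build-dependent cases (BFloat16 or FP8 support not enabled) cannot be checked this way, so a `NotImplementedError` from the library itself should still be expected.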
Warning
The "cute-dsl" backend is experimental and not part of the stable API. It may change or be removed in future versions without notice.
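Because the experimental backend can reject a parameter combination, one defensive pattern is to keep the backend choice in a single wrapper that falls back to the stable kernel. The sketch below is an illustration only: `quantize_with_fallback` is a hypothetical helper, and `fake_quantize` is a stand-in used so the example runs without a GPU. It catches only the documented `ValueError`.

```python
def quantize_with_fallback(quantize_fn, x, **kwargs):
    """Try backend="cute-dsl" first, then retry with backend="cuda".

    Returns (result, backend_used). `kwargs` must not contain `backend`.
    """
    try:
        return quantize_fn(x, backend="cute-dsl", **kwargs), "cute-dsl"
    except ValueError:
        # Documented failure mode: unsupported parameter combination.
        return quantize_fn(x, backend="cuda", **kwargs), "cuda"


def fake_quantize(x, sf_vec_size=16, sf_use_ue8m0=False, backend="cuda"):
    """Hypothetical stand-in that imitates only the documented ValueError."""
    supported = {(16, False), (32, True)}
    if backend == "cute-dsl" and (sf_vec_size, sf_use_ue8m0) not in supported:
        raise ValueError("unsupported combination for cute-dsl")
    return ("x_q", "sf")


result, used = quantize_with_fallback(fake_quantize, None, sf_vec_size=32)
```

In real code the first argument would be `flashinfer.quantization.fp4_quantize` and `x` a CUDA tensor. Here `used` ends up as `"cuda"`, because `sf_vec_size=32` without `sf_use_ue8m0=True` is not a listed "cute-dsl" combination.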