flashinfer.fp4_quantization.nvfp4_batched_quantize

flashinfer.fp4_quantization.nvfp4_batched_quantize(a, a_global_sf, sf_vec_size=16)

Quantize batched input tensor to NVFP4 format.

Parameters:

a (torch.Tensor) – Input tensor of shape [B, M, K] with dtype fp16/bf16.
a_global_sf (torch.Tensor) – Global scale factor of shape [1] with dtype float32.
sf_vec_size (int, optional) – Scale factor vector size. Defaults to 16.

Returns:

A tuple containing the quantized tensor and its scale factors tensor.

Return type:

Tuple[torch.Tensor, torch.Tensor]
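
Example:

To illustrate what sf_vec_size controls, the following is a minimal pure-Python sketch of block-scaled 4-bit (E2M1) quantization, in which every group of sf_vec_size consecutive elements along K shares one scale factor. It is not FlashInfer's implementation. The helper names (round_to_e2m1, quantize_row, quantize_batched) are hypothetical. The sketch assumes plain nested lists instead of tensors, keeps scales as Python floats, breaks rounding ties toward the smaller magnitude, and leaves out a_global_sf, the packing of two 4-bit codes per byte, and the layout of the scale factors, none of which are specified on this page.

```python
# Illustrative only: a pure-Python sketch of block-scaled FP4 (E2M1) quantization.
# This is NOT FlashInfer's kernel. Assumptions: scales stay as Python floats,
# the global scale factor is omitted, and no packing or layout is modeled.

# Non-negative magnitudes representable in the E2M1 4-bit float format.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def round_to_e2m1(x):
    """Return the E2M1 value nearest to x (ties go to the smaller magnitude)."""
    magnitude = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) - m))
    return -magnitude if x < 0 else magnitude


def quantize_row(row, sf_vec_size=16):
    """Quantize one row of length K; returns (values, scales), one scale per block."""
    if len(row) % sf_vec_size != 0:
        raise ValueError("row length must be a multiple of sf_vec_size")
    values, scales = [], []
    for start in range(0, len(row), sf_vec_size):
        block = row[start:start + sf_vec_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto the largest E2M1 value.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        scales.append(scale)
        values.extend(round_to_e2m1(v / scale) for v in block)
    return values, scales


def quantize_batched(a, sf_vec_size=16):
    """Apply quantize_row to every row of a nested list shaped [B, M, K]."""
    q, sf = [], []
    for matrix in a:
        rows = [quantize_row(r, sf_vec_size) for r in matrix]
        q.append([values for values, _ in rows])
        sf.append([scales for _, scales in rows])
    return q, sf


# Hypothetical input with B=1, M=1, K=32, i.e. two blocks of 16 elements.
a = [[[float(i - 8) for i in range(16)] + [0.25] * 16]]
q, sf = quantize_batched(a)

# Approximate reconstruction: each quantized value times its block's scale.
dequantized = [v * sf[0][0][i // 16] for i, v in enumerate(q[0][0])]
```

In this sketch the quantized output keeps one value per input element and each row yields K / sf_vec_size scales. The tensors actually returned by nvfp4_batched_quantize may be packed and laid out differently.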