flashinfer.activation.silu_and_mul_scaled_nvfp4_experts_quantize

flashinfer.activation.silu_and_mul_scaled_nvfp4_experts_quantize(a, mask, a_global_sf)

Apply SiLU-and-multiply to a batched input tensor and quantize the result to NVFP4 format, with a mask.

Parameters:

  • a (torch.Tensor): Input tensor of shape [B, M, K] with dtype fp16/bf16.

  • a_global_sf (torch.Tensor): Global scale factor of shape [1] with dtype float32.

  • mask (torch.Tensor): Mask tensor to apply before quantization.

  • sf_vec_size (int, optional): Scale factor vector size. Defaults to 16.

Returns:

A tuple containing:
  • Quantized tensor of shape [B, M, K/2] with dtype FLOAT4_E2M1X2

  • Scale factors tensor with shape determined by layout and sf_vec_size

Return type:

Tuple[torch.Tensor, torch.Tensor]
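
The quantized output has a last dimension of K/2 because FLOAT4_E2M1X2 stores two 4-bit E2M1 values in each byte, and each group of sf_vec_size values shares one scale factor. The pure-Python sketch below illustrates that idea for a single block of 16 values. It is not FlashInfer's kernel: the helper names `encode_e2m1` and `quantize_block` are hypothetical, the nibble order (first value in the low nibble) and the tie-breaking rule are assumptions, the block scale is kept as a plain Python float rather than an FP8 value, and the global scale factor, the mask, and the SiLU-and-multiply step are left out.

```python
# Illustrative model of E2M1 rounding and two-per-byte packing.
# Assumptions (not taken from FlashInfer): low-nibble-first packing,
# ties resolved toward the smaller magnitude, scale kept as a float.

# The eight non-negative magnitudes representable in E2M1, indexed by 3-bit code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_e2m1(x: float) -> int:
    """Return the 4-bit code (sign bit plus 3-bit magnitude) nearest to x."""
    sign = 0x8 if x < 0 else 0x0
    mag = min(abs(x), 6.0)  # clamp to the largest representable magnitude
    code = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - mag))
    return sign | code


def quantize_block(values, block_size=16):
    """Quantize one block with a shared scale and pack two codes per byte."""
    assert len(values) == block_size and block_size % 2 == 0
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto 6.0, the E2M1 maximum.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = [encode_e2m1(v / scale) for v in values]
    packed = bytes(
        codes[i] | (codes[i + 1] << 4) for i in range(0, block_size, 2)
    )
    return packed, scale


if __name__ == "__main__":
    packed, scale = quantize_block([6.0, -3.0] + [0.0] * 14)
    print(len(packed), scale, hex(packed[0]))
```

Sixteen input values become eight packed bytes plus one scale, which mirrors the [B, M, K] to [B, M, K/2] shape relationship described above.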