flashinfer.prefill.single_prefill_with_kv_cache_return_lse
- flashinfer.prefill.single_prefill_with_kv_cache_return_lse(q: Tensor, k: Tensor, v: Tensor, scale_q: Tensor | None = None, scale_k: Tensor | None = None, scale_v: Tensor | None = None, o_dtype: dtype | None = None, custom_mask: Tensor | None = None, packed_custom_mask: Tensor | None = None, causal: bool = False, kv_layout: str = 'NHD', pos_encoding_mode: str = 'NONE', use_fp16_qk_reduction: bool = False, sm_scale: float | None = None, window_left: int = -1, logits_soft_cap: float | None = None, rope_scale: float | None = None, rope_theta: float | None = None, backend: str = 'auto', *, return_lse: bool = True, kv_cache_sf: Tuple[Tensor, Tensor] | None = None, k_scale: float | None = None, v_scale: float | None = None) → Tensor | Tuple[Tensor, Tensor]
Convenience wrapper for single_prefill_with_kv_cache() that always returns the log-sum-exp (LSE) alongside the attention output.
Equivalent to calling single_prefill_with_kv_cache() with return_lse=True; it accepts the same arguments and forwards them unchanged. See single_prefill_with_kv_cache() for the full parameter list, including the FP8 / NVFP4 quantization arguments (scale_q, scale_k, scale_v, kv_cache_sf, k_scale, and v_scale) and the output dtype o_dtype.
- Returns:
A pair (output, lse), where output is the attention output and lse is the log-sum-exp tensor used for cascade merging.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
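The lse output is what makes cascade merging possible: attention computed separately over two disjoint sets of keys/values can be combined into exactly the result of attending over their union, using only each part's output and LSE. The following is a minimal pure-Python sketch of that merge for a single query, assuming natural-log LSE. The helpers attend and merge are hypothetical illustrations, not FlashInfer API; FlashInfer ships its own merging utilities (e.g. in flashinfer.cascade), and its kernels may store LSE in a different base (such as log2), so check before combining this sketch with the library's tensors.

```python
import math

def attend(q, keys, values, sm_scale):
    """Softmax attention for one query (hypothetical helper).

    Returns (output vector, natural-log LSE of the scaled logits).
    """
    logits = [sm_scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)                      # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    lse = m + math.log(total)            # log(sum(exp(logits)))
    return out, lse

def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial results over disjoint KV sets (hypothetical helper).

    Each output is re-weighted by its share exp(lse_x) of the total
    softmax normalizer, computed relative to the larger LSE to avoid overflow.
    """
    m = max(lse_a, lse_b)
    wa = math.exp(lse_a - m)
    wb = math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)

# Usage: split four keys/values into two chunks, attend to each, then merge.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
scale = 1.0 / math.sqrt(len(q))

full_out, full_lse = attend(q, keys, values, scale)
out_a, lse_a = attend(q, keys[:2], values[:2], scale)
out_b, lse_b = attend(q, keys[2:], values[2:], scale)
merged_out, merged_lse = merge(out_a, lse_a, out_b, lse_b)

# The merged result matches attention over all keys at once.
assert all(math.isclose(x, y) for x, y in zip(merged_out, full_out))
assert math.isclose(merged_lse, full_lse)
```

Because the merge needs only each chunk's output and LSE, not its raw logits, chunks can be processed by separate kernel calls (for example a shared prefix and a per-request suffix) and combined afterward.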