flashinfer.prefill.single_prefill_with_kv_cache_return_lse

flashinfer.prefill.single_prefill_with_kv_cache_return_lse(q: Tensor, k: Tensor, v: Tensor, scale_q: Tensor | None = None, scale_k: Tensor | None = None, scale_v: Tensor | None = None, o_dtype: dtype | None = None, custom_mask: Tensor | None = None, packed_custom_mask: Tensor | None = None, causal: bool = False, kv_layout: str = 'NHD', pos_encoding_mode: str = 'NONE', use_fp16_qk_reduction: bool = False, sm_scale: float | None = None, window_left: int = -1, logits_soft_cap: float | None = None, rope_scale: float | None = None, rope_theta: float | None = None, backend: str = 'auto', *, return_lse: bool = True, kv_cache_sf: Tuple[Tensor, Tensor] | None = None, k_scale: float | None = None, v_scale: float | None = None) → Tensor | Tuple[Tensor, Tensor]

Convenience wrapper for single_prefill_with_kv_cache() that always returns the log-sum-exp (LSE) of the attention logits together with the attention output.

Equivalent to calling single_prefill_with_kv_cache() with return_lse=True; accepts the same arguments and forwards them unchanged. See single_prefill_with_kv_cache() for the full parameter list (including FP8 / NVFP4 quantization scales such as scale_q, scale_k, scale_v, o_dtype, kv_cache_sf, k_scale, and v_scale).

Returns:

A pair (output, lse), where output is the attention output and lse is the log-sum-exp of the attention logits. The lse tensor is what allows partial attention results to be merged, as in cascade merging.

Return type:

Tuple[torch.Tensor, torch.Tensor]
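
The role of lse in merging can be illustrated without a GPU. The following is a minimal pure-Python sketch, not FlashInfer's implementation: it computes attention for a single query vector over two KV chunks separately, then combines the two (output, lse) pairs and checks that the result matches attention over the full KV sequence. The helper names attention_state and merge_states are hypothetical and exist only for this illustration. The sketch also assumes a natural-log LSE; the logarithm base used by the library is not stated on this page, and the merge works for any base as long as it is applied consistently.

```python
import math


def attention_state(q, keys, values, sm_scale):
    """Hypothetical helper: softmax attention for one query over one KV chunk.

    Returns (output, lse), where lse is the natural-log sum-exp of the
    scaled logits (assumption: natural log, chosen for this sketch only).
    """
    logits = [sm_scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in logits]
    denom = sum(weights)
    lse = m + math.log(denom)
    dim = len(values[0])
    output = [
        sum(w * v[d] for w, v in zip(weights, values)) / denom
        for d in range(dim)
    ]
    return output, lse


def merge_states(o_a, lse_a, o_b, lse_b):
    """Hypothetical helper: merge two partial attention states.

    Each partial output is weighted by exp(lse) of its chunk, which is
    that chunk's softmax denominator, then renormalized.
    """
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    total = w_a + w_b
    merged = [(w_a * a + w_b * b) / total for a, b in zip(o_a, o_b)]
    return merged, m + math.log(total)


# Toy data: one query of dimension 4, four keys, and 2-dimensional values.
q = [0.5, -1.0, 0.25, 2.0]
keys = [
    [1.0, 0.0, 0.5, -0.5],
    [0.2, 0.3, -1.0, 1.0],
    [-0.7, 1.5, 0.0, 0.4],
    [0.9, -0.2, 0.3, 0.1],
]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-2.0, 4.0]]
sm_scale = 1.0 / math.sqrt(len(q))

# Attention over the whole KV sequence at once.
full_o, full_lse = attention_state(q, keys, values, sm_scale)

# Attention over two disjoint chunks, merged afterwards via lse.
o_a, lse_a = attention_state(q, keys[:2], values[:2], sm_scale)
o_b, lse_b = attention_state(q, keys[2:], values[2:], sm_scale)
merged_o, merged_lse = merge_states(o_a, lse_a, o_b, lse_b)
```

In this sketch, merged_o and merged_lse agree with full_o and full_lse up to floating-point rounding, which is why returning lse alongside the output is enough to combine attention computed over separate KV segments.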