flashinfer.cudnn

cuDNN-backed attention kernels. These wrappers call into NVIDIA’s cuDNN runtime for batch prefill and batch decode attention. They are typically used as an alternative backend for BatchPrefillWithPagedKVCacheWrapper and BatchDecodeWithPagedKVCacheWrapper when cuDNN is installed and usable on the host system.

`cudnn_batch_decode_with_kv_cache`(q, k_cache, ...)	Batched decode attention with paged KV cache, backed by cuDNN SDPA.
`cudnn_batch_prefill_with_kv_cache`(q, ...[, ...])	Batched prefill attention with paged KV cache, backed by cuDNN SDPA.
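
Because these functions are only useful when cuDNN is present, calling code may want to test for it before choosing a backend. The following is a minimal sketch of such a check, not FlashInfer's own detection logic. It assumes PyTorch's `torch.backends.cudnn.is_available()` is an acceptable proxy for cuDNN availability. The helper names `cudnn_available` and `pick_attention_backend`, and the labels `"cudnn"` and `"default"`, are hypothetical and are not FlashInfer arguments.

```python
import importlib.util


def cudnn_available() -> bool:
    """Best-effort check for a usable cuDNN; returns False instead of raising."""
    # Assumption: PyTorch's cuDNN check is used as a proxy. FlashInfer may
    # rely on a different mechanism to locate cuDNN.
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch

        return bool(torch.cuda.is_available() and torch.backends.cudnn.is_available())
    except Exception:
        # Treat any import or driver error as "not available".
        return False


def pick_attention_backend() -> str:
    """Return a hypothetical backend label for the caller to dispatch on."""
    return "cudnn" if cudnn_available() else "default"


if __name__ == "__main__":
    print(pick_attention_backend())
```

A caller could use the returned label to decide whether to route to the cuDNN functions listed above or to stay on the default wrappers. The check returns `False` on any failure, so the fallback path is always taken on machines without a GPU.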