flashinfer.cudnn

cuDNN-backed attention kernels. These functions call into NVIDIA’s cuDNN runtime for batch prefill and batch decode. They are typically used as an alternative backend to BatchPrefillWithPagedKVCacheWrapper / BatchDecodeWithPagedKVCacheWrapper when the cuDNN library is installed on the host and supports the target GPU.

cudnn_batch_decode_with_kv_cache(q, k_cache, ...)

Batched decode attention with paged KV cache, backed by cuDNN SDPA (scaled dot-product attention).
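As a rough illustration of what "batched decode with paged KV cache" computes, the sketch below is a pure-Python reference of the math only, not the cuDNN call. Each request contributes one new query token, its keys and values are gathered from fixed-size pages through a per-request block table, and the output is softmax(q·k / sqrt(d))·v. The function name paged_decode_reference, the nested-list page layout, the block_tables / seq_lens / page_size arguments, and the single-head (no GQA) simplification are all hypothetical assumptions for illustration; they are not the signature or memory layout that cudnn_batch_decode_with_kv_cache expects.

```python
import math
from typing import List

Vector = List[float]


def paged_decode_reference(
    q: List[Vector],                # [batch][head_dim]: one query token per request
    k_pages: List[List[Vector]],    # [num_pages][page_size][head_dim] (assumed layout)
    v_pages: List[List[Vector]],    # same layout as k_pages
    block_tables: List[List[int]],  # per request: page ids in logical token order
    seq_lens: List[int],            # per request: number of valid KV tokens (>= 1)
    page_size: int,
) -> List[Vector]:
    """Hypothetical single-head reference for paged decode attention."""
    out = []
    for b, qb in enumerate(q):
        scale = 1.0 / math.sqrt(len(qb))
        # Gather this request's KV tokens from the paged cache in logical order.
        keys, values = [], []
        for t in range(seq_lens[b]):
            page = block_tables[b][t // page_size]
            slot = t % page_size
            keys.append(k_pages[page][slot])
            values.append(v_pages[page][slot])
        # Scaled dot-product scores, then a numerically stable softmax.
        scores = [scale * sum(x * y for x, y in zip(qb, k)) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Softmax-weighted sum of the value vectors.
        out.append(
            [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(len(qb))]
        )
    return out
```

With an all-zero query every score is equal, so the output is simply the mean of the request's values; pages need not be contiguous, since the block table alone fixes the logical token order.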

cudnn_batch_prefill_with_kv_cache(q, ...[, ...])

Batched prefill attention with paged KV cache, backed by cuDNN SDPA.
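Prefill differs from decode in that each request brings several new query tokens at once, so a causal mask decides which cached keys each query may see. The pure-Python sketch below shows that masking over a paged KV cache. It assumes a bottom-right aligned causal mask: query i of a request with qo_len new tokens and kv_len total tokens may attend to keys 0 through kv_len - qo_len + i, which is the usual convention when new tokens are appended to the end of the cached sequence. The function name paged_prefill_reference, its arguments, the nested-list page layout, and the single-head simplification are hypothetical. This is a reference for the math, not the cuDNN kernel or the signature of cudnn_batch_prefill_with_kv_cache.

```python
import math
from typing import List

Vector = List[float]


def paged_prefill_reference(
    q: List[List[Vector]],          # per request: [qo_len][head_dim] new query tokens
    k_pages: List[List[Vector]],    # [num_pages][page_size][head_dim] (assumed layout)
    v_pages: List[List[Vector]],    # same layout as k_pages
    block_tables: List[List[int]],  # per request: page ids in logical token order
    kv_lens: List[int],             # per request: cached + new tokens, must be >= qo_len
    page_size: int,
    causal: bool = True,
) -> List[List[Vector]]:
    """Hypothetical single-head reference for paged prefill attention."""
    out = []
    for b, qs in enumerate(q):
        kv_len = kv_lens[b]
        # Gather this request's KV tokens from the paged cache in logical order.
        pos = [(block_tables[b][t // page_size], t % page_size) for t in range(kv_len)]
        keys = [k_pages[p][s] for p, s in pos]
        values = [v_pages[p][s] for p, s in pos]
        # Bottom-right alignment: the last query token sees the whole sequence.
        offset = kv_len - len(qs)
        req_out = []
        for i, qi in enumerate(qs):
            visible = offset + i + 1 if causal else kv_len
            scale = 1.0 / math.sqrt(len(qi))
            scores = [scale * sum(a * c for a, c in zip(qi, k)) for k in keys[:visible]]
            m = max(scores)  # subtract the max for a numerically stable softmax
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            req_out.append(
                [sum(wj * v[d] for wj, v in zip(w, values[:visible])) / z
                 for d in range(len(qi))]
            )
        out.append(req_out)
    return out
```

For example, with kv_len = 3 and two new query tokens, the first query sees the first two tokens and the second sees all three; with causal=False both queries see the full sequence.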