flashinfer.cudnn
cuDNN-backed attention kernels. The routines in this module call into NVIDIA’s cuDNN
runtime for batched prefill and batched decode attention. They are typically used as
an alternative backend to BatchPrefillWithPagedKVCacheWrapper /
BatchDecodeWithPagedKVCacheWrapper when the cuDNN library is installed and supports
the GPU in use.
Batched decode attention with paged KV cache, backed by cuDNN SDPA (scaled dot-product attention).
Batched prefill attention with paged KV cache, backed by cuDNN SDPA (scaled dot-product attention).
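To make "paged KV cache" concrete, here is a minimal pure-Python reference sketch of what batched decode and batched prefill attention compute. It is not the cuDNN or FlashInfer API. The names `paged_decode` and `paged_prefill` and the cache layout (`k_pages[page][slot]` rows, a per-sequence `page_table` of page ids, `seq_lens`, and `page_size`) are hypothetical. It also assumes a single head with no grouped-query attention, that prefill tokens are already written to the cache, and that the causal mask is aligned to the end of each sequence.

```python
import math


def _gather_kv(k_pages, v_pages, pages, seq_len, page_size):
    """Collect the first `seq_len` K/V rows of one sequence from its pages."""
    keys, values = [], []
    for t in range(seq_len):
        page = pages[t // page_size]  # logical token index -> physical page id
        slot = t % page_size          # offset of the token inside that page
        keys.append(k_pages[page][slot])
        values.append(v_pages[page][slot])
    return keys, values


def _attend(q, keys, values, scale):
    """softmax(scale * q.k) weighted sum of values, using max-subtraction for stability."""
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def paged_decode(queries, k_pages, v_pages, page_table, seq_lens, page_size):
    """Decode: one query per sequence attends to every cached token of that sequence."""
    out = []
    for q, pages, n in zip(queries, page_table, seq_lens):
        keys, values = _gather_kv(k_pages, v_pages, pages, n, page_size)
        out.append(_attend(q, keys, values, 1.0 / math.sqrt(len(q))))
    return out


def paged_prefill(queries, k_pages, v_pages, page_table, seq_lens, page_size):
    """Prefill: several queries per sequence under a causal mask.

    queries[b] holds the q_len newest tokens of sequence b (assumed already in
    the cache), so query i may only see cached positions 0 .. n - q_len + i.
    """
    out = []
    for qs, pages, n in zip(queries, page_table, seq_lens):
        keys, values = _gather_kv(k_pages, v_pages, pages, n, page_size)
        q_len = len(qs)
        rows = []
        for i, q in enumerate(qs):
            visible = n - q_len + i + 1  # causal cut-off for query i
            rows.append(_attend(q, keys[:visible], values[:visible], 1.0 / math.sqrt(len(q))))
        out.append(rows)
    return out
```

For example, with `page_size=2` and a sequence whose page table is `[1, 0]`, tokens 0 and 1 live in page 1 and token 2 lives in slot 0 of page 0. The pages a sequence uses need not be contiguous or in order, and this indirection is what a paged kernel resolves on the GPU.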