.. _apiattention:

FlashInfer Attention Kernels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimal usage sketches for these APIs are collected at the end of this page.

flashinfer.decode
=================

.. currentmodule:: flashinfer.decode

Single Request Decoding
-----------------------

.. autosummary::
   :toctree: ../generated

   single_decode_with_kv_cache

Batch Decoding
--------------

.. autosummary::
   :toctree: ../generated

   cudnn_batch_decode_with_kv_cache
   trtllm_batch_decode_with_kv_cache

.. autoclass:: BatchDecodeWithPagedKVCacheWrapper
   :members:
   :exclude-members: begin_forward, end_forward, forward, forward_return_lse

   .. automethod:: __init__

.. autoclass:: CUDAGraphBatchDecodeWithPagedKVCacheWrapper
   :members:

   .. automethod:: __init__

flashinfer.prefill
==================

Attention kernels for prefill & append attention, in both single-request and batch-serving settings.

.. currentmodule:: flashinfer.prefill

Single Request Prefill/Append Attention
---------------------------------------

.. autosummary::
   :toctree: ../generated

   single_prefill_with_kv_cache
   single_prefill_with_kv_cache_return_lse

Batch Prefill/Append Attention
------------------------------

.. autosummary::
   :toctree: ../generated

   cudnn_batch_prefill_with_kv_cache
   trtllm_batch_context_with_kv_cache

.. autoclass:: BatchPrefillWithPagedKVCacheWrapper
   :members:
   :exclude-members: begin_forward, end_forward, forward, forward_return_lse

   .. automethod:: __init__

.. autoclass:: BatchPrefillWithRaggedKVCacheWrapper
   :members:
   :exclude-members: begin_forward, end_forward, forward, forward_return_lse

   .. automethod:: __init__

flashinfer.mla
==============

MLA (Multi-head Latent Attention) is an attention mechanism proposed in the DeepSeek series of models
(`DeepSeek-V2 <https://arxiv.org/abs/2405.04434>`_, `DeepSeek-V3 <https://arxiv.org/abs/2412.19437>`_,
and `DeepSeek-R1 <https://arxiv.org/abs/2501.12948>`_).

.. currentmodule:: flashinfer.mla

PageAttention for MLA
---------------------

.. autoclass:: BatchMLAPagedAttentionWrapper
   :members:

   .. automethod:: __init__
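
Usage Sketches
==============

The snippets below are minimal, illustrative sketches of the kernels documented above, not
authoritative reference code; tensor shapes assume the default ``NHD`` layout, and argument
names may differ slightly across FlashInfer versions. ``single_decode_with_kv_cache`` takes a
single query of shape ``[num_qo_heads, head_dim]`` against a KV cache of shape
``[kv_len, num_kv_heads, head_dim]``:

.. code-block:: python

   import torch
   import flashinfer

   num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 4096

   # One query vector per head (a single decode step for one request).
   q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
   # KV cache in NHD layout: [kv_len, num_kv_heads, head_dim]; grouped-query
   # attention is handled when num_qo_heads is a multiple of num_kv_heads.
   k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
   v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

   o = flashinfer.single_decode_with_kv_cache(q, k, v)  # [num_qo_heads, head_dim]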
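``BatchDecodeWithPagedKVCacheWrapper`` amortizes scheduling work across layers: allocate a
workspace buffer once, call ``plan`` with the page-table metadata of the current batch, then
``run`` once per layer. The sketch below assumes a paged cache of shape
``[max_num_pages, 2, page_size, num_kv_heads, head_dim]``; keyword names such as
``data_type`` may vary between releases.

.. code-block:: python

   import torch
   import flashinfer

   num_qo_heads, num_kv_heads, head_dim = 32, 8, 128
   page_size, max_num_pages, batch_size = 16, 64, 4

   # 128 MB workspace shared by plan/run.
   workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
   wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(workspace, "NHD")

   # Page table: request i owns pages kv_indices[kv_indptr[i]:kv_indptr[i+1]],
   # and its last page holds kv_last_page_len[i] valid tokens.
   kv_indptr = torch.tensor([0, 8, 16, 24, 32], dtype=torch.int32, device="cuda")
   kv_indices = torch.arange(32, dtype=torch.int32, device="cuda")
   kv_last_page_len = torch.tensor([1, 5, 16, 7], dtype=torch.int32, device="cuda")

   wrapper.plan(
       kv_indptr, kv_indices, kv_last_page_len,
       num_qo_heads, num_kv_heads, head_dim, page_size,
       pos_encoding_mode="NONE", data_type=torch.float16,
   )

   q = torch.randn(batch_size, num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
   kv_cache = torch.randn(
       max_num_pages, 2, page_size, num_kv_heads, head_dim,
       dtype=torch.float16, device="cuda",
   )
   o = wrapper.run(q, kv_cache)  # [batch_size, num_qo_heads, head_dim]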
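``single_prefill_with_kv_cache`` handles the multi-token (prefill/append) case: the query
gains a ``qo_len`` dimension, and ``causal=True`` applies a causal mask:

.. code-block:: python

   import torch
   import flashinfer

   qo_len, kv_len = 2048, 2048
   num_qo_heads, num_kv_heads, head_dim = 32, 8, 128

   q = torch.randn(qo_len, num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
   k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
   v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

   # Causal self-attention over the prompt; the *_return_lse variant additionally
   # returns the log-sum-exp of attention scores, useful for merging partial results.
   o = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)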
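For MLA, queries are split into a latent ("nope") part that attends to the compressed KV cache
and a small rotary ("pe") part; ``BatchMLAPagedAttentionWrapper`` follows the same plan/run
pattern. The dimensions below follow the DeepSeek configuration, and the positional argument
order of ``plan`` sketched here is an assumption that may change across versions:

.. code-block:: python

   import torch
   import flashinfer

   batch_size, kv_len = 4, 1024
   num_heads = 128          # DeepSeek-style local attention heads
   head_dim_ckv = 512       # compressed latent KV dimension
   head_dim_kpe = 64        # rotary (positional) dimension
   page_size = 1

   workspace = torch.empty(128 * 1024 * 1024, dtype=torch.int8, device="cuda")
   wrapper = flashinfer.mla.BatchMLAPagedAttentionWrapper(workspace, backend="fa2")

   # One decode token per request; each request owns kv_len pages of size 1.
   q_indptr = torch.arange(batch_size + 1, dtype=torch.int32, device="cuda")
   kv_indptr = torch.arange(batch_size + 1, dtype=torch.int32, device="cuda") * kv_len
   kv_indices = torch.arange(batch_size * kv_len, dtype=torch.int32, device="cuda")
   kv_len_arr = torch.full((batch_size,), kv_len, dtype=torch.int32, device="cuda")

   # Queries split into a latent ("nope") part and a rotary ("pe") part.
   q_nope = torch.randn(batch_size, num_heads, head_dim_ckv, dtype=torch.half, device="cuda")
   q_pe = torch.randn(batch_size, num_heads, head_dim_kpe, dtype=torch.half, device="cuda")
   # Compressed KV cache and positional key cache, shared across heads.
   ckv = torch.randn(batch_size * kv_len, page_size, head_dim_ckv, dtype=torch.half, device="cuda")
   kpe = torch.randn(batch_size * kv_len, page_size, head_dim_kpe, dtype=torch.half, device="cuda")

   # Softmax scale uses the per-head qk dim before compression (128 + 64 in DeepSeek),
   # not head_dim_ckv.
   sm_scale = 1.0 / ((128 + head_dim_kpe) ** 0.5)
   wrapper.plan(
       q_indptr, kv_indptr, kv_indices, kv_len_arr,
       num_heads, head_dim_ckv, head_dim_kpe, page_size,
       False, sm_scale, q_nope.dtype, ckv.dtype,
   )
   o = wrapper.run(q_nope, q_pe, ckv, kpe)  # [batch_size, num_heads, head_dim_ckv]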