.. _apimla:

flashinfer.mla
==============

MLA (Multi-head Latent Attention) is an attention mechanism proposed in the
DeepSeek series of models (DeepSeek-V2, DeepSeek-V3, and DeepSeek-R1).

.. currentmodule:: flashinfer.mla

PageAttention for MLA
---------------------

.. autoclass:: BatchMLAPagedAttentionWrapper
    :members:

    .. automethod:: __init__