Welcome to FlashInfer’s documentation!
Blog | Discussion Forum | GitHub
FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, PagedAttention, and LoRA. FlashInfer focuses on LLM serving and inference and delivers state-of-the-art performance across diverse scenarios.
Get Started
Tutorials
PyTorch API Reference
- FlashInfer Attention Kernels
- flashinfer.gemm
- flashinfer.grouped_mm
- flashinfer.fused_moe
- flashinfer.cascade
- flashinfer.comm
- flashinfer.sparse
- flashinfer.pod
- flashinfer.cudnn
- flashinfer.cute_dsl
- flashinfer.page
- flashinfer.sampling
  - flashinfer.sampling.sampling_from_probs
  - flashinfer.sampling.sampling_from_logits
  - flashinfer.sampling.softmax
  - flashinfer.sampling.top_p_sampling_from_probs
  - flashinfer.sampling.top_k_sampling_from_probs
  - flashinfer.sampling.min_p_sampling_from_probs
  - flashinfer.sampling.top_k_top_p_sampling_from_logits
  - flashinfer.sampling.top_k_top_p_sampling_from_probs
  - flashinfer.sampling.top_p_renorm_probs
  - flashinfer.sampling.top_k_renorm_probs
  - flashinfer.sampling.top_k_mask_logits
  - flashinfer.sampling.chain_speculative_sampling
- flashinfer.topk
- flashinfer.logits_processor
- flashinfer.norm
  - flashinfer.norm.rmsnorm
  - flashinfer.norm.rmsnorm_quant
  - flashinfer.norm.fused_add_rmsnorm
  - flashinfer.norm.fused_add_rmsnorm_quant
  - flashinfer.norm.gemma_rmsnorm
  - flashinfer.norm.gemma_fused_add_rmsnorm
  - flashinfer.norm.layernorm
  - flashinfer.norm.fused_rmsnorm_silu
  - flashinfer.norm.fused_qk_rmsnorm_rope
  - flashinfer.norm.fused_dit_residual_layernorm_scale_shift
  - flashinfer.norm.fused_dit_gate_residual_layernorm_scale_shift
  - flashinfer.norm.fused_dit_gate_residual_layernorm_gamma_beta
- flashinfer.rope
  - flashinfer.rope.apply_rope_inplace
  - flashinfer.rope.apply_llama31_rope_inplace
  - flashinfer.rope.apply_rope
  - flashinfer.rope.apply_llama31_rope
  - flashinfer.rope.apply_rope_pos_ids
  - flashinfer.rope.apply_rope_pos_ids_inplace
  - flashinfer.rope.apply_llama31_rope_pos_ids
  - flashinfer.rope.apply_llama31_rope_pos_ids_inplace
  - flashinfer.rope.apply_rope_with_cos_sin_cache
  - flashinfer.rope.apply_rope_with_cos_sin_cache_inplace
  - flashinfer.rope.rope_quantize_fp8
  - flashinfer.rope.rope_quantize_fp8_append_paged_kv_cache
  - flashinfer.rope.mla_rope_quantize_fp8
- flashinfer.activation
- flashinfer.gdn_decode
- flashinfer.gdn_prefill
- flashinfer.mamba
- flashinfer.quantization
- flashinfer.green_ctx
- flashinfer.fp4_quantization
- flashinfer.testing