FlashInfer 0.2.14.post1 documentation

Get Started

  • Installation

Tutorials

  • Attention States and Recursive Attention
  • KV-Cache Layout in FlashInfer

PyTorch API Reference

  • FlashInfer Attention Kernels
    • flashinfer.decode.single_decode_with_kv_cache
    • flashinfer.decode.cudnn_batch_decode_with_kv_cache
    • flashinfer.decode.trtllm_batch_decode_with_kv_cache
    • flashinfer.prefill.single_prefill_with_kv_cache
    • flashinfer.prefill.single_prefill_with_kv_cache_return_lse
    • flashinfer.prefill.cudnn_batch_prefill_with_kv_cache
    • flashinfer.prefill.trtllm_batch_context_with_kv_cache
  • flashinfer.gemm
    • flashinfer.gemm.mm_fp4
    • flashinfer.gemm.bmm_fp8
    • flashinfer.gemm.gemm_fp8_nt_groupwise
    • flashinfer.gemm.group_gemm_fp8_nt_groupwise
    • flashinfer.gemm.group_deepgemm_fp8_nt_groupwise
    • flashinfer.gemm.batch_deepgemm_fp8_nt_groupwise
    • flashinfer.gemm.group_gemm_mxfp4_nt_groupwise
  • flashinfer.fused_moe
    • flashinfer.fused_moe.RoutingMethodType
    • flashinfer.fused_moe.WeightLayout
    • flashinfer.fused_moe.convert_to_block_layout
    • flashinfer.fused_moe.reorder_rows_for_gated_act_gemm
    • flashinfer.fused_moe.cutlass_fused_moe
    • flashinfer.fused_moe.trtllm_fp4_block_scale_moe
    • flashinfer.fused_moe.trtllm_fp8_block_scale_moe
    • flashinfer.fused_moe.trtllm_fp8_per_tensor_scale_moe
  • flashinfer.cascade
    • flashinfer.cascade.merge_state
    • flashinfer.cascade.merge_state_in_place
    • flashinfer.cascade.merge_states
  • flashinfer.sparse
  • flashinfer.page
    • flashinfer.page.append_paged_kv_cache
    • flashinfer.page.append_paged_mla_kv_cache
    • flashinfer.page.get_batch_indices_positions
  • flashinfer.sampling
    • flashinfer.sampling.sampling_from_probs
    • flashinfer.sampling.top_p_sampling_from_probs
    • flashinfer.sampling.top_k_sampling_from_probs
    • flashinfer.sampling.min_p_sampling_from_probs
    • flashinfer.sampling.top_k_top_p_sampling_from_logits
    • flashinfer.sampling.top_k_top_p_sampling_from_probs
    • flashinfer.sampling.top_p_renorm_probs
    • flashinfer.sampling.top_k_renorm_probs
    • flashinfer.sampling.top_k_mask_logits
    • flashinfer.sampling.chain_speculative_sampling
  • flashinfer.logits_processor
    • flashinfer.logits_processor.LogitsPipe
    • flashinfer.logits_processor.LogitsProcessor
    • flashinfer.logits_processor.Temperature
    • flashinfer.logits_processor.Softmax
    • flashinfer.logits_processor.TopK
    • flashinfer.logits_processor.TopP
    • flashinfer.logits_processor.MinP
    • flashinfer.logits_processor.Sample
    • flashinfer.logits_processor.TensorType
    • flashinfer.logits_processor.TaggedTensor
  • flashinfer.norm
    • flashinfer.norm.rmsnorm
    • flashinfer.norm.fused_add_rmsnorm
    • flashinfer.norm.gemma_rmsnorm
    • flashinfer.norm.gemma_fused_add_rmsnorm
  • flashinfer.rope
    • flashinfer.rope.apply_rope_inplace
    • flashinfer.rope.apply_llama31_rope_inplace
    • flashinfer.rope.apply_rope
    • flashinfer.rope.apply_llama31_rope
    • flashinfer.rope.apply_rope_pos_ids
    • flashinfer.rope.apply_rope_pos_ids_inplace
    • flashinfer.rope.apply_llama31_rope_pos_ids
    • flashinfer.rope.apply_llama31_rope_pos_ids_inplace
    • flashinfer.rope.apply_rope_with_cos_sin_cache
    • flashinfer.rope.apply_rope_with_cos_sin_cache_inplace
  • flashinfer.activation
    • flashinfer.activation.silu_and_mul
    • flashinfer.activation.gelu_tanh_and_mul
    • flashinfer.activation.gelu_and_mul
  • flashinfer.quantization
    • flashinfer.quantization.packbits
    • flashinfer.quantization.segment_packbits
  • flashinfer.green_ctx
    • flashinfer.green_ctx.split_device_green_ctx
    • flashinfer.green_ctx.split_device_green_ctx_by_sm_count
  • flashinfer.fp4_quantization
    • flashinfer.fp4_quantization.fp4_quantize
    • flashinfer.fp4_quantization.nvfp4_quantize
    • flashinfer.fp4_quantization.nvfp4_block_scale_interleave
    • flashinfer.fp4_quantization.e2m1_and_ufp8sf_scale_to_float
    • flashinfer.fp4_quantization.shuffle_matrix_a
    • flashinfer.fp4_quantization.shuffle_matrix_sf_a
    • flashinfer.fp4_quantization.SfLayout
  • flashinfer.testing
    • flashinfer.testing.set_seed
    • flashinfer.testing.sleep_after_kernel_run
    • flashinfer.testing.attention_flops
    • flashinfer.testing.attention_flops_with_actual_seq_lens
    • flashinfer.testing.attention_tflops_per_sec
    • flashinfer.testing.attention_tflops_per_sec_with_actual_seq_lens
    • flashinfer.testing.attention_tb_per_sec
    • flashinfer.testing.attention_tb_per_sec_with_actual_seq_lens
    • flashinfer.testing.bench_gpu_time
    • flashinfer.testing.bench_gpu_time_with_cudagraph
Copyright © 2023-2025, FlashInfer Contributors