.. FlashInfer documentation master file, created by
   sphinx-quickstart on Sat Jan 20 12:31:26 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to FlashInfer's documentation!
======================================
`Blog <https://flashinfer.ai>`_ | `Discussion Forum <https://github.com/orgs/flashinfer-ai/discussions>`_ | `GitHub <https://github.com/flashinfer-ai/flashinfer>`_

FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, PageAttention, and LoRA. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
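
As a quick taste of the PyTorch API, the sketch below runs single-request decode attention with ``flashinfer.single_decode_with_kv_cache``; the tensor shapes, dtypes, and device placement are illustrative assumptions, and the full interfaces are covered in the API reference sections listed below.

.. code-block:: python

   import torch
   import flashinfer

   # Illustrative sizes (assumptions, not requirements).
   num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

   # Query for a single decode step and a contiguous KV cache on GPU 0 (NHD layout).
   q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda:0")
   k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda:0")
   v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda:0")

   # Fused decode attention kernel; returns the attention output of shape [num_qo_heads, head_dim].
   o = flashinfer.single_decode_with_kv_cache(q, k, v)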

.. toctree::
   :maxdepth: 2
   :caption: Get Started

   installation

.. toctree::
   :maxdepth: 2
   :caption: Tutorials

   tutorials/recursive_attention
   tutorials/kv_layout

.. toctree::
   :maxdepth: 2
   :caption: PyTorch API Reference

   api/decode
   api/prefill
   api/cascade
   api/sparse
   api/page
   api/sampling
   api/gemm
   api/norm
   api/rope
   api/activation
   api/quantization