
Python Package

FlashInfer is available as a Python package built on top of PyTorch, so it integrates easily with your Python applications.


  • OS: Linux only

  • Python: 3.8, 3.9, 3.10, 3.11

  • PyTorch: 2.1/2.2/2.3 with CUDA 11.8/12.1

    • Use python -c "import torch; print(torch.version.cuda)" to check your PyTorch CUDA version.

  • Supported GPU architectures: sm80, sm86, sm89, sm90 (sm75 / sm70 support is a work in progress).
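The requirements above can also be checked programmatically. The sketch below is hypothetical: `check_environment`, `major_minor`, and `check_current_environment` are illustrative names, not part of FlashInfer, and it assumes every listed PyTorch/CUDA pairing is valid. It reads the running configuration through standard PyTorch calls (`torch.version.cuda`, `torch.cuda.get_device_capability()`).

```python
import platform
import sys
from typing import List, Optional, Tuple

# Supported configurations copied from the requirements list.
# Assumption: any listed PyTorch version works with any listed CUDA version.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}
SUPPORTED_TORCH = {"2.1", "2.2", "2.3"}
SUPPORTED_CUDA = {"11.8", "12.1"}
SUPPORTED_SM = {(8, 0), (8, 6), (8, 9), (9, 0)}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.3.1+cu121' to '2.3'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def check_environment(
    os_name: str,
    python_version: Tuple[int, int],
    torch_version: str,
    cuda_version: Optional[str],
    capability: Tuple[int, int],
) -> List[str]:
    """Return human-readable problems; an empty list means the setup matches."""
    problems = []
    if os_name != "Linux":
        problems.append(f"unsupported OS: {os_name}")
    if python_version not in SUPPORTED_PYTHON:
        problems.append(f"unsupported Python: {python_version[0]}.{python_version[1]}")
    if major_minor(torch_version) not in SUPPORTED_TORCH:
        problems.append(f"unsupported PyTorch: {torch_version}")
    # torch.version.cuda is None for CPU-only PyTorch builds.
    if cuda_version is None or major_minor(cuda_version) not in SUPPORTED_CUDA:
        problems.append(f"unsupported PyTorch CUDA build: {cuda_version}")
    if capability not in SUPPORTED_SM:
        problems.append(f"unsupported GPU: sm{capability[0]}{capability[1]}")
    return problems


def check_current_environment() -> List[str]:
    """Gather the same facts from the running interpreter and PyTorch."""
    import torch  # imported lazily so check_environment has no torch dependency

    if not torch.cuda.is_available():
        return ["no CUDA device visible to PyTorch"]
    return check_environment(
        platform.system(),
        sys.version_info[:2],
        torch.__version__,
        torch.version.cuda,
        torch.cuda.get_device_capability(),
    )
```

On a GPU machine, `check_current_environment()` returns an empty list when the setup matches the requirements. Keeping the comparison logic in a torch-free function makes it easy to test against hand-written configurations.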

Quick Start

pip install flashinfer -i
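Once the install command completes, one way to confirm that the package is visible to your interpreter is to query its metadata with the standard library. This is a minimal sketch: `installed_version` and `is_importable` are hypothetical helper names, and it assumes the import name `flashinfer` matches the pip distribution name.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """True if the module can be located on sys.path, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("flashinfer version:", installed_version("flashinfer"))
    print("importable:", is_importable("flashinfer"))
```

Using `find_spec` rather than a real import keeps the check cheap and avoids initializing CUDA just to verify the installation.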

C++ API

FlashInfer is a header-only library whose only dependencies are CUDA and the C++ standard library, so it can be integrated directly into your C++ project without installation.

For now, refer to our unit tests and benchmarks for examples of how to use the C++ APIs.


The nvbench and googletest dependencies in the 3rdparty directory are used only to compile the unit tests and benchmarks; they are not required by the library itself.
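Because the library is header-only, integrating it amounts to adding its headers to your compiler's include path; no nvbench or googletest flags and no link step against a FlashInfer library are involved. The sketch below builds such an `nvcc` command line from Python for the supported architectures. It assumes the headers live under an `include/` directory of a FlashInfer checkout and that `-std=c++17` is sufficient; `nvcc_command` and the paths shown are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List, Sequence

# Architectures from the supported-GPU list (sm80, sm86, sm89, sm90).
DEFAULT_ARCHS = ("80", "86", "89", "90")


def nvcc_command(
    source: str,
    output: str,
    flashinfer_root: str,
    archs: Sequence[str] = DEFAULT_ARCHS,
) -> List[str]:
    """Build an nvcc argument list for a .cu file that includes FlashInfer headers."""
    # Assumption: headers are under <flashinfer_root>/include (hypothetical layout).
    include_dir = Path(flashinfer_root) / "include"
    # Assumption: C++17 is the language standard the headers need.
    cmd = ["nvcc", "-std=c++17", f"-I{include_dir}"]
    # Emit SASS for each requested architecture.
    for arch in archs:
        cmd += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    cmd += [source, "-o", output]
    return cmd


if __name__ == "__main__":
    print(shlex.join(nvcc_command("my_kernel.cu", "my_kernel", "/path/to/flashinfer")))
```

Restricting `archs` to the GPUs you actually deploy on reduces compile time, since nvcc generates code once per `-gencode` target.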