Installation¶
Python Package¶
FlashInfer is available as a Python package, built on top of PyTorch to integrate easily with your Python applications.
Prerequisites¶
OS: Linux only
Python: 3.8, 3.9, 3.10, 3.11, 3.12
Quick Start¶
The easiest way to install FlashInfer is via pip. Note that the PyPI package is currently named flashinfer-python, not flashinfer:
pip install flashinfer-python
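To verify the installation, you can import the module (named flashinfer, even though the package is flashinfer-python); this check assumes the conventional __version__ attribute is exposed:
python -c "import flashinfer; print(flashinfer.__version__)"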
Install from Source¶
In certain cases, you may want to install FlashInfer from source code to try out the latest features in the main branch, or to customize the library for your specific needs.
FlashInfer offers two installation modes:
- JIT mode
CUDA kernels are compiled at runtime using PyTorch’s JIT, with compiled kernels cached for future use.
JIT mode allows fast installation, as no CUDA kernels are pre-compiled, making it ideal for development and testing.
The JIT version is also available as an sdist on PyPI.
- AOT mode
Core CUDA kernels are pre-compiled and included in the library, reducing runtime compilation overhead.
If a required kernel is not pre-compiled, it will be compiled at runtime using JIT. AOT mode is recommended for production environments.
JIT mode is the default installation mode; AOT mode is enabled by an extra pre-compilation step shown below. To install FlashInfer from source:
Clone the FlashInfer repository:
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
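If you cloned the repository without --recursive, you can fetch the submodules afterwards with:
git submodule update --init --recursive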
Make sure you have installed PyTorch with CUDA support. You can check the PyTorch version and CUDA version by running:
python -c "import torch; print(torch.__version__, torch.version.cuda)"
Install Ninja build system:
pip install ninja
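To confirm Ninja is available on your PATH, you can run:
ninja --version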
Install FlashInfer:
For JIT mode:
cd flashinfer
pip install --no-build-isolation --verbose .
For AOT mode:
cd flashinfer
export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a"
python -m flashinfer.aot  # Produces AOT kernels in aot-ops/
python -m pip install --no-build-isolation --verbose .
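The TORCH_CUDA_ARCH_LIST above pre-compiles kernels for several GPU architectures. To shorten the AOT build, you can restrict the list to the compute capability of your own GPU, which you can query with:
python -c "import torch; print(torch.cuda.get_device_capability())"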
Create FlashInfer distributions (optional):
Create an sdist (JIT mode):
cd flashinfer
python -m build --no-isolation --sdist
ls -la dist/
Create a wheel (JIT mode):
cd flashinfer
python -m build --no-isolation --wheel
ls -la dist/
Create a wheel (AOT mode):
cd flashinfer
export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a"
python -m flashinfer.aot  # Produces AOT kernels in aot-ops/
python -m build --no-isolation --wheel
ls -la dist/
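The resulting artifacts land in dist/ and can be installed directly; for example (the exact wheel filename depends on version and platform):
pip install dist/*.whl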
C++ API¶
FlashInfer is a header-only library whose only dependencies are CUDA and the C++ standard library; it can be integrated directly into your C++ project without a separate installation step.
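As a minimal sketch of what that looks like, assuming your code lives in my_kernel.cu and you include headers from the repository's include/ directory (paths and flags are illustrative, not prescribed by the library):
nvcc -std=c++17 -I flashinfer/include my_kernel.cu -o my_kernel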