Installation#
Python Package#
FlashInfer is available as a Python package, built on top of PyTorch so that it integrates easily with your Python applications.
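To give a sense of how FlashInfer plugs into PyTorch code, here is a minimal decode-attention sketch (shapes and values are illustrative; single_decode_with_kv_cache is one of the library's single-request kernels):
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Attention for a single decode step against the full KV cache.
o = flashinfer.single_decode_with_kv_cache(q, k, v)
print(o.shape)  # (num_qo_heads, head_dim)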
Prerequisites#
OS: Linux only
Python: 3.8, 3.9, 3.10, 3.11, 3.12
PyTorch: 2.2/2.3/2.4 with CUDA 11.8/12.1/12.4 (CUDA 12.4 is supported only with torch 2.4)
You can check your PyTorch CUDA version by running:
python -c "import torch; print(torch.version.cuda)"
Supported GPU architectures: sm75, sm80, sm86, sm89, sm90.
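You can find your GPU's architecture by querying its compute capability through PyTorch (for example, (8, 0) corresponds to sm80):
python -c "import torch; print(torch.cuda.get_device_capability())"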
Quick Start#
The easiest way to install FlashInfer is via pip; choose the wheel index that matches your CUDA and PyTorch versions:
pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4/
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.4/
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.3/
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.2/
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.2/
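After installation, you can run a quick sanity check (assuming the package exposes __version__, as recent releases do):
python -c "import flashinfer; print(flashinfer.__version__)"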
Support for PyTorch 2.1 ended with FlashInfer version 0.1.2. Users are encouraged to upgrade to a newer PyTorch version or to install FlashInfer from source. For older FlashInfer releases, torch 2.1 wheels are still available:
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.1/
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.1/
Install from Source#
In certain cases, you may want to install FlashInfer from source, either to try out the latest features on the main branch or to customize the library for your specific needs.
FlashInfer offers two installation modes:
- JIT mode
CUDA kernels are compiled at runtime using PyTorch’s JIT, with compiled kernels cached for future use.
JIT mode allows fast installation, as no CUDA kernels are pre-compiled, making it ideal for development and testing.
- AOT mode
Core CUDA kernels are pre-compiled and included in the library, reducing runtime compilation overhead.
If a required kernel is not pre-compiled, it will be compiled at runtime using JIT. AOT mode is recommended for production environments.
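If you install in JIT mode, you can observe the caching behavior by timing the first call to a kernel against a subsequent one; the first invocation pays the compilation cost, after which the compiled kernel is loaded from cache. A rough sketch (same illustrative kernel and shapes as above):
import time
import torch
import flashinfer

q = torch.randn(32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2048, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2048, 32, 128, dtype=torch.float16, device="cuda")

# First call may trigger JIT compilation (slow; the result is cached).
t0 = time.time()
flashinfer.single_decode_with_kv_cache(q, k, v)
torch.cuda.synchronize()
print(f"first call:  {time.time() - t0:.2f}s")

# Subsequent calls reuse the cached kernel.
t0 = time.time()
flashinfer.single_decode_with_kv_cache(q, k, v)
torch.cuda.synchronize()
print(f"second call: {time.time() - t0:.4f}s")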
You can follow the steps below to install FlashInfer from source code:
Clone the FlashInfer repository:
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
Make sure you have installed PyTorch with CUDA support. You can check the PyTorch version and CUDA version by running:
python -c "import torch; print(torch.__version__, torch.version.cuda)"
Install the Ninja build system:
pip install ninja
Install FlashInfer. For JIT mode (no pre-compiled kernels):
cd flashinfer/python
pip install -e . -v
For AOT mode (pre-compile the core kernels into a wheel):
cd flashinfer/python
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a" python3 aot_setup.py bdist_wheel
pip install dist/flashinfer-*.whl
To create a source distribution (requires the PyPA build package):
cd flashinfer/python
python -m build --sdist
ls -la dist/
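After installing from source, you can confirm that Python picks up your local build:
python -c "import flashinfer; print(flashinfer.__file__)"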
C++ API#
FlashInfer is a header-only library whose only dependencies are CUDA and the C++ standard library, so it can be integrated directly into your C++ project without a separate installation step.
At the moment, our unittests and benchmarks are the best reference for how to use the C++ APIs.
Note
The nvbench and googletest dependencies in the 3rdparty directory are only used to compile unittests and benchmarks, and are not required for the library itself.
Compile Benchmarks and Unittests#
To compile the C++ benchmarks (using nvbench) and unittests, you can follow the steps below:
Clone the FlashInfer repository:
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
Check that conda is installed (you can skip this step if you have installed cmake and ninja in other ways):
conda --version
If conda is not installed, you can install it by following the instructions on the miniconda or miniforge websites.
Install the CMake and Ninja build systems:
conda install cmake ninja
Create the build directory and copy the configuration file:
mkdir -p build
cp cmake/config.cmake build/  # you can modify the configuration file if needed
Compile the benchmarks and unittests:
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja