Installation

Python Package

FlashInfer is available as a Python package built on top of PyTorch, making it easy to integrate into your Python applications.

Prerequisites

  • OS: Linux only

  • Python: 3.9, 3.10, 3.11, 3.12, 3.13
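
You can confirm both requirements from the command line (a quick check using only the Python standard library):

python -c "import sys, platform; print(platform.system(), sys.version_info[:2])"
# expected output: Linux and a (major, minor) pair between (3, 9) and (3, 13)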

Quick Start

The easiest way to install FlashInfer is via pip. Note that the package is published as flashinfer-python, not flashinfer.

pip install flashinfer-python
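
To confirm that the install succeeded, you can import the package and print its version (this assumes the package exposes a __version__ attribute, which is the common convention):

python -c "import flashinfer; print(flashinfer.__version__)"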

Package Options

FlashInfer provides three packages:

  • flashinfer-python: Core package that compiles/downloads kernels on first use

  • flashinfer-cubin: Pre-compiled kernel binaries for all supported GPU architectures

  • flashinfer-jit-cache: Pre-built kernel cache for specific CUDA versions

For faster initialization and offline usage, install the optional packages to have most kernels pre-compiled:

pip install flashinfer-python flashinfer-cubin
# JIT cache package (replace cu129 with your CUDA version: cu128, cu129, or cu130)
pip install flashinfer-jit-cache --index-url https://flashinfer.ai/whl/cu129

This eliminates compilation and downloading overhead at runtime.
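
To see which of the three packages are present in your environment, you can query the installed distributions with the standard library. This sketch uses only the package names listed above:

from importlib.metadata import version, PackageNotFoundError

# The cubin and JIT cache packages are optional, so they may legitimately be absent.
for pkg in ("flashinfer-python", "flashinfer-cubin", "flashinfer-jit-cache"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")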

Install from Source

In certain cases, you may want to install FlashInfer from source to try out the latest features on the main branch, or to customize the library for your specific needs.

You can follow the steps below:

  1. Clone the FlashInfer repository:

    git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
    
  2. Make sure you have installed PyTorch with CUDA support. You can check the installed PyTorch and CUDA versions by running:

    python -c "import torch; print(torch.__version__, torch.version.cuda)"
    
  3. Install FlashInfer:

    cd flashinfer
    python -m pip install -v .
    

    For development, install in editable mode:

    python -m pip install --no-build-isolation -e . -v
    
  4. (Optional) Build the flashinfer-cubin and flashinfer-jit-cache packages:

    Build flashinfer-cubin:

    cd flashinfer-cubin
    python -m build --no-isolation --wheel
    python -m pip install dist/*.whl
    

    Build flashinfer-jit-cache (customize FLASHINFER_CUDA_ARCH_LIST for your target GPUs):

    export FLASHINFER_CUDA_ARCH_LIST="7.5 8.0 8.9 10.0a 10.3a 12.0a"
    cd flashinfer-jit-cache
    python -m build --no-isolation --wheel
    python -m pip install dist/*.whl
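
    If you are unsure which architectures to list in FLASHINFER_CUDA_ARCH_LIST, PyTorch can report the compute capability of the local GPU as a (major, minor) pair, e.g. (8, 0) for the 8.0 entry above:

    python -c "import torch; print(torch.cuda.get_device_capability())"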
    

Install Nightly Build

Nightly builds are available for testing the latest features:

# Core and cubin packages
# Install the nightly flashinfer-python from the custom index, without dependencies:
pip install -U --pre flashinfer-python --index-url https://flashinfer.ai/whl/nightly/ --no-deps
# Then install flashinfer-python's dependencies from PyPI:
pip install flashinfer-python
pip install -U --pre flashinfer-cubin --index-url https://flashinfer.ai/whl/nightly/
# JIT cache package (replace cu129 with your CUDA version: cu128, cu129, or cu130)
pip install -U --pre flashinfer-jit-cache --index-url https://flashinfer.ai/whl/nightly/cu129
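
To confirm that the nightly build was picked up, inspect the installed version. Nightly wheels are expected to carry a pre-release suffix (an assumption about the version scheme, so check against the index if in doubt):

pip show flashinfer-python
# the Version field should show a pre-release (e.g. .dev) version rather than a plain release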

Verify Installation

After installation, verify that FlashInfer is correctly installed and configured:

flashinfer show-config

This command displays:

  • FlashInfer version and installed packages (flashinfer-python, flashinfer-cubin, flashinfer-jit-cache)

  • PyTorch and CUDA version information

  • Environment variables and artifact paths

  • Downloaded cubin status and module compilation status
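
As an additional end-to-end check, you can run a single attention kernel. The sketch below follows the single_decode_with_kv_cache example from the FlashInfer README and assumes a CUDA-capable GPU is visible to PyTorch:

import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

# Random query and KV cache on the first GPU, in half precision.
q = torch.randn(num_qo_heads, head_dim, dtype=torch.half, device="cuda:0")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.half, device="cuda:0")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.half, device="cuda:0")

# Single-request decode attention; the first call may download or JIT-compile kernels.
o = flashinfer.single_decode_with_kv_cache(q, k, v)
print(o.shape)  # expected: torch.Size([32, 128])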