.. _logging:

Logging
=======

FlashInfer provides a logging feature to help debug issues and reproduce crashes. This document describes all available logging levels and their features.

Quick Start
-----------

Enable logging using two environment variables:

.. code-block:: bash

    # Set logging level (0-10)
    export FLASHINFER_LOGLEVEL=3

    # Set log destination (default is stdout)
    export FLASHINFER_LOGDEST=stdout  # or stderr, or a file path like "flashinfer.log"

Logging Levels
--------------

.. list-table::
   :header-rows: 1
   :widths: 10 20 35 25

   * - Level
     - Name
     - Features
     - Use Case
   * - **0**
     - Disabled (Default)
     - No logging (zero overhead)
     - Production
   * - **1**
     - Function Names
     - Function names only
     - Basic tracing
   * - **3**
     - Inputs/Outputs
     - Function names + arguments + outputs with metadata
     - Standard debugging
   * - **5**
     - Statistics
     - Level 3 + tensor statistics (min, max, mean, NaN/Inf counts)
     - Numerical analysis
   * - **10**
     - Flight Recorder
     - Level 5 + dumps all input/output tensors to ``.pt`` (or ``.safetensors``) files
     - Full reproducibility / debugging

Environment Variables
---------------------

Main Configuration
^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 30 15 15 40

   * - Variable
     - Type
     - Default
     - Description
   * - ``FLASHINFER_LOGLEVEL``
     - int
     - 0
     - Logging level (0, 1, 3, 5, 10)
   * - ``FLASHINFER_LOGDEST``
     - str
     - ``stdout``
     - Log destination: ``stdout``, ``stderr``, or a file path

Dump Configuration (Level 10)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When ``FLASHINFER_LOGLEVEL`` is set to 10, the following environment variables configure the dump behavior:

.. list-table::
   :header-rows: 1
   :widths: 30 15 15 40

   * - Variable
     - Type
     - Default
     - Description
   * - ``FLASHINFER_DUMP_DIR``
     - str
     - ``flashinfer_dumps``
     - Directory to save dump files
   * - ``FLASHINFER_DUMP_MAX_SIZE_GB``
     - float
     - 20
     - Maximum size of the dump directory in GB
   * - ``FLASHINFER_DUMP_MAX_COUNT``
     - int
     - 1000
     - Maximum number of API calls to dump
   * - ``FLASHINFER_DUMP_INCLUDE``
     - str
     - (empty)
     - Comma-separated patterns to include (fnmatch-style)
   * - ``FLASHINFER_DUMP_EXCLUDE``
     - str
     - (empty)
     - Comma-separated patterns to exclude (fnmatch-style)
   * - ``FLASHINFER_DUMP_SAFETENSORS``
     - int
     - 0
     - Set to 1 to use the safetensors format (no pickle, but loses stride info)

SafeTensors Format (Optional)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, tensors are saved using ``torch.save()``, which preserves tensor stride and contiguity information. For faster, pickle-free serialization, you can enable the safetensors format:

.. code-block:: bash

    export FLASHINFER_DUMP_SAFETENSORS=1

.. warning::

    SafeTensors does NOT preserve tensor strides or non-contiguity. All tensors are saved as contiguous. Use the default ``torch.save`` format if stride preservation is important for your debugging.

**Comparison**:

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Aspect
     - torch.save (default)
     - safetensors
   * - Speed
     - Standard
     - Faster
   * - Safety
     - Uses pickle
     - No pickle (safer)
   * - Stride preservation
     - ✅ Yes
     - ❌ No (contiguous only)
   * - File extension
     - ``.pt``
     - ``.safetensors``
   * - Dependency
     - ``torch``
     - Requires ``pip install safetensors``

**Replay is format-agnostic**: the replay command automatically detects the format based on the file extension.

Dump Filtering (Include/Exclude)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use ``FLASHINFER_DUMP_INCLUDE`` and ``FLASHINFER_DUMP_EXCLUDE`` to control which API calls are dumped. This is especially useful when running end-to-end inference with many API calls but you only care about specific ones.

**Pattern Syntax** (fnmatch-style):

- ``*`` matches any number of characters
- ``?`` matches a single character
- Matching is case-sensitive
- For class methods, the function name is formatted as ``ClassName.method_name``

**Filter Logic** (sketched in code after this list):

1. If ``FLASHINFER_DUMP_INCLUDE`` is set, only APIs matching at least one pattern are dumped
2. If ``FLASHINFER_DUMP_EXCLUDE`` is set, APIs matching any pattern are skipped
3. Both can be combined: the include filter is applied first, then the exclude filter
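Since the patterns are plain ``fnmatch``-style globs, you can preview which API names a pattern set will keep before running anything. Below is a minimal sketch of the filter logic described above; the helper name ``should_dump`` is illustrative, not a FlashInfer API:

.. code-block:: python

    from fnmatch import fnmatchcase  # case-sensitive, matching the rules above


    def should_dump(func_name: str, include: list[str], exclude: list[str]) -> bool:
        """Apply the include filter first, then the exclude filter."""
        if include and not any(fnmatchcase(func_name, p) for p in include):
            return False
        if exclude and any(fnmatchcase(func_name, p) for p in exclude):
            return False
        return True


    # FLASHINFER_DUMP_INCLUDE="single_*"  FLASHINFER_DUMP_EXCLUDE="*prefill*"
    include = ["single_*"]
    exclude = ["*prefill*"]
    print(should_dump("single_decode_with_kv_cache", include, exclude))   # True
    print(should_dump("single_prefill_with_kv_cache", include, exclude))  # False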
**Examples**:

.. code-block:: bash

    # Only dump decode-related APIs
    export FLASHINFER_DUMP_INCLUDE="*decode*"

    # Dump everything except __init__ and plan methods
    export FLASHINFER_DUMP_EXCLUDE="*.__init__,*.plan"

    # Only dump run() methods from wrapper classes
    export FLASHINFER_DUMP_INCLUDE="*Wrapper.run"

    # Dump all single_* APIs except prefill
    export FLASHINFER_DUMP_INCLUDE="single_*"
    export FLASHINFER_DUMP_EXCLUDE="*prefill*"

    # Only dump a specific wrapper's run method
    export FLASHINFER_DUMP_INCLUDE="BatchDecodeWithPagedKVCacheWrapper.run"

    # Dump FP8 APIs but not quantization steps
    export FLASHINFER_DUMP_INCLUDE="*fp8*,*FP8*"
    export FLASHINFER_DUMP_EXCLUDE="*quantize*"

**Common Patterns**:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Pattern
     - Matches
   * - ``*decode*``
     - ``single_decode_with_kv_cache``, ``BatchDecodeWithPagedKVCacheWrapper.run``
   * - ``*Wrapper.run``
     - ``BatchDecodeWithPagedKVCacheWrapper.run``, ``BatchPrefillWithPagedKVCacheWrapper.run``
   * - ``*.__init__``
     - All wrapper ``__init__`` methods
   * - ``*.plan``
     - All wrapper ``plan`` methods
   * - ``mm_fp8``
     - Exact match for the ``mm_fp8`` function
   * - ``single_*``
     - ``single_decode_with_kv_cache``, ``single_prefill_with_kv_cache``

Process ID Substitution
^^^^^^^^^^^^^^^^^^^^^^^

Use ``%i`` in file paths for automatic process ID substitution (useful for multi-GPU training):

.. code-block:: bash

    export FLASHINFER_LOGDEST="flashinfer_log_%i.txt"  # → flashinfer_log_12345.txt

Miscellaneous Notes and Examples
--------------------------------

CUDA Graph Compatibility
^^^^^^^^^^^^^^^^^^^^^^^^

Level 5 statistics are **automatically skipped during CUDA graph capture** to avoid synchronization issues.

.. code-block:: python

    # This works correctly - no synchronization errors
    with torch.cuda.graph(cuda_graph):
        result = mm_fp4(a, b, scales, ...)  # Level 5 logging active
        # Statistics automatically skipped during capture

The output shows: ``[statistics skipped: CUDA graph capture in progress]``

Process IDs for Multi-GPU Environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # Use %i for process ID substitution
    export FLASHINFER_LOGLEVEL=3
    export FLASHINFER_LOGDEST="logs/flashinfer_api_%i.log"

    torchrun --nproc_per_node=8 awesome_script_that_uses_FlashInfer.py

    # Creates separate logs:
    #   logs/flashinfer_api_12345.log  (rank 0)
    #   logs/flashinfer_api_12346.log  (rank 1)
    #   ...

Level 0 Has Zero Overhead
^^^^^^^^^^^^^^^^^^^^^^^^^

At Level 0, the logging decorator returns the original function unchanged: no wrapper, no checks, no overhead.
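For intuition, here is a minimal sketch of this decorator pattern. The names are illustrative; this is not FlashInfer's actual implementation:

.. code-block:: python

    import functools
    import os

    _LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))


    def log_api(func):
        """Illustrative logging decorator (not FlashInfer's actual code)."""
        if _LOG_LEVEL == 0:
            # Level 0: hand back the original function object untouched,
            # so the call path is identical to having no logging at all.
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print(f"[flashinfer] calling {func.__name__}")  # Level >= 1 behavior
            return func(*args, **kwargs)

        return wrapper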
Flight Recorder & Replay
------------------------

FlashInfer includes a "Flight Recorder" mode (Level 10) that captures inputs/outputs for reproducibility.

Dump Directory Structure
^^^^^^^^^^^^^^^^^^^^^^^^

When Level 10 logging is enabled, FlashInfer creates the following structure:

.. code-block:: text

    FLASHINFER_DUMP_DIR/
    ├── session.jsonl          # Central log: one line per event (quick scanning)
    ├── 20250108_143216_802_pid12345_mm_fp8_call0001/
    │   ├── metadata.jsonl     # Per-dump metadata (JSONL format)
    │   ├── inputs.pt          # Input tensors (or .safetensors if enabled)
    │   └── outputs.pt         # Output tensors (or .safetensors if enabled)
    ├── 20250108_143216_868_pid12345_single_decode_call0001/
    │   ├── metadata.jsonl
    │   ├── inputs.pt          # (or .safetensors)
    │   └── outputs.pt         # (or .safetensors)
    └── ...

**JSONL Format**: Both ``session.jsonl`` and ``metadata.jsonl`` use the `JSON Lines <https://jsonlines.org/>`_ format (one JSON object per line). This enables:

- **Crash-safe logging**: each API call appends two lines (``inputs_saved``, then ``completed``)
- **Quick scanning**: use ``session.jsonl`` to browse all recorded calls without reading subdirectories
- **Streaming reads**: process records line by line for large sessions

**Per-dump metadata.jsonl**:

- Line 1: written **before** execution (``execution_status: "inputs_saved"``)
- Line 2: appended **after** successful execution (``execution_status: "completed"``)

If a crash occurs, only line 1 will be present, preserving the inputs for debugging.
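For orientation, the two ``metadata.jsonl`` records of a successful call might look like the following. This is abridged and illustrative; the exact schema may differ, and only fields referenced elsewhere in this document are shown (tensor key names are made up):

.. code-block:: text

    {"function_name": "mm_fp8", "module": "flashinfer.gemm", "execution_status": "inputs_saved", "tensor_info": {"input_tensor_keys": ["a", "b"]}}
    {"function_name": "mm_fp8", "module": "flashinfer.gemm", "execution_status": "completed", "tensor_info": {"input_tensor_keys": ["a", "b"]}}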
**Central session.jsonl**: a one-stop log of all API calls. Use standard tools to filter and analyze it (see `Scanning Session History`_ below).

**Example workflow**: enable the Flight Recorder, run your application, then replay the recorded calls:

.. code-block:: bash

    # Enable Flight Recorder (metadata + tensors)
    export FLASHINFER_LOGLEVEL=10
    export FLASHINFER_DUMP_DIR=./my_dumps

    # Run your application
    python3 benchmarks/flashinfer_benchmark.py --routine mm_fp4 --m 4 --n 1024 --k 7168 --out_dtype bfloat16 --backends cudnn --use_128x4_sf_layout --use_nvfp4 --refcheck -vv --generate_repro_command --use_cupti --no_cuda_graph --num_iters 5
    ... output redacted ...

    # Replay recorded calls
    export FLASHINFER_LOGLEVEL=0  # set to 1 for more detailed replay results
    flashinfer replay --dir ./my_dumps
    # or: python -m flashinfer replay --dir ./my_dumps

Example output:

.. code-block:: text

    [1] nvfp4_quantize (20251204_143216_802_pid12345_nvfp4_quantize_call0001): ✅ Passed
    [2] fp4_quantize (20251204_143216_868_pid12345_fp4_quantize_call0001): ✅ Passed
    [3] nvfp4_quantize (20251204_143216_949_pid12345_nvfp4_quantize_call0002): ✅ Passed
    [4] fp4_quantize (20251204_143217_003_pid12345_fp4_quantize_call0002): ✅ Passed
    [5] mm_fp4 (20251204_143217_178_pid12345_mm_fp4_call0001): ✅ Passed
    [6] mm_fp4 (20251204_143217_346_pid12345_mm_fp4_call0002): ✅ Passed
    [7] mm_fp4 (20251204_143217_427_pid12345_mm_fp4_call0003): ✅ Passed
    [8] mm_fp4 (20251204_143217_475_pid12345_mm_fp4_call0004): ✅ Passed
    [9] mm_fp4 (20251204_143217_510_pid12345_mm_fp4_call0005): ✅ Passed
    [10] mm_fp4 (20251204_143217_551_pid12345_mm_fp4_call0006): ✅ Passed
    [11] mm_fp4 (20251204_143217_591_pid12345_mm_fp4_call0007): ✅ Passed
    [12] mm_fp4 (20251204_143217_631_pid12345_mm_fp4_call0008): ✅ Passed
    [13] mm_fp4 (20251204_143217_672_pid12345_mm_fp4_call0009): ✅ Passed
    [14] mm_fp4 (20251204_143217_708_pid12345_mm_fp4_call0010): ✅ Passed
    [15] mm_fp4 (20251204_143217_769_pid12345_mm_fp4_call0011): ✅ Passed
    [16] mm_fp4 (20251204_143217_812_pid12345_mm_fp4_call0012): ✅ Passed
    [17] mm_fp4 (20251204_143217_852_pid12345_mm_fp4_call0013): ✅ Passed
    [18] mm_fp4 (20251204_143217_904_pid12345_mm_fp4_call0014): ✅ Passed
    [19] mm_fp4 (20251204_143218_153_pid12345_mm_fp4_call0015): ✅ Passed
    [20] mm_fp4 (20251204_143218_390_pid12345_mm_fp4_call0016): ✅ Passed
    [21] mm_fp4 (20251204_143218_627_pid12345_mm_fp4_call0017): ✅ Passed
    [22] mm_fp4 (20251204_143218_862_pid12345_mm_fp4_call0018): ✅ Passed

    Summary: 22 passed, 0 failed/mismatch

Python-Based Replay Examples
----------------------------

The following examples demonstrate how to use Level 10 logging to dump and replay API calls programmatically from Python.

Example 1: ``bmm_fp8`` - Simple Function Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Producer Script** (``bmm_fp8_producer.py``): this script initializes tensors, calls ``bmm_fp8``, and dumps the inputs/outputs to disk.

.. code-block:: python

    """
    Producer script: Run bmm_fp8 with Level 10 logging to dump tensors.

    Usage:
        FLASHINFER_LOGLEVEL=10 FLASHINFER_DUMP_DIR=./bmm_fp8_dumps python bmm_fp8_producer.py
    """

    import torch

    from flashinfer import bmm_fp8


    def to_float8(x, dtype=torch.float8_e4m3fn):
        """Convert tensor to FP8 with per-tensor scaling."""
        finfo = torch.finfo(dtype)
        min_val, max_val = x.aminmax()
        amax = torch.maximum(min_val.abs(), max_val.abs()).clamp(min=1e-12)
        scale = finfo.max / amax
        x_scl_sat = (x * scale).clamp(min=finfo.min, max=finfo.max)
        return x_scl_sat.to(dtype), scale.float().reciprocal()


    # Parameters
    b, m, n, k = 4, 64, 128, 256
    input_dtype = torch.float8_e4m3fn
    mat2_dtype = torch.float8_e4m3fn
    res_dtype = torch.bfloat16

    # Create input tensors
    input_bf16 = torch.randn([b, m, k], device="cuda", dtype=torch.bfloat16)
    input_fp8, input_inv_s = to_float8(input_bf16, dtype=input_dtype)

    # mat2: row major -> column major (transposed)
    mat2_bf16 = torch.randn([b, n, k], device="cuda", dtype=torch.bfloat16).transpose(-2, -1)
    mat2_fp8, mat2_inv_s = to_float8(mat2_bf16, dtype=mat2_dtype)

    # Pre-allocate output
    res = torch.empty([b, m, n], device="cuda", dtype=res_dtype)

    # Call bmm_fp8 - this will be logged/dumped at Level 10
    bmm_fp8(input_fp8, mat2_fp8, input_inv_s, mat2_inv_s, res_dtype, res, backend="cublas")

    # Print a small portion of the output for verification
    print("Output shape:", res.shape)
    print("Output[0, :3, :3]:")
    print(res[0, :3, :3])
**Reproducer Script** (``bmm_fp8_reproducer.py``): this script loads the dumped tensors and replays the ``bmm_fp8`` call.

.. code-block:: python

    """
    Reproducer script: Load dumped tensors and replay bmm_fp8.

    Usage:
        python bmm_fp8_reproducer.py
    """

    import torch
    from pathlib import Path

    from flashinfer import bmm_fp8
    from flashinfer.api_logging import replay_from_dump

    DUMP_DIR = "./bmm_fp8_dumps"

    # Find the bmm_fp8 dump directory (should be the only one or the latest)
    dump_path = Path(DUMP_DIR)
    bmm_dumps = sorted(d for d in dump_path.iterdir() if d.is_dir() and "bmm_fp8" in d.name)
    latest_dump = bmm_dumps[-1]  # Use the latest dump
    print(f"Loading dump from: {latest_dump}")

    # Use replay_from_dump to load inputs and optionally execute
    result = replay_from_dump(
        str(latest_dump),
        compare_outputs=True,  # Load expected outputs for comparison
        device="cuda",
        run=False,  # We'll call the function manually below
    )

    # Extract the loaded arguments - args contains all positional args,
    # including the pre-allocated output tensor
    args = result["args"]
    kwargs = result["kwargs"]
    expected_tensors = result.get("expected_tensors", {})

    # Replay the call - args already contains (input, mat2, input_inv_s, mat2_inv_s, dtype, out)
    res = bmm_fp8(*args, **kwargs)

    # Print the same portion for comparison
    print("Replayed output shape:", res.shape)
    print("Replayed output[0, :3, :3]:")
    print(res[0, :3, :3])

    # Compare with expected output if available
    if "result" in expected_tensors:
        expected = expected_tensors["result"]
        if torch.allclose(res, expected, rtol=1e-3, atol=1e-3):
            print("\n✅ Output matches expected result!")
        else:
            diff = (res - expected).abs().max().item()
            print(f"\n❌ Output mismatch! Max diff: {diff}")
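Stride preservation actually matters in this example: ``mat2_fp8`` was created as a transposed (column-major) view, so it is non-contiguous, and the default ``torch.save`` format keeps that layout in the dump. Continuing from the reproducer above, you can verify this directly. The key name ``"mat2"`` is a guess for illustration; read ``metadata.jsonl`` for the actual input keys:

.. code-block:: python

    # Continuing from the reproducer above (reuses `torch` and `latest_dump`)
    inputs = torch.load(latest_dump / "inputs.pt", map_location="cuda")
    mat2 = inputs["mat2"]  # hypothetical key; see metadata.jsonl for the real names
    print(mat2.stride(), mat2.is_contiguous())  # non-contiguous with torch.save dumps
    # With FLASHINFER_DUMP_SAFETENSORS=1, the same tensor would load as contiguous.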
Example 2: ``BatchDecodeWithPagedKVCacheWrapper`` - Stateful Wrapper Class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Producer Script** (``batch_decode_producer.py``): this script demonstrates logging with a stateful wrapper class that requires ``__init__``, ``plan``, and ``run`` calls.

.. code-block:: python

    """
    Producer script: Run BatchDecodeWithPagedKVCacheWrapper with Level 10 logging.

    Usage:
        FLASHINFER_LOGLEVEL=10 FLASHINFER_DUMP_DIR=./batch_decode_dumps python batch_decode_producer.py
    """

    import torch

    import flashinfer

    # Parameters
    batch_size = 4
    kv_len = 512
    page_size = 16
    num_kv_heads = 4
    num_qo_heads = 32
    head_dim = 128
    kv_layout = "NHD"

    # Create query tensor
    q = torch.randn(batch_size, num_qo_heads, head_dim, device="cuda", dtype=torch.float16)

    # Create paged KV cache
    num_pages_per_seq = (kv_len + page_size - 1) // page_size
    total_num_pages = num_pages_per_seq * batch_size
    kv_shape = [total_num_pages, 2, page_size, num_kv_heads, head_dim]  # NHD layout
    kv_data = torch.randn(*kv_shape, device="cuda", dtype=torch.float16)

    # Create index tensors
    kv_indptr = torch.arange(0, batch_size + 1, device="cuda", dtype=torch.int32) * num_pages_per_seq
    kv_indices = torch.arange(0, total_num_pages, device="cuda", dtype=torch.int32)
    kv_last_page_len = torch.full(
        (batch_size,), (kv_len - 1) % page_size + 1, dtype=torch.int32, device="cuda"
    )

    # Create workspace and wrapper - __init__ will be logged
    workspace_buffer = torch.empty(32 * 1024 * 1024, dtype=torch.int8, device="cuda")
    wrapper = flashinfer.decode.BatchDecodeWithPagedKVCacheWrapper(workspace_buffer, kv_layout)

    # Plan - will be logged
    wrapper.plan(
        kv_indptr,
        kv_indices,
        kv_last_page_len,
        num_qo_heads,
        num_kv_heads,
        head_dim,
        page_size,
        data_type=torch.float16,
        q_data_type=torch.float16,
    )

    # Run - will be logged
    output, lse = wrapper.run(q, kv_data, return_lse=True)

    # Print a small portion of the output
    print("Output shape:", output.shape)
    print("Output[0, :3, :3]:")
    print(output[0, :3, :3])
    print("\nLSE shape:", lse.shape)
    print("LSE[0, :5]:", lse[0, :5])

**Reproducer Script** (``batch_decode_reproducer.py``): this script demonstrates replaying a sequence of stateful API calls.

.. code-block:: python

    """
    Reproducer script: Replay BatchDecodeWithPagedKVCacheWrapper calls.

    Usage:
        python batch_decode_reproducer.py
    """

    from pathlib import Path

    from flashinfer.api_logging import replay_sequence

    DUMP_DIR = "./batch_decode_dumps"

    # replay_sequence handles stateful objects automatically via object_registry.
    # It will:
    #   1. Replay __init__ to create the wrapper instance
    #   2. Replay plan() on the same instance
    #   3. Replay run() on the same instance and compare outputs
    results = replay_sequence(DUMP_DIR, device="cuda")

    # Print summary
    passed = 0
    failed = 0
    for i, res in enumerate(results):
        func_name = res.get("metadata", {}).get("function_name", "unknown")
        dump_dir = Path(res.get("dump_dir", "")).name
        if "error" in res:
            print(f"[{i+1}] {func_name} ({dump_dir}): ❌ Error: {res['error']}")
            failed += 1
        elif res.get("comparison_match", True):
            print(f"[{i+1}] {func_name} ({dump_dir}): ✅ Passed")
            passed += 1
        else:
            print(f"[{i+1}] {func_name} ({dump_dir}): ❌ Mismatch")
            failed += 1

    print(f"\nSummary: {passed} passed, {failed} failed")

    # For manual inspection, you can also access individual results.
    # Find the 'run' call result (usually the last non-init, non-plan call).
    for res in results:
        func_name = res.get("metadata", {}).get("function_name", "")
        if "run" in func_name and "execution_result" in res:
            output = res["execution_result"]
            if isinstance(output, tuple):
                output_tensor, lse = output
                print("\nReplayed output[0, :3, :3]:")
                print(output_tensor[0, :3, :3])
                print("Replayed LSE[0, :5]:", lse[0, :5])
            break

Manual Replay Without ``replay_from_dump``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more control, you can manually load the dumped tensors:

.. note::

    This example assumes the default ``torch.save`` format (``.pt`` files). If dumps were created with ``FLASHINFER_DUMP_SAFETENSORS=1``, use ``safetensors.torch.load_file()`` instead of ``torch.load()``.

.. code-block:: python

    """
    Manual replay: Load tensors directly from .pt files.
    """

    import json
    from pathlib import Path

    import torch

    from flashinfer import bmm_fp8  # the function to re-invoke once args are reconstructed

    # Path is an example; replace it with the actual dump path.
    dump_dir = Path("./bmm_fp8_dumps/20250108_103217_012_pid12345_bmm_fp8_call0001")

    # Load metadata from JSONL (read the last line for the most complete state)
    with open(dump_dir / "metadata.jsonl") as f:
        lines = [line.strip() for line in f if line.strip()]
    metadata = json.loads(lines[-1])  # Last line has the completed state

    print(f"Function: {metadata['function_name']}")
    print(f"Module: {metadata['module']}")
    print(f"Status: {metadata['execution_status']}")
    print(f"Input tensors: {metadata['tensor_info']['input_tensor_keys']}")

    # Load input tensors
    inputs = torch.load(dump_dir / "inputs.pt", map_location="cuda")

    # Load expected outputs (present if execution completed successfully)
    outputs_path = dump_dir / "outputs.pt"
    if outputs_path.exists():
        expected = torch.load(outputs_path, map_location="cuda")
        print(f"Output tensors: {list(expected.keys())}")

    # Tensors are ready to use - reconstruct the call as needed
    for key, tensor in inputs.items():
        print(f"  {key}: shape={tensor.shape}, dtype={tensor.dtype}")
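If your dumps mix both formats, a small loader that dispatches on the file extension keeps manual scripts format-agnostic. A minimal sketch, assuming the ``safetensors`` package is installed whenever ``.safetensors`` files are present:

.. code-block:: python

    from pathlib import Path

    import torch


    def load_dump_tensors(path: Path) -> dict:
        """Load a dumped tensor file, choosing the loader by extension."""
        if path.suffix == ".safetensors":
            from safetensors.torch import load_file
            return load_file(str(path), device="cuda")
        return torch.load(path, map_location="cuda")


    # Works for inputs.pt and inputs.safetensors alike
    # (reusing `dump_dir` from the example above)
    inputs_file = next(dump_dir.glob("inputs.*"))
    inputs = load_dump_tensors(inputs_file)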
Scanning Session History
^^^^^^^^^^^^^^^^^^^^^^^^

Use the central ``session.jsonl`` to quickly scan all recorded API calls:

.. code-block:: python

    """
    Scan session.jsonl for a quick overview of recorded calls.
    """

    import json
    from collections import Counter
    from pathlib import Path

    dump_root = Path("./my_dumps")
    session_file = dump_root / "session.jsonl"

    # Read all records
    records = []
    with open(session_file) as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))

    # Filter to completed calls only
    completed = [r for r in records if r["execution_status"] == "completed"]
    print(f"Total completed calls: {len(completed)}")

    # Count by function name
    func_counts = Counter(r["function_name"] for r in completed)
    print("\nCalls by function:")
    for func, count in func_counts.most_common():
        print(f"  {func}: {count}")

    # Find calls that didn't complete (potential crashes)
    inputs_only = [r for r in records if r["execution_status"] == "inputs_saved"]

    # Group by dump_dir to find incomplete calls
    completed_dirs = {r["dump_dir"] for r in completed}
    incomplete = [r for r in inputs_only if r["dump_dir"] not in completed_dirs]

    if incomplete:
        print(f"\n⚠️ Found {len(incomplete)} incomplete calls (potential crashes):")
        for r in incomplete:
            print(f"  - {r['function_name']} at {r['dump_dir']}")
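Because ``session.jsonl`` is plain JSON Lines, ordinary shell tools work as well. For example, assuming ``jq`` is installed, counting completed calls per function is a one-liner:

.. code-block:: bash

    # Count completed calls per function, most frequent first
    jq -r 'select(.execution_status == "completed") | .function_name' \
        my_dumps/session.jsonl | sort | uniq -c | sort -rn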