.. _api-grouped-mm:

flashinfer.grouped_mm
=====================

.. currentmodule:: flashinfer.grouped_mm

Grouped matrix multiplication APIs for Mixture-of-Experts (MoE) layers,
where each expert holds its own weight matrix and the tokens assigned to
each expert are identified by ``m_indptr``, a tensor of cumulative token
counts.
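
As a rough illustration of what a cumulative-count tensor encodes, the
sketch below builds one from per-expert token counts in plain Python. The
helper names ``build_m_indptr`` and ``expert_row_range`` are hypothetical
and not part of this module. The leading zero and the assumption that
tokens are already grouped by expert are also assumptions here, so check
the function docstrings for the exact layout and dtype each API expects.

.. code-block:: python

    from itertools import accumulate


    def build_m_indptr(tokens_per_expert):
        """Return cumulative offsets with a leading zero (assumed layout)."""
        return [0] + list(accumulate(tokens_per_expert))


    def expert_row_range(m_indptr, expert):
        """Return the half-open [start, end) row range for one expert."""
        return m_indptr[expert], m_indptr[expert + 1]


    # Hypothetical routing result: expert 1 receives no tokens.
    tokens_per_expert = [3, 0, 5, 2]
    m_indptr = build_m_indptr(tokens_per_expert)

    # Rows of the expert-sorted activation matrix that belong to expert 2.
    start, end = expert_row_range(m_indptr, 2)

Under these assumptions an expert with no tokens yields an empty range,
and the last entry equals the total number of routed tokens. In practice
the offsets would be converted to a tensor on the same device as the
activations before being passed to a grouped GEMM call.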

The functions in this module mirror the dense ``flashinfer.gemm.mm_*``
APIs and currently dispatch to the cuDNN MoE backend.

BF16 / FP16
-----------

.. autosummary::
    :toctree: ../generated

    grouped_mm_bf16

FP8
---

.. autosummary::
    :toctree: ../generated

    grouped_mm_fp8

MXFP8
-----

.. autosummary::
    :toctree: ../generated

    grouped_mm_mxfp8

FP4 (NVFP4 / MXFP4)
-------------------

.. autosummary::
    :toctree: ../generated

    grouped_mm_fp4