.. _apipod:

flashinfer.pod
==============

POD (Prefill-On-Decode) attention executes a single-request prefill kernel and
a batch-decode kernel concurrently in one launch, which is useful for serving
stacks that overlap a chunked prefill with ongoing decode requests.
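
As the class names suggest, these wrappers operate on a paged KV cache.
FlashInfer's paged-KV wrappers generally describe a batch with three index
arrays: a page pointer array, a flat list of page indices, and the number of
valid entries in each request's last page. The sketch below shows one way to
derive those arrays from per-request sequence lengths. It is an illustration
only: ``build_paged_kv_metadata`` is a hypothetical helper, not part of
FlashInfer, and it assumes pages are allocated contiguously in request order,
which a real page allocator need not do.

```python
from typing import List, Tuple


def build_paged_kv_metadata(
    seq_lens: List[int], page_size: int
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: derive (indptr, indices, last_page_len) for a batch.

    Assumes every sequence is non-empty and that pages are handed out
    contiguously in request order (an assumption made for illustration).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    indptr = [0]                      # indptr[i]:indptr[i+1] slices `indices` for request i
    indices: List[int] = []           # flat list of page ids for the whole batch
    last_page_len: List[int] = []     # valid entries in each request's final page
    next_free_page = 0

    for n in seq_lens:
        if n <= 0:
            raise ValueError("sequence lengths must be positive")
        num_pages = (n + page_size - 1) // page_size  # ceiling division
        indices.extend(range(next_free_page, next_free_page + num_pages))
        next_free_page += num_pages
        indptr.append(indptr[-1] + num_pages)
        last_page_len.append(n - (num_pages - 1) * page_size)

    return indptr, indices, last_page_len


# Three decode requests with 5, 16, and 17 cached tokens and 16-token pages.
indptr, indices, last_page_len = build_paged_kv_metadata([5, 16, 17], page_size=16)
```

In practice these lists would typically be converted to integer tensors on
the target device before planning a call. The exact argument names and dtypes
are given in the method documentation for each wrapper.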

.. currentmodule:: flashinfer.pod

.. autoclass:: PODWithPagedKVCacheWrapper
    :members:
    :exclude-members: begin_forward, end_forward, forward, forward_return_lse

    .. automethod:: __init__

.. autoclass:: BatchPODWithPagedKVCacheWrapper
    :members:
    :exclude-members: begin_forward, end_forward, forward, forward_return_lse

    .. automethod:: __init__