flashinfer.rope

Kernels for applying rotary embeddings.

apply_rope_inplace(q, k, indptr, offsets[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.
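To make the ragged layout concrete, here is a minimal pure-Python sketch of how per-token positions could be derived from `indptr` and `offsets`. It assumes that `indptr` holds `batch_size + 1` row boundaries and that `offsets[b]` is the position of the first token of request `b`. The helper name `ragged_positions` is hypothetical and not part of the library.

```python
def ragged_positions(indptr, offsets):
    """Return one position per token for a ragged batch.

    Assumed meaning (not confirmed by this page):
      indptr  - batch_size + 1 boundaries; request b owns rows indptr[b]:indptr[b + 1]
      offsets - batch_size entries; position of the first token of request b
    """
    if len(indptr) != len(offsets) + 1:
        raise ValueError("indptr must have exactly one more entry than offsets")
    positions = []
    for b, start_pos in enumerate(offsets):
        length = indptr[b + 1] - indptr[b]
        # Tokens of one request occupy consecutive positions.
        positions.extend(start_pos + i for i in range(length))
    return positions


# Two requests: 3 tokens starting at position 10, then 2 tokens starting at 0.
example_positions = ragged_positions(indptr=[0, 3, 5], offsets=[10, 0])
```

Under these assumptions, `example_positions` is `[10, 11, 12, 0, 1]`: one entry per row of the packed tensor.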

apply_llama31_rope_inplace(q, k, indptr, offsets)

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.
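For orientation, the sketch below shows the frequency rescaling commonly described as "Llama 3.1 style" RoPE: high-frequency components are kept, low-frequency components are divided by a scale factor, and the band in between is interpolated. The function names are hypothetical, and the default values (scale factor 8, low/high frequency factors 1 and 4, original context length 8192, base 500000) are assumptions taken from the published Llama 3.1 configuration, not from this page.

```python
import math


def llama31_scale_freq(freq, factor=8.0, low_freq_factor=1.0,
                       high_freq_factor=4.0, old_context_len=8192):
    """Rescale one rotary frequency (radians per position). Defaults are assumed."""
    wavelen = 2.0 * math.pi / freq
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    if wavelen < high_freq_wavelen:
        return freq                      # short wavelength: leave unchanged
    if wavelen > low_freq_wavelen:
        return freq / factor             # long wavelength: scale down fully
    # Middle band: blend linearly between the scaled and unscaled frequency.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor)
    return (1.0 - smooth) * freq / factor + smooth * freq


def llama31_inv_freqs(head_dim, theta=500000.0):
    """Scaled frequencies for every rotated pair of a head of size head_dim."""
    return [llama31_scale_freq(theta ** (-2.0 * i / head_dim))
            for i in range(head_dim // 2)]
```

Only the frequencies differ from plain RoPE; the rotation applied to each query/key pair is the same.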

apply_rope(q, k, indptr, offsets[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor).

apply_llama31_rope(q, k, indptr, offsets[, ...])

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor).

apply_rope_pos_ids(q, k, pos_ids[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor).
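As a reference for what the `pos_ids` variants compute, here is a pure-Python sketch of rotary embedding with one explicit position per token. It assumes the non-interleaved (half-split) pairing, where dimension `i` is rotated together with dimension `i + d/2`, and a base of 10000. Both are assumptions, and `rope_rotate` and `apply_rope_reference` are hypothetical names rather than library functions.

```python
import math


def rope_rotate(vec, pos, theta=10000.0):
    """Rotate one head vector by position pos, pairing dim i with dim i + d/2."""
    d = len(vec)
    if d % 2:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + half]
        out[i] = x * c - y * s
        out[i + half] = x * s + y * c
    return out


def apply_rope_reference(rows, pos_ids, theta=10000.0):
    """rows: one vector per token (single head for brevity); pos_ids: one int per row."""
    if len(rows) != len(pos_ids):
        raise ValueError("need exactly one position per token")
    return [rope_rotate(v, p, theta) for v, p in zip(rows, pos_ids)]
```

Because each pair is rotated by an angle proportional to its position, the dot product of a rotated query and a rotated key depends only on the difference of their positions, which is the property attention relies on.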

apply_rope_pos_ids_inplace(q, k, pos_ids[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.

apply_llama31_rope_pos_ids(q, k, pos_ids[, ...])

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor).

apply_llama31_rope_pos_ids_inplace(q, k, pos_ids)

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.

apply_rope_with_cos_sin_cache(positions, ...)

Apply rotary embedding to keys and queries with precomputed cos/sin values.
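The idea of a precomputed cos/sin table can be sketched as follows. The layout used here, one row per position with cosines in the first half and sines in the second half of `rotary_dim`, is an assumption modeled on the convention used by vLLM-style implementations. The helper names are hypothetical.

```python
import math


def build_cos_sin_cache(max_pos, rotary_dim, theta=10000.0):
    """Return max_pos rows of length rotary_dim: [cos..., sin...] (assumed layout)."""
    half = rotary_dim // 2
    cache = []
    for pos in range(max_pos):
        angles = [pos * theta ** (-2.0 * i / rotary_dim) for i in range(half)]
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache


def rotate_with_cache(vec, pos, cache):
    """Rotate the first rotary_dim entries of vec using the cached row for pos."""
    row = cache[pos]
    half = len(row) // 2
    cos, sin = row[:half], row[half:]
    out = list(vec)  # dimensions beyond rotary_dim pass through unchanged
    for i in range(half):
        x, y = vec[i], vec[i + half]
        out[i] = x * cos[i] - y * sin[i]
        out[i + half] = x * sin[i] + y * cos[i]
    return out
```

Precomputing the table moves the trigonometric work out of the per-token path; each lookup is then indexed by the token's entry in `positions`.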

apply_rope_with_cos_sin_cache_inplace(...[, ...])

Apply rotary embedding to keys and queries with precomputed cos/sin values, in place.

rope_quantize_fp8(q_rope, k_rope, q_nope, ...)

Apply RoPE (Rotary Positional Embeddings) and quantize to FP8 format.

rope_quantize_fp8_append_paged_kv_cache(...)

Apply RoPE (Rotary Positional Embeddings), quantize to FP8, and append K/V to paged cache.

mla_rope_quantize_fp8(q_rope, k_rope, ...[, ...])

Apply RoPE and quantize to FP8 for MLA attention.