flashinfer.rope

Kernels for applying rotary embeddings.
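For reference, here is a minimal pure-Python sketch of the standard rotary embedding math that these kernels accelerate, applied to a single head vector. It assumes the non-interleaved ("rotate-half") pairing, where dimension `i` is rotated together with dimension `i + head_dim // 2`, and a base of `theta = 1e4`. `rope_reference` is a hypothetical helper for illustration only. It is not part of FlashInfer and ignores options such as partial rotary dimensions, interleaved layout, and scaling.

```python
import math


def rope_reference(x, pos, theta=1e4):
    """Rotate one head vector `x` (even length head_dim) to position `pos`.

    Assumes the non-interleaved ("rotate-half") layout: element i of the
    first half is paired with element i + head_dim // 2.
    """
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        # Frequency of pair i is theta ** (-2i / d).
        freq = theta ** (-2.0 * i / d)
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = x[i], x[i + half]
        # 2D rotation of the pair (x1, x2) by `angle`.
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


q = [1.0, 0.0, 0.0, 1.0]
print(rope_reference(q, pos=0))  # position 0 is the identity rotation
```

Because each pair is rotated by an angle proportional to its position, the dot product of a rotated query and a rotated key depends only on their relative distance, which is the property attention relies on.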

`apply_rope_inplace`(q, k, indptr, offsets[, ...])	Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.
`apply_llama31_rope_inplace`(q, k, indptr, offsets)	Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place.
`apply_rope`(q, k, indptr, offsets[, ...])	Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor).
`apply_llama31_rope`(q, k, indptr, offsets[, ...])	Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor).
`apply_rope_pos_ids`(q, k, pos_ids[, ...])	Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor), using explicit per-token position ids.
`apply_rope_pos_ids_inplace`(q, k, pos_ids[, ...])	Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place, using explicit per-token position ids.
`apply_llama31_rope_pos_ids`(q, k, pos_ids[, ...])	Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor), using explicit per-token position ids.
`apply_llama31_rope_pos_ids_inplace`(q, k, pos_ids)	Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor) in place, using explicit per-token position ids.
`apply_rope_with_cos_sin_cache`(positions, ...)	Apply rotary embedding to keys and queries with precomputed cos/sin values.
`apply_rope_with_cos_sin_cache_inplace`(...[, ...])	Apply rotary embedding to keys and queries with precomputed cos/sin values, in place.
`rope_quantize_fp8`(q_rope, k_rope, q_nope, ...)	Apply RoPE (Rotary Positional Embeddings) and quantize to FP8 format.
`rope_quantize_fp8_append_paged_kv_cache`(...)	Apply RoPE (Rotary Positional Embeddings), quantize to FP8, and append K/V to paged cache.
`mla_rope_quantize_fp8`(q_rope, k_rope, ...[, ...])	Apply RoPE and quantize to FP8 for MLA (Multi-head Latent Attention).
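The `indptr`/`offsets` variants and the `pos_ids` variants describe token positions in two different ways. The sketch below converts the first form into the second, assuming that the tokens of request `i` are `q[indptr[i]:indptr[i+1]]` and that `offsets[i]` is the absolute position of that request's first token. `ragged_to_pos_ids` is a hypothetical helper written for illustration, not a FlashInfer function, and it operates on plain Python lists rather than tensors.

```python
def ragged_to_pos_ids(indptr, offsets):
    """Expand (indptr, offsets) into one position id per token.

    Assumes request i owns tokens indptr[i]:indptr[i+1] and that its
    first token sits at absolute position offsets[i].
    """
    if len(indptr) != len(offsets) + 1:
        raise ValueError("indptr must have batch_size + 1 entries")
    pos_ids = []
    for i, offset in enumerate(offsets):
        num_tokens = indptr[i + 1] - indptr[i]
        # Positions within a request are consecutive, starting at its offset.
        pos_ids.extend(offset + j for j in range(num_tokens))
    return pos_ids


# Two requests: 3 new tokens starting at position 10, 2 starting at 0.
print(ragged_to_pos_ids([0, 3, 5], [10, 0]))  # [10, 11, 12, 0, 1]
```

Explicit position ids are more general: they can express non-contiguous positions within a request, which a single offset per request cannot.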
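The "Llama 3.1 style" variants differ from plain RoPE in how the per-pair rotation frequencies are computed. The sketch below follows the frequency-scaling scheme published with Llama 3.1: high-frequency pairs are kept, low-frequency pairs are divided by the scale factor, and pairs in between are interpolated. The default values are the ones published with Llama 3.1 and are assumed here; they are not read from FlashInfer's signatures, and `llama31_inv_freqs` is a hypothetical helper.

```python
import math


def llama31_inv_freqs(head_dim, rope_theta=5e5, rope_scale=8.0,
                      low_freq_factor=1.0, high_freq_factor=4.0,
                      old_context_len=8192):
    """Per-pair rotation frequencies with Llama 3.1 style scaling (sketch)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    freqs = []
    for i in range(head_dim // 2):
        freq = rope_theta ** (-2.0 * i / head_dim)
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths (high frequency): left unchanged.
            freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths (low frequency): slowed by the full scale factor.
            freqs.append(freq / rope_scale)
        else:
            # In between: blend smoothly between the scaled and unscaled frequency.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            freqs.append((1 - smooth) * freq / rope_scale + smooth * freq)
    return freqs


freqs = llama31_inv_freqs(head_dim=128)
print(len(freqs))  # 64
```

The resulting frequencies would replace `theta ** (-2i / d)` in an otherwise standard rotary embedding.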
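The `apply_rope_with_cos_sin_cache` functions take precomputed cos/sin values instead of computing them from frequencies on the fly. A minimal sketch of building such a table follows, assuming one row per position with the cosines in the first half of `rotary_dim` and the sines in the second half; that layout is an assumption to check against the function's docstring. `build_cos_sin_cache` is hypothetical and returns nested Python lists rather than a tensor.

```python
import math


def build_cos_sin_cache(max_seq_len, rotary_dim, theta=1e4):
    """Row p is [cos(p * f_i) for each i] + [sin(p * f_i) for each i]."""
    half = rotary_dim // 2
    inv_freq = [theta ** (-2.0 * i / rotary_dim) for i in range(half)]
    cache = []
    for pos in range(max_seq_len):
        angles = [pos * f for f in inv_freq]
        # Assumed layout: cosines first, then sines, along rotary_dim.
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache


cache = build_cos_sin_cache(max_seq_len=4, rotary_dim=8)
print(len(cache), len(cache[0]))  # 4 8
```

Precomputing the table trades memory for avoiding repeated `cos`/`sin` evaluation, and lets a caller bake in any frequency scaling scheme ahead of time.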