flashinfer.rope

Kernels for applying rotary embeddings.

apply_rope_inplace(q, k, indptr, offsets[, ...])

Apply rotary embedding in place to a batch of queries/keys (stored as RaggedTensor).

apply_llama31_rope_inplace(q, k, indptr, offsets)

Apply Llama 3.1 style rotary embedding in place to a batch of queries/keys (stored as RaggedTensor).

apply_rope(q, k, indptr, offsets[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor).

apply_llama31_rope(q, k, indptr, offsets[, ...])

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor).
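The four functions above take `indptr` and `offsets` rather than explicit positions. The sketch below shows one plausible reading of those arguments, assuming `indptr` holds the start and end row of each sequence in the packed (ragged) tensor and `offsets` holds the position of each sequence's first token. The helper name `ragged_positions` is hypothetical and is not part of flashinfer; it is plain Python meant only to illustrate the layout.

```python
def ragged_positions(indptr, offsets):
    """Expand (indptr, offsets) into one position per packed token.

    Assumption: sequence b occupies rows indptr[b]:indptr[b + 1] of the
    packed tensor, and its first token sits at position offsets[b].
    """
    positions = []
    for b in range(len(indptr) - 1):
        seq_len = indptr[b + 1] - indptr[b]
        # Tokens inside a sequence get consecutive positions.
        positions.extend(offsets[b] + i for i in range(seq_len))
    return positions


# Two sequences packed together: 3 tokens starting at position 0,
# then 2 tokens starting at position 10 (e.g. appended to a cached prefix).
indptr = [0, 3, 5]
offsets = [0, 10]
print(ragged_positions(indptr, offsets))  # [0, 1, 2, 10, 11]
```

Under this reading, the resulting list is the kind of per-token position array that the `pos_ids` variants accept directly.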

apply_rope_pos_ids(q, k, pos_ids[, ...])

Apply rotary embedding to a batch of queries/keys (stored as RaggedTensor), taking token positions from pos_ids.

apply_rope_pos_ids_inplace(q, k, pos_ids[, ...])

Apply rotary embedding in place to a batch of queries/keys (stored as RaggedTensor), taking token positions from pos_ids.

apply_llama31_rope_pos_ids(q, k, pos_ids[, ...])

Apply Llama 3.1 style rotary embedding to a batch of queries/keys (stored as RaggedTensor), taking token positions from pos_ids.

apply_llama31_rope_pos_ids_inplace(q, k, pos_ids)

Apply Llama 3.1 style rotary embedding in place to a batch of queries/keys (stored as RaggedTensor), taking token positions from pos_ids.
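The Llama 3.1 variants differ from the plain ones in how the per-dimension rotation frequencies are chosen. The sketch below follows the frequency-scaling rule as it is commonly implemented for Llama 3.1 (for example in Hugging Face Transformers). The function name, parameter names, and default values here are assumptions for illustration; this page does not confirm that flashinfer's kernels use exactly these names or defaults.

```python
import math


def llama31_inv_freq(head_dim, rope_theta=5e5, rope_scale=8.0,
                     low_freq_factor=1.0, high_freq_factor=4.0,
                     old_context_len=8192):
    """Return one rotation frequency per pair of head dimensions.

    High-frequency components are kept, low-frequency components are
    divided by rope_scale, and the band in between is interpolated.
    All defaults are assumed values, not taken from this page.
    """
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    freqs = []
    for i in range(0, head_dim, 2):
        freq = rope_theta ** (-i / head_dim)  # standard RoPE frequency
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            freqs.append(freq)  # short wavelength: unchanged
        elif wavelen > low_freq_wavelen:
            freqs.append(freq / rope_scale)  # long wavelength: fully scaled
        else:
            # Smoothly blend between the scaled and unscaled frequency.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            freqs.append((1 - smooth) * freq / rope_scale + smooth * freq)
    return freqs


freqs = llama31_inv_freq(head_dim=128)
print(len(freqs), freqs[0])  # 64 1.0
```

The first frequency is untouched because its wavelength is far shorter than the original context length, while the slowest components are stretched by the full scale factor.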

apply_rope_with_cos_sin_cache(q, k, ...[, ...])

Apply rotary embedding to keys and queries with precomputed cos/sin values.

apply_rope_with_cos_sin_cache_inplace(q, k, ...)

Apply rotary embedding in place to keys and queries with precomputed cos/sin values.
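To illustrate what "precomputed cos/sin values" can mean, here is a minimal pure-Python sketch of building a cos/sin table once and reusing it to rotate vectors. The cache layout (one row per position, cosines in the first half and sines in the second half), the non-interleaved pairing of dimension `j` with `j + half`, and the helper names `build_cos_sin_cache` and `rotate_with_cache` are all assumptions made for this example, not flashinfer's documented interface.

```python
import math


def build_cos_sin_cache(max_pos, rotary_dim, rope_theta=1e4):
    """Precompute one row per position: [cos values..., sin values...]."""
    half = rotary_dim // 2
    inv_freq = [rope_theta ** (-2 * j / rotary_dim) for j in range(half)]
    cache = []
    for pos in range(max_pos):
        angles = [pos * f for f in inv_freq]
        cache.append([math.cos(a) for a in angles] +
                     [math.sin(a) for a in angles])
    return cache


def rotate_with_cache(x, pos, cache):
    """Rotate the first rotary_dim entries of x by the cached angles."""
    half = len(cache[pos]) // 2
    cos, sin = cache[pos][:half], cache[pos][half:]
    out = list(x)  # entries beyond rotary_dim are left untouched
    for j in range(half):
        a, b = x[j], x[j + half]  # non-interleaved pairing (assumed)
        out[j] = a * cos[j] - b * sin[j]
        out[j + half] = a * sin[j] + b * cos[j]
    return out


cache = build_cos_sin_cache(max_pos=16, rotary_dim=4)
q = [1.0, 2.0, 3.0, 4.0]
print(rotate_with_cache(q, 0, cache))  # [1.0, 2.0, 3.0, 4.0]
```

Position 0 leaves the vector unchanged because every angle is zero. For other positions, the dot product of a rotated query and a rotated key depends only on the difference between their positions, which is the property rotary embeddings are designed to provide.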