flashinfer.rope.apply_rope_with_cos_sin_cache
- flashinfer.rope.apply_rope_with_cos_sin_cache(positions: torch.Tensor, query: torch.Tensor, key: torch.Tensor, head_size: int, cos_sin_cache: torch.Tensor, is_neox: bool = True) → Tuple[torch.Tensor, torch.Tensor]
Apply rotary embedding to keys and queries with precomputed cos/sin values. This is designed to be compatible with the SGL/vLLM implementation.
- Parameters:
  - positions (torch.Tensor) – Position indices, shape: (nnz).
  - query (torch.Tensor) – Query tensor, shape: (nnz, num_q_heads * head_size).
  - key (torch.Tensor) – Key tensor, shape: (nnz, num_k_heads * head_size).
  - head_size (int) – The size of each attention head.
  - cos_sin_cache (torch.Tensor) – Cosine and sine cache tensor, shape: (max_seq_len, rotary_dim). The cosine values occupy the first half of the last dimension and the sine values the second half.
  - is_neox (bool) – Whether to use Neox-style RoPE, default: True. If True, the last dimension of the query/key tensor is not interleaved, i.e., we rotate the first half of the dimensions ([..., :head_dim//2]) against the second half ([..., head_dim//2:]). If False, the last dimension of the query/key tensor is interleaved, i.e., we rotate the even dimensions ([..., ::2]) against the odd dimensions ([..., 1::2]). See the sketch after this list.
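To make the is_neox distinction concrete, here is a minimal plain-PyTorch sketch of the two rotation layouts. This is not the flashinfer kernel, and rotate_neox / rotate_interleaved are hypothetical helper names introduced here for illustration only.

```python
import torch

def rotate_neox(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Neox style: rotate the first half [..., :d//2] against the second half [..., d//2:].
    d = x.shape[-1]
    x1, x2 = x[..., : d // 2], x[..., d // 2 :]
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

def rotate_interleaved(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # GPT-J / interleaved style: rotate the even dims [..., ::2] against the odd dims [..., 1::2].
    x1, x2 = x[..., ::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., ::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x2 * cos + x1 * sin
    return out

x = torch.randn(2, 8)             # (tokens, head_dim)
ang = torch.rand(2, 4)            # one rotation angle per rotated pair
print(rotate_neox(x, ang.cos(), ang.sin()).shape)         # torch.Size([2, 8])
print(rotate_interleaved(x, ang.cos(), ang.sin()).shape)  # torch.Size([2, 8])
```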
- Returns:
  - query_out (torch.Tensor) – The rotated query tensor, shape: (nnz, num_q_heads * head_size).
  - key_out (torch.Tensor) – The rotated key tensor, shape: (nnz, num_k_heads * head_size).
Note
The rotary dimension is determined by the cos/sin cache: the last dimension of cos_sin_cache equals rotary_dim.
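A minimal usage sketch, assuming flashinfer is installed and a CUDA device is available. All sizes, the base frequency 1e4, and the float32 cache dtype below are illustrative assumptions, not requirements stated here; the cache is built with cosine in the first half of the last dimension and sine in the second half, per the description above.

```python
import torch
from flashinfer.rope import apply_rope_with_cos_sin_cache

num_q_heads, num_k_heads, head_size = 32, 8, 128
nnz, max_seq_len, rotary_dim = 16, 4096, head_size

# Build the cache: cos in the first half of the last dim, sin in the second half.
base = 1e4
inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim))
t = torch.arange(max_seq_len, dtype=torch.float32)
freqs = torch.outer(t, inv_freq)  # (max_seq_len, rotary_dim // 2)
cos_sin_cache = torch.cat([freqs.cos(), freqs.sin()], dim=-1).to("cuda:0")

positions = torch.randint(0, max_seq_len, (nnz,), device="cuda:0")
query = torch.randn(nnz, num_q_heads * head_size, dtype=torch.float16, device="cuda:0")
key = torch.randn(nnz, num_k_heads * head_size, dtype=torch.float16, device="cuda:0")

query_out, key_out = apply_rope_with_cos_sin_cache(
    positions, query, key, head_size, cos_sin_cache, is_neox=True
)
print(query_out.shape, key_out.shape)  # (nnz, num_q_heads * head_size), (nnz, num_k_heads * head_size)
```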