flashinfer.testing.attention_tflops_per_sec_with_actual_seq_lens
- flashinfer.testing.attention_tflops_per_sec_with_actual_seq_lens(actual_seq_lens_q, actual_seq_lens_kv, head_dim_qk, head_dim_vo, num_qo_heads, causal, time)
Calculate the achieved throughput, in TFLOPS (tera floating-point operations per second), of an attention layer from the actual sequence lengths. Sequence lengths are not assumed to be the same across the batch.
- Parameters:
actual_seq_lens_q (torch.Tensor) – Actual query sequence lengths, one entry per sequence in the batch.
actual_seq_lens_kv (torch.Tensor) – Actual key and value sequence lengths, one entry per sequence in the batch.
head_dim_qk (int) – Head dimension of the query and key.
head_dim_vo (int) – Head dimension of the value and output.
num_qo_heads (int) – Number of query heads.
causal (bool) – Whether to use causal masking.
time (float) – Execution time in milliseconds.
- Returns:
Achieved throughput of the layer, in TFLOPS.
- Return type:
tflops_per_sec (float)
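
The sketch below illustrates how a figure like this can be derived from per-sequence lengths and a measured time. It is not FlashInfer's implementation. The helper name `sketch_attention_tflops` is hypothetical, it takes plain Python lists rather than `torch.Tensor`, and the FLOP-counting convention is an assumption: 2 FLOPs per multiply-add, one matmul for QK^T and one for PV, and a bottom-right-aligned causal mask with each query length no longer than its key/value length. The library may count differently.

```python
from typing import Sequence


def sketch_attention_tflops(
    seq_lens_q: Sequence[int],
    seq_lens_kv: Sequence[int],
    head_dim_qk: int,
    head_dim_vo: int,
    num_qo_heads: int,
    causal: bool,
    time_ms: float,
) -> float:
    """Hypothetical helper: estimate attention throughput in TFLOPS."""
    total_flops = 0.0
    for q_len, kv_len in zip(seq_lens_q, seq_lens_kv):
        if causal:
            # Assumed convention: bottom-right-aligned mask, q_len <= kv_len.
            # The masked-out upper triangle holds q_len * (q_len - 1) / 2 pairs.
            pairs = q_len * kv_len - q_len * (q_len - 1) / 2
        else:
            pairs = q_len * kv_len
        # QK^T costs 2 * head_dim_qk FLOPs per (query, key) pair and
        # PV costs 2 * head_dim_vo FLOPs per pair, for every query head.
        total_flops += 2.0 * pairs * (head_dim_qk + head_dim_vo) * num_qo_heads
    seconds = time_ms * 1e-3  # the measured time is given in milliseconds
    return total_flops / seconds / 1e12


# Two sequences of different lengths, timed at 1 ms.
full = sketch_attention_tflops([128, 256], [128, 256], 128, 128, 32, False, 1.0)
masked = sketch_attention_tflops([128, 256], [128, 256], 128, 128, 32, True, 1.0)
```

Summing per sequence rather than multiplying a single length by the batch size is what lets the estimate handle ragged batches. Under this convention, causal masking lowers the counted FLOPs, so the same measured time yields a lower TFLOPS figure.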