flashinfer.testing.attention_tb_per_sec
- flashinfer.testing.attention_tb_per_sec(batch_size, qo_seqlen, kv_seqlen, head_dim_qk, head_dim_vo, num_qo_heads, num_kv_heads, time, q_dtype=torch.bfloat16, kv_dtype=torch.bfloat16, o_dtype=torch.bfloat16)
Calculate the memory throughput, in terabytes per second (TB/s), achieved by a given attention layer. Assumes that all sequences in the batch have the same length.
- Parameters:
batch_size (int) – Batch size.
qo_seqlen (int) – Sequence length of the query and output.
kv_seqlen (int) – Sequence length of the key and value.
head_dim_qk (int) – Head dimension of the query and key.
head_dim_vo (int) – Head dimension of the value and output.
num_qo_heads (int) – Number of query heads.
num_kv_heads (int) – Number of key and value heads.
time (float) – Execution time in milliseconds.
q_dtype (torch.dtype) – Data type of the query.
kv_dtype (torch.dtype) – Data type of the key and value.
o_dtype (torch.dtype) – Data type of the output.
- Returns:
Achieved throughput of the layer, in TB/s.
- Return type:
tb_per_sec (float)
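A minimal sketch of how such a figure can be computed, assuming that the traffic counted is one read of Q, K and V plus one write of O, and that "TB" means decimal terabytes (10^12 bytes). This is not taken from the FlashInfer source. The helper names `attention_bytes` and `tb_per_sec_sketch` are hypothetical, and the torch dtypes are replaced by plain byte sizes per element (2 for `torch.bfloat16`) so the example runs without torch.

```python
def attention_bytes(batch_size, qo_seqlen, kv_seqlen, head_dim_qk, head_dim_vo,
                    num_qo_heads, num_kv_heads, q_bytes=2, kv_bytes=2, o_bytes=2):
    """Hypothetical helper: bytes moved, assuming Q/K/V are read once and O is written once."""
    q = batch_size * qo_seqlen * num_qo_heads * head_dim_qk * q_bytes    # query read
    k = batch_size * kv_seqlen * num_kv_heads * head_dim_qk * kv_bytes   # key read
    v = batch_size * kv_seqlen * num_kv_heads * head_dim_vo * kv_bytes   # value read
    o = batch_size * qo_seqlen * num_qo_heads * head_dim_vo * o_bytes    # output write
    return q + k + v + o


def tb_per_sec_sketch(batch_size, qo_seqlen, kv_seqlen, head_dim_qk, head_dim_vo,
                      num_qo_heads, num_kv_heads, time_ms,
                      q_bytes=2, kv_bytes=2, o_bytes=2):
    """Hypothetical helper: throughput in TB/s, with TB = 1e12 bytes (an assumption)."""
    total = attention_bytes(batch_size, qo_seqlen, kv_seqlen, head_dim_qk, head_dim_vo,
                            num_qo_heads, num_kv_heads, q_bytes, kv_bytes, o_bytes)
    seconds = time_ms / 1e3  # `time` is documented in milliseconds
    return total / 1e12 / seconds


if __name__ == "__main__":
    # Decode-style shape: one query token, 4096 cached tokens, GQA with 32 query / 8 KV heads.
    print(tb_per_sec_sketch(1, 1, 4096, 128, 128, 32, 8, time_ms=0.01))
```

For this decode shape the K and V reads dominate (about 16.8 MB of the total), so the result is roughly 1.68 TB/s for a 0.01 ms kernel. The documented call would be `attention_tb_per_sec(1, 1, 4096, 128, 128, 32, 8, time=0.01)`; it matches the sketch only if its byte accounting follows the same assumption.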