flashinfer.testing

This module provides testing utilities for benchmarking and performance analysis in FlashInfer.

Test Environment Setup

set_seed(random_seed)

Set random seed for reproducibility during testing.
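
To illustrate why seeding matters in tests, the sketch below seeds Python's built-in `random` module and checks that the same seed reproduces the same values. The helper names `seed_everything_sketch` and `draw_samples` are hypothetical; this is not FlashInfer's implementation, and which generators `set_seed` actually seeds is not stated here.

```python
import random


def seed_everything_sketch(seed):
    """Hypothetical stand-in: seeds only Python's built-in RNG."""
    random.seed(seed)


def draw_samples(seed, n=3):
    """Seed the RNG, then draw n pseudo-random floats."""
    seed_everything_sketch(seed)
    return [random.random() for _ in range(n)]


first = draw_samples(42)
second = draw_samples(42)
print(first == second)  # True: same seed, same sequence
```

A test that builds random inputs after a call like this should see identical inputs on every run, which makes numerical mismatches easier to reproduce.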

sleep_after_kernel_run(execution_time)

Sleep after a kernel run, for a duration based on the given execution time.

Performance Analysis

FLOPS Calculation

attention_flops(batch_size, qo_seqlen, ...)

Calculate FLOPs for a given attention layer.
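
As a rough guide to what such a count involves, the sketch below uses a common convention: each attended (query, key) pair costs `2 * head_dim_qk` FLOPs for Q·Kᵀ and `2 * head_dim_vo` FLOPs for P·V. The function `attention_flops_sketch` and every parameter after `qo_seqlen` are assumptions made for illustration; the exact formula and signature used by `attention_flops` may differ.

```python
def attention_flops_sketch(batch_size, qo_seqlen, kv_seqlen,
                           head_dim_qk, head_dim_vo, num_qo_heads,
                           causal=False):
    """Estimate attention FLOPs (hypothetical helper, not FlashInfer's code)."""
    if causal:
        # Assumes a bottom-right-aligned causal mask with kv_seqlen >= qo_seqlen:
        # query i attends to (kv_seqlen - qo_seqlen + i + 1) keys.
        pairs = (qo_seqlen * (kv_seqlen - qo_seqlen)
                 + qo_seqlen * (qo_seqlen + 1) // 2)
    else:
        pairs = qo_seqlen * kv_seqlen
    # One multiply and one add per element in each of the two matmuls.
    return 2 * batch_size * num_qo_heads * pairs * (head_dim_qk + head_dim_vo)


print(attention_flops_sketch(1, 4, 4, 8, 8, 2))               # 1024
print(attention_flops_sketch(1, 4, 4, 8, 8, 2, causal=True))  # 640
```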

attention_flops_with_actual_seq_lens(...)

Calculate FLOPs for a given attention layer using actual sequence lengths, provided as 1D tensors.
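
When sequences in a batch have different lengths, the count becomes a sum over sequences rather than a product with the batch size. The sketch below shows this for the non-causal case, using plain Python lists as a stand-in for the 1D tensors. The function name and all parameters are hypothetical, and the per-pair cost convention (two FLOPs per multiply-add) is an assumption.

```python
def attention_flops_varlen_sketch(qo_seqlens, kv_seqlens,
                                  head_dim_qk, head_dim_vo, num_qo_heads):
    """Sum non-causal attention FLOPs over sequences of differing lengths.

    Hypothetical helper; lists stand in for 1D length tensors.
    """
    if len(qo_seqlens) != len(kv_seqlens):
        raise ValueError("qo_seqlens and kv_seqlens must have the same length")
    # Each sequence contributes qo_len * kv_len attended (query, key) pairs.
    pairs = sum(q * k for q, k in zip(qo_seqlens, kv_seqlens))
    return 2 * num_qo_heads * pairs * (head_dim_qk + head_dim_vo)


# Two decode-like requests with KV lengths 5 and 7.
print(attention_flops_varlen_sketch([1, 3], [5, 7], 8, 8, 2))  # 1664
```

Using the true lengths avoids counting work for padded positions, which would otherwise inflate the reported FLOPs.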

attention_tflops_per_sec(batch_size, ...)

Calculate the achieved TFLOPs per second for a given attention layer.
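
The conversion from a FLOP count and a measured time to TFLOPs per second is simple arithmetic, sketched below. The helper `tflops_per_sec_sketch` is hypothetical, and the choice of milliseconds as the time unit is an assumption.

```python
def tflops_per_sec_sketch(total_flops, time_ms):
    """Convert a FLOP count and a time in milliseconds to TFLOPs/s.

    Hypothetical helper; the millisecond unit is an assumption.
    """
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    seconds = time_ms / 1e3
    return total_flops / seconds / 1e12


# 2e12 FLOPs completed in one second is 2 TFLOPs/s.
print(tflops_per_sec_sketch(2e12, 1000))  # 2.0
```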

attention_tflops_per_sec_with_actual_seq_lens(...)

Calculate the achieved TFLOPs per second for a given attention layer using actual sequence lengths.

Throughput Analysis

attention_tb_per_sec(batch_size, qo_seqlen, ...)

Calculate the achieved throughput in TB per second for a given attention layer.
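
One way to estimate this kind of figure is to add up the bytes of the query, key, value, and output tensors and divide by the measured time. The sketch below assumes each tensor is read or written exactly once, that 1 TB is 1e12 bytes, and that time is given in milliseconds. The function name and its parameters are hypothetical and may not match `attention_tb_per_sec`.

```python
def attention_tb_per_sec_sketch(batch_size, qo_seqlen, kv_seqlen,
                                head_dim_qk, head_dim_vo,
                                num_qo_heads, num_kv_heads,
                                bytes_per_elem, time_ms):
    """Estimate achieved TB/s for one attention call (hypothetical helper)."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    q_elems = batch_size * qo_seqlen * num_qo_heads * head_dim_qk
    k_elems = batch_size * kv_seqlen * num_kv_heads * head_dim_qk
    v_elems = batch_size * kv_seqlen * num_kv_heads * head_dim_vo
    o_elems = batch_size * qo_seqlen * num_qo_heads * head_dim_vo
    # Assumption: Q, K, V are each read once and O is written once.
    total_bytes = (q_elems + k_elems + v_elems + o_elems) * bytes_per_elem
    return total_bytes / (time_ms / 1e3) / 1e12


# Illustrative decode-style shape with 2-byte elements; values are made up.
tb_s = attention_tb_per_sec_sketch(
    batch_size=64, qo_seqlen=1, kv_seqlen=8192,
    head_dim_qk=128, head_dim_vo=128,
    num_qo_heads=32, num_kv_heads=8,
    bytes_per_elem=2, time_ms=1.0,
)
print(f"{tb_s:.3f} TB/s")
```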

attention_tb_per_sec_with_actual_seq_lens(...)

Calculate the achieved throughput in TB per second for a given attention layer using actual sequence lengths.

GPU Benchmarking

bench_gpu_time(fn[, dry_run_iters, ...])

Benchmark kernel execution time without using CUDA graphs.
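
The general shape of such a benchmark is a few unmeasured warm-up calls followed by timed calls. The sketch below shows that pattern with a CPU wall-clock timer as a stand-in so it runs anywhere; real GPU timing would instead rely on device-side timing and synchronization. The function `bench_cpu_time_sketch`, the `repeat_iters` parameter, and the list-of-milliseconds return value are assumptions for illustration, not the behavior of `bench_gpu_time`.

```python
import statistics
import time


def bench_cpu_time_sketch(fn, dry_run_iters=3, repeat_iters=10):
    """Warm up, then time each call; returns per-iteration times in ms.

    Hypothetical CPU stand-in for a GPU benchmarking helper.
    """
    for _ in range(dry_run_iters):
        fn()  # warm-up runs are not measured
    times_ms = []
    for _ in range(repeat_iters):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return times_ms


times = bench_cpu_time_sketch(lambda: sum(range(10_000)))
print(f"median: {statistics.median(times):.4f} ms over {len(times)} runs")
```

Reporting the median rather than the mean makes the result less sensitive to occasional slow iterations.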

bench_gpu_time_with_cudagraph(fn[, ...])

Benchmark GPU time by constructing CUDA graphs containing the kernel launches and then replaying the graphs.