flashinfer.testing.bench_gpu_time

flashinfer.testing.bench_gpu_time(fn, dry_run_iters: int = None, repeat_iters: int = None, dry_run_time_ms: int = 25, repeat_time_ms: int = 100, l2_flush: bool = True, l2_flush_size_mb: int = 256, l2_flush_device: str = 'cuda', sleep_after_run: bool = False, enable_cupti: bool = False, use_cuda_graph: bool = False, num_iters_within_graph: int = 10)

Benchmark wrapper that chooses among CUPTI, CUDA events, or CUDA Graphs.

By default (enable_cupti=False, use_cuda_graph=False), it uses CUDA-event timing.

Args mirror the underlying implementations; extra control flags:

- enable_cupti: If True, use CUPTI to measure GPU kernel time. If use_cuda_graph is also True, a CUDA graph is captured and replayed during measurement.
- use_cuda_graph: If True (and enable_cupti is False), use CUDA graph timing.
- num_iters_within_graph: Number of iterations to run inside the CUDA graph when it is used (non-CUPTI path only).