flashinfer.testing.bench_gpu_time

flashinfer.testing.bench_gpu_time(fn, dry_run_iters: int = None, repeat_iters: int = None, dry_run_time_ms: int = 25, repeat_time_ms: int = 100, l2_flush: bool = True, l2_flush_size_mb: int = 256, l2_flush_device: str = 'cuda', sleep_after_run: bool = False)

Benchmark GPU kernel execution time without using CUDA graphs. Each measurement covers kernel launch latency plus the actual kernel execution time of fn(). Can also flush the L2 cache and sleep after the run.

The number of dry-run and measured iterations can be set either by iteration count or by time:

  • If dry_run_iters and repeat_iters are provided, the given iteration counts are used.

  • If dry_run_iters and repeat_iters are not provided, dry_run_time_ms and repeat_time_ms are used instead.
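The time-based mode implies that a time budget is translated into an iteration count at some point. A minimal sketch of one way to do that, assuming a per-call time estimate is already available. The helper name `iters_from_budget` and its rounding rule are hypothetical and are not taken from flashinfer's implementation:

```python
import math


def iters_from_budget(budget_ms: float, est_call_ms: float) -> int:
    """Hypothetical helper: number of calls that fit into a time budget."""
    if est_call_ms <= 0:
        raise ValueError("est_call_ms must be positive")
    # Always run at least once, even when a single call exceeds the budget.
    return max(1, math.floor(budget_ms / est_call_ms))


# With the default budgets and a kernel estimated at 0.5 ms per call:
dry_run_iters = iters_from_budget(25, 0.5)    # 50
repeat_iters = iters_from_budget(100, 0.5)    # 200
```

Passing dry_run_iters and repeat_iters explicitly avoids any dependence on such an estimate and keeps the number of samples fixed across runs.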

Returns a list of measured times so that the caller can compute statistics.

Parameters:
  • fn – Function to benchmark.

  • dry_run_iters – Number of dry-run (warm-up) iterations, whose times are not counted. If not provided, dry_run_time_ms will be used.

  • repeat_iters – Number of measured iterations. If not provided, repeat_time_ms will be used.

  • dry_run_time_ms – Time budget for the dry run, in milliseconds. Used only when dry_run_iters is not provided.

  • repeat_time_ms – Time budget for the measured iterations, in milliseconds. Used only when repeat_iters is not provided.

  • l2_flush – Whether to flush L2 cache.

  • l2_flush_size_mb – Size of the L2 cache to flush, in megabytes.

  • l2_flush_device – Device whose L2 cache is flushed.

  • sleep_after_run – Whether to sleep after the run. The sleep duration is set dynamically.

Returns:

measured_times – List of measured times.

Return type:

list
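A minimal usage sketch, assuming PyTorch and a CUDA device are available. The `summarize` helper, the `run_example` wrapper and the matrix-multiply workload are illustrative and not part of flashinfer. The unit of the returned values is not stated above, so the helper treats them as plain numbers:

```python
import statistics


def summarize(times):
    """Hypothetical helper: basic statistics over a list of measured times."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "mean": statistics.fmean(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


def run_example():
    # Imports are local so that summarize() stays usable without a GPU.
    import torch
    from flashinfer.testing import bench_gpu_time

    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

    # Fixed iteration counts; omit both to fall back to the time budgets.
    times = bench_gpu_time(
        lambda: torch.matmul(a, b),
        dry_run_iters=10,
        repeat_iters=100,
    )
    return summarize(times)

# Call run_example() on a machine with a CUDA device to obtain the statistics.
```

Reporting the median alongside the mean is a common choice for GPU timings, since occasional slow iterations can skew the mean.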