flashinfer.topk
Efficient Top-K selection kernels.
See also
For Top-K-based sampling, see flashinfer.sampling, which provides top_k_sampling_from_probs(), top_k_top_p_sampling_from_probs(), top_k_renorm_probs(), and top_k_mask_logits().
Top-K Selection
Radix-based Top-K selection.
Fused Top-K selection + Page Table Transform for sparse attention.
Fused Top-K selection + Ragged Index Transform for sparse attention.
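The first entry is described as radix-based. As a rough illustration of that general idea only, here is a minimal CPU sketch of radix selection in pure Python. It is not FlashInfer's kernel and does not use its API. The function name radix_top_k, its parameters, and the restriction to non-negative integers below 2**32 are all assumptions made for this example.

```python
def radix_top_k(values, k, bits=32, radix_bits=8):
    """Return the k largest items of `values` (in no particular order).

    Hypothetical helper for illustration; assumes non-negative ints < 2**bits
    and that `bits` is a multiple of `radix_bits`.
    """
    if not 0 < k <= len(values):
        raise ValueError("k must satisfy 0 < k <= len(values)")

    mask = (1 << radix_bits) - 1
    candidates = list(values)   # items that may still be the k-th largest
    selected = []               # items already known to be in the top k
    remaining = k               # how many more items must come from candidates

    # Scan digits from most significant to least significant.
    for shift in range(bits - radix_bits, -1, -radix_bits):
        # Histogram of the current digit over the surviving candidates.
        hist = [0] * (1 << radix_bits)
        for v in candidates:
            hist[(v >> shift) & mask] += 1

        # Walk buckets from the largest digit down until the bucket that
        # contains the k-th largest element is found.
        digit = mask
        while hist[digit] < remaining:
            remaining -= hist[digit]
            digit -= 1

        # Larger digits are certainly in the top k; equal digits stay candidates.
        selected.extend(v for v in candidates if ((v >> shift) & mask) > digit)
        candidates = [v for v in candidates if ((v >> shift) & mask) == digit]

    # After the last digit, all surviving candidates are equal in value.
    selected.extend(candidates[:remaining])
    return selected


if __name__ == "__main__":
    print(sorted(radix_top_k([5, 1, 9, 3, 9, 7], 3)))
```

Each pass narrows the candidate set by one digit, so the selection needs no full sort. A GPU implementation would typically build the histograms in parallel, which this sketch does not attempt to show.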
Utility Functions
Check if the GPU supports enough shared memory for FilteredTopK algorithm.