flashinfer.fused_moe¶
This module provides fused Mixture-of-Experts (MoE) operations optimized for different backends and data types.
Types and Enums¶
Utility Functions¶
- `reorder_rows_for_gated_act_gemm`: PyTorch implementation of the TensorRT-LLM Gen `reorderRowsForGatedActGemm` weight-row reordering.
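The reordering is easiest to see as a small reference transform. Below is a minimal PyTorch sketch of the row-interleaving idea, assuming a `[W1; W3]` stacked weight layout for a gated MLP; the function name, layout, and any extra tiling the actual TensorRT-LLM Gen kernel expects are assumptions, so consult the API reference for the real signature.

```python
import torch

def reorder_rows_for_gated_act_gemm_ref(w13: torch.Tensor) -> torch.Tensor:
    # Illustrative layout: w13 stacks the gate and up projections as
    # [W1; W3] with shape (2 * intermediate_size, hidden_size).
    two_m = w13.shape[0]
    m = two_m // 2
    out = torch.empty_like(w13)
    out[0::2] = w13[:m]  # gate rows go to even positions
    out[1::2] = w13[m:]  # up rows go to odd positions
    return out
```

Interleaving puts each (gate, up) row pair next to each other, so a fused GEMM epilogue can apply the gated activation to adjacent outputs in one pass.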
CUTLASS Fused MoE¶
- `cutlass_fused_moe`: Compute a Mixture-of-Experts (MoE) layer using the CUTLASS backend.
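For orientation, a fused MoE layer computes the same thing as the unfused loop below: route each token to its top-k experts, run a gated per-expert MLP, and combine the expert outputs with the routing weights. This is a plain-PyTorch mental model only; the shapes, SwiGLU activation, and softmax-top-k routing shown here are assumptions, not the `cutlass_fused_moe` signature.

```python
import torch
import torch.nn.functional as F

def moe_reference(x, router_logits, w1, w2, w3, top_k=2):
    # x: (tokens, hidden); router_logits: (tokens, num_experts)
    # w1, w3: (num_experts, inter, hidden)  gate / up projections
    # w2:     (num_experts, hidden, inter)  down projection
    probs = router_logits.softmax(dim=-1)
    weights, experts = probs.topk(top_k, dim=-1)           # (tokens, k)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
    out = torch.zeros_like(x)
    for e in range(router_logits.shape[-1]):
        token_idx, slot = (experts == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        xe = x[token_idx]                                  # tokens routed to e
        h = F.silu(xe @ w1[e].T) * (xe @ w3[e].T)          # gated (SwiGLU) MLP
        out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * (h @ w2[e].T)
    return out
```

The fused kernel avoids this per-expert Python loop by grouping tokens per expert and running grouped GEMMs on device.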
TensorRT-LLM Fused MoE¶
- `trtllm_fp4_block_scale_moe`: FP4 block-scale MoE operation.
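Block-scale FP4 keeps one scale per small group of values rather than per tensor. A rough sketch of the scaling arithmetic, assuming NVFP4-style 16-element blocks along the last dimension; the kernel's actual block size, scale dtype (e.g. e4m3 block scales plus a global tensor scale), and value packing are not shown.

```python
import torch

FP4_MAX = 6.0  # largest magnitude in the e2m1 value grid
BLOCK = 16     # assumed NVFP4-style scaling-block size

def fp4_block_scale_ref(w: torch.Tensor):
    # View the last dim as 16-element blocks and give each its own scale.
    blocks = w.reshape(*w.shape[:-1], -1, BLOCK)
    scales = (blocks.abs().amax(dim=-1, keepdim=True) / FP4_MAX).clamp(min=1e-8)
    # Stand-in rounding: real e2m1 snaps to the nonuniform grid
    # {0, 0.5, 1, 1.5, 2, 3, 4, 6}; a uniform round is used here for brevity.
    q = (blocks / scales).round().clamp(-FP4_MAX, FP4_MAX)
    return q.reshape(w.shape), scales.squeeze(-1)
```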
- `trtllm_fp8_block_scale_moe`: FP8 block-scale MoE operation.
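Here the scale granularity is a two-dimensional weight tile rather than a 1-D group. A minimal sketch, assuming 128x128 tiles and e4m3 storage; the tile shape and the activation-side scaling the kernel pairs with it may differ.

```python
import torch

def fp8_block_scale_ref(w: torch.Tensor, block: int = 128):
    # One e4m3 scale per (block x block) weight tile; assumes both
    # dimensions of w are divisible by `block`.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0
    rows, cols = w.shape
    tiles = w.reshape(rows // block, block, cols // block, block)
    scales = (tiles.abs().amax(dim=(1, 3), keepdim=True) / fp8_max).clamp(min=1e-8)
    q = (tiles / scales).to(torch.float8_e4m3fn)
    return q.reshape(rows, cols), scales.reshape(rows // block, cols // block)
```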
- `trtllm_fp8_per_tensor_scale_moe`: FP8 per-tensor-scale MoE operation.
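Per-tensor scaling is the coarsest of the three schemes: one scale covers the whole tensor, which is cheap to store and apply but leaves more quantization error on outlier-heavy weights. A minimal sketch of the idea (names illustrative):

```python
import torch

def fp8_per_tensor_ref(t: torch.Tensor):
    # A single scale for the whole tensor, chosen so the largest
    # magnitude maps to the e4m3 maximum (448).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = (t.abs().amax() / fp8_max).clamp(min=1e-8)
    q = (t / scale).to(torch.float8_e4m3fn)
    return q, scale  # dequantize as q.float() * scale
```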