flashinfer.fused_moe¶
This module provides fused Mixture-of-Experts (MoE) operations, with implementations optimized for different backends (CUTLASS, TensorRT-LLM) and data types (FP4, FP8).
Types and Enums¶
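The enums in this section configure kernel behavior, most importantly how router logits are turned into per-token expert assignments (e.g. `RoutingMethodType`). As a point of reference, the sketch below shows in plain PyTorch what a renormalized top-k routing method computes; this is an illustration of the math, not the FlashInfer API, and the member name used in the comment is an assumption.

```python
# Plain-PyTorch sketch (not the FlashInfer API) of what a renormalized
# top-k routing method computes: each token picks its top-k experts and the
# selected router probabilities are rescaled to sum to 1. The member name
# RoutingMethodType.Renormalize is assumed here for illustration.
import torch

def topk_renormalize_routing(router_logits: torch.Tensor, top_k: int):
    """router_logits: [num_tokens, num_experts] -> (expert ids, weights)."""
    probs = torch.softmax(router_logits, dim=-1)
    topk_probs, topk_ids = torch.topk(probs, top_k, dim=-1)
    # Rescale so each token's selected expert weights sum to 1.
    topk_weights = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_ids, topk_weights

logits = torch.randn(4, 8)               # 4 tokens, 8 experts
ids, weights = topk_renormalize_routing(logits, top_k=2)
print(ids.shape, weights.sum(dim=-1))    # [4, 2]; per-token weights sum to 1
```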
Utility Functions¶
| `reorder_rows_for_gated_act_gemm` | PyTorch implementation of the trt-llm gen `reorderRowsForGatedActGemm` weight reordering. |
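`reorderRowsForGatedActGemm` prepares the stacked gate/up FC1 weight so the gated-activation GEMM can read each gate/up row pair contiguously. The sketch below is a hedged reconstruction of that kind of row interleave in plain PyTorch; the exact pairing order is an assumption, and the authoritative layout is whatever `reorder_rows_for_gated_act_gemm` itself produces.

```python
# Hedged sketch of a gate/up row interleave: given the stacked FC1 weight
# [W_gate; W_up] of shape [2*N, K], place gate row i next to up row i.
# The g0, u0, g1, u1, ... order below is an assumption for illustration;
# consult reorder_rows_for_gated_act_gemm for the actual layout.
import torch

def interleave_gate_up_rows(w_fc1: torch.Tensor) -> torch.Tensor:
    """w_fc1: [2*N, K] stacked as [W_gate; W_up] -> [2*N, K] interleaved."""
    two_n, k = w_fc1.shape
    n = two_n // 2
    gate, up = w_fc1[:n], w_fc1[n:]
    # Stack to [N, 2, K], then flatten -> rows g0, u0, g1, u1, ...
    return torch.stack([gate, up], dim=1).reshape(two_n, k)

w = torch.arange(8.0).reshape(4, 2)   # toy [2*N=4, K=2] weight
print(interleave_gate_up_rows(w))
```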
CUTLASS Fused MoE¶
| `cutlass_fused_moe` | Compute a Mixture of Experts (MoE) layer using the CUTLASS backend. |
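To pin down the semantics, here is an unfused PyTorch reference of the computation a fused MoE layer performs: route each token to its top-k experts, run a gated FFN per expert, and combine the results with the routing weights. Argument names and the SwiGLU activation are illustrative assumptions, not the `cutlass_fused_moe` signature; consult the function's documentation for its actual arguments.

```python
# Unfused PyTorch reference for the MoE computation (illustrative only).
import torch
import torch.nn.functional as F

def moe_reference(x, w13, w2, topk_ids, topk_weights):
    """x: [T, H]; w13: [E, 2*I, H]; w2: [E, H, I];
    topk_ids / topk_weights: [T, K]."""
    out = torch.zeros_like(x)
    for e in range(w13.shape[0]):
        # Tokens (and their top-k slots) routed to expert e.
        token, slot = (topk_ids == e).nonzero(as_tuple=True)
        if token.numel() == 0:
            continue
        h = x[token] @ w13[e].T               # [t, 2*I] fused gate/up proj
        gate, up = h.chunk(2, dim=-1)
        h = F.silu(gate) * up                 # SwiGLU activation
        # Weighted combine back into each token's output row.
        out.index_add_(0, token, (h @ w2[e].T) * topk_weights[token, slot, None])
    return out

T, H, I, E, K = 4, 8, 16, 4, 2
x = torch.randn(T, H)
w13, w2 = torch.randn(E, 2 * I, H), torch.randn(E, H, I)
ids = torch.randint(0, E, (T, K))
wts = torch.softmax(torch.randn(T, K), dim=-1)
print(moe_reference(x, w13, w2, ids, wts).shape)  # torch.Size([4, 8])
```

A fused kernel performs the same computation without materializing per-expert token copies, which is where the performance win comes from.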
TensorRT-LLM Fused MoE¶
| `trtllm_fp4_block_scale_moe` | FP4 block-scale MoE operation. |
| `trtllm_fp8_block_scale_moe` | FP8 block-scale MoE operation. |
| `trtllm_fp8_per_tensor_scale_moe` | FP8 per-tensor-scale MoE operation. |
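These three ops differ mainly in quantization granularity: per-tensor scaling keeps one float scale for a whole tensor, while block scaling keeps one scale per fixed-size block of elements, tracking local dynamic range more tightly. A minimal sketch in plain PyTorch, assuming FP8 E4M3 and an illustrative block size of 128 (not the FlashInfer API):

```python
# Illustrates per-tensor vs. block-scale quantization (plain PyTorch, not the
# FlashInfer API). Block size 128 is an assumption for illustration.
import torch

FP8_MAX = 448.0  # max representable magnitude of float8_e4m3fn

def quantize_per_tensor(x):
    scale = x.abs().amax() / FP8_MAX                       # one scale overall
    q = (x / scale).to(torch.float8_e4m3fn)
    return q, scale                                        # dequant: q.float() * scale

def quantize_block_scale(x, block=128):
    xb = x.reshape(x.shape[0], -1, block)                  # [R, C/block, block]
    scale = xb.abs().amax(dim=-1, keepdim=True) / FP8_MAX  # one scale per block
    q = (xb / scale).to(torch.float8_e4m3fn).reshape_as(x)
    return q, scale.squeeze(-1)

x = torch.randn(4, 256)
q_t, s_t = quantize_per_tensor(x)
q_b, s_b = quantize_block_scale(x)
print(s_t.shape, s_b.shape)  # torch.Size([]) vs. torch.Size([4, 2])
```

The finer granularity trades extra scale storage and bookkeeping for better accuracy on tensors with outliers, which is why both variants are offered.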