flashinfer.fused_moe

This module provides fused Mixture-of-Experts (MoE) operations optimized for different backends and data types.
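As a point of reference for what a fused MoE operation computes, the following is a minimal, unfused sketch in plain Python. It does not call FlashInfer and is not the library's implementation. The helper names (`reference_moe`, `expert_ffn`), the SiLU-gated expert layout, and the routing convention (top-k selection followed by a softmax over the selected logits) are assumptions chosen for illustration; the functions in this module support several routing methods and quantized weight formats that are not modelled here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def expert_ffn(x, w_gate, w_up, w_down):
    """Gated MLP for one expert: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def reference_moe(x, routing_logits, experts, top_k):
    """Route one token to its top_k experts and mix their outputs.

    Assumed routing: pick the top_k logits, then softmax over those logits only.
    """
    order = sorted(range(len(routing_logits)),
                   key=lambda e: routing_logits[e], reverse=True)
    chosen = order[:top_k]
    weights = softmax([routing_logits[e] for e in chosen])

    out = [0.0] * len(x)
    for e, w in zip(chosen, weights):
        y = expert_ffn(x, *experts[e])
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy setup: 3 experts, hidden size 2, intermediate size 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    (identity, identity, [[s, 0.0], [0.0, s]])  # (w_gate, w_up, w_down)
    for s in (1.0, 2.0, 3.0)
]
x = [1.0, 2.0]
logits = [0.1, 2.0, -1.0]
y = reference_moe(x, logits, experts, top_k=2)
```

A fused kernel performs the same routing, expert GEMMs, activation, and weighted reduction without materializing per-expert intermediates in separate passes, which is what the backend-specific entries listed on this page provide.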

Types and Enums

`RoutingMethodType`(value[, names, module, ...])
`WeightLayout`(value[, names, module, ...])

`convert_to_block_layout`(input_tensor, blockK)
`reorder_rows_for_gated_act_gemm`(x)	PyTorch implementation of trt-llm gen `reorderRowsForGatedActGemm`.

Compute a Mixture of Experts (MoE) layer using the CUTLASS backend.

`trtllm_fp4_block_scale_moe`(routing_logits, ...)	FP4 block scale MoE operation.
`trtllm_fp8_block_scale_moe`(routing_logits, ...)	FP8 block scale MoE operation.
`trtllm_fp8_per_tensor_scale_moe`(...[, ...])	FP8 per tensor scale MoE operation.