flashinfer.fused_moe

This module provides fused Mixture-of-Experts (MoE) operations optimized for different backends and data types.

Types and Enums

RoutingMethodType(value[, names, module, ...])

WeightLayout(value[, names, module, ...])
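RoutingMethodType selects how routing logits are turned into per-token expert assignments. As a hedged illustration of one common scheme (softmax followed by top-k selection with renormalization — the function name and exact normalization order here are illustrative, not the library's implementation):

```python
import numpy as np

def topk_softmax_routing(routing_logits, top_k):
    """Illustrative softmax-then-top-k routing: softmax the logits over
    experts, keep the top_k experts per token, and renormalize their
    probabilities to sum to 1."""
    # Numerically stable softmax over the expert dimension.
    z = routing_logits - routing_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Indices of the top_k experts for each token.
    expert_ids = np.argsort(-probs, axis=-1)[:, :top_k]
    expert_weights = np.take_along_axis(probs, expert_ids, axis=-1)
    # Renormalize so the selected weights sum to 1 per token.
    expert_weights /= expert_weights.sum(axis=-1, keepdims=True)
    return expert_ids, expert_weights

logits = np.array([[2.0, 0.5, 1.0, -1.0]])  # 1 token, 4 experts
ids, w = topk_softmax_routing(logits, top_k=2)
```

The ids/weights pair produced by a routing step like this is what the fused MoE kernels below consume alongside the expert weights.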

Utility Functions

convert_to_block_layout(input_tensor, blockK)
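convert_to_block_layout groups the K dimension of a row-major weight into blockK-wide blocks for the kernels that expect blocked inputs. As a hedged illustration only — `block_along_k` is a hypothetical NumPy sketch, and the real helper's output layout may differ:

```python
import numpy as np

def block_along_k(weight, blockK):
    """Hypothetical sketch of K-dimension blocking: reshape a row-major
    [N, K] weight into [N, K // blockK, blockK] so each blockK-wide
    slice of K is contiguous. The actual convert_to_block_layout may
    arrange blocks differently."""
    n, k = weight.shape
    assert k % blockK == 0, "K must be divisible by blockK"
    return weight.reshape(n, k // blockK, blockK)
```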

reorder_rows_for_gated_act_gemm(x)

PyTorch implementation of TensorRT-LLM Gen's reorderRowsForGatedActGemm.
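For a gated activation, the gate and up projections are often stored as one fused [2*N, K] weight. A hedged NumPy sketch of a row reorder for such a weight — assuming the input stacks all gate rows above all up rows, and noting that the real helper's interleaving granularity may differ:

```python
import numpy as np

def reorder_rows_for_gated_act_gemm(x):
    """Illustrative row interleave for a fused gate/up weight of shape
    [2*N, K]: assuming the first N rows are the gate projection and the
    last N rows the up projection, emit them interleaved as
    [g0, u0, g1, u1, ...] so one GEMM produces adjacent gate/up pairs."""
    n2, k = x.shape
    n = n2 // 2
    gate, up = x[:n], x[n:]
    out = np.empty_like(x)
    out[0::2] = gate  # even rows: gate projection
    out[1::2] = up    # odd rows: up projection
    return out
```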

interleave_moe_weights_for_sm90_mixed_gemm(weight)

Interleave 4-bit packed MoE weights for the SM90 mixed-input GEMM.

interleave_moe_scales_for_sm90_mixed_gemm(scales)

Interleave MXFP4 block scales for the SM90 mixed-input MoE GEMM.

CUTLASS Fused MoE

cutlass_fused_moe(input, ...[, ...])

Compute a Mixture-of-Experts (MoE) layer using the CUTLASS backend.
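The computation a fused MoE kernel performs can be written unfused: for each token, run the FFN of each selected expert and sum the outputs weighted by the routing weights. A plain NumPy reference, assuming a SiLU-gated FFN (`silu(x @ w1) * (x @ w3) @ w2`); the function name and shapes are illustrative, not the library's API:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def moe_reference(x, w1, w3, w2, expert_ids, expert_weights):
    """Unfused MoE reference. Shapes (illustrative):
      x:  [T, H]          tokens
      w1: [E, H, I]       gate projection per expert
      w3: [E, H, I]       up projection per expert
      w2: [E, I, H]       down projection per expert
      expert_ids, expert_weights: [T, K] routing outputs
    Returns [T, H]: routing-weighted sum of each token's experts."""
    out = np.zeros_like(x)
    T, K = expert_ids.shape
    for t in range(T):
        for k in range(K):
            e = expert_ids[t, k]
            h = silu(x[t] @ w1[e]) * (x[t] @ w3[e])  # gated FFN hidden
            out[t] += expert_weights[t, k] * (h @ w2[e])
    return out
```

A fused kernel computes the same result but batches per-expert GEMMs and fuses the gather/scatter, avoiding the per-token Python loop above.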

TensorRT-LLM Fused MoE

trtllm_fp4_block_scale_moe(routing_logits, ...)

FP4 block scale MoE operation.

trtllm_fp8_block_scale_moe(routing_logits, ...)

FP8 block scale MoE operation.

trtllm_fp8_per_tensor_scale_moe(...[, ...])

FP8 per tensor scale MoE operation.