flashinfer.fused_moe

This module provides fused Mixture-of-Experts (MoE) operations optimized for different backends and data types.

Types and Enums

RoutingMethodType(value[, names, module, ...])

Enumeration of the supported expert-routing methods.

WeightLayout(value[, names, module, ...])

Enumeration of the supported expert weight layouts.
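
Both are standard Python enums; a quick way to see which routing methods and weight layouts a given flashinfer build exposes is to iterate over their members (member names vary across versions, so none are hard-coded below):

    from flashinfer.fused_moe import RoutingMethodType, WeightLayout

    # List the routing methods and weight layouts available in this build;
    # the concrete member names depend on the installed flashinfer version.
    for method in RoutingMethodType:
        print(method.name, method.value)
    for layout in WeightLayout:
        print(layout.name, layout.value)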

Utility Functions

convert_to_block_layout(input_tensor, blockK)

Convert a weight tensor to a blocked layout along the K dimension, grouping blockK consecutive elements per block.

reorder_rows_for_gated_act_gemm(x)

PyTorch implementation of the TensorRT-LLM Gen reorderRowsForGatedActGemm row-reordering routine.
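
A rough sketch of how these helpers can be used when preparing FC1 (gate/up) weights; the tensor sizes, the block size of 128, and the chaining of the two calls are illustrative assumptions rather than requirements of the API:

    import torch
    from flashinfer.fused_moe import (
        convert_to_block_layout,
        reorder_rows_for_gated_act_gemm,
    )

    # Hypothetical FC1 weight with the gate and up projections stacked along
    # the row dimension (2 * intermediate_size, hidden_size); sizes are
    # illustrative only.
    intermediate_size, hidden_size = 2048, 4096
    w13 = torch.randn(2 * intermediate_size, hidden_size, dtype=torch.bfloat16)

    # Interleave the gate/up rows into the order the gated-activation GEMM expects.
    w13_interleaved = reorder_rows_for_gated_act_gemm(w13)

    # Regroup the K (hidden_size) dimension into blocks of blockK columns;
    # blockK = 128 is an illustrative choice, match it to the target kernel.
    w13_blocked = convert_to_block_layout(w13_interleaved, 128)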

CUTLASS Fused MoE

cutlass_fused_moe(input, ...[, ...])

Compute a Mixture-of-Experts (MoE) layer using the CUTLASS backend.
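
A minimal bfloat16 sketch of the call pattern follows. The tensor shapes and the positional argument order (input, selected experts, routing weights, FC1/FC2 expert weights, output dtype) are assumptions based on the signature published at the time of writing and should be checked against the API reference:

    import torch
    from flashinfer.fused_moe import cutlass_fused_moe

    num_tokens, hidden_size, intermediate_size = 8, 1024, 2048
    num_experts, top_k = 8, 2

    x = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")
    router_logits = torch.randn(num_tokens, num_experts, dtype=torch.float32, device="cuda")
    # Top-k routing: per-token expert ids and their softmax weights.
    routing_weights, selected_experts = torch.topk(
        torch.softmax(router_logits, dim=-1), top_k, dim=-1
    )

    # Per-expert weights: FC1 stacks the gate and up projections, FC2 projects back down.
    w13 = torch.randn(num_experts, 2 * intermediate_size, hidden_size,
                      dtype=torch.bfloat16, device="cuda")
    w2 = torch.randn(num_experts, hidden_size, intermediate_size,
                     dtype=torch.bfloat16, device="cuda")

    output = cutlass_fused_moe(
        x,
        selected_experts.to(torch.int),
        routing_weights.to(torch.float32),
        w13,
        w2,
        torch.bfloat16,  # output dtype
    )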

TensorRT-LLM Fused MoE

trtllm_fp4_block_scale_moe(routing_logits, ...)

FP4 block-scale MoE operation.

trtllm_fp8_block_scale_moe(routing_logits, ...)

FP8 block-scale MoE operation.
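
The block-scale variant consumes FP8 (e4m3) activations with per-block scaling factors. The sketch below only illustrates that quantization step; the full argument list of trtllm_fp8_block_scale_moe is not reproduced here, and the 128-element block granularity and scale layout are assumptions:

    import torch

    num_tokens, hidden_size = 16, 4096
    block = 128  # assumed quantization granularity along the hidden dimension

    x = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")

    # Per-block absmax scales, then cast to float8_e4m3fn (max representable value 448).
    x_blocks = x.float().view(num_tokens, hidden_size // block, block)
    hidden_states_scale = x_blocks.abs().amax(dim=-1, keepdim=True) / 448.0
    hidden_states = (x_blocks / hidden_states_scale).to(torch.float8_e4m3fn)
    hidden_states = hidden_states.view(num_tokens, hidden_size)
    hidden_states_scale = hidden_states_scale.squeeze(-1)  # [num_tokens, hidden_size // block]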

trtllm_fp8_per_tensor_scale_moe(...[, ...])

FP8 per-tensor-scale MoE operation.
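
For the per-tensor-scale variant, each quantized tensor carries a single scalar scale. A minimal sketch of that quantization step, making no claim about the exact argument list of trtllm_fp8_per_tensor_scale_moe:

    import torch

    def quantize_fp8_per_tensor(t: torch.Tensor):
        # One absmax-derived scale for the whole tensor; 448 is the e4m3 maximum.
        scale = t.abs().amax().float() / 448.0
        return (t.float() / scale).to(torch.float8_e4m3fn), scale

    hidden_states = torch.randn(16, 4096, dtype=torch.bfloat16, device="cuda")
    hidden_states_fp8, hidden_states_scale = quantize_fp8_per_tensor(hidden_states)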