flashinfer.fp4_quantization.shuffle_matrix_a

flashinfer.fp4_quantization.shuffle_matrix_a(input_tensor: torch.Tensor, epilogue_tile_m: int) → torch.Tensor

PyTorch equivalent of the trtllm-gen shuffleMatrixA routine.
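
The description does not spell out the layout, so the sketch below only illustrates the general pattern such a shuffle might follow, assuming it amounts to reordering the rows of a 2-D tensor by a precomputed index. The helper toy_row_shuffle and its interleaving permutation are hypothetical and are not the trtllm-gen layout; the example does not call flashinfer.

```python
import torch


def toy_row_shuffle(matrix: torch.Tensor, tile_m: int) -> torch.Tensor:
    """Hypothetical stand-in for a tile-wise row shuffle.

    Interleaves the two halves of every block of `tile_m` rows.
    This is NOT the permutation used by trtllm-gen; it only shows
    the gather-by-row-index pattern.
    """
    rows = matrix.shape[0]
    assert rows % tile_m == 0 and tile_m % 2 == 0
    # One row of indices per tile, e.g. [[0, 1, 2, 3], [4, 5, 6, 7]].
    base = torch.arange(rows).reshape(-1, tile_m)
    half = tile_m // 2
    # Pair each index in the first half with its partner in the second half.
    perm = torch.stack((base[:, :half], base[:, half:]), dim=2).reshape(-1)
    # Gather rows in the permuted order; shape and dtype are unchanged.
    return matrix[perm]


weights = torch.arange(8 * 4, dtype=torch.float32).reshape(8, 4)
shuffled = toy_row_shuffle(weights, tile_m=4)
```

With the real function, the call would presumably take the same shape: a weight tensor plus the epilogue_tile_m value expected by the target kernel, returning a tensor of the same shape with its rows reordered.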