AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Experts
- URL: http://arxiv.org/abs/2511.18314v1
- Date: Sun, 23 Nov 2025 06:53:43 GMT
- Title: AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Experts
- Authors: Yuting Gao, Wang Lan, Hengyuan Zhao, Linjiang Huang, Si Liu, Qingpei Guo,
- Abstract summary: We propose AnyExperts, a novel on-demand, budget-aware dynamic routing framework. It allocates a variable total number of expert slots per token based on its semantic importance. It is evaluated across diverse tasks in visual understanding, audio understanding, and NLP understanding.
- Score: 26.761443359046286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies (typically activating a fixed number of experts per token), ignoring the inherent heterogeneity in semantic importance across modalities. This leads to suboptimal compute allocation, where redundant tokens consume as many resources as critical ones. To address this, we propose AnyExperts, a novel on-demand, budget-aware dynamic routing framework that allocates a variable total number of expert slots per token based on its semantic importance. Crucially, to prevent uncontrolled compute growth, the total slots per token are constrained within a fixed range, and each slot is filled by either a real expert or a virtual expert, with the virtual share capped at a small maximum (e.g., 20%). The model then adaptively balances the real-to-virtual ratio per token, assigning more real experts to semantically rich regions and relying more on virtual experts for redundant content. Evaluated across diverse tasks in visual understanding, audio understanding, and NLP understanding, AnyExperts improves performance under the same compute budget. Notably, on general image/video tasks, it achieves comparable accuracy with 40% fewer real expert activations; on text-dense tasks (OCR and NLP), it maintains performance while reducing real expert usage by 10%. These results demonstrate that fine-grained, importance-driven expert allocation significantly enhances both the efficiency and effectiveness of multimodal MoE models.
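The abstract describes a concrete routing mechanism: a fixed per-token slot budget, an importance-driven split of those slots into real experts versus cheap virtual (no-op) experts, and a cap on the virtual share. The sketch below illustrates one way such budget-aware routing could be written in PyTorch; it is a minimal illustration only, and the class name, the linear importance scorer, the rounding rule, and all hyperparameters are assumptions rather than the paper's actual implementation.

```python
# Hypothetical sketch of budget-aware dynamic routing with real and virtual
# expert slots, loosely following the AnyExperts abstract. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OnDemandRouter(nn.Module):
    def __init__(self, dim, num_experts, min_slots=2, max_slots=8, max_virtual_frac=0.2):
        super().__init__()
        assert num_experts >= max_slots, "need at least max_slots experts to over-select from"
        self.gate = nn.Linear(dim, num_experts)   # scores real experts per token
        self.importance = nn.Linear(dim, 1)       # per-token semantic importance (assumed scorer)
        self.min_slots, self.max_slots = min_slots, max_slots
        self.max_virtual_frac = max_virtual_frac  # e.g. at most 20% virtual slots

    def forward(self, x):                          # x: (tokens, dim)
        # 1) Importance in [0, 1] decides how many of the budgeted slots are
        #    real experts vs. virtual (no-compute) experts for each token.
        imp = torch.sigmoid(self.importance(x)).squeeze(-1)        # (tokens,)
        total = self.max_slots                                      # fixed slot budget per token
        max_virtual = int(round(total * self.max_virtual_frac))
        virtual = ((1.0 - imp) * max_virtual).round().long()        # redundant tokens -> more virtual
        real = (total - virtual).clamp(min=self.min_slots)          # rich tokens -> more real experts

        # 2) Over-select top-`total` experts, then keep only the first `real`
        #    slots; the remaining (virtual) slots contribute no expert compute.
        probs = F.softmax(self.gate(x), dim=-1)                     # (tokens, num_experts)
        topk_val, topk_idx = probs.topk(total, dim=-1)
        slot_rank = torch.arange(total, device=x.device).expand_as(topk_val)
        mask = slot_rank < real.unsqueeze(-1)
        weights = torch.where(mask, topk_val, torch.zeros_like(topk_val))
        weights = weights / weights.sum(-1, keepdim=True).clamp_min(1e-9)
        return topk_idx, weights, real              # dispatch to real experts downstream

# Example: route 16 tokens of width 64 across 16 experts (illustrative values).
router = OnDemandRouter(dim=64, num_experts=16)
idx, w, n_real = router(torch.randn(16, 64))
```

In this reading, a "virtual" slot simply forgoes an expert call, so the per-token cost scales with `n_real` while the total slot count stays within the fixed budget.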
Related papers
- TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings [26.532942920392376]
TSEmbed is a universal multimodal embedding framework that synergizes Mixture-of-Experts (MoE) with Low-Rank Adaptation (LoRA). We introduce Expert-Aware Negative Sampling (EANS), a novel strategy that leverages expert routing distributions as an intrinsic proxy for semantic similarity.
arXiv Detail & Related papers (2026-03-05T03:43:52Z) - SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity [47.79376327982703]
Industrial recommender systems rely on multi-task learning to estimate diverse user feedback signals and aggregate them for ranking. Recent advances in model scaling have shown promising gains in recommendation. This mismatch between uniform parameter scaling and heterogeneous task capacity demands poses a fundamental challenge for scalable multi-task recommendation.
arXiv Detail & Related papers (2026-02-10T03:56:12Z) - Token-Level LLM Collaboration via FusionRoute [60.72307345997823]
FusionRoute is a token-level multi-LLM collaboration framework. It selects the most suitable expert at each decoding step and contributes a complementary logit that refines or corrects the selected expert's next-token distribution. It outperforms sequence- and token-level collaboration, model merging, and direct fine-tuning.
arXiv Detail & Related papers (2026-01-08T16:53:16Z) - How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts [30.125087273625123]
We propose a semantic-aware MoE framework for adaptive expert expansion and dynamic routing. MASS converges to an optimal cost-performance trade-off with notably improved semantic specialization.
arXiv Detail & Related papers (2025-12-21T05:37:42Z) - Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by decomposing SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling.
arXiv Detail & Related papers (2025-11-07T22:19:34Z) - GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
arXiv Detail & Related papers (2025-11-05T07:28:51Z) - One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z) - SPANER: Shared Prompt Aligner for Multimodal Semantic Representation [0.0]
Shared Prompt AligNER (SPANER) is a modality-agnostic PEFT framework designed to embed inputs from diverse modalities into a unified semantic space. SPANER employs a shared prompt mechanism that acts as a conceptual anchor, enabling semantically related instances to converge spatially regardless of modality. Our results highlight the importance of aligning embedding structures, rather than merely tuning adapter weights, for scalable multimodal learning.
arXiv Detail & Related papers (2025-08-18T22:20:42Z) - MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization [0.0]
MoE-MLoRA is a mixture-of-experts framework where each expert is first trained independently to specialize in its domain. We evaluate MoE-MLoRA across eight CTR models on Movielens and Taobao.
arXiv Detail & Related papers (2025-06-09T09:03:05Z) - Mixture Compressor for Mixture-of-Experts LLMs Gains More [71.0473038084673]
We propose a training-free Mixture-Compressor for Mixture-of-Experts large language models (MoE-LLMs). Our MC integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with less accuracy loss. For instance, at 2.54 bits, MC compresses 76.6% of the model, with only a 3.8% average accuracy loss.
arXiv Detail & Related papers (2024-10-08T18:09:38Z) - Mixture of Nested Experts: Adaptive Processing of Visual Tokens [49.43920770789789]
Vision Transformer (ViT) based models fail to capitalize on inherent redundancy, leading to higher computational costs.
We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve.
We validate our approach on standard image and video datasets - ImageNet-21K, Kinetics400, and Something-Something-v2.
arXiv Detail & Related papers (2024-07-29T13:19:31Z) - T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. We design a novel framework, Mixture-of-Rank-One-Experts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal...
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)