SpectR: Dynamically Composing LM Experts with Spectral Routing
- URL: http://arxiv.org/abs/2504.03454v1
- Date: Fri, 04 Apr 2025 13:58:44 GMT
- Title: SpectR: Dynamically Composing LM Experts with Spectral Routing
- Authors: William Fleshman, Benjamin Van Durme,
- Abstract summary: This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference.<n>We show that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
- Score: 37.969478059005574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pretrained models for specific tasks or domains, offers a promising alternative. Leveraging the potential of these existing expert models in real-world applications requires effective methods to select or merge the models best suited for a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
Related papers
- ERNIE 5.0 Technical Report [244.36480708815316]
ERNIE 5.0 is a unified autoregressive foundation model for unified multimodal understanding and generation across text, image, video, and audio.<n>To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm.<n>We show that ERNIE 5.0 achieves strong and balanced performance across multiple modalities.
arXiv Detail & Related papers (2026-02-04T16:18:15Z) - SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model [49.65930977591188]
Multimodal embedding models aim to yield informative unified representations that empower diverse cross-modal tasks.<n>We introduce SAIL-Embedding, an omni-modal embedding foundation model that addresses these issues through tailored training strategies and architectural design.<n>Specifically, the content-aware progressive training aims to enhance the model's adaptability to diverse downstream tasks and master enriched cross-modal proficiency.<n>The collaboration-aware recommendation enhancement training further adapts multimodal representations for recommendation scenarios by distilling knowledge from sequence-to-item and ID-to-item embeddings.
arXiv Detail & Related papers (2025-10-14T16:43:22Z) - Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization [60.309915093470416]
Matryoshka MoE (M-MoE) is a training framework that instills a coarse-to-fine structure directly into the expert ensemble.<n>Our work paves the way for more practical and adaptable deployments of large-scale MoE models.
arXiv Detail & Related papers (2025-09-30T16:56:44Z) - Towards Agentic AI for Multimodal-Guided Video Object Segmentation [14.877182670778284]
Referring-based Video Object is a multimodal problem that requires producing fine-grained segmentation results guided by external cues.<n>Recent advances in vision-language foundation models open a promising direction toward training-free approaches.<n>We propose Multi-Modal Agent, a novel agentic system designed to solve this task in a more flexible and adaptive manner.
arXiv Detail & Related papers (2025-08-14T12:11:15Z) - UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce bfUnistage, a unified Transformer-based framework fortemporal modeling.<n>Our work demonstrates that a task-specific vision-text can build a generalizable model fortemporal learning.<n>We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z) - Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging [36.0133566024214]
Upcycling Instruction Tuning (UpIT) is a data-efficient approach for tuning a dense pre-trained model into a MoE instruction model.
To ensure each specialized expert in the MoE model works as expected, we select a small amount of seed data that each expert excels to pre-optimize the router.
arXiv Detail & Related papers (2024-10-02T14:48:22Z) - Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z) - Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
arXiv Detail & Related papers (2024-03-12T13:41:15Z) - Learning to Route Among Specialized Experts for Zero-Shot Generalization [39.56470042680907]
We propose Post-Hoc Adaptive Tokenwise Gating Over an Ocean of Specialized Experts (PHATGOOSE)
It learns to route among specialized modules that were produced through parameter-efficient fine-tuning.
It does not require simultaneous access to the datasets used to create the specialized models and only requires a modest amount of additional compute after each expert model is trained.
arXiv Detail & Related papers (2024-02-08T17:43:22Z) - Self-Supervised Reinforcement Learning that Transfers using Random
Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z) - Model-Based Deep Learning: On the Intersection of Deep Learning and
Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration
and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.