Glider: Global and Local Instruction-Driven Expert Router
- URL: http://arxiv.org/abs/2410.07172v1
- Date: Wed, 9 Oct 2024 17:59:14 GMT
- Title: Glider: Global and Local Instruction-Driven Expert Router
- Authors: Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen
- Abstract summary: Existing "Model MoErging" methods prioritize generalization to unseen tasks at the expense of performance on held-in tasks.
We propose the Global and Local Instruction-Driven Expert Router (GLIDER), which integrates a multi-scale routing mechanism.
GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
- Score: 83.785832410832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to particular domains. This has enabled the creation of powerful and adaptive routing-based "Model MoErging" methods with the goal of using expert modules to create an aggregate system with improved performance or generalization. However, existing MoErging methods often prioritize generalization to unseen tasks at the expense of performance on held-in tasks, which limits their practical applicability in real-world deployment scenarios. We observe that current token-level routing mechanisms neglect the global semantic context of the input task. This token-wise independence hinders effective expert selection for held-in tasks, as routing decisions fail to incorporate the semantic properties of the task. To address this, we propose the Global and Local Instruction-Driven Expert Router (GLIDER), which integrates a multi-scale routing mechanism encompassing a semantic global router and a learned local router. The global router leverages an LLM's advanced reasoning capabilities over semantically related contexts to enhance expert selection. Given the input query, the LLM generates semantic task instructions that guide the retrieval of the most relevant experts across all layers. This global guidance is complemented by a local router that facilitates token-level routing decisions within each module, enabling finer control and enhanced performance on unseen tasks. Our experiments using T5-based models on T0 and FLAN tasks demonstrate that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks. We also perform ablation experiments to examine the components of GLIDER. Our experiments highlight the importance of our multi-scale routing, which leverages LLM-driven semantic reasoning, for MoErging methods.
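To make the multi-scale routing concrete, the following is a minimal sketch of how a global, instruction-driven expert score might be combined with a learned token-level gate. It assumes the global router scores experts by cosine similarity between an LLM-generated task-instruction embedding and per-expert description embeddings, and that global and local signals are mixed additively before top-k selection; the class names, the mixing rule, and the `set_global_scores` helper are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of GLIDER-style multi-scale routing (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleRouter(nn.Module):
    """Combines a query-level (global) expert score with a token-level (local) gate."""

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Local router: standard learned token-level gating over experts.
        self.local_gate = nn.Linear(d_model, num_experts)
        # Global scores are set once per input query from the LLM-generated instruction.
        self.register_buffer("global_scores", torch.zeros(num_experts))

    def set_global_scores(self, instruction_emb: torch.Tensor, expert_desc_embs: torch.Tensor) -> None:
        # instruction_emb: (d,); expert_desc_embs: (num_experts, d)
        # Semantic retrieval: similarity between the task instruction and expert descriptions.
        self.global_scores = F.cosine_similarity(
            instruction_emb.unsqueeze(0), expert_desc_embs, dim=-1
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -> per-token mixture weights (batch, seq, num_experts)
        local_logits = self.local_gate(hidden)
        # Additive mixing of global (task-level) and local (token-level) evidence
        # is an assumption of this sketch, not the paper's exact rule.
        combined = local_logits + self.global_scores
        top_vals, top_idx = combined.topk(self.top_k, dim=-1)
        weights = torch.zeros_like(combined).scatter_(-1, top_idx, top_vals.softmax(dim=-1))
        return weights
```

In use, the global scores would be computed once per query (from the LLM instruction and the stored expert descriptions) and then reused by every layer's local router during generation.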
Related papers
- AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach [0.6906005491572401]
This paper introduces the Adaptive Task-planning Mixture of Experts (AT-MoE) architecture.
We first train task-specific experts via the LoRA approach to enhance problem-solving capabilities and interpretability in specialized areas.
We then introduce a layer-wise adaptive grouped routing module that optimizes module fusion based on complex task instructions (a minimal LoRA-fusion sketch follows this entry).
arXiv Detail & Related papers (2024-10-12T13:03:15Z)
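As a rough illustration of the idea summarized above, the sketch below attaches task-specific LoRA experts to a frozen linear projection and fuses them with a layer-wise router conditioned on a task-instruction embedding; all class and parameter names here are assumptions for illustration, not the AT-MoE code.

```python
# Illustrative LoRA-expert fusion with a layer-wise router (assumed names, not AT-MoE's code).
import torch
import torch.nn as nn


class LoRAExpert(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)  # standard LoRA init: the expert starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class LoRAMoELayer(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int, d_task: int):
        super().__init__()
        self.base = base  # frozen pre-trained projection
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features) for _ in range(num_experts)
        )
        # Layer-wise router: this layer's fusion weights come from the task-instruction embedding.
        self.router = nn.Linear(d_task, num_experts)

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(task_emb), dim=-1)  # (num_experts,)
        delta = sum(w * expert(x) for w, expert in zip(gate, self.experts))
        return self.base(x) + delta
```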
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration [10.981422497762837]
Large language models (LLMs) are rapidly emerging in Artificial Intelligence (AI) applications.
This paper presents semantic routing to achieve enhanced performance in intent-based management and orchestration of 5G core networks.
arXiv Detail & Related papers (2024-04-24T13:34:20Z)
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models [7.966452497550907]
We propose the Mixture-of-LoRAs (MoA) architecture for multi-task learning with large language models (LLMs).
Multiple domain-specific LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE).
Each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation.
arXiv Detail & Related papers (2024-03-06T03:33:48Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the capabilities of a tool-using agent into a planner, a caller, and a summarizer (sketched after this entry).
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
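The planner/caller/summarizer decomposition can be pictured with a small driver loop like the one below; the `run_agent` function, tool-call format, and prompts are illustrative assumptions, not the paper's framework, and each role could be served by a separate (smaller) LLM.

```python
# Hedged sketch of a planner/caller/summarizer multi-LLM agent loop (assumed API).
from typing import Callable, Dict

LLM = Callable[[str], str]  # any text-completion function


def run_agent(task: str, tools: Dict[str, Callable[[str], str]],
              planner: LLM, caller: LLM, summarizer: LLM, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        plan = planner(f"{history}\nDecide the next step, or say DONE.")
        if "DONE" in plan:
            break
        # The caller grounds the plan into a concrete tool invocation, e.g. "search: glider moe".
        call = caller(f"{history}\nPlan: {plan}\nAnswer as 'tool: argument'.")
        tool_name, _, arg = call.partition(":")
        result = tools.get(tool_name.strip(), lambda a: "unknown tool")(arg.strip())
        history += f"\nPlan: {plan}\nCall: {call}\nResult: {result}"
    # The summarizer writes the user-facing answer from the full interaction trace.
    return summarizer(f"{history}\nWrite the final answer for the user.")
```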
- Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing [26.44273671379482]
Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy.
This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task (a minimal gating sketch follows this entry).
In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones.
arXiv Detail & Related papers (2023-12-22T06:51:30Z)
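The sketch below illustrates the skipping idea in a minimal form: a task-conditioned gate decides, module by module, how much of each intermediate block to apply, so different tasks effectively use different depths. The soft residual gate and the module names are assumptions of this sketch, not the D2R implementation.

```python
# Hedged sketch of task-conditioned depth routing (illustrative, not the D2R code).
import torch
import torch.nn as nn


class DepthRoutedPolicy(nn.Module):
    def __init__(self, d: int, num_modules: int, num_tasks: int, num_actions: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(num_modules)
        )
        # One keep/skip logit per intermediate module, conditioned on the task id.
        self.route = nn.Embedding(num_tasks, num_modules)
        self.head = nn.Linear(d, num_actions)

    def forward(self, obs: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # obs: (batch, d); task_id: (batch,) long tensor of task indices
        keep_prob = torch.sigmoid(self.route(task_id))  # (batch, num_modules)
        h = obs
        for i, block in enumerate(self.blocks):
            gate = keep_prob[:, i:i + 1]        # soft gate; can be thresholded to a hard skip
            h = gate * block(h) + (1.0 - gate) * h
        return self.head(h)  # action logits for the policy
```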
- Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning [68.94230363140771]
Mixture of Cluster-conditional LoRA Experts (MoCLE) is a novel Mixture of Experts architecture designed to activate task-customized model parameters based on instruction clusters.
Experiments on InstructBLIP and LLaVA demonstrate the effectiveness of MoCLE.
arXiv Detail & Related papers (2023-12-19T18:11:19Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way (a minimal pipeline sketch follows this entry).
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
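The three-step paradigm maps naturally onto a small augmentation pipeline like the one below; `prepare_knowledge`, `select_knowledge`, and `verbalize` are hypothetical callables standing in for the domain knowledge extractor's steps, not the DOKE API.

```python
# Hedged sketch of the three-step knowledge-augmentation paradigm (assumed function names).
from typing import Callable, List


def doke_augment(sample: str,
                 prepare_knowledge: Callable[[], List[str]],               # step 1: task-level knowledge
                 select_knowledge: Callable[[str, List[str]], List[str]],  # step 2: per-sample selection
                 verbalize: Callable[[List[str]], str],                    # step 3: LLM-readable expression
                 llm: Callable[[str], str]) -> str:
    pool = prepare_knowledge()                 # prepare effective knowledge for the task
    relevant = select_knowledge(sample, pool)  # select knowledge for this specific sample
    prompt = f"{verbalize(relevant)}\n\nQuery: {sample}"
    return llm(prompt)                         # the frozen LLM consumes the augmented prompt
```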
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly decomposes one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.