Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
- URL: http://arxiv.org/abs/2508.19594v2
- Date: Tue, 16 Sep 2025 08:17:06 GMT
- Title: Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
- Authors: Jun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng
- Abstract summary: This work investigates whether certain experts exhibit specialization in context utilization. We introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts.
- Score: 33.32130125757288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context faithfulness is essential for reliable reasoning in context-dependent scenarios. However, large language models often struggle to ground their outputs in the provided context, resulting in irrelevant responses. Inspired by the emergent expert specialization observed in mixture-of-experts architectures, this work investigates whether certain experts exhibit specialization in context utilization, offering a potential pathway toward targeted optimization for improved context faithfulness. To explore this, we propose Router Lens, a method that accurately identifies context-faithful experts. Our analysis reveals that these experts progressively amplify attention to relevant contextual information, thereby enhancing context grounding. Building on this insight, we introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts. Experiments across a wide range of benchmarks and models demonstrate that CEFT matches or surpasses the performance of full fine-tuning while being significantly more efficient.
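The abstract describes the CEFT recipe only at a high level, so the following is a minimal sketch of the general idea: freeze every parameter of a mixture-of-experts model except a small set of selected experts, then fine-tune only those. The `ToyMoELayer` class, the `freeze_all_but_experts` helper, and the expert indices are hypothetical placeholders; this is not the authors' code and does not include the Router Lens identification step.

```python
# Minimal sketch (not the authors' code): selectively fine-tune a few
# "context-faithful" experts in a toy MoE layer while freezing the rest.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """A toy top-1 softmax-gated mixture-of-experts feed-forward layer."""

    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); route every token to its top-1 expert.
        probs = torch.softmax(self.gate(x), dim=-1)   # (batch, n_experts)
        top1 = probs.argmax(dim=-1)                   # (batch,)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = expert(x[mask]) * probs[mask, e].unsqueeze(-1)
        return out


def freeze_all_but_experts(model: ToyMoELayer, faithful: set[int]) -> None:
    """CEFT-style freezing: only the selected experts stay trainable."""
    for p in model.parameters():
        p.requires_grad = False
    for e in faithful:
        for p in model.experts[e].parameters():
            p.requires_grad = True


if __name__ == "__main__":
    layer = ToyMoELayer()
    # Indices that a Router Lens-style analysis might flag (made up here).
    freeze_all_but_experts(layer, faithful={2, 5})
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable}/{total} parameters")
```

Only the selected experts appear in the optimizer's trainable set, which is what makes this style of tuning substantially cheaper than full fine-tuning.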
Related papers
- The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models [18.428606280260187]
Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. We introduce COMMITTEEAUDIT, a framework that analyzes routing behavior at the level of expert groups rather than individual experts. We find that Standing Committees consistently capture the majority of routing mass across domains, layers, and routing budgets. (See the routing-mass sketch after this entry.)
arXiv Detail & Related papers (2026-01-06T21:29:45Z)
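COMMITTEEAUDIT itself is not detailed in this summary; the sketch below only illustrates the generic notion of measuring how much routing mass a group of experts captures versus individual experts. The grouping, the random gate probabilities, and the function name are hypothetical.

```python
# Illustrative sketch (not COMMITTEEAUDIT): aggregate softmax routing mass
# over expert groups instead of individual experts.
import torch


def group_routing_mass(gate_probs: torch.Tensor,
                       groups: dict[str, list[int]]) -> dict[str, float]:
    """gate_probs: (n_tokens, n_experts) softmax routing probabilities.
    Returns the share of total routing mass captured by each expert group."""
    per_expert = gate_probs.sum(dim=0)          # total mass per expert
    total = per_expert.sum()
    return {name: (per_expert[idx].sum() / total).item()
            for name, idx in groups.items()}


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(1000, 8)               # fake router logits
    probs = torch.softmax(logits, dim=-1)
    # Hypothetical grouping: a small "standing committee" vs. the rest.
    groups = {"committee": [0, 1, 2], "others": [3, 4, 5, 6, 7]}
    print(group_routing_mass(probs, groups))
```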
- How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts [30.125087273625123]
We propose a semantic-aware MoE framework for adaptive expert expansion and dynamic routing. MASS converges to an optimal balance between the cost-performance trade-off and notably improved semantic specialization.
arXiv Detail & Related papers (2025-12-21T05:37:42Z)
- Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better understanding of complicated context. Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z)
- Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation [57.97548022208733]
We show that seemingly superficial choices in key-value extraction can induce shifts in accuracy and stability. We introduce Contextual Normalization, a strategy that adaptively standardizes context representations before generation. (See the illustrative sketch after this entry.)
arXiv Detail & Related papers (2025-10-15T06:28:25Z)
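The paper's Contextual Normalization strategy is only named here, not specified. As an illustration of what "standardizing context representations before generation" could mean in the simplest case, the toy below z-normalizes retrieved-passage embeddings so each passage contributes on a comparable scale; the function name and the data are invented, and this is not the paper's method.

```python
# Toy illustration (not the paper's method): z-normalize context
# representations before they are fed to the generator.
import torch


def normalize_context(reprs: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """reprs: (n_passages, d) context embeddings -> standardized copies."""
    mean = reprs.mean(dim=-1, keepdim=True)
    std = reprs.std(dim=-1, keepdim=True)
    return (reprs - mean) / (std + eps)


if __name__ == "__main__":
    scales = torch.tensor([0.1, 1.0, 10.0, 3.0, 0.5]).unsqueeze(-1)
    passages = torch.randn(5, 16) * scales      # wildly different magnitudes
    normed = normalize_context(passages)
    print(normed.std(dim=-1))                   # roughly 1 for every passage
```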
- LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection [7.094483187879095]
LEAF is a novel and robust expert-based framework for Continual Event Detection. It incorporates a specialized mixture-of-experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference. (See the LoRA sketch after this entry.)
arXiv Detail & Related papers (2025-09-29T10:00:25Z)
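LEAF's experts are described as LoRA-parameterized. The sketch below shows the standard low-rank adaptation construction, a frozen base linear layer plus a trainable low-rank update, not LEAF's actual expert module or its semantic-aware router; the class name and hyperparameters are placeholders.

```python
# Generic LoRA sketch (standard construction, not LEAF's code):
# a frozen base linear layer plus a trainable low-rank update A @ B.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, d_out))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale


if __name__ == "__main__":
    expert = LoRALinear(32, 32)
    trainable = [n for n, p in expert.named_parameters() if p.requires_grad]
    print(trainable)                              # only the LoRA factors
    print(expert(torch.randn(4, 32)).shape)       # torch.Size([4, 32])
```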
- Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning [9.894106590443714]
We propose MoKGR, a mixture-of-experts framework that personalizes path exploration. MoKGR demonstrates superior performance in both transductive and inductive settings.
arXiv Detail & Related papers (2025-07-28T03:30:28Z)
- Cooperation of Experts: Fusing Heterogeneous Information with Large Margin [11.522412489437702]
The Cooperation of Experts (CoE) framework encodes multi-typed information into unified heterogeneous multiplex networks. In our framework, dedicated encoders act as domain-specific experts, each specializing in learning distinct relational patterns in specific semantic spaces.
arXiv Detail & Related papers (2025-05-27T08:04:32Z)
- Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models [5.211806751260724]
We propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts. We also introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts.
arXiv Detail & Related papers (2025-04-16T04:06:15Z)
- Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework for advancing the efficiency and scalability of machine learning models. Central to the success of MoE is an adaptive softmax gating mechanism, which determines the relevance of each expert to a given input and dynamically assigns experts their respective weights. We perform a convergence analysis of parameter estimation and expert estimation under MoE equipped with standard softmax gating or its variants, including dense-to-sparse gating and hierarchical softmax gating. (See the gating sketch after this entry.)
arXiv Detail & Related papers (2025-03-05T06:11:24Z)
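The summary refers to the standard softmax gate that weights experts per input. The sketch below shows that textbook mechanism with an optional top-k sparsification step (a dense-to-sparse style gate); it is not specific to the cited analysis, and the class name and dimensions are placeholders.

```python
# Textbook softmax gating sketch (not specific to the cited paper):
# compute per-expert weights for each input, optionally keeping only the
# top-k experts and renormalizing.
import torch
import torch.nn as nn


class SoftmaxGate(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int | None = None):
        super().__init__()
        self.proj = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.proj(x), dim=-1)    # (batch, n_experts)
        if self.top_k is not None:
            topv, topi = weights.topk(self.top_k, dim=-1)
            sparse = torch.zeros_like(weights).scatter(-1, topi, topv)
            weights = sparse / sparse.sum(dim=-1, keepdim=True)
        return weights


if __name__ == "__main__":
    gate = SoftmaxGate(d_model=16, n_experts=8, top_k=2)
    w = gate(torch.randn(4, 16))
    print(w)    # two non-zero weights per row, summing to 1
```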
- Context-DPO: Aligning Language Models for Context-Faithfulness [80.62221491884353]
We propose the first alignment method specifically designed to enhance large language models' context-faithfulness. By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, Context-DPO aligns LLMs through direct preference optimization. Extensive experiments demonstrate that Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models. (See the DPO loss sketch after this entry.)
arXiv Detail & Related papers (2024-12-18T04:08:18Z)
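Context-DPO builds on direct preference optimization. The function below is only the standard DPO objective computed from sequence log-probabilities of a preferred ("faithful") and a dispreferred ("stubborn") response relative to a frozen reference model; it is not the authors' training code, and the numbers in the example are fabricated just to show the call shape.

```python
# Standard DPO loss sketch (not the Context-DPO implementation).
import torch
import torch.nn.functional as F


def dpo_loss(policy_logp_w: torch.Tensor,   # log pi(y_faithful | x)
             policy_logp_l: torch.Tensor,   # log pi(y_stubborn | x)
             ref_logp_w: torch.Tensor,      # log pi_ref(y_faithful | x)
             ref_logp_l: torch.Tensor,      # log pi_ref(y_stubborn | x)
             beta: float = 0.1) -> torch.Tensor:
    # Reward margin of the preferred response over the dispreferred one,
    # measured relative to the reference model.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()


if __name__ == "__main__":
    lp_w, lp_l = torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.0])
    ref_w, ref_l = torch.tensor([-12.5, -10.0]), torch.tensor([-10.5, -9.8])
    print(dpo_loss(lp_w, lp_l, ref_w, ref_l).item())
```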
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z)
- PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization [60.00631098364391]
PromptAgent is an optimization method that crafts expert-level prompts equivalent in quality to those handcrafted by experts.
Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions.
We apply PromptAgent to 12 tasks spanning three practical domains.
arXiv Detail & Related papers (2023-10-25T07:47:01Z)
- Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings [41.98633628526484]
Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can allocate domain-specific expertise.
arXiv Detail & Related papers (2023-06-14T15:47:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.