Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
- URL: http://arxiv.org/abs/2508.19594v2
- Date: Tue, 16 Sep 2025 08:17:06 GMT
- Title: Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
- Authors: Jun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng
- Abstract summary: This work investigates whether certain experts exhibit specialization in context utilization. We introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts.
- Score: 33.32130125757288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context faithfulness is essential for reliable reasoning in context-dependent scenarios. However, large language models often struggle to ground their outputs in the provided context, resulting in irrelevant responses. Inspired by the emergent expert specialization observed in mixture-of-experts architectures, this work investigates whether certain experts exhibit specialization in context utilization, offering a potential pathway toward targeted optimization for improved context faithfulness. To explore this, we propose Router Lens, a method that accurately identifies context-faithful experts. Our analysis reveals that these experts progressively amplify attention to relevant contextual information, thereby enhancing context grounding. Building on this insight, we introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts. Experiments across a wide range of benchmarks and models demonstrate that CEFT matches or surpasses the performance of full fine-tuning while being significantly more efficient.
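The abstract describes the CEFT recipe only at a high level, so the following is a minimal sketch of the general idea: freeze every parameter of a mixture-of-experts model except a small set of selected experts, then fine-tune only those. The `ToyMoELayer` class, the `freeze_all_but_experts` helper, and the expert indices are hypothetical placeholders; this is not the authors' code and does not include the Router Lens identification step.

```python
# Minimal sketch (not the authors' code): selectively fine-tune a few
# "context-faithful" experts in a toy MoE layer while freezing the rest.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """A toy top-1 softmax-gated mixture-of-experts feed-forward layer."""

    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); route every token to its top-1 expert.
        probs = torch.softmax(self.gate(x), dim=-1)   # (batch, n_experts)
        top1 = probs.argmax(dim=-1)                   # (batch,)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = expert(x[mask]) * probs[mask, e].unsqueeze(-1)
        return out


def freeze_all_but_experts(model: ToyMoELayer, faithful: set[int]) -> None:
    """CEFT-style freezing: only the selected experts stay trainable."""
    for p in model.parameters():
        p.requires_grad = False
    for e in faithful:
        for p in model.experts[e].parameters():
            p.requires_grad = True


if __name__ == "__main__":
    layer = ToyMoELayer()
    # Indices that a Router Lens-style analysis might flag (made up here).
    freeze_all_but_experts(layer, faithful={2, 5})
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable}/{total} parameters")
```

Only the selected experts appear in the optimizer's trainable set, which is what makes this style of tuning substantially cheaper than full fine-tuning.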
Related papers
- The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models [18.428606280260187]
Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. We introduce COMMITTEEAUDIT, a framework that analyzes routing behavior at the level of expert groups rather than individual experts. We find that Standing Committees consistently capture the majority of routing mass across domains, layers, and routing budgets. (See the routing-mass sketch after this entry.)
arXiv Detail & Related papers (2026-01-06T21:29:45Z)
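COMMITTEEAUDIT itself is not detailed in this summary; the sketch below only illustrates the generic notion of measuring how much routing mass a group of experts captures versus individual experts. The grouping, the random gate probabilities, and the function name are hypothetical.

```python
# Illustrative sketch (not COMMITTEEAUDIT): aggregate softmax routing mass
# over expert groups instead of individual experts.
import torch


def group_routing_mass(gate_probs: torch.Tensor,
                       groups: dict[str, list[int]]) -> dict[str, float]:
    """gate_probs: (n_tokens, n_experts) softmax routing probabilities.
    Returns the share of total routing mass captured by each expert group."""
    per_expert = gate_probs.sum(dim=0)          # total mass per expert
    total = per_expert.sum()
    return {name: (per_expert[idx].sum() / total).item()
            for name, idx in groups.items()}


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(1000, 8)               # fake router logits
    probs = torch.softmax(logits, dim=-1)
    # Hypothetical grouping: a small "standing committee" vs. the rest.
    groups = {"committee": [0, 1, 2], "others": [3, 4, 5, 6, 7]}
    print(group_routing_mass(probs, groups))
```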
- How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts [30.125087273625123]
We propose a semantic-aware MoE framework for adaptive expert expansion and dynamic routing. MASS converges to an optimal balance between the cost-performance trade-off and notably improved semantic specialization.
arXiv Detail & Related papers (2025-12-21T05:37:42Z)
- Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better understanding of complicated context. Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z)
- Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation [57.97548022208733]
We show that seemingly superficial choices in key-value extraction can induce shifts in accuracy and stability. We introduce Contextual Normalization, a strategy that adaptively standardizes context representations before generation. (See the illustrative sketch after this entry.)
arXiv Detail & Related papers (2025-10-15T06:28:25Z)
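The paper's Contextual Normalization strategy is only named here, not specified. As an illustration of what "standardizing context representations before generation" could mean in the simplest case, the toy below z-normalizes retrieved-passage embeddings so each passage contributes on a comparable scale; the function name and the data are invented, and this is not the paper's method.

```python
# Toy illustration (not the paper's method): z-normalize context
# representations before they are fed to the generator.
import torch


def normalize_context(reprs: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """reprs: (n_passages, d) context embeddings -> standardized copies."""
    mean = reprs.mean(dim=-1, keepdim=True)
    std = reprs.std(dim=-1, keepdim=True)
    return (reprs - mean) / (std + eps)


if __name__ == "__main__":
    scales = torch.tensor([0.1, 1.0, 10.0, 3.0, 0.5]).unsqueeze(-1)
    passages = torch.randn(5, 16) * scales      # wildly different magnitudes
    normed = normalize_context(passages)
    print(normed.std(dim=-1))                   # roughly 1 for every passage
```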
- LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection [7.094483187879095]
LEAF is a novel and robust expert-based framework for Continual Event Detection. It incorporates a specialized mixture-of-experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference. (See the LoRA sketch after this entry.)
arXiv Detail & Related papers (2025-09-29T10:00:25Z)
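LEAF's experts are described as LoRA-parameterized. The sketch below shows the standard low-rank adaptation construction, a frozen base linear layer plus a trainable low-rank update, not LEAF's actual expert module or its semantic-aware router; the class name and hyperparameters are placeholders.

```python
# Generic LoRA sketch (standard construction, not LEAF's code):
# a frozen base linear layer plus a trainable low-rank update A @ B.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, d_out))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale


if __name__ == "__main__":
    expert = LoRALinear(32, 32)
    trainable = [n for n, p in expert.named_parameters() if p.requires_grad]
    print(trainable)                              # only the LoRA factors
    print(expert(torch.randn(4, 32)).shape)       # torch.Size([4, 32])
```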
- Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning [9.894106590443714]
We propose MoKGR, a mixture-of-experts framework that personalizes path exploration. MoKGR demonstrates superior performance in both transductive and inductive settings.
arXiv Detail & Related papers (2025-07-28T03:30:28Z)
- Cooperation of Experts: Fusing Heterogeneous Information with Large Margin [11.522412489437702]
The Cooperation of Experts (CoE) framework encodes multi-typed information into unified heterogeneous multiplex networks. In our framework, dedicated encoders act as domain-specific experts, each specializing in learning distinct relational patterns in specific semantic spaces.
arXiv Detail & Related papers (2025-05-27T08:04:32Z)
- Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models [5.211806751260724]
We propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts. We also introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts.
arXiv Detail & Related papers (2025-04-16T04:06:15Z)
- Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework for advancing the efficiency and scalability of machine learning models. Central to the success of MoE is an adaptive softmax gating mechanism, which determines the relevance of each expert to a given input and dynamically assigns experts their respective weights. We perform a convergence analysis of parameter estimation and expert estimation under MoE equipped with standard softmax gating or its variants, including dense-to-sparse gating and hierarchical softmax gating. (See the gating sketch after this entry.)
arXiv Detail & Related papers (2025-03-05T06:11:24Z)
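The summary refers to the standard softmax gate that weights experts per input. The sketch below shows that textbook mechanism with an optional top-k sparsification step (a dense-to-sparse style gate); it is not specific to the cited analysis, and the class name and dimensions are placeholders.

```python
# Textbook softmax gating sketch (not specific to the cited paper):
# compute per-expert weights for each input, optionally keeping only the
# top-k experts and renormalizing.
import torch
import torch.nn as nn


class SoftmaxGate(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int | None = None):
        super().__init__()
        self.proj = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.proj(x), dim=-1)    # (batch, n_experts)
        if self.top_k is not None:
            topv, topi = weights.topk(self.top_k, dim=-1)
            sparse = torch.zeros_like(weights).scatter(-1, topi, topv)
            weights = sparse / sparse.sum(dim=-1, keepdim=True)
        return weights


if __name__ == "__main__":
    gate = SoftmaxGate(d_model=16, n_experts=8, top_k=2)
    w = gate(torch.randn(4, 16))
    print(w)    # two non-zero weights per row, summing to 1
```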
- Context-DPO: Aligning Language Models for Context-Faithfulness [80.62221491884353]
We propose the first alignment method specifically designed to enhance large language models' context-faithfulness. By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, Context-DPO aligns LLMs through direct preference optimization. Extensive experiments demonstrate that Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models. (See the DPO loss sketch after this entry.)
arXiv Detail & Related papers (2024-12-18T04:08:18Z)
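Context-DPO builds on direct preference optimization. The function below is only the standard DPO objective computed from sequence log-probabilities of a preferred ("faithful") and a dispreferred ("stubborn") response relative to a frozen reference model; it is not the authors' training code, and the numbers in the example are fabricated just to show the call shape.

```python
# Standard DPO loss sketch (not the Context-DPO implementation).
import torch
import torch.nn.functional as F


def dpo_loss(policy_logp_w: torch.Tensor,   # log pi(y_faithful | x)
             policy_logp_l: torch.Tensor,   # log pi(y_stubborn | x)
             ref_logp_w: torch.Tensor,      # log pi_ref(y_faithful | x)
             ref_logp_l: torch.Tensor,      # log pi_ref(y_stubborn | x)
             beta: float = 0.1) -> torch.Tensor:
    # Reward margin of the preferred response over the dispreferred one,
    # measured relative to the reference model.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()


if __name__ == "__main__":
    lp_w, lp_l = torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.0])
    ref_w, ref_l = torch.tensor([-12.5, -10.0]), torch.tensor([-10.5, -9.8])
    print(dpo_loss(lp_w, lp_l, ref_w, ref_l).item())
```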
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z)
- PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization [60.00631098364391]
PromptAgent is an optimization method that crafts expert-level prompts equivalent in quality to those handcrafted by experts.
Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions.
We apply PromptAgent to 12 tasks spanning three practical domains.
arXiv Detail & Related papers (2023-10-25T07:47:01Z)
- Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings [41.98633628526484]
Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can allocate domain-specific expertise.
arXiv Detail & Related papers (2023-06-14T15:47:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.