Exploiting the Experts: Unauthorized Compression in MoE-LLMs
- URL: http://arxiv.org/abs/2511.19480v1
- Date: Sat, 22 Nov 2025 20:08:29 GMT
- Title: Exploiting the Experts: Unauthorized Compression in MoE-LLMs
- Authors: Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Dheeraj Kulshrestha, Rajiv Ramnath
- Abstract summary: We study the prunability of MoE-LLMs under task-specific usage. We propose defense strategies that aim to make MoE models harder to compress and fine-tune without authorization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixture-of-Experts (MoE) architectures are increasingly adopted in large language models (LLMs) for their scalability and efficiency. However, their modular structure introduces a unique vulnerability: adversaries can attempt to compress or repurpose models by pruning experts and cheaply fine-tuning the remainder, effectively bypassing licensing and security constraints. In this paper, we systematically study the prunability of MoE-LLMs under task-specific usage. We first develop an expert attribution framework that identifies the subset of experts most responsible for a given task, then evaluate the performance trade-offs of pruning and re-aligning these experts using active learning-driven fine-tuning. Our findings reveal a critical trade-off between knowledge loss and recovery: while certain experts can be isolated to retain task accuracy, significant degradation occurs without targeted re-alignment. Based on this analysis, we propose defense strategies that aim to make MoE models harder to compress and fine-tune without authorization, including entangled expert training and selective fine-tuning protocols that resist unauthorized adaptation. By positioning expert pruning as both a threat vector and a defense target, this work highlights the dual-use nature of MoE modularity and provides the first systematic evaluation framework for secure specialization of MoE-LLMs.
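To make the threat model concrete, the sketch below (PyTorch) shows how such an attack could run end to end: count which experts the router selects on task data, then keep only the heavy users. This is a minimal illustration under our own assumptions; `router`, `task_batches`, and the frequency-counting heuristic are illustrative stand-ins, not the paper's actual attribution framework.

```python
import torch

@torch.no_grad()
def attribute_experts(router, task_batches, num_experts, top_k=2):
    """Frequency-based expert attribution: count how often the router
    places each expert in its top-k on task-specific inputs."""
    counts = torch.zeros(num_experts)
    for hidden in task_batches:                     # hidden: [tokens, d_model]
        logits = router(hidden)                     # [tokens, num_experts]
        picks = logits.topk(top_k, dim=-1).indices  # chosen experts per token
        counts += torch.bincount(picks.flatten(), minlength=num_experts).float()
    return counts / counts.sum()                    # routing frequency per expert

def experts_to_keep(usage, keep_fraction=0.25):
    """Indices of the most task-relevant experts; the rest would be
    pruned before cheap re-alignment fine-tuning."""
    keep = max(1, int(keep_fraction * usage.numel()))
    return usage.topk(keep).indices.tolist()
```

The paper's proposed defenses, such as entangled expert training, aim to make exactly this kind of cheap frequency-based attribution unreliable.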
Related papers
- Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. Because SAE hidden layers must be high-dimensional to satisfy sparsity constraints, training and inference costs become prohibitive. Recent Mixture of Experts (MoE) approaches attempt to address this by decomposing SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling.
arXiv Detail & Related papers (2025-11-07T22:19:34Z)
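Since the summary above is terse, here is a toy sketch (PyTorch; our own illustration, not the paper's code) of the basic construction it describes: several narrow expert encoders behind a gate that activates a weighted subset of them simultaneously. The paper's adaptive Feature Scaling component is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoESAE(nn.Module):
    """Toy MoE-style sparse autoencoder: narrow expert encoders replace
    one wide hidden layer, and a gate mixes several experts per input."""
    def __init__(self, d_model=512, d_expert=1024, num_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.enc = nn.ModuleList([nn.Linear(d_model, d_expert) for _ in range(num_experts)])
        self.dec = nn.ModuleList([nn.Linear(d_expert, d_model) for _ in range(num_experts)])
        self.k = k

    def forward(self, x):                             # x: [batch, d_model]
        weights = self.gate(x).softmax(dim=-1)        # semantic weighting
        topw, idx = weights.topk(self.k, dim=-1)      # engage k experts at once
        recon = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.enc)):
                mask = idx[:, slot] == e
                if mask.any():
                    # nonnegative code; a real SAE adds a sparsity penalty on z
                    z = F.relu(self.enc[e](x[mask]))
                    recon[mask] = recon[mask] + topw[mask, slot, None] * self.dec[e](z)
        return recon
```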
- Advancing Expert Specialization for Better MoE [22.88847592702946]
Mixture-of-Experts (MoE) models enable efficient scaling of large language models (LLMs) by activating only a subset of experts per input. We observe that the commonly used auxiliary load-balancing loss often leads to expert overlap and overly uniform routing. We propose a simple yet effective solution that introduces two complementary objectives.
arXiv Detail & Related papers (2025-05-28T13:09:47Z)
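For context, the auxiliary loss in question is usually the Switch-Transformer-style objective sketched below (a standard formulation supplied here for reference, not code from this paper). Minimizing it pushes every expert toward equal usage, which is the uniform-routing effect the authors argue erodes specialization.

```python
import torch

def load_balancing_loss(router_probs, expert_index, num_experts):
    """Standard auxiliary load-balancing loss: num_experts * sum_i f_i * P_i,
    where f_i is the fraction of tokens routed to expert i (top-1) and
    P_i is expert i's mean router probability over the batch."""
    # router_probs: [tokens, num_experts]; expert_index: [tokens]
    f = torch.bincount(expert_index, minlength=num_experts).float() / expert_index.numel()
    p = router_probs.mean(dim=0)
    return num_experts * torch.sum(f * p)
```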
- BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts [12.755458703336153]
Mixture-of-Experts (MoE) models have emerged as a powerful architecture for large language models (LLMs). This paper presents the first backdoor attack against MoE-based LLMs, in which the attackers poison "dormant experts". We also show that dormant experts can serve as dominating experts to manipulate model predictions.
arXiv Detail & Related papers (2025-04-24T16:42:38Z)
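A natural first diagnostic follows directly from the summary: experts that clean data almost never routes to are the place to look. The sketch below (PyTorch; a hedged illustration, with `router`, `calib_batches`, and the frequency threshold as assumed stand-ins) flags such dormant experts.

```python
import torch

@torch.no_grad()
def find_dormant_experts(router, calib_batches, num_experts, threshold=1e-3):
    """Flag experts the router almost never selects on clean calibration
    data; per the summary above, such dormant experts are the natural
    hiding place for a routing-triggered backdoor."""
    counts = torch.zeros(num_experts)
    total = 0
    for hidden in calib_batches:               # hidden: [tokens, d_model]
        picks = router(hidden).argmax(dim=-1)  # top-1 routing decisions
        counts += torch.bincount(picks, minlength=num_experts).float()
        total += picks.numel()
    freq = counts / max(total, 1)
    return (freq < threshold).nonzero(as_tuple=True)[0].tolist()
```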
- Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations [86.90549830760513]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. We propose the MoE Experts Compression Suite (MC-Suite) to provide a benchmark for estimating expert importance from diverse perspectives. We present an experimentally validated conjecture that, during expert dropping, SMoEs' instruction-following capabilities are predominantly hurt.
arXiv Detail & Related papers (2025-04-08T00:49:08Z)
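The summary does not spell out which importance criteria MC-Suite actually benchmarks, so the sketch below (PyTorch) shows three common stand-in proxies under our own assumptions: routing frequency, mean gate probability, and mean expert output norm.

```python
import torch

@torch.no_grad()
def expert_importance(router, experts, batches):
    """Three illustrative importance proxies per expert (assumed
    stand-ins, not MC-Suite's actual criteria)."""
    n = len(experts)
    freq, gate, out_norm = torch.zeros(n), torch.zeros(n), torch.zeros(n)
    tokens = 0
    for hidden in batches:                      # hidden: [tokens, d_model]
        probs = router(hidden).softmax(dim=-1)  # [tokens, n]
        picks = probs.argmax(dim=-1)
        freq += torch.bincount(picks, minlength=n).float()
        gate += probs.sum(dim=0)
        for e in range(n):
            out_norm[e] += experts[e](hidden).norm(dim=-1).sum()
        tokens += hidden.shape[0]
    return {"freq": freq / tokens, "gate": gate / tokens, "out_norm": out_norm / tokens}
```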
- Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach [14.639659415276533]
Mixture of Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. However, their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications. This paper addresses the question of how to incorporate robustness into MoEs while maintaining high natural accuracy.
arXiv Detail & Related papers (2025-02-05T20:45:52Z)
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce "complexity experts": flexible expert blocks with varying computational complexity and receptive fields. This preference effectively drives task-specific allocation, assigning tasks to experts of the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models. We theoretically prove that prioritizing the pruning of the experts whose router weights change least in l2 norm from the pretrained model guarantees the preservation of test accuracy. Although our theoretical analysis is centered on binary classification tasks with a simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
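One plausible reading of that criterion, sketched below (PyTorch; our interpretation, not the authors' code), scores each expert by the l2 norm of the change in its router weight row between the pretrained and fine-tuned checkpoints, then prunes the smallest movers first.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def keep_by_router_shift(router_ft: nn.Linear, router_pre: nn.Linear, keep: int):
    """Rank experts by how far their router row moved during fine-tuning
    and keep the `keep` largest movers; per the result above, pruning
    the smallest movers first preserves test accuracy."""
    # router weight: [num_experts, d_model], one row per expert
    shift = (router_ft.weight - router_pre.weight).norm(p=2, dim=1)
    return shift.topk(keep).indices.tolist()  # expert indices to retain
```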
- Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment [103.05005690990271]
Mixture of insighTful Experts (MoTE) is a novel framework that combines reasoning chains and expert mixtures to improve self-alignment. MoTE significantly improves model safety, jailbreak resistance, and over-refusal capabilities, achieving performance comparable to OpenAI's state-of-the-art o1 model.
arXiv Detail & Related papers (2024-05-01T15:06:05Z)
- SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts [49.01990048827639]
We introduce SEER-MoE, a two-stage framework for reducing both the memory footprint and compute requirements of pre-trained MoE models. The first stage prunes the total number of experts using heavy-hitters counting guidance, while the second stage employs a regularization-based fine-tuning strategy to recover the lost accuracy. Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoE model optimized for inference efficiency with minimal accuracy trade-offs.
arXiv Detail & Related papers (2024-04-07T22:13:43Z)
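To close the loop with the headline paper's threat model, here is a hedged sketch (PyTorch; `router`, `calib_batches`, and the prune fraction are illustrative assumptions) of SEER-MoE's first stage as the summary describes it: count heavy hitters in the routing statistics, then drop the coldest experts. The second stage, regularization-based fine-tuning to recover accuracy, is not shown.

```python
import torch

@torch.no_grad()
def heavy_hitter_counts(router, calib_batches, num_experts, top_k=2):
    """Count how often each expert lands in the router's top-k on
    calibration data (the 'heavy hitters')."""
    counts = torch.zeros(num_experts)
    for hidden in calib_batches:  # hidden: [tokens, d_model]
        picks = router(hidden).topk(top_k, dim=-1).indices
        counts += torch.bincount(picks.flatten(), minlength=num_experts).float()
    return counts

def experts_to_prune(counts, prune_fraction=0.5):
    """Indices of the coldest experts, i.e. the pruning candidates."""
    n_prune = int(prune_fraction * counts.numel())
    return counts.topk(n_prune, largest=False).indices.tolist()
```

Note how close this benign compression recipe is to the attack sketch after the main abstract; that proximity is exactly the dual-use point the headline paper makes.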