BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
- URL: http://arxiv.org/abs/2504.18598v2
- Date: Tue, 29 Apr 2025 02:23:57 GMT
- Title: BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
- Authors: Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu,
- Abstract summary: Mixture-of-Experts (MoE) have emerged as a powerful architecture for large language models (LLMs)<n>This paper presents the first backdoor attack against MoE-based LLMs where the attackers poison dormant experts''<n>We also show that dormant experts can serve as dominating experts to manipulate model predictions.
- Score: 12.755458703336153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture-of-Experts (MoE) have emerged as a powerful architecture for large language models (LLMs), enabling efficient scaling of model capacity while maintaining manageable computational costs. The key advantage lies in their ability to route different tokens to different ``expert'' networks within the model, enabling specialization and efficient handling of diverse input. However, the vulnerabilities of MoE-based LLMs still have barely been studied, and the potential for backdoor attacks in this context remains largely unexplored. This paper presents the first backdoor attack against MoE-based LLMs where the attackers poison ``dormant experts'' (i.e., underutilized experts) and activate them by optimizing routing triggers, thereby gaining control over the model's output. We first rigorously prove the existence of a few ``dominating experts'' in MoE models, whose outputs can determine the overall MoE's output. We also show that dormant experts can serve as dominating experts to manipulate model predictions. Accordingly, our attack, namely BadMoE, exploits the unique architecture of MoE models by 1) identifying dormant experts unrelated to the target task, 2) constructing a routing-aware loss to optimize the activation triggers of these experts, and 3) promoting dormant experts to dominating roles via poisoned training data. Extensive experiments show that BadMoE successfully enforces malicious prediction on attackers' target tasks while preserving overall model utility, making it a more potent and stealthy attack than existing methods.
Related papers
- MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - Autonomy-of-Experts Models [34.82103329222486]
We propose a novel MoE paradigm in which experts autonomously select themselves to process inputs.<n>AoE is based on the insight that an expert is aware of its own capacity to effectively process a token.<n>Only the top-ranking experts proceed with the forward pass, while the others abort.
arXiv Detail & Related papers (2025-01-22T18:37:08Z) - Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation [11.311414617703308]
We evaluate the adversarial vulnerability of MoEs for semantic segmentation of urban and highway traffic scenes.
We show that MoEs are, in most cases, more robust to per-instance and universal white-box adversarial attacks and can better withstand transfer attacks.
arXiv Detail & Related papers (2024-12-16T09:49:59Z) - UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS [35.237427998489785]
We propose a novel single-expert unlearning framework, referred to as UOE, for Mixture-of-Experts (MoE) LLMs.<n>Through expert attribution, unlearning is concentrated on the most actively engaged expert for the specified knowledge.<n>We show that UOE enhances both forget quality up to 5% and model utility by 35% on MoE LLMs across various benchmarks.
arXiv Detail & Related papers (2024-11-27T22:46:08Z) - MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts [63.67734699877724]
MoE++ is a general and heterogeneous MoE framework that integrates both Feed-Forward Network(FFN) and zero-computation experts.
MoE++ achieves better performance while delivering 1.1-2.1x expert forward throughput compared to a vanilla MoE model of the same size.
arXiv Detail & Related papers (2024-10-09T18:01:27Z) - MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z) - Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains.<n>We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness.<n>We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - A Closer Look into Mixture-of-Experts in Large Language Models [26.503570706063634]
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance.
MoE architecture could increase the model size without sacrificing computational efficiency.
We make an initial attempt to understand the inner workings of MoE-based large language models.
arXiv Detail & Related papers (2024-06-26T10:07:57Z) - A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in finetuned MoE models.
We theoretically prove that prioritizing the pruning of the experts with a smaller change of the routers l2 norm from the pretrained model guarantees the preservation of test accuracy.
Although our theoretical analysis is centered on binary classification tasks on simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
arXiv Detail & Related papers (2024-05-26T17:52:58Z) - SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts [49.01990048827639]
We introduce SEER-MoE, a framework for reducing both the memory footprint and compute requirements of pre-trained MoE models.
The first stage involves pruning the total number of experts using a heavy-hitters counting guidance, while the second stage employs a regularization-based fine-tuning strategy to recover accuracy loss.
Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoEs model optimized for inference efficiency with minimal accuracy trade-offs.
arXiv Detail & Related papers (2024-04-07T22:13:43Z) - Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999]
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
arXiv Detail & Related papers (2023-11-25T06:55:13Z) - Mitigating Backdoors in Federated Learning with FLD [7.908496863030483]
Federated learning allows clients to collaboratively train a global model without uploading raw data for privacy preservation.
This feature has recently been found responsible for federated learning's vulnerability in the face of backdoor attacks.
We propose Federated Layer Detection (FLD), a novel model filtering approach for effectively defending against backdoor attacks.
arXiv Detail & Related papers (2023-03-01T07:54:54Z) - MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated.
However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.