Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
- URL: http://arxiv.org/abs/2502.06832v2
- Date: Wed, 12 Feb 2025 05:30:33 GMT
- Title: Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
- Authors: Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang
- Abstract summary: Mixture of Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks.
Their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications.
This paper addresses the question of how to incorporate robustness into MoEs while maintaining high natural accuracy.
- Score: 14.639659415276533
- Abstract: Mixture of Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. However, their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications. This paper addresses the critical question of how to incorporate robustness into MoEs while maintaining high natural accuracy. We begin by analyzing the vulnerability of MoE components, finding that expert networks are notably more susceptible to adversarial attacks than the router. Based on this insight, we propose a targeted robust training technique that integrates a novel loss function to enhance the adversarial robustness of MoE, requiring only the robustification of one additional expert without compromising training or inference efficiency. Building on this, we introduce a dual-model strategy that linearly combines a standard MoE model with our robustified MoE model using a smoothing parameter. This approach allows for flexible control over the robustness-accuracy trade-off. We further provide theoretical foundations by deriving certified robustness bounds for both the single MoE and the dual-model. To push the boundaries of robustness and accuracy, we propose a novel joint training strategy JTDMoE for the dual-model. This joint training enhances both robustness and accuracy beyond what is achievable with separate models. Experimental results on CIFAR-10 and TinyImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods.
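The dual-model strategy is concrete enough to sketch: the combined predictor is a convex combination of a standard MoE and a robustified MoE, controlled by a smoothing parameter. The PyTorch sketch below is illustrative only; the toy densely gated `SimpleMoE`, its layer sizes, the choice to mix logits rather than probabilities, and the name `alpha` are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Minimal densely gated MoE classifier: a router softly mixes expert outputs."""
    def __init__(self, in_dim=784, hidden=128, n_experts=4, n_classes=10):
        super().__init__()
        self.router = nn.Linear(in_dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))
            for _ in range(n_experts)
        )

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)             # (B, E) routing weights
        outs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, C) expert logits
        return (gates.unsqueeze(-1) * outs).sum(dim=1)        # gate-weighted mixture

class DualMoE(nn.Module):
    """Linear combination of a standard MoE and a robustified MoE.

    alpha = 0 recovers the standard model (accuracy end of the trade-off);
    alpha = 1 recovers the robust model (robustness end).
    Mixing logits (rather than probabilities) is an assumption of this sketch.
    """
    def __init__(self, std_moe, robust_moe, alpha=0.5):
        super().__init__()
        self.std, self.rob, self.alpha = std_moe, robust_moe, alpha

    def forward(self, x):
        return (1 - self.alpha) * self.std(x) + self.alpha * self.rob(x)

x = torch.randn(8, 784)
dual = DualMoE(SimpleMoE(), SimpleMoE(), alpha=0.7)
print(dual(x).shape)  # torch.Size([8, 10])
```

Sweeping `alpha` from 0 to 1 traces out the robustness-accuracy trade-off the abstract describes, without retraining either model.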
Related papers
- Adversarial Robustness through Dynamic Ensemble Learning [0.0]
Adversarial attacks pose a significant threat to the reliability of pre-trained language models (PLMs).
This paper presents Adversarial Robustness through Dynamic Ensemble Learning (ARDEL), a novel scheme designed to enhance the robustness of PLMs against such attacks.
arXiv Detail & Related papers (2024-12-20T05:36:19Z)
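The ARDEL summary gives only the high-level idea of a dynamic ensemble, so the following is a generic sketch of per-input ensemble weighting rather than ARDEL itself: a small gate network scores each member model for the current input and mixes their predictions. The gating design, the toy linear members, and all names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicEnsemble(nn.Module):
    """Per-input weighted ensemble: a gate scores each member model for the
    current input and mixes their class probabilities accordingly."""
    def __init__(self, members, in_dim):
        super().__init__()
        self.members = nn.ModuleList(members)
        self.gate = nn.Linear(in_dim, len(members))  # one weight per member

    def forward(self, x):
        w = F.softmax(self.gate(x), dim=-1)                                   # (B, M)
        probs = torch.stack([F.softmax(m(x), -1) for m in self.members], 1)   # (B, M, C)
        return (w.unsqueeze(-1) * probs).sum(1)                               # (B, C)

members = [nn.Linear(32, 5) for _ in range(3)]   # stand-ins for real PLM classifiers
ens = DynamicEnsemble(members, in_dim=32)
print(ens(torch.randn(4, 32)).sum(-1))  # each row of mixed probabilities sums to 1
```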
arXiv Detail & Related papers (2024-12-20T05:36:19Z) - A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models [9.304845676825584]
We propose a novel adversarial training framework that integrates multiple attack strategies and advanced machine learning techniques.
Experiments conducted on real-world datasets, including CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly enhances model robustness.
arXiv Detail & Related papers (2024-10-18T23:47:46Z)
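The core mechanism named here, adversarial training that integrates multiple attack strategies, can be sketched by sampling an attack from a pool at each training step. The `fgsm` and `pgd` functions below are the standard textbook attacks; the uniform sampling in `hybrid_at_step` is an assumption, since the summary does not specify the paper's actual attack pool or scheduling.

```python
import random
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step attack: move along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8/255, step=2/255, iters=10):
    """Iterated FGSM, projected back into the eps-ball around x each step."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv

def hybrid_at_step(model, opt, x, y):
    """One training step against an attack sampled from a small pool."""
    x_adv = random.choice([fgsm, pgd])(model, x, y)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```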
arXiv Detail & Related papers (2024-10-18T23:47:46Z) - Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks [11.389689242531327]
Adversarial training is one of the most effective methods for enhancing model robustness.
Previous approaches primarily use static ground truth for adversarial training, but this often causes robust overfitting.
We propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gain robustness from the guide model's decisions.
arXiv Detail & Related papers (2024-08-23T14:25:12Z)
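A minimal sketch of the dynamic-label idea: instead of a fixed one-hot target, the adversarial loss tracks the guide model's evolving soft prediction. The mixing weight `beta`, the temperature `T`, and the exact combination of cross-entropy and KL terms are assumptions for illustration, not DYNAT's published loss.

```python
import torch
import torch.nn.functional as F

def dynat_style_loss(target_model, guide_model, x_adv, y, beta=0.5, T=2.0):
    """Mix the static one-hot ground truth with the guide model's current
    soft prediction, so the training target evolves with the guide instead
    of staying fixed (which the entry above links to robust overfitting)."""
    with torch.no_grad():
        soft = F.softmax(guide_model(x_adv) / T, dim=-1)    # dynamic labels
    logp = F.log_softmax(target_model(x_adv), dim=-1)
    ce = F.nll_loss(logp, y)                                # static ground truth term
    kl = F.kl_div(logp, soft, reduction="batchmean")        # follow-the-guide term
    return (1 - beta) * ce + beta * kl
```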
arXiv Detail & Related papers (2024-08-23T14:25:12Z) - A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models.
We theoretically prove that prioritizing the pruning of the experts with a smaller change in the router's l2 norm from the pretrained model guarantees the preservation of test accuracy.
Although our theoretical analysis is centered on binary classification tasks and a simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
arXiv Detail & Related papers (2024-05-26T17:52:58Z) - Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment [103.05005690990271]
We propose a novel framework that combines reasoning chains and expert mixtures to improve self-alignment.
MoTE employs a structured reasoning chain comprising four key stages: Question Analysis, Answer Guidance, Safe Answer, and Safety Checking.
MoTE significantly improves model safety and jailbreak resistance while reducing over-refusal, achieving performance comparable to OpenAI's state-of-the-art o1 model.
arXiv Detail & Related papers (2024-05-01T15:06:05Z) - FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision
Transformers [61.48709409150777]
The Vision Transformer (ViT) has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA-AT framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z)
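A minimal sketch of the LNLoRA idea as described above: a learnable layer normalization placed in front of a conventional LoRA branch on a frozen linear layer. The rank, scaling, and initialization choices below are common LoRA defaults, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class LNLoRALinear(nn.Module):
    """Frozen linear layer plus a LoRA branch preceded by a learnable
    LayerNorm, so normalization statistics can adapt during adversarial
    fine-tuning while the backbone stays frozen."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # only LN + LoRA are trainable
        self.ln = nn.LayerNorm(base.in_features)
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)          # branch starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(self.ln(x)))

layer = LNLoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```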
arXiv Detail & Related papers (2024-01-03T14:08:39Z) - Improving Generalization of Adversarial Training via Robust Critical
Fine-Tuning [19.91117174405902]
Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications.
This paper proposes Robust Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness.
arXiv Detail & Related papers (2023-08-01T09:02:34Z) - Interpolated Joint Space Adversarial Training for Robust and
Generalizable Defenses [82.3052187788609]
Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks.
Recent works show generalization improvement with adversarial samples under novel threat models.
We propose a novel threat model called the Joint Space Threat Model (JSTM), under which we develop novel adversarial attacks and defenses.
arXiv Detail & Related papers (2021-12-12T21:08:14Z) - Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness
and Accuracy for Free [115.81899803240758]
Adversarial training and its many variants substantially improve deep network robustness, yet at the cost of compromising standard accuracy.
This paper asks how to quickly calibrate a trained model in-situ, to examine the achievable trade-offs between its standard and robust accuracies.
Our proposed framework, Once-for-all Adversarial Training (OAT), is built on an innovative model-conditional training framework.
arXiv Detail & Related papers (2020-10-22T16:06:34Z)
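The model-conditional idea can be sketched as follows: sample the trade-off weight lambda per step, feed it into the network as a conditioning signal, and weight the clean and adversarial losses by it; at test time a single trained model is evaluated at whatever lambda the user picks. The FiLM-style modulation and the tiny classifier below are stand-ins, not OAT's actual architecture.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class LambdaConditionedNet(nn.Module):
    """Tiny classifier whose hidden features are modulated by the sampled
    trade-off weight lambda (a crude stand-in for OAT's conditional layers)."""
    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        self.body = nn.Linear(in_dim, hidden)
        self.film = nn.Linear(1, 2 * hidden)   # lambda -> (scale, shift)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, lam):
        h = F.relu(self.body(x))
        scale, shift = self.film(lam.view(-1, 1)).chunk(2, dim=-1)
        return self.head(h * (1 + scale) + shift)

def oat_style_step(model, opt, x, y, make_adv):
    """One step: sample lambda, weight clean vs. adversarial loss by it."""
    lam = torch.full((x.size(0),), random.random())
    x_adv = make_adv(lambda inp: model(inp, lam), x, y)
    opt.zero_grad()
    loss = (1 - lam.mean()) * F.cross_entropy(model(x, lam), y) \
         + lam.mean() * F.cross_entropy(model(x_adv, lam), y)
    loss.backward()
    opt.step()
    return loss.item()

net = LambdaConditionedNet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.rand(8, 784), torch.randint(0, 10, (8,))
# reuses the pgd function from the hybrid-defense sketch earlier in this list
print(oat_style_step(net, opt, x, y, pgd))
```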
arXiv Detail & Related papers (2020-10-22T16:06:34Z) - Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-20T08:42:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.