Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers
- URL: http://arxiv.org/abs/2509.05086v1
- Date: Fri, 05 Sep 2025 13:25:33 GMT
- Title: Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers
- Authors: Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, J. Marius Zöllner
- Abstract summary: Robustifying convolutional neural networks (CNNs) against adversarial attacks remains challenging. We explore the use of sparse mixture-of-experts (MoE) layers to improve robustness. We find that inserting a single MoE layer in the deeper stages leads to consistent improvements in robustness.
- Score: 10.912224105652044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustifying convolutional neural networks (CNNs) against adversarial attacks remains challenging and often requires resource-intensive countermeasures. We explore the use of sparse mixture-of-experts (MoE) layers to improve robustness by replacing selected residual blocks or convolutional layers, thereby increasing model capacity without additional inference cost. On ResNet architectures trained on CIFAR-100, we find that inserting a single MoE layer in the deeper stages leads to consistent improvements in robustness under PGD and AutoPGD attacks when combined with adversarial training. Furthermore, we discover that when switch loss is used for balancing, it causes routing to collapse onto a small set of overused experts, thereby concentrating adversarial training on these paths and inadvertently making them more robust. As a result, some individual experts outperform the gated MoE model in robustness, suggesting that robust subpaths emerge through specialization. Our code is available at https://github.com/KASTEL-MobilityLab/robust-sparse-moes.
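As a reading aid, the mechanism described in the abstract (a sparse MoE layer with switch-style top-1 routing replacing a convolutional layer, trained with a load-balancing "switch" loss) can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed names (SparseMoEConv, num_experts, and balance_loss are all hypothetical), not the authors' implementation from the linked repository:

```python
# Minimal sketch (not the authors' implementation) of a sparse MoE
# convolutional layer with switch-style top-1 routing and the
# load-balancing ("switch") loss discussed in the abstract.
# All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_experts=4, kernel_size=3):
        super().__init__()
        self.out_ch = out_ch
        self.experts = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
            for _ in range(num_experts)
        ])
        # Router scores each input from its global-average-pooled features.
        self.router = nn.Linear(in_ch, num_experts)

    def forward(self, x):
        n, _, h, w = x.shape
        probs = F.softmax(self.router(x.mean(dim=(2, 3))), dim=-1)  # (N, E)
        top1 = probs.argmax(dim=-1)                                 # expert id per input

        # Top-1 dispatch: each input is processed by exactly one expert,
        # so inference cost stays that of a single expert.
        out = x.new_zeros(n, self.out_ch, h, w)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                # Scale by the gate probability so the router receives gradients.
                out[mask] = probs[mask, e].view(-1, 1, 1, 1) * expert(x[mask])

        # Switch loss: E * sum_e (fraction routed to e) * (mean prob of e).
        # Minimizing it encourages balanced routing; the paper observes it
        # can still collapse onto a few overused experts in practice.
        num_e = len(self.experts)
        frac = F.one_hot(top1, num_e).float().mean(dim=0)
        self.balance_loss = num_e * (frac * probs.mean(dim=0)).sum()
        return out
```

In the paper's setup, such a layer replaces a convolutional layer or residual block in a deeper ResNet stage, and the balancing term (self.balance_loss, typically scaled by a small coefficient) is added to the training loss. Because only one expert runs per input, capacity grows without additional inference cost, matching the abstract's claim.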
Related papers
- Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization [10.669680236190432]
We propose two plug-and-play regularization losses that enhance MoE specialization and routing efficiency. We implement both losses as a drop-in Megatron-LM module.
arXiv Detail & Related papers (2026-02-15T14:19:12Z)
- A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection [9.335304254034401]
We introduce a lightweight, plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself. Our method achieves state-of-the-art detection performance with negligible computational overhead and no compromise to clean accuracy.
arXiv Detail & Related papers (2025-05-19T00:48:53Z)
- Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation [11.311414617703308]
We evaluate the adversarial vulnerability of MoEs for semantic segmentation of urban and highway traffic scenes. We show that MoEs are, in most cases, more robust to per-instance and universal white-box adversarial attacks and can better withstand transfer attacks.
arXiv Detail & Related papers (2024-12-16T09:49:59Z)
- Improving the Robustness of Quantized Deep Neural Networks to White-Box Attacks using Stochastic Quantization and Information-Theoretic Ensemble Training [1.6098666134798774]
Most real-world applications that employ deep neural networks (DNNs) quantize them to low precision to reduce compute requirements.
We present a method to improve the robustness of quantized DNNs to white-box adversarial attacks.
arXiv Detail & Related papers (2023-11-30T17:15:58Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach for improving robustness (a generic PGD-based training step is sketched after this list).
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training has proven to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- Ensemble-in-One: Learning Ensemble within Random Gated Networks for Enhanced Adversarial Robustness [18.514706498043214]
Adversarial attacks pose high security risks to modern deep learning systems.
We propose ensemble-in-one (EIO) to train an ensemble within one random gated network (RGN).
EIO consistently outperforms previous ensemble training methods with even less computational overhead.
arXiv Detail & Related papers (2021-03-27T03:13:03Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
The proposed layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- HYDRA: Pruning Adversarially Robust Neural Networks [58.061681100058316]
Deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size.
We propose to make pruning techniques aware of the robust training objective and let the training objective guide the search for which connections to prune.
We demonstrate that our approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy, simultaneously.
arXiv Detail & Related papers (2020-02-24T19:54:53Z)
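Since several entries above, like the main paper itself, build on PGD-based adversarial training, a minimal sketch of one standard L-inf PGD training step may help orient the reader. This follows the widely used Madry-style formulation; pgd_attack, the step sizes, and the eval/train toggling are illustrative assumptions, not taken from any specific paper listed here:

```python
# Minimal sketch of one L-inf PGD adversarial-training step (standard
# Madry-style formulation); function names and hyperparameters are
# illustrative and not taken from any specific paper above.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iteratively ascend the loss, projecting back into the eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                              # valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    model.eval()                      # fix BN statistics while crafting the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

If the network contains the MoE layer sketched earlier, the hypothetical balance_loss term would be added to loss before backward().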