Defending against adversarial attacks using mixture of experts
- URL: http://arxiv.org/abs/2512.20821v1
- Date: Tue, 23 Dec 2025 22:46:06 GMT
- Title: Defending against adversarial attacks using mixture of experts
- Authors: Mohammad Meymani, Roozbeh Razavi-Far
- Abstract summary: Adversarial threats aim to hinder the machine learning models from satisfying their objectives. We propose a defense system, which devises an adversarial training module within mixture-of-experts architecture.
- Score: 1.3578741464318356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning is a powerful tool enabling full automation of a huge number of tasks without explicit programming. Despite recent progress of machine learning in different domains, these models have shown vulnerabilities when they are exposed to adversarial threats. Adversarial threats aim to hinder the machine learning models from satisfying their objectives. They can create adversarial perturbations, which are imperceptible to humans' eyes but have the ability to cause misclassification during inference. Moreover, they can poison the training data to harm the model's performance or they can query the model to steal its sensitive information. In this paper, we propose a defense system, which devises an adversarial training module within mixture-of-experts architecture to enhance its robustness against adversarial threats. In our proposed defense system, we use nine pre-trained experts with ResNet-18 as their backbone. During end-to-end training, the parameters of expert models and gating mechanism are jointly updated allowing further optimization of the experts. Our proposed defense system outperforms state-of-the-art defense systems and plain classifiers, which use a more complex architecture than our model's backbone.
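The two ingredients named in the abstract, a gated mixture of experts and adversarial examples for adversarial training, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the experts here are tiny linear classifiers rather than ResNet-18 networks, there are two experts rather than nine, the attack is an FGSM-style signed-gradient step (the abstract does not name the attack), and the input gradient is approximated by finite differences. All function names and weights are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """One logit per weight row: the dot product of that row with x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def moe_predict(experts, gate, x):
    """Class probabilities as a gate-weighted average of expert outputs."""
    gate_weights = softmax(linear(gate, x))           # one weight per expert
    expert_probs = [softmax(linear(e, x)) for e in experts]
    n_classes = len(expert_probs[0])
    return [sum(g * p[c] for g, p in zip(gate_weights, expert_probs))
            for c in range(n_classes)]

def loss(experts, gate, x, label):
    """Negative log-likelihood of the true label under the mixture."""
    return -math.log(moe_predict(experts, gate, x)[label])

def fgsm(experts, gate, x, label, eps, h=1e-5):
    """FGSM-style step: move each input feature by eps in the direction
    that increases the loss. The gradient is a central finite difference
    (an assumption made to keep the sketch dependency-free)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((loss(experts, gate, up, label)
                     - loss(experts, gate, down, label)) / (2 * h))
    return [v + eps * ((g > 0) - (g < 0)) for v, g in zip(x, grad)]

# Hypothetical toy setup: 2 experts, 2 classes, 2 input features.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],      # expert 0: one weight row per class
    [[2.0, -1.0], [-1.0, 2.0]],    # expert 1
]
gate = [[0.5, 0.0], [0.0, 0.5]]    # one weight row per expert
x, label, eps = [1.0, 0.2], 0, 0.1

x_adv = fgsm(experts, gate, x, label, eps)
clean_loss = loss(experts, gate, x, label)
adv_loss = loss(experts, gate, x_adv, label)
```

In an adversarial training loop, `x_adv` would be fed back as a training input with the original label, and the loss on it would be minimized with respect to both the expert and gate weights, which corresponds to the joint end-to-end update the abstract describes. The weight update itself is omitted from this sketch.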
Related papers
- Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems. An adversary who intercepts the intermediate features transmitted between them can still pose a serious threat. We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z)
- DUMB and DUMBer: Is Adversarial Training Worth It in the Real World? [15.469010487781931]
Adversarial examples are small and often imperceptible perturbations crafted to fool machine learning models. Evasion attacks, a form of adversarial attack where input is modified at test time to cause misclassification, are particularly insidious due to their transferability. We introduce DUMBer, an attack framework built on the foundation of the DUMB methodology to evaluate the resilience of adversarially trained models.
arXiv Detail & Related papers (2025-06-23T11:16:21Z)
- Taking off the Rose-Tinted Glasses: A Critical Look at Adversarial ML Through the Lens of Evasion Attacks [11.830908033835728]
We argue that overly permissive attack and overly restrictive defensive threat models have hampered defense development in the ML domain.
We analyze adversarial machine learning from a system security perspective rather than an AI perspective.
arXiv Detail & Related papers (2024-10-15T21:33:23Z)
- A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion [0.0]
Our proposal suggests a different approach to the AI Guardian framework.
Instead of including adversarial examples in the training process, we propose training the AI system without them.
This aims to create a system that is inherently resilient to a wider range of attacks.
arXiv Detail & Related papers (2024-05-03T04:08:15Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- A Framework for Understanding Model Extraction Attack and Defense [48.421636548746704]
We study tradeoffs between model utility from a benign user's view and privacy from an adversary's view.
We develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies.
arXiv Detail & Related papers (2022-06-23T05:24:52Z)
- A Tutorial on Adversarial Learning Attacks and Countermeasures [0.0]
A machine learning model is capable of making highly accurate predictions without being explicitly programmed to do so.
Adversarial learning attacks pose a serious security threat that greatly undermines further such systems.
This paper provides a detailed tutorial on the principles of adversarial learning, explains the different attack scenarios, and gives an in-depth insight into the state-of-art defense mechanisms against this rising threat.
arXiv Detail & Related papers (2022-02-21T17:14:45Z)
- Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents.
We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation.
Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
arXiv Detail & Related papers (2021-10-04T12:20:46Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning [91.13113161754022]
We introduce timing-based adversarial strategies against a DRL-based navigation system by jamming in physical noise patterns on the selected time frames.
Our experimental results show that the adversarial timing attacks can lead to a significant performance drop.
arXiv Detail & Related papers (2020-02-20T21:39:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.