Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning
- URL: http://arxiv.org/abs/2503.01734v1
- Date: Mon, 03 Mar 2025 16:54:03 GMT
- Title: Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning
- Authors: Kyle Domico, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Eric Pauley, Josiah Hanna, Patrick McDaniel
- Abstract summary: Reinforcement learning (RL) offers powerful techniques for solving complex sequential decision-making tasks from experience. In this paper, we demonstrate how RL can be applied to adversarial machine learning (AML) to develop a new class of attacks that learn to generate adversarial examples. From a security perspective, this work demonstrates a powerful new attack vector that uses RL to attack ML models efficiently and at scale.
- Score: 5.324888516045196
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning (RL) offers powerful techniques for solving complex sequential decision-making tasks from experience. In this paper, we demonstrate how RL can be applied to adversarial machine learning (AML) to develop a new class of attacks that learn to generate adversarial examples: inputs designed to fool machine learning models. Unlike traditional AML methods that craft adversarial examples independently, our RL-based approach retains and exploits past attack experience to improve future attacks. We formulate adversarial example generation as a Markov Decision Process and evaluate RL's ability to (a) learn effective and efficient attack strategies and (b) compete with state-of-the-art AML. On CIFAR-10, our agent increases the success rate of adversarial examples by 19.4% and decreases the median number of victim model queries per adversarial example by 53.2% from the start to the end of training. In a head-to-head comparison with a state-of-the-art image attack, SquareAttack, our approach enables an adversary to generate adversarial examples with 13.1% more success after 5000 episodes of training. From a security perspective, this work demonstrates a powerful new attack vector that uses RL to attack ML models efficiently and at scale.
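The abstract says only that adversarial example generation is formulated as a Markov Decision Process; it does not spell out the states, actions, or rewards. The following is a minimal sketch of what such a formulation could look like, not the authors' method. Everything in it is an assumption made for illustration: the `ToyVictim` model, the single-feature perturbation actions, the confidence-drop reward, and the query budget.

```python
import math
import random


class ToyVictim:
    """Hypothetical black-box model: the attacker only sees output probabilities."""

    def __init__(self, weights, bias=0.0):
        self._w = list(weights)
        self._b = bias
        self.queries = 0  # number of times the attacker has queried the model

    def predict_proba(self, x):
        self.queries += 1
        z = sum(w * xi for w, xi in zip(self._w, x)) + self._b
        p1 = 1.0 / (1.0 + math.exp(-z))
        return [1.0 - p1, p1]


class EvasionEnv:
    """Assumed MDP: state = perturbed input, action = (feature index, sign)."""

    def __init__(self, victim, x0, label, eps=0.5, step=0.1, max_queries=50):
        self.victim, self.x0, self.label = victim, list(x0), label
        self.eps, self.step_size, self.max_queries = eps, step, max_queries

    def reset(self):
        self.x = list(self.x0)
        self.t = 0
        self.conf = self.victim.predict_proba(self.x)[self.label]
        return tuple(self.x)

    def step(self, action):
        idx, sign = action
        # Move one feature, then clip back into the L-infinity ball around x0.
        lo, hi = self.x0[idx] - self.eps, self.x0[idx] + self.eps
        self.x[idx] = min(max(self.x[idx] + sign * self.step_size, lo), hi)
        probs = self.victim.predict_proba(self.x)
        self.t += 1
        # Assumed reward: drop in the victim's confidence in the true label.
        reward = self.conf - probs[self.label]
        self.conf = probs[self.label]
        success = probs.index(max(probs)) != self.label
        done = success or self.t >= self.max_queries
        return tuple(self.x), reward, done, {"success": success}


def random_policy(state, rng):
    """Untrained baseline: pick a random feature and direction."""
    return rng.randrange(len(state)), rng.choice((-1, 1))


def run_episode(env, policy, rng):
    """Roll out one attack episode; return (total reward, success, steps used)."""
    state, done, total, info = env.reset(), False, 0.0, {"success": False}
    while not done:
        state, reward, done, info = env.step(policy(state, rng))
        total += reward
    return total, info["success"], env.t


if __name__ == "__main__":
    env = EvasionEnv(ToyVictim([1.0, -1.0]), [0.25, 0.0], label=1)
    print(run_episode(env, random_policy, random.Random(0)))
```

An RL agent would replace `random_policy` with a policy trained across many such episodes, which is how past attack experience could carry over to later inputs. The episode length doubles as a count of victim queries after the initial one, the quantity the abstract reports as decreasing over training.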
Related papers
- GenFighter: A Generative and Evolutive Textual Attack Removal [6.044610337297754]
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP).
This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution.
We show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics.
arXiv Detail & Related papers (2024-04-17T16:32:13Z) - BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning [37.19070609394519]
Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success.
In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing.
Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection.
arXiv Detail & Related papers (2023-12-19T20:29:29Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Enhancing the Robustness of QMIX against State-adversarial Attacks [6.627954554805906]
We discuss four techniques to improve the robustness of SARL algorithms and extend them to multi-agent scenarios.
We train models using a variety of attacks in this research.
We then test the models taught using the other attacks by subjecting them to the corresponding attacks throughout the training phase.
arXiv Detail & Related papers (2023-07-03T10:10:34Z) - Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding this type of security risk.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation.
We observe that a minor effort of the attacker can significantly deteriorate the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z) - Robust Reinforcement Learning on State Observations with Learned Optimal Adversary [86.0846119254031]
We study the robustness of reinforcement learning with adversarially perturbed state observations.
With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found.
For DRL settings, this leads to a novel empirical adversarial attack to RL agents via a learned adversary that is much stronger than previous ones.
arXiv Detail & Related papers (2021-01-21T05:38:52Z) - Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning [30.46580767540506]
We introduce two novel adversarial attack techniques to stealthily and efficiently attack the Deep Reinforcement Learning agents.
The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one.
The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode.
arXiv Detail & Related papers (2020-05-14T16:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.