RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
- URL: http://arxiv.org/abs/2412.10713v1
- Date: Sat, 14 Dec 2024 06:56:11 GMT
- Title: RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
- Authors: Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang,
- Abstract summary: RAT trains an intention policy that is explicitly aligned with human preferences.<n>RAT dynamically adjusts the state occupancy measure within the replay buffer, allowing for more controlled and effective behavior manipulation.
- Score: 15.593859086891745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker's objectives, often bypassing traditional reward-based defenses. Prior methods have primarily focused on reducing cumulative rewards; however, rewards are typically too generic to capture complex safety requirements effectively. As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. RAT trains an intention policy that is explicitly aligned with human preferences, serving as a precise behavioral target for the adversary. Concurrently, an adversary manipulates the victim's policy to follow this target behavior. To enhance the effectiveness of these attacks, RAT dynamically adjusts the state occupancy measure within the replay buffer, allowing for more controlled and effective behavior manipulation. Our empirical results on robotic simulation tasks demonstrate that RAT outperforms existing adversarial attack algorithms in inducing specific behaviors. Additionally, RAT shows promise in improving agent robustness, leading to more resilient policies. We further validate RAT by guiding Decision Transformer agents to adopt behaviors aligned with human preferences in various MuJoCo tasks, demonstrating its effectiveness across diverse tasks.
Related papers
- TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models [67.06525001375722]
TrojanTO is the first action-level backdoor attack against TO models.<n>It implants backdoor attacks across diverse tasks and attack objectives with a low attack budget.<n>TrojanTO exhibits broad applicability to DT, GDT, and DC.
arXiv Detail & Related papers (2025-06-15T11:27:49Z) - Effective Red-Teaming of Policy-Adherent Agents [7.080204863156575]
Task-oriented LLM-based agents are increasingly used in domains with strict policies, such as refund eligibility or cancellation rules.<n>We propose a novel threat model that focuses on adversarial users aiming to exploit policy-adherent agents for personal benefit.<n>We present CRAFT, a multi-agent red-teaming system that leverages policy-aware persuasive strategies to undermine a policy-adherent agent in a customer-service scenario.
arXiv Detail & Related papers (2025-06-11T10:59:47Z) - MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks.
We present MELON, a novel IPI defense.
We show that MELON outperforms SOTA defenses in both attack prevention and utility preservation.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Adversarial Inception for Bounded Backdoor Poisoning in Deep Reinforcement Learning [16.350898218047405]
We propose a new class of backdoor attacks against Deep Reinforcement Learning (DRL) algorithms.
These attacks achieve state of the art performance while minimally altering the agent's rewards.
We then devise an online attack which significantly out-performs prior attacks under bounded reward constraints.
arXiv Detail & Related papers (2024-10-17T19:50:28Z) - CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems [13.776447110639193]
We introduce a novel method that involves injecting traitor agents into the CMARL system.
In TMDP, traitors are trained using the same MARL algorithm as the victim agents, with their reward function set as the negative of the victim agents' reward.
CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies.
arXiv Detail & Related papers (2024-06-25T09:59:31Z) - Probabilistic Perspectives on Error Minimization in Adversarial Reinforcement Learning [18.044879441434432]
A self-driving car could experience catastrophic consequences if sensory inputs about traffic signs are manipulated by an adversary.
The core challenge in such situations is that the true state of the environment becomes only partially observable due to these adversarial manipulations.
We introduce a novel objective called Adversarial Counterfactual Error (ACoE) which is defined on the beliefs about the underlying true state.
arXiv Detail & Related papers (2024-06-07T08:14:24Z) - Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation [10.411820336052784]
This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures.<n>To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
arXiv Detail & Related papers (2024-06-06T08:49:51Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce a novel problem of PRofiling Adversarial aTtacks (PRAT)
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use AID to devise a novel framework for the PRAT objective.
arXiv Detail & Related papers (2023-09-20T07:42:51Z) - Efficient Adversarial Attacks on Online Multi-agent Reinforcement
Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL)
In the considered setup, there is an attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them.
We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z) - Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence [41.14664289570607]
Adrial Minority Influence (AMI) is a practical black-box attack and can be launched without knowing victim parameters.
AMI is also strong by considering the complex multi-agent interaction and the cooperative goal of agents.
We achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively worst-case scenarios.
arXiv Detail & Related papers (2023-02-07T08:54:37Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Guided Adversarial Attack for Evaluating and Enhancing Adversarial
Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss, that finds more suitable gradient-directions, increases attack efficacy and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z) - Adversarial jamming attacks and defense strategies via adaptive deep
reinforcement learning [12.11027948206573]
In this paper, we consider a victim user that performs DRL-based dynamic channel access, and an attacker that executes DRLbased jamming attacks to disrupt the victim.
Both the victim and attacker are DRL agents and can interact with each other, retrain their models, and adapt to opponents' policies.
We propose three defense strategies to maximize the attacked victim's accuracy and evaluate their performances.
arXiv Detail & Related papers (2020-07-12T18:16:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.