Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification
- URL: http://arxiv.org/abs/2507.18113v1
- Date: Thu, 24 Jul 2025 05:52:06 GMT
- Title: Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification
- Authors: Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong
- Abstract summary: Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment.
- Score: 8.292056374554162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving, but adversarial attacks designed to mislead RL systems remain challenging. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment. We propose a reward iteration optimization framework that leverages large language models (LLMs) to generate adversarial rewards explicitly tailored to the vulnerabilities of the target agent, thereby enhancing the effectiveness of inducing the target agent toward suboptimal decision-making. Additionally, a critical state identification algorithm is designed to pinpoint the target agent's most vulnerable states, where suboptimal behavior from the victim leads to significant degradation in overall performance. Experimental results in diverse environments demonstrate the superiority of our method over existing approaches.
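Neither the abstract nor this listing includes code; the sketch below is only an illustration of the two components the abstract names, with every identifier (`critical_state_scores`, `llm_propose_reward`, `train_adversary`, `evaluate_victim`) a hypothetical placeholder. The value-gap heuristic is one plausible reading of "states where suboptimal behavior from the victim leads to significant degradation"; the paper's actual criterion may differ.

```python
import numpy as np

def critical_state_scores(q_values):
    """Rank states by how costly a forced mistake would be.

    A simple proxy for critical state identification: the gap between the
    best and worst action values of the victim. States with a large gap
    are those where suboptimal behavior degrades overall performance most.
    `q_values` is a hypothetical (n_states, n_actions) array taken from
    the victim's value function.
    """
    return q_values.max(axis=1) - q_values.min(axis=1)

def reward_iteration_attack(llm_propose_reward, train_adversary,
                            evaluate_victim, n_iters=5):
    """Sketch of the reward iteration optimization loop.

    `llm_propose_reward` maps textual feedback about the victim's observed
    weaknesses to a candidate adversarial reward function;
    `train_adversary` trains the attacking agent against that reward;
    `evaluate_victim` returns (degradation score, feedback text). All
    three callables stand in for components the paper describes abstractly.
    """
    best_score, best_reward = float("-inf"), None
    feedback = "no prior knowledge of the victim"
    for _ in range(n_iters):
        reward_fn = llm_propose_reward(feedback)      # LLM tailors a reward to the victim's weaknesses
        adversary = train_adversary(reward_fn)        # train the attacker agent on the proposed reward
        score, feedback = evaluate_victim(adversary)  # measure degradation; summarize it for the LLM
        if score > best_score:
            best_score, best_reward = score, reward_fn
    return best_reward
```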
Related papers
- Reinforcement Learning for Decision-Level Interception Prioritization in Drone Swarm Defense [56.47577824219207]
We present a case study demonstrating the practical advantages of reinforcement learning in addressing this challenge. We introduce a high-fidelity simulation environment that captures realistic operational constraints. The agent learns to coordinate multiple effectors for optimal interception prioritization. We evaluate the learned policy against a handcrafted rule-based baseline across hundreds of simulated attack scenarios.
arXiv Detail & Related papers (2025-08-01T13:55:39Z)
- Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments [6.956559003734227]
Unmanned aerial vehicles (UAVs) have been exposed to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL). This paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts. It achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories.
arXiv Detail & Related papers (2025-06-26T10:06:29Z)
- Towards Robust Deep Reinforcement Learning against Environmental State Perturbation [13.811628977069029]
Adversarial attacks and robustness in Deep Reinforcement Learning (DRL) have been widely studied in various threat models. We formulate the problem of environmental state perturbation, introducing a preliminary non-targeted attack method as a calibration adversary. We then propose a defense framework, named Boosted Adversarial Training (BAT), which first tunes the agent via supervised learning to avoid catastrophic failure and subsequently adversarially trains the agent with reinforcement learning.
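A minimal sketch of that two-phase recipe, assuming a generic agent/environment API (all names illustrative, not BAT's actual interface):

```python
def boosted_adversarial_training(agent, demos, env, perturb,
                                 epochs_sl=10, steps_rl=100_000):
    """Two-phase schedule as summarized above.

    Phase 1: supervised tuning on demonstrations so the agent does not
    fail catastrophically once perturbations appear.
    Phase 2: adversarial RL, where the calibration adversary perturbs
    each environment state before the agent acts on it.
    """
    for _ in range(epochs_sl):                 # Phase 1: behavior cloning
        for state, action in demos:
            agent.supervised_update(state, action)
    state = env.reset()
    for _ in range(steps_rl):                  # Phase 2: adversarial RL
        action = agent.act(perturb(state))     # the agent only sees the perturbed state
        next_state, reward, done = env.step(action)
        agent.rl_update(state, action, reward, next_state)
        state = env.reset() if done else next_state
    return agent
```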
arXiv Detail & Related papers (2025-06-10T16:32:31Z)
- Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks [15.825229211045647]
We propose the Adaptive Gradient-Masked Reinforcement (AGMR) Attack, a white-box attack method that combines DRL with a gradient-based soft masking mechanism to dynamically identify critical state dimensions and optimize adversarial policies. AGMR outperforms state-of-the-art adversarial attack methods in degrading the performance of the victim agent and enhances the victim agent's robustness through adversarial defense mechanisms.
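A rough illustration of the gradient-based soft masking idea (not AGMR's exact formulation; the loss, temperature, and budget here are assumptions):

```python
import torch

def gradient_masked_perturbation(q_net, state, action, eps=0.05, tau=1.0):
    """Soft-mask a white-box perturbation onto the most sensitive state dims.

    The gradient of the victim's value estimate w.r.t. the state ranks
    dimensions by sensitivity; a softmax turns that ranking into a soft
    mask that concentrates the perturbation budget on critical dimensions.
    """
    state = state.clone().detach().requires_grad_(True)
    q_net(state)[action].backward()              # gradient of the chosen action's value w.r.t. the state
    g = state.grad.detach()
    mask = torch.softmax(g.abs() / tau, dim=-1)  # soft mask weighting sensitive dimensions
    return (state - eps * mask * g.sign()).detach()  # masked FGSM-style step against the value
```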
arXiv Detail & Related papers (2025-03-26T15:08:58Z)
- State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning [11.807055530003899]
We propose a selective state-aware reinforcement adversarial attack method, named STAR, to optimize perturbation stealthiness and state visitation dispersion. It incorporates an information-theoretic optimization objective to maximize mutual information between perturbations, environmental states, and victim actions, ensuring a dispersed state-visitation distribution. Experiments demonstrate that STAR outperforms state-of-the-art benchmarks.
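One plausible reading of that objective, with notation assumed rather than taken from the paper ($\delta$: perturbation, $s$: environmental state, $a_v$: victim action, $d_\pi$: state-visitation distribution):

```latex
\max_{\delta}\; I(\delta;\, s,\, a_v) \;+\; \beta\, \mathcal{H}\!\left(d_{\pi}(s)\right)
\quad \text{s.t.} \quad \lVert \delta \rVert_{\infty} \le \epsilon
```

The mutual-information term ties perturbations to states and victim actions, the entropy term encourages dispersed state visitation, and the norm bound stands in for stealthiness.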
arXiv Detail & Related papers (2025-03-26T15:00:07Z)
- Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling [84.00480999255628]
Reinforcement learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift. Current approaches typically address this issue through online sampling from the target policy. We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z)
- On Minimizing Adversarial Counterfactual Error in Adversarial RL [18.044879441434432]
Adversarial noise poses significant risks in safety-critical scenarios. We introduce a novel objective called Adversarial Counterfactual Error (ACoE). Our method significantly outperforms current state-of-the-art approaches for addressing adversarial RL challenges.
arXiv Detail & Related papers (2024-06-07T08:14:24Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks [13.726534285661717]
This paper introduces the Embodied Laser Attack (ELA), a novel framework that dynamically tailors non-contact laser attacks.
For the perception module, ELA develops a local perspective transformation network based on the intrinsic prior knowledge of traffic scenes.
For the decision and control module, ELA trains an attack agent with data-driven reinforcement learning instead of adopting time-consuming algorithms.
arXiv Detail & Related papers (2023-12-15T06:16:17Z)
- Adversarial Style Transfer for Robust Policy Optimization in Deep Reinforcement Learning [13.652106087606471]
This paper proposes an algorithm that aims to improve generalization for reinforcement learning agents by removing overfitting to confounding features.
A policy network updates its parameters to minimize the effect of such perturbations, thus staying robust while maximizing the expected future reward.
We evaluate our approach on Procgen and Distracting Control Suite for generalization and sample efficiency.
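The summary describes a min-max game between a style perturber and the policy; a generic form of such an objective (notation assumed, not the paper's exact formulation) is:

```latex
\max_{\theta}\, \min_{\phi}\;
\mathbb{E}_{\tau \sim \pi_{\theta},\, \phi}\!\left[\sum_{t} \gamma^{t}\, r(s_t, a_t)\right]
```

where $\phi$ restyles observations to confound the policy and $\theta$ is updated to remain robust while maximizing expected return.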
arXiv Detail & Related papers (2023-08-29T18:17:35Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
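Reading "policy as a generative model of optimal trajectories" as a latent-variable policy gives, in generic notation (an assumption, not the paper's exact parameterization):

```latex
\pi_{\theta}(a \mid s) \;=\; \int p_{\theta}(a \mid s, z)\, p_{\theta}(z \mid s)\, \mathrm{d}z
```

with distinct modes of the latent $z$ capturing distinct optimal trajectories in the continuous action space.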
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
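In generic notation, a maximum-likelihood objective of this kind (symbols assumed: $\mathcal{D}$ the fixed demonstration set, $r$ the reward, $\hat{T}$ the learned dynamics, $\pi_{r,\hat{T}}$ the policy optimal under both) reads:

```latex
\max_{r,\, \hat{T}}\; \sum_{(s,a) \in \mathcal{D}} \log \pi_{r,\hat{T}}(a \mid s)
```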
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
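BROIL-style objectives are usually written as a blend of expected return and conditional value at risk under a posterior over reward functions; a sketch in that spirit (notation assumed) is:

```latex
\max_{\pi}\; \lambda\, \mathbb{E}_{R \sim P(R \mid \mathcal{D})}\!\left[\rho(\pi, R)\right]
\;+\; (1 - \lambda)\, \mathrm{CVaR}_{\alpha}\!\left[\rho(\pi, R)\right]
```

where $\rho(\pi, R)$ is the expected return of $\pi$ under reward $R$; $\lambda = 1$ recovers the risk-neutral end of the family and $\lambda = 0$ the risk-averse end.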
arXiv Detail & Related papers (2021-06-11T16:49:15Z)