Robust Deep Reinforcement Learning against Adversarial Perturbations on
State Observations
- URL: http://arxiv.org/abs/2003.08938v7
- Date: Wed, 14 Jul 2021 07:20:48 GMT
- Title: Robust Deep Reinforcement Learning against Adversarial Perturbations on
State Observations
- Authors: Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane
Boning, Cho-Jui Hsieh
- Abstract summary: A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks.
- Score: 88.94162416324505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A deep reinforcement learning (DRL) agent observes its states through
observations, which may contain natural measurement errors or adversarial
noises. Since the observations deviate from the true states, they can mislead
the agent into making suboptimal actions. Several works have shown this
vulnerability via adversarial attacks, but existing approaches for improving the
robustness of DRL under this setting have had limited success and lack
theoretical principles. We show that naively applying existing techniques for
improving robustness in classification tasks, such as adversarial training, is
ineffective for many RL tasks. We propose the state-adversarial Markov decision
process (SA-MDP) to study the fundamental properties of this problem, and
develop a theoretically principled policy regularization which can be applied
to a large family of DRL algorithms, including proximal policy optimization
(PPO), deep deterministic policy gradient (DDPG) and deep Q networks (DQN), for
both discrete and continuous action control problems. We significantly improve
the robustness of PPO, DDPG and DQN agents under a suite of strong white box
adversarial attacks, including new attacks of our own. Additionally, we find
that a robust policy noticeably improves DRL performance even without an
adversary in a number of environments. Our code is available at
https://github.com/chenhongge/StateAdvDRL.
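The linked repository contains the authors' actual SA-PPO, SA-DDPG and SA-DQN implementations. As a rough, hedged sketch of the policy-regularization idea described above (for a continuous-action, PPO-style Gaussian policy), the snippet below penalizes how much the action distribution can change when the observed state is perturbed inside a small l_inf ball. Everything in it is an illustrative assumption rather than the paper's exact code: the PolicyNet architecture, the eps, n_pgd_steps and kappa values, and the use of a few projected-gradient steps with a random start to approximate the inner maximization.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Tiny diagonal-Gaussian policy, included only to keep the sketch self-contained."""
    def __init__(self, state_dim=8, action_dim=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                  nn.Linear(64, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, s):
        return self.body(s), self.log_std.exp()

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    # KL( N(mu_p, std_p) || N(mu_q, std_q) ) for diagonal Gaussians, summed over action dims.
    return (torch.log(std_q / std_p)
            + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2 * std_q ** 2)
            - 0.5).sum(-1)

def state_adversarial_regularizer(policy, states, eps=0.1, n_pgd_steps=3):
    """Approximate max_{||d||_inf <= eps} KL(pi(.|s) || pi(.|s+d)) with a few
    projected gradient steps on d (random start), and return the batch-mean KL."""
    with torch.no_grad():
        mu_clean, std_clean = policy(states)              # clean policy treated as constant
    delta = torch.empty_like(states).uniform_(-eps, eps).requires_grad_(True)
    step_size = eps / n_pgd_steps
    for _ in range(n_pgd_steps):
        mu_adv, std_adv = policy(states + delta)
        kl = gaussian_kl(mu_clean, std_clean, mu_adv, std_adv).mean()
        grad, = torch.autograd.grad(kl, delta)
        with torch.no_grad():                              # ascent step + projection to the l_inf ball
            delta += step_size * grad.sign()
            delta.clamp_(-eps, eps)
    mu_adv, std_adv = policy(states + delta.detach())      # final pass keeps gradients w.r.t. the policy
    return gaussian_kl(mu_clean, std_clean, mu_adv, std_adv).mean()

policy = PolicyNet()
states = torch.randn(32, 8)
reg = state_adversarial_regularizer(policy, states)
# total_loss = ppo_surrogate_loss + kappa * reg   # kappa is an assumed trade-off weight
```

Minimizing this term alongside the usual PPO surrogate keeps pi(.|s) and pi(.|s+delta) close for all small perturbations delta, which is the smoothness property the regularization described in the abstract targets.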
Related papers
- Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO)
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
- Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage.
Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations, or alternately train the agent's policy and the attacker's policy.
We propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states.
arXiv Detail & Related papers (2024-03-06T20:52:49Z)
- Improve Robustness of Reinforcement Learning against Observation Perturbations via $l_\infty$ Lipschitz Policy Networks [8.39061976254379]
Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks.
Recent works have revealed that DRL agents are susceptible to slight perturbations in observations.
We propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations.
arXiv Detail & Related papers (2023-12-14T08:57:22Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Robust Reinforcement Learning on State Observations with Learned Optimal Adversary [86.0846119254031]
We study the robustness of reinforcement learning with adversarially perturbed state observations.
With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found.
For DRL settings, this leads to a novel empirical adversarial attack on RL agents via a learned adversary that is much stronger than previous ones.
arXiv Detail & Related papers (2021-01-21T05:38:52Z)
- Query-based Targeted Action-Space Adversarial Policies on Deep Reinforcement Learning Agents [23.580682320064714]
This work investigates targeted attacks in the action-space domain, also commonly known as actuation attacks in CPS literature.
We show that a query-based black-box attack model that generates optimal perturbations with respect to an adversarial goal can be formulated as another reinforcement learning problem.
Experimental results showed that adversarial policies that only observe the nominal policy's output generate stronger attacks than adversarial policies that observe the nominal policy's input and output.
arXiv Detail & Related papers (2020-11-13T20:25:48Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to the Bellman optimality and evaluation backups, making the update more conservative, can provide much stronger guarantees (a minimal illustrative sketch of a conservative backup follows this list).
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
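The last entry above attributes stronger guarantees to a more conservative Bellman backup. As a loose, hedged illustration of that general idea only (not the cited paper's batch-RL algorithm), here is a minimal tabular sketch in which the Bellman-optimality target is penalized for rarely-visited next-state actions; the count-based penalty b / sqrt(count) and every constant are assumptions.

```python
import numpy as np

def conservative_backup(Q, counts, s, a, r, s_next, alpha=0.1, gamma=0.99, b=1.0):
    """One conservative (pessimistic) Bellman-optimality backup on a tabular Q function.

    Q      : (n_states, n_actions) action-value table
    counts : visit counts of the same shape, used to size the pessimism penalty
    """
    counts[s, a] += 1
    penalty = b / np.sqrt(np.maximum(counts[s_next], 1))   # larger where data is scarce
    target = r + gamma * np.max(Q[s_next] - penalty)       # pessimistic target value
    Q[s, a] += alpha * (target - Q[s, a])

# Example: 5 states, 2 actions, one observed transition (s=0, a=1, r=0.5, s'=3)
Q = np.zeros((5, 2))
counts = np.zeros((5, 2))
conservative_backup(Q, counts, s=0, a=1, r=0.5, s_next=3)
```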
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.