Improve Robustness of Reinforcement Learning against Observation
Perturbations via $l_\infty$ Lipschitz Policy Networks
- URL: http://arxiv.org/abs/2312.08751v1
- Date: Thu, 14 Dec 2023 08:57:22 GMT
- Title: Improve Robustness of Reinforcement Learning against Observation
Perturbations via $l_\infty$ Lipschitz Policy Networks
- Authors: Buqing Nie, Jingtian Ji, Yangqing Fu, Yue Gao
- Abstract summary: Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks.
Recent works have revealed that DRL agents are susceptible to slight perturbations in observations.
We propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations.
- Score: 8.39061976254379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable advances in
sequential decision tasks. However, recent works have revealed that DRL agents
are susceptible to slight perturbations in observations. This vulnerability
raises concerns regarding the effectiveness and robustness of deploying such
agents in real-world applications. In this work, we propose a novel robust
reinforcement learning method called SortRL, which improves the robustness of
DRL policies against observation perturbations from the perspective of the
network architecture. We employ a novel architecture for the policy network
that incorporates global $l_\infty$ Lipschitz continuity and provide a
convenient method to enhance policy robustness based on the output margin.
Besides, a training framework is designed for SortRL, which solves given tasks
while maintaining robustness against $l_\infty$ bounded perturbations on the
observations. Several experiments are conducted to evaluate the effectiveness
of our method, including classic control tasks and video games. The results
demonstrate that SortRL achieves state-of-the-art robustness performance
against different perturbation strengths.
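As a rough illustration of the architectural idea described in the abstract (this is not the paper's actual SortRL network; the layer design, sizes, and the `certified_action` helper below are illustrative assumptions), a policy network can be made globally 1-Lipschitz under the $l_\infty$ norm by composing linear layers whose rows are $\ell_1$-normalized with a sorting activation, and the output margin can then be used to check the greedy action against $l_\infty$-bounded observation perturbations:

```python
# Minimal sketch (not the paper's exact SortRL architecture): a globally
# 1-Lipschitz policy network under the l_inf norm, plus an output-margin
# check for l_inf-bounded observation perturbations.
import torch
import torch.nn as nn


class LinfLinear(nn.Module):
    """Linear layer with Lipschitz constant <= 1 w.r.t. the l_inf norm."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) / in_dim)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The l_inf -> l_inf operator norm of W is the max absolute row sum,
        # so dividing each row by max(1, ||row||_1) enforces the constraint.
        row_l1 = self.weight.abs().sum(dim=1, keepdim=True).clamp(min=1.0)
        return x @ (self.weight / row_l1).t() + self.bias


class GroupSort(nn.Module):
    """Sort activations within small groups; 1-Lipschitz under any l_p norm."""

    def __init__(self, group_size: int = 2):
        super().__init__()
        self.group_size = group_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d = x.shape
        x = x.view(b, d // self.group_size, self.group_size)
        return x.sort(dim=-1).values.view(b, d)


class LinfLipschitzPolicy(nn.Module):
    """Composition of 1-Lipschitz blocks => globally 1-Lipschitz policy."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            LinfLinear(obs_dim, hidden), GroupSort(),
            LinfLinear(hidden, hidden), GroupSort(),
            LinfLinear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); returns action logits.
        return self.net(obs)

    @torch.no_grad()
    def certified_action(self, obs: torch.Tensor, eps: float):
        """Greedy action plus a flag: is it provably unchanged for every
        perturbation delta with ||delta||_inf <= eps?  Since the network is
        1-Lipschitz (l_inf -> l_inf), each logit moves by at most eps, so a
        margin larger than 2*eps over the runner-up certifies the action."""
        logits = self.forward(obs)
        top2 = logits.topk(2, dim=-1).values
        margin = top2[..., 0] - top2[..., 1]
        return logits.argmax(dim=-1), margin > 2.0 * eps
```

Because every logit of a 1-Lipschitz ($l_\infty \to l_\infty$) network moves by at most $\epsilon$ under a perturbation of radius $\epsilon$, a margin greater than $2\epsilon$ over the runner-up logit guarantees the chosen action cannot change; this is the kind of margin-based robustness criterion the abstract refers to.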
Related papers
- Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach [1.519321208145928]
In this paper, we investigate the sensitivity of deep RL policies to a single small state perturbation in deterministic continuous control tasks.
We show that RL policies can be deterministically chaotic as small perturbations to the system state have a large impact on subsequent state and reward trajectories.
We propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation.
arXiv Detail & Related papers (2024-10-14T16:16:43Z)
- Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn [14.30387204093346]
Deep neural networks provide Reinforcement Learning (RL) with powerful function approximators to address large-scale decision-making problems.
One source of the challenges in RL is that output predictions can churn, leading to uncontrolled changes after each batch update for states not included in the batch.
We propose a method to reduce the chain effect across different settings, called Churn Approximated ReductIoN (CHAIN), which can be easily plugged into most existing DRL algorithms.
arXiv Detail & Related papers (2024-09-07T11:08:20Z)
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit the massive amount of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input (a minimal sketch of the smoothing step appears after this list).
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Causal Inference Q-Network: Toward Resilient Reinforcement Learning [57.96312207429202]
We consider a resilient DRL framework with observational interferences.
Under this framework, we propose a causal inference based DRL algorithm called the causal inference Q-network (CIQ).
Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences.
arXiv Detail & Related papers (2021-02-18T23:50:20Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving robustness in classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
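For the Policy Smoothing entry above, the sketch below shows only the basic smoothing step (acting by majority vote over noise-perturbed copies of the observation); the paper's actual contribution, the certificate on total reward, is not computed here, and the function and parameter names are illustrative assumptions rather than the authors' API:

```python
# Minimal sketch of randomized smoothing of a discrete policy: sample
# noise-perturbed observations and act with the majority-vote action.
import numpy as np


def smoothed_action(policy, obs: np.ndarray, sigma: float = 0.1,
                    n_samples: int = 100, rng=None) -> int:
    """Return the most frequent greedy action of `policy` (a callable that
    maps an observation to action logits) over Gaussian-perturbed copies of
    `obs`.  `sigma` and `n_samples` are illustrative defaults."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.zeros(policy(obs).shape[-1], dtype=int)
    for _ in range(n_samples):
        noisy_obs = obs + sigma * rng.standard_normal(obs.shape)
        votes[int(np.argmax(policy(noisy_obs)))] += 1
    return int(np.argmax(votes))
```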
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.