Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
- URL: http://arxiv.org/abs/2410.10674v1
- Date: Mon, 14 Oct 2024 16:16:43 GMT
- Title: Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
- Authors: Rory Young, Nicolas Pugeault
- Abstract summary: In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks.
We show that RL policies can be deterministically chaotic as small perturbations to the system state have a large impact on subsequent state and reward trajectories.
We propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation.
- Score: 1.519321208145928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is because the learned policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: First, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; Second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of Deep Reinforcement Learning for real-world applications.
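The abstract's central quantity, the maximal Lyapunov exponent, measures the average exponential rate at which nearby trajectories diverge; a positive value indicates chaotic closed-loop dynamics. The paper's regulariser is not reproduced here, but the exponent itself can be estimated with the standard Benettin-style renormalisation procedure, sketched below on the chaotic logistic map as a stand-in for closed-loop state dynamics (all names and parameters are illustrative):

```python
import math

def logistic(x, r=4.0):
    # Chaotic logistic map, standing in for deterministic closed-loop dynamics.
    return r * x * (1.0 - x)

def max_lyapunov_exponent(step, x0, n_steps=10_000, eps=1e-9):
    """Benettin-style estimate: evolve a reference and a perturbed trajectory,
    accumulate the log of each step's stretch factor, and renormalise the
    separation back to eps so it stays infinitesimal."""
    x, x_pert = x0, x0 + eps
    total = 0.0
    for _ in range(n_steps):
        x, x_pert = step(x), step(x_pert)
        d = abs(x_pert - x)
        if d == 0.0:
            x_pert = x + eps  # trajectories coincided; re-seed the perturbation
            continue
        total += math.log(d / eps)
        x_pert = x + eps * (x_pert - x) / d  # rescale separation back to eps
    return total / n_steps

mle = max_lyapunov_exponent(logistic, x0=0.3)
```

For the logistic map with r = 4 the exponent is known to be ln 2 ≈ 0.693, which makes it a convenient sanity check for the estimator; a regulariser like the paper's would penalise a positive estimate of this quantity for the learnt policy's state dynamics.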
Related papers
- Robust off-policy Reinforcement Learning via Soft Constrained Adversary [0.7583052519127079]
We introduce an f-divergence constrained problem with the prior knowledge distribution.
We derive two typical attacks and their corresponding robust learning frameworks.
Results demonstrate that our proposed methods achieve excellent performance in sample-efficient off-policy RL.
arXiv Detail & Related papers (2024-08-31T11:13:33Z) - Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage.
Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy.
We propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states.
arXiv Detail & Related papers (2024-03-06T20:52:49Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
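The summary describes gating the task objective by a safety critic. The authors' actual mechanism is not given here; a toy illustration of the idea, with all names and the sigmoid gate being assumptions, is to smoothly down-weight task reward as estimated risk approaches a threshold:

```python
import math

def suppressed_objective(task_reward, risk, risk_threshold=0.5, sharpness=10.0):
    """Toy adaptive objective suppression: down-weight the task reward as a
    safety critic's estimated risk approaches its threshold, so the agent
    stops chasing reward in states it judges dangerous."""
    gate = 1.0 / (1.0 + math.exp(sharpness * (risk - risk_threshold)))
    return gate * task_reward
```

In safe states (low risk) the gate is close to 1 and the task objective passes through; in risky states it collapses toward 0, suppressing the reward-maximising objective uniformly rather than only in expectation.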
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Improve Robustness of Reinforcement Learning against Observation Perturbations via $l_\infty$ Lipschitz Policy Networks [8.39061976254379]
Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks.
Recent works have revealed that DRL agents are susceptible to slight perturbations in observations.
We propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations.
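SortRL's exact construction is not detailed in this summary; one standard route to provably Lipschitz policy networks, consistent with the method's name, pairs norm-constrained linear layers with a sorting activation. A minimal sketch of such an activation (illustrative, not the authors' implementation):

```python
import numpy as np

def groupsort(x, group_size=2):
    """GroupSort activation: sort each consecutive group of pre-activations.
    Sorting only permutes coordinates, so this map is 1-Lipschitz in every
    l_p norm; stacked with norm-constrained linear layers it bounds the
    network's overall Lipschitz constant w.r.t. input perturbations."""
    x = np.asarray(x, dtype=float)
    assert x.size % group_size == 0, "feature dimension must divide evenly"
    return np.sort(x.reshape(-1, group_size), axis=1).reshape(x.shape)
```

A bounded Lipschitz constant directly limits how much a small observation perturbation can change the chosen action, which is the robustness property at stake.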
arXiv Detail & Related papers (2023-12-14T08:57:22Z) - Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates guaranteeing that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
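Deriving the certificate is beyond this summary, but the object it applies to, a noise-smoothed policy, is easy to sketch: average the base policy's output over Gaussian observation noise (a hypothetical Monte Carlo sketch; names and parameters are illustrative):

```python
import numpy as np

def smoothed_action(policy, obs, sigma=0.1, n_samples=64, seed=0):
    """Monte Carlo approximation of a Gaussian-smoothed policy: the action is
    the base policy's output averaged over noise added to the observation.
    Smoothing the policy is what makes certified reward lower bounds against
    norm-bounded input perturbations tractable."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + obs.shape)
    return np.mean([policy(obs + n) for n in noise], axis=0)

# With an identity "policy", smoothing should roughly recover the observation.
act = smoothed_action(lambda o: o, obs=[0.5, -0.5])
```

The noise scale sigma trades off clean performance against the radius of perturbations the certificate can cover.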
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
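The augmentation described above can be illustrated with a toy primal-dual loop: the Lagrange multiplier is updated by dual ascent and appended to the observation so the policy can condition on it (a minimal sketch under assumed names and rates, not the paper's algorithm):

```python
def dual_ascent_step(lmbda, accumulated, threshold, lr=0.05):
    """One dual-ascent update: the multiplier grows while the accumulated
    constraint reward falls short of its threshold, and is projected back
    to zero once the constraint is satisfied."""
    return max(0.0, lmbda + lr * (threshold - accumulated))

def augment_state(obs, lmbda):
    # The policy conditions on the multiplier, so a single stationary policy
    # can realise different task/constraint trade-offs as lambda varies.
    return list(obs) + [lmbda]

lmbda = 0.0
for _ in range(100):  # constraint persistently violated: accumulated < threshold
    lmbda = dual_ascent_step(lmbda, accumulated=0.5, threshold=1.0)
aug = augment_state([0.1, -0.2], lmbda)
```

Because the multiplier keeps growing while a constraint is violated, the augmented policy is steered toward satisfying thresholds that no fixed weighted combination of rewards could induce.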
arXiv Detail & Related papers (2021-02-23T21:07:35Z) - Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that policies learned with low exploration probabilities are more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z) - Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification [26.488582821511972]
Real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on.
Such effects perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain.
This can affect a policy's ability to maximize future rewards as well as the extent to which it satisfies constraints.
We present an algorithm that mitigates this form of misspecification and showcase its performance in multiple simulated MuJoCo tasks from the Real World Reinforcement Learning (RWRL) suite.
arXiv Detail & Related papers (2020-10-20T22:05:37Z) - Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.