Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$
Adaptive Control
- URL: http://arxiv.org/abs/2106.02249v1
- Date: Fri, 4 Jun 2021 04:28:46 GMT
- Title: Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$
Adaptive Control
- Authors: Yikun Cheng, Pan Zhao, Manan Gandhi, Bo Li, Evangelos Theodorou, Naira
Hovakimyan
- Abstract summary: A reinforcement learning (RL) policy could fail in a new/perturbed environment due to the existence of dynamic variations.
We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control.
Our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world.
- Score: 7.025818894763949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A reinforcement learning (RL) policy trained in a nominal environment could
fail in a new/perturbed environment due to the existence of dynamic variations.
Existing robust methods try to obtain a fixed policy for all envisioned dynamic
variation scenarios through robust or adversarial training. These methods could
lead to conservative performance due to emphasis on the worst case, and often
involve tedious modifications to the training environment. We propose an
approach to robustifying a pre-trained non-robust RL policy with
$\mathcal{L}_1$ adaptive control. Leveraging the capability of an
$\mathcal{L}_1$ control law in the fast estimation of and active compensation
for dynamic variations, our approach can significantly improve the robustness
of an RL policy trained in a standard (i.e., non-robust) way, either in a
simulator or in the real world. Numerical experiments are provided to validate
the efficacy of the proposed approach.
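To make the mechanism concrete, the following is a minimal sketch (in Python/NumPy) of how an $\mathcal{L}_1$ adaptive augmentation could wrap a frozen, pre-trained RL policy. It assumes a control-affine nominal model $\dot{x} = f(x) + B(x)u$, a piecewise-constant adaptation law, and a first-order low-pass filter; the names (L1Augmentation, f_nominal, B, u_rl) are illustrative placeholders, not the authors' implementation.

import numpy as np

class L1Augmentation:
    # Minimal illustrative sketch of an L1 adaptive augmentation around a
    # baseline (RL) control input; not the paper's actual code.

    def __init__(self, dim_x, dim_u, dt, a_s=-10.0, bandwidth=20.0):
        self.dt = dt
        self.a_s = a_s                        # Hurwitz gain of the state predictor
        self.alpha = np.exp(-bandwidth * dt)  # discrete first-order low-pass filter coefficient
        self.x_hat = np.zeros(dim_x)          # state-predictor state
        self.u_l1 = np.zeros(dim_u)           # filtered compensation input

    def update(self, x, u_rl, f_nominal, B):
        # 1) Piecewise-constant adaptation: back out the lumped disturbance that
        #    explains the prediction error accumulated over one sampling interval.
        x_tilde = self.x_hat - x
        phi = (np.exp(self.a_s * self.dt) - 1.0) / self.a_s
        sigma_hat = -np.exp(self.a_s * self.dt) * x_tilde / phi

        # 2) Cancel the matched component of the disturbance through a low-pass
        #    filter, so compensation acts only within the filter bandwidth (the
        #    key L1 trade-off between performance and robustness).
        sigma_matched = np.linalg.pinv(B) @ sigma_hat
        self.u_l1 = self.alpha * self.u_l1 - (1.0 - self.alpha) * sigma_matched
        u_total = u_rl + self.u_l1

        # 3) Propagate the state predictor with the disturbance estimate.
        x_hat_dot = f_nominal + B @ u_total + sigma_hat + self.a_s * x_tilde
        self.x_hat = self.x_hat + self.dt * x_hat_dot
        return u_total

At each control step the plant receives u_total = u_rl + u_l1, where u_rl comes from the unmodified RL policy; the policy is never retrained, and the augmentation only estimates and cancels the deviation of the true dynamics from the nominal model used by the state predictor.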
Related papers
- Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space [3.639580365066386]
We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training.
The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance.
The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
arXiv Detail & Related papers (2024-05-20T12:31:11Z)
- UDUC: An Uncertainty-driven Approach for Learning-based Robust Control [9.76247882232402]
Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics.
PE models are susceptible to mode collapse, resulting in non-robust control when faced with environments slightly different from the training set.
We introduce the $\textbf{u}$ncertainty-$\textbf{d}$riven rob$\textbf{u}$st $\textbf{c}$ontrol (UDUC) loss as an alternative objective for training PE models.
arXiv Detail & Related papers (2024-05-04T07:48:59Z)
- Constrained Reinforcement Learning Under Model Mismatch [18.05296241839688]
Existing studies on constrained reinforcement learning (RL) may obtain a policy that performs well in the training environment.
However, when deployed in a real environment, that policy may easily violate constraints that were satisfied during training, because of model mismatch between the training and real environments.
We develop a Robust Constrained Policy Optimization (RCPO) algorithm, the first that applies to large/continuous state spaces with theoretical guarantees on worst-case reward improvement and constraint violation at each training iteration.
arXiv Detail & Related papers (2024-05-02T14:31:52Z)
- Actor-Critic Reinforcement Learning with Phased Actor [10.577516871906816]
We propose a novel phased actor in actor-critic (PAAC) method to improve policy gradient estimation.
PAAC accounts for both $Q$ value and TD error in its actor update.
Results show that PAAC leads to significant performance improvement measured by total cost, learning variance, robustness, learning speed and success rate.
arXiv Detail & Related papers (2024-04-18T01:27:31Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation [42.66792060626531]
Existing robust and adaptive controllers can achieve impressive performance at the cost of heavy online onboard computations.
We extend an existing efficient Imitation Learning (IL) algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties.
arXiv Detail & Related papers (2023-03-28T02:22:47Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interaction with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.