Robust Constrained Reinforcement Learning for Continuous Control with
Model Misspecification
- URL: http://arxiv.org/abs/2010.10644v4
- Date: Wed, 3 Mar 2021 09:54:49 GMT
- Title: Robust Constrained Reinforcement Learning for Continuous Control with
Model Misspecification
- Authors: Daniel J. Mankowitz and Dan A. Calian and Rae Jeong and Cosmin
Paduraru and Nicolas Heess and Sumanth Dathathri and Martin Riedmiller and
Timothy Mann
- Abstract summary: Real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on.
Such effects effectively perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain.
This can affect a policy's ability to maximize future rewards as well as the extent to which it satisfies constraints.
We present an algorithm that mitigates this form of misspecification, and showcase its performance in multiple simulated Mujoco tasks from the Real World Reinforcement Learning (RWRL)
- Score: 26.488582821511972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world physical control systems are required to satisfy constraints
upon deployment. Furthermore, real-world systems are often subject to effects
such as non-stationarity, wear-and-tear, uncalibrated sensors and so on. Such
effects effectively perturb the system dynamics and can cause a policy trained
successfully in one domain to perform poorly when deployed to a perturbed
version of the same domain. This can affect a policy's ability to maximize
future rewards as well as the extent to which it satisfies constraints. We
refer to this as constrained model misspecification. We present an algorithm
that mitigates this form of misspecification, and showcase its performance in
multiple simulated Mujoco tasks from the Real World Reinforcement Learning
(RWRL) suite.
Related papers
- Neural Internal Model Control: Learning a Robust Control Policy via Predictive Error Feedback [16.46487826869775]
We propose a novel framework, Neural Internal Model Control, which integrates model-based control with RL-based control to enhance robustness.
Our framework streamlines the predictive model by applying Newton-Euler equations for rigid-body dynamics, eliminating the need to capture complex high-dimensional nonlinearities.
We demonstrate the effectiveness of our framework on both quadrotors and quadrupedal robots, achieving superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-11-20T07:07:42Z) - Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach [1.519321208145928]
In this paper, we investigate the perturbation of deep RL policies to a single small state in deterministic continuous control tasks.
We show that RL policies can be deterministically chaotic, as small perturbations to the system state have a large impact on subsequent state and reward trajectories.
We propose an improvement on the successful Dreamer V3 architecture, implementing Maximal Lyapunov Exponent regularisation.
arXiv Detail & Related papers (2024-10-14T16:16:43Z) - Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
offline reinforcement learning (RL) paradigm provides recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - An Adaptive Fuzzy Reinforcement Learning Cooperative Approach for the
Autonomous Control of Flock Systems [4.961066282705832]
This work introduces an adaptive distributed robustness technique for the autonomous control of flock systems.
Its relatively flexible structure is based on online fuzzy reinforcement learning schemes which simultaneously target a number of objectives.
In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent position as a feedback signal.
arXiv Detail & Related papers (2023-03-17T13:07:35Z) - Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - Failure-averse Active Learning for Physics-constrained Systems [7.701064815584088]
We develop a novel active learning method that avoids failures considering implicit physics constraints that govern the system.
The proposed approach is driven by two tasks: the safe variance reduction explores the safe region to reduce the variance of the target model, and the safe region expansion aims to extend the explorable region exploiting the probabilistic model of constraints.
The method is applied to the composite fuselage assembly process with consideration of material failure using the Tsai-wu criterion, and it is able to achieve zero-failure without the knowledge of explicit failure regions.
arXiv Detail & Related papers (2021-10-27T14:01:03Z) - From Simulation to Real World Maneuver Execution using Deep
Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proved to be able to solve many control tasks in different fields, but the behavior of these systems is not always as expected when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data together with the absence of distinction between train and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
arXiv Detail & Related papers (2020-05-13T14:22:20Z) - Online Constrained Model-based Reinforcement Learning [13.362455603441552]
Key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO)
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.