Adversarially Regularized Policy Learning Guided by Trajectory
Optimization
- URL: http://arxiv.org/abs/2109.07627v1
- Date: Thu, 16 Sep 2021 00:02:11 GMT
- Title: Adversarially Regularized Policy Learning Guided by Trajectory
Optimization
- Authors: Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao
- Abstract summary: We propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies.
Our proposed approach improves the sample efficiency of neural policy learning and enhances the robustness of the policy against various types of disturbances.
- Score: 31.122262331980153
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancement in combining trajectory optimization with function
approximation (especially neural networks) shows promise in learning complex
control policies for diverse tasks in robot systems. Despite their great
flexibility, the large neural networks for parameterizing control policies
impose significant challenges. The learned neural control policies are often
overcomplex and non-smooth, which can easily cause unexpected or diverging
robot motions. Therefore, they often yield poor generalization performance in
practice. To address this issue, we propose adVErsarially Regularized pOlicy
learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth
control policies. Specifically, our proposed approach controls the smoothness
(local Lipschitz continuity) of the neural control policies by stabilizing the
output control with respect to the worst-case perturbation to the input state.
Our experiments on robot manipulation show that our proposed approach not only
improves the sample efficiency of neural policy learning but also enhances the
robustness of the policy against various types of disturbances, including
sensor noise, environmental uncertainty, and model mismatch.
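To make the core idea concrete: the smoothness regularizer penalizes how much the policy output can change under a small, adversarially chosen perturbation of the input state. Below is a minimal, hypothetical PyTorch sketch of that kind of adversarial regularization; the network architecture, the projected-gradient inner loop, and all hyperparameters (epsilon, step_size, n_steps, lam) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP policy mapping states to control actions (illustrative only)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s):
        return self.net(s)

def worst_case_perturbation(policy, states, epsilon=0.05, step_size=0.02, n_steps=5):
    """Approximate the state perturbation inside an L-inf ball of radius epsilon
    that maximally changes the policy output, via projected gradient ascent."""
    base = policy(states).detach()
    delta = (1e-3 * torch.randn_like(states)).requires_grad_(True)  # small random start
    for _ in range(n_steps):
        deviation = (policy(states + delta) - base).pow(2).sum(dim=-1).mean()
        grad, = torch.autograd.grad(deviation, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()      # ascent on the output deviation
            delta.clamp_(-epsilon, epsilon)       # project back into the ball
    return delta.detach()

def smoothness_regularized_loss(policy, states, target_actions, lam=1.0, epsilon=0.05):
    """Imitation loss against trajectory-optimization targets plus an
    adversarial smoothness penalty at the worst-case perturbed states."""
    imitation = (policy(states) - target_actions).pow(2).sum(dim=-1).mean()
    delta = worst_case_perturbation(policy, states, epsilon=epsilon)
    smoothness = (policy(states + delta) - policy(states)).pow(2).sum(dim=-1).mean()
    return imitation + lam * smoothness

# Usage on random placeholder data (in practice, states and reference actions
# would come from trajectory optimization).
policy = PolicyNet(state_dim=8, action_dim=2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(128, 8)
ref_actions = torch.randn(128, 2)
loss = smoothness_regularized_loss(policy, states, ref_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The inner loop approximates the worst-case state perturbation with a few signed-gradient ascent steps inside an L-infinity ball; the outer loss then adds the resulting output deviation as a penalty on top of a standard imitation loss on actions produced by trajectory optimization.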
Related papers
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that adaptive control resolution, combined with value decomposition, yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks; a minimal illustrative sketch of coarse-to-fine action growth appears after this list.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - Extremum-Seeking Action Selection for Accelerating Policy Optimization [18.162794442835413]
Reinforcement learning for control over continuous spaces typically uses high-entropy policies, such as Gaussian distributions, for local exploration and for estimating the policy gradient to optimize performance.
We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC).
Our methods can be easily added in standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.
arXiv Detail & Related papers (2024-04-02T02:39:17Z) - Closed-form control with spike coding networks [1.1470070927586016]
Efficient and robust control using spiking neural networks (SNNs) is still an open problem.
We extend the neuroscience theory of Spike Coding Networks (SCNs) by incorporating closed-form optimal estimation and control.
We demonstrate robust spiking control of simulated spring-mass-damper and cart-pole systems.
arXiv Detail & Related papers (2022-12-25T10:32:20Z) - Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z) - Learning High-Level Policies for Model Predictive Control [54.00297896763184]
Model Predictive Control (MPC) provides robust solutions to robot control tasks.
We propose a self-supervised learning algorithm for learning a high-level neural network policy.
We show that our approach can handle situations that are difficult for standard MPC.
arXiv Detail & Related papers (2020-07-20T17:12:34Z) - PFPN: Continuous Control of Physically Simulated Characters using
Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
arXiv Detail & Related papers (2020-03-16T00:35:36Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.