Related papers: Extremum-Seeking Action Selection for Accelerating Policy Optimization

Extremum-Seeking Action Selection for Accelerating Policy Optimization

URL: http://arxiv.org/abs/2404.01598v1
Date: Tue, 2 Apr 2024 02:39:17 GMT
Title: Extremum-Seeking Action Selection for Accelerating Policy Optimization
Authors: Ya-Chien Chang, Sicun Gao,
Abstract summary: Reinforcement learning for control over continuous spaces typically uses high-entropy policies, such as Gaussian distributions, for local exploration and estimating policy to optimize performance. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC) Our methods can be easily added in standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.
Score: 18.162794442835413
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and estimating policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In such cases, most samples taken from the ambient action space generate low-value trajectories that hardly contribute to policy improvement, resulting in slow or failed learning. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC). On each action sampled from stochastic policies, we apply sinusoidal perturbations and query for estimated Q-values as the response signal. Based on ESC, we then dynamically improve the sampled actions to be closer to nearby optima before applying them to the environment. Our methods can be easily added in standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.

Related papers

Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. In common practice, convergence (hyper)policies are learned only to deploy their deterministic version. We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy efficiency. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces. We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
Robust Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
In continuous action domains, parameterized distribution of action distribution allows easy control of exploration. In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution. We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym.
arXiv Detail & Related papers (2022-12-14T22:43:56Z)
Hierarchical Policy Blending as Inference for Reactive Robot Control [21.058662668187875]
Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics. We propose a hierarchical motion generation method that combines the benefits of reactive policies and planning. Our experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
arXiv Detail & Related papers (2022-10-14T15:16:54Z)
Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account. We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control [8.720903734757627]
We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass. We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
arXiv Detail & Related papers (2022-02-28T09:31:15Z)
Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC. Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies. Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
Adversarially Regularized Policy Learning Guided by Trajectory Optimization [31.122262331980153]
We propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies. Our proposed approach improves the sample efficiency of neural policy learning and enhances the robustness of the policy against various types of disturbances.
arXiv Detail & Related papers (2021-09-16T00:02:11Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.