Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies
- URL: http://arxiv.org/abs/2210.04810v1
- Date: Mon, 10 Oct 2022 16:13:34 GMT
- Title: Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies
- Authors: Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar
- Abstract summary: Gradient-based methods have been widely used for system design and optimization in diverse application domains.
Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning.
This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis.
- Score: 26.04704565406123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based methods have been widely used for system design and
optimization in diverse application domains. Recently, there has been a renewed
interest in studying theoretical properties of these methods in the context of
control and reinforcement learning. This article surveys some of the recent
developments on policy optimization, a gradient-based iterative approach for
feedback control synthesis, popularized by successes of reinforcement learning.
We take an interdisciplinary perspective in our exposition that connects
control theory, reinforcement learning, and large-scale optimization. We review
a number of recently developed theoretical results on the optimization
landscape, global convergence, and sample complexity of gradient-based methods
for various continuous control problems such as the linear quadratic regulator
(LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic
Gaussian (LQG) control, and output feedback synthesis. In conjunction with
these optimization results, we also discuss how direct policy optimization
handles stability and robustness concerns in learning-based control, two main
desiderata in control engineering. We conclude the survey by pointing out
several challenges and opportunities at the intersection of learning and
control.
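As a concrete illustration of the policy optimization approach surveyed above, the sketch below runs plain gradient descent on the LQR cost $J(K)$ over static state-feedback gains $u_t = -Kx_t$, using the standard closed-form cost and gradient obtained from two discrete Lyapunov equations. This is a minimal, model-based sketch under assumed illustrative system matrices, step size, and iteration count (none taken from the paper); the survey also treats model-free variants where the gradient is estimated from closed-loop rollouts.

```python
# Minimal sketch: model-based policy gradient for discrete-time LQR with u_t = -K x_t.
# System matrices, step size, and iteration count are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.2],
              [0.0, 0.8]])   # open-loop stable, so K = 0 is a stabilizing start
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                # state cost
R = np.eye(1)                # input cost
Sigma0 = np.eye(2)           # covariance of the initial state x_0

def lqr_cost_and_grad(K):
    """Exact cost J(K) = trace(P_K Sigma0) and its gradient (standard LQR formulas)."""
    A_K = A - B @ K
    # Cost-to-go matrix: P_K = Q + K' R K + A_K' P_K A_K
    P = solve_discrete_lyapunov(A_K.T, Q + K.T @ R @ K)
    # Closed-loop state covariance: Sigma_K = Sigma0 + A_K Sigma_K A_K'
    Sigma = solve_discrete_lyapunov(A_K, Sigma0)
    J = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return J, grad

K = np.zeros((1, 2))         # stabilizing initial gain
eta = 5e-3                   # step size (illustrative)
for _ in range(1000):
    _, grad = lqr_cost_and_grad(K)
    K = K - eta * grad       # plain gradient descent on the nonconvex cost J(K)

print("final cost:", lqr_cost_and_grad(K)[0], "gain:", K)
```

Despite the nonconvexity of $J(K)$ over the set of stabilizing gains, gradient descent of this form is known to converge to the globally optimal gain, which is one of the landscape and global-convergence results the survey reviews; replacing the exact gradient with a zeroth-order estimate from rollouts recovers the sample-complexity setting it also discusses.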
Related papers
- Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System [0.7499722271664147]
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a Quanser Aero 2 system.
PPO excels in rise time and adaptability, making it a promising approach for applications requiring rapid response.
arXiv Detail & Related papers (2024-08-28T08:35:34Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms [7.081523472610874]
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate our approach on several classical reinforcement learning tasks.
arXiv Detail & Related papers (2024-06-20T21:50:46Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control [7.242569453287703]
We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization.
Our results underscore the superiority of offline supervised learning in terms of both optimality and training time.
We propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control.
arXiv Detail & Related papers (2022-11-29T05:07:13Z)
- Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies more quickly.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Comparative analysis of machine learning methods for active flow control [60.53767050487434]
Genetic Programming (GP) and Reinforcement Learning (RL) are gaining popularity in flow control.
This work presents a comparative analysis of the two, benchmarking some of their most representative algorithms against global optimization techniques.
arXiv Detail & Related papers (2022-02-23T18:11:19Z)
- Sparsity in Partially Controllable Linear Systems [56.142264865866636]
We study partially controllable linear dynamical systems specified by an underlying sparsity pattern.
Our results characterize those state variables which are irrelevant for optimal control.
arXiv Detail & Related papers (2021-10-12T16:41:47Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
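To make the last entry above concrete, the sketch below shows a generic Frank-Wolfe (projection-free) step over a convex action set, here an $\ell_2$-ball, illustrating how feasibility can be maintained by a linear-minimization oracle kept separate from the gradient computation. This is an illustrative sketch only, not the algorithm of the cited paper; the quadratic surrogate objective, the ball radius, and all names are assumptions.

```python
# Illustrative only: Frank-Wolfe (projection-free) maximization over an
# l2-ball action set {a : ||a|| <= a_max}. Not the cited paper's algorithm;
# the surrogate objective and constants are assumptions for illustration.
import numpy as np

a_max = 1.0                                  # action-norm bound (assumed)
target = np.array([1.5, -0.5])               # hypothetical unconstrained maximizer

def surrogate_grad(a):
    """Gradient of an assumed concave surrogate -||a - target||^2 of Q(s, a)."""
    return -2.0 * (a - target)

def frank_wolfe_action(a0, steps=50):
    a = a0.copy()
    for t in range(steps):
        g = surrogate_grad(a)
        # Linear maximization oracle over the l2-ball: boundary point along g.
        s = a_max * g / (np.linalg.norm(g) + 1e-12)
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        a = a + gamma * (s - a)              # convex combination: stays feasible
    return a

a_star = frank_wolfe_action(np.zeros(2))
print("feasible action:", a_star, "norm:", np.linalg.norm(a_star))
```

The same step applies to other convex action sets (e.g. boxes or simplices) by swapping the linear-minimization oracle.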
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.