A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- URL: http://arxiv.org/abs/2406.14753v2
- Date: Thu, 22 Aug 2024 04:13:18 GMT
- Title: A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain,
- Abstract summary: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate our approach on several classical reinforcement learning tasks.
- Score: 7.081523472610874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our control-theoretic operator, a new control-policy-parameter gradient ascent theorem, and a specific gradient ascent algorithm based on this theorem. As a representative example, we adapt our approach to a particular control-theoretic framework and empirically evaluate its performance on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time of our control-theoretic approach over state-of-the-art baseline methods.
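For orientation, the sketch below illustrates plain gradient ascent on policy parameters with a REINFORCE-style estimator; it is not the paper's control-theoretic operator or its specific gradient ascent algorithm, and the linear-softmax policy, function names, and hyperparameters are illustrative assumptions only.

```python
# Minimal policy-gradient-ascent sketch (illustrative; NOT the paper's
# control-theoretic operator or algorithm). Assumes a linear-softmax policy.
import numpy as np

def softmax_policy(theta, features):
    """Action probabilities under a linear-softmax policy; theta has shape (n_features, n_actions)."""
    logits = features @ theta
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_step(theta, trajectory, alpha=0.01, gamma=0.99):
    """One gradient-ascent step on the expected return using the REINFORCE estimator."""
    grad = np.zeros_like(theta)
    G = 0.0
    for features, action, reward in reversed(trajectory):
        G = reward + gamma * G                          # discounted return-to-go
        probs = softmax_policy(theta, features)
        onehot = np.eye(theta.shape[1])[action]
        grad += G * np.outer(features, onehot - probs)  # grad_theta log pi(a|s), scaled by G
    return theta + alpha * grad                         # ascend: we maximize expected return

# Toy usage: one update from a two-step trajectory (3 features, 2 actions).
theta = np.zeros((3, 2))
trajectory = [(np.array([1.0, 0.0, 0.5]), 0, 1.0),
              (np.array([0.0, 1.0, 0.5]), 1, 0.0)]
theta = reinforce_step(theta, trajectory)
```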
Related papers
- A Pontryagin Perspective on Reinforcement Learning [11.56175346731332]
We introduce the paradigm of open-loop reinforcement learning, where a fixed action sequence is learned instead of a state-feedback policy.
We present three new algorithms: one robust model-based method and two sample-efficient model-free methods.
arXiv Detail & Related papers (2024-05-28T12:05:20Z)
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint [56.74058752955209]
This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF).
We first identify the primary challenge of existing popular methods such as offline PPO and offline DPO as a lack of strategic exploration of the environment.
We propose efficient algorithms with finite-sample theoretical guarantees.
arXiv Detail & Related papers (2023-12-18T18:58:42Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetric continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
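As a toy illustration of why antisymmetry yields exact momentum conservation (not the paper's continuous convolution layers; the kernel and particle setup below are assumptions), each pairwise velocity update is applied with equal and opposite sign, so the contributions cancel in the total:

```python
# Toy sketch: pairwise updates f(i, j) = -f(j, i) leave total momentum unchanged
# (assuming equal unit masses). Not the paper's antisymmetric continuous convolutions.
import numpy as np

def antisymmetric_pairwise_update(positions, velocities, kernel):
    """Apply pairwise interactions whose antisymmetry conserves total linear momentum."""
    dv = np.zeros_like(velocities)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            f = kernel(positions[i], positions[j])  # arbitrary (e.g. learned) pairwise term
            dv[i] += f
            dv[j] -= f                              # equal and opposite contribution
    return velocities + dv

# Check: total momentum (sum of velocities) is preserved up to floating-point error.
rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
kernel = lambda a, b: (a - b) * np.exp(-np.linalg.norm(a - b))
new_vel = antisymmetric_pairwise_update(pos, vel, kernel)
assert np.allclose(vel.sum(axis=0), new_vel.sum(axis=0))
```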
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies [26.04704565406123]
Gradient-based methods have been widely used for system design and optimization in diverse application domains.
Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning.
This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis.
arXiv Detail & Related papers (2022-10-10T16:13:34Z)
- Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse [15.134707391442236]
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control.
Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse.
arXiv Detail & Related papers (2022-06-28T02:56:12Z)
- FedControl: When Control Theory Meets Federated Learning [63.96013144017572]
We distinguish client contributions according to the performance of local learning and its evolution.
The technique is inspired by control theory, and its classification performance is evaluated extensively in the IID setting.
arXiv Detail & Related papers (2022-05-27T21:05:52Z)
- Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control [0.0]
We show that the principal AlphaZero/TD-Gammon ideas of approximation in value space and rollout apply very broadly to deterministic and stochastic optimal control problems.
These ideas can be effectively integrated with other important methodologies such as model predictive control, adaptive control, decentralized control, and neural network-based value and policy approximations.
arXiv Detail & Related papers (2021-08-20T19:17:35Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Optimal Energy Shaping via Neural Approximators [16.879710744315233]
A systematic approach to adjusting performance within a passivity-based control framework has yet to be developed.
We introduce optimal energy shaping as an enhancement of classical passivity-based control methods.
arXiv Detail & Related papers (2021-01-14T10:25:58Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.