A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- URL: http://arxiv.org/abs/2406.14753v3
- Date: Wed, 27 Nov 2024 20:34:29 GMT
- Title: A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain
- Abstract summary: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate the performance of our control-theoretic approach on several classical reinforcement learning tasks.
- Score: 7.081523472610874
- Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control-theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements over state-of-the-art methods in solution quality, sample complexity, and running time.
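For orientation, the classical tabular Q-learning update, whose Bellman-operator analog the paper studies, can be sketched as follows. The Gym-style environment interface and hyperparameters are illustrative assumptions; this is the textbook baseline, not the authors' control-theoretic algorithm.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Textbook tabular Q-learning. Assumes a Gym-style interface:
    env.reset() -> state, env.step(a) -> (state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)
            # move Q(s, a) toward the one-step Bellman target
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```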
Related papers
- A Pontryagin Perspective on Reinforcement Learning [11.56175346731332]
We introduce the paradigm of open-loop reinforcement learning, where a fixed action sequence is learned instead of a closed-loop policy.
We present three new algorithms: one robust model-based method and two sample-efficient model-free methods.
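A minimal sketch of the open-loop idea: optimize a fixed action sequence directly, here by finite-difference gradient descent through a known linear model with quadratic cost. The dynamics, cost, and optimizer are illustrative assumptions, not the paper's methods.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed known linear dynamics
B = np.array([0.0, 0.1])
X0 = np.array([1.0, 0.0])

def rollout_cost(actions):
    """Quadratic cost of executing a fixed action sequence open-loop
    on x' = A x + B u (illustrative model, not the paper's benchmarks)."""
    x, cost = X0, 0.0
    for u in actions:
        cost += x @ x + 0.01 * u * u
        x = A @ x + B * u
    return cost + x @ x

def optimize_open_loop(horizon=20, iters=200, lr=0.05, h=1e-5):
    """Gradient descent on the action sequence itself (no policy),
    using finite-difference gradients of the rollout cost."""
    u = np.zeros(horizon)
    for _ in range(iters):
        grad = np.array([(rollout_cost(u + h * e) - rollout_cost(u - h * e)) / (2 * h)
                         for e in np.eye(horizon)])
        u -= lr * grad
    return u
```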
arXiv Detail & Related papers (2024-05-28T12:05:20Z)
- Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
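A minimal sketch of the mean-embedding idea: summarize a return distribution by the average of fixed feature functions over samples, and push the distributional Bellman target r + gamma * Z' through the same embedding. The Fourier features and sample-based backup below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

FREQS = np.linspace(0.1, 2.0, 8)   # illustrative Fourier feature frequencies

def embed(return_samples):
    """Finite-dimensional mean embedding of a return distribution:
    the average of fixed sin/cos features over the samples."""
    z = np.asarray(return_samples, dtype=float)[:, None] * FREQS[None, :]
    return np.concatenate([np.sin(z), np.cos(z)], axis=1).mean(axis=0)

def bellman_target_embedding(rewards, next_return_samples, gamma=0.99):
    """Sample-based distributional Bellman backup: embed the targets
    r + gamma * Z' over all reward / next-return sample pairs."""
    targets = (np.asarray(rewards, dtype=float)[:, None]
               + gamma * np.asarray(next_return_samples, dtype=float)[None, :])
    return embed(targets.ravel())
```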
arXiv Detail & Related papers (2023-12-09T11:36:14Z)
- Meta-Learning Strategies through Value Maximization in Neural Networks [7.285835869818669]
We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective.
We apply this framework to investigate the effect of approximations in common meta-learning algorithms.
Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning.
arXiv Detail & Related papers (2023-10-30T18:29:26Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
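For intuition, pairwise preference feedback is commonly modeled with a Bradley-Terry likelihood; a minimal sketch with an assumed linear reward model follows (the paper's framework is more general and not reproduced here).

```python
import numpy as np

def preference_loglik(theta, feats_a, feats_b, prefs):
    """Bradley-Terry log-likelihood for pairwise trajectory preferences:
    P(a preferred over b) = sigmoid(r(a) - r(b)), with an assumed linear
    reward model r(traj) = feats(traj) @ theta for illustration."""
    logits = feats_a @ theta - feats_b @ theta   # estimated reward gap per pair
    # prefs[i] = 1 if trajectory a was preferred in pair i, else 0
    return float(np.sum(prefs * logits - np.logaddexp(0.0, logits)))
```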
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
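A minimal sketch of why antisymmetric pairwise interactions conserve momentum: every force is applied together with its exact opposite, so the momentum changes cancel by construction. The toy interaction function stands in for the paper's learned antisymmetric continuous convolutions.

```python
import numpy as np

def pairwise_forces(positions, interaction):
    """Apply each pairwise interaction with its exact opposite
    (F_ij = -F_ji), so the total momentum change is zero by construction.
    `interaction` stands in for a learned antisymmetric layer."""
    forces = np.zeros_like(positions)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            f = interaction(positions[i] - positions[j])
            forces[i] += f
            forces[j] -= f   # equal and opposite
    return forces

# Any interaction used this way conserves total momentum exactly.
pos = np.random.randn(5, 2)
F = pairwise_forces(pos, lambda d: d / (1.0 + d @ d))
assert np.allclose(F.sum(axis=0), 0.0)
```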
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies [26.04704565406123]
Gradient-based methods have been widely used for system design and optimization in diverse application domains.
Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning.
This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis.
arXiv Detail & Related papers (2022-10-10T16:13:34Z)
- Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse [15.134707391442236]
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control.
Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse.
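A minimal sketch of the sample-reuse idea: evaluate a clipped, importance-weighted improvement surrogate on data gathered under several recent policies. The PPO-style clipping here is an illustrative stand-in for the paper's generalized improvement bound.

```python
import numpy as np

def gpi_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """Clipped, importance-weighted improvement surrogate evaluated on
    samples from several recent policies (sample reuse). logp_old are
    log-probabilities under whichever prior policy generated each sample."""
    ratio = np.exp(logp_new - logp_old)          # per-sample importance weight
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```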
arXiv Detail & Related papers (2022-06-28T02:56:12Z)
- FedControl: When Control Theory Meets Federated Learning [63.96013144017572]
We distinguish client contributions according to the performance of local learning and its evolution.
The technique is inspired by control theory, and its classification performance is evaluated extensively in the IID setting.
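A minimal sketch of control-flavored client weighting: the server weights each client's update by the recent improvement of its local loss. The specific weighting rule is an illustrative assumption, not the paper's controller.

```python
import numpy as np

def aggregate_updates(client_updates, prev_losses, curr_losses, eps=1e-8):
    """Weight each client's parameter update by the recent improvement of
    its local loss (illustrative rule; the paper's controller differs)."""
    improvement = np.maximum(np.asarray(prev_losses) - np.asarray(curr_losses), 0.0)
    weights = improvement + eps                  # eps keeps stalled clients in play
    weights = weights / weights.sum()
    # weighted average over clients; client_updates has one row per client
    return weights @ np.asarray(client_updates)
```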
arXiv Detail & Related papers (2022-05-27T21:05:52Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
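For contrast with the squared Bellman error, a convex logistic-style penalty on temporal-difference errors can be sketched as below; the paper derives its specific logistic Bellman error from a regularized LP formulation, which this illustration does not reproduce.

```python
import numpy as np

def logistic_td_loss(q_sa, rewards, q_next, gamma=0.99, eta=1.0):
    """Convex softplus-style penalty on TD errors, a stand-in for the
    squared Bellman error (not the paper's exact logistic Bellman error)."""
    td = rewards + gamma * q_next - q_sa          # temporal-difference errors
    # symmetric softplus: smooth and convex, grows like |td| for large errors
    return float(np.mean(np.logaddexp(0.0, eta * td) + np.logaddexp(0.0, -eta * td)) / eta)
```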
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI (control as hybrid inference) that naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)