A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- URL: http://arxiv.org/abs/2406.14753v2
- Date: Thu, 22 Aug 2024 04:13:18 GMT
- Title: A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
- Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain,
- Abstract summary: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate our approach on several classical reinforcement learning tasks.
- Score: 7.081523472610874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our control-theoretic operator, a new control-policy-parameter gradient ascent theorem, and a specific gradient ascent algorithm based on this theorem. As a representative example, we adapt our approach to a particular control-theoretic framework and empirically evaluate its performance on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time of our control-theoretic approach over state-of-the-art baseline methods.
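For orientation, the sketch below illustrates plain gradient ascent on policy parameters with a REINFORCE-style estimator; it is not the paper's control-theoretic operator or its specific gradient ascent algorithm, and the linear-softmax policy, function names, and hyperparameters are illustrative assumptions only.

```python
# Minimal policy-gradient-ascent sketch (illustrative; NOT the paper's
# control-theoretic operator or algorithm). Assumes a linear-softmax policy.
import numpy as np

def softmax_policy(theta, features):
    """Action probabilities under a linear-softmax policy; theta has shape (n_features, n_actions)."""
    logits = features @ theta
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_step(theta, trajectory, alpha=0.01, gamma=0.99):
    """One gradient-ascent step on the expected return using the REINFORCE estimator."""
    grad = np.zeros_like(theta)
    G = 0.0
    for features, action, reward in reversed(trajectory):
        G = reward + gamma * G                          # discounted return-to-go
        probs = softmax_policy(theta, features)
        onehot = np.eye(theta.shape[1])[action]
        grad += G * np.outer(features, onehot - probs)  # grad_theta log pi(a|s), scaled by G
    return theta + alpha * grad                         # ascend: we maximize expected return

# Toy usage: one update from a two-step trajectory (3 features, 2 actions).
theta = np.zeros((3, 2))
trajectory = [(np.array([1.0, 0.0, 0.5]), 0, 1.0),
              (np.array([0.0, 1.0, 0.5]), 1, 0.0)]
theta = reinforce_step(theta, trajectory)
```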
Related papers
- A Pontryagin Perspective on Reinforcement Learning [11.56175346731332]
We introduce the paradigm of open-loop reinforcement learning, where a fixed action sequence is learned instead of a state-feedback policy.
We present three new algorithms: one robust model-based method and two sample-efficient model-free methods.
arXiv Detail & Related papers (2024-05-28T12:05:20Z)
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint [56.74058752955209]
This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF).
We first identify the primary challenge of existing popular methods such as offline PPO and offline DPO as a lack of strategic exploration of the environment.
We propose efficient algorithms with finite-sample theoretical guarantees.
arXiv Detail & Related papers (2023-12-18T18:58:42Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetric continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
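As a toy illustration of why antisymmetry yields exact momentum conservation (not the paper's continuous convolution layers; the kernel and particle setup below are assumptions), each pairwise velocity update is applied with equal and opposite sign, so the contributions cancel in the total:

```python
# Toy sketch: pairwise updates f(i, j) = -f(j, i) leave total momentum unchanged
# (assuming equal unit masses). Not the paper's antisymmetric continuous convolutions.
import numpy as np

def antisymmetric_pairwise_update(positions, velocities, kernel):
    """Apply pairwise interactions whose antisymmetry conserves total linear momentum."""
    dv = np.zeros_like(velocities)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            f = kernel(positions[i], positions[j])  # arbitrary (e.g. learned) pairwise term
            dv[i] += f
            dv[j] -= f                              # equal and opposite contribution
    return velocities + dv

# Check: total momentum (sum of velocities) is preserved up to floating-point error.
rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
kernel = lambda a, b: (a - b) * np.exp(-np.linalg.norm(a - b))
new_vel = antisymmetric_pairwise_update(pos, vel, kernel)
assert np.allclose(vel.sum(axis=0), new_vel.sum(axis=0))
```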
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies [26.04704565406123]
Gradient-based methods have been widely used for system design and optimization in diverse application domains.
Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning.
This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis.
arXiv Detail & Related papers (2022-10-10T16:13:34Z)
- Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse [15.134707391442236]
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control.
Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse.
arXiv Detail & Related papers (2022-06-28T02:56:12Z)
- FedControl: When Control Theory Meets Federated Learning [63.96013144017572]
We distinguish client contributions according to the performance of local learning and its evolution.
The technique is inspired by control theory, and its classification performance is evaluated extensively in the IID setting.
arXiv Detail & Related papers (2022-05-27T21:05:52Z)
- Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control [0.0]
We show that the principal AlphaZero/TD-Gammon ideas of approximation in value space and rollout apply very broadly to deterministic and stochastic optimal control problems.
These ideas can be effectively integrated with other important methodologies such as model predictive control, adaptive control, decentralized control, and neural network-based value and policy approximations.
arXiv Detail & Related papers (2021-08-20T19:17:35Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Optimal Energy Shaping via Neural Approximators [16.879710744315233]
A systematic approach to adjusting performance within a passivity-based control framework has yet to be developed.
We introduce optimal energy shaping as an enhancement of classical passivity-based control methods.
arXiv Detail & Related papers (2021-01-14T10:25:58Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.