Related papers: A Pontryagin Perspective on Reinforcement Learning

A Pontryagin Perspective on Reinforcement Learning

URL: http://arxiv.org/abs/2405.18100v2
Date: Thu, 28 Nov 2024 19:13:52 GMT
Title: A Pontryagin Perspective on Reinforcement Learning
Authors: Onno Eberhard, Claire Vernade, Michael Muehlebach,
Abstract summary: We introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead.<n>We present three new algorithms: one robust model-based method and two sample-efficient model-free methods.
Score: 11.56175346731332
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.

Related papers

Deep Learning for Continuous-time Stochastic Control with Jumps [1.6112718683989882]
We introduce a model-based deep-learning approach to solve finite-horizon continuous-time control problems with jumps.<n>We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function.
arXiv Detail & Related papers (2025-05-21T14:57:39Z)
A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms [7.081523472610874]
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We empirically evaluate our approach on several classical reinforcement learning tasks.
arXiv Detail & Related papers (2024-06-20T21:50:46Z)
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse [15.134707391442236]
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse.
arXiv Detail & Related papers (2022-06-28T02:56:12Z)
Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive. We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z)
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a square euclidean distance to a bias vector. We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator. We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.