Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
- URL: http://arxiv.org/abs/2112.14004v1
- Date: Tue, 28 Dec 2021 05:47:24 GMT
- Title: Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
- Authors: Angeliki Kamoutsi, Goran Banjac, and John Lygeros
- Abstract summary: We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
- Score: 1.0609815608017066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider large-scale Markov decision processes with an unknown cost
function and address the problem of learning a policy from a finite set of
expert demonstrations. We assume that the learner is not allowed to interact
with the expert and has no access to a reinforcement signal of any kind. Existing
inverse reinforcement learning methods come with strong theoretical guarantees,
but are computationally expensive, while state-of-the-art policy optimization
algorithms achieve significant empirical success, but are hampered by limited
theoretical understanding. To bridge the gap between theory and practice, we
introduce a novel bilinear saddle-point framework using Lagrangian duality. The
proposed primal-dual viewpoint allows us to develop a model-free provably
efficient algorithm through the lens of stochastic convex optimization. The
method enjoys the advantages of simplicity of implementation, low memory
requirements, and computational and sample complexities independent of the
number of states. We further present an equivalent no-regret online-learning
interpretation.
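To make the formulation concrete, here is a schematic of the kind of bilinear saddle-point problem the abstract alludes to; the notation (occupancy measures \mu, cost class \mathcal{C}, expert occupancy \mu_E) is assumed here for illustration rather than quoted from the paper. Learning from demonstrations can be posed over occupancy measures as

    \min_{\mu \in \Delta} \max_{c \in \mathcal{C}} \; \langle c, \mu - \mu_E \rangle,
    \qquad
    \Delta = \Big\{ \mu \ge 0 \;:\; \sum_a \mu(s,a) = (1-\gamma)\,\nu_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\,\mu(s',a') \ \ \forall s \Big\},

where \nu_0 is the initial-state distribution and \gamma the discount factor. Dualizing the Bellman flow constraints defining \Delta with one Lagrange multiplier per state leaves an objective that is bilinear in the primal and dual variables, which is the saddle-point structure that stochastic convex optimization methods exploit.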
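As a minimal sketch of how a model-free stochastic primal-dual method operates on such a bilinear objective, the following runs entropic mirror descent/ascent on a toy matrix game; this is a generic illustration, not the paper's algorithm, and the payoff matrix A, step size, and iteration count are assumptions.

    import numpy as np

    def stochastic_primal_dual(A, iters=20000, step=0.05, seed=0):
        # Entropic stochastic mirror descent/ascent for the bilinear game
        #     min_{x in simplex} max_{y in simplex} x^T A y.
        # Gradients are estimated from single samples (i ~ x, j ~ y), so each
        # step touches one row and one column of A, mimicking a sample-based
        # (model-free) oracle; averaged iterates approach a saddle point.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.full(m, 1.0 / m)   # primal variable (min player)
        y = np.full(n, 1.0 / n)   # dual variable (max player)
        x_sum, y_sum = np.zeros(m), np.zeros(n)
        for _ in range(iters):
            j = rng.choice(n, p=y)            # E[A[:, j]] = A @ y
            i = rng.choice(m, p=x)            # E[A[i, :]] = A.T @ x
            x = x * np.exp(-step * A[:, j])   # exponentiated-gradient descent
            x /= x.sum()
            y = y * np.exp(step * A[i, :])    # exponentiated-gradient ascent
            y /= y.sum()
            x_sum += x
            y_sum += y
        return x_sum / iters, y_sum / iters

    if __name__ == "__main__":
        A = np.random.default_rng(1).standard_normal((20, 10))
        x, y = stochastic_primal_dual(A)
        # Duality gap of the averaged iterates (0 at an exact saddle point).
        print("gap:", (x @ A).max() - (A @ y).min())

Note the per-iteration cost is independent of the number of matrix entries touched so far, which is the flavor of dimension-free complexity the abstract claims for the state space.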
Related papers
- Near-Optimal Solutions of Constrained Learning Problems [85.48853063302764]
As machine learning systems become widespread, the need to curtail their behavior has become increasingly apparent.
This is evidenced by recent advances towards developing models that satisfy robustness, safety, and fairness requirements.
Our results show that rich parametrizations effectively mitigate non-convexity in finite-dimensional constrained learning problems.
arXiv Detail & Related papers (2024-03-18T14:55:45Z)
- Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning [0.0]
We propose two novel formulations that enable the development of computationally efficient solvers based on the alternating minimization principle.
The proposed solvers offer computational efficiency, theoretical convergence guarantees, complexity that scales with the number of views, and exceptional accuracy compared with state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-28T10:17:51Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Stochastic convex optimization for provably efficient apprenticeship learning [1.0609815608017066]
We consider large-scale Markov decision processes (MDPs) with an unknown cost function.
We employ convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
arXiv Detail & Related papers (2021-12-31T19:47:57Z)
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The susceptibility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Learning Implicitly with Noisy Data in Linear Arithmetic [94.66549436482306]
We extend implicit learning in PAC-Semantics to handle intervals and threshold uncertainty in the language of linear arithmetic.
We show that our implicit approach to learning optimal linear programming objective constraints significantly outperforms an explicit approach in practice.
arXiv Detail & Related papers (2020-10-23T19:08:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.