Stochastic convex optimization for provably efficient apprenticeship learning
- URL: http://arxiv.org/abs/2201.00039v1
- Date: Fri, 31 Dec 2021 19:47:57 GMT
- Title: Stochastic convex optimization for provably efficient apprenticeship learning
- Authors: Angeliki Kamoutsi, Goran Banjac, and John Lygeros
- Abstract summary: We consider large-scale Markov decision processes (MDPs) with an unknown cost function.
We employ convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
- Score: 1.0609815608017066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider large-scale Markov decision processes (MDPs) with an unknown cost
function and employ stochastic convex optimization tools to address the problem
of imitation learning, which consists of learning a policy from a finite set of
expert demonstrations.
We adopt the apprenticeship learning formalism, which carries the assumption
that the true cost function can be represented as a linear combination of some
known features. Existing inverse reinforcement learning algorithms come with
strong theoretical guarantees, but are computationally expensive because they
use reinforcement learning or planning algorithms as a subroutine. On the other
hand, state-of-the-art policy-gradient-based algorithms (such as IM-REINFORCE,
IM-TRPO, and GAIL) achieve significant empirical success in challenging
benchmark tasks, but are not well understood in terms of theory. With an
emphasis on non-asymptotic guarantees of performance, we propose a method that
directly learns a policy from expert demonstrations, bypassing the intermediate
step of learning the cost function, by formulating the problem as a single
convex optimization problem over occupancy measures. We develop a
computationally efficient algorithm and derive high confidence regret bounds on
the quality of the extracted policy, utilizing results from stochastic convex
optimization and recent works in approximate linear programming for solving
forward MDPs.
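For intuition, the following is a minimal sketch, not the paper's algorithm, of how imitation learning can be posed as a single convex program over occupancy measures on a tiny tabular MDP: the Bellman flow constraints carve out the set of valid occupancy measures, and under the linear-cost assumption an expert feature-matching objective is convex in the occupancy measure. The transition kernel, feature map, and "expert" feature expectations are made-up illustrative data, and the cvxpy package is assumed to be available.

```python
# Sketch only: apprenticeship learning as one convex program over occupancy
# measures (illustrative data, not the paper's exact formulation).
import numpy as np
import cvxpy as cp

nS, nA, nF = 4, 2, 3          # states, actions, features
gamma = 0.9                   # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a, s']
nu0 = np.ones(nS) / nS                          # initial state distribution
PHI = rng.standard_normal((nS * nA, nF))        # known features phi(s, a), flattened over (s, a)
expert_feat = rng.standard_normal(nF)           # expert feature expectations (e.g. from demonstrations)

# Linear operators acting on the flattened occupancy measure mu[(s, a)].
B = np.kron(np.eye(nS), np.ones((1, nA)))       # (B mu)[s]  = sum_a mu(s, a)
A = gamma * P.reshape(nS * nA, nS).T            # (A mu)[s'] = gamma * sum_{s,a} P[s, a, s'] mu(s, a)

mu = cp.Variable(nS * nA, nonneg=True)

# Bellman flow constraints defining the (normalized) discounted occupancy measures.
constraints = [B @ mu == (1 - gamma) * nu0 + A @ mu]

# Under the linear-cost assumption c = <w, phi>, matching the expert's feature
# expectations controls the cost gap; an l1 feature-matching objective is convex in mu.
objective = cp.Minimize(cp.norm(PHI.T @ mu - expert_feat, 1))
cp.Problem(objective, constraints).solve()

# Read off a stationary policy by normalizing the optimal occupancy measure over actions.
mu_sa = np.maximum(mu.value.reshape(nS, nA), 1e-12)
policy = mu_sa / mu_sa.sum(axis=1, keepdims=True)
print(policy)
```

The policy extraction in the last step is the standard occupancy-measure-to-policy conversion; the paper's contribution lies in solving a stochastic version of such a program with only sampled transitions and demonstrations, with high-confidence regret bounds.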
Related papers
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Efficient Performance Bounds for Primal-Dual Reinforcement Learning from
Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Adaptive Approximate Policy Iteration [22.915651391812187]
We present a learning scheme which enjoys a $\tilde{O}(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs.
This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.
arXiv Detail & Related papers (2020-02-08T02:27:03Z)