Proximal Point Imitation Learning
- URL: http://arxiv.org/abs/2209.10968v3
- Date: Tue, 30 May 2023 15:13:58 GMT
- Title: Proximal Point Imitation Learning
- Authors: Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, and Volkan Cevher
- Abstract summary: We develop new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning.
We leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing.
We achieve convincing empirical performance for both linear and neural network function approximation.
- Score: 48.50107891696562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work develops new algorithms with rigorous efficiency guarantees for
infinite horizon imitation learning (IL) with linear function approximation
without restrictive coherence assumptions. We begin with the minimax
formulation of the problem and then outline how to leverage classical tools
from optimization, in particular, the proximal-point method (PPM) and dual
smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid
nested policy evaluation and cost updates for online IL appearing in the prior
literature. In particular, we replace the conventional alternating updates with
the optimization of a single convex and smooth objective over both cost and
Q-functions. When solved inexactly, we relate the optimization errors to the
suboptimality of the recovered policy. As an added bonus, by re-interpreting
PPM as dual smoothing with the expert policy as a center point, we also obtain
an offline IL algorithm enjoying theoretical guarantees in terms of required
expert trajectories. Finally, we achieve convincing empirical performance for
both linear and neural network function approximation.
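To make the proximal-point idea concrete, here is a minimal sketch assuming a generic convex, smooth joint objective over stacked cost and Q-function parameters; the placeholder `joint_objective_grad` and all hyperparameters are illustrative and do not reproduce the paper's exact algorithm.

```python
import numpy as np

# Minimal proximal-point sketch (illustrative, not the paper's exact method).
# `theta` stands in for the stacked cost and Q-function parameters, and
# `joint_objective_grad` for the gradient of the single convex, smooth
# objective described in the abstract; both are assumptions for this demo.

def joint_objective_grad(theta):
    # Placeholder convex objective (a quadratic) so the sketch is runnable.
    curvature = np.linspace(1.0, 3.0, theta.size)
    return curvature * theta

def ppm_step(theta_k, eta=1.0, inner_steps=50, lr=0.05):
    """Inexactly solve  argmin_theta J(theta) + ||theta - theta_k||^2 / (2*eta).
    The abstract relates this inexactness to the suboptimality of the
    recovered policy."""
    theta = theta_k.copy()
    for _ in range(inner_steps):
        grad = joint_objective_grad(theta) + (theta - theta_k) / eta
        theta = theta - lr * grad
    return theta

theta = np.ones(8)            # stacked (cost, Q) parameters
for _ in range(20):           # outer proximal-point iterations
    theta = ppm_step(theta)
```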
Related papers
- Optimal DLT-based Solutions for the Perspective-n-Point [0.0]
We propose a modified direct linear transform (DLT) algorithm for solving the perspective-n-point (PnP) problem.
The modification consists of analytically weighting the different measurements in the linear system, with a negligible increase in computational load.
Our approach shows clear improvements in both performance and runtime.
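As a rough sketch of the weighting idea described above (the constraint matrix, weights, and shapes are placeholders, not the paper's derivation), a weighted DLT solve scales each measurement's rows before the usual homogeneous least-squares step:

```python
import numpy as np

# Weighted DLT-style solve: minimize ||diag(W) A p|| subject to ||p|| = 1.
# The constraint rows A and the per-measurement weights w are synthetic
# placeholders; a real PnP setup would build A from 2D-3D correspondences.

rng = np.random.default_rng(4)
A = rng.normal(size=(40, 12))         # stacked DLT constraint rows (2 per point)
w = rng.uniform(0.5, 2.0, size=20)    # one analytic weight per measurement

W = np.repeat(w, 2)                   # each weight applies to its two rows
_, _, Vt = np.linalg.svd(W[:, None] * A)
p = Vt[-1]                            # weighted solution, defined up to scale
P_est = p.reshape(3, 4)               # camera projection matrix estimate
```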
arXiv Detail & Related papers (2024-10-18T04:04:58Z)
- Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization [78.82586283794886]
We present a new offline alignment algorithm, $\chi^2$-Preference Optimization ($\chi$PO).
$\chi$PO implements the principle of pessimism in the face of uncertainty via regularization.
It is provably robust to overoptimization and achieves sample-complexity guarantees based on single-policy concentrability.
arXiv Detail & Related papers (2024-07-18T11:08:40Z)
- Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
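The double-sampling issue and the Fenchel workaround can be seen on a toy example; the data, step size, and variable names below are made up for illustration. The square in $(E[X])^2$ is replaced by its dual form $\max_\nu (2\nu E[X] - \nu^2)$, so one sample per step suffices for an unbiased stochastic gradient.

```python
import numpy as np

# Toy illustration of the Fenchel trick for the variance term (all names and
# data are assumptions, not the paper's code).  Var(X) = E[X^2] - (E[X])^2,
# and (E[X])^2 normally needs two independent samples; the dual variable nu
# below is driven toward E[X] with a single sample per step instead.

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=10_000)

nu, lr = 0.0, 0.05
for x in samples:
    nu += lr * (2.0 * x - 2.0 * nu)   # ascent on 2*nu*x - nu**2

mean_sq = 2.0 * nu * samples.mean() - nu ** 2      # ~ (E[X])^2, no double sampling
variance = np.mean(samples ** 2) - mean_sq
print(variance)                                     # close to 1.5 ** 2 = 2.25
```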
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
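A schematic contrast of the single-loop structure on a tiny tabular MDP follows; the sizes, the synthetic expert occupancy, and the step sizes are illustrative assumptions, not the paper's algorithm. Each iteration takes one soft Bellman backup and one reward gradient step, rather than solving the forward RL problem to optimality for every reward update.

```python
import numpy as np

# Single-loop IRL sketch (illustrative assumptions throughout).
rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))                 # transitions P[s, a, s']
nu0 = np.ones(S) / S                                        # initial-state distribution
expert_occ = rng.dirichlet(np.ones(S * A)).reshape(S, A)    # stand-in expert data

r = np.zeros((S, A))                                        # reward estimate
Q = np.zeros((S, A))
for _ in range(200):
    # one soft Bellman backup under the current reward estimate
    V = Q.max(axis=1) + np.log(np.exp(Q - Q.max(axis=1, keepdims=True)).sum(axis=1))
    Q = r + gamma * P @ V
    pi = np.exp(Q - Q.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)                     # softmax policy
    # discounted occupancy of the current policy (a linear solve, not an inner loop)
    P_pi = np.einsum('sap,sa->sp', P, pi)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, nu0)
    # one reward gradient step: move toward matching the expert occupancy
    r += 0.1 * (expert_occ - d[:, None] * pi)
```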
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization [42.865641215856925]
We propose a provably efficient offline contextual bandit with neural network function approximation.
We show that our method generalizes over unseen contexts under a milder condition for distributional shift than the existing OPL works.
We also demonstrate the empirical effectiveness of our method in a range of synthetic and real-world OPL problems.
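The pessimism principle can be illustrated with a linear stand-in for the neural value model; the offline data, features, and bonus scale below are fabricated for the demo. Actions are scored by a lower confidence bound, so actions poorly covered by the offline data are penalized rather than chosen optimistically.

```python
import numpy as np

# Pessimistic (lower-confidence-bound) action scoring on synthetic offline
# data.  A ridge-regression model stands in for the paper's neural network;
# everything here is an assumption made for illustration.

rng = np.random.default_rng(2)
d, n = 4, 200
X = rng.normal(size=(n, d))                         # offline context-action features
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.normal(size=n)

A = X.T @ X + np.eye(d)                             # regularized design matrix
theta = np.linalg.solve(A, X.T @ y)                 # ridge value estimate

def pessimistic_score(phi, beta=1.0):
    bonus = beta * np.sqrt(phi @ np.linalg.solve(A, phi))
    return phi @ theta - bonus                      # estimate minus uncertainty

candidates = rng.normal(size=(3, d))                # hypothetical action features
best = max(range(len(candidates)), key=lambda a: pessimistic_score(candidates[a]))
print(best)
```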
arXiv Detail & Related papers (2021-11-27T03:57:13Z)
- Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [45.6318149525364]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).
We propose the $\underline{\text{p}}$eriodically $\underline{\text{r}}$estarted $\underline{\text{o}}$ptimistic $\underline{\text{p}}$olicy $\underline{\text{o}}$ptimization algorithm (PROPO).
arXiv Detail & Related papers (2021-10-18T02:33:20Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior approaches inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
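As a hedged illustration of what such a convex loss can look like (a generic log-sum-exp construction under assumed tabular shapes, not necessarily the paper's exact loss): each soft Bellman error is affine plus a log-sum-exp in Q, and aggregating the errors with another log-sum-exp keeps the overall loss convex in Q.

```python
import numpy as np

# Convex-in-Q policy-evaluation loss, sketched under assumed shapes
# Q, R: (S, A) and P: (S, A, S).  Illustration only.

def convex_eval_loss(Q, R, P, gamma=0.9, eta=5.0):
    # soft (log-sum-exp) state values, computed stably
    m = Q.max(axis=1, keepdims=True)
    V = (m + np.log(np.exp(eta * (Q - m)).sum(axis=1, keepdims=True)) / eta).ravel()
    delta = R + gamma * P @ V - Q                       # soft Bellman error per (s, a)
    return np.log(np.exp(eta * delta).mean()) / eta     # convex aggregation of errors

rng = np.random.default_rng(3)
S, A = 4, 2
R = rng.uniform(size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))
print(convex_eval_loss(np.zeros((S, A)), R, P))
```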
arXiv Detail & Related papers (2020-10-21T17:14:31Z)