A Contraction Approach to Model-based Reinforcement Learning
- URL: http://arxiv.org/abs/2009.08586v2
- Date: Thu, 25 Feb 2021 11:35:48 GMT
- Title: A Contraction Approach to Model-based Reinforcement Learning
- Authors: Ting-Han Fan, Peter J. Ramadge
- Abstract summary: We analyze the error in the cumulative reward using a contraction approach.
We prove that branched rollouts can reduce this error.
For Imitation Learning, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.
- Score: 11.701145942745274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its experimental success, Model-based Reinforcement Learning still
lacks a complete theoretical understanding. To this end, we analyze the error
in the cumulative reward using a contraction approach. We consider both
stochastic and deterministic state transitions for continuous (non-discrete)
state and action spaces. This approach does not require strong assumptions and
recovers the typical error bound that is quadratic in the horizon. We prove that branched
rollouts can reduce this error and are essential for deterministic transitions
to have a Bellman contraction. Our analysis of policy mismatch error also
applies to Imitation Learning. In this case, we show that GAN-type learning has
an advantage over Behavioral Cloning when its discriminator is well-trained.
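The branched rollouts mentioned in the abstract are short model rollouts that restart from states drawn from real environment data, so model error compounds over the branch length rather than the full horizon. Below is a minimal Python sketch of this standard branching scheme (in the spirit of MBPO-style branching); `env_buffer`, `dynamics_model`, and `policy` are hypothetical interfaces used only for illustration, not the paper's implementation or analysis.

```python
# Minimal sketch of branched model rollouts (hypothetical interfaces, for illustration).

def branched_rollouts(env_buffer, dynamics_model, policy, k=5, n_branches=400):
    """Roll the learned model forward for k steps, branching from real states.

    Starting short rollouts from states observed in the true environment keeps
    the compounded model error tied to the branch length k rather than the
    full task horizon.
    """
    model_data = []
    start_states = env_buffer.sample_states(n_branches)  # states from real data
    for s in start_states:
        for _ in range(k):
            a = policy.act(s)                      # action under current policy
            s_next, r = dynamics_model.step(s, a)  # one-step model prediction
            model_data.append((s, a, r, s_next))
            s = s_next
    return model_data
```

Intuitively, shorter branches limit how far the learned model's one-step error can propagate, which is the mechanism behind the error reduction the abstract claims for branched rollouts.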
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - STRAPPER: Preference-based Reinforcement Learning via Self-training
Augmentation and Peer Regularization [18.811470043767713]
Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences.
We present a self-training method along with our proposed peer regularization, which penalizes the reward model for memorizing uninformative labels and yields confident predictions.
arXiv Detail & Related papers (2023-07-19T00:31:58Z) - When No-Rejection Learning is Consistent for Regression with Rejection [11.244583592648443]
This paper investigates a no-rejection learning strategy that uses all the data to learn the prediction.
arXiv Detail & Related papers (2023-07-06T11:43:22Z) - Supervised learning with probabilistic morphisms and kernel mean
embeddings [0.0]
I propose a generative model of supervised learning that unifies two existing approaches to the problem.
I address two measurability problems, which have been ignored in statistical learning theory.
I present a variant of Vapnik-Stefanyuk's regularization method for solving ill-posed problems.
arXiv Detail & Related papers (2023-05-10T17:54:21Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The proposed self-consistent robust error (SCORE) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z)