A Contraction Approach to Model-based Reinforcement Learning
- URL: http://arxiv.org/abs/2009.08586v2
- Date: Thu, 25 Feb 2021 11:35:48 GMT
- Title: A Contraction Approach to Model-based Reinforcement Learning
- Authors: Ting-Han Fan, Peter J. Ramadge
- Abstract summary: We analyze the error in the cumulative reward using a contraction approach.
We prove that branched rollouts can reduce this error.
For Imitation Learning, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.
- Score: 11.701145942745274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its experimental success, Model-based Reinforcement Learning still
lacks a complete theoretical understanding. To this end, we analyze the error
in the cumulative reward using a contraction approach. We consider both
stochastic and deterministic state transitions for continuous (non-discrete)
state and action spaces. This approach does not require strong assumptions and
recovers the typical error bound that is quadratic in the horizon. We prove that branched
rollouts can reduce this error and are essential for deterministic transitions
to have a Bellman contraction. Our analysis of policy mismatch error also
applies to Imitation Learning. In this case, we show that GAN-type learning has
an advantage over Behavioral Cloning when its discriminator is well-trained.
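The branched rollouts mentioned in the abstract are short model rollouts that restart from states drawn from real environment data, so model error compounds over the branch length rather than the full horizon. Below is a minimal Python sketch of this standard branching scheme (in the spirit of MBPO-style branching); `env_buffer`, `dynamics_model`, and `policy` are hypothetical interfaces used only for illustration, not the paper's implementation or analysis.

```python
# Minimal sketch of branched model rollouts (hypothetical interfaces, for illustration).

def branched_rollouts(env_buffer, dynamics_model, policy, k=5, n_branches=400):
    """Roll the learned model forward for k steps, branching from real states.

    Starting short rollouts from states observed in the true environment keeps
    the compounded model error tied to the branch length k rather than the
    full task horizon.
    """
    model_data = []
    start_states = env_buffer.sample_states(n_branches)  # states from real data
    for s in start_states:
        for _ in range(k):
            a = policy.act(s)                      # action under current policy
            s_next, r = dynamics_model.step(s, a)  # one-step model prediction
            model_data.append((s, a, r, s_next))
            s = s_next
    return model_data
```

Intuitively, shorter branches limit how far the learned model's one-step error can propagate, which is the mechanism behind the error reduction the abstract claims for branched rollouts.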
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - STRAPPER: Preference-based Reinforcement Learning via Self-training
Augmentation and Peer Regularization [18.811470043767713]
Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences.
We present a self-training method along with our proposed peer regularization, which penalizes the reward model for memorizing uninformative labels and yields confident predictions.
arXiv Detail & Related papers (2023-07-19T00:31:58Z) - When No-Rejection Learning is Consistent for Regression with Rejection [11.244583592648443]
This paper investigates a no-rejection learning strategy that uses all the data to learn the prediction.
arXiv Detail & Related papers (2023-07-06T11:43:22Z) - Supervised learning with probabilistic morphisms and kernel mean
embeddings [0.0]
I propose a generative model of supervised learning that unifies two existing approaches to the problem.
I address two measurability problems, which have been ignored in statistical learning theory.
I present a variant of Vapnik-Stefanyuk's regularization method for solving ill-posed problems.
arXiv Detail & Related papers (2023-05-10T17:54:21Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The proposed self-consistent robust error (SCORE) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z)