Leverage the Average: an Analysis of KL Regularization in RL
- URL: http://arxiv.org/abs/2003.14089v5
- Date: Wed, 6 Jan 2021 14:12:57 GMT
- Title: Leverage the Average: an Analysis of KL Regularization in RL
- Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist
- Abstract summary: We show that Kullback-Leibler (KL) regularization implicitly averages q-values.
We provide a strong performance bound, the first to combine a linear (rather than quadratic) dependency on the horizon with an averaging (rather than an accumulation) of estimation errors.
Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler
(KL) regularization as a core component have shown outstanding performance.
Yet, so far, little is understood theoretically about why KL regularization
helps. We study KL regularization within an approximate value iteration scheme
and show that it implicitly averages q-values. Leveraging this insight, we
provide a very strong performance bound, the very first to combine two
desirable aspects: a linear dependency on the horizon (instead of quadratic)
and an error propagation term involving an averaging effect of the estimation
errors (instead of an accumulation effect). We also study the more general case
of an additional entropy regularizer. The resulting abstract scheme encompasses
many existing RL algorithms. Some of our assumptions do not hold with neural
networks, so we complement this theoretical analysis with an extensive
empirical study.
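To make the averaging claim concrete, here is a brief sketch of the mechanism (a sketch in standard notation, where $\lambda > 0$ is the KL temperature and $\pi_0$ is taken uniform; the paper's abstract scheme is more general and also covers the additional entropy term). The KL-regularized greedy step
$$\pi_{k+1} = \operatorname*{argmax}_{\pi} \; \langle \pi, q_k \rangle - \lambda \, \mathrm{KL}(\pi \,\|\, \pi_k)$$
has the closed-form solution $\pi_{k+1}(a|s) \propto \pi_k(a|s) \exp(q_k(s,a)/\lambda)$. Unrolling this recursion from a uniform $\pi_0$ gives
$$\pi_{k+1}(a|s) \propto \exp\Big(\frac{1}{\lambda} \sum_{j=0}^{k} q_j(s,a)\Big) = \exp\Big(\frac{k+1}{\lambda}\, \bar{q}_k(s,a)\Big), \qquad \bar{q}_k = \frac{1}{k+1} \sum_{j=0}^{k} q_j,$$
so the policy is a softmax of a (scaled) average of all past q-estimates: individual estimation errors enter only through the average $\bar{q}_k$, which is the averaging effect behind the improved error-propagation term.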
Related papers
- Sharp Analysis for KL-Regularized Contextual Bandits and RLHF [52.519416266840814]
Reverse-Kullback-Leibler (KL) regularization has emerged as a predominant technique for enhancing policy optimization in reinforcement learning.
We show that a simple two-stage mixed sampling strategy can achieve a sample complexity with only an additive dependence on the coverage coefficient.
Our results provide a comprehensive understanding of the roles of KL-regularization and data coverage in RLHF, shedding light on the design of more efficient RLHF algorithms.
arXiv Detail & Related papers (2024-11-07T11:22:46Z)
- A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems, including ranking, AUC maximization, pairwise regression, and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution (the objective is sketched below).
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
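For the ReLU regression entry above, a concrete way to state the objective (our notation, a sketch rather than the paper's exact formulation), with $\sigma(t) = \max(0, t)$ denoting the ReLU:
$$\min_{w} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \big( \sigma(\langle w, x \rangle) - y \big)^2 \big],$$
where $\mathcal{D}$ is the unknown distribution from which the draws are taken.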