Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability
- URL: http://arxiv.org/abs/2109.11792v1
- Date: Fri, 24 Sep 2021 07:48:34 GMT
- Title: Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability
- Authors: Aviv Tamar, Daniel Soudry, Ev Zisselman
- Abstract summary: We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
- Score: 48.62272919754204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the Bayesian reinforcement learning (RL) setting, a prior distribution
over the unknown problem parameters -- the rewards and transitions -- is
assumed, and a policy that optimizes the (posterior) expected return is sought.
A common approximation, which has been recently popularized as meta-RL, is to
train the agent on a sample of $N$ problem instances from the prior, with the
hope that for large enough $N$, good generalization behavior to an unseen test
instance will be obtained. In this work, we study generalization in Bayesian RL
under the probably approximately correct (PAC) framework, using the method of
algorithmic stability. Our main contribution is showing that by adding
regularization, the optimal policy becomes stable in an appropriate sense. Most
stability results in the literature build on strong convexity of the
regularized loss -- an approach that is not suitable for RL as Markov decision
processes (MDPs) are not convex. Instead, building on recent results of fast
convergence rates for mirror descent in regularized MDPs, we show that
regularized MDPs satisfy a certain quadratic growth criterion, which is
sufficient to establish stability. This result, which may be of independent
interest, allows us to study the effect of regularization on generalization in
the Bayesian RL setting.
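To make the meta-RL approximation above concrete, here is a minimal sketch under simplifying assumptions (tabular MDPs with a fixed transition kernel, reward-only uncertainty under a Gaussian prior, and a state-conditioned rather than history-dependent policy): N training instances are sampled from the prior, a single entropy-regularized ("soft") policy is fit to their empirical average, and its return is measured on a fresh instance from the same prior. The regularization weight LAMBDA and all names below are illustrative, not constructs from the paper.

```python
import numpy as np

# Toy illustration of the meta-RL approximation: sample N problem instances
# (here, reward tables) from a prior, fit one entropy-regularized "soft" policy
# to their empirical average, and evaluate it on an unseen test instance.

S, A, GAMMA, LAMBDA = 5, 3, 0.9, 0.1        # states, actions, discount, regularization weight
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # fixed transition kernel, P[s, a] over next states

def sample_instance():
    """Draw one problem instance (a reward table) from a Gaussian prior."""
    return rng.normal(size=(S, A))

def soft_value_iteration(R, iters=500):
    """Entropy-regularized (soft) value iteration; returns the soft-optimal policy."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + GAMMA * P @ V
        V = LAMBDA * np.log(np.exp(Q / LAMBDA).sum(axis=1))  # soft (log-sum-exp) backup
    Q = R + GAMMA * P @ V
    pi = np.exp((Q - V[:, None]) / LAMBDA)                   # softmax policy
    return pi / pi.sum(axis=1, keepdims=True)

def expected_return(pi, R, iters=500):
    """Unregularized expected return of pi on reward table R, uniform start state."""
    V = np.zeros(S)
    for _ in range(iters):
        V = (pi * (R + GAMMA * P @ V)).sum(axis=1)
    return V.mean()

N = 20
train = [sample_instance() for _ in range(N)]
pi_hat = soft_value_iteration(np.mean(train, axis=0))  # "train" on the N sampled instances

test_R = sample_instance()                             # unseen instance from the same prior
print("return on unseen test instance:", expected_return(pi_hat, test_R))
```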
Related papers
- Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs [3.1139806580181006]
We address the challenge of quantifying Bayesian uncertainty in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics.
We use standard Bayesian reinforcement learning methods to capture the posterior uncertainty in MDP parameters.
We then analytically compute the first two moments of the return distribution across posterior samples and apply the law of total variance (a minimal Monte Carlo sketch of this decomposition appears after this list).
We highlight the real-world impact and computational scalability of our method by applying it to the AI Clinician problem.
arXiv Detail & Related papers (2024-06-04T16:21:14Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Boosting the Confidence of Generalization for $L_2$-Stable Randomized
Learning Algorithms [41.082982732100696]
We show that a properly designed subbagging process leads to near-tight exponential generalization bounds over both data and algorithm.
We further derive generic results to improve high-probability generalization bounds for convex or non-convex problems with natural decaying learning rates.
arXiv Detail & Related papers (2022-06-08T12:14:01Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method, improving its theoretical understanding.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, over an entire range of learning rates and in a dimension-free fashion, to the global solution (a minimal policy mirror descent sketch in this spirit appears after this list).
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Variational Policy Gradient Method for Reinforcement Learning with
General Utilities [38.54243339632217]
In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction.
In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure.
We derive a new Variational Policy Gradient Theorem for RL with general utilities.
arXiv Detail & Related papers (2020-07-04T17:51:53Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
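As referenced from the first related paper above, here is a minimal Monte Carlo sketch of the law-of-total-variance decomposition across posterior samples; the per-sample return moments are random placeholders standing in for quantities that paper computes analytically.

```python
import numpy as np

# Law of total variance across posterior samples theta_i of the MDP parameters:
#   Var[G] = E_theta[Var(G | theta)]  +  Var_theta(E[G | theta])
#          =        aleatoric         +        epistemic
# The per-sample moments below are placeholders; in the cited paper they are
# computed analytically from the posterior over finite-state MDP parameters.
rng = np.random.default_rng(0)
mu = rng.normal(loc=10.0, scale=2.0, size=1000)  # E[G | theta_i] for each posterior sample
var = rng.uniform(0.5, 1.5, size=1000)           # Var[G | theta_i] for each posterior sample

aleatoric = var.mean()         # expected conditional variance
epistemic = mu.var()           # variance of conditional means
total = aleatoric + epistemic  # total return variance

print(f"aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}  total={total:.3f}")
```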
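And, as referenced from the policy mirror descent entry above, a minimal sketch of KL-proximal policy mirror descent on an entropy-regularized tabular MDP. This follows the standard regularized mirror-descent update, not the exact GPMD algorithm of that paper; the problem sizes, ETA, and LAMBDA are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of policy mirror descent (KL-proximal updates) on an
# entropy-regularized tabular MDP, using the standard regularized update
#   pi_{t+1}(a|s) ∝ pi_t(a|s)^(1 - ETA*LAMBDA) * exp(ETA * Q(s, a)),
# where Q is the entropy-regularized action-value of the current policy.

S, A, GAMMA, LAMBDA, ETA = 4, 3, 0.9, 0.05, 1.0
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a] over next states
R = rng.normal(size=(S, A))                  # reward table

def evaluate(pi, iters=500):
    """Entropy-regularized policy evaluation; returns the regularized Q and V of pi."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + GAMMA * P @ V
        V = (pi * (Q - LAMBDA * np.log(pi))).sum(axis=1)
    return R + GAMMA * P @ V, V

pi = np.full((S, A), 1.0 / A)                # start from the uniform policy
for _ in range(200):
    Q, _ = evaluate(pi)
    logits = (1.0 - ETA * LAMBDA) * np.log(pi) + ETA * Q  # mirror descent step
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)

print("regularized value of the learned policy:", evaluate(pi)[1].mean())
```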