Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability
- URL: http://arxiv.org/abs/2109.11792v1
- Date: Fri, 24 Sep 2021 07:48:34 GMT
- Title: Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability
- Authors: Aviv Tamar, Daniel Soudry, Ev Zisselman
- Abstract summary: We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
- Score: 48.62272919754204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the Bayesian reinforcement learning (RL) setting, a prior distribution
over the unknown problem parameters -- the rewards and transitions -- is
assumed, and a policy that optimizes the (posterior) expected return is sought.
A common approximation, which has been recently popularized as meta-RL, is to
train the agent on a sample of $N$ problem instances from the prior, with the
hope that for large enough $N$, good generalization behavior to an unseen test
instance will be obtained. In this work, we study generalization in Bayesian RL
under the probably approximately correct (PAC) framework, using the method of
algorithmic stability. Our main contribution is showing that by adding
regularization, the optimal policy becomes stable in an appropriate sense. Most
stability results in the literature build on strong convexity of the
regularized loss -- an approach that is not suitable for RL as Markov decision
processes (MDPs) are not convex. Instead, building on recent results of fast
convergence rates for mirror descent in regularized MDPs, we show that
regularized MDPs satisfy a certain quadratic growth criterion, which is
sufficient to establish stability. This result, which may be of independent
interest, allows us to study the effect of regularization on generalization in
the Bayesian RL setting.
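To fix ideas, the objects in the abstract can be sketched as follows, in notation of our own choosing rather than the paper's. With a prior $P$ over the MDP parameters $\theta$ and $J_\theta(\pi)$ the expected return of policy $\pi$ in the MDP indexed by $\theta$, the Bayesian objective and its $N$-sample (meta-RL) approximation are
$$\max_\pi \; \mathbb{E}_{\theta \sim P}\big[J_\theta(\pi)\big] \qquad \text{and} \qquad \max_\pi \; \frac{1}{N}\sum_{i=1}^{N} J_{\theta_i}(\pi), \quad \theta_i \sim P,$$
with the paper's regularization term added to the objective being optimized. The quadratic growth criterion for a regularized objective $F$ with maximizer $\pi^\star$ takes the schematic form
$$F(\pi^\star) - F(\pi) \;\ge\; \frac{c}{2}\,\lVert \pi - \pi^\star \rVert^2 \quad \text{for some } c > 0,$$
a condition implied by strong concavity but not requiring convex structure, which is what allows it to hold for regularized MDPs.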
Related papers
- Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning [12.490614705930676]
We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation.
Our main contribution is a deterministic approximation to the Bellman target that uses progressive moment matching.
We show that it is possible to provide tighter guarantees for the suboptimality of MOMBO than the existing Monte Carlo sampling approaches.
arXiv Detail & Related papers (2024-06-06T13:58:41Z) - Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs [3.1139806580181006]
We address the challenge of quantifying Bayesian uncertainty in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics.
We use standard Bayesian reinforcement learning methods to capture the posterior uncertainty in MDP parameters.
We then analytically compute the first two moments of the return distribution across posterior samples and apply the law of total variance (a minimal sketch of this decomposition appears after this list).
We highlight the real-world impact and computational scalability of our method by applying it to the AI Clinician problem.
arXiv Detail & Related papers (2024-06-04T16:21:14Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Boosting the Confidence of Generalization for $L_2$-Stable Randomized
Learning Algorithms [41.082982732100696]
We show that a properly designed subbagging process leads to near-tight exponential generalization bounds over both data and algorithm.
We further derive generic results to improve high-probability generalization bounds for convex or non-convex problems with naturally decaying learning rates.
arXiv Detail & Related papers (2022-06-08T12:14:01Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
This paper provides a general sample complexity analysis of the "vanilla" PG method, whose theoretical understanding has remained limited despite its popularity.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, over an entire range of learning rates and in a dimension-free fashion, to the global solution (a minimal sketch of a regularized mirror-descent update appears after this list).
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Variational Policy Gradient Method for Reinforcement Learning with
General Utilities [38.54243339632217]
In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction.
In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function.
We derive a new Variational Policy Gradient Theorem for RL with general utilities.
arXiv Detail & Related papers (2020-07-04T17:51:53Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
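The "Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification" entry above decomposes return uncertainty with the law of total variance. Below is a minimal sketch of that decomposition on a hypothetical one-step problem (illustrative only; the posterior and return model are invented and this is not the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical posterior over a one-step problem: each posterior sample
    # theta fixes a return distribution G | theta ~ Normal(mu_theta, sigma_theta^2).
    n_posterior = 1000
    mu = rng.normal(1.0, 0.3, size=n_posterior)       # E[G | theta] per sample
    sigma = rng.uniform(0.5, 1.0, size=n_posterior)   # Std[G | theta] per sample

    # Law of total variance: Var[G] = E_theta[Var[G | theta]] + Var_theta[E[G | theta]].
    aleatoric = np.mean(sigma ** 2)   # expected within-model (aleatoric) variance
    epistemic = np.var(mu)            # variance of the per-model means (epistemic)
    print(f"aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}  total={aleatoric + epistemic:.3f}")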
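Both the main paper and the "Policy Mirror Descent for Regularized Reinforcement Learning" entry above build on mirror descent in regularized MDPs. The following is a minimal sketch of an entropy-regularized policy mirror descent iteration on a toy tabular MDP; the MDP, step size, and regularization weight are invented for illustration, and the update is a generic KL-proximal step rather than the GPMD parameterization.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy tabular MDP (all quantities invented for illustration).
    n_s, n_a, gamma, tau, eta = 5, 3, 0.9, 0.1, 1.0
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a, s'] transition probabilities
    R = rng.uniform(0.0, 1.0, size=(n_s, n_a))         # r(s, a)

    def soft_eval(pi):
        """Exact evaluation of the entropy-regularized value of policy pi."""
        r_pi = np.einsum("sa,sa->s", pi, R)             # expected reward per state
        h_pi = -np.einsum("sa,sa->s", pi, np.log(pi))   # policy entropy per state
        p_pi = np.einsum("sa,sat->st", pi, P)           # state-to-state transitions under pi
        v = np.linalg.solve(np.eye(n_s) - gamma * p_pi, r_pi + tau * h_pi)
        q = R + gamma * np.einsum("sat,t->sa", P, v)    # entropy at s enters via the update below
        return v, q

    # Mirror descent step: per state, maximize <q, pi> + tau*H(pi) - KL(pi || pi_k)/eta over the
    # simplex; the closed form is pi_{k+1} proportional to pi_k^(1/(1+eta*tau)) * exp(eta*q/(1+eta*tau)).
    pi = np.full((n_s, n_a), 1.0 / n_a)
    for _ in range(200):
        v, q = soft_eval(pi)
        logits = (np.log(pi) + eta * q) / (1.0 + eta * tau)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)

    v, _ = soft_eval(pi)
    print("regularized state values:", np.round(v, 3))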
This list is automatically generated from the titles and abstracts of the papers on this site; the accuracy of the generated summaries is not guaranteed.