Related papers: Value-Distributional Model-Based Reinforcement Learning

Value-Distributional Model-Based Reinforcement Learning

URL: http://arxiv.org/abs/2308.06590v1
Date: Sat, 12 Aug 2023 14:59:19 GMT
Title: Value-Distributional Model-Based Reinforcement Learning
Authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
Abstract summary: Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective. We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
Score: 63.32053223422317
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.

Related papers

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Stabilizing Extreme Q-learning by Maclaurin Expansion [51.041889588036895]
Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution. It has demonstrated strong performance in both offline and online reinforcement learning settings. We propose Maclaurin Expanded Extreme Q-learning to enhance stability.
arXiv Detail & Related papers (2024-06-07T12:43:17Z)
Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions. A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator. We proposed a method called Symmetric Q-learning, in which the synthetic noise generated from a zero-mean distribution is added to the target values to generate a Gaussian error distribution.
arXiv Detail & Related papers (2024-03-12T14:49:19Z)
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS) We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
On Reward Structures of Markov Decision Processes [4.13365552362244]
A Markov decision process can be parameterized by a transition kernel and a reward function. We study various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications. We develop a novel estimator with an instance-specific error bound to $tildeO(sqrtfractau_sn)$ for estimating a single state value.
arXiv Detail & Related papers (2023-08-28T22:29:16Z)
Normality-Guided Distributional Reinforcement Learning for Continuous Control [16.324313304691426]
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirical quite close to normal. We propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function.
arXiv Detail & Related papers (2022-08-28T02:52:10Z)
Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation. We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
Implicit Distributional Reinforcement Learning [61.166030238490634]
implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) Semi-implicit actor (SIA) powered by a flexible policy distribution. We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
Covariate Distribution Aware Meta-learning [3.494950334697974]
We propose a computationally feasible meta-learning algorithm by introducing meaningful relaxations. We demonstrate the gains of our algorithm over bootstrapped based meta-learning baselines on popular classification benchmarks.
arXiv Detail & Related papers (2020-07-06T05:00:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.