LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- URL: http://arxiv.org/abs/2307.02345v4
- Date: Wed, 13 Dec 2023 14:43:43 GMT
- Title: LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- Authors: Outongyi Lv and Bingxin Zhou
- Abstract summary: This study investigates the distribution of the Bellman approximation error through iterative exploration of the Bellman equation.
We propose the utilization of the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss) that assumes a Normal distribution for Bellman errors.
- Score: 1.5734309088976395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern reinforcement learning (RL) can be categorized into online and offline
variants. As a pivotal aspect of both online and offline RL, current research
on the Bellman equation revolves primarily around optimization techniques and
performance enhancement rather than exploring the inherent structural
properties of the Bellman error, such as its distribution characteristics. This
study investigates the distribution of the Bellman approximation error through
iterative exploration of the Bellman equation with the observation that the
Bellman error approximately follows the Logistic distribution. Based on this,
we propose the utilization of the Logistic maximum likelihood function (LLoss)
as an alternative to the commonly used mean squared error (MSELoss) that
assumes a Normal distribution for Bellman errors. We validated the hypotheses
through extensive numerical experiments across diverse online and offline
environments. In particular, we applied the Logistic correction to loss
functions in various RL baseline methods and observed that the results with
LLoss consistently outperformed the MSE counterparts. We also conducted the
Kolmogorov-Smirnov tests to confirm the reliability of the Logistic
distribution. Moreover, our theory connects the Bellman error to the
proportional reward scaling phenomenon by providing a distribution-based
analysis. Furthermore, we applied the bias-variance decomposition for sampling
from the Logistic distribution. The theoretical and empirical insights of this
study lay a valuable foundation for future investigations and enhancements
centered on the distribution of Bellman error.
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values to produce a Gaussian error distribution.
arXiv Detail & Related papers (2024-03-12T14:49:19Z)
- Value-Distributional Model-Based Reinforcement Learning [63.32053223422317]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- How Does Value Distribution in Distributional Reinforcement Learning Help Optimization? [4.695760312524447]
We consider the problem of learning a set of probability distributions from the Bellman dynamics in distributional reinforcement learning (RL).
Despite its success in obtaining superior performance, we still have a poor understanding of how the value distribution in distributional RL works.
arXiv Detail & Related papers (2022-09-29T02:18:31Z)
- On solutions of the distributional Bellman equation [0.0]
We consider general distributional Bellman equations and study existence and uniqueness of their solutions as well as tail properties of return distributions.
We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation.
arXiv Detail & Related papers (2022-01-31T20:36:59Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces to estimate the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)