LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- URL: http://arxiv.org/abs/2307.02345v4
- Date: Wed, 13 Dec 2023 14:43:43 GMT
- Title: LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- Authors: Outongyi Lv and Bingxin Zhou
- Abstract summary: This study investigates the distribution of the Bellman approximation error through iterative exploration of the Bellman equation.
We propose the utilization of the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss) that assumes a Normal distribution for Bellman errors.
- Score: 1.5734309088976395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern reinforcement learning (RL) can be categorized into online and offline
variants. As a pivotal aspect of both online and offline RL, current research
on the Bellman equation revolves primarily around optimization techniques and
performance enhancement rather than exploring the inherent structural
properties of the Bellman error, such as its distribution characteristics. This
study investigates the distribution of the Bellman approximation error through
iterative exploration of the Bellman equation with the observation that the
Bellman error approximately follows the Logistic distribution. Based on this,
we propose the use of the Logistic maximum likelihood function (LLoss)
as an alternative to the commonly used mean squared error (MSELoss) that
assumes a Normal distribution for Bellman errors. We validated the hypotheses
through extensive numerical experiments across diverse online and offline
environments. In particular, we applied the Logistic correction to loss
functions in various RL baseline methods and observed that the results with
LLoss consistently outperformed the MSE counterparts. We also conducted the
Kolmogorov-Smirnov tests to confirm the reliability of the Logistic
distribution. Moreover, our theory connects the Bellman error to the
proportional reward scaling phenomenon by providing a distribution-based
analysis. Furthermore, we applied the bias-variance decomposition for sampling
from the Logistic distribution. The theoretical and empirical insights of this
study lay a valuable foundation for future investigations and enhancements
centered on the distribution of Bellman error.
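The abstract does not give the loss in closed form; as a rough illustration only, the sketch below writes a Logistic negative log-likelihood over Bellman errors (derived from the standard Logistic density with a fixed scale) and pairs it with a SciPy Kolmogorov-Smirnov check. The function names, the fixed `scale`, and the usage lines are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a Logistic negative log-likelihood
# over Bellman errors as a drop-in alternative to MSE, plus a KS test against
# a fitted Logistic distribution. The fixed `scale` is an illustrative choice.
import torch
import torch.nn.functional as F
from scipy import stats

def logistic_nll_loss(q_pred, q_target, scale=1.0):
    """Mean negative log-likelihood of Bellman errors under Logistic(0, scale)."""
    z = (q_target - q_pred) / scale
    # -log pdf of the Logistic distribution, written with softplus for stability:
    # -log f(z) = z + 2*log(1 + exp(-z)) + log(scale)
    nll = z + 2.0 * F.softplus(-z) + torch.log(torch.as_tensor(scale))
    return nll.mean()

def ks_test_logistic(bellman_errors):
    """Kolmogorov-Smirnov test of observed Bellman errors against a fitted Logistic."""
    loc, scale = stats.logistic.fit(bellman_errors)
    return stats.kstest(bellman_errors, "logistic", args=(loc, scale))

# Usage sketch with hypothetical names:
# q_pred   = q_network(states).gather(1, actions)            # predicted Q-values
# q_target = rewards + gamma * target_q_values.max(1).values # standard TD target
# loss     = logistic_nll_loss(q_pred, q_target)             # instead of F.mse_loss
```

If the errors were truly Normal, the same construction with a fixed-variance Gaussian density would reduce to a scaled MSE, which is exactly the comparison the paper draws between LLoss and MSELoss.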
Related papers
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel deep generative modeling (DGM) framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z)
- Stabilizing Extreme Q-learning by Maclaurin Expansion [51.041889588036895]
Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution.
It has demonstrated strong performance in both offline and online reinforcement learning settings.
We propose Maclaurin Expanded Extreme Q-learning to enhance stability.
arXiv Detail & Related papers (2024-06-07T12:43:17Z)
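For contrast with the Logistic loss sketched above, the XQL entry just above assumes Gumbel-distributed Bellman errors; the widely reported form of that objective is a linex-style "Gumbel regression" loss. The sketch below illustrates that general form and is not code from either paper; `beta` and the exponent clipping are assumed stabilization choices.

```python
# Sketch of a Gumbel-regression (linex) loss of the kind reported for
# Extreme Q-learning (XQL); beta and the clipping threshold are illustrative.
import torch

def gumbel_regression_loss(pred, target, beta=1.0, max_z=7.0):
    z = (target - pred) / beta
    z = torch.clamp(z, max=max_z)            # keep exp(z) from overflowing
    return (torch.exp(z) - z - 1.0).mean()   # minimizer approximates a log-sum-exp (soft max) of the targets
```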
- Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise generated from a zero-mean distribution is added to the target values to produce a Gaussian error distribution.
arXiv Detail & Related papers (2024-03-12T14:49:19Z)
- On solutions of the distributional Bellman equation [0.0]
We consider general distributional Bellman equations and study existence and uniqueness of their solutions as well as tail properties of return distributions.
We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation.
arXiv Detail & Related papers (2022-01-31T20:36:59Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
The state observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
arXiv Detail & Related papers (2021-09-17T22:37:39Z)
- Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
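As a minimal illustration of the moment-matching idea in the last entry, the sketch below computes a biased squared maximum mean discrepancy (MMD) with a Gaussian kernel between predicted return particles and their Bellman targets; the kernel choice, bandwidth, and tensor shapes are assumptions rather than that paper's exact estimator.

```python
# Minimal sketch: Gaussian-kernel squared MMD between predicted return
# particles and Bellman-target particles; the bandwidth and the biased
# estimator are illustrative assumptions.
import torch

def mmd_squared(x, y, bandwidth=1.0):
    """x, y: tensors of shape (batch, n_particles). Returns mean biased MMD^2 over the batch."""
    def gram(a, b):
        d2 = (a.unsqueeze(-1) - b.unsqueeze(-2)) ** 2   # pairwise squared distances
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    k_xx = gram(x, x).mean(dim=(-2, -1))
    k_yy = gram(y, y).mean(dim=(-2, -1))
    k_xy = gram(x, y).mean(dim=(-2, -1))
    return (k_xx + k_yy - 2.0 * k_xy).mean()

# Usage sketch (names are hypothetical):
# pred   = return_net(states)                                               # (batch, n_particles)
# target = rewards.unsqueeze(1) + gamma * return_target_net(next_states).detach()
# loss   = mmd_squared(pred, target)
```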
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.