Related papers: LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning

LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning

URL: http://arxiv.org/abs/2307.02345v4
Date: Wed, 13 Dec 2023 14:43:43 GMT
Title: LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Authors: Outongyi Lv and Bingxin Zhou
Abstract summary: This study investigates the distribution of the Bellman approximation error through an iterative analysis of the Bellman equation. We propose a loss based on the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss), which assumes that Bellman errors are Normally distributed.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern reinforcement learning (RL) can be categorized into online and offline variants. Although the Bellman equation is pivotal to both, current research on it focuses primarily on optimization techniques and performance gains rather than on the inherent structural properties of the Bellman error, such as its distribution. Through an iterative analysis of the Bellman equation, this study investigates the distribution of the Bellman approximation error and observes that it approximately follows the Logistic distribution. Based on this observation, we propose a loss based on the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss), which assumes that Bellman errors are Normally distributed. We validated this hypothesis through extensive numerical experiments across diverse online and offline environments. In particular, we applied the Logistic correction to the loss functions of various RL baseline methods and observed that LLoss consistently outperformed its MSE counterparts. We also conducted Kolmogorov-Smirnov tests to confirm that the Logistic distribution fits the observed errors. Moreover, our distribution-based analysis connects the Bellman error to the proportional reward scaling phenomenon. Furthermore, we apply a bias-variance decomposition to sampling from the Logistic distribution. The theoretical and empirical insights of this study lay a foundation for future investigations and improvements centered on the distribution of the Bellman error.
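To make the contrast with MSELoss concrete, here is a minimal Python sketch, not the authors' implementation. It assumes LLoss is the mean negative log-likelihood of the Bellman errors under a zero-mean Logistic distribution with a fixed scale (the hypothetical `scale` parameter); the paper's exact formulation, including how the scale is chosen, may differ. The helper `ks_statistic_logistic` (also hypothetical) fits a Logistic distribution by the method of moments (variance = scale² · π² / 3) and returns the Kolmogorov-Smirnov distance, in the spirit of the goodness-of-fit check mentioned in the abstract.

```python
import math
import random


def logistic_nll(errors, scale=1.0):
    """Mean negative log-likelihood of `errors` under Logistic(loc=0, scale).

    Per sample: z + 2*log(1 + exp(-z)) + log(scale), with z = error / scale.
    The expression is symmetric in z, so |z| is used to avoid overflow in exp.
    """
    total = 0.0
    for e in errors:
        z = abs(e) / scale
        total += z + 2.0 * math.log1p(math.exp(-z)) + math.log(scale)
    return total / len(errors)


def mse(errors):
    """Mean squared error: the Gaussian negative log-likelihood up to constants."""
    return sum(e * e for e in errors) / len(errors)


def logistic_cdf(x, loc, scale):
    """Numerically stable Logistic CDF."""
    z = (x - loc) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ks_statistic_logistic(samples):
    """KS distance between the empirical CDF and a moment-matched Logistic fit.

    Parameters are estimated from the same samples, so standard KS p-values
    would be conservative; only the distance itself is returned here.
    """
    xs = sorted(samples)
    n = len(xs)
    loc = sum(xs) / n
    std = math.sqrt(sum((x - loc) ** 2 for x in xs) / n)
    scale = std * math.sqrt(3.0) / math.pi  # Logistic variance is scale**2 * pi**2 / 3
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = logistic_cdf(x, loc, scale)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def sample_logistic(rng, loc=0.0, scale=1.0):
    """Inverse-CDF sampling from Logistic(loc, scale)."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined; random() never returns 1.0
        u = rng.random()
    return loc + scale * math.log(u / (1.0 - u))


if __name__ == "__main__":
    rng = random.Random(0)
    errors = [sample_logistic(rng) for _ in range(5000)]
    print("LLoss:", logistic_nll(errors), "MSE:", mse(errors))
    print("KS distance to fitted Logistic:", ks_statistic_logistic(errors))
    # A single large error: the Logistic NLL grows linearly, MSE quadratically.
    print(logistic_nll([100.0]), mse([100.0]))
```

The Logistic NLL has a linear tail: its derivative with respect to an error e is tanh(e / (2·scale)) / scale, which is bounded by 1/scale, whereas the MSE gradient grows without bound. Under this sketch's assumptions, that is what makes LLoss less sensitive to occasional large Bellman errors.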

Related papers

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space
We introduce Bellman Diffusion, a novel deep generative model (DGM) framework that maintains linearity in MDPs through gradient and scalar field modeling. Our results show that Bellman Diffusion achieves accurate field estimates and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv (2024-10-02T17:53:23Z)
Stabilizing Extreme Q-learning by Maclaurin Expansion
Extreme Q-learning (XQL) employs a loss function based on the assumption that the Bellman error follows a Gumbel distribution. It has demonstrated strong performance in both offline and online reinforcement learning settings. We propose Maclaurin Expanded Extreme Q-learning to enhance its stability.
arXiv (2024-06-07T12:43:17Z)
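The XQL summary above refers to a Gumbel-based loss. As a hedged Python sketch with our own naming, assume the Gumbel regression loss takes the form exp(u) - u - 1 with u = (target - prediction) / beta, and that the Maclaurin variant replaces exp(u) by its Taylor polynomial of a chosen order n, leaving the sum of u**k / k! for k = 2..n. The function names, the `beta` and `order` parameters, and this truncation rule are assumptions for illustration, not taken from either paper's code.

```python
import math


def gumbel_loss(td_error, beta=1.0):
    """Gumbel regression loss exp(u) - u - 1, with u = td_error / beta.

    td_error is assumed to be (target - prediction). The loss is >= 0 and
    grows exponentially for large positive u, which can destabilize training.
    """
    u = td_error / beta
    return math.exp(u) - u - 1.0


def maclaurin_gumbel_loss(td_error, beta=1.0, order=2):
    """Truncated expansion: sum_{k=2}^{order} u**k / k!.

    order=2 gives u**2 / 2, i.e. a scaled squared error. Even orders keep the
    truncated loss non-negative; odd orders are unbounded below as u -> -inf.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    u = td_error / beta
    return sum(u ** k / math.factorial(k) for k in range(2, order + 1))


if __name__ == "__main__":
    for td in (0.1, 1.0, 10.0):
        exact = gumbel_loss(td)
        approx = [maclaurin_gumbel_loss(td, order=n) for n in (2, 4, 6)]
        print(f"td={td}: exact={exact:.4g}, truncated (n=2,4,6)={approx}")
```

For small errors the truncated loss tracks the exact Gumbel loss closely, while for large positive errors it grows only polynomially; that trade-off is the intuition behind using a finite expansion for stability.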
Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning
In deep reinforcement learning, estimating the value function is essential for evaluating the quality of states and actions. A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator. We propose a method called Symmetric Q-learning, in which synthetic noise generated from a zero-mean distribution is added to the target values to produce a Gaussian error distribution.
arXiv (2024-03-12T14:49:19Z)
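The Symmetric Q-learning summary describes adding zero-mean noise to the targets so that the error distribution is no longer skewed. The toy Python sketch below only illustrates that idea: it assumes right-skewed Bellman errors (a shifted exponential) and adds independent zero-mean noise drawn from the mirrored distribution, which cancels the third central moment. This noise construction is our assumption, not the paper's procedure, and the result is symmetric but not Gaussian.

```python
import random


def sample_skewness(xs):
    """Population-style sample skewness m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def symmetrize_errors(errors, rng):
    """Add zero-mean, left-skewed noise to right-skewed errors.

    Noise is -(Exp(1) - 1): mean 0, skewness -2, independent of the errors.
    Adding it to the target shifts the error (target - prediction) by the same
    amount, so the mean error is unchanged in expectation.
    """
    noise = [1.0 - rng.expovariate(1.0) for _ in errors]
    return [e + n for e, n in zip(errors, noise)], noise


if __name__ == "__main__":
    rng = random.Random(0)
    errors = [rng.expovariate(1.0) - 1.0 for _ in range(20000)]  # skewness about 2
    noisy, noise = symmetrize_errors(errors, rng)
    print("skewness before:", sample_skewness(errors))
    print("skewness after: ", sample_skewness(noisy))
    print("noise mean:", sum(noise) / len(noise))
```

Because third cumulants add for independent variables, the +2 skew of the errors and the -2 skew of the noise cancel; the price is a larger error variance, which is one reason the choice of noise distribution matters.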
On solutions of the distributional Bellman equation
We consider general distributional Bellman equations and study the existence and uniqueness of their solutions as well as the tail properties of return distributions. We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation.
arXiv (2022-01-31T20:36:59Z)
Optimal policy evaluation using kernel-based temporal difference methods
We use reproducing kernel Hilbert spaces to estimate the value function of an infinite-horizon discounted Markov reward process (MRP). We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv (2021-09-24T14:48:20Z)
Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations
The state observations an agent receives may contain measurement errors or adversarial noise, which can mislead the agent into taking suboptimal actions or even cause training to collapse. In this paper, we study the training robustness of distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the whole distribution of the total return rather than only its expectation.
arXiv (2021-09-17T22:37:39Z)
Bayesian Bellman Operators
We introduce a novel perspective on Bayesian reinforcement learning (RL). Our framework is motivated by the insight that, when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv (2021-06-09T12:20:46Z)
Distributional Reinforcement Learning via Moment Matching
We formulate a method that learns a finite set of statistics from each return distribution via neural networks. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. Experiments on the Atari game suite show that our method outperforms the standard distributional RL baselines.
arXiv (2020-07-24T05:18:17Z)
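The moment-matching summary can be illustrated with a maximum mean discrepancy (MMD) between a set of predicted return particles and their Bellman targets r + gamma * z'. The Python sketch below assumes a Gaussian kernel with a fixed, hypothetical `bandwidth` and uses the biased (V-statistic) MMD estimate; the kernel choice, bandwidth, and all function names are illustrative assumptions rather than the paper's exact configuration. With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide, which is how a single kernel loss can match all moments at once.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two particle sets."""
    def mean_kernel(a, b):
        return sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def bellman_target_particles(reward, gamma, next_particles, done=False):
    """Apply the distributional Bellman backup r + gamma * z' to each particle."""
    discount = 0.0 if done else gamma
    return [reward + discount * z for z in next_particles]


if __name__ == "__main__":
    predicted = [0.0, 1.0, 2.0, 3.0]      # particles for Z(s, a)
    next_particles = [0.5, 1.5, 2.5, 3.5]  # particles for Z(s', a'), held fixed as the target
    targets = bellman_target_particles(reward=1.0, gamma=0.9, next_particles=next_particles)
    print("targets:", targets)
    print("MMD^2 loss:", mmd_squared(predicted, targets))
```

In a training loop, the predicted particles would come from the network and the target particles would be treated as constants when differentiating the MMD² loss.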

This list is automatically generated from the titles and abstracts of the papers on this site.