Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning
- URL: http://arxiv.org/abs/2403.07704v1
- Date: Tue, 12 Mar 2024 14:49:19 GMT
- Title: Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning
- Authors: Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
- Abstract summary: In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution becomes approximately Gaussian.
- Score: 55.75959755058356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep reinforcement learning, estimating the value function to evaluate the
quality of states and actions is essential. The value function is often trained
using the least squares method, which implicitly assumes a Gaussian error
distribution. However, a recent study suggested that, because of the properties of the Bellman operator, the error distribution for training the value function is often skewed, violating the least squares method's implicit assumption of normally distributed errors. To address this, we propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution becomes approximately Gaussian. We evaluated the proposed method on continuous control benchmark tasks in MuJoCo, where it improved the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
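The idea in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' implementation: zero-mean synthetic noise is added to the Bellman target before the usual least-squares critic update; the choice of noise distribution, its scale, and all function and variable names here are assumptions.

```python
# Minimal sketch (not the authors' code): add zero-mean synthetic noise to the
# Bellman target so the TD error distribution used by the least-squares critic
# update becomes closer to Gaussian.
import torch
import torch.nn.functional as F

def noisy_bellman_target(reward, next_q, done, noise_sampler, gamma=0.99):
    """Standard Bellman target plus zero-mean synthetic noise.

    noise_sampler: callable returning zero-mean noise of a given shape; the
    actual distribution and scale are design choices not specified here.
    """
    target = reward + gamma * (1.0 - done) * next_q
    return target + noise_sampler(target.shape)

def critic_loss(q_pred, reward, next_q, done, noise_sampler):
    with torch.no_grad():
        target = noisy_bellman_target(reward, next_q, done, noise_sampler)
    return F.mse_loss(q_pred, target)

# Example noise source: a zero-mean Gaussian with a fixed (hypothetical) scale.
gaussian_noise = lambda shape: 0.1 * torch.randn(shape)
```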
Related papers
- Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning.
Our framework enhances the flexibility of error distribution modeling by incorporating additional higher-order moments, particularly kurtosis.
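A rough sketch of the direction this related paper describes, assuming a generalized Gaussian (exponential power) error model whose shape parameter controls kurtosis; this is an illustration, not the paper's code, and the parameter names are placeholders.

```python
import torch

def gen_gaussian_nll(td_error, alpha, beta):
    """Negative log-likelihood of zero-mean generalized Gaussian errors:
    p(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha) ** beta).
    alpha > 0 is a scale, beta > 0 a shape parameter (beta = 2 -> Gaussian)."""
    alpha = torch.as_tensor(alpha, dtype=td_error.dtype)
    beta = torch.as_tensor(beta, dtype=td_error.dtype)
    log_norm = torch.log(2 * alpha) + torch.lgamma(1.0 / beta) - torch.log(beta)
    return (log_norm + (td_error.abs() / alpha) ** beta).mean()
```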
arXiv Detail & Related papers (2024-08-05T08:12:25Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
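The quantile-regression building block that a value-distribution method of this kind rests on can be sketched as a standard pinball loss; the model-based and Bayesian components of EQR are not reproduced here, and all names are illustrative.

```python
import torch

def pinball_loss(pred_quantiles, target, taus):
    """Quantile-regression (pinball) loss.

    pred_quantiles: [B, N] predicted quantile values
    target:         [B, 1] scalar regression targets (broadcast over quantiles)
    taus:           [N] quantile levels in (0, 1)
    """
    u = target - pred_quantiles                         # residuals, [B, N]
    return torch.where(u >= 0, taus * u, (taus - 1.0) * u).mean()

# Example: 5 evenly spaced quantile levels.
taus = (torch.arange(5, dtype=torch.float32) + 0.5) / 5
```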
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- General regularization in covariate shift adaptation [1.5469452301122175]
We show that the number of samples needed to achieve the same order of accuracy as in standard supervised learning without distribution shift is smaller than state-of-the-art analyses suggest.
arXiv Detail & Related papers (2023-07-21T11:19:00Z)
- Learn Quasi-stationary Distributions of Finite State Markov Chain [2.780408966503282]
We propose a reinforcement learning (RL) approach to computing the quasi-stationary distribution.
We minimize the KL-divergence of two Markovian path distributions induced by the candidate distribution and the true target distribution.
We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and value function.
arXiv Detail & Related papers (2021-11-19T02:56:34Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
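As a simplified stand-in for adapting each layer's activation distribution: the paper proposes a non-parametric correction, whereas the sketch below only matches per-channel first and second moments to training statistics, and every name in it is hypothetical.

```python
import torch

class ActivationCorrection(torch.nn.Module):
    """Shift/scale test-time activations so their per-channel mean/std match
    statistics recorded on clean training data (moment matching only; the
    paper's actual correction is non-parametric)."""

    def __init__(self, train_mean, train_std, eps=1e-5):
        super().__init__()
        # train_mean / train_std recorded per channel, shaped [1, C, 1, 1].
        self.register_buffer("train_mean", train_mean)
        self.register_buffer("train_std", train_std)
        self.eps = eps

    def forward(self, x):                       # x: [B, C, H, W]
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        std = x.std(dim=(0, 2, 3), keepdim=True)
        x_hat = (x - mean) / (std + self.eps)   # standardize current batch
        return x_hat * self.train_std + self.train_mean
```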
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- A Distribution-Dependent Analysis of Meta-Learning [13.24264919706183]
A key problem in the theory of meta-learning is to understand how task distributions influence transfer risk.
In this paper, we give distribution-dependent lower bounds on the transfer risk of any algorithm.
We show that a novel, weighted version of the so-called biased regularized regression method is able to match these lower bounds up to a fixed constant factor.
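Biased regularized regression can be sketched as ridge regression shrunk toward a bias vector, here with per-sample weights; the specific weighting analyzed in the paper is not reproduced, and the helper below is illustrative.

```python
import numpy as np

def weighted_biased_ridge(X, y, theta0, weights, lam):
    """Solve  min_theta  sum_i w_i * (x_i @ theta - y_i)**2 + lam * ||theta - theta0||**2.

    Closed-form solution: (X^T W X + lam * I)^{-1} (X^T W y + lam * theta0).
    """
    W = np.diag(weights)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    b = X.T @ W @ y + lam * theta0
    return np.linalg.solve(A, b)
```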
arXiv Detail & Related papers (2020-10-31T19:36:15Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
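One way to read "implicitly matching all orders of moments" is as a kernel maximum mean discrepancy between predicted return particles and their Bellman targets; the sketch below uses a mixture of Gaussian kernels and is an illustration under that assumption, not the paper's exact objective.

```python
import torch

def mmd_squared(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """Empirical squared MMD between particle sets x [B, N] and y [B, M],
    using a mixture of Gaussian kernels (implicitly compares all moments)."""
    def kernel(a, b):
        d2 = (a.unsqueeze(-1) - b.unsqueeze(-2)) ** 2        # [B, N, M]
        return sum(torch.exp(-d2 / (2 * h ** 2)) for h in bandwidths)
    return (kernel(x, x).mean(dim=(-2, -1))
            - 2 * kernel(x, y).mean(dim=(-2, -1))
            + kernel(y, y).mean(dim=(-2, -1))).mean()

# Training would minimize mmd_squared(predicted_particles, bellman_target_particles),
# where the targets are r + gamma * next-state particles, held fixed.
```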
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
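For reference, a constant-step-size TD($\lambda$) update of the kind such a distributional analysis studies can be written as the tabular sketch below (illustrative only; terminal handling and exploration are omitted).

```python
import numpy as np

def td_lambda_update(V, trajectory, alpha=0.1, gamma=0.99, lam=0.9):
    """One pass of tabular TD(lambda) with accumulating eligibility traces."""
    e = np.zeros_like(V)
    for s, r, s_next in trajectory:            # (state index, reward, next state index)
        delta = r + gamma * V[s_next] - V[s]   # TD error
        e *= gamma * lam                       # decay traces
        e[s] += 1.0                            # accumulate trace for visited state
        V = V + alpha * delta * e              # constant-step-size update
    return V
```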