Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning
- URL: http://arxiv.org/abs/2403.07704v1
- Date: Tue, 12 Mar 2024 14:49:19 GMT
- Title: Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning
- Authors: Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
- Abstract summary: In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution becomes approximately Gaussian.
- Score: 55.75959755058356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep reinforcement learning, estimating the value function to evaluate the
quality of states and actions is essential. The value function is often trained
using the least squares method, which implicitly assumes a Gaussian error
distribution. However, a recent study suggested that, because of the properties of the Bellman operator, the error distribution for training the value function is often skewed, violating the least squares method's implicit assumption of normally distributed errors. To address this, we propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution becomes approximately Gaussian. We evaluated the proposed method on continuous control benchmark tasks in MuJoCo, where it improved the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
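The idea in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' implementation: zero-mean synthetic noise is added to the Bellman target before the usual least-squares critic update; the choice of noise distribution, its scale, and all function and variable names here are assumptions.

```python
# Minimal sketch (not the authors' code): add zero-mean synthetic noise to the
# Bellman target so the TD error distribution used by the least-squares critic
# update becomes closer to Gaussian.
import torch
import torch.nn.functional as F

def noisy_bellman_target(reward, next_q, done, noise_sampler, gamma=0.99):
    """Standard Bellman target plus zero-mean synthetic noise.

    noise_sampler: callable returning zero-mean noise of a given shape; the
    actual distribution and scale are design choices not specified here.
    """
    target = reward + gamma * (1.0 - done) * next_q
    return target + noise_sampler(target.shape)

def critic_loss(q_pred, reward, next_q, done, noise_sampler):
    with torch.no_grad():
        target = noisy_bellman_target(reward, next_q, done, noise_sampler)
    return F.mse_loss(q_pred, target)

# Example noise source: a zero-mean Gaussian with a fixed (hypothetical) scale.
gaussian_noise = lambda shape: 0.1 * torch.randn(shape)
```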
Related papers
- Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning.
Our framework enhances the flexibility of error distribution modeling by incorporating additional higher-order moments, particularly kurtosis.
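A rough sketch of the direction this related paper describes, assuming a generalized Gaussian (exponential power) error model whose shape parameter controls kurtosis; this is an illustration, not the paper's code, and the parameter names are placeholders.

```python
import torch

def gen_gaussian_nll(td_error, alpha, beta):
    """Negative log-likelihood of zero-mean generalized Gaussian errors:
    p(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha) ** beta).
    alpha > 0 is a scale, beta > 0 a shape parameter (beta = 2 -> Gaussian)."""
    alpha = torch.as_tensor(alpha, dtype=td_error.dtype)
    beta = torch.as_tensor(beta, dtype=td_error.dtype)
    log_norm = torch.log(2 * alpha) + torch.lgamma(1.0 / beta) - torch.log(beta)
    return (log_norm + (td_error.abs() / alpha) ** beta).mean()
```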
arXiv Detail & Related papers (2024-08-05T08:12:25Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
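The quantile-regression building block that a value-distribution method of this kind rests on can be sketched as a standard pinball loss; the model-based and Bayesian components of EQR are not reproduced here, and all names are illustrative.

```python
import torch

def pinball_loss(pred_quantiles, target, taus):
    """Quantile-regression (pinball) loss.

    pred_quantiles: [B, N] predicted quantile values
    target:         [B, 1] scalar regression targets (broadcast over quantiles)
    taus:           [N] quantile levels in (0, 1)
    """
    u = target - pred_quantiles                         # residuals, [B, N]
    return torch.where(u >= 0, taus * u, (taus - 1.0) * u).mean()

# Example: 5 evenly spaced quantile levels.
taus = (torch.arange(5, dtype=torch.float32) + 0.5) / 5
```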
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- General regularization in covariate shift adaptation [1.5469452301122175]
We show that the number of samples needed to achieve the same order of accuracy as in standard supervised learning without distribution shift is smaller than state-of-the-art analyses suggest.
arXiv Detail & Related papers (2023-07-21T11:19:00Z)
- Learn Quasi-stationary Distributions of Finite State Markov Chain [2.780408966503282]
We propose a reinforcement learning (RL) approach to computing the quasi-stationary distribution.
We minimize the KL-divergence of two Markovian path distributions induced by the candidate distribution and the true target distribution.
We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and value function.
arXiv Detail & Related papers (2021-11-19T02:56:34Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
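As a simplified stand-in for adapting each layer's activation distribution: the paper proposes a non-parametric correction, whereas the sketch below only matches per-channel first and second moments to training statistics, and every name in it is hypothetical.

```python
import torch

class ActivationCorrection(torch.nn.Module):
    """Shift/scale test-time activations so their per-channel mean/std match
    statistics recorded on clean training data (moment matching only; the
    paper's actual correction is non-parametric)."""

    def __init__(self, train_mean, train_std, eps=1e-5):
        super().__init__()
        # train_mean / train_std recorded per channel, shaped [1, C, 1, 1].
        self.register_buffer("train_mean", train_mean)
        self.register_buffer("train_std", train_std)
        self.eps = eps

    def forward(self, x):                       # x: [B, C, H, W]
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        std = x.std(dim=(0, 2, 3), keepdim=True)
        x_hat = (x - mean) / (std + self.eps)   # standardize current batch
        return x_hat * self.train_std + self.train_mean
```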
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- A Distribution-Dependent Analysis of Meta-Learning [13.24264919706183]
A key problem in the theory of meta-learning is to understand how task distributions influence transfer risk.
In this paper, we give distribution-dependent lower bounds on the transfer risk of any algorithm.
We show that a novel, weighted version of the so-called biased regularized regression method is able to match these lower bounds up to a fixed constant factor.
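Biased regularized regression can be sketched as ridge regression shrunk toward a bias vector, here with per-sample weights; the specific weighting analyzed in the paper is not reproduced, and the helper below is illustrative.

```python
import numpy as np

def weighted_biased_ridge(X, y, theta0, weights, lam):
    """Solve  min_theta  sum_i w_i * (x_i @ theta - y_i)**2 + lam * ||theta - theta0||**2.

    Closed-form solution: (X^T W X + lam * I)^{-1} (X^T W y + lam * theta0).
    """
    W = np.diag(weights)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    b = X.T @ W @ y + lam * theta0
    return np.linalg.solve(A, b)
```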
arXiv Detail & Related papers (2020-10-31T19:36:15Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
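One way to read "implicitly matching all orders of moments" is as a kernel maximum mean discrepancy between predicted return particles and their Bellman targets; the sketch below uses a mixture of Gaussian kernels and is an illustration under that assumption, not the paper's exact objective.

```python
import torch

def mmd_squared(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """Empirical squared MMD between particle sets x [B, N] and y [B, M],
    using a mixture of Gaussian kernels (implicitly compares all moments)."""
    def kernel(a, b):
        d2 = (a.unsqueeze(-1) - b.unsqueeze(-2)) ** 2        # [B, N, M]
        return sum(torch.exp(-d2 / (2 * h ** 2)) for h in bandwidths)
    return (kernel(x, x).mean(dim=(-2, -1))
            - 2 * kernel(x, y).mean(dim=(-2, -1))
            + kernel(y, y).mean(dim=(-2, -1))).mean()

# Training would minimize mmd_squared(predicted_particles, bellman_target_particles),
# where the targets are r + gamma * next-state particles, held fixed.
```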
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
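For reference, a constant-step-size TD($\lambda$) update of the kind such a distributional analysis studies can be written as the tabular sketch below (illustrative only; terminal handling and exploration are omitted).

```python
import numpy as np

def td_lambda_update(V, trajectory, alpha=0.1, gamma=0.99, lam=0.9):
    """One pass of tabular TD(lambda) with accumulating eligibility traces."""
    e = np.zeros_like(V)
    for s, r, s_next in trajectory:            # (state index, reward, next state index)
        delta = r + gamma * V[s_next] - V[s]   # TD error
        e *= gamma * lam                       # decay traces
        e[s] += 1.0                            # accumulate trace for visited state
        V = V + alpha * delta * e              # constant-step-size update
    return V
```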