Variance Control for Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2307.16152v1
- Date: Sun, 30 Jul 2023 07:25:18 GMT
- Title: Variance Control for Distributional Reinforcement Learning
- Authors: Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou
- Abstract summary: We construct a new estimator, the Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from the statistical perspective.
We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks.
- Score: 22.407803118899512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although distributional reinforcement learning (DRL) has been widely examined
in the past few years, very few studies investigate the validity of the
obtained Q-function estimator in the distributional setting. To fully
understand how the approximation errors of the Q-function affect the whole
training process, we do some error analysis and theoretically show how to
reduce both the bias and the variance of the error terms. With this new
understanding, we construct a new estimator \emph{Quantiled Expansion Mean}
(QEM) and introduce a new DRL algorithm (QEMRL) from the statistical
perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari
and Mujoco benchmark tasks and demonstrate that QEMRL achieves significant
improvement over baseline algorithms in terms of sample efficiency and
convergence performance.
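The QEM construction itself is given in the paper; as a rough illustration of the problem it targets, the sketch below measures the bias and variance of the plain "average of estimated quantiles" mean estimator (the quantity used for the Q-value in QR-style critics) against an unbiased reference on a skewed return distribution. This is a generic Monte Carlo illustration, not the paper's QEM formula.

```python
# Illustration only: bias/variance of a "mean from estimated quantiles" estimator,
# the quantity QEM is designed to improve. This is NOT the paper's QEM formula.
import numpy as np

rng = np.random.default_rng(0)
n_quantiles = 32                                              # number of quantile atoms (QR-DQN-style)
taus = (2 * np.arange(n_quantiles) + 1) / (2 * n_quantiles)   # midpoint quantile levels

true_mean = np.exp(0.5)                # mean of LogNormal(0, 1), a skewed "return" distribution
n_samples, n_trials = 64, 20_000

naive_errors, sample_mean_errors = [], []
for _ in range(n_trials):
    returns = rng.lognormal(mean=0.0, sigma=1.0, size=n_samples)
    theta = np.quantile(returns, taus)                         # estimated quantile atoms
    naive_errors.append(theta.mean() - true_mean)              # plain average of quantiles
    sample_mean_errors.append(returns.mean() - true_mean)      # unbiased reference estimator

print("mean-of-quantiles  bias %.4f  var %.4f" % (np.mean(naive_errors), np.var(naive_errors)))
print("sample mean        bias %.4f  var %.4f" % (np.mean(sample_mean_errors), np.var(sample_mean_errors)))
```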
Related papers
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
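As a toy illustration of self-excitation, a bootstrapped target that co-moves with the estimate being trained, the scalar recursion below diverges exactly when the effective feedback factor exceeds one. SEEM itself is an eigenvalue-based quantity defined in the paper and is not reproduced here.

```python
# Toy illustration of "self-excitation": a value estimate regressed onto a target
# that itself grows with the estimate. This is not the paper's SEEM definition.
def run(q0, gamma, k, lr, steps):
    """k models how strongly the bootstrapped target co-moves with the estimate."""
    q = q0
    for _ in range(steps):
        target = 1.0 + gamma * k * q     # reward 1 plus a self-referential bootstrap term
        q = q + lr * (target - q)        # gradient step on (target - q)^2 / 2
    return q

for k in (0.5, 1.2):
    # effective feedback factor gamma * k; above 1, each update enlarges the next target
    print(f"gamma*k = {0.99 * k:.2f} ->", run(q0=1.0, gamma=0.99, k=k, lr=0.5, steps=200))
```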
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
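PbRL methods commonly assume a Bradley-Terry-style model in which the probability of preferring one trajectory over another depends on their hidden cumulative rewards; the sketch below shows only that commonly assumed link-function form, as an illustration rather than this paper's specific framework.

```python
# Hedged sketch of the pairwise-preference likelihood commonly assumed in PbRL
# (Bradley-Terry over trajectory returns); illustrative, not this paper's exact model.
import math

def preference_prob(return_a: float, return_b: float) -> float:
    """P(trajectory A preferred over B) under a Bradley-Terry model on returns."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))

def pairwise_log_likelihood(pairs):
    """pairs: iterable of (return_preferred, return_other) under a candidate reward model."""
    return sum(math.log(preference_prob(r_pref, r_other)) for r_pref, r_other in pairs)

# A reward model that ranks the preferred trajectory higher gets a higher likelihood.
print(pairwise_log_likelihood([(3.0, 1.0), (2.5, 2.0)]))
```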
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Normality-Guided Distributional Reinforcement Learning for Continuous Control [16.324313304691426]
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms.
We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal.
We propose a policy update strategy based on correctness as measured by structural characteristics of the value distribution that are not present in the standard value function.
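One simple way to quantify how close a quantile-represented value distribution is to normal is to compare its atoms with the Gaussian quantiles implied by their own mean and standard deviation; the sketch below is a generic check of that kind, not the paper's specific measure or policy-update rule.

```python
# Generic normality check for a quantile-represented value distribution.
# Illustrative only; the paper's measure and policy-update rule are not reproduced here.
import numpy as np
from scipy.stats import norm

def normality_gap(quantile_atoms: np.ndarray) -> float:
    """Mean absolute gap between the atoms and the Gaussian quantiles implied
    by the atoms' own mean and standard deviation (near 0 for a normal distribution)."""
    n = len(quantile_atoms)
    taus = (2 * np.arange(n) + 1) / (2 * n)
    mu, sigma = quantile_atoms.mean(), quantile_atoms.std()
    gaussian_atoms = norm.ppf(taus, loc=mu, scale=sigma)
    return float(np.abs(np.sort(quantile_atoms) - gaussian_atoms).mean())

rng = np.random.default_rng(0)
taus = (2 * np.arange(32) + 1) / 64
print(normality_gap(np.quantile(rng.normal(0, 1, 10_000), taus)))       # small
print(normality_gap(np.quantile(rng.exponential(1, 10_000), taus)))     # noticeably larger
```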
arXiv Detail & Related papers (2022-08-28T02:52:10Z)
- The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning [46.85801978792022]
We study the multi-step off-policy learning approach to distributional RL.
We identify a novel notion of path-dependent distributional TD error.
We derive a novel algorithm, Quantile Regression-Retrace, which leads to a deep RL agent QR-DQN-Retrace.
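Retrace-style off-policy correction truncates importance ratios along the trajectory; the sketch below computes the standard Retrace(lambda) trace coefficients c_t = lambda * min(1, pi(a_t|s_t) / mu(a_t|s_t)). How these coefficients enter the path-dependent distributional TD error is the paper's contribution and is not reproduced here.

```python
# Standard Retrace(lambda) trace coefficients (Munos et al., 2016); how they combine
# with distributional TD errors in QR-DQN-Retrace is the paper's contribution.
def retrace_coefficients(pi_probs, mu_probs, lam=0.9):
    """Per-step truncated importance ratios c_t = lam * min(1, pi/mu)."""
    return [lam * min(1.0, p / m) for p, m in zip(pi_probs, mu_probs)]

def cumulative_traces(coeffs):
    """Products c_1 * ... * c_t that weight the t-step correction terms."""
    out, running = [], 1.0
    for c in coeffs:
        running *= c
        out.append(running)
    return out

cs = retrace_coefficients(pi_probs=[0.9, 0.2, 0.7], mu_probs=[0.5, 0.6, 0.7])
print(cs, cumulative_traces(cs))
```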
arXiv Detail & Related papers (2022-07-15T16:19:23Z)
- Unbiased Gradient Estimation for Distributionally Robust Learning [2.1777837784979277]
We consider a new approach based on distributionally robust learning (DRL) that applies gradient descent to the inner problem.
Our algorithm efficiently estimates the gradient through multi-level Monte Carlo randomization.
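A common way to obtain an unbiased estimate of a nonlinear function of an expectation is randomized multilevel Monte Carlo: draw a random level, compute a telescoping difference, and reweight by the level's probability. The sketch below shows that generic construction for f(E[X]); it is an illustration of the technique, not the paper's specific DRL gradient estimator.

```python
# Generic randomized multilevel Monte Carlo (MLMC) estimator of f(E[X]).
# Illustrative stand-in for debiasing a plug-in estimate; not the paper's estimator.
import numpy as np

def rmlmc_estimate(sample, f, rng, n0=2, p_geo=0.6):
    """One unbiased realization of f(E[X]) via a randomly truncated telescoping sum."""
    level = rng.geometric(p_geo) - 1               # P(level = k) = p_geo * (1 - p_geo)**k
    prob = p_geo * (1.0 - p_geo) ** level
    base = sample(n0, rng)
    xs = sample(n0 * 2 ** (level + 1), rng)        # samples for the randomly chosen level
    half = len(xs) // 2
    delta = f(xs.mean()) - 0.5 * (f(xs[:half].mean()) + f(xs[half:].mean()))
    return f(base.mean()) + delta / prob           # telescoping term reweighted by level probability

rng = np.random.default_rng(0)
sample = lambda n, rng: rng.exponential(1.0, size=n)   # E[X] = 1
f = lambda m: m ** 2                                   # target f(E[X]) = 1; plug-in f(mean) is biased
print(np.mean([rmlmc_estimate(sample, f, rng) for _ in range(100_000)]))   # close to 1
```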
arXiv Detail & Related papers (2020-12-22T21:35:03Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
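The mechanism described above, a set of parallel Q-estimates with the bootstrap target taken from one selected at random, can be sketched as follows; training and synchronization details follow the paper and are not reproduced here.

```python
# Sketch of the target computation described above: keep K parallel Q-estimates and
# bootstrap from a randomly selected one to damp overestimation. Details differ from the paper.
import random
from collections import defaultdict

def cross_q_target(q_tables, state_next, actions, reward, gamma, exclude=None):
    """Compute r + gamma * max_a Q_k(s', a) with k drawn at random (optionally excluding
    the table currently being updated, in the spirit of double Q-learning)."""
    candidates = [k for k in range(len(q_tables)) if k != exclude]
    k = random.choice(candidates)
    return reward + gamma * max(q_tables[k][(state_next, a)] for a in actions)

# Tiny usage example with dictionary-backed Q-tables.
K, actions = 4, [0, 1]
q_tables = [defaultdict(float) for _ in range(K)]
print(cross_q_target(q_tables, state_next="s1", actions=actions, reward=1.0, gamma=0.99, exclude=0))
```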
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- Deep Reinforcement Learning with Weighted Q-Learning [43.823659028488876]
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems.
Q-Learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values.
We show how our novel Deep Weighted Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provides empirical evidence of its advantages on representative benchmarks.
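The positive bias referred to here is easy to reproduce numerically: the maximum over noisy, equal-mean estimates overshoots the true maximum, while a weighted combination of the action values overshoots less. The softmax weighting below is only a generic stand-in, not the paper's Deep Weighted Q-Learning estimator.

```python
# Numerical illustration of the max-over-noisy-estimates bias and a weighted alternative.
# The weighting scheme here is illustrative, not the paper's exact estimator.
import numpy as np

rng = np.random.default_rng(0)
n_actions, noise_std, n_trials = 8, 1.0, 50_000
true_q = np.zeros(n_actions)            # all actions equally good; the true max value is 0

max_estimates, weighted_estimates = [], []
for _ in range(n_trials):
    noisy_q = true_q + rng.normal(0.0, noise_std, size=n_actions)
    max_estimates.append(noisy_q.max())             # standard Q-learning target value
    w = np.exp(noisy_q / noise_std)                 # crude weights on each action being optimal
    w /= w.sum()
    weighted_estimates.append(float(w @ noisy_q))

print("max estimator bias      %.3f" % np.mean(max_estimates))       # clearly > 0
print("weighted estimator bias %.3f" % np.mean(weighted_estimates))  # smaller overshoot
```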
arXiv Detail & Related papers (2020-03-20T13:57:40Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.