Understanding and Mitigating the Limitations of Prioritized Experience
Replay
- URL: http://arxiv.org/abs/2007.09569v3
- Date: Sat, 11 Jun 2022 19:32:36 GMT
- Title: Understanding and Mitigating the Limitations of Prioritized Experience
Replay
- Authors: Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White,
Hengshuai Yao, Mohsen Rohani, Jun Luo
- Abstract summary: Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains.
We show an equivalence between the error-based prioritized sampling method for the mean squared error and uniform sampling for the cubic power loss.
We then provide theoretical insight into why it improves the convergence rate over uniform sampling during early learning.
- Score: 46.663239542920984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prioritized Experience Replay (ER) has been empirically shown to improve
sample efficiency across many domains and attracted great attention; however,
there is little theoretical understanding of why such prioritized sampling
helps and what its limitations are. In this work, we take a deep look at
prioritized ER. In a supervised learning setting, we show the equivalence between the
error-based prioritized sampling method for mean squared error and uniform
sampling for cubic power loss. We then provide theoretical insight into why it
improves the convergence rate over uniform sampling during early learning. Based on
the insight, we further point out two limitations of the prioritized ER method:
1) outdated priorities and 2) insufficient coverage of the sample space. To
mitigate the limitations, we propose our model-based stochastic gradient
Langevin dynamics sampling method. We show that our method does provide states
distributed close to an ideal prioritized sampling distribution estimated by
the brute-force method, which does not suffer from the two limitations. We
conduct experiments on both discrete and continuous control problems to show
our approach's efficacy and examine the practical implications of our method in
an autonomous driving application.
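The claimed equivalence is easy to check numerically: sampling with probability proportional to the absolute error and taking a mean-squared-error gradient step matches, in expectation and up to a positive constant, a uniform-sampling gradient step on the cubic power loss. A minimal sketch on a toy linear regression problem (the setup and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: predictions f(x) = X @ w, targets y.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

delta = X @ w - y                        # per-sample errors
p = np.abs(delta) / np.abs(delta).sum()  # error-based priorities

# Expected MSE gradient under prioritized sampling:
#   E_{i~p}[delta_i * x_i] = sum_i p_i * delta_i * x_i
g_prioritized = (p * delta) @ X

# Uniform-sampling gradient of the cubic power loss |delta|^3 / 3:
#   (1/n) sum_i sign(delta_i) * delta_i^2 * x_i
g_cubic = (np.sign(delta) * delta**2) @ X / n

# The two directions agree up to the positive constant sum|delta| / n.
scale = np.abs(delta).sum() / n
print(np.allclose(g_prioritized * scale, g_cubic))  # True
```

The positive rescaling only changes the effective step size, which is why error-based prioritized sampling under MSE behaves like an implicit cubic loss that penalizes large errors more aggressively.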
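The proposed fix draws update states by running stochastic gradient Langevin dynamics on a differentiable, model-based priority function rather than reusing stale priorities stored in the buffer. A generic Langevin sampler over a made-up smooth priority, standing in for the authors' exact procedure, might look like:

```python
import numpy as np

def grad_log_priority(s):
    # Hypothetical smooth priority: a Gaussian bump centered at s* = (1, 1),
    # standing in for a learned, differentiable TD-error estimate.
    s_star = np.array([1.0, 1.0])
    return -(s - s_star)  # gradient of log N(s; s*, I)

def langevin_sample(s0, step=0.01, n_steps=500, rng=None):
    """Langevin dynamics: s <- s + (step/2) * grad log p(s) + sqrt(step) * noise."""
    rng = rng or np.random.default_rng(0)
    s = np.array(s0, dtype=float)
    for _ in range(n_steps):
        s += 0.5 * step * grad_log_priority(s)
        s += np.sqrt(step) * rng.normal(size=s.shape)
    return s

# States drawn this way concentrate where the priority is currently high,
# rather than only among previously visited states.
samples = np.array([langevin_sample(np.zeros(2), rng=np.random.default_rng(i))
                    for i in range(100)])
print(samples.mean(axis=0))  # roughly (1, 1)
```

Because fresh states are drawn from the current priority landscape, they neither go stale nor stay confined to the visited portion of the sample space, which is how the method targets the two limitations above.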
Related papers
- Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems [12.482127049881026]
We propose a novel approach to solve inverse problems with a diffusion prior from an amortized variational inference perspective.
Our amortized inference learns a function that directly maps measurements to the implicit posterior distributions of the corresponding clean data, enabling single-step posterior sampling even for unseen measurements.
arXiv Detail & Related papers (2024-07-23T02:14:18Z)
- Semiparametric Efficient Inference in Adaptive Experiments [29.43493007296859]
We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time.
We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametrically efficient, under weaker assumptions than those previously made in the literature.
We then consider the sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than those of previous methods.
arXiv Detail & Related papers (2023-11-30T06:25:06Z)
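For context, a standard (non-adaptive) Augmented Inverse-Probability Weighted estimate of the average treatment effect combines an outcome model with propensity weighting; a minimal sketch, with hypothetical fitted inputs mu1, mu0, and e:

```python
import numpy as np

def aipw_ate(y, t, mu1, mu0, e):
    """Standard AIPW estimate of the average treatment effect.

    y: outcomes, t: binary treatment indicators,
    mu1/mu0: fitted outcome predictions under treatment/control,
    e: fitted propensity scores P(T=1 | X), assumed bounded away from 0 and 1.
    (Plain i.i.d. version; the paper's adaptive variant additionally handles
    time-varying assignment policies.)
    """
    y, t, mu1, mu0, e = map(np.asarray, (y, t, mu1, mu0, e))
    return np.mean(
        mu1 - mu0
        + t * (y - mu1) / e
        - (1 - t) * (y - mu0) / (1 - e)
    )
```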
- Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
We show that it is both possible and beneficial to tackle the constrained optimization problem directly.
We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer.
We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations.
arXiv Detail & Related papers (2023-09-29T21:23:27Z)
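A generic primal-dual scheme for such a constrained formulation alternates descent on the Lagrangian with projected ascent on the multipliers; the multipliers then measure exactly the sensitivity to constraint perturbations mentioned above. A schematic sketch (all names and shapes are hypothetical, not the paper's API):

```python
import numpy as np

def primal_dual_step(w, lam, grad_task, constraint_vals, constraint_grads,
                     lr_primal=0.01, lr_dual=0.1):
    """One primal-dual update for: min loss(w) s.t. g_k(w) <= 0 for past tasks k.

    lam[k] grows while constraint k is violated; at a solution, a larger lam[k]
    signals greater sensitivity of the optimal loss to relaxing constraint k.
    """
    # Primal: descend the Lagrangian L = loss + sum_k lam_k * g_k.
    w = w - lr_primal * (grad_task + constraint_grads.T @ lam)
    # Dual: ascend on constraint values, projected onto lam >= 0.
    lam = np.maximum(0.0, lam + lr_dual * constraint_vals)
    return w, lam

# Toy call: task 0 is violated (g_0 = 0.5 > 0), so lam[0] increases.
w, lam = primal_dual_step(
    w=np.zeros(3), lam=np.zeros(2),
    grad_task=np.array([0.1, -0.2, 0.0]),
    constraint_vals=np.array([0.5, -0.1]),
    constraint_grads=np.ones((2, 3)),
)
print(lam)  # [0.05, 0.0]
```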
- Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization [18.627233013208834]
We show that the use of importance sampling can introduce high variance in the objective estimate.
We propose a technique called sample dropout to bound the estimation variance by dropping out samples when their ratio deviation is too high.
arXiv Detail & Related papers (2023-02-05T04:44:35Z)
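The stated mechanism reduces to masking out samples whose importance ratio strays too far from 1 before averaging the surrogate objective. A minimal sketch (the threshold name and default are illustrative):

```python
import numpy as np

def sample_dropout_objective(ratios, advantages, max_dev=0.5):
    """Importance-weighted policy objective with high-deviation samples dropped.

    ratios: pi_new(a|s) / pi_old(a|s) per sample; advantages: estimated A(s, a).
    Samples with |ratio - 1| > max_dev are excluded, bounding the variance
    contributed by extreme importance weights.
    """
    ratios, advantages = np.asarray(ratios), np.asarray(advantages)
    keep = np.abs(ratios - 1.0) <= max_dev
    if not keep.any():
        return 0.0
    return float(np.mean(ratios[keep] * advantages[keep]))

# e.g. the ratio-3.0 sample is dropped; only the first two contribute.
print(sample_dropout_objective([1.1, 0.9, 3.0], [1.0, -0.5, 2.0]))
```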
- Distributionally Robust Causal Inference with Observational Data [4.8986598953553555]
We consider the estimation of average treatment effects in observational studies without the standard assumption of unconfoundedness.
We propose a new framework of robust causal inference under the general observational study setting with the possible existence of unobserved confounders.
arXiv Detail & Related papers (2022-10-15T16:02:33Z)
- Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment [16.422215672356167]
The paper proposes a method for prioritizing replay experience, referred to as Hindsight Goal Ranking (HGR).
HGR assigns higher sampling probability to the states visited in an episode that have larger temporal difference (TD) error.
The proposed method, combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, accelerates learning significantly compared with DDPG alone.
arXiv Detail & Related papers (2021-10-28T12:09:10Z)
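The shared core of such prioritization schemes is to draw replay indices with probability proportional to a power of the TD-error magnitude; HGR additionally ranks hindsight goals within each episode. A minimal sketch of the sampling rule (the exponent and smoothing constant are illustrative):

```python
import numpy as np

def prioritized_indices(td_errors, batch_size, alpha=0.6, rng=None):
    """Sample replay indices with probability proportional to |TD error|^alpha."""
    rng = rng or np.random.default_rng(0)
    p = (np.abs(np.asarray(td_errors)) + 1e-6) ** alpha  # avoid zero priority
    p /= p.sum()
    return rng.choice(len(p), size=batch_size, p=p)

# e.g. TD errors [0.1, 2.0, 0.05, 1.0] -> indices 1 and 3 get most of the mass
print(prioritized_indices([0.1, 2.0, 0.05, 1.0], batch_size=5))
```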
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
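The selection signal can be sketched concretely: compare predicted distributions against the given label (cleanness) and across two augmented views (consistency), keeping samples that pass both tests. An illustrative criterion, not the paper's exact one:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between rows of probability vectors p and q."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)

def select_clean(probs_view1, probs_view2, labels_onehot, tau=0.3):
    """Keep samples whose predictions agree with the label (likely clean)
    and whose two augmented views agree with each other (likely in-distribution).
    The threshold tau is a hypothetical placeholder."""
    clean = js_divergence(probs_view1, labels_onehot) < tau
    consistent = js_divergence(probs_view1, probs_view2) < tau
    return clean & consistent

# Confident, consistent prediction matching the label -> selected.
p1 = np.array([[0.9, 0.1]]); p2 = np.array([[0.85, 0.15]])
y = np.array([[1.0, 0.0]])
print(select_clean(p1, p2, y))  # [ True]
```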
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)