Neighborhood Mixup Experience Replay: Local Convex Interpolation for
Improved Sample Efficiency in Continuous Control Tasks
- URL: http://arxiv.org/abs/2205.09117v1
- Date: Wed, 18 May 2022 02:44:08 GMT
- Title: Neighborhood Mixup Experience Replay: Local Convex Interpolation for
Improved Sample Efficiency in Continuous Control Tasks
- Authors: Ryan Sander, Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Sertac
Karaman, Daniela Rus
- Abstract summary: Neighborhood Mixup Experience Replay (NMER) is a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space.
We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers.
- Score: 60.88792564390274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay plays a crucial role in improving the sample efficiency of
deep reinforcement learning agents. Recent advances in experience replay
propose using Mixup (Zhang et al., 2018) to further improve sample efficiency
via synthetic sample generation. We build upon this technique with Neighborhood
Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that
interpolates transitions with their closest neighbors in state-action space.
NMER preserves a locally linear approximation of the transition manifold by
only applying Mixup between transitions with vicinal state-action features.
Under NMER, a given transition's set of state-action neighbors is dynamic and
episode-agnostic, in turn encouraging greater policy generalizability via
inter-episode interpolation. We combine our approach with recent off-policy
deep reinforcement learning algorithms and evaluate on continuous control
environments. We observe that NMER improves sample efficiency by an average 94%
(TD3) and 29% (SAC) over baseline replay buffers, enabling agents to
effectively recombine previous experiences and learn from limited data.
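A minimal NumPy sketch of the sampling rule described above, assuming a flat
transition buffer, z-score normalization of state-action features, and a
Beta(alpha, alpha) Mixup coefficient; names such as nmer_sample and the
normalization step are illustrative assumptions, not the authors' exact
implementation.

import numpy as np

rng = np.random.default_rng(0)

def nmer_sample(states, actions, rewards, next_states, alpha=1.0):
    """Return one Mixup-interpolated (s, a, r, s') transition."""
    n = len(states)
    i = rng.integers(n)  # anchor transition sampled uniformly
    # Nearest neighbor of the anchor in concatenated state-action space,
    # excluding the anchor itself; z-scoring keeps state and action
    # dimensions on comparable scales (an assumed preprocessing choice).
    sa = np.concatenate([states, actions], axis=1)
    sa = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)
    dists = np.linalg.norm(sa - sa[i], axis=1)
    dists[i] = np.inf
    j = int(np.argmin(dists))
    # Mixup (Zhang et al., 2018): convex combination with lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    def mix(x, y):
        return lam * x + (1.0 - lam) * y
    return (mix(states[i], states[j]), mix(actions[i], actions[j]),
            mix(rewards[i], rewards[j]), mix(next_states[i], next_states[j]))

# Toy usage: 256 random transitions with 4-D states and 2-D actions.
S, A = rng.normal(size=(256, 4)), rng.normal(size=(256, 2))
R, S2 = rng.normal(size=(256, 1)), rng.normal(size=(256, 4))
s, a, r, s2 = nmer_sample(S, A, R, S2)

In practice the interpolated transitions would stand in for uniformly sampled
minibatches inside an off-policy learner such as TD3 or SAC; the sketch only
illustrates the neighbor search and interpolation step.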
Related papers
- CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms [5.331052581441265]
We develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which samples the stored experience while considering the fairness among all other experiences.
CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.
arXiv Detail & Related papers (2024-06-13T12:03:40Z)
- Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences [8.983448736644382]
Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
arXiv Detail & Related papers (2024-02-05T10:04:00Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Variance Reduction based Experience Replay for Policy Optimization [3.0790370651488983]
Variance Reduction Experience Replay (VRER) is a framework for the selective reuse of relevant samples to improve policy gradient estimation.
VRER forms the foundation of our sample efficient off-policy learning algorithm known as Policy Gradient with VRER.
arXiv Detail & Related papers (2021-10-17T19:28:45Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al., 2020) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Learning Expected Emphatic Traces for Deep RL [32.984880782688535]
Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods.
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
arXiv Detail & Related papers (2021-07-12T13:14:03Z)
- Improving Generalization in Reinforcement Learning with Mixture Regularization [113.12412071717078]
We introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments (a minimal sketch of this observation mixing appears after this list).
Mixreg increases the data diversity more effectively and helps learn smoother policies.
Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin.
arXiv Detail & Related papers (2020-10-21T08:12:03Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
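The mixture-regularization (mixreg) entry above describes training on convex
mixtures of observations drawn from different environments. The sketch below
illustrates that mixing step under assumed details (batch layout, a
Beta(alpha, alpha) coefficient, and mixing of the associated rewards); it is
not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def mixreg_batch(obs_a, rew_a, obs_b, rew_b, alpha=0.2):
    """Convexly combine two observation/reward batches, Mixup-style."""
    # One mixing coefficient per sample, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(len(obs_a), 1))
    mixed_obs = lam * obs_a + (1.0 - lam) * obs_b
    mixed_rew = lam * rew_a + (1.0 - lam) * rew_b
    return mixed_obs, mixed_rew

# Toy usage: two batches of flattened 64-D observations, imagined as coming
# from two different training environments.
obs_a, obs_b = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
rew_a, rew_b = rng.normal(size=(32, 1)), rng.normal(size=(32, 1))
mixed_obs, mixed_rew = mixreg_batch(obs_a, rew_a, obs_b, rew_b)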