Neighborhood Mixup Experience Replay: Local Convex Interpolation for
Improved Sample Efficiency in Continuous Control Tasks
- URL: http://arxiv.org/abs/2205.09117v1
- Date: Wed, 18 May 2022 02:44:08 GMT
- Title: Neighborhood Mixup Experience Replay: Local Convex Interpolation for
Improved Sample Efficiency in Continuous Control Tasks
- Authors: Ryan Sander, Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Sertac
Karaman, Daniela Rus
- Abstract summary: Neighborhood Mixup Experience Replay (NMER) is a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space.
We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers.
- Score: 60.88792564390274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay plays a crucial role in improving the sample efficiency of
deep reinforcement learning agents. Recent advances in experience replay
propose using Mixup (Zhang et al., 2018) to further improve sample efficiency
via synthetic sample generation. We build upon this technique with Neighborhood
Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that
interpolates transitions with their closest neighbors in state-action space.
NMER preserves a locally linear approximation of the transition manifold by
only applying Mixup between transitions with vicinal state-action features.
Under NMER, a given transition's set of state-action neighbors is dynamic and
episode-agnostic, in turn encouraging greater policy generalizability via
inter-episode interpolation. We combine our approach with recent off-policy
deep reinforcement learning algorithms and evaluate on continuous control
environments. We observe that NMER improves sample efficiency by an average 94%
(TD3) and 29% (SAC) over baseline replay buffers, enabling agents to
effectively recombine previous experiences and learn from limited data.
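A minimal NumPy sketch of the sampling rule described above, assuming a flat
transition buffer, z-score normalization of state-action features, and a
Beta(alpha, alpha) Mixup coefficient; names such as nmer_sample and the
normalization step are illustrative assumptions, not the authors' exact
implementation.

import numpy as np

rng = np.random.default_rng(0)

def nmer_sample(states, actions, rewards, next_states, alpha=1.0):
    """Return one Mixup-interpolated (s, a, r, s') transition."""
    n = len(states)
    i = rng.integers(n)  # anchor transition sampled uniformly
    # Nearest neighbor of the anchor in concatenated state-action space,
    # excluding the anchor itself; z-scoring keeps state and action
    # dimensions on comparable scales (an assumed preprocessing choice).
    sa = np.concatenate([states, actions], axis=1)
    sa = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)
    dists = np.linalg.norm(sa - sa[i], axis=1)
    dists[i] = np.inf
    j = int(np.argmin(dists))
    # Mixup (Zhang et al., 2018): convex combination with lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    def mix(x, y):
        return lam * x + (1.0 - lam) * y
    return (mix(states[i], states[j]), mix(actions[i], actions[j]),
            mix(rewards[i], rewards[j]), mix(next_states[i], next_states[j]))

# Toy usage: 256 random transitions with 4-D states and 2-D actions.
S, A = rng.normal(size=(256, 4)), rng.normal(size=(256, 2))
R, S2 = rng.normal(size=(256, 1)), rng.normal(size=(256, 4))
s, a, r, s2 = nmer_sample(S, A, R, S2)

In practice the interpolated transitions would stand in for uniformly sampled
minibatches inside an off-policy learner such as TD3 or SAC; the sketch only
illustrates the neighbor search and interpolation step.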
Related papers
- CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms [5.331052581441265]
We develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which samples the stored experience while considering the fairness among all other experiences.
CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.
arXiv Detail & Related papers (2024-06-13T12:03:40Z)
- Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences [8.983448736644382]
Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
arXiv Detail & Related papers (2024-02-05T10:04:00Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Variance Reduction based Experience Replay for Policy Optimization [3.0790370651488983]
Variance Reduction Experience Replay (VRER) is a framework for the selective reuse of relevant samples to improve policy gradient estimation.
VRER forms the foundation of our sample efficient off-policy learning algorithm known as Policy Gradient with VRER.
arXiv Detail & Related papers (2021-10-17T19:28:45Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al., 2020) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Learning Expected Emphatic Traces for Deep RL [32.984880782688535]
Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods.
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
arXiv Detail & Related papers (2021-07-12T13:14:03Z)
- Improving Generalization in Reinforcement Learning with Mixture Regularization [113.12412071717078]
We introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments (a minimal sketch of this observation mixing appears after this list).
Mixreg increases the data diversity more effectively and helps learn smoother policies.
Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin.
arXiv Detail & Related papers (2020-10-21T08:12:03Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
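The mixture-regularization (mixreg) entry above describes training on convex
mixtures of observations drawn from different environments. The sketch below
illustrates that mixing step under assumed details (batch layout, a
Beta(alpha, alpha) coefficient, and mixing of the associated rewards); it is
not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def mixreg_batch(obs_a, rew_a, obs_b, rew_b, alpha=0.2):
    """Convexly combine two observation/reward batches, Mixup-style."""
    # One mixing coefficient per sample, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(len(obs_a), 1))
    mixed_obs = lam * obs_a + (1.0 - lam) * obs_b
    mixed_rew = lam * rew_a + (1.0 - lam) * rew_b
    return mixed_obs, mixed_rew

# Toy usage: two batches of flattened 64-D observations, imagined as coming
# from two different training environments.
obs_a, obs_b = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
rew_a, rew_b = rng.normal(size=(32, 1)), rng.normal(size=(32, 1))
mixed_obs, mixed_rew = mixreg_batch(obs_a, rew_a, obs_b, rew_b)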