Event Tables for Efficient Experience Replay
- URL: http://arxiv.org/abs/2211.00576v2
- Date: Fri, 21 Apr 2023 11:10:16 GMT
- Title: Event Tables for Efficient Experience Replay
- Authors: Varun Kompella, Thomas J. Walsh, Samuel Barrett, Peter Wurman, Peter Stone
- Abstract summary: Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems.
Uniform sampling from an ER buffer can lead to slow convergence and unstable behaviors.
This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables.
- Score: 31.678826875509348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay (ER) is a crucial component of many deep reinforcement
learning (RL) systems. However, uniform sampling from an ER buffer can lead to
slow convergence and unstable asymptotic behaviors. This paper introduces
Stratified Sampling from Event Tables (SSET), which partitions an ER buffer
into Event Tables, each capturing important subsequences of optimal behavior.
We prove a theoretical advantage over the traditional monolithic buffer
approach and combine SSET with an existing prioritized sampling strategy to
further improve learning speed and stability. Empirical results in challenging
MiniGrid domains, benchmark RL environments, and a high-fidelity car racing
simulator demonstrate the advantages and versatility of SSET over existing ER
buffer sampling approaches.
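The abstract names the mechanism only at a high level, so the following is a minimal Python sketch of the stated idea: partition the replay buffer into per-event tables and draw stratified minibatches across them. Everything concrete below (the predicate interface, table capacities, sampling fractions, and storing single transitions rather than the subsequences of behavior the paper describes) is an illustrative assumption, not the authors' implementation.

```python
import random
from collections import deque

class SSETBuffer:
    """Hypothetical event-table buffer: one bounded table per user-defined
    event predicate, plus a default table for all remaining experience."""

    def __init__(self, event_specs, default_capacity=100_000):
        # event_specs: {name: (predicate, capacity)} -- an assumed interface.
        self.predicates = {n: pred for n, (pred, _) in event_specs.items()}
        self.tables = {n: deque(maxlen=cap) for n, (_, cap) in event_specs.items()}
        self.tables["default"] = deque(maxlen=default_capacity)

    def add(self, transition):
        # Store the transition in every event table it triggers;
        # fall back to the default table otherwise.
        hit = False
        for name, pred in self.predicates.items():
            if pred(transition):
                self.tables[name].append(transition)
                hit = True
        if not hit:
            self.tables["default"].append(transition)

    def sample(self, batch_size, fractions):
        # Stratified sampling: each table contributes a fixed fraction of the
        # minibatch, regardless of its share of the total buffer.
        batch = []
        for name, frac in fractions.items():
            table = self.tables[name]
            k = min(round(batch_size * frac), len(table))
            batch.extend(random.choices(table, k=k))
        return batch
```

For instance, a `goal_reached` table could be declared with the predicate `lambda t: t["reward"] > 0` and sampled with `fractions={"goal_reached": 0.3, "default": 0.7}`, over-representing rare successes relative to their share of the buffer.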
Related papers
- A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines [2.0752500632458983]
This paper develops a multi-granularity supervised contrastive (MGSC) framework from a plain intuition.
It addresses the problems of overly large minibatch sizes and unbalanced samples that arise in implementation.
It also demonstrates a simple and scalable basic network structure and validates the proposed MGSC strategy on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-11-01T09:18:38Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences [8.983448736644382]
Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
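The summary gives the selection criterion only as "unique samples", so the sketch below substitutes a hypothetical uniqueness test (rounding array-valued state-action pairs to a coarse grid); it illustrates the buffer-admission idea, not the paper's actual criterion.

```python
import numpy as np

class UniqueReplayBuffer:
    """Illustrative buffer that admits only experiences judged 'unique'."""

    def __init__(self, capacity, decimals=2):
        self.capacity = capacity
        self.decimals = decimals  # grid coarseness for the uniqueness key
        self.keys = set()
        self.storage = []

    def _key(self, state, action):
        # Hypothetical uniqueness key: the state-action pair on a coarse grid.
        return (tuple(np.round(state, self.decimals)),
                tuple(np.round(action, self.decimals)))

    def add(self, state, action, reward, next_state, done):
        key = self._key(state, action)
        if key in self.keys or len(self.storage) >= self.capacity:
            return False  # skip duplicates (and stop when full, for simplicity)
        self.keys.add(key)
        self.storage.append((state, action, reward, next_state, done))
        return True
```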
arXiv Detail & Related papers (2024-02-05T10:04:00Z)
- Soft Random Sampling: A Theoretical and Empirical Analysis [59.719035355483875]
Soft random sampling (SRS) is a simple yet effective approach for efficient training of deep neural networks on massive data.
It selects a subset uniformly at random, with replacement, from the full data set in each epoch.
It is shown to be a powerful and competitive strategy with strong performance at real-world industrial scale.
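A few lines suffice to capture the mechanism summarized above; the subset ratio `r` is a free parameter, and the function name `srs_epochs` is ours.

```python
import random

def srs_epochs(dataset, num_epochs, r=0.5):
    """Yield one SRS training subset per epoch."""
    subset_size = int(r * len(dataset))
    for _ in range(num_epochs):
        # Uniform sampling with replacement: duplicates within an epoch are allowed,
        # and some examples are skipped in any given epoch.
        yield [random.choice(dataset) for _ in range(subset_size)]
```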
arXiv Detail & Related papers (2023-11-21T17:03:21Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored, as it can improve generalization when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
Experiments on several NLP tasks show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAM optimizers.
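As a hedged illustration of what "SAM with an adaptive learning rate and momentum" means, here is a NumPy sketch of one such step: a SAM ascent perturbation followed by an Adam-style update. It is a plausible reading of the summary, not the paper's exact algorithm, and all hyperparameters are assumed.

```python
import numpy as np

def init_state(w):
    return {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}

def adasam_step(w, grad_fn, state, lr=1e-3, rho=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    g = grad_fn(w)
    # SAM ascent step: perturb weights toward the locally sharpest direction.
    e = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad_fn(w + e)  # gradient evaluated at the perturbed point
    # Adam-style momentum and adaptive scaling applied to the SAM gradient.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_sharp
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_sharp ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```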
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z)
- Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks [60.88792564390274]
Neighborhood Mixup Experience Replay (NMER) is a geometrically grounded replay buffer that interpolates transitions with their closest neighbors in state-action space.
We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers.
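The interpolation step summarized above can be sketched directly. The Beta-distributed mixup coefficient and brute-force neighbor search below are illustrative assumptions (the paper's exact scheme may differ), and 2-D NumPy arrays are assumed for states and actions.

```python
import numpy as np

def nmer_sample(states, actions, rewards, next_states, batch_idx, alpha=0.75):
    """Return mixup-interpolated transitions for the given batch indices."""
    mixed = []
    sa = np.concatenate([states, actions], axis=1)  # state-action features
    for i in batch_idx:
        # Nearest neighbor of transition i in state-action space (excluding i).
        d = np.linalg.norm(sa - sa[i], axis=1)
        d[i] = np.inf
        j = int(np.argmin(d))
        lam = np.random.beta(alpha, alpha)  # assumed mixup coefficient
        mixed.append((
            lam * states[i] + (1 - lam) * states[j],
            lam * actions[i] + (1 - lam) * actions[j],
            lam * rewards[i] + (1 - lam) * rewards[j],
            lam * next_states[i] + (1 - lam) * next_states[j],
        ))
    return mixed
```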
arXiv Detail & Related papers (2022-05-18T02:44:08Z)
- Large Batch Experience Replay [22.473676537463607]
We introduce new theoretical foundations of Prioritized Experience Replay.
LaBER (Large Batch Experience Replay) is an easy-to-code and efficient method for sampling the replay buffer.
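The summary does not spell out the mechanism; one reading of "large batch" sampling, sketched here purely as an assumption, is to draw a large uniform batch, score it with a cheap surrogate priority such as absolute TD error, and downsample a small minibatch proportionally to those scores.

```python
import numpy as np

def laber_minibatch(buffer, td_error_fn, large_size=512, mini_size=32):
    """Hypothetical large-batch-then-downsample step (assumed mechanism)."""
    idx = np.random.randint(len(buffer), size=large_size)  # uniform large batch
    priorities = np.abs(td_error_fn(idx)) + 1e-6  # surrogate priority scores
    p = priorities / priorities.sum()
    # Keep a small minibatch drawn proportionally to the surrogate priorities.
    keep = np.random.choice(large_size, size=mini_size, p=p)
    return idx[keep], p[keep]
```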
arXiv Detail & Related papers (2021-10-04T15:53:13Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of transitions.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.