MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer
Sampling
- URL: http://arxiv.org/abs/2210.13545v2
- Date: Mon, 17 Apr 2023 07:11:03 GMT
- Title: MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer
Sampling
- Authors: Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, Enrico Rinaldi,
Gianfranco Mauro, Daniela Sánchez Lopera, Michael Stephan, Thomas
Stadelmayer, Avik Santra, Robert Wille
- Abstract summary: State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent.
However, they do not incorporate uncertainty in the Q-Value estimation.
This paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off.
- Score: 2.501153467354696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data selection is essential for any data-based optimization technique, such
as Reinforcement Learning. State-of-the-art sampling strategies for the
experience replay buffer improve the performance of the Reinforcement Learning
agent. However, they do not incorporate uncertainty in the Q-Value estimation.
Consequently, they cannot adapt the sampling strategies, including exploration
and exploitation of transitions, to the complexity of the task. To address
this, the paper proposes a new sampling strategy that leverages the
exploration-exploitation trade-off. This is enabled by the uncertainty
estimation of the Q-Value function, which guides the sampling to explore more
significant transitions and, thus, learn a more efficient policy. Experiments
on classical control tasks demonstrate stable results across various
environments. They show that the proposed method outperforms state-of-the-art
sampling strategies for dense rewards w.r.t. convergence and peak performance
by 26% on average.
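The core idea can be illustrated with a minimal sketch (not the authors' implementation): assume an ensemble of Q-networks provides several value estimates per stored transition, use the ensemble spread as the uncertainty signal, and let a temperature parameter (an illustrative assumption) trade off exploitation of high-value transitions against exploration of uncertain ones.

    # Minimal sketch of uncertainty-guided replay-buffer sampling (assumed form,
    # not the paper's code). `q_ensemble` has shape (n_models, n_transitions),
    # one Q-value estimate per ensemble member per stored transition;
    # `temperature` is an illustrative exploration-exploitation knob.
    import numpy as np

    def sample_indices(q_ensemble, batch_size, temperature=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        q_mean = q_ensemble.mean(axis=0)      # exploitation: expected value
        q_std = q_ensemble.std(axis=0)        # exploration: epistemic uncertainty
        score = q_mean + temperature * q_std  # UCB-style combination (assumed)
        probs = np.exp(score - score.max())   # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), size=batch_size, replace=False, p=probs)

    # Usage: idx = sample_indices(q_ensemble, batch_size=64, temperature=0.5)
    # The returned indices select the mini-batch drawn from the replay buffer.

A higher temperature biases the mini-batch toward uncertain transitions (exploration), while a temperature near zero recovers sampling by expected value alone (exploitation).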
Related papers
- Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents [1.971759811837406]
We investigate the impact of data sampling strategies on the exploration and adaptability of meta-RL agents.
Our analysis revealed that long-memory and short-memory sequence sampling strategies affect the representation and adaptive capabilities of meta-RL agents.
arXiv Detail & Related papers (2024-06-18T07:41:40Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Gradient and Uncertainty Enhanced Sequential Sampling for Global Fit [0.0]
This paper proposes a new sampling strategy for global fit called Gradient and Uncertainty Enhanced Sequential Sampling (GUESS).
We show that GUESS achieved on average the highest sample efficiency compared to other surrogate-based strategies on the tested examples.
arXiv Detail & Related papers (2023-09-29T19:49:39Z) - Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Impact of Channel Variation on One-Class Learning for Spoof Detection [5.549602650463701]
Spoofing detection increases the reliability of the ASV system, but its performance degrades significantly under channel variation.
Which data-feeding strategy is optimal for multi-condition training (MCT) is not known in the case of spoof detection.
This study highlights the often-overlooked processes of data-feeding and mini-batching and raises awareness of the need to refine them for better performance.
arXiv Detail & Related papers (2021-09-30T07:56:16Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Federated Learning under Importance Sampling [49.17137296715029]
We study the effect of importance sampling and devise schemes for sampling agents and data non-uniformly guided by a performance measure.
We find that in schemes involving sampling without replacement, the performance of the resulting architecture is controlled by two factors related to data variability at each agent.
arXiv Detail & Related papers (2020-12-14T10:08:55Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z) - Learning to Sample with Local and Global Contexts in Experience Replay
Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z) - Adaptive Experience Selection for Policy Gradient [8.37609145576126]
Experience replay is a commonly used approach to improve sample efficiency.
However, gradient estimators using past trajectories typically have high variance.
Existing sampling strategies for experience replay like uniform sampling or prioritised experience replay do not explicitly try to control the variance of the gradient estimates.
We propose an online learning algorithm, adaptive experience selection (AES), to adaptively learn an experience sampling distribution that explicitly minimises this variance.
arXiv Detail & Related papers (2020-02-17T13:16:37Z)
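As a companion to the last entry above (Adaptive Experience Selection), the sketch below illustrates the general variance-reduction idea behind non-uniform experience sampling. It is not the AES algorithm itself but a simpler, well-known heuristic: sample experiences in proportion to their per-sample gradient norms and reweight the loss so the mini-batch gradient stays unbiased.

    # Illustrative sketch (assumed; not the AES algorithm referenced above):
    # sample experiences with probability proportional to per-sample gradient
    # norms and apply importance weights so the mini-batch gradient remains an
    # unbiased estimate of the full-batch gradient.
    import numpy as np

    def importance_sample(grad_norms, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        probs = grad_norms / grad_norms.sum()          # p_i proportional to ||g_i||
        idx = rng.choice(len(probs), size=batch_size, p=probs)
        weights = 1.0 / (len(probs) * probs[idx])      # unbiasedness correction
        return idx, weights

    # Usage: idx, w = importance_sample(grad_norms, batch_size=32)
    # Weighting the selected per-sample losses by w before averaging keeps the
    # resulting gradient estimate unbiased.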