Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement
Learning Using Unique Experiences
- URL: http://arxiv.org/abs/2402.05963v1
- Date: Mon, 5 Feb 2024 10:04:00 GMT
- Title: Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement
Learning Using Unique Experiences
- Authors: Nikhil Kumar Singh and Indranil Saha
- Abstract summary: Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
- Score: 8.983448736644382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient utilization of the replay buffer plays a significant role in the
off-policy actor-critic reinforcement learning (RL) algorithms used for
model-free control policy synthesis for complex dynamical systems. We propose a
method for achieving sample efficiency, which focuses on selecting unique
samples and adding them to the replay buffer during the exploration with the
goal of reducing the buffer size and maintaining the independent and
identically distributed (IID) nature of the samples. Our method is based on
selecting an important subset of the set of state variables from the
experiences encountered during the initial phase of random exploration,
partitioning the state space into a set of abstract states based on the
selected important state variables, and finally selecting the experiences with
unique state-reward combination by using a kernel density estimator. We
formally prove that the off-policy actor-critic algorithm incorporating the
proposed method for unique experience accumulation converges faster than the
vanilla off-policy actor-critic algorithm. Furthermore, we evaluate our method
by comparing it with two state-of-the-art actor-critic RL algorithms on several
continuous control benchmarks available in the Gym environment. Experimental
results demonstrate that our method achieves a significant reduction in the
size of the replay buffer for all the benchmarks while achieving either faster
convergence or better reward accumulation compared to the baseline algorithms.
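To make the pipeline concrete, here is a minimal sketch of one plausible reading of the method: treat the highest-variance state variables from the random-exploration phase as the important subset, discretize them into abstract states, and admit an experience into the replay buffer only when a kernel density estimator judges its (abstract state, reward) combination to be novel. The importance heuristic, discretization, bandwidth, and threshold are illustrative assumptions, not the authors' exact design.

```python
# Illustrative sketch of the unique-experience filter; parameter choices
# and the variance-based importance heuristic are assumptions.
import numpy as np
from sklearn.neighbors import KernelDensity

class UniqueExperienceFilter:
    def __init__(self, warmup_states, n_important=2, bandwidth=0.25,
                 log_density_threshold=0.0):
        # Step 1: pick the "important" state variables, here the ones with
        # the highest variance over the random-exploration phase.
        variances = np.var(warmup_states, axis=0)
        self.important = np.argsort(variances)[-n_important:]
        self.bandwidth = bandwidth
        self.threshold = log_density_threshold
        self.kept = []   # (abstract state, reward) points admitted so far
        self.kde = None

    def _features(self, state, reward):
        # Step 2: abstract state = coarse discretization of the important
        # variables, with the reward appended to form the combination.
        abstract = np.round(np.asarray(state)[self.important], decimals=1)
        return np.append(abstract, reward)

    def admit(self, state, reward):
        # Step 3: admit only experiences whose (abstract state, reward)
        # combination has low estimated density, i.e., is still unique.
        z = self._features(state, reward)
        if self.kde is not None and \
                self.kde.score_samples(z[None, :])[0] > self.threshold:
            return False   # combination already well covered in the buffer
        self.kept.append(z)
        self.kde = KernelDensity(bandwidth=self.bandwidth).fit(np.array(self.kept))
        return True
```

A training loop would then gate buffer writes on this filter, e.g. `if filt.admit(s, r): replay_buffer.add(s, a, r, s2, done)`, so the buffer stays small while its contents remain close to IID.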
Related papers
- Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning [20.491176017183044]
This paper tackles the multi-objective reinforcement learning (MORL) problem.
It introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals.
arXiv Detail & Related papers (2024-05-05T23:52:57Z)
- Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning [22.410220040736235]
We present a theoretically optimal solution for addressing both coreset selection and active learning.
Our proposed method, COPS, is designed to minimize the expected loss of a model trained on subsampled data.
arXiv Detail & Related papers (2023-09-05T14:06:33Z)
- Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation [2.7759072740347017]
We introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals.
In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data.
Our method achieves state-of-the-art results compared to existing techniques.
arXiv Detail & Related papers (2023-05-25T10:50:30Z)
- Sample Efficient Deep Reinforcement Learning via Local Planning [21.420851589712626]
This work focuses on sample-efficient deep reinforcement learning (RL) with a simulator that can be reset to previously observed states.
We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this property.
We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks.
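A minimal sketch of that reset trick, assuming a hypothetical `env.reset_to(state)` hook and using disagreement across an ensemble of Q-networks as the uncertainty signal (the paper's exact scoring may differ):

```python
# Illustrative sketch: occasionally start an episode from a stored,
# high-uncertainty state instead of the initial state distribution.
import random
import numpy as np

def choose_start_state(env, state_pool, q_ensemble, reset_prob=0.5):
    if state_pool and random.random() < reset_prob:
        # Uncertainty proxy: std of the ensemble's greedy value estimates.
        scores = [np.std([float(q(s).max()) for q in q_ensemble])
                  for s in state_pool]
        state = state_pool[int(np.argmax(scores))]
        env.reset_to(state)  # assumed simulator hook, not a standard Gym API
        return state
    return env.reset()
```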
arXiv Detail & Related papers (2023-01-29T23:17:26Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
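For reference, the sketch below shows the classic SMOTE interpolation that such over-samplers build on: a synthetic minority sample is a random convex combination of a minority point and one of its k nearest minority neighbors. AutoSMOTE itself learns these sampling decisions with hierarchical RL; this shows only the underlying generation step.

```python
# Classic SMOTE-style synthetic sample generation (illustrative sketch).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(minority, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(minority)
    _, idx = nn.kneighbors(minority)        # idx[:, 0] is each point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))     # random minority sample
        j = idx[i, rng.integers(1, k + 1)]  # one of its k true neighbors
        lam = rng.random()                  # interpolation coefficient
        synthetic.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(synthetic)
```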
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
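A minimal sketch of that consistency signal, assuming the agreement between the two views is scored with the Jensen-Shannon divergence (the paper's exact selection rule differs in detail):

```python
# Illustrative sketch: score each sample's "cleanness" by how much the
# model's predictions on two augmented views of it agree.
import torch
import torch.nn.functional as F

def clean_likelihood(logits_view1, logits_view2):
    p = F.softmax(logits_view1, dim=1)
    q = F.softmax(logits_view2, dim=1)
    m = 0.5 * (p + q)
    # Jensen-Shannon divergence between the two views, bounded by log 2.
    js = 0.5 * (F.kl_div(m.log(), p, reduction="none").sum(dim=1)
                + F.kl_div(m.log(), q, reduction="none").sum(dim=1))
    return 1.0 - js / torch.log(torch.tensor(2.0))  # in [0, 1]; higher = cleaner
```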
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers that has minimum variance for any instance, i.e., the efficient estimator.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Adaptive Experience Selection for Policy Gradient [8.37609145576126]
Experience replay is a commonly used approach to improve sample efficiency.
However, gradient estimators that use past trajectories typically have high variance.
Existing sampling strategies for experience replay, such as uniform sampling or prioritised experience replay, do not explicitly try to control the variance of the gradient estimates.
We propose an online learning algorithm, adaptive experience selection (AES), to adaptively learn an experience sampling distribution that explicitly minimises this variance.
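As a simplified stand-in for that learned distribution, the sketch below samples stored experiences with probability proportional to the norm of their gradient contribution, which is the variance-minimizing choice for an unbiased importance-sampled estimator, and reweights to keep the estimate unbiased; AES learns the distribution online rather than assuming these norms are available.

```python
# Illustrative sketch: variance-aware sampling for gradient estimation.
import numpy as np

def sample_for_gradient(grad_norms, batch_size, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(grad_norms, dtype=float)
    p = p / p.sum()                    # sampling distribution over experiences
    idx = rng.choice(len(p), size=batch_size, p=p)
    weights = 1.0 / (len(p) * p[idx])  # importance weights keep the estimate unbiased
    return idx, weights
```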
arXiv Detail & Related papers (2020-02-17T13:16:37Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)