Understanding the effect of varying amounts of replay per step
- URL: http://arxiv.org/abs/2302.10311v1
- Date: Mon, 20 Feb 2023 20:54:11 GMT
- Title: Understanding the effect of varying amounts of replay per step
- Authors: Animesh Kumar Paul and Videh Raj Nema
- Abstract summary: We study the effect of varying amounts of replay per step in a well-known model-free algorithm: Deep Q-Network (DQN) in the Mountain Car environment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based reinforcement learning uses models to plan, where the predictions
and policies of an agent can be improved by using more computation without
additional data from the environment, thereby improving sample efficiency.
However, learning accurate estimates of the model is hard. A natural
question, then, is whether model-free methods can deliver benefits similar to
planning. Experience replay is an essential component of many model-free
algorithms, enabling sample-efficient learning and stability by storing past
experiences for reuse in gradient computation. Prior works have established
connections
between models and experience replay by planning with the latter. This involves
increasing the number of times a mini-batch is sampled and used for updates at
each step (amount of replay per step). We attempt to exploit this connection by
doing a systematic study on the effect of varying amounts of replay per step in
a well-known model-free algorithm: Deep Q-Network (DQN) in the Mountain Car
environment. We empirically show that increasing replay improves DQN's sample
efficiency, reduces the variation in its performance, and makes it more robust
to changes in hyperparameters. Altogether, this takes a step toward a better
algorithm for deployment.
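The "amount of replay per step" studied in the abstract can be sketched as a single knob in a DQN-style training loop: how many times a mini-batch is sampled from the buffer and used for a gradient update after each environment step. The sketch below is illustrative, not the paper's code; the class and function names are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of past transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement within a batch, as in vanilla DQN.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def train_step(buffer, update_fn, replays_per_step, batch_size=32):
    """After one environment step, perform `replays_per_step` gradient updates.

    `replays_per_step` is the knob the paper varies; `update_fn` stands in for
    one DQN TD-error gradient step on a sampled mini-batch.
    Returns the number of updates actually performed.
    """
    updates = 0
    for _ in range(replays_per_step):
        if len(buffer) >= batch_size:
            update_fn(buffer.sample(batch_size))
            updates += 1
    return updates
```

Increasing `replays_per_step` spends more computation per environment step on the same stored data, which is the sense in which replay acts like planning with a learned model.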
Related papers
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying raw data is often impractical due to memory constraints or data-privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned ones.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
arXiv Detail & Related papers (2024-01-11T04:59:44Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with
Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Measuring and Reducing Model Update Regression in Structured Prediction
for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured output.
arXiv Detail & Related papers (2022-02-07T07:04:54Z) - Learning Expected Emphatic Traces for Deep RL [32.984880782688535]
Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods.
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
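The $n$-step TD target underlying that summary can be written in its standard textbook form: $G_t = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n V(s_{t+n})$. The sketch below computes only this plain $n$-step return; the paper's emphatic weighting and time-reversed update are more involved, and the function name is illustrative.

```python
def n_step_return(rewards, gamma, bootstrap_value):
    """Compute the n-step TD return for a trajectory segment.

    G = r_0 + gamma * r_1 + ... + gamma^(n-1) * r_{n-1} + gamma^n * V(s_n),
    accumulated backwards from the bootstrap value V(s_n).
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with rewards `[1, 1, 1]`, `gamma = 0.5`, and a bootstrap value of `2.0`, the return is `1 + 0.5 + 0.25 + 0.125 * 2 = 2.0`.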
arXiv Detail & Related papers (2021-07-12T13:14:03Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Sample-efficient reinforcement learning using deep Gaussian processes [18.044018772331636]
Reinforcement learning provides a framework for learning, through trial and error, which actions to take to complete a task.
In model-based reinforcement learning, efficiency is improved by learning to simulate the world dynamics.
We introduce deep Gaussian processes, where the depth of the composition adds model capacity while incorporating prior knowledge of the dynamics brings smoothness and structure.
arXiv Detail & Related papers (2020-11-02T13:37:57Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.