Replay Buffer with Local Forgetting for Adapting to Local Environment
Changes in Deep Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2303.08690v2
- Date: Wed, 27 Sep 2023 16:45:15 GMT
- Title: Replay Buffer with Local Forgetting for Adapting to Local Environment
Changes in Deep Model-Based Reinforcement Learning
- Authors: Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van
Seijen, Sarath Chandar
- Abstract summary: We show that a conceptually simple variation of the traditional first-in-first-out replay buffer overcomes that buffer's main limitation: it retains stale data, which precludes effective adaptation.
We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method.
- Score: 20.92599229976769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key behavioral characteristics used in neuroscience to determine
whether the subject of study -- be it a rodent or a human -- exhibits
model-based learning is effective adaptation to local changes in the
environment, a particular form of adaptivity that is the focus of this work. In
reinforcement learning, however, recent work has shown that modern deep
model-based reinforcement-learning (MBRL) methods adapt poorly to local
environment changes. An explanation for this mismatch is that MBRL methods are
typically designed with sample-efficiency on a single task in mind and the
requirements for effective adaptation are substantially higher, both in terms
of the learned world model and the planning routine. One particularly
challenging requirement is that the learned world model has to be sufficiently
accurate throughout relevant parts of the state-space. This is challenging for
deep-learning-based world models due to catastrophic forgetting. And while a
replay buffer can mitigate the effects of catastrophic forgetting, the
traditional first-in-first-out replay buffer precludes effective adaptation due
to maintaining stale data. In this work, we show that a conceptually simple
variation of this traditional replay buffer is able to overcome this
limitation. By removing from the buffer only those samples that lie in the
local neighbourhood of the newly observed samples, deep world models can be built
that maintain their accuracy across the state-space, while also being able to
effectively adapt to local changes in the reward function. We demonstrate this
by applying our replay-buffer variation to a deep version of the classical Dyna
method, as well as to recent methods such as PlaNet and DreamerV2,
demonstrating that deep model-based methods can also adapt effectively to
local changes in the environment.
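The local-forgetting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LocalForgettingBuffer`, the fixed `radius`, the use of Euclidean distance on raw states to define the "local neighbourhood", and the first-in-first-out fallback when the buffer is full are all assumptions made for illustration.

```python
import math
import random


class LocalForgettingBuffer:
    """Hypothetical replay buffer that forgets old samples near each new one."""

    def __init__(self, capacity, radius):
        self.capacity = capacity  # maximum number of stored transitions
        self.radius = radius      # assumed neighbourhood size (Euclidean)
        self.samples = []         # (state, action, reward, next_state), oldest first

    def add(self, state, action, reward, next_state):
        state = tuple(state)
        # Local forgetting: drop stored transitions whose state lies within
        # `radius` of the newly observed state, since they may be stale.
        self.samples = [
            s for s in self.samples if math.dist(s[0], state) > self.radius
        ]
        self.samples.append((state, action, reward, tuple(next_state)))
        # Assumed fallback: evict the oldest transition if still over capacity.
        if len(self.samples) > self.capacity:
            self.samples.pop(0)

    def sample(self, batch_size, rng=random):
        # Uniform sampling with replacement for world-model training.
        return [rng.choice(self.samples) for _ in range(batch_size)]


buf = LocalForgettingBuffer(capacity=100, radius=0.5)
buf.add((0.0, 0.0), 0, 0.0, (0.1, 0.0))   # old data near the origin
buf.add((5.0, 5.0), 1, 1.0, (5.1, 5.0))   # data from a distant region
buf.add((0.1, 0.0), 0, 10.0, (0.2, 0.0))  # new data near the origin after a local reward change
rewards = sorted(s[2] for s in buf.samples)
```

In this usage example, the stale transition near the origin is removed when the new one arrives, while the transition from the distant region is kept, which is the property the abstract credits for preserving model accuracy across the state-space.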
Related papers
- Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.
In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.
We present crucial strategies such as channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z)
- Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation [70.43845294145714]
Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic.
We propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules.
Our method can be integrated into both local-BP and BP-free settings.
arXiv Detail & Related papers (2024-06-07T19:10:31Z)
- Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents [37.604622216020765]
We show that the conceptually simple idea of partial models can allow deep model-based agents to overcome this challenge.
We demonstrate this by showing that using partial models in agents such as deep Dyna-Q, PlaNet and Dreamer allows them to adapt effectively to local changes in their environments.
arXiv Detail & Related papers (2024-05-27T07:46:36Z)
- Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR).
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, with the added advantages of requiring no exemplar buffer and only 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noise.
Deep models only know the training domain style.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z)
- PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation [67.41325356479229]
We propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix.
In a nutshell, our auxiliary network learns to fix local variants intensively by effectively back-propagating local information through the meta-gradient.
This network is model-agnostic, so it can be used with any kind of architecture in a plug-and-play manner.
arXiv Detail & Related papers (2022-07-27T07:48:29Z)
- Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods [25.05409184943328]
We show that well-known model-based methods perform poorly in their ability to adapt to local environmental changes.
We identify elements that hurt adaptive behavior and link these to underlying techniques frequently used in deep model-based RL.
We provide insights into the challenges of building an adaptive nonlinear model-based method.
arXiv Detail & Related papers (2022-04-25T06:45:16Z)
- Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z)
- The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning [21.967763416902265]
We introduce an experimental setup to evaluate model-based behavior of RL methods.
Our metric can identify model-based behavior, even if the method uses a poor representation.
We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.
arXiv Detail & Related papers (2020-07-07T01:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.