Related papers: Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning

URL: http://arxiv.org/abs/2303.08690v2
Date: Wed, 27 Sep 2023 16:45:15 GMT
Title: Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning
Authors: Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar
Abstract summary: We show that a simple variation of the first-in-first-out replay buffer overcomes that buffer's key limitation for adaptation: it retains stale data. We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method, as well as to PlaNet and DreamerV2.
Score: 20.92599229976769
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the key behavioral characteristics used in neuroscience to determine whether the subject of study -- be it a rodent or a human -- exhibits model-based learning is effective adaptation to local changes in the environment, a particular form of adaptivity that is the focus of this work. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learning (MBRL) methods adapt poorly to local environment changes. An explanation for this mismatch is that MBRL methods are typically designed with sample-efficiency on a single task in mind and the requirements for effective adaptation are substantially higher, both in terms of the learned world model and the planning routine. One particularly challenging requirement is that the learned world model has to be sufficiently accurate throughout relevant parts of the state-space. This is challenging for deep-learning-based world models due to catastrophic forgetting. And while a replay buffer can mitigate the effects of catastrophic forgetting, the traditional first-in-first-out replay buffer precludes effective adaptation due to maintaining stale data. In this work, we show that a conceptually simple variation of this traditional replay buffer is able to overcome this limitation. By removing only samples from the buffer from the local neighbourhood of the newly observed samples, deep world models can be built that maintain their accuracy across the state-space, while also being able to effectively adapt to local changes in the reward function. We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method, as well as to recent methods such as PlaNet and DreamerV2, demonstrating that deep model-based methods can adapt effectively as well to local changes in the environment.
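The local-forgetting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the "local neighbourhood" is defined, so this sketch assumes plain Euclidean distance with a fixed radius. The class name `LocalForgettingBuffer`, its parameters `capacity` and `radius`, and the FIFO fallback when the buffer is full are all hypothetical choices made for illustration.

```python
import numpy as np


class LocalForgettingBuffer:
    """Replay buffer that drops stored samples near each newly added state.

    Assumption: "local neighbourhood" means Euclidean distance <= radius.
    The paper may define locality differently.
    """

    def __init__(self, capacity, radius):
        self.capacity = capacity
        self.radius = radius
        self.states = []       # stored states, used for distance checks
        self.transitions = []  # (state, action, reward, next_state)

    def add(self, state, action, reward, next_state):
        state = np.asarray(state, dtype=float)
        if self.states:
            # Forget old samples in the neighbourhood of the new state,
            # so stale data about that region does not linger.
            dists = np.linalg.norm(np.stack(self.states) - state, axis=1)
            keep = dists > self.radius
            self.states = [s for s, k in zip(self.states, keep) if k]
            self.transitions = [t for t, k in zip(self.transitions, keep) if k]
        # Assumed fallback: plain FIFO eviction if the buffer is still full.
        if len(self.transitions) >= self.capacity:
            self.states.pop(0)
            self.transitions.pop(0)
        self.states.append(state)
        self.transitions.append(
            (state, action, reward, np.asarray(next_state, dtype=float))
        )

    def sample(self, batch_size, rng):
        # Uniform sampling with replacement over stored transitions.
        idx = rng.integers(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]

    def __len__(self):
        return len(self.transitions)


buf = LocalForgettingBuffer(capacity=100, radius=0.5)
buf.add([0.0, 0.0], 0, 0.0, [0.1, 0.0])  # old reward near the origin
buf.add([5.0, 5.0], 1, 1.0, [5.1, 5.0])  # far away, never evicted below
buf.add([0.1, 0.0], 0, 2.0, [0.2, 0.0])  # new reward near the origin
rewards = sorted(t[2] for t in buf.transitions)
batch = buf.sample(4, np.random.default_rng(0))
```

After the third `add`, the stale sample near the origin is gone while the distant sample at `[5.0, 5.0]` is retained. A world model trained on such a buffer would therefore see only the new reward in the changed region and still see old data everywhere else.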

Related papers

Replay to Remember: Retaining Domain Knowledge in Streaming Language Models [0.0]
Continual learning in large language models (LLMs) typically encounters the critical challenge of catastrophic forgetting. We demonstrate a method combining LoRA and a minimal replay mechanism in a realistic streaming setting. Our experiments reveal that while catastrophic forgetting naturally occurs, even minimal replay significantly stabilizes and partially restores domain-specific knowledge.
arXiv Detail & Related papers (2025-04-24T17:56:22Z)
SPARTAN: A Sparse Transformer Learning Local Causation [63.29645501232935]
Causal structures play a central role in world models that flexibly adapt to changes in the environment. We present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene. By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states.
arXiv Detail & Related papers (2024-11-11T11:42:48Z)
Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation [70.43845294145714]
Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic. We propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules. Our method can be integrated into both local-BP and BP-free settings.
arXiv Detail & Related papers (2024-06-07T19:10:31Z)
Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents [37.604622216020765]
We show that the conceptually simple idea of partial models can allow deep model-based agents to overcome this challenge. We demonstrate this by showing that using partial models in agents such as deep Dyna-Q, PlaNet and Dreamer allows them to adapt effectively to local changes in their environments.
arXiv Detail & Related papers (2024-05-27T07:46:36Z)
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR). Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy while using zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noise. Deep models only know the training domain style. We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z)
PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation [67.41325356479229]
We propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix. In a nutshell, our auxiliary network learns to fix local variants intensively by effectively back-propagating local information through the meta-gradient. This network is model-agnostic, so it can be used in any kind of architecture in a plug-and-play manner.
arXiv Detail & Related papers (2022-07-27T07:48:29Z)
Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods [25.05409184943328]
We show that well-known model-based methods adapt poorly to local environmental changes. We identify elements that hurt adaptive behavior and link these to underlying techniques frequently used in deep model-based RL. We provide insights into the challenges of building an adaptive nonlinear model-based method.
arXiv Detail & Related papers (2022-04-25T06:45:16Z)
Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications. It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.