Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic
Forgetting in Curiosity
- URL: http://arxiv.org/abs/2310.17537v1
- Date: Thu, 26 Oct 2023 16:28:17 GMT
- Title: Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic
Forgetting in Curiosity
- Authors: Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit
Agrawal, Ila Fiete
- Abstract summary: Deep reinforcement learning methods exhibit impressive performance on a range of tasks but struggle on hard exploration tasks in large environments with sparse rewards.
Prediction-based intrinsic rewards can help agents solve hard exploration tasks, but they can suffer from catastrophic forgetting.
We propose FARCuriosity, a new method inspired by how humans and animals learn.
- Score: 31.396929282048916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning methods exhibit impressive performance on a range
of tasks but still struggle on hard exploration tasks in large environments
with sparse rewards. To address this, intrinsic rewards can be generated using
forward model prediction errors that decrease as the environment becomes known,
and incentivize an agent to explore novel states. While prediction-based
intrinsic rewards can help agents solve hard exploration tasks, they can suffer
from catastrophic forgetting: the forward model's error, and hence the reward, can increase again at previously visited states. We first
examine the conditions and causes of catastrophic forgetting in grid world
environments. We then propose FARCuriosity, a new method inspired by how humans
and animals learn. The method depends on fragmentation and recall: an agent
fragments an environment based on surprisal, and uses different local curiosity
modules (prediction-based intrinsic reward functions) for each fragment so that
modules are not trained on the entire environment. At each fragmentation event,
the agent stores the current module in long-term memory (LTM) and either
initializes a new module or recalls a previously stored module based on its
match with the current state. With fragmentation and recall, FARCuriosity
achieves less forgetting and better overall performance in games with varied
and heterogeneous environments in the Atari benchmark suite of tasks. Thus,
this work highlights the problem of catastrophic forgetting in prediction-based
curiosity methods and proposes a solution.
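To make the abstract concrete, here is a minimal Python sketch of fragmentation and recall. It is illustrative only, not the paper's implementation: the toy linear forward model, the fixed surprisal threshold, and the nearest-anchor recall rule (names such as `CuriosityModule` and `surprisal_threshold` are ours) stand in for the learned components described above.

```python
import numpy as np

class CuriosityModule:
    """Toy linear forward model: predicts next_state from (state, action).

    Its prediction error is the intrinsic reward; the error shrinks as the
    module is trained on transitions from its fragment of the environment.
    """
    def __init__(self, state_dim, action_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def intrinsic_reward(self, state, action, next_state):
        return float(np.linalg.norm(next_state - self.predict(state, action)))

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)  # one SGD step on squared error


class FARCuriositySketch:
    """Fragmentation and recall over local curiosity modules (illustrative).

    On a surprisal spike, the current module is stored in long-term memory
    (LTM) and the stored module whose fragment anchor best matches the
    current state is recalled -- or a fresh module is initialized if none
    matches.
    """
    def __init__(self, state_dim, action_dim, surprisal_threshold=1.0):
        self.dims = (state_dim, action_dim)
        self.threshold = surprisal_threshold
        self.current = CuriosityModule(state_dim, action_dim)
        self.anchor = None   # state representative of the current fragment
        self.ltm = []        # long-term memory: (anchor_state, module) pairs

    def step(self, state, action, next_state):
        r_int = self.current.intrinsic_reward(state, action, next_state)
        if r_int > self.threshold:                    # fragmentation event
            self.ltm.append((self.anchor if self.anchor is not None else state,
                             self.current))
            self.current = self._recall_or_new(next_state)
            self.anchor = next_state
        self.current.update(state, action, next_state)
        return r_int

    def _recall_or_new(self, state):
        # Recall the stored module with the closest anchor; if even the
        # closest anchor is far, start a new module for the new fragment.
        if self.ltm:
            anchor, module = min(self.ltm,
                                 key=lambda am: np.linalg.norm(am[0] - state))
            if np.linalg.norm(anchor - state) < self.threshold:
                return module
        return CuriosityModule(*self.dims)
```

Because each module trains only on transitions from its own fragment, no single forward model has to fit the entire heterogeneous environment; this is the mechanism by which forgetting, and the spurious re-growth of intrinsic reward at visited states, is reduced.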
Related papers
- Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL)
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Exploration via Elliptical Episodic Bonuses [22.404871878551354]
We introduce Exploration via Elliptical Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses to continuous state spaces.
Our method sets a new state-of-the-art across 16 challenging tasks from the MiniHack suite, without requiring task-specific inductive biases.
E3B also matches existing methods on sparse reward, pixel-based VizDoom environments, and outperforms existing methods in reward-free exploration on Habitat.
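The elliptical bonus itself is compact enough to sketch. In E3B the embedding is learned with an inverse-dynamics model; the sketch below assumes the embedding phi is given and shows only the episodic bonus phi^T C^{-1} phi, maintained with a Sherman-Morrison update of the inverse covariance (the `ridge` value is our placeholder).

```python
import numpy as np

class EllipticalEpisodicBonus:
    """Episodic elliptical bonus in the spirit of E3B (a sketch).

    The bonus for a state embedding phi is phi^T C^{-1} phi, where C is a
    regularized running covariance of the embeddings seen this episode.
    In the paper the embedding is learned; here it is assumed given.
    """
    def __init__(self, dim, ridge=0.1):
        self.dim, self.ridge = dim, ridge
        self.reset()

    def reset(self):
        # Each episode starts from C = ridge * I, so C^{-1} = I / ridge.
        self.C_inv = np.eye(self.dim) / self.ridge

    def bonus(self, phi):
        b = float(phi @ self.C_inv @ phi)
        # Rank-one Sherman-Morrison update of C^{-1} after adding phi phi^T.
        u = self.C_inv @ phi
        self.C_inv -= np.outer(u, u) / (1.0 + phi @ u)
        return b
```

Repeated visits along the same embedding directions shrink the bonus within an episode, and `reset()` restores full novelty when a new episode starts.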
arXiv Detail & Related papers (2022-10-11T22:10:23Z)
- CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation [118.18977078626776]
We propose an end-to-end self-supervised learning framework for event segmentation/boundary detection.
Our framework exploits a transformer-based feature reconstruction scheme to detect event boundaries from reconstruction errors.
The goal of our work is to segment generic events rather than localize some specific ones.
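As a heavily simplified illustration of boundary-by-reconstruction-error: CoSeg reconstructs masked frame features with a transformer, whereas the stand-in below "reconstructs" each frame feature as the mean of its temporal neighbors; only the peak-picking idea carries over, and `window` and `z_thresh` are our placeholders.

```python
import numpy as np

def event_boundaries(features, window=2, z_thresh=2.0):
    """Flag frames whose reconstruction error is an outlier (a sketch).

    features: sequence of per-frame feature vectors. Each frame is
    "reconstructed" as the mean of its temporal neighbors; frames whose
    error z-score exceeds z_thresh are reported as event boundaries.
    """
    T = len(features)
    errors = np.zeros(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        context = [features[i] for i in range(lo, hi) if i != t]
        recon = np.mean(context, axis=0)
        errors[t] = np.linalg.norm(features[t] - recon)
    z = (errors - errors.mean()) / (errors.std() + 1e-8)
    return np.flatnonzero(z > z_thresh)
```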
arXiv Detail & Related papers (2021-09-30T14:40:32Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
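The rollback loop can be sketched as follows, assuming the persistent MDP exposes snapshot and restore of its internal state. The `clone`/`restore` names and the toy `ChainEnv` are ours, and the sketch omits RbExplore's state-similarity clustering; it shows only the archive-and-rollback skeleton.

```python
import random

class ChainEnv:
    """Toy persistent environment: positions 0..n on a line."""
    actions = (-1, +1)
    def __init__(self, n=50):
        self.n, self.pos = n, 0
    def observe(self):
        return self.pos
    def clone(self):
        return self.pos            # snapshot of the internal state
    def restore(self, snapshot):
        self.pos = snapshot        # roll back to a saved state
    def step(self, action):
        self.pos = min(self.n, max(0, self.pos + action))
        return self.pos

def rollback_explore(env, n_iters=1000, rollout_len=10):
    """Explore by repeatedly rolling back to saved states (a sketch)."""
    archive = {env.observe(): env.clone()}      # obs -> saved snapshot
    for _ in range(n_iters):
        env.restore(random.choice(list(archive.values())))
        for _ in range(rollout_len):
            obs = env.step(random.choice(env.actions))
            archive.setdefault(obs, env.clone())  # keep novel states
    return archive

# e.g. rollback_explore(ChainEnv()) steadily pushes the frontier rightward,
# since progress deep in the chain is never lost when a rollout ends.
```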
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z)
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
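The episode-ranking criterion lends itself to a short sketch. The weights and the exact local/global novelty terms below are illustrative stand-ins, not the paper's tuned values.

```python
import numpy as np

def episode_score(states, ext_return, global_counts,
                  w_ext=1.0, w_local=0.1, w_global=0.01):
    """Score an episode for ranking, RAPID-style (weights are placeholders).

    local: fraction of distinct states visited within the episode.
    global: mean count-based novelty of those states across training.
    Top-scoring episodes go to a small buffer that the policy imitates.
    """
    local = len(set(states)) / len(states)
    global_nov = np.mean([1.0 / np.sqrt(1 + global_counts.get(s, 0))
                          for s in states])
    return w_ext * ext_return + w_local * local + w_global * global_nov
```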
arXiv Detail & Related papers (2021-01-20T14:22:01Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
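A sketch of the impact-driven reward, assuming the state embedding phi is given (in RIDE it is trained with forward and inverse dynamics models):

```python
import numpy as np
from collections import defaultdict

class ImpactDrivenReward:
    """RIDE-style intrinsic reward (a sketch; the embedding is assumed given).

    Rewards the change in a learned state representation between
    consecutive steps, divided by the episodic visit count of the new
    state so that oscillating between two states is not rewarded forever.
    """
    def __init__(self, phi):
        self.phi = phi
        self.counts = defaultdict(int)   # episodic visit counts

    def reset(self):                     # call at episode start
        self.counts.clear()

    def reward(self, state, next_state):
        key = tuple(np.asarray(next_state).ravel())
        self.counts[key] += 1
        impact = np.linalg.norm(self.phi(next_state) - self.phi(state))
        return float(impact / np.sqrt(self.counts[key]))
```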
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
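A tabular sketch of the idea, with our own constants and a 1/sqrt(N) rarity signal standing in for the paper's formulation: the value is backed up off-policy, Q-learning style, so it credits actions for the rarely visited states they lead to far in the future rather than one step ahead.

```python
import numpy as np
from collections import defaultdict

# Tabular visitation value W(s, a) and visit counts N(s); default to zero.
W = defaultdict(float)
N = defaultdict(int)

def visitation_value_update(s, a, s_next, actions, alpha=0.1, gamma=0.99):
    """One off-policy update of a long-term visitation value (a sketch)."""
    N[s_next] += 1
    rarity = 1.0 / np.sqrt(N[s_next])        # count-based exploration signal
    best_next = max(W[(s_next, b)] for b in actions)
    W[(s, a)] += alpha * (rarity + gamma * best_next - W[(s, a)])
```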
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.