Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2412.14865v3
- Date: Fri, 11 Apr 2025 15:18:20 GMT
- Title: Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning
- Authors: Anthony Kobanda, Rémy Portelas, Odalric-Ambrym Maillard, Ludovic Denoyer,
- Abstract summary: We consider a Continual Reinforcement Learning setup, where a learning agent must continuously adapt to new tasks while retaining previously acquired skill sets.<n>We introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data.<n>We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations.
- Score: 19.463863037999054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a Continual Reinforcement Learning setup, where a learning agent must continuously adapt to new tasks while retaining previously acquired skill sets, with a focus on the challenge of avoiding forgetting past gathered knowledge and ensuring scalability with the growing number of tasks. Such issues prevail in autonomous robotics and video game simulations, notably for navigation tasks prone to topological or kinematic changes. To address these issues, we introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data. Our method leverages distinct policy subspaces of neural networks to enable flexible and efficient adaptation to new tasks while preserving existing knowledge. We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations, showcasing competitive performances and satisfying adaptability with respect to classical continual learning metrics, in particular regarding the memory usage and efficiency.
Related papers
- Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model [6.42114585934114]
Large Language Models (LLMs) possess capabilities that can process diverse language-related tasks.
Continual Learning in Large Language Models (LLMs) arises which aims to continually adapt the LLMs to new tasks.
This paper proposes Analytic Subspace Routing(ASR) to address these challenges.
arXiv Detail & Related papers (2025-03-17T13:40:46Z) - Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z) - I Know How: Combining Prior Policies to Solve New Tasks [17.214443593424498]
Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios.
Learning from scratch for each new task is not a viable or sustainable option.
We propose a new framework, I Know How, which provides a common formalization.
arXiv Detail & Related papers (2024-06-14T08:44:51Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Online Continual Learning via the Knowledge Invariant and Spread-out
Properties [4.109784267309124]
Key challenge in continual learning is catastrophic forgetting.
We propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP)
We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR 100, Split SVHN, Split CUB200 and Split Tiny-Image-Net.
arXiv Detail & Related papers (2023-02-02T04:03:38Z) - Learning Goal-Conditioned Policies Offline with Self-Supervised Reward
Shaping [94.89128390954572]
We propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and the dynamics of the model.
We evaluate our method on three continuous control tasks, and show that our model significantly outperforms existing approaches.
arXiv Detail & Related papers (2023-01-05T15:07:10Z) - Learning and Retrieval from Prior Data for Skill-based Imitation
Learning [47.59794569496233]
We develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data.
We identify several key design choices that significantly improve performance on novel tasks.
arXiv Detail & Related papers (2022-10-20T17:34:59Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Continual Prompt Tuning for Dialog State Tracking [58.66412648276873]
A desirable dialog system should be able to continually learn new skills without forgetting old ones.
We present Continual Prompt Tuning, a parameter-efficient framework that not only avoids forgetting but also enables knowledge transfer between tasks.
arXiv Detail & Related papers (2022-03-13T13:22:41Z) - Fully Online Meta-Learning Without Task Boundaries [80.09124768759564]
We study how meta-learning can be applied to tackle online problems of this nature.
We propose a Fully Online Meta-Learning (FOML) algorithm, which does not require any ground truth knowledge about the task boundaries.
Our experiments show that FOML was able to learn new tasks faster than the state-of-the-art online learning methods.
arXiv Detail & Related papers (2022-02-01T07:51:24Z) - AFEC: Active Forgetting of Negative Transfer in Continual Learning [37.03139674884091]
We show that biological neural networks can actively forget the old knowledge that conflicts with the learning of a new experience.
Inspired by the biological active forgetting, we propose to actively forget the old knowledge that limits the learning of new tasks to benefit continual learning.
arXiv Detail & Related papers (2021-10-23T10:03:19Z) - Adaptive Explainable Continual Learning Framework for Regression
Problems with Focus on Power Forecasts [0.0]
Two continual learning scenarios will be proposed to describe the potential challenges in this context.
Deep neural networks have to learn new tasks and overcome forgetting the knowledge obtained from the old tasks as the amount of data keeps increasing in applications.
Research topics are related but not limited to developing continual deep learning algorithms, strategies for non-stationarity detection in data streams, explainable and visualizable artificial intelligence, etc.
arXiv Detail & Related papers (2021-08-24T14:59:10Z) - Bilevel Continual Learning [76.50127663309604]
We present a novel framework of continual learning named "Bilevel Continual Learning" (BCL)
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z) - Neuromodulated Neural Architectures with Local Error Signals for
Memory-Constrained Online Continual Learning [4.2903672492917755]
We develop a biologically-inspired light weight neural network architecture that incorporates local learning and neuromodulation.
We demonstrate the efficacy of our approach on both single task and continual learning setting.
arXiv Detail & Related papers (2020-07-16T07:41:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.