Building a Subspace of Policies for Scalable Continual Learning
- URL: http://arxiv.org/abs/2211.10445v1
- Date: Fri, 18 Nov 2022 14:59:42 GMT
- Title: Building a Subspace of Policies for Scalable Continual Learning
- Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu
- Abstract summary: We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation)
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to continuously acquire new knowledge and skills is crucial for
autonomous agents. Existing methods are typically based on either fixed-size
models that struggle to learn a large number of diverse behaviors, or
growing-size models that scale poorly with the number of tasks. In this work,
we aim to strike a better balance between an agent's size and performance by
designing a method that grows adaptively depending on the task sequence. We
introduce Continual Subspace of Policies (CSP), a new approach that
incrementally builds a subspace of policies for training a reinforcement
learning agent on a sequence of tasks. The subspace's high expressivity allows
CSP to perform well for many different tasks while growing sublinearly with the
number of tasks. Our method does not suffer from forgetting and displays
positive transfer to new tasks. CSP outperforms a number of popular baselines
on a wide range of scenarios from two challenging domains, Brax (locomotion)
and Continual World (manipulation).
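The core idea of a subspace of policies can be illustrated with a minimal sketch: a small set of "anchor" parameter vectors spans a convex region of weight space, and any convex combination of the anchors is itself a policy. The anchor shapes, the sampling scheme, and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anchors: each is one full set of policy parameters.
# Any point in their convex hull is a usable policy, so a few
# anchors parameterize infinitely many behaviors.
anchor_policies = [rng.standard_normal((4, 2)) for _ in range(3)]

def sample_subspace_policy(anchors, rng):
    """Sample convex-combination weights and mix the anchors."""
    alpha = rng.dirichlet(np.ones(len(anchors)))  # weights sum to 1
    mixed = sum(a * w for a, w in zip(anchors, alpha))
    return mixed, alpha

policy, alpha = sample_subspace_policy(anchor_policies, rng)
```

Growing the subspace by one anchor per group of related tasks, rather than one full model per task, is what allows sublinear growth in the number of tasks.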
Related papers
- Hierarchical Continual Reinforcement Learning via Large Language Model [15.837883929274758]
Hi-Core is designed to facilitate the transfer of high-level knowledge.
It orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM) and low-level policy learning through reinforcement learning.
Hi-Core has demonstrated its effectiveness on diverse CRL tasks, outperforming popular baselines.
arXiv Detail & Related papers (2024-01-25T03:06:51Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
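The idea of an adaptive (data-driven) discretization can be sketched in a few lines: instead of spacing bin centers uniformly, place them at empirical quantiles of the dataset's actions so resolution follows where the data actually lies. The dataset, codebook size, and function names are hypothetical, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 1-D continuous actions from an offline dataset,
# concentrated near the extremes as robot data often is.
actions = np.concatenate([rng.normal(-0.9, 0.05, 500),
                          rng.normal(0.8, 0.05, 500)])

def quantile_codebook(data, n_bins):
    """Data-driven bin centers at empirical quantiles, so most
    codes land where most of the actions are."""
    qs = (np.arange(n_bins) + 0.5) / n_bins
    return np.quantile(data, qs)

def quantize(a, codebook):
    """Map each continuous action to its nearest code index."""
    return np.abs(a[:, None] - codebook[None, :]).argmin(axis=1)

codes = quantile_codebook(actions, 8)
idx = quantize(actions, codes)
recon_err = np.abs(actions - codes[idx]).mean()
```

With actions clustered near the extremes, quantile-placed codes keep reconstruction error far lower than a uniform grid of the same size would.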
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Scalarization for Multi-Task and Multi-Domain Learning at Scale [15.545810422759295]
Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone.
However, optimizing such networks is a challenge due to discrepancies between the different tasks or domains.
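Scalarization in this context means collapsing the per-task objectives into a single weighted sum that a standard optimizer can minimize. A minimal sketch, with illustrative task names and weights that are assumptions rather than values from the paper:

```python
# Hypothetical per-task losses from a shared multi-task backbone.
task_losses = {"segmentation": 0.82, "depth": 0.35, "normals": 0.51}
weights = {"segmentation": 1.0, "depth": 0.5, "normals": 0.5}

def scalarize(losses, weights):
    """Weighted-sum scalarization: combine multiple task objectives
    into one scalar training loss."""
    return sum(weights[t] * loss for t, loss in losses.items())

total_loss = scalarize(task_losses, weights)
```

The discrepancies between tasks mentioned above show up here as the choice of weights: a poor weighting lets one task's gradient dominate the shared backbone.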
arXiv Detail & Related papers (2023-10-13T07:31:04Z) - Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z) - Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task.
A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity.
It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z) - SimCS: Simulation for Domain Incremental Online Continual Segmentation [60.18777113752866]
Existing continual learning approaches mostly focus on image classification in the class-incremental setup.
We propose SimCS, a parameter-free method complementary to existing ones that uses simulated data to regularize continual learning.
arXiv Detail & Related papers (2022-11-29T14:17:33Z) - An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer.
State-of-the-art ML models rely on high customization for each task and leverage model size and data scale rather than scaling the number of tasks.
We propose an evolutionary method that can generate a large-scale multitask model and supports the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.