Building a Subspace of Policies for Scalable Continual Learning
- URL: http://arxiv.org/abs/2211.10445v1
- Date: Fri, 18 Nov 2022 14:59:42 GMT
- Title: Building a Subspace of Policies for Scalable Continual Learning
- Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic
Denoyer, Roberta Raileanu
- Abstract summary: We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
- Score: 21.03369477853538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to continuously acquire new knowledge and skills is crucial for
autonomous agents. Existing methods are typically based on either fixed-size
models that struggle to learn a large number of diverse behaviors, or
growing-size models that scale poorly with the number of tasks. In this work,
we aim to strike a better balance between an agent's size and performance by
designing a method that grows adaptively depending on the task sequence. We
introduce Continual Subspace of Policies (CSP), a new approach that
incrementally builds a subspace of policies for training a reinforcement
learning agent on a sequence of tasks. The subspace's high expressivity allows
CSP to perform well for many different tasks while growing sublinearly with the
number of tasks. Our method does not suffer from forgetting and displays
positive transfer to new tasks. CSP outperforms a number of popular baselines
on a wide range of scenarios from two challenging domains, Brax (locomotion)
and Continual World (manipulation).
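As a concrete illustration of the abstract above, here is a minimal sketch of a subspace of policies in which every policy's parameters are a convex combination of a few anchor parameter vectors, and the subspace is tentatively grown when a new task arrives. The class, method names, anchor initialization, and growth criterion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class PolicySubspace:
    """Illustrative subspace of policies: every policy in the subspace is a
    convex combination of a small set of 'anchor' parameter vectors.
    Sketch only; not the authors' implementation."""

    def __init__(self, param_dim, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.anchors = [self.rng.normal(scale=0.1, size=param_dim)]  # one anchor = one policy

    def policy_params(self, alpha):
        """Parameters of the policy at convex weights `alpha` (non-negative, summing to 1)."""
        alpha = np.asarray(alpha, dtype=float)
        assert alpha.shape == (len(self.anchors),) and np.isclose(alpha.sum(), 1.0)
        return sum(a * w for a, w in zip(alpha, self.anchors))

    def grow(self):
        """Tentatively add an anchor when a new task arrives; the tentative
        anchor is kept only if the extended subspace improves performance
        enough, which is what keeps growth sublinear in the number of tasks."""
        self.anchors.append(self.anchors[-1].copy())

    def prune_last(self):
        """Discard the tentative anchor if it did not help."""
        self.anchors.pop()

# Usage: grow on a new task, then evaluate a policy inside the subspace.
subspace = PolicySubspace(param_dim=8)
subspace.grow()
theta = subspace.policy_params([0.3, 0.7])
print(theta.shape)  # (8,)
```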
Related papers
- Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z)
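The HOP summary above mentions selecting among previously learned policies via a similarity metric between current observations and stored observations from successful tasks. The exact metric and hierarchy are not specified there, so the sketch below assumes a hypothetical cosine-similarity gate over stored observation vectors.

```python
import numpy as np

def select_policy(observation, policy_bank):
    """Return the policy whose stored 'successful' observations are most
    similar to the current observation. `policy_bank` is a list of
    (policy, stored_observations) pairs; the cosine metric is a
    hypothetical stand-in for HOP's similarity measure."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    best_policy, best_score = None, -np.inf
    for policy, stored in policy_bank:
        score = max(cosine(observation, obs) for obs in stored)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy
```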
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z)
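LiNeS is described above as post-training layer scaling that preserves pre-trained generalization. As a hedged illustration, the sketch below rescales the fine-tuned parameter update per layer with a linear depth schedule (shallow layers scaled down the most); `alpha`, `beta`, and the exact schedule are assumptions, not the paper's specification.

```python
def scale_updates_by_depth(pretrained, finetuned, alpha=0.0, beta=1.0):
    """Rescale the fine-tuned update (finetuned - pretrained) layer by layer:
    shallow layers are scaled toward `alpha`, the deepest layer by `beta`.
    `pretrained`/`finetuned` are lists of per-layer weight arrays ordered
    shallow-to-deep. Assumed linear schedule; LiNeS' exact formulation may differ."""
    num_layers = len(pretrained)
    edited = []
    for depth, (w0, w1) in enumerate(zip(pretrained, finetuned)):
        scale = alpha + (beta - alpha) * depth / max(num_layers - 1, 1)
        edited.append(w0 + scale * (w1 - w0))
    return edited
```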
- Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces [52.649077293256795]
Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems.
We propose the Vector-Quantized Continual Diffuser (VQ-CD) to bridge the gap between the different spaces of various tasks.
arXiv Detail & Related papers (2024-10-21T07:13:45Z)
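The VQ-CD summary points to vector quantization as the mechanism for aligning different task spaces. The snippet below sketches only the generic quantization step, assuming inputs have already been padded or projected to a common dimension; the shared codebook and function name are illustrative, not VQ-CD's pipeline.

```python
import numpy as np

def vector_quantize(x, codebook):
    """Map a continuous vector `x` (shape (d,)) onto the nearest entry of a
    shared codebook (shape (K, d)), so inputs from tasks with different
    native spaces, once brought to dimension d, land in one aligned
    discrete space."""
    distances = np.linalg.norm(codebook - x, axis=1)
    index = int(np.argmin(distances))
    return index, codebook[index]
```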
- Get Rid of Task Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework [10.33844348594636]
We argue that it is essential to propose a Continuous Multi-task Spatio-temporal learning framework (CMuST) to empower collective urban intelligence.
CMuST reforms urban spatio-temporal learning from single-domain to cooperative multi-task learning.
We establish a benchmark of three cities for multi-task spatio-temporal learning, and empirically demonstrate the superiority of CMuST.
arXiv Detail & Related papers (2024-10-14T14:04:36Z)
- Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications such as robotic control with reinforcement learning, tasks change and new tasks arise in sequential order.
This situation poses the new challenge of a plasticity-stability trade-off for training an agent that can adapt to task changes and retain acquired knowledge.
We propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability).
arXiv Detail & Related papers (2024-09-04T08:21:47Z)
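CoD is summarized above as rehearsal-based: replaying past-task experience while training on the current task. Below is a minimal, diffusion-agnostic sketch of such a rehearsal buffer using reservoir sampling; the buffer size, replay ratio, and class name are assumptions, and CoD's diffusion-specific training loop is omitted.

```python
import random

class RehearsalBuffer:
    """Keep a uniform reservoir of past-task samples and mix them into each
    current-task batch, so training retains old behavior (stability) while
    adapting to the new task (plasticity)."""

    def __init__(self, capacity=10_000, seed=0):
        self.capacity, self.buffer, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, sample):
        # Reservoir sampling: every sample seen so far has equal probability
        # of remaining in the buffer.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def mixed_batch(self, current_batch, replay_ratio=0.5):
        k = min(int(len(current_batch) * replay_ratio), len(self.buffer))
        return list(current_batch) + self.rng.sample(self.buffer, k)
```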
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
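The summary above does not spell out the transfer mechanism; one reading of the title is that value estimates are pre-trained for random features of the state, and a new task's reward is approximated as a linear combination of those features. The sketch below shows only that hypothetical recombination step, with all names and shapes assumed for illustration.

```python
import numpy as np

def q_for_new_reward(random_features, rewards, q_per_feature):
    """Fit w so that rewards ~= random_features @ w (least squares), then
    reuse the pre-trained per-feature Q-estimates with the same weights as
    an approximate Q for the new reward. Shapes: random_features (N, K),
    rewards (N,), q_per_feature (N, K)."""
    w, *_ = np.linalg.lstsq(random_features, rewards, rcond=None)
    return q_per_feature @ w  # (N,) approximate values under the new reward
```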
- SimCS: Simulation for Domain Incremental Online Continual Segmentation [60.18777113752866]
Existing continual learning approaches mostly focus on image classification in the class-incremental setup.
We propose SimCS, a parameter-free method complementary to existing ones that uses simulated data to regularize continual learning.
arXiv Detail & Related papers (2022-11-29T14:17:33Z)
- An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer.
State-of-the-art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks.
We propose an evolutionary method that can generate a large scale multitask model and can support the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
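TAPS is summarized above as adaptively modifying a small, task-specific subset of layers. The sketch below illustrates one way to realize this for a single linear layer, with a learnable per-task gate deciding between the frozen shared weights and a task-specific delta; the sigmoid relaxation and names are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TaskAdaptiveLinear(nn.Module):
    """One linear layer with task-adaptive sharing: each task either reuses
    the frozen shared weights or adds a task-specific delta, chosen by a
    learnable gate (relaxed to a sigmoid here; a sparsity penalty would be
    needed to keep most layers shared). Illustrative, not the paper's code."""

    def __init__(self, base: nn.Linear, num_tasks: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # shared backbone stays frozen
        self.deltas = nn.Parameter(torch.zeros(num_tasks, *base.weight.shape))
        self.gate_logits = nn.Parameter(torch.full((num_tasks,), -3.0))

    def forward(self, x, task_id: int):
        gate = torch.sigmoid(self.gate_logits[task_id])  # near 0 => share the base layer
        weight = self.base.weight + gate * self.deltas[task_id]
        return nn.functional.linear(x, weight, self.base.bias)
```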
- Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.