Lifelong Reinforcement Learning with Modulating Masks
- URL: http://arxiv.org/abs/2212.11110v3
- Date: Tue, 1 Aug 2023 10:43:21 GMT
- Title: Lifelong Reinforcement Learning with Modulating Masks
- Authors: Eseoghene Ben-Iwhiwhu, Saptarshi Nath, Praveen K. Pilly, Soheil Kolouri, Andrea Soltoggio
- Abstract summary: Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning.
Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge.
We show that lifelong reinforcement learning with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
- Score: 16.24639836636365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lifelong learning aims to create AI systems that continuously and
incrementally learn during a lifetime, similar to biological learning. Attempts
so far have met problems, including catastrophic forgetting, interference among
tasks, and the inability to exploit previous knowledge. While considerable
research has focused on learning multiple supervised classification tasks that
involve changes in the input distribution, lifelong reinforcement learning
(LRL) must deal with variations in the state and transition distributions, and
in the reward functions. Modulating masks with a fixed backbone network,
recently developed for classification, are particularly suitable to deal with
such a large spectrum of task variations. In this paper, we adapted modulating
masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison
with LRL baselines in both discrete and continuous RL tasks shows superior
performance. We further investigated the use of a linear combination of
previously learned masks to exploit previous knowledge when learning new tasks:
not only is learning faster, the algorithm solves tasks that we could not
otherwise solve from scratch due to extremely sparse rewards. The results
suggest that RL with modulating masks is a promising approach to lifelong
learning, to the composition of knowledge to learn increasingly complex tasks,
and to knowledge reuse for efficient and faster learning.
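The two mechanisms named in the abstract, per-task masks over a fixed backbone and a linear combination of previously learned masks, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how masks are parameterized or combined, so the real-valued scores thresholded at zero, the softmax-normalized coefficients `betas`, and every function name below are assumptions made for illustration. The RL training loop (PPO or IMPALA) is omitted entirely.

```python
import math
import random


def make_backbone(n_out, n_in, seed=0):
    """Create a random weight matrix that stays fixed across all tasks."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def binarize(scores, threshold=0.0):
    """Turn real-valued per-weight scores into a 0/1 mask (assumed scheme)."""
    return [[1.0 if s > threshold else 0.0 for s in row] for row in scores]


def softmax(xs):
    """Numerically stable softmax so the combination weights sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_scores(score_list, betas):
    """Linearly combine several tasks' scores with softmax(betas) as weights."""
    weights = softmax(betas)
    n_out, n_in = len(score_list[0]), len(score_list[0][0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, score_list)) for j in range(n_in)]
        for i in range(n_out)
    ]


def masked_forward(backbone, mask, x):
    """Apply one linear layer whose weights are the backbone times the mask."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(backbone, mask)
    ]


# Hypothetical scores for two previously learned tasks (2 outputs, 3 inputs).
scores_task_a = [[0.5, -0.2, 0.1], [-0.3, 0.4, -0.1]]
scores_task_b = [[-0.4, 0.6, 0.2], [0.7, -0.5, 0.3]]

backbone = make_backbone(2, 3)
mask_a = binarize(scores_task_a)

# A new task starts from an equal-weight combination of the known scores.
new_scores = combine_scores([scores_task_a, scores_task_b], betas=[0.0, 0.0])
mask_new = binarize(new_scores)

x = [1.0, 2.0, 3.0]
print(masked_forward(backbone, mask_a, x))
print(masked_forward(backbone, mask_new, x))
```

In this sketch the backbone is never modified, so selecting a task amounts to selecting its mask. Under these assumptions only the scores and the coefficients `betas` would be trainable quantities, which is one way to see why earlier tasks are not overwritten.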
Related papers
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation [39.886149621730915]
Lifelong sequence generation (LSG) aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns.
Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA).
DMEA enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks.
arXiv Detail & Related papers (2023-10-15T16:51:11Z)
- Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks [14.893594209310875]
Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime.
Modulating masks, a specific type of parameter isolation approach, have recently shown promise in both supervised and reinforcement learning.
We show that the parameter isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed system of lifelong learners.
arXiv Detail & Related papers (2023-05-18T14:19:19Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Fully Online Meta-Learning Without Task Boundaries [80.09124768759564]
We study how meta-learning can be applied to tackle online problems of this nature.
We propose a Fully Online Meta-Learning (FOML) algorithm, which does not require any ground truth knowledge about the task boundaries.
Our experiments show that FOML was able to learn new tasks faster than the state-of-the-art online learning methods.
arXiv Detail & Related papers (2022-02-01T07:51:24Z)
- Fractional Transfer Learning for Deep Model-Based Reinforcement Learning [0.966840768820136]
Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks.
Recent progress in model-based RL allows agents to be much more data-efficient.
We present a simple alternative approach: fractional transfer learning.
arXiv Detail & Related papers (2021-08-14T12:44:42Z)
- KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning [16.167201058368303]
Deep Reinforcement Learning (RL) algorithms have achieved dramatic progress in the multi-agent area.
To alleviate this problem, efficient leveraging of the historical experience is essential.
We propose a method, named "KnowRU" for knowledge reusing.
arXiv Detail & Related papers (2021-03-27T12:38:01Z)
- Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control [65.00425082663146]
We present a Knowledge Transfer based Multi-task Deep Reinforcement Learning framework (KTM-DRL) for continuous control.
In KTM-DRL, the multi-task agent first leverages an offline knowledge transfer algorithm to quickly learn a control policy from the experience of task-specific teachers.
The experimental results well justify the effectiveness of KTM-DRL and its knowledge transfer and online learning algorithms, as well as its superiority over the state-of-the-art by a large margin.
arXiv Detail & Related papers (2020-10-15T03:26:47Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.